//

An OpenStack Crime Story solved by tcpdump, sysdig, and iostat – Episode 2

15.9.2014 | 5 minutes of reading time

Previously on OpenStack Crime Investigation. I was called to a crime scene; our OpenStack based private cloud for CenterDevice. Somebody or something was causing our virtual load balancers to flap their highly available IP address. tcpdump showed me that there were time gaps between the keepalive VRRP packets. And these gaps originated on the first hop, the master load balancer.

After discovering the culprit might be hiding inside the virtual machine, I fetched my assistant sysdig  and we ssh’ed into loadblancer01. Together, we started to ask questions; very intense, uncomfortable questions to finally get to the bottom of this.

sysdig is a relatively new tool that basically traces all syscalls and uses Lua scripts – called chisels – to derive conclusions about the captured sequence of syscalls. In general, it is like tcdump for syscalls. Since every process that wants to do anything useful has to use syscalls, a very detailed view of your system’s behavior evolves. sysdig is developing into a serious swiss army knife and you should give it a try; especially, read these three blog post to learn about sysdig’s capabilities which convinced me.

sysdig being the good detective assistant went to work immediately and kept a transcript about all questions and answers in a file.

1$ sysdig -w transcript
2

I was especially interested in the sendmsg syscall that actually sends network packets issued by the keepalived process.

After a syscall is invoked by a process, the process is paused until the syscall has been finished by the kernel and only then the execution of the process is resumed.

I was interested in the invocation, that is the event direction from user process to kernel.

1$ sysdig -r transcript proc.name=keepalived and evt.type=sendmsg and evt.dir='>'
2

That stirred up quite some evidence. But I wanted to find the time gaps larger than one second. I drew Lua from my concealed holster inside my jacket and fired a quick chisel right between two sendmsg syscalls:

1$ sysdig -r transcript -c find_time_gap proc.name=keepalived and evt.type=sendmsg and evt.dir='>'
2370964 17:01:26.375374738 (5.03543187) keepalived > sendmsg fd=13(192.168.205.8->224.0.0.18) size=40 tuple=0.0.0.0:112->224.0.0.18:0
3

Got you! A time gap of more than five seconds turned up. I inspected the events closely looking for event number 370964.

1$ sysdig -r transcript proc.name=keepalived and evt.type=sendmsg and evt.dir='>' | grep -C 1 370964
2368250 17:01:21.339942868 0 keepalived (10400) > sendmsg fd=13(192.168.205.8->224.0.0.18) size=40 tuple=0.0.0.0:112->224.0.0.18:0
3370964 17:01:26.375374738 1 keepalived (10400) > sendmsg fd=13(192.168.205.8->224.0.0.18) size=40 tuple=0.0.0.0:112->224.0.0.18:0
4371446 17:01:26.377770247 1 keepalived (10400) > sendmsg fd=13(192.168.205.8->224.0.0.18) size=40 tuple=0.0.0.0:112->224.0.0.18:0
5

There, in lines 1 and 2. There it was. A gap much larger than one second. Now it was clear. The perpetrator was hiding inside the virtual machine causing lags in the processing of keepalived’s sendmsg. But who was it?

After some ruling out of innocent bystanders, I found this:

1$ sysdig -r transcript | tail -n +368250 | grep -v "keepalived\|haproxy\|zabbix"
2369051 17:01:23.621071175 3  (0) > switch next=7 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
3369052 17:01:23.621077045 3  (7) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
4369053 17:01:23.625105578 3  (0) > switch next=7 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
5369054 17:01:23.625112785 3  (7) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
6369055 17:01:23.628568892 3  (0) > switch next=25978(sysdig) pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
7369056 17:01:23.628597921 3 sysdig (25978) > switch next=0 pgft_maj=0 pgft_min=2143 vm_size=99684 vm_rss=6772 vm_swap=0
8369057 17:01:23.629068124 3  (0) > switch next=7 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
9369058 17:01:23.629073516 3  (7) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
10369059 17:01:23.633104746 3  (0) > switch next=7 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
11369060 17:01:23.633110185 3  (7) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
12369061 17:01:23.637061129 3  (0) > switch next=7 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
13369062 17:01:23.637065648 3  (7) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
14369063 17:01:23.641104396 3  (0) > switch next=7 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
15369064 17:01:23.641109883 3  (7) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
16369065 17:01:23.645069607 3  (0) > switch next=7 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
17369066 17:01:23.645075551 3  (7) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
18369067 17:01:23.649071836 3  (0) > switch next=7 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
19369068 17:01:23.649077700 3  (7) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
20369069 17:01:23.653073066 3  (0) > switch next=7 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
21369070 17:01:23.653078791 3  (7) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
22369071 17:01:23.657069807 3  (0) > switch next=7 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
23369072 17:01:23.657075525 3  (7) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
24369073 17:01:23.658678681 3  (0) > switch next=25978(sysdig) pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
25369074 17:01:23.658698453 3 sysdig (25978) > switch next=0 pgft_maj=0 pgft_min=2143 vm_size=99684 vm_rss=6772 vm_swap=0
26

The processes masked behind pids 0 and 7 were consuming the virtual CPU. 0 and 7? These low process ids indicate kernel threads! I double checked  with sysdig’s supervisor, just to be sure. I was right.

The kernel, the perpetrator? This hard working, ever reliable guy? No, I could not believe it. There must be more. And by the way, the virtual machine was not under load. So what could the virtual machine’s kernel be doing … except … it was not doing anything at all? Maybe it was waiting for IO! Could that be? Since all hardware of a virtual machine is, well, virtual, the IO wait could only be on the baremetal level.

I left loadbalancer01 for its bare metal node01. On my way, I was trying to wrap my head around what was happening. What is this all about? So many proxies, so many things happening in the shadows. So many false suspects. This is not a regular case. There is something much more vicious happening behind the scene. I must find it …

Stay tuned for our all new, next and final episode of OpenStack Crime Investigation tomorrow on codecentric Blog.

share post

Likes

0

//

More articles in this subject area\n

Discover exciting further topics and let the codecentric world inspire you.

//

Gemeinsam bessere Projekte umsetzen

Wir helfen Deinem Unternehmen

Du stehst vor einer großen IT-Herausforderung? Wir sorgen für eine maßgeschneiderte Unterstützung. Informiere dich jetzt.

Hilf uns, noch besser zu werden.

Wir sind immer auf der Suche nach neuen Talenten. Auch für dich ist die passende Stelle dabei.