Strace is a useful diagnostic, instructional, and debugging tool. System administrators, diagnosticians and trouble-shooters will find it invaluable for solving problems with programs for which the source is not readily available since they do not need to be recompiled in order to trace them. Students, hackers and the overly-curious will find that a great deal can be learned about a system and its system calls by tracing even ordinary programs. And programmers will find that since system calls and signals are events that happen at the user/kernel interface, a close examination of this boundary is very useful for bug isolation, sanity checking and attempting to capture race conditions.
Debugging why some process isn’t connecting to a remote server can be exceedingly frustrating. DNS can fail, connect can hang, the server might send something unexpected back etc. You can use tcpdump to analyze a lot of that, and that too is a very nice tool, but a lot of the time strace will give you less chatter, simply because it will only ever return data related to the syscalls generated by “your” process. If you’re trying to figure out what one of hundreds of running processes connecting to the same database server does for example (where picking out the right connection with tcpdump is a nightmare), strace makes life a lot easier.
This is an example of a trace of “nc” connecting to www.news.com on port 80 without any problems:
$ strace -e poll,select,connect,recvfrom,sendto nc www.news.com 80
sendto(3, “24 26 13255373NH ”, 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
connect(3, {sa_family=AF_FILE, path=”/var/run/nscd/socket”}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path=”/var/run/nscd/socket”}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(“62.30.112.39”)}, 28) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, “2133211 1 3www4news3com 34 1”, 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, “213321201200 1 1 1 3www4news3com 34 1300f”…, 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(“62.30.112.39”)}, [16]) = 153
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(“62.30.112.39”)}, 28) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, “k3741 1 3www4news3com 1 1”, 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, “k374201200 1 2 3www4news3com 1 1300f”…, 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(“62.30.112.39”)}, [16]) = 106
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(“62.30.112.39”)}, 28) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, “21 1 3www4news3com 1 1”, 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, “2201200 1 2 3www4news3com 1 1300f”…, 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(“62.30.112.39”)}, [16]) = 106
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr(“216.239.122.102”)}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, NULL, [3], NULL, NULL) = 1 (out [3])
So what happens here?
Notice the connection attempts to /var/run/nscd/socket? They mean nc first tries to connect to NSCD – the Name Service Cache Daemon – which is usually used in setups that rely on NIS, YP, LDAP or similar directory protocols for name lookups. In this case the connects fails.
It then moves on to DNS (DNS is port 53, hence the “sin_port=htons(53)” in the following connect. You can see it then does a “sendto()” call, sending a DNS packet that contains www.news.com. It then reads back a packet. For whatever reason it tries three times, the last with a slightly different request. My best guess why in this case is that www.news.com is a CNAME (an “alias”), and the multiple requests may just be an artifact of how nc deals with that.
Then in the end, it finally issues a connect() to the IP it found. Notice it returns EINPROGRESS. That means the connect was non-blocking – nc wants to go on processing. It then calls select(), which succeeds when the connection was successful.
Try adding “read” and “write” to the list of syscalls given to strace and enter a string when connected, and you’ll get something like this:
read(0, “testn”, 1024) = 5
write(3, “testn”, 5) = 5
poll([{fd=3, events=POLLIN, revents=POLLIN}, {fd=0, events=POLLIN}], 2, -1) = 1
read(3, ”
This shows it reading “test” + linefeed from standard in, and writing it back out to the network connection, then calling poll() to wait for a reply, reading the reply from the network connection and writing it to standard out. Everything seems to be working right.
You can also run strace “start to finish”, here with “ls”:
root@slxvm:~# strace -c >/dev/null ls
% time seconds usecs/call calls errors syscall
—— ———– ———– ——— ——— —————-
-nan 0.000000 0 9 read
-nan 0.000000 0 1 write
-nan 0.000000 0 10 open
-nan 0.000000 0 13 close
-nan 0.000000 0 11 fstat
-nan 0.000000 0 26 mmap
-nan 0.000000 0 16 mprotect
-nan 0.000000 0 3 munmap
-nan 0.000000 0 3 brk
-nan 0.000000 0 2 rt_sigaction
-nan 0.000000 0 1 rt_sigprocmask
-nan 0.000000 0 3 3 ioctl
-nan 0.000000 0 9 9 access
-nan 0.000000 0 1 execve
-nan 0.000000 0 2 getdents
-nan 0.000000 0 1 getrlimit
-nan 0.000000 0 1 statfs
-nan 0.000000 0 1 arch_prctl
-nan 0.000000 0 1 1 futex
-nan 0.000000 0 1 set_tid_address
-nan 0.000000 0 1 openat
-nan 0.000000 0 1 set_robust_list
—— ———– ———– ——— ——— —————-
100.00 0.000000 117 13 total
root@slxvm:~#
root@slxvm:~# cat /proc/3727/status
Name: VBoxHeadless
State: S (sleeping)
Tgid: 3727
Pid: 3727
PPid: 1808
TracerPid: 0
Uid: 1002 1002 1002 1002
Gid: 130 130 130 130
FDSize: 64
Groups: 130 1002
VmPeak: 2314600 kB
VmSize: 2314436 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 187508 kB
VmRSS: 185936 kB
VmData: 1975372 kB
VmStk: 136 kB
VmExe: 24 kB
VmLib: 29396 kB
VmPTE: 888 kB
VmSwap: 0 kB
Threads: 37
SigQ: 5/63679
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000002000
SigIgn: 0000000000001000
SigCgt: 1000000180012000
CapInh: 0000000000000000
CapPrm: 0000000000002000
CapEff: 0000000000002000
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: ffffffff
Cpus_allowed_list: 0-31
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 23843
nonvoluntary_ctxt_switches: 948
root@slxvm:~# strace -c -p 3727
Process 3727 attached – interrupt to quit
^CProcess 3727 detached
% time seconds usecs/call calls errors syscall
—— ———– ———– ——— ——— —————-
100.00 0.188000 94000 2 select
0.00 0.000000 0 2 read
—— ———– ———– ——— ——— —————-
100.00 0.188000 4 total
root@slxvm:~#
root@slxvm:~# strace -p 3727
Process 3727 attached – interrupt to quit
select(10, [9], NULL, [9], NULL) = 1 (in [9])
read(9, “372”, 1) = 1
select(10, [9], NULL, [9], NULL) = 1 (in [9])
futex(0x1e09c5c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1e09c58, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
sched_yield() = 0
read(9, “372”, 1) = 1
select(10, [9], NULL, [9], NULL) = 1 (in [9])
read(9, “372”, 1) = 1
select(10, [9], NULL, [9], NULL) = 1 (in [9])
read(9, “372”, 1) = 1
select(10, [9], NULL, [9], NULL) = 1 (in [9])
read(9, “372”, 1) = 1
select(10, [9], NULL, [9], NULL^C
Process 3727 detached
root@slxvm:~#