Difference between revisions of "Strace: what a process does"
imported>ThorstenStaerk |
imported>ThorstenStaerk |
||
Line 4: | Line 4: | ||
You can start a new process with strace or you can attach strace to an already running process. To start a new process just put strace in front of its command: | You can start a new process with strace or you can attach strace to an already running process. To start a new process just put strace in front of its command: | ||
strace <abbr title="print the first 99 characters of every output string">-s 99</abbr> <abbr title="follow child processes">-ff</abbr> <abbr title="command that you want to execute and trace">ls</abbr> | strace <abbr title="print the first 99 characters of every output string">-s 99</abbr> <abbr title="follow child processes">-ff</abbr> <abbr title="command that you want to execute and trace">ls</abbr> | ||
+ | To attach strace to an already running process, in this example firefox, find out the process' id: | ||
+ | # ps -A | grep firefox | ||
+ | 2728 pts/1 00:00:02 firefox | ||
+ | Then call strace with the id as parameter: | ||
+ | strace <abbr title="print the first 99 characters of every output string">-s 99</abbr> <abbr title="follow child processes">-ff</abbr><abbr title="process id">p</abbr> 2728 | ||
clock_gettime(CLOCK_MONOTONIC, {4433, 764347636}) = 0 | clock_gettime(CLOCK_MONOTONIC, {4433, 764347636}) = 0 |
Revision as of 17:28, 24 November 2014
So you have a process that is crashing or misbehaving and you found the tool strace to tell you what the process does. However you don't understand its output? Then this article is for you.
starting strace
You can start a new process with strace or you can attach strace to an already running process. To start a new process just put strace in front of its command:
strace -s 99 -ff ls
To attach strace to an already running process, in this example firefox, find out the process' id:
# ps -A | grep firefox 2728 pts/1 00:00:02 firefox
Then call strace with the id as parameter:
strace -s 99 -ffp 2728
clock_gettime(CLOCK_MONOTONIC, {4433, 764347636}) = 0 poll([{fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=12, revents=POLLIN|POLLRDNORM}]) recv(12, "\37\36&\364T<\264\6Og\20\2\30%-\31C\3757\350{\260\0203\351\23&O\365(y\212"..., 16384, 0) = 1448
analysis
Every line follows the syntax
syscall(arguments) = return value
The meaning of the syscall, its expected arguments and its return value can be determined from its man page. For example to understand the first line call man 2 clock_gettime. It retrieves the current time.
Wow, that was easy. Now to the next line, poll. According to man 2 poll it waits for an event on a file descriptor, like "data to be read". The file descriptor is the first argument, 12. What's this? Let's assume the process ID we are tracing is 7179. Then we can find out its file descriptors like this:
/proc/7179/fd # ll total 0 lr-x------ 1 root root 64 Apr 19 08:02 0 -> /dev/null lrwx------ 1 root root 64 Apr 19 08:02 1 -> socket:[156171] l-wx------ 1 root root 64 Apr 19 08:02 2 -> /var/log/apache2/error_log lr-x------ 1 root root 64 Apr 19 08:02 4 -> pipe:[156208] l-wx------ 1 root root 64 Apr 19 08:02 5 -> pipe:[156208] l-wx------ 1 root root 64 Apr 19 08:02 6 -> /var/log/apache2/access_log lrwx------ 1 root root 64 Apr 19 08:02 7 -> anon_inode:[eventpoll] lrwx------ 1 root root 64 Apr 19 08:02 12 -> socket:[155097]
Ok, to find out what socket:[155097] is we use the command
- lsof | grep 155097
httpd2-pr 7158 root 3u IPv4 155097 0t0 TCP *:http (LISTEN) httpd2-pr 7179 wwwrun 3u IPv4 155097 0t0 TCP *:http (LISTEN)
So this is a socket listening to the http port.
Search for a syscall
With strace, you can search for a special syscall. For example,
strace -e open command
will show you all files that have been touched (opened) by command. This can be used e.g. to find out where configuration changes are stored.
Performance analysis
With strace, you can find out the biggest time-consuming syscalls during a program run:
strace -c ls -R Entries Repository Root % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 67.09 0.412153 14 29664 getdents64 27.70 0.170168 11 14849 14 open 4.24 0.026043 0 123740 write 0.72 0.004443 0 14837 close 0.20 0.001204 0 14836 fstat 0.05 0.000285 285 1 execve 0.00 0.000000 0 12 read 0.00 0.000000 0 4 3 stat 0.00 0.000000 0 33 mmap 0.00 0.000000 0 18 mprotect 0.00 0.000000 0 4 munmap 0.00 0.000000 0 12 brk 0.00 0.000000 0 2 rt_sigaction 0.00 0.000000 0 1 rt_sigprocmask 0.00 0.000000 0 2 ioctl 0.00 0.000000 0 1 1 access 0.00 0.000000 0 3 mremap 0.00 0.000000 0 1 fcntl 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 1 statfs 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 3 1 futex 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 8 fadvise64 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.614296 198036 19 total
Now you go
man 2 getdents64
to find out what that syscall is about.