mirror of
https://github.com/brendangregg/perf-tools.git
synced 2025-11-30 23:16:03 +07:00
154 lines
6.7 KiB
Plaintext
154 lines
6.7 KiB
Plaintext
Demonstrations of execsnoop, the Linux ftrace version.
|
|
|
|
|
|
Here's execsnoop showing what's really executed by "man ls":
|
|
|
|
# ./execsnoop
|
|
Tracing exec()s. Ctrl-C to end.
|
|
PID PPID ARGS
|
|
22898 22004 man ls
|
|
22905 22898 preconv -e UTF-8
|
|
22908 22898 pager -s
|
|
22907 22898 nroff -mandoc -rLL=164n -rLT=164n -Tutf8
|
|
22906 22898 tbl
|
|
22911 22910 locale charmap
|
|
22912 22907 groff -mtty-char -Tutf8 -mandoc -rLL=164n -rLT=164n
|
|
22913 22912 troff -mtty-char -mandoc -rLL=164n -rLT=164n -Tutf8
|
|
22914 22912 grotty
|
|
|
|
Many commands. This is particularly useful for understanding application
|
|
startup.
|
|
|
|
|
|
Another use for execsnoop is identifying short-lived processes. Eg, with the -t
|
|
option to see timestamps:
|
|
|
|
# ./execsnoop -t
|
|
Tracing exec()s. Ctrl-C to end.
|
|
TIMEs PID PPID ARGS
|
|
7419756.154031 8185 8181 mawk -W interactive -v o=1 -v opt_name=0 -v name= [...]
|
|
7419756.154131 8186 8184 cat -v trace_pipe
|
|
7419756.245264 8188 1698 ./run
|
|
7419756.245691 8189 1696 ./run
|
|
7419756.246212 8187 1689 ./run
|
|
7419756.278993 8190 1693 ./run
|
|
7419756.278996 8191 1692 ./run
|
|
7419756.288430 8192 1695 ./run
|
|
7419756.290115 8193 1691 ./run
|
|
7419756.292406 8194 1699 ./run
|
|
7419756.293986 8195 1690 ./run
|
|
7419756.294149 8196 1686 ./run
|
|
7419756.296527 8197 1687 ./run
|
|
7419756.296973 8198 1697 ./run
|
|
7419756.298356 8200 1685 ./run
|
|
7419756.298683 8199 1688 ./run
|
|
7419757.269883 8201 1696 ./run
|
|
[...]
|
|
|
|
So we're running many "run" commands every second. The PPID is included, so I
|
|
can debug this further (they are "supervise" processes).
|
|
|
|
Short-lived processes can consume CPU and not be visible from top(1), and can
|
|
be the source of hidden performance issues.
|
|
|
|
|
|
Here's another example: I noticed CPU usage was high in top(1), but couldn't
|
|
see the responsible process:
|
|
|
|
$ top
|
|
top - 00:04:32 up 78 days, 15:41, 3 users, load average: 0.85, 0.29, 0.14
|
|
Tasks: 123 total, 1 running, 121 sleeping, 0 stopped, 1 zombie
|
|
Cpu(s): 15.7%us, 34.9%sy, 0.0%ni, 49.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.2%st
|
|
Mem: 7629464k total, 7537216k used, 92248k free, 1376492k buffers
|
|
Swap: 0k total, 0k used, 0k free, 5432356k cached
|
|
|
|
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
|
|
7225 bgregg-t 20 0 29480 6196 2128 S 3 0.1 0:02.64 ec2rotatelogs
|
|
1 root 20 0 24320 2256 1340 S 0 0.0 0:01.23 init
|
|
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
|
|
3 root 20 0 0 0 0 S 0 0.0 1:19.61 ksoftirqd/0
|
|
4 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/0:0
|
|
5 root 20 0 0 0 0 S 0 0.0 0:00.01 kworker/u:0
|
|
6 root RT 0 0 0 0 S 0 0.0 0:16.00 migration/0
|
|
7 root RT 0 0 0 0 S 0 0.0 0:17.29 watchdog/0
|
|
8 root RT 0 0 0 0 S 0 0.0 0:15.85 migration/1
|
|
9 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/1:0
|
|
[...]
|
|
|
|
See the line starting with "Cpu(s):". So there's about 50% CPU utilized (this
|
|
is a two CPU server, so that's equivalent to one full CPU), but this CPU usage
|
|
isn't visible from the process listing.
|
|
|
|
vmstat agreed, showing the same average CPU usage statistics:
|
|
|
|
# vmstat 1
|
|
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
|
|
r b swpd free buff cache si so bi bo in cs us sy id wa
|
|
2 0 0 92816 1376476 5432188 0 0 0 3 2 1 0 1 99 0
|
|
1 0 0 92676 1376484 5432264 0 0 0 24 6573 6130 12 38 49 0
|
|
1 0 0 91964 1376484 5432272 0 0 0 0 6529 6097 16 35 49 0
|
|
1 0 0 92692 1376484 5432272 0 0 0 0 6192 5775 17 35 49 0
|
|
1 0 0 92692 1376484 5432272 0 0 0 0 6554 6121 14 36 50 0
|
|
1 0 0 91940 1376484 5432272 0 0 0 12 6546 6101 13 38 49 0
|
|
1 0 0 92560 1376484 5432272 0 0 0 0 6201 5769 15 35 49 0
|
|
1 0 0 92676 1376484 5432272 0 0 0 0 6524 6123 17 34 49 0
|
|
1 0 0 91932 1376484 5432272 0 0 0 0 6546 6107 10 40 49 0
|
|
1 0 0 92832 1376484 5432272 0 0 0 0 6057 5710 13 38 49 0
|
|
1 0 0 92248 1376484 5432272 0 0 84 28 6592 6183 16 36 48 1
|
|
1 0 0 91504 1376492 5432348 0 0 0 12 6540 6098 18 33 49 1
|
|
[...]
|
|
|
|
So this could be caused by short-lived processes, who vanish before they are
|
|
seen by top(1). Do I have my execsnoop handy? Yes:
|
|
|
|
# ~/perf-tools/bin/execsnoop
|
|
Tracing exec()s. Ctrl-C to end.
|
|
PID PPID ARGS
|
|
10239 10229 gawk -v o=0 -v opt_name=0 -v name= -v opt_duration=0 [...]
|
|
10240 10238 cat -v trace_pipe
|
|
10242 7225 sh [?]
|
|
10243 10242 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.201201.3122.txt
|
|
10245 7225 sh [?]
|
|
10246 10245 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.202201.3122.txt
|
|
10248 7225 sh [?]
|
|
10249 10248 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.203201.3122.txt
|
|
10251 7225 sh [?]
|
|
10252 10251 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.204201.3122.txt
|
|
10254 7225 sh [?]
|
|
10255 10254 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.205201.3122.txt
|
|
10257 7225 sh [?]
|
|
10258 10257 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.210201.3122.txt
|
|
10260 7225 sh [?]
|
|
10261 10260 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.211201.3122.txt
|
|
10263 7225 sh [?]
|
|
10264 10263 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.212201.3122.txt
|
|
10266 7225 sh [?]
|
|
10267 10266 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.213201.3122.txt
|
|
[...]
|
|
|
|
The output scrolled quickly, showing that many shell and lsof processes were
|
|
being launched. If you check the PID and PPID columns carefully, you can see that
|
|
these are ultimately all from PID 7225. We saw that earlier in the top output:
|
|
ec2rotatelogs, at 3% CPU. I now know the culprit.
|
|
|
|
I should have used "-t" to show the timestamps with this example.
|
|
|
|
|
|
Run -h to print the USAGE message:
|
|
|
|
# ./execsnoop -h
|
|
USAGE: execsnoop [-hrt] [-a argc] [-d secs] [name]
|
|
-d seconds # trace duration, and use buffers
|
|
-a argc # max args to show (default 8)
|
|
-r # include re-execs
|
|
-t # include time (seconds)
|
|
-h # this usage message
|
|
name # process name to match (REs allowed)
|
|
eg,
|
|
execsnoop # watch exec()s live (unbuffered)
|
|
execsnoop -d 1 # trace 1 sec (buffered)
|
|
execsnoop grep # trace process names containing grep
|
|
execsnoop 'log$' # filenames ending in "log"
|
|
|
|
See the man page and example file for more info.
|