funcgraph

This commit is contained in:
Brendan Gregg
2014-07-29 17:52:46 -07:00
parent e4bf5a8106
commit 09d7b69c55
5 changed files with 1932 additions and 0 deletions

View File

@@ -23,6 +23,7 @@ Using ftrace:
- [opensnoop](opensnoop): trace open() syscalls showing filenames. [Examples](examples/opensnoop_example.txt).
- kernel/[funccount](kernel/funccount): count kernel functions that match a string. [Examples](examples/funccount_example.txt).
- kernel/[functrace](kernel/functrace): trace kernel functions that match a string. [Examples](examples/functrace_example.txt).
- kernel/[funcgraph](kernel/funcgraph): trace kernel function graph showing children and times. [Examples](examples/funcgraph_example.txt).
- kernel/[kprobe](kernel/kprobe): trace a given kprobe definition. [Examples](examples/kprobe_example.txt).
- tools/[reset-ftrace](tools/reset-ftrace): reset ftrace state if needed. [Examples](examples/reset-ftrace_example.txt).
@@ -66,4 +67,6 @@ Some of these tools are intended as short-term workarounds until more kernel cap
As my main target is a fleet of Linux 3.2 servers that do not have debuginfo, these tools try not to require it. At times, this makes the tool more brittle than it needs to be, as I'm employing workarounds (that may be kernel version and platform specific) instead of using debuginfo information (which can be generic). See the man page for detailed prerequisites for each tool.
I've tried to use perf_events ("perf") where possible, since that interface has been developed for multi-user use. For various reasons I've often needed to use ftrace instead. ftrace is suprisingly powerful (thanks Steven Rostedt!), and not all of its features are exposed via perf, or in common usage. This tool collection is in some ways a demonstration of hidden Linux features using ftrace.
Since things are changing, it's very possible you may find some tools don't work on your Linux kernel version. Some expertise and assembly will be required to fix them.

1
bin/funcgraph Symbolic link
View File

@@ -0,0 +1 @@
../kernel/funcgraph

File diff suppressed because it is too large Load Diff

235
kernel/funcgraph Executable file
View File

@@ -0,0 +1,235 @@
#!/bin/bash
#
# funcgraph - trace kernel function graph, showing child function calls.
# Uses Linux ftrace.
#
# This is an exploratory tool that shows the graph of child function calls
# for a given kernel function. This can cost moderate overhead to execute, and
# should only be used to understand kernel behavior for a given function before
# using other, lower overhead tools. This is a proof of concept using Linux
# ftrace capabilities on older kernels.
#
# USAGE: funcgraph [-acDhHPtT] [-p PID] [-d secs] funcstring
#
# Run "funcgraph -h" for full usage.
#
# The output format is the same as the ftrace function graph trace format,
# described in the kernel source under Documentation/trace/ftrace.txt.
# Note that the output may be shuffled when different CPU buffers are read;
# check the CPU column for changes, or include timestamps (-t) and post sort.
#
# The "-d duration" mode leaves the trace data in the kernel buffer, and
# only reads it at the end. If the trace data is large, beware of exhausting
# buffer space (/sys/kernel/debug/tracing/buffer_size_kb) and losing data.
#
# Also beware of feedback loops: tracing tcp* functions over an ssh session,
# or writing ext4* functions to an ext4 file system. For the former, tcp
# trace data could be redirected to a file (as in the usage message). For
# the latter, trace to the screen or a different file system.
#
# WARNING: This uses dynamic tracing of kernel functions, and could cause
# kernel panics or freezes. Test, and know what you are doing, before use.
#
# OVERHEADS: This tool causes moderate to high overheads. Use with caution for
# exploratory purposes, then switch to lower overhead techniques based on
# findings. It's expected that the kernel will run at least 50% slower while
# this tool is running -- even while no output is being generated. This is
# because ALL kernel functions are traced, and filtered based on the function
# of interest. When output is generated, it can generate many lines quickly
# depending on the traced event. Such data will cause performance overheads.
# This also works without buffering by default, printing function events
# as they happen (uses trace_pipe), context switching and consuming CPU to do
# so. If needed, you can try the "-d secs" option, which buffers events
# instead, reducing overhead. If you think the buffer option is losing events,
# try increasing the buffer size (buffer_size_kb).
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 12-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock
opt_duration=0; duration=; opt_pid=0; pid=; pidtext=; opt_headers=0
opt_proc=0; opt_time=0; opt_tail=0; opt_nodur=0; opt_cpu=0
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: funcgraph [-acDhHPtT] [-p PID] [-d secs] funcstring
-a # all info (same as -HPt)
-c # measure on-CPU time only
-d seconds # trace duration, and use buffers
-D # do not show function duration
-h # this usage message
-H # include column headers
-p PID # trace when this pid is on-CPU
-P # show process names & PIDs
-t # show timestamps
-T # comment function tails
eg,
funcgraph do_nanosleep # trace do_nanosleep() and children
funcgraph -p 198 vfs_read # trace vfs_read() for PID 198 only
funcgraph -d 1 vfs_read > out # trace 1 sec, then write to file
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
(( opt_time )) && warn "echo nofuncgraph-abstime > trace_options"
(( opt_proc )) && warn "echo nofuncgraph-proc > trace_options"
(( opt_tail )) && warn "echo nofuncgraph-tail > trace_options"
(( opt_nodur )) && warn "echo funcgraph-duration > trace_options"
(( opt_cpu )) && warn "echo sleep-time > trace_options"
warn "echo nop > current_tracer"
(( opt_pid )) && warn "echo > set_ftrace_pid"
warn "echo > set_graph_function"
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts acd:DhHp:PtT opt
do
case $opt in
a) opt_headers=1; opt_proc=1; opt_time=1 ;;
c) opt_cpu=1; ;;
d) opt_duration=1; duration=$OPTARG ;;
D) opt_nodur=1; ;;
p) opt_pid=1; pid=$OPTARG ;;
H) opt_headers=1; ;;
P) opt_proc=1; ;;
t) opt_time=1; ;;
T) opt_tail=1; ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
### option logic
(( $# == 0 )) && usage
funcs="$1"
(( opt_pid )) && pidtext=" for PID $pid"
if (( opt_duration )); then
echo "Tracing \"$funcs\"$pidtext for $duration seconds..."
else
echo "Tracing \"$funcs\"$pidtext... Ctrl-C to end."
fi
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### setup and commence tracing
sysctl -q kernel.ftrace_enabled=1 # doesn't set exit status
read mode < current_tracer
[[ "$mode" != "nop" ]] && edie "ERROR: ftrace active (current_tracer=$mode)"
if (( opt_pid )); then
if ! echo $pid > set_ftrace_pid; then
edie "ERROR: setting -p $pid (PID exist?). Exiting."
fi
fi
if ! echo > set_ftrace_filter; then
edie "ERROR: writing to set_ftrace_filter. Exiting."
fi
if ! echo "$funcs" > set_graph_function; then
edie "ERROR: enabling \"$funcs\". Exiting."
fi
if ! echo function_graph > current_tracer; then
edie "ERROR: setting current_tracer to \"function\". Exiting."
fi
if (( opt_cpu )); then
if ! echo nosleep-time > trace_options; then
edie "ERROR: setting -c (nosleep-time). Exiting."
fi
fi
# the following must be done after setting current_tracer
if (( opt_time )); then
if ! echo funcgraph-abstime > trace_options; then
edie "ERROR: setting -t (funcgraph-abstime). Exiting."
fi
fi
if (( opt_proc )); then
if ! echo funcgraph-proc > trace_options; then
edie "ERROR: setting -P (funcgraph-proc). Exiting."
fi
fi
if (( opt_tail )); then
if ! echo funcgraph-tail > trace_options; then
edie "ERROR: setting -T (funcgraph-tail). Old kernel? Exiting."
fi
fi
if (( opt_nodur )); then
if ! echo nofuncgraph-duration > trace_options; then
edie "ERROR: setting -D (nofuncgraph-duration). Exiting."
fi
fi
### print trace buffer
warn "echo > trace"
if (( opt_duration )); then
sleep $duration
if (( opt_headers )); then
cat trace
else
grep -v '^#' trace
fi
else
# trace_pipe lack headers, so fetch them from trace
(( opt_headers )) && cat trace
cat trace_pipe
fi
### end tracing
end

158
man/man8/funcgraph.8 Normal file
View File

@@ -0,0 +1,158 @@
.TH funcgraph 8 "2014-07-29" "USER COMMANDS"
.SH NAME
funcgraph \- trace kernel function graph, showing child function calls and times. Uses Linux ftrace.
.SH SYNOPSIS
.B funcgraph
[\-acDhHPtT] [\-p PIDs] [\-d secs] funcstring
.SH DESCRIPTION
This is an exploratory tool that shows the graph of child function calls
for a given kernel function. This can cost moderate overhead to execute, and
should only be used to understand kernel behavior before using other, lower
overhead tools. This is a proof of concept using Linux ftrace capabilities
on older kernels.
The output format is the same as the ftrace function graph trace format,
described in the kernel source under Documentation/trace/ftrace.txt.
Note that the output may be shuffled when different CPU buffers are read;
check the CPU column for changes, or include timestamps (-t) and post sort.
The "-d duration" mode leaves the trace data in the kernel buffer, and
only reads it at the end. If the trace data is large, beware of exhausting
buffer space (/sys/kernel/debug/tracing/buffer_size_kb) and losing data.
Also beware of feedback loops: tracing tcp* functions over an ssh session,
or writing ext4* functions to an ext4 file system. For the former, tcp
trace data could be redirected to a file (as in the usage message). For
the latter, trace to the screen or a different file system.
WARNING: This uses dynamic tracing of kernel functions, and could cause
kernel panics or freezes. Test, and know what you are doing, before use.
Also see the OVERHEAD section.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE CONFIG, which you may already have enabled and available on recent
kernels.
.SH OPTIONS
.TP
\-a
All info. Same as \-HPt. (But no -T, which isn't available in older kernels.)
.TP
\-c
Function durations measure on-CPU time only (exclude sleep time).
.TP
\-d seconds
Set the duration of tracing, in seconds. Trace output will be buffered and
printed at the end. This also reduces overheads by buffering in-kernel,
instead of printing events as they occur.
The ftrace buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size.
.TP
\-D
Do not show function duration times.
.TP
\-h
Print usage message.
.TP
\-H
Print column headers.
.TP
\-p PID
Only trace kernel functions when this process ID is on-CPU.
.TP
\-P
Show process names and process IDs with every line of output.
.TP
\-t
Show timestamps on every line of output.
.TP
\-T
Tail mode: decorate function return lines with the name of the function. This
option may not be available for older kernels.
.TP
funcstring
A function name to trace, which may include file glob style wildcards ("*") at
the beginning or ending of a string only. Eg, "vfs*" means match "vfs" followed
by anything. Since the output is verbose, you probably only want to trace
single functions, and not use wildcards.
.SH EXAMPLES
.TP
Trace calls to do_nanosleep(), showing child functions and durations:
#
.B funcgraph do_nanosleep
.TP
Same as above, but include column headers:
#
.B funcgraph -H do_nanosleep
.TP
Same as above, but include timestamps and process names as well:
#
.B funcgraph -HtP do_nanosleep
.TP
Trace all vfs_read() kernel function calls, and child functions, for PID 198 only:
#
.B funcgraph \-p 198 vfs_read
.TP
Trace all vfs_read() kernel function calls, and child functions, for 1 second then write to a file.
#
.B funcgraph \-d 1 vfs_read > out
.SH FIELDS
The output format depends on the kernel version, and headings can be printed
using \-H. The format is the same as the ftrace function trace format, described
in the kernel source under Documentation/trace/ftrace.txt.
Typical fields are:
.TP
TIME
(Shown with \-t.) Time of event, in seconds.
.TP
CPU
The CPU this event occurred on.
.TP
TASK/PID
(Shown with \-P.) The process name (which could include dashes), a dash, and the process ID.
.TP
DURATION
Elapsed time during the function call, inclusive of children, and inclusive of
sleep time. This is either displayed on the return
of a function ("}"), or for a leaf function (no children), on the same line.
If the trace output begins with some returns that lack entries, their durations
may not be trusted. This is usually only the case for the first dozen or so
lines.
.TP
FUNCTION CALLS
Entries and returns from kernel functions.
.SH OVERHEAD
This tool causes moderate to high overheads. Use with caution for
exploratory purposes, then switch to lower overhead techniques based on
findings. It's expected that the kernel will run at least 50% slower while
this tool is running -- even while no output is being generated. This is
because ALL kernel functions are traced, and filtered based on the function
of interest. When output is generated, it can generate many lines quickly
depending on the traced event. Such data will cause performance overheads.
This also works without buffering by default, printing function events
as they happen (uses trace_pipe), context switching and consuming CPU to do
so. If needed, you can try the "-d secs" option, which buffers events
instead, reducing overhead. If you think the buffer option is losing events,
try increasing the buffer size (buffer_size_kb).
It's a good idea to use funccount(8) first, which is lower overhead, to
help you select which functions you may want to trace using funcgraph(8).
.SH SOURCE
This is from the perf-tools collection:
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
funccount(8), functrace(8), kprobe(8)