funcgraph

2025-11-30 23:16:03 +07:00 · 2014-07-29 17:52:46 -07:00
parent e4bf5a8106
commit 09d7b69c55
5 changed files with 1932 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -23,6 +23,7 @@ Using ftrace:
 - [opensnoop](opensnoop): trace open() syscalls showing filenames. [Examples](examples/opensnoop_example.txt).
 - kernel/[funccount](kernel/funccount): count kernel functions that match a string. [Examples](examples/funccount_example.txt).
 - kernel/[functrace](kernel/functrace): trace kernel functions that match a string. [Examples](examples/functrace_example.txt).
+- kernel/[funcgraph](kernel/funcgraph): trace kernel function graph showing children and times. [Examples](examples/funcgraph_example.txt).
 - kernel/[kprobe](kernel/kprobe): trace a given kprobe definition. [Examples](examples/kprobe_example.txt).
 - tools/[reset-ftrace](tools/reset-ftrace): reset ftrace state if needed. [Examples](examples/reset-ftrace_example.txt).

@@ -66,4 +67,6 @@ Some of these tools are intended as short-term workarounds until more kernel cap

 As my main target is a fleet of Linux 3.2 servers that do not have debuginfo, these tools try not to require it. At times, this makes the tool more brittle than it needs to be, as I'm employing workarounds (that may be kernel version and platform specific) instead of using debuginfo information (which can be generic). See the man page for detailed prerequisites for each tool.

+I've tried to use perf_events ("perf") where possible, since that interface has been developed for multi-user use. For various reasons I've often needed to use ftrace instead. ftrace is suprisingly powerful (thanks Steven Rostedt!), and not all of its features are exposed via perf, or in common usage. This tool collection is in some ways a demonstration of hidden Linux features using ftrace.
+
 Since things are changing, it's very possible you may find some tools don't work on your Linux kernel version. Some expertise and assembly will be required to fix them.
--- a/bin/funcgraph
+++ b/bin/funcgraph
@@ -0,0 +1 @@
+../kernel/funcgraph
--- a/examples/funcgraph_example.txt
+++ b/examples/funcgraph_example.txt
--- a/kernel/funcgraph
+++ b/kernel/funcgraph
@@ -0,0 +1,235 @@
+#!/bin/bash
+#
+# funcgraph - trace kernel function graph, showing child function calls.
+#             Uses Linux ftrace.
+#
+# This is an exploratory tool that shows the graph of child function calls
+# for a given kernel function. This can cost moderate overhead to execute, and
+# should only be used to understand kernel behavior for a given function before
+# using other, lower overhead tools. This is a proof of concept using Linux
+# ftrace capabilities on older kernels.
+#
+# USAGE: funcgraph [-acDhHPtT] [-p PID] [-d secs] funcstring
+#
+# Run "funcgraph -h" for full usage.
+#
+# The output format is the same as the ftrace function graph trace format,
+# described in the kernel source under Documentation/trace/ftrace.txt.
+# Note that the output may be shuffled when different CPU buffers are read;
+# check the CPU column for changes, or include timestamps (-t) and post sort.
+#
+# The "-d duration" mode leaves the trace data in the kernel buffer, and
+# only reads it at the end. If the trace data is large, beware of exhausting
+# buffer space (/sys/kernel/debug/tracing/buffer_size_kb) and losing data.
+#
+# Also beware of feedback loops: tracing tcp* functions over an ssh session,
+# or writing ext4* functions to an ext4 file system. For the former, tcp
+# trace data could be redirected to a file (as in the usage message). For
+# the latter, trace to the screen or a different file system.
+#
+# WARNING: This uses dynamic tracing of kernel functions, and could cause
+# kernel panics or freezes. Test, and know what you are doing, before use.
+#
+# OVERHEADS: This tool causes moderate to high overheads. Use with caution for
+# exploratory purposes, then switch to lower overhead techniques based on
+# findings. It's expected that the kernel will run at least 50% slower while
+# this tool is running -- even while no output is being generated. This is
+# because ALL kernel functions are traced, and filtered based on the function
+# of interest. When output is generated, it can generate many lines quickly
+# depending on the traced event. Such data will cause performance overheads.
+# This also works without buffering by default, printing function events
+# as they happen (uses trace_pipe), context switching and consuming CPU to do
+# so. If needed, you can try the "-d secs" option, which buffers events
+# instead, reducing overhead. If you think the buffer option is losing events,
+# try increasing the buffer size (buffer_size_kb).
+#
+# From perf-tools: https://github.com/brendangregg/perf-tools
+#
+# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
+#
+#  This program is free software; you can redistribute it and/or
+#  modify it under the terms of the GNU General Public License
+#  as published by the Free Software Foundation; either version 2
+#  of the License, or (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program; if not, write to the Free Software Foundation,
+#  Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+#
+#  (http://www.gnu.org/copyleft/gpl.html)
+#
+# 12-Jul-2014	Brendan Gregg	Created this.
+
+### default variables
+tracing=/sys/kernel/debug/tracing
+flock=/var/tmp/.ftrace-lock
+opt_duration=0; duration=; opt_pid=0; pid=; pidtext=; opt_headers=0
+opt_proc=0; opt_time=0; opt_tail=0; opt_nodur=0; opt_cpu=0
+trap ':' INT QUIT TERM PIPE HUP	# sends execution to end tracing section
+
+function usage {
+	cat <<-END >&2
+	USAGE: funcgraph [-acDhHPtT] [-p PID] [-d secs] funcstring
+	                 -a              # all info (same as -HPt)
+	                 -c              # measure on-CPU time only
+	                 -d seconds      # trace duration, and use buffers
+	                 -D              # do not show function duration
+	                 -h              # this usage message
+	                 -H              # include column headers
+	                 -p PID          # trace when this pid is on-CPU
+	                 -P              # show process names & PIDs
+	                 -t              # show timestamps
+	                 -T              # comment function tails
+	  eg,
+	       funcgraph do_nanosleep    # trace do_nanosleep() and children
+	       funcgraph -p 198 vfs_read # trace vfs_read() for PID 198 only
+	       funcgraph -d 1 vfs_read > out  # trace 1 sec, then write to file
+
+	See the man page and example file for more info.
+END
+	exit
+}
+
+function warn {
+	if ! eval "$@"; then
+		echo >&2 "WARNING: command failed \"$@\""
+	fi
+}
+
+function end {
+	# disable tracing
+	echo 2>/dev/null
+	echo "Ending tracing..." 2>/dev/null
+	cd $tracing
+
+	(( opt_time )) && warn "echo nofuncgraph-abstime > trace_options"
+	(( opt_proc )) && warn "echo nofuncgraph-proc > trace_options"
+	(( opt_tail )) && warn "echo nofuncgraph-tail > trace_options"
+	(( opt_nodur )) && warn "echo funcgraph-duration > trace_options"
+	(( opt_cpu )) && warn "echo sleep-time > trace_options"
+
+	warn "echo nop > current_tracer"
+	(( opt_pid )) && warn "echo > set_ftrace_pid"
+	warn "echo > set_graph_function"
+	warn "echo > trace"
+
+	(( wroteflock )) && warn "rm $flock"
+}
+
+function die {
+	echo >&2 "$@"
+	exit 1
+}
+
+function edie {
+	# die with a quiet end()
+	echo >&2 "$@"
+	exec >/dev/null 2>&1
+	end
+	exit 1
+}
+
+### process options
+while getopts acd:DhHp:PtT opt
+do
+	case $opt in
+	a)	opt_headers=1; opt_proc=1; opt_time=1 ;;
+	c)	opt_cpu=1; ;;
+	d)	opt_duration=1; duration=$OPTARG ;;
+	D)	opt_nodur=1; ;;
+	p)	opt_pid=1; pid=$OPTARG ;;
+	H)	opt_headers=1; ;;
+	P)	opt_proc=1; ;;
+	t)	opt_time=1; ;;
+	T)	opt_tail=1; ;;
+	h|?)	usage ;;
+	esac
+done
+shift $(( $OPTIND - 1 ))
+
+### option logic
+(( $# == 0 )) && usage
+funcs="$1"
+(( opt_pid )) && pidtext=" for PID $pid"
+if (( opt_duration )); then
+	echo "Tracing \"$funcs\"$pidtext for $duration seconds..."
+else
+	echo "Tracing \"$funcs\"$pidtext... Ctrl-C to end."
+fi
+
+### check permissions
+cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
+    debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
+
+### ftrace lock
+[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
+echo $$ > $flock || die "ERROR: unable to write $flock."
+wroteflock=1
+
+### setup and commence tracing
+sysctl -q kernel.ftrace_enabled=1	# doesn't set exit status
+read mode < current_tracer
+[[ "$mode" != "nop" ]] && edie "ERROR: ftrace active (current_tracer=$mode)"
+if (( opt_pid )); then
+	if ! echo $pid > set_ftrace_pid; then
+		edie "ERROR: setting -p $pid (PID exist?). Exiting."
+	fi
+fi
+if ! echo > set_ftrace_filter; then
+	edie "ERROR: writing to set_ftrace_filter. Exiting."
+fi
+if ! echo "$funcs" > set_graph_function; then
+	edie "ERROR: enabling \"$funcs\". Exiting."
+fi
+if ! echo function_graph > current_tracer; then
+	edie "ERROR: setting current_tracer to \"function\". Exiting."
+fi
+if (( opt_cpu )); then
+	if ! echo nosleep-time > trace_options; then
+		edie "ERROR: setting -c (nosleep-time). Exiting."
+	fi
+fi
+# the following must be done after setting current_tracer
+if (( opt_time )); then
+	if ! echo funcgraph-abstime > trace_options; then
+		edie "ERROR: setting -t (funcgraph-abstime). Exiting."
+	fi
+fi
+if (( opt_proc )); then
+	if ! echo funcgraph-proc > trace_options; then
+		edie "ERROR: setting -P (funcgraph-proc). Exiting."
+	fi
+fi
+if (( opt_tail )); then
+	if ! echo funcgraph-tail > trace_options; then
+		edie "ERROR: setting -T (funcgraph-tail). Old kernel? Exiting."
+	fi
+fi
+if (( opt_nodur )); then
+	if ! echo nofuncgraph-duration > trace_options; then
+		edie "ERROR: setting -D (nofuncgraph-duration). Exiting."
+	fi
+fi
+
+### print trace buffer
+warn "echo > trace"
+if (( opt_duration )); then
+	sleep $duration
+	if (( opt_headers )); then
+		cat trace
+	else
+		grep -v '^#' trace
+	fi
+else
+	# trace_pipe lack headers, so fetch them from trace
+	(( opt_headers )) && cat trace
+	cat trace_pipe
+fi
+
+### end tracing
+end
--- a/man/man8/funcgraph.8
+++ b/man/man8/funcgraph.8
@@ -0,0 +1,158 @@
+.TH funcgraph 8  "2014-07-29" "USER COMMANDS"
+.SH NAME
+funcgraph \- trace kernel function graph, showing child function calls and times. Uses Linux ftrace.
+.SH SYNOPSIS
+.B funcgraph
+[\-acDhHPtT] [\-p PIDs] [\-d secs] funcstring
+.SH DESCRIPTION
+This is an exploratory tool that shows the graph of child function calls
+for a given kernel function. This can cost moderate overhead to execute, and
+should only be used to understand kernel behavior before using other, lower
+overhead tools. This is a proof of concept using Linux ftrace capabilities
+on older kernels.
+
+The output format is the same as the ftrace function graph trace format,
+described in the kernel source under Documentation/trace/ftrace.txt.
+Note that the output may be shuffled when different CPU buffers are read;
+check the CPU column for changes, or include timestamps (-t) and post sort.
+
+The "-d duration" mode leaves the trace data in the kernel buffer, and
+only reads it at the end. If the trace data is large, beware of exhausting
+buffer space (/sys/kernel/debug/tracing/buffer_size_kb) and losing data.
+
+Also beware of feedback loops: tracing tcp* functions over an ssh session,
+or writing ext4* functions to an ext4 file system. For the former, tcp
+trace data could be redirected to a file (as in the usage message). For
+the latter, trace to the screen or a different file system.
+
+WARNING: This uses dynamic tracing of kernel functions, and could cause
+kernel panics or freezes. Test, and know what you are doing, before use.
+Also see the OVERHEAD section.
+
+Since this uses ftrace, only the root user can use this tool.
+.SH REQUIREMENTS
+FTRACE CONFIG, which you may already have enabled and available on recent
+kernels.
+.SH OPTIONS
+.TP
+\-a
+All info. Same as \-HPt. (But no -T, which isn't available in older kernels.)
+.TP
+\-c
+Function durations measure on-CPU time only (exclude sleep time).
+.TP
+\-d seconds
+Set the duration of tracing, in seconds. Trace output will be buffered and
+printed at the end. This also reduces overheads by buffering in-kernel,
+instead of printing events as they occur.
+
+The ftrace buffer has a fixed size per-CPU (see
+/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
+try increasing that size.
+.TP
+\-D
+Do not show function duration times.
+.TP
+\-h
+Print usage message.
+.TP
+\-H
+Print column headers.
+.TP
+\-p PID
+Only trace kernel functions when this process ID is on-CPU.
+.TP
+\-P
+Show process names and process IDs with every line of output.
+.TP
+\-t
+Show timestamps on every line of output.
+.TP
+\-T
+Tail mode: decorate function return lines with the name of the function. This
+option may not be available for older kernels.
+.TP
+funcstring
+A function name to trace, which may include file glob style wildcards ("*") at
+the beginning or ending of a string only. Eg, "vfs*" means match "vfs" followed
+by anything. Since the output is verbose, you probably only want to trace
+single functions, and not use wildcards.
+.SH EXAMPLES
+.TP
+Trace calls to do_nanosleep(), showing child functions and durations:
+#
+.B funcgraph do_nanosleep
+.TP
+Same as above, but include column headers:
+#
+.B funcgraph -H do_nanosleep
+.TP
+Same as above, but include timestamps and process names as well:
+#
+.B funcgraph -HtP do_nanosleep
+.TP
+Trace all vfs_read() kernel function calls, and child functions, for PID 198 only:
+#
+.B funcgraph \-p 198 vfs_read
+.TP
+Trace all vfs_read() kernel function calls, and child functions, for 1 second then write to a file.
+#
+.B funcgraph \-d 1 vfs_read > out
+.SH FIELDS
+The output format depends on the kernel version, and headings can be printed
+using \-H. The format is the same as the ftrace function trace format, described
+in the kernel source under Documentation/trace/ftrace.txt.
+
+Typical fields are:
+.TP
+TIME
+(Shown with \-t.) Time of event, in seconds.
+.TP
+CPU
+The CPU this event occurred on.
+.TP
+TASK/PID
+(Shown with \-P.) The process name (which could include dashes), a dash, and the process ID.
+.TP
+DURATION
+Elapsed time during the function call, inclusive of children, and inclusive of
+sleep time. This is either displayed on the return
+of a function ("}"), or for a leaf function (no children), on the same line.
+
+If the trace output begins with some returns that lack entries, their durations
+may not be trusted. This is usually only the case for the first dozen or so
+lines.
+.TP
+FUNCTION CALLS
+Entries and returns from kernel functions.
+.SH OVERHEAD
+This tool causes moderate to high overheads. Use with caution for
+exploratory purposes, then switch to lower overhead techniques based on
+findings. It's expected that the kernel will run at least 50% slower while
+this tool is running -- even while no output is being generated. This is
+because ALL kernel functions are traced, and filtered based on the function
+of interest. When output is generated, it can generate many lines quickly
+depending on the traced event. Such data will cause performance overheads.
+This also works without buffering by default, printing function events
+as they happen (uses trace_pipe), context switching and consuming CPU to do
+so. If needed, you can try the "-d secs" option, which buffers events
+instead, reducing overhead. If you think the buffer option is losing events,
+try increasing the buffer size (buffer_size_kb).
+
+It's a good idea to use funccount(8) first, which is lower overhead, to
+help you select which functions you may want to trace using funcgraph(8).
+.SH SOURCE
+This is from the perf-tools collection:
+.IP
+https://github.com/brendangregg/perf-tools
+.PP
+Also look under the examples directory for a text file containing example
+usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Brendan Gregg
+.SH SEE ALSO
+funccount(8), functrace(8), kprobe(8)