This commit is contained in:
Brendan Gregg
2014-07-22 02:27:54 -07:00
parent 7737ae2cae
commit b1a16b11fb
5 changed files with 492 additions and 0 deletions

View File

@@ -21,6 +21,7 @@ Using ftrace:
- [opensnoop](opensnoop): trace open() syscalls showing filenames. [Examples](examples/opensnoop_example.txt).
- kernel/[funccount](kernel/funccount): count kernel functions that match a string. [Examples](examples/funccount_example.txt).
- kernel/[functrace](kernel/functrace): trace kernel functions that match a string. [Examples](examples/functrace_example.txt).
- kernel/[kprobe](kernel/kprobe): trace a given kprobe definition. [Examples](examples/kprobe_example.txt).
- tools/[reset-ftrace](tools/reset-ftrace): reset ftrace state if needed. [Examples](examples/reset-ftrace_example.txt).
## Prerequisites

1
bin/kprobe Symbolic link
View File

@@ -0,0 +1 @@
../kernel/kprobe

168
examples/kprobe_example.txt Normal file
View File

@@ -0,0 +1,168 @@
Demonstrations of kprobe, the Linux ftrace version.
This traces do_sys_open() returns, showing the return value:
# ./kprobe 'r:myopen do_sys_open $retval'
Tracing kprobe myopen. Ctrl-C to end.
kprobe-26386 [001] d... 6593278.858754: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x3
<...>-26387 [001] d... 6593278.860043: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x3
<...>-26387 [001] d... 6593278.860064: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x3
<...>-26387 [001] d... 6593278.860433: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x3
<...>-26387 [001] d... 6593278.860521: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x3
supervise-1685 [001] d... 6593279.178806: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1689 [001] d... 6593279.228756: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1689 [001] d... 6593279.229106: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1688 [000] d... 6593279.229501: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1695 [000] d... 6593279.229944: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1685 [001] d... 6593279.230104: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1687 [001] d... 6593279.230293: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1699 [000] d... 6593279.230381: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1692 [000] d... 6593279.230825: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1698 [000] d... 6593279.230915: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1698 [000] d... 6593279.231277: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1690 [000] d... 6593279.231703: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
^C
Ending tracing...
The syntax for the kprobe definition is the same as can be written to the
/sys/kernel/debug/tracing/kprobe_events file, and is documented in the Linux
kernel source under Documentation/trace/kprobetrace.txt.
This example probe was named "myopen" (which is arbitrary). You can also provide
arbitrary names for arguments. Eg:
# ./kprobe 'r:myopen do_sys_open rval=$retval'
Tracing kprobe myopen. Ctrl-C to end.
kprobe-27454 [001] d... 6593356.250019: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x3
<...>-27455 [001] d... 6593356.251280: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x3
<...>-27455 [001] d... 6593356.251301: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x3
<...>-27455 [001] d... 6593356.251672: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x3
<...>-27455 [001] d... 6593356.251769: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x3
supervise-1689 [000] d... 6593356.859758: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x9
supervise-1689 [000] d... 6593356.860143: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x9
supervise-1696 [000] d... 6593356.862682: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x9
supervise-1685 [001] d... 6593356.862684: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x9
[...]
That's a bit better.
Tracing the open() mode:
# ./kprobe 'p:myopen do_sys_open mode=%cx:u16'
Tracing kprobe myopen. Ctrl-C to end.
kprobe-29572 [001] d... 6593503.353923: myopen: (do_sys_open+0x0/0x220) mode=0x1
kprobe-29572 [001] d... 6593503.353945: myopen: (do_sys_open+0x0/0x220) mode=0x0
kprobe-29572 [001] d... 6593503.354307: myopen: (do_sys_open+0x0/0x220) mode=0x5c00
kprobe-29572 [001] d... 6593503.354401: myopen: (do_sys_open+0x0/0x220) mode=0x0
supervise-1689 [000] d... 6593503.944125: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
supervise-1688 [001] d... 6593503.944125: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
supervise-1688 [001] d... 6593503.944606: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
supervise-1689 [000] d... 6593503.944606: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
supervise-1698 [000] d... 6593503.944728: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
supervise-1698 [000] d... 6593503.945077: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
[...]
Here I guessed that the mode was in register %cx, and cast it as a 16-bit
unsigned integer (":u16"). Your platform and kernel may be different, and the
mode may be in a different register. If fiddling with such registers becomes too
painful or unreliable for you, consider installing kernel debuginfo and using
the named variables with perf_events "perf probe".
Tracing the open() filename:
# ./kprobe 'p:myopen do_sys_open filename=+0(%si):string'
Tracing kprobe myopen. Ctrl-C to end.
kprobe-32369 [001] d... 6593706.999728: myopen: (do_sys_open+0x0/0x220) filename="/etc/ld.so.cache"
kprobe-32369 [001] d... 6593706.999748: myopen: (do_sys_open+0x0/0x220) filename="/lib/x86_64-linux-gnu/libc.so.6"
kprobe-32369 [001] d... 6593707.000092: myopen: (do_sys_open+0x0/0x220) filename="/usr/lib/locale/locale-archive"
kprobe-32369 [001] d... 6593707.000176: myopen: (do_sys_open+0x0/0x220) filename="trace_pipe"
supervise-1699 [000] d... 6593707.254970: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1689 [001] d... 6593707.254970: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1689 [001] d... 6593707.255432: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1699 [000] d... 6593707.255432: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1695 [001] d... 6593707.258805: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
[...]
As mentioned previously, the %si register may be different on your platform.
In this example, I cast it as a string.
The -H option will show the headings:
# ./kprobe -H 'p:myopen do_sys_open filename=+0(%si):string'
Tracing kprobe myopen. Ctrl-C to end.
# tracer: nop
#
# entries-in-buffer/entries-written: 4/4 #P:2
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
kprobe-1460 [001] d... 6593814.070876: myopen: (do_sys_open+0x0/0x220) filename="/etc/ld.so.cache"
kprobe-1460 [001] d... 6593814.070898: myopen: (do_sys_open+0x0/0x220) filename="/lib/x86_64-linux-gnu/libc.so.6"
kprobe-1460 [001] d... 6593814.071256: myopen: (do_sys_open+0x0/0x220) filename="/usr/lib/locale/locale-archive"
kprobe-1460 [001] d... 6593814.071355: myopen: (do_sys_open+0x0/0x220) filename="trace"
kprobe-1460 [001] d... 6593814.070876: myopen: (do_sys_open+0x0/0x220) filename="/etc/ld.so.cache"
[...]
Specifying a duration will buffer in-kernel (reducing overhead), and write at
the end. Here's tracing for 10 seconds, and writing to the "out" file:
# ./kprobe -d 10 'p:myopen do_sys_open filename=+0(%si):string' > out
You can match on a single PID only:
# ./kprobe -p 1696 'p:myopen do_sys_open filename=+0(%si):string'
Tracing kprobe myopen. Ctrl-C to end.
supervise-1696 [001] d... 6593773.677033: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1696 [001] d... 6593773.677332: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1696 [001] d... 6593774.697144: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1696 [001] d... 6593774.697675: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1696 [001] d... 6593775.717986: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1696 [001] d... 6593775.718499: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
^C
Ending tracing...
This will only show events when that PID is on-CPU.
The -v option will show you the available variables you can use in custom
filters:
# ./kprobe -v 'p:myopen do_sys_open filename=+0(%si):string'
name: myopen
ID: 1443
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned long __probe_ip; offset:8; size:8; signed:0;
field:__data_loc char[] filename; offset:16; size:4; signed:1;
print fmt: "(%lx) filename=\"%s\"", REC->__probe_ip, __get_str(filename)
Tracing filenames that end in "stat", by adding a filter:
# ./kprobe 'p:myopen do_sys_open filename=+0(%si):string' 'filename ~ "*stat"'
Tracing kprobe myopen. Ctrl-C to end.
postgres-1172 [000] d... 6594028.787166: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-1172 [001] d... 6594028.797410: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-1172 [001] d... 6594028.797467: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-4443 [001] d... 6594028.800908: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-4443 [000] d... 6594028.811237: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-4443 [000] d... 6594028.811290: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
^C
Ending tracing...
This filtering is done in-kernel context.

196
kernel/kprobe Executable file
View File

@@ -0,0 +1,196 @@
#!/bin/bash
#
# kprobe - trace a given kprobe definition. Kernel dynamic tracing.
# Written using Linux ftrace.
#
# This will create, trace, then destroy a given kprobe definition. See
# Documentation/trace/kprobetrace.txt in the Linux kernel source for the
# syntax of a kprobe definition.
#
# USAGE: ./kprobe [-hv] [-d secs] [-p pid] kprobe_definition [filter]
#
# Run "kprobe -h" for full usage.
#
# I wrote this because I kept testing different custom kprobes at the command
# line, and wanted a way to automate the steps.
#
# REQUIREMENTS: FTRACE and KPROBE CONFIG, which you may already have on recent
# kernel versions.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the kprobe(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 22-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock; wroteflock=0
opt_duration=0; duration=; opt_pid=0; pid=; opt_filter=0; filter=
opt_view=0; opt_headers=0; dmesg=2
trap ':' INT QUIT TERM PIPE # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: kprobe [-hHv] [-d secs] [-p PID] kprobe_definition [filter]
-d seconds # trace duration, and use buffers
-p PID # PID to match on I/O issue
-v # view format file (don't trace)
-H # include column headers
-h # this usage message
Note that these examples may need modification to match your kernel
version's function names and platform's register usage.
eg,
kprobe 'r:myopen do_sys_open \$retval'
# trace open() return value
kprobe 'p:myopen do_sys_open mode=%cx:u16'
# trace open() file mode
kprobe 'p:myopen do_sys_open filename=+0(%si):string'
# trace open() with filename
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
warn "echo 0 > events/kprobes/$kname/enable"
if (( opt_filter )); then
warn "echo 0 > events/kprobes/$kname/filter"
fi
warn "echo -:$kname >> kprobe_events"
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts d:hHp:v opt
do
case $opt in
d) opt_duration=1; duration=$OPTARG ;;
p) opt_pid=1; pid=$OPTARG ;;
H) opt_headers=1 ;;
v) opt_view=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
(( $# )) || usage
kprobe=$1
shift
if (( $# )); then
opt_filter=1
filter=$1
fi
### option logic
(( opt_pid && opt_filter )) && die "ERROR: use either -p or -f."
(( opt_duration && opt_view )) && die "ERROR: use either -d or -v."
if (( opt_pid )); then
# convert to filter
opt_filter=1
filter="common_pid == $pid"
fi
if [[ "$kprobe" != p:* && "$kprobe" != r:* ]]; then
echo >&2 "ERROR: invalid kprobe definition (should start with p: or r:)"
usage
fi
kname=${kprobe#*:}
kname=${kname%% *}
if (( !opt_view )); then
if (( opt_duration )); then
echo "Tracing kprobe $kname for $duration seconds (buffered)..."
else
echo "Tracing kprobe $kname. Ctrl-C to end."
fi
fi
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### setup and begin tracing
echo nop > current_tracer
if ! echo $kprobe >> kprobe_events; then
echo >&2 "ERROR: adding kprobe \"$kprobe\"."
if (( dmesg )); then
echo >&2 "Last $dmesg dmesg entries (might contain reason):"
dmesg | tail -$dmesg | sed 's/^/ /'
fi
edie "Exiting."
fi
if (( opt_view )); then
cat events/kprobes/$kname/format
edie ""
fi
if (( opt_filter )); then
if ! echo "$filter" > events/kprobes/$kname/filter; then
edie "ERROR: setting filter or -p. Exiting."
fi
fi
if ! echo 1 > events/kprobes/$kname/enable; then
edie "ERROR: enabling kprobe $kname. Exiting."
fi
### print trace buffer
warn "echo > trace"
if (( opt_duration )); then
sleep $duration
if (( opt_headers )); then
cat trace
else
grep -v '^#' trace
fi
else
# trace_pipe lack headers, so fetch them from trace
(( opt_headers )) && cat trace
cat trace_pipe
fi
### end tracing
end

126
man/man8/kprobe.8 Normal file
View File

@@ -0,0 +1,126 @@
.TH kprobe 8 "2014-07-20" "USER COMMANDS"
.SH NAME
kprobe \- trace a given kprobe definition. Kernel dynamic tracing. Uses Linux ftrace.
.SH SYNOPSIS
.B kprobe
[\-hv] [\-d secs] [\-p PIDs] kprobe_definition [filter]
.SH DESCRIPTION
This will create, trace, then destroy a given kprobe. See
Documentation/trace/kprobetrace.txt in the Linux kernel source for the
syntax of a kprobe definition.
WARNING: This uses kernel dynamic tracing, which can cause cause kernel panics
or freezes. Test, and know what you are doing, before use.
Also beware of feedback loops: tracing tcp functions over an ssh session,
or writing ext4 functions to an ext4 file system. For the former, tcp
trace data could be redirected to a file (as in the usage message). For
the latter, trace to the screen or a different file system.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE and KPROBES CONFIG, which you may already have enabled and available on
recent kernels.
.SH OPTIONS
.TP
\-d seconds
Set the duration of tracing, in seconds. Trace output will be buffered and
printed at the end. This also reduces overheads by buffering in-kernel,
instead of printing events as they occur.
The ftrace buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size.
.TP
\-h
Print usage message.
.TP
\-H
Print column headers.
.TP
\-v
Show the kprobe format file only (do not trace), identifying possible variables
for use in a custom filter.
.TP
\-p PID
Only trace kernel functions when this process ID is on-CPU.
.TP
kprobe_definition
A full kprobe definition, as documented by Documentation/trace/kprobetrace.txt
in the Linux kernel source. See the EXAMPLES section.
.TP
filter
An ftrace filter definition.
.SH EXAMPLES
These examples may need modification to match your kernel version's function
names and platform's register usage. If using platform specific registers
becomes too painful in practice, consider a kernel debuginfo-based tracer,
which can trace variables names instead. For example, perf_events.
.TP
Trace do_sys_open() return values:
#
.B kprobe 'r:myopen do_sys_open $retval'
.TP
Trace do_sys_open() file mode:
#
.B kprobe 'p:myopen do_sys_open mode=%cx:u16'
.TP
Trace do_sys_open() file mode for PID 81:
#
.B kprobe -p 81 'p:myopen do_sys_open mode=%cx:u16'
.TP
Trace do_sys_open() with filename string:
#
.B kprobe 'p:myopen do_sys_open filename=+0(%si):string'
.TP
Trace do_sys_open() for filenames ending in "stat"
#
.B kprobe 'p:myopen do_sys_open fn=+0(%si):string' 'fn ~ "*stat"'
.SH FIELDS
The output format depends on the kernel version, and headings can be printed
using \-H. The format is the same as the ftrace function trace format, described
in the kernel source under Documentation/trace/ftrace.txt.
Typical fields are:
.TP
TASK-PID
The process name (which could include dashes), a dash, and the process ID.
.TP
CPU#
The CPU ID, in brackets.
.TP
||||
Kernel state flags. For example, on Linux 3.16 these are for irqs-off,
need-resched, hardirq/softirq, and preempt-depth.
.TP
TIMESTAMP
Time of event, in seconds.
.TP
FUNCTION
Kernel function name.
.SH OVERHEAD
This can generate a lot of trace data quickly, depending on the
frequency of the traced events. Such data will cause performance overheads.
This also works without buffering by default, printing function events
as they happen (uses trace_pipe), context switching and consuming CPU to do
so. If needed, you can try the "\-d secs" option, which buffers events
instead, reducing overhead. If you think the buffer option is losing events,
try increasing the buffer size (buffer_size_kb).
It's a good idea to use funccount(8) first, which is lower overhead, to
help you select which functions you may want to trace using kprobe(8).
.SH SOURCE
This is from the perf-tools collection:
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
functrace(8), funccount(8)