mirror of
https://github.com/brendangregg/perf-tools.git
synced 2025-11-30 23:16:03 +07:00
iolatency
This commit is contained in:
@@ -16,6 +16,7 @@ Using perf_events:
|
||||
Using ftrace:
|
||||
|
||||
- [iosnoop](iosnoop): trace disk I/O with details including latency. [Examples](examples/iosnoop_example.txt).
|
||||
- [iolatency](iolatency): summarize disk I/O latency as a histogram. [Examples](examples/iolatency_example.txt).
|
||||
- [execsnoop](execsnoop): trace process exec() with command line argument details. [Examples](examples/execsnoop_example.txt).
|
||||
- [opensnoop](opensnoop): trace open() syscalls showing filenames. [Examples](examples/opensnoop_example.txt).
|
||||
- kernel/[funccount](kernel/funccount): count kernel functions that match a string. [Examples](examples/funccount_example.txt).
|
||||
|
||||
1
bin/iolatency
Symbolic link
1
bin/iolatency
Symbolic link
@@ -0,0 +1 @@
|
||||
../iolatency
|
||||
57
examples/iolatency_example.txt
Normal file
57
examples/iolatency_example.txt
Normal file
@@ -0,0 +1,57 @@
|
||||
Demonstrations of iolatency, the Linux ftrace version.
|
||||
|
||||
Here's a busy system doing over 4k disk IOPS:
|
||||
|
||||
# ./iolatency
|
||||
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.
|
||||
|
||||
>=(ms) .. <(ms) : I/O |Distribution |
|
||||
0 -> 1 : 4381 |######################################|
|
||||
1 -> 2 : 9 |# |
|
||||
2 -> 4 : 5 |# |
|
||||
4 -> 8 : 0 | |
|
||||
8 -> 16 : 1 |# |
|
||||
|
||||
>=(ms) .. <(ms) : I/O |Distribution |
|
||||
0 -> 1 : 4053 |######################################|
|
||||
1 -> 2 : 18 |# |
|
||||
2 -> 4 : 9 |# |
|
||||
4 -> 8 : 2 |# |
|
||||
8 -> 16 : 1 |# |
|
||||
16 -> 32 : 1 |# |
|
||||
|
||||
>=(ms) .. <(ms) : I/O |Distribution |
|
||||
0 -> 1 : 4658 |######################################|
|
||||
1 -> 2 : 9 |# |
|
||||
2 -> 4 : 2 |# |
|
||||
|
||||
>=(ms) .. <(ms) : I/O |Distribution |
|
||||
0 -> 1 : 4298 |######################################|
|
||||
1 -> 2 : 17 |# |
|
||||
2 -> 4 : 10 |# |
|
||||
4 -> 8 : 1 |# |
|
||||
8 -> 16 : 1 |# |
|
||||
^C
|
||||
Ending tracing...
|
||||
|
||||
Disk I/O latency is usually between 0 and 1 milliseconds, as this system uses
|
||||
SSDs. There are occasional outliers, up to the 16->32 ms range.
|
||||
|
||||
Identifying outliers like these is difficult from iostat(1) alone, which at
|
||||
the same time reported:
|
||||
|
||||
# iostat 1
|
||||
[...]
|
||||
avg-cpu: %user %nice %system %iowait %steal %idle
|
||||
0.53 0.00 1.05 46.84 0.53 51.05
|
||||
|
||||
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
|
||||
xvdap1 0.00 0.00 0.00 28.00 0.00 112.00 8.00 0.02 0.71 0.00 0.71 0.29 0.80
|
||||
xvdb 0.00 0.00 2134.00 0.00 18768.00 0.00 17.59 0.51 0.24 0.24 0.00 0.23 50.00
|
||||
xvdc 0.00 0.00 2088.00 0.00 18504.00 0.00 17.72 0.47 0.22 0.22 0.00 0.22 46.40
|
||||
md0 0.00 0.00 4222.00 0.00 37256.00 0.00 17.65 0.00 0.00 0.00 0.00 0.00 0.00
|
||||
|
||||
I/O latency ("await") averages 0.24 and 0.22 ms for our busy disks, but this
|
||||
output doesn't show that occasionally is much higher.
|
||||
|
||||
To get more information on these I/O, try the iosnoop(8) tool.
|
||||
287
iolatency
Executable file
287
iolatency
Executable file
@@ -0,0 +1,287 @@
|
||||
#!/bin/bash
|
||||
#
|
||||
# iolatency - summarize block device I/O latency as a histogram.
|
||||
# Written using Linux ftrace.
|
||||
#
|
||||
# This shows the distribution of latency, allowing modes and latency outliers
|
||||
# to be identified and studied.
|
||||
#
|
||||
# USAGE: ./iolatency [-hT] [-d device] [-i iotype] [interval [count]]
|
||||
#
|
||||
# REQUIREMENTS: FTRACE CONFIG and block:block_rq_* tracepoints, which you may
|
||||
# already have on recent kernels.
|
||||
#
|
||||
# OVERHEAD: block device I/O issue and completion events are traced and buffered
|
||||
# in-kernel, then processed and summarized in user space. There may be
|
||||
# measurable overhead with this approach, relative to the block device IOPS.
|
||||
#
|
||||
# This was written as a proof of concept for ftrace.
|
||||
#
|
||||
# From perf-tools: https://github.com/brendangregg/perf-tools
|
||||
#
|
||||
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or
|
||||
# modify it under the terms of the GNU General Public License
|
||||
# as published by the Free Software Foundation; either version 2
|
||||
# of the License, or (at your option) any later version.
|
||||
#
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU General Public License for more details.
|
||||
#
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program; if not, write to the Free Software Foundation,
|
||||
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
|
||||
#
|
||||
# (http://www.gnu.org/copyleft/gpl.html)
|
||||
#
|
||||
# 20-Jul-2014 Brendan Gregg Created this.
|
||||
|
||||
### default variables
|
||||
tracing=/sys/kernel/debug/tracing
|
||||
flock=/var/tmp/.ftrace-lock
|
||||
bufsize_kb=4096
|
||||
opt_device=0; device=; opt_iotype=0; iotype=; opt_timestamp=0
|
||||
opt_interval=0; interval=1; opt_count=0; count=0
|
||||
trap ':' INT QUIT TERM PIPE # sends execution to end tracing section
|
||||
|
||||
function usage {
|
||||
cat <<-END >&2
|
||||
USAGE: iolatency [-hT] [-d device] [-i iotype] [interval [count]]
|
||||
-d device # device string (eg, "202,1)
|
||||
-i iotype # match type (eg, '*R*' for all reads)
|
||||
-T # timestamp on output
|
||||
-h # this usage message
|
||||
interval # summary interval, seconds (default 1)
|
||||
count # number of summaries
|
||||
eg,
|
||||
iolatency # summarize latency every second
|
||||
iolatency 5 2 # 2 x 5 second summaries
|
||||
iolatency -i '*R*' # trace reads
|
||||
iolatency -d 202,1 # trace device 202,1 only
|
||||
END
|
||||
exit
|
||||
}
|
||||
|
||||
function warn {
|
||||
if ! eval "$@"; then
|
||||
echo >&2 "WARNING: command failed \"$@\""
|
||||
fi
|
||||
}
|
||||
|
||||
function end {
|
||||
# disable tracing
|
||||
echo 2>/dev/null
|
||||
echo "Ending tracing..." 2>/dev/null
|
||||
cd $tracing
|
||||
warn "echo 0 > events/block/block_rq_issue/enable"
|
||||
warn "echo 0 > events/block/block_rq_complete/enable"
|
||||
if (( opt_device || opt_iotype )); then
|
||||
warn "echo 0 > events/block/block_rq_issue/filter"
|
||||
warn "echo 0 > events/block/block_rq_complete/filter"
|
||||
fi
|
||||
warn "echo > trace"
|
||||
(( wroteflock )) && warn "rm $flock"
|
||||
}
|
||||
|
||||
function die {
|
||||
echo >&2 "$@"
|
||||
exit 1
|
||||
}
|
||||
|
||||
function edie {
|
||||
# die with a quiet end()
|
||||
echo >&2 "$@"
|
||||
exec >/dev/null 2>&1
|
||||
end
|
||||
exit 1
|
||||
}
|
||||
|
||||
### process options
|
||||
while getopts d:hi:T opt
|
||||
do
|
||||
case $opt in
|
||||
d) opt_device=1; device=$OPTARG ;;
|
||||
i) opt_iotype=1; iotype=$OPTARG ;;
|
||||
T) opt_timestamp=1 ;;
|
||||
h|?) usage ;;
|
||||
esac
|
||||
done
|
||||
shift $(( $OPTIND - 1 ))
|
||||
if (( $# )); then
|
||||
opt_interval=1
|
||||
interval=$1
|
||||
shift
|
||||
fi
|
||||
if (( $# )); then
|
||||
opt_count=1
|
||||
count=$1
|
||||
fi
|
||||
if (( opt_device )); then
|
||||
major=${device%,*}
|
||||
minor=${device#*,}
|
||||
dev=$(( (major << 20) + minor ))
|
||||
fi
|
||||
|
||||
### select awk
|
||||
[[ -x /usr/bin/mawk ]] && awk='mawk -W interactive' || awk=awk
|
||||
|
||||
### check permissions
|
||||
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
|
||||
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
|
||||
|
||||
### ftrace lock
|
||||
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
|
||||
echo $$ > $flock || die "ERROR: unable to write $flock."
|
||||
wroteflock=1
|
||||
|
||||
### setup and begin tracing
|
||||
warn "echo nop > current_tracer"
|
||||
warn "echo $bufsize_kb > buffer_size_kb"
|
||||
filter=
|
||||
if (( opt_iotype )); then
|
||||
filter="rwbs ~ \"$iotype\""
|
||||
fi
|
||||
if (( opt_device )); then
|
||||
[[ "$filter" != "" ]] && filter="$filter && "
|
||||
filter="${filter}dev == $dev"
|
||||
fi
|
||||
if (( opt_iotype || opt_device )); then
|
||||
echo $filter
|
||||
if ! echo "$filter" > events/block/block_rq_issue/filter || \
|
||||
! echo "$filter" > events/block/block_rq_complete/filter
|
||||
then
|
||||
edie "ERROR: setting -d or -t filter. Exiting."
|
||||
fi
|
||||
fi
|
||||
if ! echo 1 > events/block/block_rq_issue/enable || \
|
||||
! echo 1 > events/block/block_rq_complete/enable; then
|
||||
edie "ERROR: enabling block I/O tracepoints. Exiting."
|
||||
fi
|
||||
etext=
|
||||
(( !opt_count )) && etext=" Ctrl-C to end."
|
||||
echo "Tracing block I/O. Output every $interval seconds.$etext"
|
||||
|
||||
#
|
||||
# Determine output format. It may be one of the following (newest first):
|
||||
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
|
||||
# TASK-PID CPU# TIMESTAMP FUNCTION
|
||||
# To differentiate between them, the number of header fields is counted,
|
||||
# and an offset set, to skip the extra column when needed.
|
||||
#
|
||||
offset=$($awk 'BEGIN { o = 0; }
|
||||
$1 == "#" && $2 ~ /TASK/ && NF == 6 { o = 1; }
|
||||
$2 ~ /TASK/ { print o; exit }' trace)
|
||||
|
||||
### print trace buffer
|
||||
warn "echo > trace"
|
||||
i=0
|
||||
while (( !opt_count || (i < count) )); do
|
||||
(( i++ ))
|
||||
sleep $interval
|
||||
|
||||
# snapshots were added in 3.10
|
||||
if [[ -x snapshot ]]; then
|
||||
echo 1 > snapshot
|
||||
echo > trace
|
||||
cat snapshot
|
||||
else
|
||||
cat trace
|
||||
echo > trace
|
||||
fi
|
||||
|
||||
(( opt_timestamp )) && printf "time %(%H:%M:%S)T:\n" -1
|
||||
echo "tick"
|
||||
done | \
|
||||
$awk -v o=$offset -v opt_timestamp=$opt_timestamp '
|
||||
function star(sval, smax, swidth) {
|
||||
stars = ""
|
||||
if (smax == 0) return ""
|
||||
for (si = 0; si < (swidth * sval / smax); si++) {
|
||||
stars = stars "#"
|
||||
}
|
||||
return stars
|
||||
}
|
||||
|
||||
BEGIN { max_i = 0 }
|
||||
|
||||
# common fields
|
||||
$1 != "#" {
|
||||
time = $(3+o); sub(":", "", time)
|
||||
dev = $(5+o)
|
||||
}
|
||||
|
||||
# block I/O request
|
||||
$1 != "#" && $0 ~ /rq_issue/ {
|
||||
#
|
||||
# example: (fields1..4+o) 202,1 W 0 () 12862264 + 8 [tar]
|
||||
# The cmd field "()" might contain multiple words (hex),
|
||||
# hence stepping from the right (NF-3).
|
||||
#
|
||||
loc = $(NF-3)
|
||||
starts[dev, loc] = time
|
||||
next
|
||||
}
|
||||
|
||||
# block I/O completion
|
||||
$1 != "#" && $0 ~ /rq_complete/ {
|
||||
#
|
||||
# example: (fields1..4+o) 202,1 W () 12862256 + 8 [0]
|
||||
#
|
||||
dir = $(6+o)
|
||||
loc = $(NF-3)
|
||||
|
||||
if (starts[dev, loc] > 0) {
|
||||
latency_ms = 1000 * (time - starts[dev, loc])
|
||||
i = 0
|
||||
for (ms = 1; latency_ms > ms; ms *= 2) { i++ }
|
||||
hist[i]++
|
||||
if (i > max_i)
|
||||
max_i = i
|
||||
delete starts[dev, loc]
|
||||
}
|
||||
next
|
||||
}
|
||||
|
||||
# timestamp
|
||||
$1 == "time" {
|
||||
lasttime = $2
|
||||
}
|
||||
|
||||
# print summary
|
||||
$1 == "tick" {
|
||||
print ""
|
||||
if (opt_timestamp)
|
||||
print lasttime
|
||||
|
||||
# find max value
|
||||
max_v = 0
|
||||
for (i = 0; i <= max_i; i++) {
|
||||
if (hist[i] > max_v)
|
||||
max_v = hist[i]
|
||||
}
|
||||
|
||||
# print histogram
|
||||
printf "%8s .. %-8s: %-8s |%-38s|\n", ">=(ms)", "<(ms)",
|
||||
"I/O", "Distribution"
|
||||
ms = 1
|
||||
from = 0
|
||||
for (i = 0; i <= max_i; i++) {
|
||||
printf "%8d -> %-8d: %-8d |%-38s|\n", from, ms,
|
||||
hist[i], star(hist[i], max_v, 38)
|
||||
from = ms
|
||||
ms *= 2
|
||||
}
|
||||
fflush()
|
||||
delete hist
|
||||
delete starts # invalid if events missed between snapshots
|
||||
max_i = 0
|
||||
}
|
||||
|
||||
$0 ~ /LOST.*EVENTS/ { print "WARNING: " $0 > "/dev/stderr" }
|
||||
'
|
||||
|
||||
### end tracing
|
||||
end
|
||||
108
man/man8/iolatency.8
Normal file
108
man/man8/iolatency.8
Normal file
@@ -0,0 +1,108 @@
|
||||
.TH iolatency 8 "2014-07-12" "USER COMMANDS"
|
||||
.SH NAME
|
||||
iolatency \- summarize block device I/O latency as a histogram. Uses Linux ftrace.
|
||||
.SH SYNOPSIS
|
||||
.B iolatency
|
||||
[\-hT] [\-d device] [\-i iotype] [interval [count]]
|
||||
.SH DESCRIPTION
|
||||
This shows the distribution of latency, allowing modes and latency outliers
|
||||
to be identified and studied. For more details of block device I/O, use
|
||||
iosnoop(8).
|
||||
|
||||
This is a proof of concept tool using ftrace, and involves user space
|
||||
processing and related overheads. See the OVERHEAD section.
|
||||
|
||||
NOTE: Due to the way trace buffers are switched per interval, there is the
|
||||
possibility of losing a small number of I/O (usually less than 1%). The
|
||||
summary therefore shows the general distribution, but may be slightly
|
||||
incomplete. If 100% of I/O must be studied, use iosnoop(8) and post-process.
|
||||
Also note that I/O may be missed when the trace buffer is full: see the
|
||||
interval section in OPTIONS.
|
||||
|
||||
Since this uses ftrace, only the root user can use this tool.
|
||||
.SH REQUIREMENTS
|
||||
FTRACE CONFIG, and the tracepoints block:block_rq_issue and
|
||||
block:block_rq_complete, which you may already have enabled and available on
|
||||
recent Linux kernels. And awk.
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
\-d device
|
||||
Only show I/O issued by this device. (eg, "202,1"). This matches the DEV
|
||||
column in the iolatency output, and is filtered in-kernel.
|
||||
.TP
|
||||
\-i iotype
|
||||
Only show I/O issued that matches this I/O type. This matches the TYPE column
|
||||
in the iolatency output, and wildcards ("*") can be used at the beginning or
|
||||
end (only). Eg, "*R*" matches all reads. This is filtered in-kernel.
|
||||
.TP
|
||||
\-h
|
||||
Print usage message.
|
||||
.TP
|
||||
\-T
|
||||
Include timestamps with each summary output.
|
||||
.TP
|
||||
interval
|
||||
Interval between summary histograms, in seconds.
|
||||
|
||||
During the interval, trace output will be buffered in-kernel, which is then
|
||||
read and processed for the summary. This buffer has a fixed size per-CPU (see
|
||||
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
|
||||
try increasing that size (the bufsize_kb setting in iolatency). With the
|
||||
default setting (4 Mbytes), I'd expect this to happen around 50k I/O per
|
||||
summary.
|
||||
.TP
|
||||
count
|
||||
Number of summaries to print.
|
||||
.SH EXAMPLES
|
||||
.TP
|
||||
Default output, print a summary of block I/O latency every 1 second:
|
||||
#
|
||||
.B iolatency
|
||||
.TP
|
||||
Print 5 x 1 second summaries:
|
||||
#
|
||||
.B iolatency 1 5
|
||||
.TP
|
||||
Trace reads only:
|
||||
#
|
||||
.B iolatency \-i '*R*'
|
||||
.TP
|
||||
Trace I/O issued to device 202,1 only:
|
||||
#
|
||||
.B iolatency \-d 202,1
|
||||
.SH FIELDS
|
||||
.TP
|
||||
>=(ms)
|
||||
Latency was greater than or equal-to this value, in milliseconds.
|
||||
.TP
|
||||
<(ms)
|
||||
Latency was less than this value, in milliseconds.
|
||||
.TP
|
||||
I/O
|
||||
Number of block device I/O in this latency range, during the interval.
|
||||
.TP
|
||||
Distribution
|
||||
ASCII histogram representation of the I/O column.
|
||||
.SH OVERHEAD
|
||||
Block device I/O issue and completion events are traced and buffered
|
||||
in-kernel, then processed and summarized in user space. There may be
|
||||
measurable overhead with this approach, relative to the block device IOPS.
|
||||
|
||||
The overhead may be acceptable in many situations. If it isn't, this tool
|
||||
can be reimplemented in C, or using a different tracer (eg, perf_events,
|
||||
SystemTap, ktap.)
|
||||
.SH SOURCE
|
||||
This is from the perf-tools collection.
|
||||
.IP
|
||||
https://github.com/brendangregg/perf-tools
|
||||
.PP
|
||||
Also look under the examples directory for a text file containing example
|
||||
usage, output, and commentary for this tool.
|
||||
.SH OS
|
||||
Linux
|
||||
.SH STABILITY
|
||||
Unstable - in development.
|
||||
.SH AUTHOR
|
||||
Brendan Gregg
|
||||
.SH SEE ALSO
|
||||
iosnoop(8), iostat(1)
|
||||
Reference in New Issue
Block a user