cachestat for page cache statistics

This commit is contained in:
Brendan Gregg
2015-01-02 11:54:12 +09:30
parent 21ee61e151
commit 41fe778cd3
5 changed files with 338 additions and 0 deletions

View File

@@ -25,6 +25,7 @@ Using ftrace:
- [execsnoop](execsnoop): trace process exec() with command line argument details. [Examples](examples/execsnoop_example.txt).
- [opensnoop](opensnoop): trace open() syscalls showing filenames. [Examples](examples/opensnoop_example.txt).
- [killsnoop](killsnoop): trace kill() signals showing process and signal details. [Examples](examples/killsnoop_example.txt).
- fs/[cachestat](fs/cachestat): basic cache hit/miss statistics for the Linux page cache. [Examples](examples/cachestat_example.txt).
- net/[tcpretrans](net/tcpretrans): show TCP retransmits, with address and other details. [Examples](examples/tcpretrans_example.txt).
- system/[tpoint](system/tpoint): trace a given tracepoint. [Examples](examples/tpoint_example.txt).
- kernel/[funccount](kernel/funccount): count kernel function calls, matching a string with wildcards. [Examples](examples/funccount_example.txt).

1
bin/cachestat Symbolic link
View File

@@ -0,0 +1 @@
../fs/cachestat

View File

@@ -0,0 +1,58 @@
Demonstrations of cachestat, the Linux ftrace version.
Here is some sample output showing file system cache statistics, followed by
the workload that caused it:
# ./cachestat -t
Counting cache functions... Output every 1 seconds.
TIME HITS MISSES DIRTIES RATIO BUFFERS_MB CACHE_MB
08:28:57 415 0 0 100.0% 1 191
08:28:58 411 0 0 100.0% 1 191
08:28:59 362 97 0 78.9% 0 8
08:29:00 411 0 0 100.0% 0 9
08:29:01 775 20489 0 3.6% 0 89
08:29:02 411 0 0 100.0% 0 89
08:29:03 6069 0 0 100.0% 0 89
08:29:04 15249 0 0 100.0% 0 89
08:29:05 411 0 0 100.0% 0 89
08:29:06 411 0 0 100.0% 0 89
08:29:07 411 0 3 100.0% 0 89
[...]
I used the -t option to include the TIME column, to make describing the output
easier.
The workload was:
# echo 1 > /proc/sys/vm/drop_caches; sleep 2; cksum 80m; sleep 2; cksum 80m
At 8:28:58, the page cache was dropped by the first command, which can be seen
by the drop in size for "CACHE_MB" (page cache size) from 191 Mbytes to 8.
After a 2 second sleep, a cksum command was issued at 8:29:01, for an 80 Mbyte
file (called "80m"), which caused a total of ~20,400 misses ("MISSES" column),
and the page cache size to grow by 80 Mbytes. The hit ratio during this dropped
to 3.6%. Finally, after another 2 second sleep, at 8:29:03 the cksum command
was run a second time, this time hitting entirely from cache.
Instrumenting all file system cache accesses does cost some overhead, and this
tool might slow your target system by 2% or so. Test before use if this is a
concern.
This tool also uses dynamic tracing, and is tied to Linux kernel implementation
details. If it doesn't work for you, it probably needs fixing.
Use -h to print the USAGE message:
# ./cachestat -h
USAGE: cachestat [-Dht] [interval]
-D # print debug counters
-h # this usage message
-t # include timestamp
interval # output interval in secs (default 1)
eg,
cachestat # show stats every second
cachestat 5 # show stats every 5 seconds
See the man page and example file for more info.

167
fs/cachestat Executable file
View File

@@ -0,0 +1,167 @@
#!/bin/bash
#
# cachestat - show Linux page cache hit/miss statistics.
# Uses Linux ftrace.
#
# This is a proof of concept using Linux ftrace capabilities on older kernels,
# and works by using function profiling for in-kernel counters. Specifically,
# four kernel functions are traced:
#
# mark_page_accessed() for measuring cache accesses
# mark_buffer_dirty() for measuring cache writes
# add_to_page_cache_lru() for measuring page additions
# account_page_dirtied() for measuring page dirties
#
# It is possible that these functions have been renamed (or are different
# logically) for your kernel version, and this script will not work as-is.
# This script was written on Linux 3.13. This script is a sandcastle: the
# kernel may wash some away, and you'll need to rebuild.
#
# USAGE: cachestat [-Dht] [interval]
# eg,
# cachestat 5 # show stats every 5 seconds
#
# Run "cachestat -h" for full usage.
#
# WARNING: This uses dynamic tracing of kernel functions, and could cause
# kernel panics or freezes. Test, and know what you are doing, before use.
# It also traces cache activity, which can be frequent, and cost some overhead.
# The statistics should be treated as best-effort: there may be some error
# margin depending on unusual workload types.
#
# REQUIREMENTS: CONFIG_FUNCTION_PROFILER, awk.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 28-Dec-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
interval=1; opt_timestamp=0; opt_debug=0
trap 'quit=1' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: cachestat [-Dht] [interval]
-D # print debug counters
-h # this usage message
-t # include timestamp
interval # output interval in secs (default 1)
eg,
cachestat # show stats every second
cachestat 5 # show stats every 5 seconds
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function die {
echo >&2 "$@"
exit 1
}
### process options
while getopts Dht opt
do
case $opt in
D) opt_debug=1 ;;
t) opt_timestamp=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
### option logic
if (( $# )); then
interval=$1
fi
echo "Counting cache functions... Output every $interval seconds."
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### enable tracing
sysctl -q kernel.ftrace_enabled=1 # doesn't set exit status
printf "mark_page_accessed\nmark_buffer_dirty\nadd_to_page_cache_lru\naccount_page_dirtied\n" > set_ftrace_filter || \
die "ERROR: tracing these four kernel functions: mark_page_accessed,"\
"mark_buffer_dirty, add_to_page_cache_lru and account_page_dirtied (unknown kernel version?). Exiting."
warn "echo nop > current_tracer"
if ! echo 1 > function_profile_enabled; then
echo > set_ftrace_filter
die "ERROR: enabling function profiling. Have CONFIG_FUNCTION_PROFILER? Exiting."
fi
(( opt_timestamp )) && printf "%-8s " TIME
printf "%8s %8s %8s %8s %12s %10s" HITS MISSES DIRTIES RATIO "BUFFERS_MB" "CACHE_MB"
(( opt_debug )) && printf " DEBUG"
echo
### summarize
quit=0; secs=0
while (( !quit && (!opt_duration || secs < duration) )); do
(( secs += interval ))
echo 0 > function_profile_enabled
echo 1 > function_profile_enabled
sleep $interval
(( opt_timestamp )) && printf "%(%H:%M:%S)T " -1
# cat both meminfo and trace stats, and let awk pick them apart
cat /proc/meminfo trace_stat/function* | awk -v debug=$opt_debug '
# match meminfo stats:
$1 == "Buffers:" && $3 == "kB" { buffers_mb = $2 / 1024 }
$1 == "Cached:" && $3 == "kB" { cached_mb = $2 / 1024 }
# identify and save trace counts:
$2 ~ /[0-9]/ && $3 != "kB" { a[$1] += $2 }
END {
mpa = a["mark_page_accessed"]
mbd = a["mark_buffer_dirty"]
apcl = a["add_to_page_cache_lru"]
apd = a["account_page_dirtied"]
total = mpa - mbd
misses = apcl - apd
if (misses < 0)
misses = 0
hits = total - misses
ratio = 100 * hits / total
printf "%8d %8d %8d %7.1f%% %12.0f %10.0f", hits, misses, mbd,
ratio, buffers_mb, cached_mb
if (debug)
printf " (%d %d %d %d)", mpa, mbd, apcl, apd
printf "\n"
}'
done
### end tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
warn "echo 0 > function_profile_enabled"
warn "echo > set_ftrace_filter"

111
man/man8/cachestat.8 Normal file
View File

@@ -0,0 +1,111 @@
.TH cachestat 8 "2014-12-28" "USER COMMANDS"
.SH NAME
cachestat \- Measure page cache hits/misses. Uses Linux ftrace.
.SH SYNOPSIS
.B cachestat
[\-Dht] [interval]
.SH DESCRIPTION
This tool provides basic cache hit/miss statistics for the Linux page cache.
Its current implementation uses Linux ftrace dynamic function profiling to
create custom in-kernel counters, which is a workaround until such counters
can be built-in to the kernel. Specifically, four kernel functions are counted:
.IP
mark_page_accessed() for measuring cache accesses
.IP
mark_buffer_dirty() for measuring cache writes
.IP
add_to_page_cache_lru() for measuring page additions
.IP
account_page_dirtied() for measuring page dirties
.PP
It is possible that these functions have been renamed (or are different
logically) for your kernel version, and this script will not work as-is.
This was written for a Linux 3.13 kernel, and tested on a few others versions.
This script is a sandcastle: the kernel may wash some away, and you'll
need to rebuild.
This program's implementation can be improved in the future when other
kernel capabilities are made available. If you need a more reliable tool now,
then consider other tracing alternatives (eg, SystemTap). This tool is really
a proof of concept to see what ftrace can currently do.
WARNING: This uses dynamic tracing of kernel functions, and could cause
kernel panics or freezes. Test, and know what you are doing, before use.
It also traces cache activity, which can be frequent, and cost some overhead.
The statistics should be treated as best-effort: there may be some error
margin depending on unusual workload types.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_FUNCTION_PROFILER, which you may already have enabled and available on
recent kernels, and awk.
.SH OPTIONS
.TP
\-D
Include extra fields for debug purposes (see script).
.TP
\-h
Print usage message.
.TP
\-t
Include timestamps in units of seconds.
.TP
interval
Output interval in seconds. Default is 1.
.SH EXAMPLES
.TP
Show per-second page cache statistics:
#
.B cachestat
.SH FIELDS
.TP
TIME
Time, in HH:MM:SS.
.TP
HITS
Number of page cache hits (reads). Each hit is for one memory page (the size
depends on your processor architecture; commonly 4 Kbytes). Since this tool
outputs at a timed interval, this field indicates the cache hit rate.
.TP
MISSES
Number of page cache misses (reads from storage I/O). Each miss is for one
memory page. Cache misses should be causing disk I/O. Run iostat(1) for
correlation (although the miss count and size by the time disk I/O is issued
can differ due to I/O subsystem merging).
.TP
DIRTIES
Number of times a page in the page cache was written to and thus "dirtied".
The same page may be counted multiple times per interval, if it is written
to multiple times. This field gives an indication of how much cache churn there
is, caused by applications writing data.
.TP
RATIO
The ratio of cache hits to total cache accesses (hits + misses), as a
percentage.
.TP
BUFFERS_MB
Size of the buffer cache, for disk I/O. From /proc/meminfo.
.TP
CACHED_MB
Size of the page cache, for file system I/O. From /proc/meminfo.
.SH OVERHEAD
This tool currently uses ftrace function profiling, which provides efficient
in-kernel counters. However, the functions profiled are executed frequently,
so the overheads can add up. Test and measure before use. My own testing
showed around a 2% loss in application performance while this tool was running.
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
iostat(1), iosnoop(8)