tcpretrans

This commit is contained in:
Brendan Gregg
2014-07-31 22:47:08 -07:00
parent 12be5c4288
commit a576f7eb72
5 changed files with 368 additions and 0 deletions

View File

@@ -22,6 +22,7 @@ Using ftrace:
- [execsnoop](execsnoop): trace process exec() with command line argument details. [Examples](examples/execsnoop_example.txt).
- [opensnoop](opensnoop): trace open() syscalls showing filenames. [Examples](examples/opensnoop_example.txt).
- system/[tpoint](system/tpoint): trace a given tracepoint. [Examples](examples/tpoint_example.txt).
- net/[tcpretrans](net/tcpretrans): show TCP retransmits, with address and other details. [Examples](examples/tcpretrans_example.txt).
- kernel/[funccount](kernel/funccount): count kernel functions that match a string. [Examples](examples/funccount_example.txt).
- kernel/[functrace](kernel/functrace): trace kernel functions that match a string. [Examples](examples/functrace_example.txt).
- kernel/[funcslower](kernel/funcslower): trace kernel functions slower than a threshold. [Examples](examples/funcslower_example.txt).

1
bin/tcpretrans Symbolic link
View File

@@ -0,0 +1 @@
../net/tcpretrans

View File

@@ -0,0 +1,41 @@
Demonstrations of tcpretrans, the Linux ftrace version.
Tracing TCP retransmits on a busy server:
# ./tcpretrans
TIME PID LADDR:LPORT -- RADDR:RPORT STATE
05:16:44 3375 10.150.18.225:53874 R> 10.105.152.3:6001 ESTABLISHED
05:16:44 3375 10.150.18.225:53874 R> 10.105.152.3:6001 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:55 0 10.150.18.225:47115 R> 10.71.171.158:6001 ESTABLISHED
05:16:58 0 10.150.18.225:44388 R> 10.103.130.120:6001 ESTABLISHED
05:16:58 0 10.150.18.225:44388 R> 10.103.130.120:6001 ESTABLISHED
05:16:58 0 10.150.18.225:44388 R> 10.103.130.120:6001 ESTABLISHED
05:16:59 0 10.150.18.225:56086 R> 10.150.32.107:6001 ESTABLISHED
05:16:59 0 10.150.18.225:56086 R> 10.150.32.107:6001 ESTABLISHED
^C
Ending tracing...
This shows TCP retransmits by dynamically tracing the kernel function that does
the retransmit. This is a low overhead approach.
The PID may or may not make sense: it's showing the PID that was on-CPU,
however, retransmits are often timer-based, where it's the kernel that is
on-CPU.
The STATE column shows the TCP state for the socket performing the retransmit.
Use -h to print the USAGE message:
# ./tcpretrans -h
USAGE: tcpretrans [-h]
eg,
tcpretrans # trace TCP retransmits

78
man/man8/tcpretrans.8 Normal file
View File

@@ -0,0 +1,78 @@
.TH tcpretrans 8 "2014-07-31" "USER COMMANDS"
.SH NAME
tcpretrans \- show TCP retransmits, with address and other details. Uses Linux ftrace.
.SH SYNOPSIS
.B tcpretrans
[\-h]
.SH DESCRIPTION
This traces TCP retransmits that are sent by the system tcpretrans is executed
from, showing address, port, and TCP state information,
and sometimes the PID (although usually not, since retransmits are usually
sent by the kernel on timeout events). To keep overhead low, only
tcp_retransmit_skb() kernel calls are traced (this does not trace every packet).
This was written as a proof of concept for ftrace, for older Linux systems,
and without kernel debuginfo. It uses dynamic tracing of tcp_retransmit_skb(),
and reads /proc/net/tcp for socket details. Its use of dynamic tracing and
CPU registers is an unstable platform-specific workaround, and may require
modifications to work on different kernels and platforms. This would be better
written using a tracer such as SystemTap, and will likely be rewritten in the
future when certain tracing features are added to the Linux kernel.
Currently only IPv4 is supported.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE and KPROBE CONFIG, tcp_retransmit_skb() kernel function.
You may have these already have these on recent kernels. And Perl.
.SH OPTIONS
.TP
\-h
Print usage message.
.SH EXAMPLES
.TP
Trace TCP retransmits
#
.B tcpretrans
.TP
TIME
Time of retransmit (may be rounded up to the nearest second).
.TP
PID
Process ID that was on-CPU. This is less useful than it might sound, as it
may usually be 0, for the kernel, for timer-based retransmits.
.TP
LADDR
Local address.
.TP
LPORT
Local port.
.TP
RADDR
Remote address.
.TP
RPORT
Remote port.
.TP
STATE
TCP session state.
.SH OVERHEAD
The CPU overhead is relative to the rate of TCP retransmits, and is
designed to be low as this does not examine every packet. Once per second the
/proc/net/tcp file is read, and a buffer of retransmit trace events is
retrieved from the kernel and processed.
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
tcpdump(1)

247
net/tcpretrans Executable file
View File

@@ -0,0 +1,247 @@
#!/usr/bin/perl
#
# tcpretrans - show TCP retransmts, with address and other details.
# Written using Linux ftrace.
#
# This traces TCP retransmits, showing address, port, and TCP state information,
# and sometimes the PID (although usually not, since retransmits are usually
# sent by the kernel on timeouts). To keep overhead low, only
# tcp_retransmit_skb() calls are traced (this does not trace every packet).
#
# USAGE: ./tcpretrans [-h]
#
# REQUIREMENTS: FTRACE and KPROBE CONFIG, tcp_retransmit_skb() kernel function.
# You may have these already have these on recent kernels. And Perl.
#
# This was written as a proof of concept for ftrace, for older Linux systems,
# and without kernel debuginfo. It uses dynamic tracing of tcp_retransmit_skb(),
# and reads /proc/net/tcp for socket details. Its use of dynamic tracing and
# CPU registers is an unstable platform-specific workaround, and may require
# modifications to work on different kernels and platforms. This would be better
# written using a tracer such as SystemTap, and will likely be rewritten in the
# future when certain tracing features are added to the Linux kernel.
#
# Currently only IPv4 is supported.
#
# OVERHEAD: The CPU overhead is relative to the rate of TCP retransmits, and is
# designed to be low as this does not examine every packet. Once per second the
# /proc/net/tcp file is read, and a buffer of retransmit trace events is
# retrieved from the kernel and processed.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the tcpretrans(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 28-Jul-2014 Brendan Gregg Created this.
use strict;
use warnings;
use POSIX qw(strftime);
use Getopt::Long;
my $tracing = "/sys/kernel/debug/tracing";
my $flock = "/var/tmp/.ftrace-lock";
my $interval = 1;
local $SIG{INT} = \&cleanup;
local $SIG{QUIT} = \&cleanup;
local $SIG{TERM} = \&cleanup;
local $SIG{PIPE} = \&cleanup;
local $SIG{HUP} = \&cleanup;
$| = 1;
### options
my ($help);
GetOptions("help" => \$help)
or usage();
usage() if $help;
sub usage {
print STDERR "USAGE: tcpretrans [-h]\n";
print STDERR " eg,\n";
print STDERR " tcpretrans # trace TCP retransmits\n";
exit;
}
sub ldie {
unlink $flock;
die @_;
}
sub writeto {
my ($string, $file) = @_;
open FILE, ">$file" or return 0;
print FILE $string or return 0;
close FILE or return 0;
}
sub appendto {
my ($string, $file) = @_;
open FILE, ">>$file" or return 0;
print FILE $string or return 0;
close FILE or return 0;
}
# tcp socket cache
my %tcp;
sub cache_tcp {
undef %tcp;
open(TCP, "/proc/net/tcp") or ldie "ERROR: reading /proc/net/tcp.";
while (<TCP>) {
next if /^ *sl/;
my ($sl, $local_address, $rem_address, $st, $tx_rx, $tr_tm,
$retrnsmt, $uid, $timeout, $inode, $jf, $sk) = split;
$sk =~ s/^0x//;
$tcp{$sk}{laddr} = $local_address;
$tcp{$sk}{raddr} = $rem_address;
$tcp{$sk}{state} = $st;
}
close TCP;
}
my @tcpstate;
sub map_tcp_states {
push @tcpstate, "NULL";
for (<DATA>) {
chomp;
s/.*TCP_//;
s/[, ].*$//;
push @tcpstate, $_;
}
}
# /proc/net/tcp hex addr to dotted quad decimal
sub inet_h2a {
my ($haddr) = @_;
my @addr = ();
for my $num ($haddr =~ /(..)(..)(..)(..)/) {
unshift @addr, hex($num);
}
return join(".", @addr);
}
### check permissions
chdir "$tracing" or die "ERROR: accessing tracing. Root? Kernel has FTRACE?" .
"\ndebugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)";
### ftrace lock
if (-e $flock) {
open FLOCK, $flock; my $fpid = <FLOCK>; chomp $fpid; close FLOCK;
die "ERROR: ftrace may be in use by PID $fpid ($flock)";
}
writeto "$$", $flock or die "ERROR: unable to write $flock.";
### setup and begin tracing
writeto "nop", "current_tracer" or ldie "ERROR: disabling current_tracer.";
my $kname = "tcpretrans_tcp_retransmit_skb";
appendto "p:$kname tcp_retransmit_skb sk=%di", "kprobe_events"
or ldie "ERROR: creating kprobe for inet_csk_accept().";
unless (writeto "1", "events/kprobes/$kname/enable") {
appendto "-:$kname", "kprobe_events";
ldie "ERROR: enabling kprobe.";
}
map_tcp_states();
printf "%-8s %-6s %-20s -- %-20s %s\n", "TIME", "PID",
"LADDR:LPORT", "RADDR:RPORT", "STATE";
#
# Read and print event data. This loop waits one second then reads the buffered
# trace data, then caches /proc/net/tcp, then iterates over the buffered trace
# data using the cached state. While this minimizes CPU overheads, it only
# works because sockets that are retransmitting are usually long lived, and
# remain in /proc/net/tcp for at least our sleep interval.
#
while (1) {
sleep $interval;
# buffer trace data
open TPIPE, "trace" or ldie "ERROR: opening trace_pipe.";
my @trace = ();
while (<TPIPE>) {
next if /^#/;
push @trace, $_;
}
close TPIPE;
writeto "0", "trace" or ldie "ERROR: clearing trace";
# cache /proc/net/tcp state
if (scalar @trace) {
cache_tcp();
}
# process and print events
for (@trace) {
my ($taskpid, $rest) = split ' ', $_, 2;
my ($task, $pid) = $taskpid =~ /(.*)-(\d+)/;
my ($skp) = $rest =~ /sk=([0-9a-fx]*)/;
next unless defined $skp and $skp ne "";
$skp =~ s/^0x//;
my ($laddr, $lport, $raddr, $rport, $state);
if (defined $tcp{$skp}) {
# convert /proc/net/tcp hex to dotted quads
my ($hladdr, $hlport) = split /:/, $tcp{$skp}{laddr};
my ($hraddr, $hrport) = split /:/, $tcp{$skp}{raddr};
$laddr = inet_h2a($hladdr);
$raddr = inet_h2a($hraddr);
$lport = hex($hlport);
$rport = hex($hrport);
$state = $tcpstate[hex($tcp{$skp}{state})];
} else {
# socket closed too quickly
($laddr, $raddr) = ("-", "-");
($lport, $rport) = ("-", "-");
$state = "-";
}
my $now = strftime "%H:%M:%S", localtime;
printf "%-8s %-6s %-20s R> %-20s %s\n", $now, $pid,
"$laddr:$lport", "$raddr:$rport", $state;
}
}
### end tracing
cleanup();
sub cleanup {
print "\nEnding tracing...\n";
close TPIPE;
writeto "0", "events/kprobes/$kname/enable"
or ldie "ERROR: disabling kprobe";
appendto "-:$kname", "kprobe_events";
writeto "", "trace";
unlink $flock;
exit;
}
# from /usr/include/netinet/tcp.h:
__DATA__
TCP_ESTABLISHED = 1,
TCP_SYN_SENT,
TCP_SYN_RECV,
TCP_FIN_WAIT1,
TCP_FIN_WAIT2,
TCP_TIME_WAIT,
TCP_CLOSE,
TCP_CLOSE_WAIT,
TCP_LAST_ACK,
TCP_LISTEN,
TCP_CLOSING /* now a valid state */