Packet trace files can contain packets from many distinct TCP flows. Sometimes you want to look at specific flows or compare two flows. If you know which flows you want, it is easy to extract them from the file, but extracting every flow into its own trace can be very time-consuming. You can use the command line in figure 1 to have Tshark do it, but it requires scanning the trace file N+1 times, where N is the number of TCP streams identified. For a large file with many TCP flows this can take anywhere from many minutes to hours.
Figure 1
# date; f=test-short.pcap; for x in $(tshark -r $f -T fields -e tcp.stream | sort -nu); do tshark \
-r $f -R "tcp.stream == $x" -w /tmp/Stream-$x--$f; done; date
Sun Feb 8 11:20:42 MST 2015
. . . .
Sun Feb 8 13:36:48 MST 2015
#
# ls -l Stream*test-short.pcap | wc -l
958
#
What I wanted was something that would extract all the TCP streams in one pass through the file. I could not find anything, so I wrote the following Perl script. See figure 2 for a comparison of execution times.
Usage
perl split-pcap.pl TYPE PCAP-FILE
Where:
TYPE is one of the following strings: "ether" for frames containing an Ethernet header followed by an IP header, "evlan" for frames containing an Ethernet header followed by a VLAN header followed by an IP header, or "sll" for frames containing a "Linux cooked" header (SLL stands for sockaddr_ll, or socket link layer). If the script runs but does not split out any traces, chances are you used the wrong type or the trace contains a frame type other than these.
PCAP-FILE is the name of the pcap file. This can be a relative or absolute name. The file must be in pcap format, not pcapng format.
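If you are unsure which TYPE to pass, or whether a file is pcap or pcapng, the first bytes of the file usually tell you. The sketch below is not part of split-pcap.pl; `suggest_type` is a hypothetical helper. It relies on the standard pcap magic numbers, the pcapng Section Header Block type, and the link-type values 1 (Ethernet) and 113 (Linux cooked). Distinguishing "evlan" from "ether" by looking only at the first frame's EtherType is an assumption made here for brevity; a trace can mix tagged and untagged frames.

```python
import struct

PCAPNG_BLOCK_TYPE = bytes.fromhex("0a0d0d0a")  # pcapng Section Header Block
# Classic pcap magic numbers as stored on disk -> struct byte-order prefix.
PCAP_MAGIC_ORDER = {
    bytes.fromhex("d4c3b2a1"): "<",  # microsecond timestamps, little-endian
    bytes.fromhex("a1b2c3d4"): ">",  # microsecond timestamps, big-endian
    bytes.fromhex("4d3cb2a1"): "<",  # nanosecond timestamps, little-endian
    bytes.fromhex("a1b23c4d"): ">",  # nanosecond timestamps, big-endian
}
LINKTYPE_ETHERNET = 1
LINKTYPE_LINUX_SLL = 113
ETHERTYPE_VLAN = 0x8100  # 802.1Q tag


def suggest_type(data):
    """Guess the TYPE argument from the leading bytes of a capture file."""
    magic = data[:4]
    if magic == PCAPNG_BLOCK_TYPE:
        raise ValueError("pcapng file: convert it to pcap format first")
    order = PCAP_MAGIC_ORDER.get(magic)
    if order is None or len(data) < 24:
        raise ValueError("not a classic pcap file")
    # The link type is in the low 16 bits of the last global-header field.
    linktype = struct.unpack_from(order + "I", data, 20)[0] & 0xFFFF
    if linktype == LINKTYPE_LINUX_SLL:
        return "sll"
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"link type {linktype} does not match ether, evlan, or sll")
    # The first frame follows the 24-byte file header and 16-byte record header.
    if len(data) >= 40 + 14:
        (ethertype,) = struct.unpack_from("!H", data, 40 + 12)
        if ethertype == ETHERTYPE_VLAN:
            return "evlan"
    return "ether"
```

Reading the first 64 bytes of the trace in binary mode and passing them to `suggest_type` is enough for this check.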
Output consists primarily of a set of files named in the format PCAP-FILE-IPx:TCP_PORTx-IPy:TCP_PORTy.pcap. The IPx:TCP_PORTx-IPy:TCP_PORTy portion is ordered by TCP port value, lowest first, so TCP_PORTx < TCP_PORTy. Files are written in the same directory as PCAP-FILE. Note that this is the TCP 4-tuple, which is not quite the same as a Wireshark/Tshark stream. If the same port numbers are reused across multiple connections, Wireshark/Tshark will recognize this and create multiple streams. This script is not that smart: all packets with the same 4-tuple are grouped together.
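To make the naming rule concrete, here is a minimal Python sketch of how a frame could be mapped to its output file name. It is an illustration, not a translation of the Perl script. It assumes IPv4 only, the usual link-layer header lengths (14 bytes for Ethernet, 18 with one VLAN tag, 16 for Linux cooked), and it does not check the EtherType. `tcp_endpoints`, `flow_file_name`, and `_frame` are hypothetical names, and the tie-break by IP address when both ports are equal is an assumption the source does not cover.

```python
import socket
import struct

# Link-layer header length in bytes for each TYPE value.
LINK_HEADER_LEN = {"ether": 14, "evlan": 18, "sll": 16}
IPPROTO_TCP = 6


def tcp_endpoints(frame, frame_type):
    """Return (src_ip, src_port, dst_ip, dst_port) for an IPv4/TCP frame, else None."""
    ip = frame[LINK_HEADER_LEN[frame_type]:]
    if len(ip) < 20 or ip[0] >> 4 != 4 or ip[9] != IPPROTO_TCP:
        return None
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or len(ip) < ihl + 4:
        return None
    src_port, dst_port = struct.unpack_from("!HH", ip, ihl)
    return socket.inet_ntoa(ip[12:16]), src_port, socket.inet_ntoa(ip[16:20]), dst_port


def flow_file_name(pcap_file, src_ip, src_port, dst_ip, dst_port):
    """Order the two endpoints by port so both directions share one name."""
    (port_x, ip_x), (port_y, ip_y) = sorted([(src_port, src_ip), (dst_port, dst_ip)])
    return f"{pcap_file}-{ip_x}:{port_x}-{ip_y}:{port_y}.pcap"


def _frame(src_ip, src_port, dst_ip, dst_port):
    """Build a minimal Ethernet/IPv4/TCP frame (test data only)."""
    ip_header = bytes([0x45, 0, 0, 40, 0, 0, 0, 0, 64, IPPROTO_TCP, 0, 0])
    ip_header += socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
    tcp_header = struct.pack("!HH", src_port, dst_port) + bytes(16)
    return bytes(12) + b"\x08\x00" + ip_header + tcp_header


request = _frame("192.168.1.200", 42583, "103.10.4.216", 80)
reply = _frame("103.10.4.216", 80, "192.168.1.200", 42583)
for frame in (request, reply):
    print(flow_file_name("test-short.pcap", *tcp_endpoints(frame, "ether")))
```

Both directions of the connection produce the same name, so the request and the reply land in one file. Parsing the same frame with the wrong type (for example "sll") returns None here, which mirrors the symptom of a run that splits out no traces.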
There is also a status line that gives an approximate percent complete, along with the number of bytes processed and the total number of bytes. The figure is approximate but close enough to give you an idea of how long the process will take. The status line also shows a count of the flows identified and saved, and of the flows identified but not separated out because of the open file count limitation. If the "identified but not separated out" number is not zero, there will also be a message that the file count has been exceeded (see figure 3). If that happens there will also be a PCAP-FILE-missed-4-tuples file.
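The bookkeeping behind the status line can be sketched as follows. This is a guess at the logic, not the script's implementation: the format string is copied from the output in figure 2, rounding the percentage is inferred from that output (99.7% is displayed as 100%), and `FlowTracker` with its `max_open_files` parameter is hypothetical, since the actual limit and how the script enforces it are not stated here.

```python
def status_line(bytes_done, total_bytes, saved, not_saved):
    """Format a progress line in the style shown in figure 2."""
    percent = round(100 * bytes_done / total_bytes)
    return (f"{percent}% ({bytes_done}/{total_bytes}) "
            f"4-tuple Saved/No Saved count is {saved}/{not_saved}")


class FlowTracker:
    """Decide which 4-tuples get an output file under an open-file cap."""

    def __init__(self, max_open_files):
        self.max_open_files = max_open_files  # hypothetical limit
        self.saved = set()    # 4-tuples that have an output file
        self.missed = set()   # 4-tuples seen after the cap was reached

    def admit(self, four_tuple):
        """Return True if packets for this 4-tuple should be written out."""
        if four_tuple in self.saved:
            return True
        if four_tuple in self.missed:
            return False
        if len(self.saved) < self.max_open_files:
            self.saved.add(four_tuple)
            return True
        self.missed.add(four_tuple)
        return False


tracker = FlowTracker(max_open_files=2)
for flow in ["a", "b", "c", "a", "c"]:
    tracker.admit(flow)
print(status_line(294, 76413360, len(tracker.saved), len(tracker.missed)))
```

In this sketch the `missed` set is what would end up in a file such as PCAP-FILE-missed-4-tuples, so the skipped flows can be identified afterwards.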
Requirements
Limitations
Examples
Figure 2 shows the processing of the same file as figure 1. Note the time difference, 4 seconds versus 2+ hours.
Figure 2
# date; perl ../y.pl sll test-short.pcap; date
Sun Feb 8 17:24:07 MST 2015
0% (76/76413360) 4-tuple Saved/No Saved count is 0/0
0% (144/76413360) 4-tuple Saved/No Saved count is 0/0
0% (212/76413360) 4-tuple Saved/No Saved count is 0/0
0% (294/76413360) 4-tuple Saved/No Saved count is 1/0
. . . . .
100% (76181920/76413360) 4-tuple Saved/No Saved count is 959/0
100% (76183131/76413360) 4-tuple Saved/No Saved count is 959/0
100% (76183254/76413360) 4-tuple Saved/No Saved count is 959/0
100% (76183336/76413360) 4-tuple Saved/No Saved count is 959/0
Sun Feb 8 17:24:11 MST 2015
#
# ls | head -5
test-short.pcap-103.10.4.216:80-192.168.1.200:42583.pcap
test-short.pcap-103.10.4.216:80-192.168.1.200:42584.pcap
test-short.pcap-103.10.4.216:80-192.168.1.200:42585.pcap
test-short.pcap-103.10.4.216:80-192.168.1.200:42586.pcap
test-short.pcap-103.10.4.216:80-192.168.1.200:42589.pcap
#
#!/usr/bin/perl
#
# split-pcap.pl
#
# version 1.0 2015-01-25
# version 1.1 2015-02-07 added sll frame type processing
# version 1.2 2016-03-03 added evlan frame type for an Ethernet frame with VLAN tags
#
# Usage:
# perl split-pcap.pl TYPE PCAP-FILE
#
# This script will read the PCAP-FILE and for each Ethernet (TYPE == ether)
# SLL (TYPE = sll), or Ethernet with VLAN tags (type == evlan) frame containing
# an IP/TCP packet will write a file named PCAP-FILE-IPx:TCP_PORTx-IPy:TCP_PORTy.pcap.
# The IPx:TCP_PORTx:IPy:TCP_PORTy is ordered based on the lowest TCP PORT value so
# TCP_PORTx < TCP_PORTy. Files are written in the same directory as PCAP-FILE. Status
# information is written as the PCAP-FILE is read but the percentage complete value
# is approximate.
#
# Known limitations
# 1. The split into separate file is based on the 4 tuple
#