Interesting Traces - TCP Checksum Errors - again



A couple of years ago (see Trace 35. Checksum errors), I provided an example of checksum errors in an ICMP packet. This is an example of checksum errors in a TCP segment.

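With the development of IP, TCP and UDP checksum offloading, it is not uncommon for a trace captured on the sending host to report that every transmitted frame has a checksum error. The host's protocol stack leaves the checksum calculation to the network adapter, and the capture software sees the frame before the adapter has filled in the checksum.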
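For reference, the TCP checksum is the 16-bit one's complement of the one's complement sum of an IPv4 pseudo-header (source and destination addresses, protocol number and TCP length) and the whole TCP segment, as defined in RFC 793 and computed as described in RFC 1071. This is the calculation an offloading adapter performs after the capture software has already copied the frame. The Python below is a minimal sketch; the function names and the sample segment are hypothetical, for illustration only.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum (RFC 1071), zero-padding odd-length data."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the end-around carry
    return total


def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP segment.

    To compute, pass the segment with its checksum field (bytes 16-17) zeroed.
    To verify, pass the segment as received: a correct segment yields 0.
    """
    pseudo = (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!BBH", 0, 6, len(segment))  # zero, protocol 6 (TCP), TCP length
    )
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


# Hypothetical PSH/ACK segment using the ports from the trace (54029 -> 22).
header = struct.pack("!HHIIBBHHH", 54029, 22, 1, 1, 5 << 4, 0x18, 256, 0, 0)
segment = bytearray(header + b"example payload!")
segment[16:18] = struct.pack("!H", tcp_checksum("172.16.9.3", "172.28.10.8", bytes(segment)))

print(tcp_checksum("172.16.9.3", "172.28.10.8", bytes(segment)))  # prints 0 (valid)
```

Flipping any single bit of the segment after the checksum has been filled in makes the verification result non-zero, which is what Wireshark reports as an incorrect checksum.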

No.     Time            Source                Destination           Info                                                                             
      5 11:24:47.385782 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=1 Ack=1 Win=256 Len=48
      6 11:24:47.385840 172.28.10.8         172.16.9.3         ssh > 54029 [ACK] Seq=1 Ack=49 Win=149 Len=0
      7 11:24:47.387457 172.28.10.8         172.16.9.3         ssh > 54029 [TCP CHECKSUM INCORRECT] [PSH, ACK] Seq=1 Ack=49 Win=149 Len=48
      9 11:24:47.597809 172.16.9.3          172.28.10.8        54029 > ssh [ACK] Seq=49 Ack=49 Win=256 Len=0
     10 11:24:47.641815 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=49 Ack=49 Win=256 Len=48
     11 11:24:47.643448 172.28.10.8         172.16.9.3         ssh > 54029 [TCP CHECKSUM INCORRECT] [PSH, ACK] Seq=49 Ack=97 Win=149 Len=48
     12 11:24:47.848397 172.16.9.3          172.28.10.8        54029 > ssh [ACK] Seq=97 Ack=97 Win=255 Len=0
     27 11:24:48.097808 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=97 Ack=97 Win=255 Len=48
     28 11:24:48.099660 172.28.10.8         172.16.9.3         ssh > 54029 [TCP CHECKSUM INCORRECT] [PSH, ACK] Seq=97 Ack=145 Win=149 Len=48
     33 11:24:48.185694 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=145 Ack=145 Win=255 Len=48
     34 11:24:48.187442 172.28.10.8         172.16.9.3         ssh > 54029 [TCP CHECKSUM INCORRECT] [PSH, ACK] Seq=145 Ack=193 Win=149 Len=48
     35 11:24:48.275724 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=193 Ack=193 Win=255 Len=48
     36 11:24:48.277452 172.28.10.8         172.16.9.3         ssh > 54029 [TCP CHECKSUM INCORRECT] [PSH, ACK] Seq=193 Ack=241 Win=149 Len=48
     42 11:24:48.417796 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=241 Ack=241 Win=255 Len=48
     43 11:24:48.419379 172.28.10.8         172.16.9.3         ssh > 54029 [TCP CHECKSUM INCORRECT] [PSH, ACK] Seq=241 Ack=289 Win=149 Len=48
     44 11:24:48.440737 172.28.10.8         172.16.9.3         ssh > 54029 [TCP CHECKSUM INCORRECT] [PSH, ACK] Seq=289 Ack=289 Win=149 Len=64
     45 11:24:48.451488 172.16.9.3          172.28.10.8        54029 > ssh [ACK] Seq=289 Ack=353 Win=260 Len=0
     47 11:24:48.552728 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=289 Ack=353 Win=260 Len=48
     48 11:24:48.554451 172.28.10.8         172.16.9.3         ssh > 54029 [TCP CHECKSUM INCORRECT] [PSH, ACK] Seq=353 Ack=337 Win=149 Len=48
     49 11:24:48.757471 172.16.9.3          172.28.10.8        54029 > ssh [ACK] Seq=337 Ack=401 Win=260 Len=0
     50 11:24:48.816885 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=337 Ack=401 Win=260 Len=48

In fact it is so common that, starting with version 1.2, Wireshark disables checksum validation by default. Here is the same trace displayed with validation turned off:

No.     Time            Source                Destination           Info                                                                             
      5 11:24:47.385782 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=1 Ack=1 Win=256 Len=48
      6 11:24:47.385840 172.28.10.8         172.16.9.3         ssh > 54029 [ACK] Seq=1 Ack=49 Win=149 Len=0
      7 11:24:47.387457 172.28.10.8         172.16.9.3         ssh > 54029 [PSH, ACK] Seq=1 Ack=49 Win=149 Len=48
      9 11:24:47.597809 172.16.9.3          172.28.10.8        54029 > ssh [ACK] Seq=49 Ack=49 Win=256 Len=0
     10 11:24:47.641815 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=49 Ack=49 Win=256 Len=48
     11 11:24:47.643448 172.28.10.8         172.16.9.3         ssh > 54029 [PSH, ACK] Seq=49 Ack=97 Win=149 Len=48
     12 11:24:47.848397 172.16.9.3          172.28.10.8        54029 > ssh [ACK] Seq=97 Ack=97 Win=255 Len=0
     27 11:24:48.097808 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=97 Ack=97 Win=255 Len=48
     28 11:24:48.099660 172.28.10.8         172.16.9.3         ssh > 54029 [PSH, ACK] Seq=97 Ack=145 Win=149 Len=48
     33 11:24:48.185694 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=145 Ack=145 Win=255 Len=48
     34 11:24:48.187442 172.28.10.8         172.16.9.3         ssh > 54029 [PSH, ACK] Seq=145 Ack=193 Win=149 Len=48
     35 11:24:48.275724 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=193 Ack=193 Win=255 Len=48
     36 11:24:48.277452 172.28.10.8         172.16.9.3         ssh > 54029 [PSH, ACK] Seq=193 Ack=241 Win=149 Len=48
     42 11:24:48.417796 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=241 Ack=241 Win=255 Len=48
     43 11:24:48.419379 172.28.10.8         172.16.9.3         ssh > 54029 [PSH, ACK] Seq=241 Ack=289 Win=149 Len=48
     44 11:24:48.440737 172.28.10.8         172.16.9.3         ssh > 54029 [PSH, ACK] Seq=289 Ack=289 Win=149 Len=64
     45 11:24:48.451488 172.16.9.3          172.28.10.8        54029 > ssh [ACK] Seq=289 Ack=353 Win=260 Len=0
     47 11:24:48.552728 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=289 Ack=353 Win=260 Len=48
     48 11:24:48.554451 172.28.10.8         172.16.9.3         ssh > 54029 [PSH, ACK] Seq=353 Ack=337 Win=149 Len=48
     49 11:24:48.757471 172.16.9.3          172.28.10.8        54029 > ssh [ACK] Seq=337 Ack=401 Win=260 Len=0
     50 11:24:48.816885 172.16.9.3          172.28.10.8        54029 > ssh [PSH, ACK] Seq=337 Ack=401 Win=260 Len=48

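However, on occasion the TCP segments with checksum errors are ones the capturing host received rather than transmitted, and with checksum validation turned off these errors are masked. The only evidence of a problem is that the receiver never acknowledges the segment and the sender eventually retransmits it. In the trace below, frame 39723 acknowledges everything up to 1572304190, the starting sequence number of frame 39721, and the receiver keeps repeating that ACK until frame 39810 retransmits the segment.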

No      Time        Source                Destination           SEQ        Nxt-SEQ    ACK        Info                                                
  39720 0.000000    172.16.9.3            172.28.10.8           1572302730 1572304190 363482188  [ACK] Len = 1460
  39721 0.000002    172.16.9.3            172.28.10.8           1572304190 1572305650 363482188  [ACK] Len = 1460
  39722 0.000003    172.16.9.3            172.28.10.8           1572305650 1572307110 363482188  [ACK] Len = 1460
  39723 0.000003    172.28.10.8           172.16.9.3            363482320             1572304190 [ACK] Len = 0

  . . . .  LOTS of new data from 172.16.9.3 and duplicate ACKs from 172.28.10.8 . . . . 

  39808 0.000513    172.16.9.3            172.28.10.8           1572367194 1572368654 363482320  [ACK] Len = 1460
  39809 0.000001    172.28.10.8           172.16.9.3            363482320             1572304190 [TCP Dup ACK 39723#43] Len = 0
  39810 0.000001    172.16.9.3            172.28.10.8           1572304190 1572305650 363482320  [TCP Fast Retransmission] [ACK] Len = 1460
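The reasoning applied to the trace above (the receiver keeps acknowledging 1572304190, so the segment starting at that sequence number never arrived intact) can be sketched in Python. This is a simplified, hypothetical helper: it treats any repeated ACK number as a duplicate ACK, whereas Wireshark's duplicate-ACK analysis also looks at things such as the window size and segment length. The threshold of three duplicate ACKs for fast retransmit comes from RFC 5681.

```python
from typing import Dict, Iterable, Optional, Tuple

DUP_ACK_THRESHOLD = 3  # RFC 5681 fast retransmit threshold


def longest_dup_ack_run(acks: Iterable[int]) -> Tuple[Optional[int], int]:
    """Return (ack_number, duplicate_count) for the longest run of repeated ACKs.

    duplicate_count excludes the first ACK of the run, matching the
    "#n" suffix Wireshark appends to "[TCP Dup ACK ...]".
    """
    best_ack, best_dups = None, 0
    prev, run = None, 0
    for ack in acks:
        run = run + 1 if ack == prev else 1
        prev = ack
        if run - 1 > best_dups:
            best_ack, best_dups = ack, run - 1
    return best_ack, best_dups


def unacknowledged_frame(segments: Dict[int, int], ack: int) -> Optional[int]:
    """segments maps frame number -> starting SEQ; return the first frame whose SEQ equals ack."""
    for frame, seq in sorted(segments.items()):
        if seq == ack:
            return frame
    return None


# Values from frames 39720-39723 above; the repeated list stands in for the elided dup ACKs.
segments = {39720: 1572302730, 39721: 1572304190, 39722: 1572305650}
acks = [1572304190] * 44  # original ACK in frame 39723 plus 43 duplicates

ack, dups = longest_dup_ack_run(acks)
if dups >= DUP_ACK_THRESHOLD:
    print(f"frame {unacknowledged_frame(segments, ack)} not acknowledged after {dups} dup ACKs")
```

With the values above this prints `frame 39721 not acknowledged after 43 dup ACKs`, the same frame that turns out to carry the bad checksum.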

Turning checksum validation back on reveals the problem: frame 39721, the segment that was never acknowledged, has an incorrect TCP checksum.

No      Time        Source                Destination           SEQ        Nxt-SEQ    ACK        Info                                                
  39720 0.000000    172.16.9.3            172.28.10.8           1572302730 1572304190 363482188  [ACK] Len = 1460
  39721 0.000002    172.16.9.3            172.28.10.8           1572304190 1572305650 363482188  [TCP CHECKSUM INCORRECT] [ACK] Len = 1460
  39722 0.000003    172.16.9.3            172.28.10.8           1572305650 1572307110 363482188  [ACK] Len = 1460
  39723 0.000003    172.28.10.8           172.16.9.3            363482320             1572304190 [ACK] Len = 0

  . . . .  LOTS of new data from 172.16.9.3 and duplicate ACKs from 172.28.10.8 . . . . 

  39808 0.000513    172.16.9.3            172.28.10.8           1572367194 1572368654 363482320  [ACK] Len = 1460
  39809 0.000001    172.28.10.8           172.16.9.3            363482320             1572304190 [TCP Dup ACK 39723#43] Len = 0
  39810 0.000001    172.16.9.3            172.28.10.8           1572304190 1572305650 363482320  [TCP Fast Retransmission] [ACK] Len = 1460

Random corruption on the wire would most likely make the Ethernet CRC invalid. Most general-purpose Ethernet chips reject such a frame, so the tracing software would never see it. For the frame to appear in the trace at all, its Ethernet CRC must have been valid, which means the corruption was probably not a random transmission error.
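A short illustration of why wire corruption is normally invisible to a trace: the Ethernet frame check sequence is a CRC-32, and Python's `zlib.crc32` uses the same IEEE 802.3 polynomial. A CRC-32 detects every single-bit error, so a bit flipped in transit always causes a mismatch and the adapter drops the frame. If a device instead corrupts the frame in its own memory and then computes a fresh FCS, the bad frame carries a valid CRC and only the TCP checksum can catch it. The frame contents and helper names below are hypothetical.

```python
import zlib


def fcs(frame: bytes) -> int:
    """CRC-32 over the frame; zlib.crc32 uses the IEEE 802.3 polynomial."""
    return zlib.crc32(frame) & 0xFFFFFFFF


def flip_bit(data: bytes, index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    out = bytearray(data)
    out[index] ^= 1 << bit
    return bytes(out)


def adapter_accepts(frame: bytes, received_fcs: int) -> bool:
    """A receiving adapter keeps the frame only if the recomputed CRC matches."""
    return fcs(frame) == received_fcs


frame = bytes(range(256)) * 6          # hypothetical 1536-byte frame
corrupted = flip_bit(frame, 0x580, 4)  # one bit flipped at offset 0x580

# Corrupted on the wire: the FCS is still the one the sender computed.
print(adapter_accepts(corrupted, fcs(frame)))      # False: dropped, never traced

# Corrupted inside a device that then writes a fresh FCS for the bad data.
print(adapter_accepts(corrupted, fcs(corrupted)))  # True: delivered and traced
```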

Since the trace contains both the frame with the invalid TCP checksum (39721) and its retransmission with a valid checksum (39810), we can compare the two frames. I wrote the summary line and packet data of each frame to a separate file and then compared the files with diff. The first 16 bytes (the Ethernet header and the start of the IP header) are identical in the two frames and do not show up in the comparison. The next three lines contain the IP ID and IP header checksum (offset 0x0010), the TCP ACK number (0x0020) and the TCP checksum (0x0030). The IP ID, and therefore the IP checksum, changes with every IP packet. The ACK number may also differ, and if it does, the TCP checksum must differ as well. Note that I replaced the application data in the packets with "AD" (shown as X in the ASCII column), except for the 1 byte that differs at offset 0x580.

$ diff 39721.txt 39810.txt                                                                                                                           
2c2
<   39721 0.000000    172.16.9.3            172.28.10.8           1572304190 1572305650 363482188  [TCP CHECKSUM INCORRECT] [ACK] Len = 1460
---
>   39810 0.000000    172.16.9.3            172.28.10.8           1572304190 1572305650 363482320  [TCP Fast Retransmission] [ACK] Len = 1460
5,7c5,7
< 0010  05 dc 33 93 40 00 3f 06 e2 7c ac 10 09 03 ac 1c   ..3.@.?..|...Z..
< 0020  0a 08 08 01 02 f0 5d b7 75 3e 15 aa 4c 4c 50 10   .x....].u>..LLP.
< 0030  ff ff 1f 24 00 00 AD AD AD AD AD AD AD AD AD AD   ......XXXXXXXXXX
---
> 0010  05 dc 33 9b 40 00 3f 06 e2 74 ac 10 09 03 ac 1c   ..3.@.?..t...Z..
> 0020  0a 08 08 01 02 f0 5d b7 75 3e 15 aa 4c d0 50 10   .x....].u>..L.P.
> 0030  ff ff 1e a0 00 00 AD AD AD AD AD AD AD AD AD AD   ......XXXXXXXXXX
92c92
< 0580  ff AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD   .XXXXXXXXXXXXXXX
---
> 0580  ef AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD   .XXXXXXXXXXXXXXX
$
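The header differences in the diff above can be checked by hand. Only the low-order 16 bits of the ACK number differ (0x4c4c in frame 39721, 0x4cd0 in frame 39810), and the incremental update from RFC 1624, HC' = ~(~HC + ~m + m'), predicts the new checksum from the old one. Applied to the bad frame's TCP checksum of 0x1f24 it gives 0x1ea0, exactly the value in the retransmission, and the same calculation maps the IP checksum 0xe27c to 0xe274 for the IP ID change. So the ACK change accounts for the whole difference in the checksum fields, which is consistent with the sender having computed the TCP checksum over uncorrupted data and the flipped byte at 0x580 being introduced afterwards. This is a minimal sketch; the function names are hypothetical.

```python
def ones_add(a: int, b: int) -> int:
    """Add two 16-bit values with end-around carry (one's complement addition)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def incremental_update(old_cksum: int, old_word: int, new_word: int) -> int:
    """RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') for one changed 16-bit word."""
    s = ones_add(~old_cksum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_add(s, new_word)
    return ~s & 0xFFFF


# Frame 39721 (bad) -> frame 39810 (good), header values from the diff.
print(hex(incremental_update(0x1F24, 0x4C4C, 0x4CD0)))  # TCP, ACK low word: 0x1ea0
print(hex(incremental_update(0xE27C, 0x3393, 0x339B)))  # IP, IP ID: 0xe274
```

The same check holds for the two further examples: 0x8936 becomes 0x882e and 0x7ccf becomes 0x7c4b for their ACK changes.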

Here are two more examples. The offset of the corrupted byte changes, but in each case only 1 byte of application data differs.

$ diff 57848.txt 57968.txt                                                                                                                           
2c2
<   57848 0.000000    172.16.9.3            172.28.10.8           1588348415 1588349875 363498160  [TCP CHECKSUM INCORRECT] [ACK] Len = 1460
---
>   57968 0.000000    172.16.9.3            172.28.10.8           1588348415 1588349875 363498424  [TCP Retransmission] [ACK] Len = 1460
5,7c5,7
< 0010  05 dc 3c 89 40 00 3f 06 d9 86 ac 10 09 03 ac 1c   ..<.@.?......Z..
< 0020  0a 08 08 01 02 f0 5e ac 45 ff 15 aa 8a b0 50 10   .x....^.E.....P.
< 0030  ff ff 89 36 00 00 AD AD AD AD AD AD AD AD AD AD   ......XXXXXXXXXX
---
> 0010  05 dc 3c b3 40 00 3f 06 d9 5c ac 10 09 03 ac 1c   ..<.@.?..\...Z..
> 0020  0a 08 08 01 02 f0 5e ac 45 ff 15 aa 8b b8 50 10   .x....^.E.....P.
> 0030  ff ff 88 2e 00 00 AD AD AD AD AD AD AD AD AD AD   ......XXXXXXXXXX
84c84
< 0500  7e AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD   .XXXXXXXXXXXXXXX
---
> 0500  76 AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD   .XXXXXXXXXXXXXXX
$ 

$ diff 84483.txt 84572.txt                                                                                                                           
2c2
<   84483 0.000000    172.16.9.3            172.28.10.8           1612720136 1612721596 363522844  [TCP CHECKSUM INCORRECT] [ACK] Len = 1460
---
>   84572 0.000519    172.16.9.3            172.28.10.8           1612720136 1612721596 363522976  [TCP Fast Retransmission] [ACK] Len = 1460
5,7c5,7
< 0010  05 dc 49 bd 40 00 3f 06 cc 52 ac 10 09 03 ac 1c   ..I.@.?..R...Z..
< 0020  0a 08 08 01 02 f0 60 20 28 08 15 aa eb 1c 50 10   .x....` (.....P.
< 0030  ff ff 7c cf 00 00 AD AD AD AD AD AD AD AD AD AD   ......XXXXXXXXXX
---
> 0010  05 dc 49 c6 40 00 3f 06 cc 49 ac 10 09 03 ac 1c   ..I.@.?..I...Z..
> 0020  0a 08 08 01 02 f0 60 20 28 08 15 aa eb a0 50 10   .x....` (.....P.
> 0030  ff ff 7c 4b 00 00 AD AD AD AD AD AD AD AD AD AD   ......XXXXXXXXXX
68c68
< 0400  3f AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD   .XXXXXXXXXXXXXXX
---
> 0400  2f AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD   .XXXXXXXXXXXXXXX

There is a distinct pattern in the differences. In all three cases a single bit was changed from 0 to 1, and it is always either the low-order bit of the upper nibble or the high-order bit of the lower nibble. This points to failing memory in some device that copies the frame and then recalculates the Ethernet CRC.

BAD               GOOD
FF (1111 1111)    EF (1110 1111)
7E (0111 1110)    76 (0111 0110)
3F (0011 1111)    2F (0010 1111)
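XOR-ing each bad byte with its good counterpart isolates the flipped bit. In this minimal sketch (the function name is hypothetical) bits are numbered 0, the least significant, to 7, so bit 4 is the low-order bit of the upper nibble and bit 3 is the high-order bit of the lower nibble.

```python
from typing import List, Tuple


def flipped_bits(bad: int, good: int) -> List[Tuple[int, str]]:
    """List (bit_number, direction) for every bit that differs between two bytes."""
    diff = bad ^ good
    return [
        (bit, "0->1" if bad >> bit & 1 else "1->0")
        for bit in range(8)
        if diff >> bit & 1
    ]


for bad, good in [(0xFF, 0xEF), (0x7E, 0x76), (0x3F, 0x2F)]:
    print(f"{bad:02X} vs {good:02X}: {flipped_bits(bad, good)}")
```

Each pair yields exactly one entry, bit 4, 3 and 4 respectively, all in the 0->1 direction.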

Unfortunately, I never found out what the device was or what exactly the problem was.

This page was last modified on 13-09-22
Send comments and suggestions to noah@noahdavids.org