Interesting Traces - Server responds to connection request with old sequence numbers

Blue Bar separator


The client is reporting random delays of up to a "few minutes" in responses that typically are very fast. This started with the installation of new firmware on a load balancing appliance that sits between the clients and the server.

Trace 1 starts out as a typical connection, SYN,SYN-ACK,ACK (frames 2595, 2596, and 2603), followed by data (frames 2607, 2678 and 2680), followed by FIN, ACK, FIN-ACK, and ACK (frames, 2711, 2722, 2724 and 2726). That is a lot of overhead for 182 bytes of data but that is another story.

Host 172.16.1.14 then sends a new SYN (frame 3227) to start a new connection but it uses the same local port number as the previous connection, 1287. I know it's a new connection because the sequence number of the new SYN is 3272303794 while the last sequence number from the previous connection was 3566092853. The server however, ACKs the old sequence number using its sequence number from the previous connection 2132478809. It obviously thinks the old connection is still active.

Host 172.16.1.14 tries again (frame 4898) with the same result (4901) and then one more time (frame 8470) but this time the server sends a SYN-ACK with a new sequence number (101055931) and ACKs the client's new sequence number.

No.       Time         Source       Destination           Protocol Info
 2595   383.186090   172.16.1.14    172.16.1.110  1287 > 38094 [SYN] Seq=3566092683 Win=65535 Len=0 MSS=1460
 2596   383.186648   172.16.1.110   172.16.1.14   8094 > 1287 [SYN, ACK] Seq=2132478615 Ack=3566092684 Win=8192 Len=0 MSS=1460
 2603   383.229113   172.16.1.14    172.16.1.110  1287 > 38094 [ACK] Seq=3566092684 Ack=2132478616 Win=65535 Len=0
 2607   383.232835   172.16.1.110   172.16.1.14   38094 > 1287 [PSH, ACK] Seq=2132478616 Ack=3566092853 Win=8192 Len=30
 2678   383.426600   172.16.1.14    172.16.1.110  1287 > 38094 [ACK] Seq=3566092853 Ack=2132478646 Win=65505 Len=0
 2680   383.426698   172.16.1.110   172.16.1.14   38094 > 1287 [PSH, ACK] Seq=2132478646 Ack=3566092853 Win=8192 Len=162
 2711   383.486675   172.16.1.110   172.16.1.14   38094 > 1287 [FIN, ACK] Seq=2132478808 Ack=3566092853 Win=8192 Len=0
 2722   383.529177   172.16.1.14    172.16.1.110  1287 > 38094 [ACK] Seq=3566092853 Ack=2132478809 Win=65343 Len=0
 2724   383.529765   172.16.1.14    172.16.1.110  1287 > 38094 [FIN, ACK] Seq=3566092853 Ack=2132478809 Win=65343 Len=0
 2726   383.529912   172.16.1.110   172.16.1.14   38094 > 1287 [ACK] Seq=2132478809 Ack=3566092854 Win=8192 Len=0
 
 3227   437.570535   172.16.1.14    172.16.1.110  [TCP Port numbers reused] 1287 > 38094 [SYN] Seq=3272303794 Win=65535 Len=0 MSS=1460
 3228   54.040850    172.16.1.110   172.16.1.14   [TCP ACKed lost segment] 38094 > 1287 [ACK] Seq=2132478809 Ack=3566092854 Win=8192 Len=0
 
 4898   440.436689   172.16.1.14    172.16.1.110  1287 > 38094 [SYN] Seq=3272303794 Win=65535 Len=0 MSS=1460
 4901   440.436866   172.16.1.110   172.16.1.14   [TCP Dup ACK 283228#1] 38094 > 1287 [ACK] Seq=2132478809 Ack=3566092854 Win=8192 Len=0
  
 8470   446.463857   172.16.1.14    172.16.1.110  1287 > 38094 [SYN] Seq=3272303794 Win=65535 Len=0 MSS=1460
 8471   446.464413   172.16.1.110   172.16.1.14   38094 > 1287 [SYN, ACK] Seq=101055931 Ack=3272303795 Win=8192 Len=0 MSS=1460
 8474   446.474257   172.16.1.14    172.16.1.110  1287 > 38094 [ACK] Seq=3272303795 Ack=101055932 Win=65535 Len=0
Trace #1

Trace 2 gives a hint as to what the server is doing. It is the same trace but with the time reset at the point where the client sends the FIN-ACK and completes closing the original connection. We can see that the SYN request in frame 3227 occurs 54 seconds after the previous connection was closed and the second request is sent 56.9 seconds after the close. The third request - the one that worked was sent almost 63 seconds after the close.

No.       Time         Source       Destination           Protocol Info
 2595   383.186090   172.16.1.14    172.16.1.110  1287 > 38094 [SYN] Seq=3566092683 Win=65535 Len=0 MSS=1460
 2596   383.186648   172.16.1.110   172.16.1.14   38094 > 1287 [SYN, ACK] Seq=2132478615 Ack=3566092684 Win=8192 Len=0 MSS=1460
 2603   383.229113   172.16.1.14    172.16.1.110  1287 > 38094 [ACK] Seq=3566092684 Ack=2132478616 Win=65535 Len=0
 2604   383.229148   172.16.1.14    172.16.1.110  1287 > 38094 [PSH, ACK] Seq=3566092684 Ack=2132478616 Win=65535 Len=169
 2607   383.232835   172.16.1.110   172.16.1.14   38094 > 1287 [PSH, ACK] Seq=2132478616 Ack=3566092853 Win=8192 Len=30
 2678   383.426600   172.16.1.14    172.16.1.110  1287 > 38094 [ACK] Seq=3566092853 Ack=2132478646 Win=65505 Len=0
 2680   383.426698   172.16.1.110   172.16.1.14   38094 > 1287 [PSH, ACK] Seq=2132478646 Ack=3566092853 Win=8192 Len=162
 2711   383.486675   172.16.1.110   172.16.1.14   38094 > 1287 [FIN, ACK] Seq=2132478808 Ack=3566092853 Win=8192 Len=0
 2722   383.529177   172.16.1.14    172.16.1.110  1287 > 38094 [ACK] Seq=3566092853 Ack=2132478809 Win=65343 Len=0
 2724   *REF*        172.16.1.14    172.16.1.110  1287 > 38094 [FIN, ACK] Seq=3566092853 Ack=2132478809 Win=65343 Len=0
 2726   0.000147     172.16.1.110   172.16.1.14   38094 > 1287 [ACK] Seq=2132478809 Ack=3566092854 Win=8192 Len=0
 3227   54.040770    172.16.1.14    172.16.1.110  [TCP Port numbers reused] 1287 > 38094 [SYN] Seq=3272303794 Win=65535 Len=0 MSS=1460
 3228   54.040850    172.16.1.110   172.16.1.14   [TCP ACKed lost segment] 38094 > 1287 [ACK] Seq=2132478809 Ack=3566092854 Win=8192 Len=0
 4898   56.906924    172.16.1.14    172.16.1.110  1287 > 38094 [SYN] Seq=3272303794 Win=65535 Len=0 MSS=1460
 4901   56.907101    172.16.1.110   172.16.1.14   [TCP Dup ACK 283228#1] 38094 > 1287 [ACK] Seq=2132478809 Ack=3566092854 Win=8192 Len=0
 8470   62.934092    172.16.1.14    172.16.1.110  1287 > 38094 [SYN] Seq=3272303794 Win=65535 Len=0 MSS=1460
 8471   62.934648    172.16.1.110   172.16.1.14   38094 > 1287 [SYN, ACK] Seq=101055931 Ack=3272303795 Win=8192 Len=0 MSS=1460
 8474   62.944492    172.16.1.14    172.16.1.110  1287 > 38094 [ACK] Seq=3272303795 Ack=101055932 Win=65535 Len=0
Trace #2

When a connection is closed, the socket that sent the initial FIN enters the TIME_WAIT state for 2*MSL (mean segment lifetime). In this case the server has set the MSL value to 30 so it will take 60 seconds for the server to remove the socket from its state tables and allow the client to reuse what amounts to the same connection (same local and remote IP addresses and port numbers).

What the client effectively saw was a delay of 9 seconds for the request to start. The worst case scenario would be to try to start another transaction immediately after the previous one finishes. In that case the user would see the full 1 minute delay caused by the MSL timer and also some addition delay because it is doubtful that the client's retransmission timer would cause the SYN segment to be retransmitted at exactly the right time.

The only solutions are to reduce the server's MSL timer or to configure the appliance not to try to reuse the local port numbers. In this case the appliance was reconfigured.

Blue Bar separator
This page was last modified on 09-06-17
mailbox Send comments and suggestions
to ndav1@cox.net