Interesting Traces - Ignoring Destination unreachable fragmentation needed messages

The site has many network-attached printers on multiple subnets. A Windows 2003 server has these printers configured to use the LPD protocol. About once a week all the printers stop working until the server is rebooted; they then work fine for about a week. A trace taken with Network Monitor (but displayed with Ethereal) during a period when the printers were not working shows this:
No.     Time        Source             Destination     Protocol Info
     95 1.531250    172.16.5.61        10.10.3.40      TCP      1065 > printer [SYN] Seq=725037403 Ack=0 Win=65535 Len=0 MSS=1460
    102 0.031250    10.10.3.40         172.16.5.61     TCP      printer > 1065 [SYN, ACK] Seq=1598008283 Ack=725037404 Win=8760 Len=0 MSS=1460
    103 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725037404 Ack=1598008284 Win=65535 Len=0
    104 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [PSH, ACK] Seq=725037404 Ack=1598008284 Win=65535 Len=7
    108 0.015625    10.10.3.40         172.16.5.61     TCP      printer > 1065 [ACK] Seq=1598008284 Ack=725037411 Win=8760 Len=0
    109 0.015625    10.10.3.40         172.16.5.61     TCP      printer > 1065 [PSH, ACK] Seq=1598008284 Ack=725037411 Win=8760 Len=1
    110 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [PSH, ACK] Seq=725037411 Ack=1598008285 Win=65534 Len=17
    111 0.015625    10.10.3.40         172.16.5.61     TCP      printer > 1065 [PSH, ACK] Seq=1598008285 Ack=725037428 Win=8760 Len=1
    112 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [PSH, ACK] Seq=725037428 Ack=1598008286 Win=65533 Len=91
    113 0.031250    10.10.3.40         172.16.5.61     TCP      printer > 1065 [PSH, ACK] Seq=1598008286 Ack=725037519 Win=8760 Len=1
    114 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [PSH, ACK] Seq=725037519 Ack=1598008287 Win=65532 Len=20
    115 0.015625    10.10.3.40         172.16.5.61     TCP      printer > 1065 [PSH, ACK] Seq=1598008287 Ack=725037539 Win=8760 Len=1
    116 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725037539 Ack=1598008288 Win=65531 Len=1460
    117 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725038999 Ack=1598008288 Win=65531 Len=1460
    118 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725040459 Ack=1598008288 Win=65531 Len=1460
    119 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725041919 Ack=1598008288 Win=65531 Len=1460
    120 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725043379 Ack=1598008288 Win=65531 Len=1460
    121 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725044839 Ack=1598008288 Win=65531 Len=1460
    200 1.390625    172.16.5.61        10.10.3.40      TCP      [TCP Retransmission] 1065 > printer [ACK] Seq=725037539 Ack=1598008288 Win=65531 Len=1460
    233 0.562500    10.10.3.40         172.16.5.61     TCP      [TCP Keep-Alive] printer > 1065 [PSH, ACK] Seq=1598008287 Ack=725037539 Win=8760 Len=1
    234 0.000000    172.16.5.61        10.10.3.40      TCP      [TCP Keep-Alive ACK] 1065 > printer [ACK] Seq=725038999 Ack=1598008288 Win=65531 Len=0
    426 0.218750    172.16.5.61        10.10.3.40      TCP      [TCP Retransmission] 1065 > printer [ACK] Seq=725037539 Ack=1598008288 Win=65531 Len=1460
    784 5.250000    172.16.5.61        10.10.3.40      TCP      [TCP Retransmission] 1065 > printer [ACK] Seq=725037539 Ack=1598008288 Win=65531 Len=1460
The connection starts normally and things appear to be going OK through frame 121. From frame 200 on, the server repeatedly retransmits the segment with sequence number 725037539. If the trace continued to the end you would see retransmissions until the server killed the connection.
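On a longer trace the repeated segment is harder to spot by eye. As a minimal sketch, assuming the summary lines have been exported as text in the column layout shown above, the following counts how often one host sends each data-carrying sequence number. The function name `repeated_segments` is made up for this example and is not part of Ethereal.

```python
import re
from collections import Counter

# Pulls Seq and Len out of the Info column, e.g. "Seq=725037539 ... Len=1460".
SEQ_LEN = re.compile(r"Seq=(\d+) .*Len=(\d+)")

def repeated_segments(summary_lines, source_ip):
    """Return {sequence number: times sent} for data segments sent more than once."""
    counts = Counter()
    for line in summary_lines:
        fields = line.split()
        # Columns: No., Time, Source, Destination, Protocol, Info...
        if len(fields) < 5 or fields[2] != source_ip or fields[4] != "TCP":
            continue
        match = SEQ_LEN.search(line)
        # Skip pure ACKs and SYNs (Len=0); only data segments can be retransmitted data.
        if match and int(match.group(2)) > 0:
            counts[int(match.group(1))] += 1
    return {seq: n for seq, n in counts.items() if n > 1}
```

Run over the filtered trace above with source 172.16.5.61, this should report only sequence number 725037539, sent four times (frames 116, 200, 426 and 784).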

The simplest conclusion is that there are problems in the network, but then why does a reboot solve the problem? I might accept, once, that the network problems happened to clear up just as the server was rebooted. But the problems never go away until the server is rebooted, and other servers have no trouble communicating with the printers.

The above trace was filtered to show only the connection between printer and server. If you expand the filter to include ICMP messages we begin to get an inkling of the problem. Note the "Destination unreachable" messages.

No.     Time        Source             Destination     Protocol Info
     95 1.531250    172.16.5.61        10.10.3.40      TCP      1065 > printer [SYN] Seq=725037403 Ack=0 Win=65535 Len=0 MSS=1460
    102 0.031250    10.10.3.40         172.16.5.61     TCP      printer > 1065 [SYN, ACK] Seq=1598008283 Ack=725037404 Win=8760 Len=0 MSS=1460
    103 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725037404 Ack=1598008284 Win=65535 Len=0
    104 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [PSH, ACK] Seq=725037404 Ack=1598008284 Win=65535 Len=7
    108 0.015625    10.10.3.40         172.16.5.61     TCP      printer > 1065 [ACK] Seq=1598008284 Ack=725037411 Win=8760 Len=0
    109 0.015625    10.10.3.40         172.16.5.61     TCP      printer > 1065 [PSH, ACK] Seq=1598008284 Ack=725037411 Win=8760 Len=1
    110 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [PSH, ACK] Seq=725037411 Ack=1598008285 Win=65534 Len=17
    111 0.015625    10.10.3.40         172.16.5.61     TCP      printer > 1065 [PSH, ACK] Seq=1598008285 Ack=725037428 Win=8760 Len=1
    112 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [PSH, ACK] Seq=725037428 Ack=1598008286 Win=65533 Len=91
    113 0.031250    10.10.3.40         172.16.5.61     TCP      printer > 1065 [PSH, ACK] Seq=1598008286 Ack=725037519 Win=8760 Len=1
    114 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [PSH, ACK] Seq=725037519 Ack=1598008287 Win=65532 Len=20
    115 0.015625    10.10.3.40         172.16.5.61     TCP      printer > 1065 [PSH, ACK] Seq=1598008287 Ack=725037539 Win=8760 Len=1
    116 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725037539 Ack=1598008288 Win=65531 Len=1460
    117 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725038999 Ack=1598008288 Win=65531 Len=1460
    118 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725040459 Ack=1598008288 Win=65531 Len=1460
    119 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725041919 Ack=1598008288 Win=65531 Len=1460
    120 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725043379 Ack=1598008288 Win=65531 Len=1460
    121 0.000000    172.16.5.61        10.10.3.40      TCP      1065 > printer [ACK] Seq=725044839 Ack=1598008288 Win=65531 Len=1460
    122 0.015625    192.168.42.49      172.16.5.61     ICMP     Destination unreachable
    200 1.390625    172.16.5.61        10.10.3.40      TCP      [TCP Retransmission] 1065 > printer [ACK] Seq=725037539 Ack=1598008288 Win=65531 Len=1460
    201 0.000000    192.168.42.49      172.16.5.61     ICMP     Destination unreachable
    233 0.562500    10.10.3.40         172.16.5.61     TCP      [TCP Keep-Alive] printer > 1065 [PSH, ACK] Seq=1598008287 Ack=725037539 Win=8760 Len=1
    234 0.000000    172.16.5.61        10.10.3.40      TCP      [TCP Keep-Alive ACK] 1065 > printer [ACK] Seq=725038999 Ack=1598008288 Win=65531 Len=0
    320 1.187500    192.168.42.49      172.16.5.61     ICMP     Destination unreachable
    343 0.328125    192.168.42.49      172.16.5.61     ICMP     Destination unreachable
    376 0.328125    192.168.42.49      172.16.5.61     ICMP     Destination unreachable
    426 0.218750    172.16.5.61        10.10.3.40      TCP      [TCP Retransmission] 1065 > printer [ACK] Seq=725037539 Ack=1598008288 Win=65531 Len=1460
    428 0.000000    192.168.42.49      172.16.5.61     ICMP     Destination unreachable
    784 5.250000    172.16.5.61        10.10.3.40      TCP      [TCP Retransmission] 1065 > printer [ACK] Seq=725037539 Ack=1598008288 Win=65531 Len=1460
The sequence number in the TCP section embedded in the first ICMP Destination unreachable message (frame 122) is 725037539, so we know the message refers to frame 116. The message indicates that the next-hop MTU is only 1443 bytes, not 1500.
Frame 122 (590 bytes on wire, 590 bytes captured)
    Arrival Time: Nov 19, 2005 18:34:07.020250000
    Time delta from previous packet: 0.015625000 seconds
    Time since reference or first frame: 3.406250000 seconds
    Frame Number: 122
    Packet Length: 590 bytes
    Capture Length: 590 bytes
Ethernet II, Src: XX:XX:XX:04:9d:00, Dst: XX:XX:XX:01:31:16
    Destination: XX:XX:XX:01:31:16 (XX:XX:XX:01:31:16)
    Source: XX:XX:XX:04:9d:00 (XX:XX:XX:04:9d:00)
    Type: IP (0x0800)
Internet Protocol, Src Addr: 192.168.42.49 (192.168.42.49), Dst Addr: 172.16.5.61 (172.16.5.61)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 576
    Identification: 0xf1a7 (61863)
    Flags: 0x00
        0... = Reserved bit: Not set
        .0.. = Don't fragment: Not set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 199
    Protocol: ICMP (0x01)
    Header checksum: 0x18e6 (correct)
    Source: 192.168.42.49 (192.168.42.49)
    Destination: 172.16.5.61 (172.16.5.61)
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 4 (Fragmentation needed)
    Checksum: 0x87a9 (correct)
    MTU of next hop: 1443
    Internet Protocol, Src Addr: 172.16.5.61 (172.16.5.61), Dst Addr: 10.10.3.40 (10.10.3.40)
        Version: 4
        Header length: 20 bytes
        Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
            0000 00.. = Differentiated Services Codepoint: Default (0x00)
            .... ..0. = ECN-Capable Transport (ECT): 0
            .... ...0 = ECN-CE: 0
        Total Length: 1500
        Identification: 0x9c3d (39997)
        Flags: 0x04 (Don't Fragment)
            0... = Reserved bit: Not set
            .1.. = Don't fragment: Set
            ..0. = More fragments: Not set
        Fragment offset: 0
        Time to live: 126
        Protocol: TCP (0x06)
        Header checksum: 0x3fbc (correct)
        Source: 172.16.5.61 (172.16.5.61)
        Destination: 10.10.3.40 (10.10.3.40)
    Transmission Control Protocol, Src Port: 1065 (1065), Dst Port: printer (515), Seq: 725037539, Ack: 1598008288
        Source port: 1065 (1065)
        Destination port: printer (515)
        Sequence number: 725037539    (relative sequence number)
        Acknowledgement number: 1598008288    (relative ack number)
        Header length: 20 bytes
        Flags: 0x0010 (ACK)
            0... .... = Congestion Window Reduced (CWR): Not set
            .0.. .... = ECN-Echo: Not set
            ..0. .... = Urgent: Not set
            ...1 .... = Acknowledgment: Set
            .... 0... = Push: Not set
            .... .0.. = Reset: Not set
            .... ..0. = Syn: Not set
            .... ...0 = Fin: Not set
        Window size: 65531
        Checksum: 0xdf18 (incorrect, should be 0x5230)
    Line Printer Daemon Protocol
    Data (508 bytes)
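The fields that matter in the decode above can also be pulled out of the raw bytes. The following is a rough sketch, assuming `icmp` holds the ICMP portion of the frame (everything after the outer IP header) and that the message uses the RFC 1191 layout, with the next-hop MTU in the low 16 bits of the second header word. The function name `parse_frag_needed` is invented for this example.

```python
import struct

def parse_frag_needed(icmp: bytes):
    """Return (next_hop_mtu, src_port, dst_port, tcp_seq) for an ICMP
    type 3 code 4 message that quotes a TCP packet, otherwise None."""
    if len(icmp) < 8:
        return None
    # ICMP header: type, code, checksum, unused, next-hop MTU (RFC 1191).
    icmp_type, code, _checksum, _unused, next_hop_mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (3, 4):
        return None
    inner = icmp[8:]                      # quoted IP header + start of its payload
    if len(inner) < 20:
        return None
    ip_header_len = (inner[0] & 0x0F) * 4
    # Byte 9 of the IP header is the protocol; 6 means TCP.
    if inner[9] != 6 or len(inner) < ip_header_len + 8:
        return None
    # First 8 bytes of TCP: source port, destination port, sequence number.
    src_port, dst_port, seq = struct.unpack("!HHI", inner[ip_header_len:ip_header_len + 8])
    return next_hop_mtu, src_port, dst_port, seq
```

Matching the returned ports and sequence number against the TCP summary lines is what ties the ICMP message back to a specific frame.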
So why isn't the server changing the message length? Note that the retransmitted packet still has a TCP length of 1460 bytes.

When the Windows 2003 registry value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnablePMTUDiscovery is set to 0, Windows will ignore these kinds of messages, and packets going to a remote subnet will have a maximum length of 576 bytes. When the value is 1, Windows will use a maximum TCP segment size of 1460 bytes (a 1500-byte IP packet) and rely on Destination unreachable fragmentation needed messages to tell it what the maximum size should be. We know that the value is set to 1 because the packets are longer than 576 bytes.
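To make the expected behavior concrete, here is a simplified model of the segment size a sender should settle on. It is only a sketch of the rules described above, not Windows' actual implementation. The function name and parameters are invented for this example, and it assumes 20-byte IP and TCP headers with no options, as in frame 122.

```python
def expected_segment_size(enable_pmtud, negotiated_mss, next_hop_mtu=None, header_bytes=40):
    """TCP payload size a well-behaved sender should use toward a remote subnet.

    enable_pmtud   -- models the EnablePMTUDiscovery setting (True for 1, False for 0)
    negotiated_mss -- MSS agreed in the SYN exchange (1460 in the trace)
    next_hop_mtu   -- MTU from an ICMP fragmentation needed message, if one arrived
    header_bytes   -- IP + TCP header bytes (assumed 20 + 20, no options)
    """
    if not enable_pmtud:
        # Discovery off: ICMP hints are ignored and packets are capped at 576 bytes.
        return min(negotiated_mss, 576 - header_bytes)
    if next_hop_mtu is None:
        # Discovery on, no complaint from the path yet: use the negotiated MSS.
        return negotiated_mss
    # Discovery on and a router reported a smaller MTU: shrink to fit.
    return min(negotiated_mss, next_hop_mtu - header_bytes)
```

With the values from this trace, `expected_segment_size(True, 1460, 1443)` gives 1403, so after frame 122 the retransmission of sequence number 725037539 should have carried at most 1403 bytes rather than 1460.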

So what is going on? Why is Windows not reducing the packet size? It's a bug introduced into Windows 2003 by security update MS05-019 and by Service Pack 1 (SP1). Installing either one will give you this problem. The good news is that there is a fix. See http://support.microsoft.com/default.aspx?scid=kb;en-us;898060 for details.

This page was last modified on 05-12-22
Send comments and suggestions to ndav1@cox.net