
After the deployment of several remote sites, certain management and application capabilities stopped working. The common denominator between the sites was the WAN infrastructure (owned by the client).

In addition, the observed symptoms were exactly the same at every affected site. During the troubleshooting process, the following symptoms were observed:

- Inability to access the firewall management GUI using HTTPS.
- Inability to connect several clients to the servers located in the DC.
- Inability to pull the GPO policy from the domain.
- Failed file transfers when copying big files.
- Packet loss between the DC and the internal segments of the remote sites.

On the other hand, protocols like SSH and RDP worked properly, and the most confusing part was that one of the sites connected to this infrastructure didn't have any of these symptoms.

The troubleshooting process was based on the OSI model, starting from the physical layer up to the application layer. As the other layers were cleared of issues, my main suspicions focused on layers 3-4, which are responsible for the IP and transport protocols (TCP/UDP). I used a network analyzer to measure ping response, delay, jitter, and packet loss between the management environment and the remote sites. I could see that after a while there was significant packet loss that began to accumulate and grow, which clearly explained why we had issues with TCP-based protocols.

My next step was to find the bottleneck and to measure the MTU and MSS sizes between the sites. MTU stands for 'Maximum Transmission Unit' and is the maximum size of an IP packet that can be handled by a layer-3 device. TCP-MSS stands for 'Maximum Segment Size' and is the maximum size of the TCP payload inside a single IP packet.

If the communication network has a lower MTU value but the client endpoint is not aware of it, the client will still send its MSS value of 1460 bytes to the server. The server will therefore think that the client can receive 1500 bytes (1460 bytes MSS + 20 bytes IP header + 20 bytes TCP header = 1500 bytes) and will send packets of 1500 bytes. If the MTU is lower somewhere in the path, such a packet will be fragmented; if the DF (don't fragment) bit is set, the packet will instead be dropped, which can cause delays or slowness in the network.

I began testing the infrastructure in order to find the optimal MTU and TCP-MSS values by doing a non-fragmenting ping test (pings with the DF bit set). At first, I tested the infrastructure with the default MTU size of 1500.
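The MTU/MSS header arithmetic described above can be sketched in a few lines. This is only an illustration of the byte accounting; the helper names are mine, not from the original writeup, and the header sizes assume IPv4 with no IP or TCP options:

```python
# Byte accounting for the MTU/MSS relationship (IPv4, no options assumed).
IP_HEADER = 20      # bytes
TCP_HEADER = 20     # bytes
ICMP_OVERHEAD = 28  # 20-byte IP header + 8-byte ICMP header

def mss_for_mtu(mtu: int) -> int:
    """Largest TCP-MSS that fits in one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER

def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload size that fits without fragmentation,
    i.e. the size to use for a DF-bit ping test at this MTU."""
    return mtu - ICMP_OVERHEAD

print(mss_for_mtu(1500))           # 1460, the default MSS advertised for MTU 1500
print(ping_payload_for_mtu(1500))  # 1472, the ping payload size for MTU 1500
```

Sweeping the payload size downward in a DF-bit ping test until replies succeed reveals the effective path MTU; `ping_payload_for_mtu` gives the starting point for each candidate MTU.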

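The analyzer measurements mentioned earlier (delay, jitter, packet loss) can also be approximated from plain ping RTT samples. A minimal sketch under stated assumptions: RTTs are in milliseconds, `None` marks a lost probe, and jitter is taken as the mean absolute difference between consecutive successful RTTs — the function name and that jitter definition are my choices, not the analyzer's documented method:

```python
from statistics import mean

def loss_and_jitter(rtts):
    """rtts: RTT samples in ms; None marks a lost probe.
    Returns (loss percentage, mean absolute difference between
    consecutive successful RTTs as a simple jitter estimate)."""
    lost = sum(1 for r in rtts if r is None)
    loss_pct = 100.0 * lost / len(rtts)
    ok = [r for r in rtts if r is not None]
    diffs = [abs(b - a) for a, b in zip(ok, ok[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return loss_pct, jitter

# Example: 1 of 5 probes lost, RTTs drifting by 2 ms between probes.
print(loss_and_jitter([10.0, 12.0, None, 14.0, 16.0]))  # (20.0, 2.0)
```

Accumulating loss over a long run of such samples is exactly the pattern that points at TCP trouble: a loss percentage that grows over time hits large TCP transfers far harder than short interactive sessions like SSH or RDP.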