I decided to run some diagnostics from my laptop in the area during a busy rush hour. I expected to see that the network was just hopelessly oversubscribed, with high packet loss. This is a very busy commuter train station, and there are probably tens of thousands of 3G smart-phones in the vicinity at rush hour. There's also lots of mirrored glass and steel in the high-rise buildings above - basically it's the worst "urban canyon" radio environment imaginable.
However, some simple ping tests from my laptop broadband card showed almost no packet loss. What these simple tests revealed instead was a problem I previously thought carriers knew to avoid and only really affected consumer devices: "buffer bloat".
|Click to embiggen|
The minimum round-trip time observed was 92 ms, which is similar to what I see on 3G networks in uncongested locations. However, ping times varied wildly, and the worst round trip time was over 1.7 seconds.
There was no other network activity on my laptop at all, so this insane result seems to have come from the 3G network itself. It looks to me as though Sprint has massively over-sized the buffers on their wireless towers. There is really no excuse for this at all given the recent attention buffer bloat has received in the networking community. I can't think of any circumstance beyond perhaps satellite communications where holding on to an IP packet for 1.5 seconds is at all reasonable.
Now, I suppose the problem could be caused by upstream buffering on the laptop. But as I said there was no other activity, confirmed by the wireless card's byte counters. Even if a flow-control mechanism in EVDO was telling my laptop not to transmit, or even telling it to re-transmit previous data, there should not be 1.5 seconds of buffering in the card or its drivers. An IP packet 1.5 seconds old should just be dropped.
I plan on doing more testing in the near future, but I have to ask the mobile networking experts out there: am I totally mis-interpreting the data from this admittedly simple test? Is this buffering something inherent in CDMA technology? Can anybody think of a test to see if it is the OS or driver buffers holding on to the packets for so long (I don' think Wireshark would work for this). Obviously I don't have access to hardware-based radio testing equipment, so software tests are all I can really do.
Assuming the problem is actually the downstream buffers in the tower, Sprint really needs to adjust their buffer sizes and start dropping some packets to make their 3G network usable again.
Technical details of equipment used in the test: Dell D430 laptop, Windows 7 current with all patches and service packs, Dell-Novatel Wireless 5720 Sprint Mobile Broadband (EVDO-Rev A) card with firmware version 145 and driver version 220.127.116.11.