
Bufferbloat on a 3G network

I've been experiencing terrible network performance lately on Sprint's 3G (EVDO) network in downtown Chicago. I have both a Sprint Mobile Broadband card for my laptop and an iPhone 4S on Sprint's network. Sprint performance used to be fantastic compared with AT&T and Verizon mobile data networks in Chicago, but the introduction of the iPhone on Sprint seems to have caused some capacity problems. The worst spot I've come across seems to be the Ogilvie train station.

I decided to run some diagnostics from my laptop in the area during a busy rush hour. I expected to see that the network was just hopelessly oversubscribed, with high packet loss. This is a very busy commuter train station, and there are probably tens of thousands of 3G smartphones in the vicinity at rush hour. There's also lots of mirrored glass and steel in the high-rise buildings above; it's basically the worst "urban canyon" radio environment imaginable.

However, some simple ping tests from my laptop broadband card showed almost no packet loss. What these simple tests revealed instead was a problem I previously thought carriers knew to avoid, one that only really affected consumer devices: bufferbloat.

[Chart: ping round-trip times over the course of the test]
The data in this chart was collected on Friday, 3 February 2012, starting at 5:03 PM. I ran a simple ping test from my laptop to the first-hop router on the 3G connection, so I was essentially testing only the "wireless" part of the network. I presume that Sprint has fiber-based backhaul from their towers in downtown Chicago with plenty of bandwidth.
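For anyone who wants to reproduce this, here's a minimal sketch of the kind of test I ran, in Python. It assumes Windows-style ping output, and the gateway address is a placeholder; substitute whatever first hop tracert shows on your own connection.

```python
# Ping the first-hop router once per second and log round-trip times.
import re
import subprocess
import time

GATEWAY = "10.0.0.1"  # hypothetical first-hop address; use your own

def ping_once(host):
    """Send one ICMP echo (Windows 'ping -n 1'); return RTT in ms, or None on loss."""
    out = subprocess.run(["ping", "-n", "1", host],
                         capture_output=True, text=True).stdout
    m = re.search(r"time[=<](\d+)ms", out)
    return int(m.group(1)) if m else None

rtts = []
for _ in range(300):  # five minutes of one-second samples
    rtt = ping_once(GATEWAY)
    rtts.append(rtt)
    print(rtt)
    time.sleep(1)

valid = [r for r in rtts if r is not None]
loss = 100.0 * (len(rtts) - len(valid)) / len(rtts)
print(f"min={min(valid)} ms  max={max(valid)} ms  loss={loss:.1f}%")
```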

The minimum round-trip time observed was 92 ms, which is similar to what I see on 3G networks in uncongested locations. However, ping times varied wildly, and the worst round-trip time was over 1.7 seconds. An RTT this long is disastrous from a user-experience standpoint. Connecting to an HTTPS-enabled site, for example, costs roughly four round trips (one for the TCP handshake, two for the TLS handshake, and one for the HTTP request itself), so it takes nearly 7 seconds before the first bit of the HTML web page is transferred to the client.
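To spell out that arithmetic (assuming a classic full TLS handshake, with no session resumption):

```python
# Back-of-the-envelope math behind the "nearly 7 seconds" figure.
worst_rtt = 1.7          # seconds, worst round trip observed
round_trips = 1 + 2 + 1  # TCP handshake + TLS handshake + HTTP request/response
print(round_trips * worst_rtt)  # -> 6.8 seconds before the first byte of HTML
```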

There was no other network activity on my laptop at all, so this insane result seems to have come from the 3G network itself. It looks to me as though Sprint has massively oversized the buffers on their wireless towers. There is really no excuse for this given the recent attention bufferbloat has received in the networking community. I can't think of any circumstance, beyond perhaps satellite communications, where holding on to an IP packet for 1.5 seconds is at all reasonable.

Now, I suppose the problem could be caused by upstream buffering on the laptop. But as I said, there was no other activity, which the wireless card's byte counters confirmed. Even if a flow-control mechanism in EVDO was telling my laptop not to transmit, or even telling it to re-transmit previous data, there should not be 1.5 seconds of buffering in the card or its drivers. An IP packet 1.5 seconds old should just be dropped.
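For the curious, this is roughly how you could check those counters programmatically. It's a sketch using psutil, and "Sprint Mobile Broadband" is a stand-in for whatever name the card's adapter shows up under on your machine:

```python
# Sample the adapter's byte counters before and after a one-minute window.
import time
import psutil

NIC = "Sprint Mobile Broadband"  # hypothetical adapter name; check yours

before = psutil.net_io_counters(pernic=True)[NIC]
time.sleep(60)
after = psutil.net_io_counters(pernic=True)[NIC]

print("bytes sent:", after.bytes_sent - before.bytes_sent)
print("bytes recv:", after.bytes_recv - before.bytes_recv)
# With only one ping per second in flight, both deltas should stay in the
# low tens of kilobytes; anything more means background traffic.
```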

I plan on doing more testing in the near future, but I have to ask the mobile networking experts out there: am I totally misinterpreting the data from this admittedly simple test? Is this buffering something inherent in CDMA technology? Can anybody think of a test to see whether it is the OS or driver buffers holding on to the packets for so long? (I don't think Wireshark would work for this.) Obviously I don't have access to hardware-based radio test equipment, so software tests are all I can really do.
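The best software-only idea I've come up with so far is a timestamped UDP probe against an echo server I control (the address below is a placeholder for a box just past the wireless hop). It can't separate driver buffering from tower buffering by itself, but it would at least rule out ICMP being treated specially by the network:

```python
# UDP round-trip probes with an embedded sequence number and timestamp.
import socket
import struct
import time

SERVER = ("203.0.113.10", 9999)  # hypothetical echo server you run yourself

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(5.0)

for seq in range(60):
    sent = time.perf_counter()
    sock.sendto(struct.pack("!Id", seq, sent), SERVER)
    try:
        data, _ = sock.recvfrom(64)
        rseq, rsent = struct.unpack("!Id", data)  # server echoes payload back
        print(f"seq={rseq} rtt={(time.perf_counter() - rsent) * 1000:.0f} ms")
    except socket.timeout:
        print(f"seq={seq} lost")
    time.sleep(1)
```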

Assuming the problem is actually the downstream buffers in the tower, Sprint really needs to adjust their buffer sizes and start dropping some packets to make their 3G network usable again.
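For a sense of scale, here's a rough calculation (the 3.1 Mbps figure is my assumption for EVDO Rev A's peak downlink rate): a buffer sized to the bandwidth-delay product at the 92 ms baseline RTT holds about 35 KB, while 1.6 seconds of queueing at that rate implies roughly 600 KB sitting in a queue somewhere.

```python
# Bandwidth-delay product vs. the queue implied by the worst RTT sample.
link_bps = 3.1e6              # bits/sec, assumed EVDO Rev A peak downlink
base_rtt = 0.092              # seconds, uncongested RTT observed
queue_delay = 1.7 - base_rtt  # seconds of queueing at the worst sample

bdp_bytes = link_bps * base_rtt / 8
queued_bytes = link_bps * queue_delay / 8
print(f"BDP: {bdp_bytes / 1024:.0f} KB, implied queue: {queued_bytes / 1024:.0f} KB")
# -> BDP: 35 KB, implied queue: 608 KB
```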

Technical details of the equipment used in the test: Dell D430 laptop; Windows 7 with all current patches and service packs; Dell-Novatel Wireless 5720 Sprint Mobile Broadband (EVDO Rev A) card with firmware version 145 and driver version 3.0.3.0.


