Friday, January 06, 2012

What the Internet looks like in 2012

This is a graph of links between visible Autonomous Systems on the Internet generated from public BGP routing tables early on 1 Jan 2012. Each link of each AS path in the BGP data is represented as an edge, with duplicates removed. The data was then graphed using the twopi layout tool from Graphviz. Links to the top twelve most-connected service provider networks are highlighted in color, with all other AS links in white.
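The edge-extraction step described above can be sketched in a few lines of Python. This is only an illustration, not the script used for the image: the function names are made up, the AS numbers come from the range reserved for documentation, links are treated as undirected, and repeated AS numbers (path prepending) are skipped, all of which are assumptions.

```python
def as_path_edges(paths):
    """Collect the unique AS-to-AS links that appear in a list of AS paths."""
    edges = set()
    for path in paths:
        # Each adjacent pair of AS numbers in a path is one link.
        for a, b in zip(path, path[1:]):
            if a != b:  # assumption: ignore prepending (same AS repeated)
                # Store the pair in sorted order so A-B and B-A count once.
                edges.add((min(a, b), max(a, b)))
    return edges


def to_dot(edges):
    """Render the edge set as an undirected Graphviz graph."""
    lines = ["graph internet {"]
    for a, b in sorted(edges):
        lines.append(f"  AS{a} -- AS{b};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical AS paths, using AS numbers reserved for documentation.
paths = [
    [64500, 64501, 64502],
    [64510, 64501, 64502],
    [64500, 64501, 64501, 64510],
]
edges = as_path_edges(paths)
print(to_dot(edges))
```

Saving the output to a file and running something like `twopi -Tpng graph.dot -o graph.png` would then produce a radial layout.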

I'm struck by the sheer density of connectivity on the modern Internet. Each of the 94865 lines on this graph represents at least one physical link between organizations. But in the case of larger networks that same thin line might represent dozens of routers and 10 Gb/s fibers at many locations throughout the world.

It certainly looks as robust as originally intended, but also chaotic and disordered. Surely no government, organization, or evil genius bent on world domination could possibly control all those links. The sooner our politicians figure that out, the better.

Wednesday, January 12, 2011

H.264 versus WebM

There's been a lot of noise recently about Google's supposedly "free" WebM video format versus the widely-used (but patent-encumbered) H.264 video format used by Flash, Apple devices, Blu-ray players, and just about everything else.
The best H.264 encoders perform better than WebM's VP8 codec based on the objective SSIM metric, and the consensus is that H.264 video in general looks "better" at the same bitrate than VP8. But how much better, at "web video" resolutions and bitrates?
I took a number of widely-used HD test clips (which are designed to "stress" video codecs) and concatenated them into one test video. I then encoded the result at 640x360 resolution at 500 kbps.
First, the H.264 sample. I used x264 as it seems to be the best H.264 encoder around.


x264 core:110 r1820 fdcf2ae

x264 --preset veryslow --tune film -B 500 --threads 0 --pass 1 -o all_vids_360p_500k_h264hi31.mp4 all_vids_360p.y4m
x264 --preset veryslow --tune film -B 500 --threads 0 --pass 2 -o all_vids_360p_500k_h264hi31.mp4 all_vids_360p.y4m

pass 1: 62.66 fps, pass 2: 32.48 fps, total encode rate: 21.39 fps, final bitrate: 502.10 kbps


Next, the WebM sample, encoded with the vpxenc tool provided by the WebM project.

vpxenc WebM Project VP8 Encoder v0.9.5

vpxenc all_vids_360p.y4m -o all_vids_360p_500k.webm -p 2 -t 8 --best --target-bitrate=500 --end-usage=0 --auto-alt-ref=1 -v --minsection-pct=5 --maxsection-pct=800 --lag-in-frames=16 --kf-min-dist=0 --kf-max-dist=250 --static-thresh=0 --drop-frame=0 --min-q=0 --max-q=60

pass 1: 71.36 fps, pass 2: 8.88 fps, total encode rate: 7.90 fps, final bitrate: 508.298 kbps
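The "total encode rate" figures in both result lines follow from the per-pass rates: every frame goes through both passes, so the per-frame times add. A small sketch of that arithmetic (the function name is made up for illustration):

```python
def total_fps(*pass_fps):
    """Overall frames per second when each frame is processed once per pass."""
    # Time per frame is 1/fps for each pass; the times add up across passes.
    return 1.0 / sum(1.0 / fps for fps in pass_fps)


x264_total = total_fps(62.66, 32.48)  # x264 pass 1 and pass 2
vp8_total = total_fps(71.36, 8.88)    # vpxenc pass 1 and pass 2

print(f"x264:   {x264_total:.2f} fps")
print(f"vpxenc: {vp8_total:.2f} fps")
print(f"x264 was about {x264_total / vp8_total:.1f}x faster overall")
```

The results match the reported 21.39 fps and 7.90 fps to two decimal places.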


Now, the WebM sample looks pretty good, except for the "tough spots" such as the flames, water, and fade transitions. It also seems "softer" overall than the H.264 video. Clearly, VP8 has improved with the latest 0.9.5 release, but it is still not close to H.264 at "web video" sizes and bitrates.
Of course, if you throw enough bits at the problem, the differences between codecs start to disappear. I was unable to see much of a difference between H.264 and WebM using the same clips at triple the bitrate (1500 kbps). However, such a high bitrate for a low-res 640x360 video would be very uncommon. YouTube, for example, uses ~500 kbps for the 640x360 rendition of a video.

Source of lossless HD test clips used in this video, which were resized to 640x360 before encoding.

Tuesday, December 14, 2010

Presets versus quality in x264 encoding

I'm scoping a project that will require re-encoding a large training video library into HTML5 and Flash-compatible formats. As of today, this means using H.264-based video for best compatibility and quality (although WebM might become an option in a year or two).


The open source x264 is widely considered the state of the art in H.264 encoders. Given the large amount of source video we need to convert as part of the project, finding the optimal trade-off between encoding speed and quality with x264-based encoders (x264 itself, FFmpeg, MEncoder, HandBrake, etc.) is important.


So I created a 720p video composed of several popular video test sequences concatenated together. All of these sequences are from lossless original sources, so we are not re-compressing the artifacts of another video codec. The sequences are designed to torture video codecs: scenes include splashing water, flames, slow pans, detailed backgrounds and fast motion. I did several two-pass 2500 kbps encodings using the x264 presets distributed with the x264 command line encoder (version 0.110.1820 fdcf2ae). Excepting the "ultrafast" preset, which does not use B-frames and was dropped from the charts as an extreme outlier, all of the presets created files that varied by less than 0.2% in bitrate.


Here is the actual encoding command used in the test:

x264 --preset {presetname} -B 2500 --ssim --pass {1|2} -o {output}.mp4 input.y4m
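To repeat the test across every preset, the command template above can be expanded with a short script. This is a sketch rather than the script actually used: the output naming scheme and the function name are assumptions, and it only prints the commands instead of running them.

```python
# The standard x264 preset names, fastest to slowest.
PRESETS = [
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
]


def x264_commands(preset, source="input.y4m", bitrate_kbps=2500):
    """Build the pass 1 and pass 2 command lines for one preset."""
    output = f"{preset}.mp4"  # assumption: one output file per preset
    return [
        ["x264", "--preset", preset, "-B", str(bitrate_kbps), "--ssim",
         "--pass", str(pass_number), "-o", output, source]
        for pass_number in (1, 2)
    ]


for preset in PRESETS:
    for command in x264_commands(preset):
        print(" ".join(command))
```

Each command is built as an argument list, so it could be handed to `subprocess.run(command, check=True)` to run the encode.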


The mean luminance SSIM (as reported by x264) was used as the objective quality metric in my tests. Yes, I know about the weaknesses of using arithmetic means for video metrics, and the benefits of box-and-whisker plots for showing variance between frames. However, a single number is quite illustrative, especially since we are testing the same basic codec at the same bitrate with marginally differing tunings. This was a quick-and-dirty test. If I find the time to get avisynth working correctly on my Windows 7 x64 machine I will update the plots to include variance information.


Here's a quick and dirty chart (click to enlarge):


I was quite surprised that there was only a 0.75 dB difference in mean SSIM from the veryfast to placebo presets, despite placebo being 68 times slower than veryfast mode. I would have expected much more quality improvement for the CPU effort expended, given that placebo was producing just 1.5 fps on an eight-core machine. From a subjective standpoint, the results are indistinguishable to me, even going frame-by-frame through tough sections of the video.
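To put 0.75 dB in perspective, it helps to convert between raw SSIM and decibels. The sketch below assumes the dB figure is the usual -10 * log10(1 - SSIM) transform that x264 prints next to the raw value. The baseline SSIM of 0.97 is a hypothetical number chosen for illustration, not a measurement from this test.

```python
import math


def ssim_to_db(ssim):
    """Convert a raw SSIM value (0..1) to decibels."""
    return -10.0 * math.log10(1.0 - ssim)


def db_to_ssim(db):
    """Inverse of ssim_to_db."""
    return 1.0 - 10.0 ** (-db / 10.0)


base = 0.97  # hypothetical baseline SSIM, for illustration only
better = db_to_ssim(ssim_to_db(base) + 0.75)

print(f"baseline: SSIM {base:.4f} = {ssim_to_db(base):.2f} dB")
print(f"+0.75 dB: SSIM {better:.4f}")
print(f"remaining error (1 - SSIM) shrinks by {1 - (1 - better) / (1 - base):.0%}")
```

Under that assumption, a 0.75 dB gain removes roughly a sixth of the remaining SSIM error, whatever the baseline. That is a small step for a 68-fold increase in encoding time.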


Needless to say, all of my x264 encoding will now be done with medium preset or faster. Decoding a lossless source video for re-encoding became the bottleneck with medium presets or faster. If there is interest, please leave a comment, and I will find a hosting spot for the lossless, veryfast, and placebo versions of the video so others can compare, reproduce, or extend this simple test.


Thursday, December 11, 2008

Did ND get any better in 2008?

With all the talk about the potential firing of Charlie Weis (which ultimately didn't happen) I thought I would ask this question: did the Irish improve substantially between 2007 and 2008?

The most obvious way to answer this is to use the computer polls. I chose the Colley Matrix, which is part of the BCS formula, since it has a characteristic that makes comparing teams year-to-year straightforward. Ratings in this computer poll are inherently normalized so that the average across all teams is 0.5 every season, which makes comparisons between seasons much more valid.
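For the curious, the published Colley method comes down to solving a small linear system, and the 0.5 average falls out of its structure. Here is a minimal sketch of the basic method on a made-up three-team season. The team names and results are hypothetical, and this is not the code behind the official rankings.

```python
from fractions import Fraction


def colley_ratings(teams, games):
    """Solve the Colley system C r = b; games is a list of (winner, loser)."""
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    # C starts as 2 on the diagonal; b starts as 1 for every team.
    C = [[Fraction(2 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(1) for _ in range(n)]
    for winner, loser in games:
        w, l = index[winner], index[loser]
        C[w][w] += 1          # one more game played by each team
        C[l][l] += 1
        C[w][l] -= 1          # one more meeting between the two teams
        C[l][w] -= 1
        b[w] += Fraction(1, 2)
        b[l] -= Fraction(1, 2)
    # Gauss-Jordan elimination on the augmented matrix [C | b].
    A = [row + [rhs] for row, rhs in zip(C, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return {team: A[index[team]][n] for team in teams}


# Hypothetical round robin: Alpha beats both, Beta beats Gamma.
teams = ["Alpha", "Beta", "Gamma"]
games = [("Alpha", "Beta"), ("Beta", "Gamma"), ("Alpha", "Gamma")]
ratings = colley_ratings(teams, games)
for team in teams:
    print(team, float(ratings[team]))
print("average:", float(sum(ratings.values()) / len(ratings)))
```

Every column of C sums to 2 and the entries of b sum to the number of teams, so the ratings always average exactly 0.5, however the games turn out.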

Here's what it looks like (pre-bowl games for both years):

Year  Ranking  Rating  Record  Strength of Schedule  Schedule Rank  Best win
2008  58       0.529   6-6     0.533                 44             #45 NAVY
2007  90       0.375   3-9     0.604                 10             #46 UCLA


First off, it is clear that the team was much better this year. The rating differential of 0.154 is the same as the differential between 2008 Ohio State and 2008 Cal - clearly a big step forward.

Secondly, the "weak schedule" of 2008 turned out to be not-so-weak: it was a lot tougher than the 86th-ranked "mighty SEC" schedule played by Alabama, for example. Tougher than 6 of the top 10 teams in 2008 in fact.

Yes, the team still looked inept at times. And the loss to Syracuse was frankly unforgivable. But there were also many bright spots in 2008. Michael Floyd and Golden Tate are going to be one of the best WR tandems in the country next year. The defense improved. Clausen looked better. Special teams looked better, with even Brandon Walker finishing the year strong.

In the end, I'm glad Charlie is staying for another year. The last thing ND needs is to hit the reset button on the program without having locked up a great successor. Willingham was fired because Urban Meyer was on the market, and ND failed to land him, so we have Weis. There is no Urban Meyer out there this year. Until there is a can't-miss candidate available, and it is clear that he will actually take the ND job, Weis is the better option.

Monday, March 10, 2008

Network-focused analysis of the Windows Time Service

Due to some recent posts on the comp.protocols.time.ntp newsgroup, I took it upon myself to investigate the behavior of the Windows Time Service a bit further using the Wireshark protocol analyzer.

  1. It appears that in Windows XP, 2003, and Vista, the Windows Time Service (w32time) will by default always try to form a "symmetric active" association with configured NTP servers. This can be problematic with some time servers, violates the published RFC-1305 specification, and is not necessary. I could find no explanation on Microsoft's site for this behavior; I suspect it has something to do with interoperability with older Windows 2000 domain controllers that had very broken NTP.

    However, there is a simple workaround. You can simply add ",0x8" to the end of any configured time server, and Windows will only use a client-mode association. For example, the command:
    w32tm /configure /manualpeerlist:"0.pool.ntp.org,0x8 1.pool.ntp.org,0x8 2.pool.ntp.org,0x8" /syncfromflags:MANUAL /update

    will configure your Windows machine to form client-mode associations with three different NTP Pool servers.

  2. The minimum polling interval on all Windows machines except domain controllers is set to 1024 seconds by default. Windows domain controllers have a minimum poll interval of 64s.

    This is reasonable as clients usually do not need extremely accurate time. However, quite a few servers that are not domain controllers do need to get accurate time offset and frequency synchronization quickly. You can configure "MinPollInterval" and "MaxPollInterval" through the registry or using Group Policy tools, as documented here.

    Important note: never set the minimum poll value to less than 6 (which is 2^6 = 64 seconds). You won't get better time synchronization, and will be abusing the servers you have configured. Many time server administrators have automated tools that block clients that poll too frequently.

  3. Windows Time Service does follow sensible rules for "backing off" the polling interval, and adjusting the interval to network conditions. In my testing, a Windows Server 2003 domain controller began polling at 64 seconds, and then backed off to one poll every 1024 seconds within about 30 minutes. This is the same behavior as the reference ntpd implementation.

    Also, in my tests, Windows Time Service did respond to unreachable servers sensibly, backing off the polling interval to 2^15 s. However, when a server became unreachable, it did increase polling in steps down to 2^4 s before reverting to 2^15 s. This rather strange polling pattern (15-9-9-8-7-6-5-4) continued until the server became reachable again. There have been quite a few problems in the past caused by NTP implementations that polled too frequently. Fortunately, the Windows Time Service should not cause problems in this area, as an unreachable server results in an average of one poll every 2^12 s (about once an hour).
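NTP poll values are exponents of two, so a poll value of 6 means 2^6 = 64 seconds and 15 means 2^15 = 32768 seconds. The average for the unreachable-server pattern can be checked with a few lines of Python. This sketch assumes each number in the 15-9-9-8-7-6-5-4 pattern is the exponent of one wait between polls; the names are illustrative only.

```python
import math


def poll_seconds(exponent):
    """Convert an NTP poll exponent to an interval in seconds."""
    return 2 ** exponent


# Poll exponents observed while the configured server was unreachable.
observed = [15, 9, 9, 8, 7, 6, 5, 4]
intervals = [poll_seconds(e) for e in observed]
average = sum(intervals) / len(intervals)

print("intervals (s):", intervals)
print(f"one cycle: {sum(intervals)} s for {len(intervals)} polls")
print(f"average: {average:.0f} s, about 2^{math.log2(average):.1f} s "
      f"or {average / 60:.0f} minutes")
```

The average interval comes out very close to 2^12 = 4096 seconds, which is where the "about once an hour" figure comes from.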

Monday, November 12, 2007

A cynic's view on the Sun-Network Appliance lawsuit

I've been thinking about the Sun - NetApp lawsuit, which is an interesting case that highlights what is wrong with the Patent Office in the US. I don't necessarily think that all software patents are bad, and I do think that innovation should be rewarded with a temporary monopoly on a piece of technology.

However, this lawsuit shouldn't have happened, because at least one of the patents in question should probably not have been granted in the first place. David Hitz, founder of Network Appliance, has a blog in which he contends that Sun's ZFS violates patents held by NetApp for their WAFL. If that's truly the case, Sun should certainly be held liable, and should stop publishing ZFS as open source code. You can't give away what isn't yours.

The larger issue, in my opinion, is that it seems significant claims of the WAFL patent should never have been granted. There exists a significant amount of prior art in the use of a "tree of block pointers" to maintain logical consistency in data storage. Relational database management systems (Oracle, Microsoft SQL Server, IBM DB2, Sybase, PostgreSQL, etc.) have been using the same techniques for decades to maintain transaction-consistent indexes and relational data inside databases. WAFL may indeed be the first implementation of the idea where the "tree of blocks" points to files in a general-purpose file system. But in an RDBMS, the tree of blocks (tree of index pages) typically points to arbitrary row data in the database. What's the difference? Not much, bits are just bits. The application to a file system seems obvious to me (reasonably skilled in the art), meaning the patent should likely be challenged. Heck, not-so-innovative Microsoft was kicking the same ideas around back in the early 1990s as part of the WinFS file system for the "Chicago" project.

In fact, I personally designed the database for a document management system that used a "tree of blocks" to store and organize arbitrary file data in Microsoft SQL Server back in the late 1990s. This system had point-in-time recovery and look-up capability, based on valid time-stamps in the block pointers (folder and file tables with their indexes). I suppose you could call this a "snapshot" capability. The database took care of transaction logging, check-pointing, and referential integrity, all of which seem to be additional claims in the WAFL patent.
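The general idea of time-stamped pointers is easy to show in miniature. The sketch below is not the schema of that system, and it is certainly not WAFL. It is a toy in-memory model with made-up names, showing only that never overwriting blocks and time-stamping the pointers to them gives point-in-time look-up almost for free.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockPointer:
    path: str
    block_id: int
    valid_from: int                  # timestamp when this version appeared
    valid_to: Optional[int] = None   # None means "still current"


class VersionedStore:
    """Toy store: data blocks are never overwritten, only superseded."""

    def __init__(self):
        self.blocks = {}      # block_id -> data
        self.pointers = []    # every version of every file

    def write(self, path, data, now):
        # Always write to a fresh block, then retire the old pointer.
        block_id = len(self.blocks)
        self.blocks[block_id] = data
        for pointer in self.pointers:
            if pointer.path == path and pointer.valid_to is None:
                pointer.valid_to = now
        self.pointers.append(BlockPointer(path, block_id, now))

    def read(self, path, at):
        # Find the pointer whose validity window contains the timestamp.
        for pointer in self.pointers:
            if (pointer.path == path and pointer.valid_from <= at
                    and (pointer.valid_to is None or at < pointer.valid_to)):
                return self.blocks[pointer.block_id]
        return None


store = VersionedStore()
store.write("/docs/report.txt", b"first draft", now=10)
store.write("/docs/report.txt", b"final version", now=20)
print(store.read("/docs/report.txt", at=15))
print(store.read("/docs/report.txt", at=25))
```

Reading with an earlier timestamp returns the older contents, which is the "snapshot" behavior in its simplest form. A real database adds transaction logging and indexes on top.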

I have passable programming skills, I am not an algorithm design guru, and I had certainly never read the WAFL patents before implementing this "file system on a database". The basic ideas were widely known and used frequently in the database arena. I am not an intellectual property lawyer, but experience leads me to believe there isn't much innovation in the "always-consistent tree of blocks" described in the WAFL patent. The devil may be in the details, I suppose - I have only reviewed the patent abstract at this point.

Still, in my opinion, the U.S. Patent Office, as currently constituted, is incapable of identifying true innovation. It grants far too many patents on obvious or derivative technology, especially in the software arena. If they can't get it right, even with the enormous resources at their disposal, they should probably not grant any software patents at all.

As a side note, this in-house document management system was never widely used, and the project was considered a failure. I believe this was largely the result of a cumbersome legacy ASP-based web front-end, though, not because of deficiencies in the storage engine.

Monday, April 23, 2007

time.windows.com fixed

Well, it appears that time.windows.com is now fixed, after a few weeks of serving up invalid time. Presumably, the clocks on millions of Windows machines worldwide are now slowly drifting back into synchronization with the rest of humanity.

I find it ridiculous that such a problem could go unnoticed and unfixed by Microsoft for so long, and that it took a Microsoft participant on a programmer's blog reading about it to track down and correct the issue.
U:\>w32tm /monitor /computers:time.windows.com,us.pool.ntp.org
time.windows.com [207.46.130.100]:
NTP: +0.0541156s offset from local clock
RefID: time-nw.nist.gov [131.107.1.10]
us.pool.ntp.org [66.91.129.70]:
NTP: +0.0293621s offset from local clock
RefID: bigben.ucsd.edu [132.239.1.6]