H.264 versus WebM

There's been a lot of noise recently about Google's supposedly "free" WebM video format versus H.264, the patent-encumbered format that is widely used by Flash, Apple devices, Blu-ray players, and just about everything else.
The best H.264 encoders outperform WebM's VP8 codec on the objective SSIM metric, and the consensus is that H.264 video generally looks "better" than VP8 at the same bitrate. But how much better, at "web video" resolutions and bitrates?
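For readers unfamiliar with SSIM, here is a minimal sketch of the formula in Python. It is a simplification: real SSIM tools slide a small window across each frame and average the per-window scores, while this version treats the whole input as a single window. The function name `global_ssim` and the sample values are made up for illustration, and this is not the code that any encoder or measurement tool actually uses.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equal-length sequences of luma samples.

    Simplified illustration only: production SSIM is computed over small
    local windows and then averaged across the frame.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)

    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical luma samples: a reference row and a slightly degraded copy.
reference = [52, 55, 61, 66, 70, 61, 64, 73]
degraded = [54, 54, 60, 64, 68, 62, 64, 70]

print(global_ssim(reference, reference))  # identical inputs score 1.0
print(global_ssim(reference, degraded))   # a degraded copy scores below 1.0
```

A score of 1.0 means the two inputs are identical, and lower scores mean more structural difference, which is why a codec that scores higher at the same bitrate is considered objectively better.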
I took a number of widely used HD test clips (which are designed to "stress" video codecs) and concatenated them into one test video. I then encoded the result at 640x360 resolution and 500 kbps.
First, the H.264 sample. I used x264 as it seems to be the best H.264 encoder around.

x264 core:110 r1820 fdcf2ae

x264 --preset veryslow --tune film -B 500 --threads 0 --pass 1 -o all_vids_360p_500k_h264hi31.mp4 all_vids_360p.y4m
x264 --preset veryslow --tune film -B 500 --threads 0 --pass 2 -o all_vids_360p_500k_h264hi31.mp4 all_vids_360p.y4m

pass 1: 62.66 fps, pass 2: 32.48 fps, total encode rate: 21.39 fps, final bitrate: 502.10 kbps

Next, the WebM sample, encoded with the vpxenc tool provided by the WebM project.

vpxenc WebM Project VP8 Encoder v0.9.5

vpxenc all_vids_360p.y4m -o all_vids_360p_500k.webm -p 2 -t 8 --best --target-bitrate=500 --end-usage=0 --auto-alt-ref=1 -v --minsection-pct=5 --maxsection-pct=800 --lag-in-frames=16 --kf-min-dist=0 --kf-max-dist=250 --static-thresh=0 --drop-frame=0 --min-q=0 --max-q=60

pass 1: 71.36 fps, pass 2: 8.88 fps, total encode rate: 7.90 fps, final bitrate: 508.298 kbps
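The "total encode rate" figures are consistent with treating the two passes as sequential work on every frame: the time per frame is the sum of the per-pass times, so the overall rate is the reciprocal of that sum. A quick Python check, where `combined_fps` is a helper name of my own and not something either encoder prints:

```python
def combined_fps(pass1_fps, pass2_fps):
    """Overall frames per second when every frame goes through both passes."""
    seconds_per_frame = 1.0 / pass1_fps + 1.0 / pass2_fps
    return 1.0 / seconds_per_frame


# Per-pass rates reported above for each encoder.
x264_total = combined_fps(62.66, 32.48)
vp8_total = combined_fps(71.36, 8.88)

print(f"x264:   {x264_total:.2f} fps")
print(f"vpxenc: {vp8_total:.2f} fps")
print(f"x264 is about {x264_total / vp8_total:.1f}x faster overall")
```

This reproduces the reported totals of 21.39 fps and 7.90 fps, which puts x264 at roughly 2.7 times the overall speed of vpxenc for this test, even though vpxenc's first pass is slightly faster.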

Now, the WebM sample looks pretty good, except for the "tough spots" such as the flames, water, and fade transitions. It also seems "softer" overall than the H.264 video. Clearly, VP8 has improved with the latest 0.9.5 release, but it is still not close to H.264 at "web video" sizes and bitrates.
Of course, if you throw enough bits at the problem, the differences between codecs start to disappear. I was unable to see much of a difference between H.264 and WebM using the same clips at triple the bitrate (1500 kbps). However, such a high bitrate for a low-res 640x360 video would be very uncommon. YouTube, for example, uses ~500 kbps for the 640x360 rendition of a video.
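To put those bitrates in perspective, here is a rough back-of-the-envelope calculation in Python. The helper names are mine, and the 30 fps frame rate is an assumption made purely for illustration, since the frame rate of the test video is not stated here.

```python
def megabytes_per_minute(kbps):
    """Size of one minute of video at the given bitrate, in decimal megabytes."""
    bits = kbps * 1000 * 60
    return bits / 8 / 1_000_000


def bits_per_pixel(kbps, width, height, fps):
    """Average number of bits available per pixel per frame."""
    return kbps * 1000 / (width * height * fps)


ASSUMED_FPS = 30  # assumption: the test video's frame rate is not given

for kbps in (500, 1500):
    size = megabytes_per_minute(kbps)
    bpp = bits_per_pixel(kbps, 640, 360, ASSUMED_FPS)
    print(f"{kbps} kbps: {size:.2f} MB per minute, {bpp:.3f} bits per pixel")
```

At 500 kbps a minute of video is 3.75 MB, and under the 30 fps assumption each pixel gets well under a tenth of a bit per frame, which is why codec efficiency matters so much at this operating point. Tripling the bitrate triples both figures.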

Source of lossless HD test clips used in this video, which were resized to 640x360 before encoding.


Anonymous said…
great article!
