Friday, June 29, 2012

Windows Server 2012 storage = awesome sauce

We've been playing with the Windows Server 2012 release candidate on a new NAS system, and Storage Spaces paired with deduplication makes for an impressive combination (see screenshot).

89% deduplication rate
We copied a week's worth of database and disk-image backups from a few servers to a deduplication-enabled volume on the test system. This amounted to a total of 845 GiB of raw, uncompressed data files. After waiting a bit for the deduplication to kick in, we ended up with an 89% savings in space.
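To put that rate in concrete terms, here is a rough back-of-the-envelope sketch in Python. It uses the 845 GiB raw size and the 89% rate from the screenshot caption; the function name `dedup_summary` is just made up for illustration, and the result is an estimate derived from the rounded percentage, not a figure read off the server.

```python
def dedup_summary(raw_gib: float, savings_rate: float) -> dict:
    """Estimate on-disk usage from a raw size and a deduplication savings rate.

    savings_rate is the fraction of space saved (0.89 means 89%).
    """
    if not 0.0 <= savings_rate < 1.0:
        raise ValueError("savings_rate must be in the range [0, 1)")
    stored_gib = raw_gib * (1.0 - savings_rate)
    return {
        "raw_gib": raw_gib,
        "stored_gib": stored_gib,            # space still consumed on disk
        "saved_gib": raw_gib - stored_gib,   # space reclaimed
        "ratio": raw_gib / stored_gib,       # e.g. 9.1 means "9.1 : 1"
    }


summary = dedup_summary(845, 0.89)
print(f"stored: {summary['stored_gib']:.1f} GiB, "
      f"saved: {summary['saved_gib']:.1f} GiB, "
      f"ratio: {summary['ratio']:.1f}:1")
```

In other words, an 89% rate means the 845 GiB of backups should occupy somewhere around 93 GiB on disk, or roughly a 9:1 reduction.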

This is the kind of result usually seen on purpose-built and reassuringly expensive deduplication appliances such as those from Data Domain.

The data copy process itself was also quite interesting. We configured twelve 2 TB 7200 RPM drives into a Windows Storage Spaces pool, and set up a 5 TB NTFS volume on them in parity mode. Storage Spaces gives you much of the flexibility of something like ZFS or Drobo: you create a pool of raw disks, and can carve it up into thin-provisioned volumes with different RAID and size policies. These volumes can be formatted with NTFS or ReFS, or shared out as raw iSCSI to other systems. Disks of different sizes can be added or removed and the pool will re-balance data automatically.
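For anyone unfamiliar with what "parity mode" buys you, here is a generic single-parity sketch in Python. This is the textbook XOR idea behind RAID-5-style protection, not Storage Spaces' actual on-disk layout (which we haven't dug into); the block contents and the `xor_parity` helper are purely illustrative.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if striped across three disks (illustrative values).
data = [b"disk0 data", b"disk1 data", b"disk2 data"]
parity = xor_parity(data)  # stored on a fourth disk

# Simulate losing disk 1: XOR the survivors with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_parity(survivors)
assert rebuilt == data[1]
```

The same XOR that produces the parity block also rebuilds any one missing block, which is why a single-parity layout can survive the loss of one disk.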

We copied the files from another NAS using ROBOCOPY with two threads, and the Windows Server 2012 system was able to write out the data at 100% of network speed (about 120 MiB/s) while using just 2% of a single Xeon E5-2620. Parity calculations are not a bottleneck here. Microsoft also supposedly has some tricks in Storage Spaces to prevent the "software RAID-5 write hole" for parity volumes, à la ZFS. The actual deduplication process took a few hours after the data was ingested, as it is a post-process system that runs at a low priority in the background.
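We didn't time the copy precisely, but the numbers above allow a quick sanity check. The Python sketch below estimates the best-case ingest time, assuming the full 845 GiB moved at a sustained 120 MiB/s with no per-file overhead; `transfer_seconds` is a hypothetical helper name, and the real copy would have taken at least this long.

```python
def transfer_seconds(size_gib: float, rate_mib_per_s: float) -> float:
    """Best-case transfer time in seconds for size_gib at rate_mib_per_s."""
    if rate_mib_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gib * 1024 / rate_mib_per_s  # 1 GiB = 1024 MiB


seconds = transfer_seconds(845, 120)
print(f"about {seconds / 3600:.1f} hours at line rate")
```

That works out to about two hours for the ingest itself, so the few hours of background deduplication afterwards is in the same ballpark as the copy.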

There are caveats with the new deduplication feature, making it unsuitable for things like live VM disks or live databases. But it's certainly great for backup data, archival data, and general purpose file sharing. Management of the Storage Spaces and Deduplication features is dead-simple through the GUI, with sensible defaults. There is also a wealth of PowerShell commands to let you dig into the details not exposed in the GUI.

Finally, you can't beat the cost, which is basically "free" if you were already buying Windows Server 2012 anyway.