Cooperative backup and archiving

After testing my company's shiny new 100 Mbps internet service from Cogent, I was struck with an idea. Plenty of companies offer internet-based backup services, but most are priced per gigabyte per month and are prohibitively expensive, especially when you consider that our full backup sets are typically 600 GB in compressed form and that we of course want to store daily incrementals for several months.

My idea: find another Cogent subscriber and enter into a mutual backup agreement. That is, we buy a storage server to sit at their site, and they buy one to sit at ours. We can then exchange backup traffic, eliminating the need to shuffle tapes and send them off-site. Many large enterprises already do this with SAN hardware replication; this would be a budget "roll your own" version.

Several open-source distributed storage systems exist that might help, but none seem ready for prime time yet. I feel quite a lot could be accomplished with judicious use of native backup tools and open-source encryption software.

Ideally, we could use something like rsync or rdiff-backup to send only the changed data each night. However, since the backup server will be at an untrusted site, all data must be encrypted before it leaves our network, and no decryption keys can exist at the untrusted backup site. Since good encryption produces an entirely different byte stream each time the same data is encrypted, delta-transfer tools like rsync would find nothing in common between runs and won't gain us anything.

My current thinking is to use native backup tools to create a local file-based backup, and then use GnuPG or a similar tool to compress and encrypt the full backup files for transmission via FTP. With weekly full backups and daily incrementals, we'll be transmitting a lot more data than we would with rsync, but a 100 Mbps connection could make it workable (~20 hours for a full backup over the weekend, and much shorter times for incrementals).
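As a rough sanity check on that estimate, the transfer time can be worked out directly. This is only a back-of-the-envelope sketch: the function name and the 67% efficiency figure are assumptions chosen for illustration (protocol overhead, disk speed and encryption throughput will all vary), and it treats 600 GB as decimal gigabytes.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_gb (decimal GB) over a link_mbps link.

    efficiency is the assumed fraction of the nominal link rate that is
    actually achieved end to end (1.0 means full line rate).
    """
    bits = size_gb * 1e9 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 3600


full_gb = 600  # compressed full backup set, from the figures above

line_rate = transfer_hours(full_gb, 100)                  # ideal case
realistic = transfer_hours(full_gb, 100, efficiency=0.67)  # assumed 67% of line rate

print(f"At full line rate: {line_rate:.1f} hours")
print(f"At 67% efficiency: {realistic:.1f} hours")
```

At full line rate the 600 GB set would take a little over 13 hours, so the ~20 hour figure corresponds to getting roughly two thirds of the nominal 100 Mbps in practice.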

One thing that could drastically reduce the amount of data in transit is some form of single-instance storage. Using native tools, we'd be storing dozens of copies of many binary files (OS and application files). However, I can't see how this could be made to work when encryption has to happen before file transmission.
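One direction that might be worth exploring is to do the duplicate detection on our own side, before anything is encrypted. The sketch below is an assumption-laden illustration, not a worked-out design: it hashes each file's plaintext with SHA-256, keeps the set of digests locally on the trusted network, and returns only the files whose content has not been seen before. The function names are hypothetical, and it says nothing about how a restore would map duplicate files back to the single stored copy.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def select_new_files(paths, seen_digests: set) -> list:
    """Return the files whose content is not yet in seen_digests.

    seen_digests is updated in place and is meant to stay on the trusted
    side, so the remote site only ever receives encrypted files.
    """
    to_send = []
    for path in paths:
        digest = file_digest(path)
        if digest not in seen_digests:
            seen_digests.add(digest)
            to_send.append(path)
    return to_send
```

Only the files returned by `select_new_files` would then need to be encrypted and transmitted; identical OS and application binaries from other machines would be skipped because their digests are already in the local index.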

Any thoughts?

