
Alternatives to Cisco for 10 Gb/s Servers

This is a post written in response to Chris Marget's well-done series "Nexus 5K Layout for 10Gb/s Servers". While I appreciate the detail and thought that went into the Cisco-based design, he's clearly a "Cisco guy" showing a teensy-weensy bit of bias toward the dominant vendor in the networking industry. The end result of his design is a price of over $50K per rack just for networking, approaching the aggregate cost of the actual servers inside the rack! Cisco is the Neiman Marcus of the networking world.

So I came up with an alternate design based on anything-but-Cisco switches. The 10G switches from Arista here use Broadcom's Trident+ chip, which supports hardware features very similar to those of the Cisco solution (MLAG/vPC, DCB, hardware support for TRILL if needed in the future, etc.). Many other vendors offer 10G switches based on this merchant silicon, such as Dell/Force10 and IBM. Because this is commodity switching silicon that Broadcom will sell to just about anyone (like an x86 processor), pricing for Trident+ solutions from other vendors would be similar [1].

Like Chris, I also include a non-redundant 1G layer 2 switch for out-of-band management of servers and switches, in this case a Dell PowerConnect 5548. I have followed his design by organizing into 3-rack "pods", each rack holding sixteen 2U servers with dual-port 10 Gbps SFP+ NICs. All needed optics are included. Server-to-switch copper cabling is not included in the pricing, nor are the fiber runs; switch-to-switch Twinax cables are included.
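
To sanity-check the layout, here's a quick Python sketch of the per-pod port budget (it assumes the sixteen-servers-per-rack, one-link-to-each-ToR-switch arrangement described above; the 7050S port counts come from the BOM below):

```python
# Rough port-budget check for one 3-rack pod.
SERVERS_PER_RACK = 16      # sixteen 2U servers per rack, per the pod layout above
RACKS_PER_POD = 3
PORTS_PER_SERVER = 2       # dual-port 10G SFP+ NIC, one link to each ToR switch

servers_per_pod = SERVERS_PER_RACK * RACKS_PER_POD     # 48 servers
server_links = servers_per_pod * PORTS_PER_SERVER      # 96 x 10G links
links_per_switch = server_links // 2                   # 48 links land on each switch

SFP_PORTS_PER_SWITCH = 48  # Arista 7050S-64: 48 x SFP+ plus 4 x QSFP+ (see the BOM below)
assert links_per_switch <= SFP_PORTS_PER_SWITCH        # the switch pair is exactly filled

print(servers_per_pod, server_links, links_per_switch)  # 48 96 48
```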

Here's a quick diagram; note that the "core" switches are not included in the costs, but the optics used on those core switches are:

Made with Graphviz in 5 minutes because drop shadows don't add information to a diagram!

| Vendor | SKU | Description | Qty | List $ | Extended $ |
|--------|-----|-------------|-----|--------|------------|
| Arista | DCS-7050S-64-R | Arista 7050, 48xSFP+ & 4xQSFP+ switch, rear-to-front airflow and dual 460W AC power supplies | 2 | 29,995 | 59,990 |
| Arista | QSFP-SR4 | 40GBASE-SR4 QSFP+ optic, up to 100m over OM3 MMF or 150m over OM4 MMF | 8 | 1,995 | 15,960 |
| Arista | CAB-Q-Q-2M | 40GBASE-CR4 QSFP+ to QSFP+ Twinax copper cable, 2 meter | 2 | 190 | 380 |
| Dell | 225-0849 | PCT5548, 48 GbE ports, managed switch, 10GbE and stacking built-in | 1 | 1,295 | 1,295 |
| Dell | 320-2880 | SFP transceiver, 1000BASE-SX for PowerConnect, LC connector | 4 | 169 | 676 |
| | | | | Subtotal | 78,301 |
| | | | | Per rack | 26,100 |

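If you want to double-check the arithmetic, here's a quick sketch that recomputes the extended prices, the subtotal, and the per-rack figure straight from the BOM above (list prices in USD; nothing here beyond the line items in the table):

```python
# Recompute the extended prices, subtotal, and per-rack cost from the table above.
bom = [
    # (item,                      qty, list price in USD)
    ("Arista DCS-7050S-64-R",       2, 29995),  # ToR switch pair
    ("Arista QSFP-SR4",             8, 1995),   # optics for the 4 x 40G uplinks, both ends
    ("Arista CAB-Q-Q-2M",           2, 190),    # QSFP+ Twinax for the MLAG peer link
    ("Dell 225-0849 (PCT5548)",     1, 1295),   # out-of-band management switch
    ("Dell 320-2880 (1000BASE-SX)", 4, 169),    # optics for the OOB management network
]

for item, qty, price in bom:
    print(f"{item:30} {qty:2d} x {price:6d} = {qty * price:6d}")

subtotal = sum(qty * price for _, qty, price in bom)
print("subtotal:", subtotal)              # 78301
print("per rack:", round(subtotal / 3))   # 26100

```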
As you can see, the costs are roughly half those of the Cisco Nexus 5000-based solution: just over US$26K per rack (list pricing) versus just over $50K per rack in Chris's favored design. The total oversubscription ratio is the same as in Chris's design as well, although we have 4x40G links going to the core here instead of his 16x10G. In any case, the 40G links can be broken out into 4x10G with a splitter cable if your core is not capable of 40G, or if you want to use a "wider" Clos-style architecture with more than two core/spine switches. You'll need layer 3 at the access layer, or TRILL, to take advantage of that design, though.
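
For reference, here's a small sketch of the oversubscription math implied above (it assumes the 48 dual-homed servers per pod from the earlier port-budget check):

```python
# Oversubscription for one pod: server-facing bandwidth vs. uplink bandwidth to the core.
server_bw = 48 * 2 * 10      # 48 dual-homed servers x 10 Gb/s per port = 960 Gb/s
uplink_bw = 4 * 40           # 4 x 40G uplinks (2 QSFP+ per switch)     = 160 Gb/s
print(f"{server_bw / uplink_bw:.0f}:1")  # 6:1, matching the 16 x 10G uplink design

# A 40G port can be split into 4 x 10G with a breakout cable, so the available
# uplink bandwidth is unchanged if the core can only do 10G.
assert 4 * 40 == 16 * 10
```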

Note also that the control and management planes of the two Arista 7050S switches remain independent for resiliency: this is not a "stacked" configuration. There is also 80 Gbps available between the switches on the MLAG peer link in the event of uplink or downlink failures that cause traffic to transit the peer link (which it does not do in normal operation). Assuming, as Chris does, that your core switches support MLAG/vPC, the uplinks are all active and form a 4x40G port channel to the network core.
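
To put a rough number on that failure case, here's a sketch (the uplink and peer-link counts come from the BOM; the scenario itself is just an illustration, not anything specific to Arista's MLAG implementation):

```python
# One failure scenario: a switch loses both of its core uplinks, so traffic from
# servers hashed to it must cross the MLAG peer link and exit via the other switch.
PEER_LINK_BW = 2 * 40         # 2 x 40G QSFP+ Twinax between the ToR pair = 80 Gb/s
UPLINKS_PER_SWITCH = 2 * 40   # 2 x 40G QSFP+ to the core per switch      = 80 Gb/s

pod_uplink_normal = 2 * UPLINKS_PER_SWITCH                   # 160 Gb/s, both switches healthy
pod_uplink_degraded = min(PEER_LINK_BW, UPLINKS_PER_SWITCH)  # 80 Gb/s via the surviving switch
print(pod_uplink_normal, pod_uplink_degraded)                # 160 80
```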

Finally, if you want to do 10GBASE-T instead of SFP+/Twinax to the servers, you can get away with spending less than $20K per rack! Arista's 7050T-64 switch is basically the same as the 7050S-64, but with 48 10GBASE-T ports instead of the 48 SFP+ ports, and it lists for just $20,995. If you assume, like everyone else, that servers will soon have 10GBASE-T "for free" on the motherboard, that is probably the way to go.

Full disclosure: I am not affiliated with any networking equipment vendor in any way, except as a small customer. I might indirectly own stock in one or more of the companies mentioned here via mutual funds, but if I do, I am unaware of it. I pay mutual fund managers to make those decisions for me, thereby stimulating the luxury-sedan/yacht/country-club sector of the economy.


[1] Except Cisco, of course. They want $48K (list) for their Nexus 3064, which is based on the older "non-plus" version of the Broadcom Trident and will therefore never support TRILL or DCB in hardware.

Comments

Anonymous said…
I'm a Cisco guy? Ugh.

My posts tend to be about what I'm working on, and yeah... My customers tend to be blue.

It's a small nit, but it was important in my layout: I included optics for the "core" end of the links in all of my pricing exercises.

It's not important here, except to make a fair price comparison, but it mattered in my examples because different designs called for wildly differently priced optics.

Throw in another $8300 for core-end QSFP and SX modules and we'll be on the same page :-)

Also interesting: Cisco and Arista don't seem to agree about which end of the switch is the "front".

Cisco: Front is the end with the power supply

Arista: Front is the end with the ports.

Awesome, thanks guys.

It may be the bias of which I've been recently accused, but I'm more sympathetic to the Cisco view on this point. These are top-of-rack switches. Nobody is going to claim that the wiring side of a server rack is "the front".

Nice counter example!

/chris
RPM said…
@chris I updated to include core optics for both the data and OOB management networks, to make the prices comparable to yours (although I think you assume a copper run for the management net in your design, but that's not material in the total pricing).
Anonymous said…
Ahh. "Teensy-weensy bit of bias." That feels much better :-)
/chris
