Tangle of specs snarls computing I/O road map

By Rick Merritt
EE Times
January 24, 2003 (12:55 PM EST)

MONTEREY, Calif. -- Thanks to the downturn and a few too many ambitious entries, the road map for I/O technology in the data center is as twisted as a wiring closet.

But amid the clutter, a few trends emerged this past week at the Server I/O conference here. InfiniBand probably will stay in a small but strategic niche and live to see a significant performance upgrade in 2006. On the motherboard, PCI Express is off to a slow start, while the competing PCI-X will likely be huge in 2004, then slowly fade. And the battle between Ethernet and Fibre Channel in storage networks won't kick into high gear until 2005.

On this spaghetti trail, IBM Corp. and Hewlett-Packard Co. are revving up a high-octane version of Ethernet as a unifying interconnect for some applications. However, the so-called R-NICs they are helping to define won't hit the market until 2005, and their underlying technology is already being criticized as too expensive and too slow.

Only IBM provided substantive details of its server I/O intentions at the conference. All IBM servers will adopt the PCI-X 2.0 parallel bus for direct-attachment slots, the company said, probably migrating to the PCI Express serial interconnect after the 533-MHz generation of PCI-X. Express also will appear in IBM's low-end X series servers as a link for external I/O expansion. However, all other IBM servers, including the P, I and Z series boxes, will use InfiniBand for both I/O expansion and clustering, supplanting three IBM proprietary interconnects used today.

"Just from that point, InfiniBand was a good investment for IBM," said Renato Recio, a distinguished engineer at the company. Eventually, IBM sees an enhanced version of Ethernet taking over storage networks and some low-end clustering jobs.

In a similar vein, HP said it will adopt PCI-X 2.0 for its servers, probably until a new form-factor module for Express emerges in 2005 or later. HP has also said it will use InfiniBand for clustering distributed databases and technical applications, though it has not committed to winnowing down its portfolio of more than three proprietary cluster interconnects.

For its part, Dell Computer Corp. remains committed to adopting PCI Express in servers as soon as it becomes available in 2004 and will probably avoid a move to PCI-X 2.0, said Jimmy Pike, a senior server architect at Dell. The company also plans to use InfiniBand for its server clusters.

Chip support

The debate over PCI-X vs. Express pits chip set maker ServerWorks Corp. against Intel Corp., each claiming its I/O choice is the lower in cost.

Intel is studying whether to support PCI-X 2.0 but has not made a commitment yet. Observers said the company could offer PCI-X-to-Express bridge chips once that standard is finalized later this year.

For its part, ServerWorks (Santa Clara, Calif.) said it will take its first step toward serial I/O with a 6.25-Gbit/second version of its proprietary IMB chip-to-chip bus in its next-generation chip set. That move may mark ServerWorks' effort to align with the serial Express group, which is heatedly debating 5 or 6.25 Gbits/s as its next-generation data rate.

OEMs such as HP and IBM downplay the PCI battle as one that's invisible to their users and one that will ultimately be rendered moot by integrated silicon. Instead, they point to their ambitious efforts as part of the RDMA Consortium to bring remote direct memory access, one of the key techniques of InfiniBand, to Internet Protocol (IP) networks.

The RDMA group's 1.0 spec, completed late last year, should emerge in so-called RDMA network interface card (R-NIC) silicon in 2004 and hit cost points suitable for challenging Fibre Channel in storage-area nets in 2005. The R-NICs will include TCP offload engines.

IBM developed about eight enhancements to RDMA as part of the IP work. These additions generally cut the amount of interplay among an application, operating system and adapter card for tasks such as buffer management and memory registration.
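The gist of those enhancements can be sketched in a toy model: once a buffer is registered with the adapter, a remote peer can deposit data into it directly, so the per-message cost of involving the operating system and application disappears and the only OS interaction left is the one-time registration. The sketch below is a conceptual simulation only; the names (MemoryRegion, Adapter, rkey, rdma_write) mimic RDMA ideas and are not any real verbs API.

```python
# Toy model of why RDMA memory registration cuts OS/adapter interplay.
# Illustrative only -- class and method names mimic RDMA concepts
# (registration, rkey, one-sided writes), not any real verbs API.

class MemoryRegion:
    """A buffer registered once with the (simulated) adapter.

    Registration hands out an rkey; after that, a remote peer can
    place data into the buffer directly, with no per-message
    involvement from the local OS or application."""
    _next_rkey = 1

    def __init__(self, size):
        self.buf = bytearray(size)
        self.rkey = MemoryRegion._next_rkey  # remote access key
        MemoryRegion._next_rkey += 1

class Adapter:
    """Simulated R-NIC: tracks registered regions by rkey."""
    def __init__(self):
        self.regions = {}
        self.os_interactions = 0  # OS/application touches per transfer

    def register(self, mr):
        self.regions[mr.rkey] = mr  # one-time setup cost

    def socket_style_recv(self, mr, offset, data):
        # Conventional path: the OS copies into a kernel buffer, then
        # the application copies it out -- two touches per message.
        self.os_interactions += 2
        mr.buf[offset:offset + len(data)] = data

    def rdma_write(self, rkey, offset, data):
        # One-sided RDMA: the adapter places data straight into the
        # pre-registered buffer; zero OS/app touches per message.
        mr = self.regions[rkey]
        mr.buf[offset:offset + len(data)] = data

nic = Adapter()
mr = MemoryRegion(64)
nic.register(mr)

for i in range(8):  # eight messages over the socket-style path
    nic.socket_style_recv(mr, i, b"x")
print(nic.os_interactions)   # 16 touches for 8 messages

nic.os_interactions = 0
for i in range(8):  # the same traffic via simulated RDMA writes
    nic.rdma_write(mr.rkey, i, b"y")
print(nic.os_interactions)   # 0 -- the cost was paid once, at registration
```

The point the model makes is the one the article's sources make in words: the setup work (registration) is paid once, while the per-message path bypasses the OS entirely, which is where RDMA's efficiency claims come from.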

IBM is proposing that the InfiniBand Trade Association also adopt the RDMA enhancements, hoping to create a single RDMA technology common to both IP and InfiniBand. IBM's Recio said he expects those additions could be combined with a doubling of InfiniBand's current 2.5-Gbit/s base data transfer rate as part of a major upgrade of the spec for systems shipping in 2006 and beyond.

Stirring the pot further, the RDMA group is pushing forward a handful of new application programming interfaces through groups such as the Interconnect Software Consortium (www.opengroup.org/icsc) to create a common base of RDMA software for InfiniBand and Internet Protocol. The APIs probably will require new versions of key data center applications.

Backers think the resulting R-NICs and APIs could become the driver for moving storage networks to Ethernet and serve some low-end clusters as well.

However, the R-NIC silicon will likely be more complex than today's InfiniBand host controllers, according to Recio and others. The IBM engineer also worries that the lack of a standard interface for TCP offload engine (TOE) chips could slow the growth of those parts. TOEs are set to debut later this year as discrete chips.

"There is a big need for a TOE interface standard. If everyone uses a different interface, there's going to be a real problem getting traction for TOE," Recio said.

Questionable performance

Meanwhile, a senior Oracle Corp. manager said that company's internal tests on early InfiniBand adapters cast doubt on the performance benefits of RDMA. "It looks like RDMA may provide 10 percent better performance than today's proprietary clustering interconnects," said Angelo Pruscino, who manages Oracle's clustering software. "This has been a very surprising result. We are still committed to RDMA, but from a purely performance perspective I am not convinced it is going to be there. The cost is high for InfiniBand today, and the performance doesn't justify it."

Bernd Winkelstrater, a senior technology analyst with server maker Fujitsu Siemens Computers (Paderborn, Germany), said internal tests at his company validate the Oracle results.

"We have completely different numbers for IBM's DB2 database," Recio countered.

Michael Krause, senior I/O architect for HP, said the RDMA Consortium's work will address the performance criticisms, which he dismissed as limited to specific implementations using Oracle's cluster software. "Angelo Pruscino at Oracle will have a whole new take on RDMA in two years," Krause promised. "RDMA is also about enabling new functions, not just performance."

Costly mistake?

The ambitious RDMA effort comes as many in this sector are still smarting from their disappointment over InfiniBand. The technology once aspired to be a mainstream interconnect in the data center, but it was broadsided by what some said was a combination of the downturn and technical missteps.

"The industry spent hundreds of millions on InfiniBand and Intel spent a large fraction of that," said Irving Robinson, an engineering director for server chip sets at Intel and a former chief architect for InfiniBand.

But in the face of the downturn, several companies realized that end users would not adopt a new technology like InfiniBand at a time when their IT budgets were getting squeezed.

Intel was one of the first to make major cuts in InfiniBand programs, shifting some engineers, like Robinson, into server chip sets. While InfiniBand chips held out little hope for immediate sales, Intel's high-margin multiprocessing Xeon CPU, dubbed Foster, was delayed getting to market by five quarters, in large part for lack of a chip set, said Tom MacDonald, another lead InfiniBand manager who now heads Intel's server chip set group.

When the market went south, less well-heeled companies found it difficult to pony up the money required for a complex, 130-nanometer mask set needed to make InfiniBand host controllers.

The InfiniBand spec itself was overly complex with too many implementation options, something the RDMA group is trying to address with its R-NICs, said HP's Krause.

Yet about the time the InfiniBand spec was set in stone, developers like Recio realized they had made a strategic mistake in not allowing an option for a memory-mapped version that would have made the migration easier for adapter makers accustomed to PCI.

The bold new I/O concepts in InfiniBand proved too daunting for the tiny R&D budgets of the strategic, but relatively small, card-making companies.

"We were arrogant," said Intel's Robinson, who still holds out hope that InfiniBand can establish itself as a standard interconnect for clusters, helping drive low-cost X86 servers into the high-margin space now owned by IBM, Sun and HP.

"Some people say the window for InfiniBand has closed because Gigabit Ethernet is here today and that's good enough. But I tend to side with those who say the window won't close until 10G Ethernet arrives," Robinson said. "I spent two years of my life on InfiniBand. My name is on that spec and I'd love to see it succeed."

Market test coming

"InfiniBand needs to prove itself in the market, something which will happen in the second half of this year. Companies like IBM and Sun have a commitment to InfiniBand," said Kevin Deierling, vice president of product marketing for Mellanox Technologies (Santa Clara), one of the few remaining merchant-market providers of InfiniBand chips. (IBM and Sun are expected to procure many of their InfiniBand chips from internal designs.)

Mellanox claims it has shipped 50,000 InfiniBand ports to date, many of them to Network Appliance for its network-attached-storage systems.

"I tend to believe InfiniBand will remain a niche. It won't be the data center backbone or the storage interconnect," said HP's Krause.

Meanwhile, Intel was a favorite whipping boy at the Server I/O conference for pushing its PCI Express initiative on the heels of the InfiniBand debacle. The InfiniBand push, in its turn, came not long after Intel's failed I2O interconnect initiative.

Copyright 2004 CMP Media LLC.
