The evolution of Intel® Architecture (IA) processors spans a broad range of innovations, from new instructions and hardware acceleration to multi-core architectures. The newest processors in the embedded family, the Intel® Xeon® 5500 Processor Series, rely on the microarchitecture codenamed Nehalem, which introduced a new system architecture called Intel® QuickPath Technology. The centerpiece of the new platform is the Intel® QuickPath Interconnect (QPI), which provides point-to-point links for I/O and interprocessor communications and enables a dedicated memory controller for each processor. The scalable QPI architecture can deliver 6.4 Gigatransfers/second. That bandwidth lets designers get the most out of multi-core, multiprocessor designs, and it supports designs that use the Non-Uniform Memory Access (NUMA) architecture to optimize access to shared memory.
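To put the 6.4 Gigatransfers/second figure in terms of bytes, here is a rough back-of-the-envelope sketch. It assumes that a full-width QPI link carries 16 bits (2 bytes) of data payload per transfer in each direction; that payload width comes from Intel's public QPI material, not from anything stated in this post, and the function and constant names are purely illustrative.

```python
# Back-of-the-envelope QPI bandwidth estimate.
# Assumption (not stated in this post): a full-width QPI link carries
# 16 bits (2 bytes) of data payload per transfer in each direction.

TRANSFER_RATE_GT_S = 6.4          # gigatransfers per second, from the text
PAYLOAD_BYTES_PER_TRANSFER = 2    # assumed: 16 data bits per transfer
DIRECTIONS = 2                    # a link has one path in each direction


def qpi_bandwidth_gb_s(rate_gt_s,
                       payload_bytes=PAYLOAD_BYTES_PER_TRANSFER,
                       directions=DIRECTIONS):
    """Return the theoretical peak bandwidth of one link in GB/s."""
    return rate_gt_s * payload_bytes * directions


if __name__ == "__main__":
    one_way = qpi_bandwidth_gb_s(TRANSFER_RATE_GT_S, directions=1)
    both_ways = qpi_bandwidth_gb_s(TRANSFER_RATE_GT_S)
    print(f"Per direction:   {one_way:.1f} GB/s")
    print(f"Both directions: {both_ways:.1f} GB/s")
```

Under that assumption, the theoretical peak for one link works out to roughly 25 Gigabytes/second when both directions are counted. Real throughput will be lower once protocol overhead is taken into account.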


The NUMA concept embraces shared memory between multiple processors and/or cores, but also acknowledges that all processors need faster access to local memory for optimum system performance. In a NUMA system, any core can access any memory in the shared-memory system, but access to memory connected to another core or processor will take longer than access to local memory. QPI is vital to supporting NUMA designs in IA-based systems.
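The local-versus-remote tradeoff can be illustrated with a toy model. The sketch below computes an average memory access latency as a weighted mix of local and remote accesses. The latency numbers are made-up placeholders chosen only for illustration, not measurements of any Xeon 5500 system, and the function name is hypothetical.

```python
# Toy model of average memory latency in a NUMA system.
# The latency values are illustrative placeholders, not measured data.

LOCAL_LATENCY_NS = 60.0    # hypothetical latency to local memory
REMOTE_LATENCY_NS = 100.0  # hypothetical latency to another processor's memory


def average_latency_ns(local_fraction,
                       local_ns=LOCAL_LATENCY_NS,
                       remote_ns=REMOTE_LATENCY_NS):
    """Weighted average latency given the fraction of accesses that are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Compare workloads with different degrees of memory locality.
    for fraction in (1.0, 0.75, 0.5):
        latency = average_latency_ns(fraction)
        print(f"{fraction:.0%} local accesses: {latency:.1f} ns average")
```

The point of the model is simply that the more of its data a core finds in local memory, the closer its average latency gets to the local figure, which is why NUMA-aware software tries to keep data near the core that uses it.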


QPI has introduced a significant shift in the look of IA-based system platforms. For years, IA-based systems have used a memory controller IC called the Northbridge that linked to each processor via a bi-directional data bus called the Front Side Bus (FSB). In a Symmetric Multiprocessing (SMP) system, the Northbridge linked the cores to a single array of shared memory. While Intel continually ramped FSB speeds, supported multiple FSBs in a system, and added more cache local to each core, the FSB still gated SMP performance.


New processors such as the Xeon 5500 series integrate a memory controller on chip that connects each processor directly to as much as 144 Gbytes of DDR3 memory. Meanwhile, the point-to-point QPI links each processor to an I/O Hub that in turn links I/O resources and other processors. In aggregate, the QPI links in the 5500 series shipping today deliver 25 Gigabytes/second of total bandwidth. The diagram below offers a simple look at QPI.


QPI will certainly find use in applications such as servers, but the value that it brings to embedded applications may be even more exciting. You can already buy QPI-enabled hardware from a number of vendors. For example, Kontron, a Premier member of the Intel® Embedded Alliance, has several QPI-based products, including the AT8050, which is based on an L5518 processor. The AT8050 targets communication applications and is built on the Advanced Telecommunications Computing Architecture (ATCA) form factor, which was conceived for exactly those applications. Kontron notes that the QPI-based architecture delivers 3.5x the bandwidth of prior-generation processors, and I/O bandwidth is vital in communication applications.


I'll recommend a number of links where you can find more information on QPI and processors that support the technology:


Intel has an excellent whitepaper on QPI.


You also might find a review of the Nehalem microarchitecture informative.


The Intel® Embedded Design Center has an interactive web page dedicated to the Xeon 5500 processor.


Finally, a PDF of the platform briefing for the Xeon 5500 processor and 5520 chipset provides more details.


I have no doubt that QPI will deliver more powerful multiprocessing systems. Standard applications from the IT space will certainly benefit, and existing embedded applications in areas like communications will get an immediate performance boost from QPI as multiple processors share the I/O load more efficiently.


I also believe that the embedded space is rife with unique applications where QPI will provide unexpected benefits. For example, I envision imaging systems that move data far more efficiently from image capture hardware into memory for processing, and even allow multiple processors to share the load more easily. Indeed, QPI will afford design teams I/O performance unmatched in any other general-purpose processor family.


What I and other followers of the Intel Embedded Alliance would like to know is how you are taking advantage of QPI, or how you hope to in your next design. I'd go so far as to speculate that someone will ultimately integrate QPI directly into a custom chip to link their hardware directly to a Nehalem processor. Have you evaluated or used QPI? Please share your thoughts via a comment.


Maury Wright

Roving Reporter (Intel Contractor)

Intel® Embedded Alliance