Since the early 1990s, the COTS High Performance Embedded Computing (HPEC) world has been dominated by processors in the sub-25 watt range. Digital signal processors, or processors with special DSP-like processing elements such as the Intel® i860, were the typical choice of system engineers. They offered optimal megaflops-per-watt, which is generally considered the critical unit of measure for an HPEC system processor. Heat dissipation is also an issue for HPEC systems, so a lower power target is always a benefit.
Over the last five years this pattern has changed with the introduction of FPGAs and graphics processors (GPUs) into the mix. These components often broke the 25 W limit normally imposed by board designers, but brought a huge jump in processing power over available microprocessors or digital signal processors (DSPs) with similar power consumption, giving them a vastly superior megaflops-per-watt figure compared with their more traditional competition.
The downside of both the earlier DSP/DSP-like processors and the later FPGA/GPU options is the development effort they demand. Applications developed on traditional HPC supercomputers (High Performance Computing, as opposed to HPEC) often had to be redesigned and rewritten to run on these HPEC platforms, and couldn’t use many of the tools available to the HPC community. This formed a costly disconnect between the scientists developing algorithms and applications, and the engineers tasked with developing deployable systems.
A recent development by Intel has the potential to change that. At first glance, the Intel® Xeon® processor E5-2400/2600 v2 family’s 50 W to 115 W power consumption would appear too high to be useful for an HPEC system. A deeper look, however, uncovers an architecture and feature set that has so far convinced suppliers like Mercury Systems, Trenton, and ADLINK Technology that this processor is a compelling platform for HPEC applications.
Why Intel® Xeon® Processor E5-2400/2600 v2 for HPEC?
If we look at the basic feature set for the Intel Xeon processor E5-2400/2600 v2 we see the following features that contribute to HPEC performance:
- Up to 10 cores (each capable of running two threads) ranging from 1.9 to 2.8 GHz (although clock rates higher than 1.9 GHz may consume too much power to be useful for an HPEC system).
- Intel® Advanced Vector Extensions (Intel® AVX), a 256-bit vector SIMD engine for each core.
- Large, 25 MB cache shared between the cores.
- Three or four memory channels of DDR3-1866, providing up to 14.9 GB/s per channel (59.7 GB/s raw memory bandwidth with four channels).
- Two QuickPath Interconnect (QPI) links to a second processing node, each providing 8 GT/s (giga-transfers per second, or 32 GB/s), for a total of 64 GB/s of bandwidth between processing nodes.
- Up to 40 PCI Express Gen 3.0 lanes, although for HPEC applications these would most likely be configured as dual x16 interfaces, with each lane running at 8 GT/s (approximately 15.75 GB/s effective bandwidth per direction for each x16 interface).
In addition, the Intel Xeon processor E5-2400/2600 v2 family chipset includes support functions such as an integrated Gigabit Ethernet controller, USB and SATA ports, and an additional 8 lanes of PCIe Gen 2.0.
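The memory and PCIe figures in the list above follow from simple peak-rate arithmetic, which the following minimal sketch reproduces. The 64-bit DDR3 channel width, the 8 GT/s per-lane PCIe Gen 3.0 rate, and the 128b/130b encoding are standard values assumed here rather than taken from the article, and the function names are illustrative only. These are theoretical peaks, not measured throughput.

```python
# Peak-bandwidth arithmetic for one processor (illustrative only).

DDR3_TRANSFERS_PER_SEC = 1866e6     # DDR3-1866, nominal transfer rate
DDR3_BYTES_PER_TRANSFER = 8         # ASSUMPTION: standard 64-bit channel

PCIE_GEN3_TRANSFERS_PER_SEC = 8e9   # ASSUMPTION: Gen 3.0 rate per lane
PCIE_GEN3_ENCODING = 128 / 130      # ASSUMPTION: 128b/130b line encoding


def memory_bandwidth_gb_per_sec(channels):
    """Raw DDR3-1866 bandwidth in GB/s for the given number of channels."""
    return channels * DDR3_TRANSFERS_PER_SEC * DDR3_BYTES_PER_TRANSFER / 1e9


def pcie_gen3_gb_per_sec(lanes):
    """Effective one-direction PCIe Gen 3.0 bandwidth in GB/s."""
    bits_per_sec = lanes * PCIE_GEN3_TRANSFERS_PER_SEC * PCIE_GEN3_ENCODING
    return bits_per_sec / 8 / 1e9


if __name__ == "__main__":
    print("per channel :", round(memory_bandwidth_gb_per_sec(1), 1), "GB/s")
    print("4 channels  :", round(memory_bandwidth_gb_per_sec(4), 1), "GB/s")
    print("PCIe x16    :", round(pcie_gen3_gb_per_sec(16), 2), "GB/s")
```

With four channels the sketch arrives at the 59.7 GB/s raw memory bandwidth quoted above, and an x16 Gen 3.0 link works out to a figure in the gigabytes-per-second range per direction.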
Figure 1: Intel® Xeon® processor E5-2600 v2 two-processor architecture block diagram.
What These Features Mean for HPEC Systems
A classical application of an HPEC system is to perform large, two-dimensional Fast Fourier Transforms (FFTs) or their inverse, the iFFT. This computation (along with many other digital signal processing algorithms) depends heavily on multiply-accumulate arithmetic in the so-called “butterfly” operation. Given this, the Intel AVX extensions of the Intel Xeon processor E5-2400/2600 v2 family are critical to raw calculation performance.
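To make the butterfly concrete, here is a minimal sketch of a radix-2 butterfly written out in real arithmetic so that the multiply-and-add pattern is visible. It is a scalar illustration only; the function names `butterfly` and `twiddle` are hypothetical, and a production FFT would apply the same pattern to many values at once through vector instructions or a tuned library.

```python
import cmath


def twiddle(k, n):
    """Twiddle factor exp(-2*pi*i*k/n) used by an n-point FFT stage."""
    return cmath.exp(-2j * cmath.pi * k / n)


def butterfly(a, b, w):
    """Radix-2 butterfly: returns (a + w*b, a - w*b).

    The complex product w*b is expanded into four real multiplies
    combined by an add and a subtract, which is the multiply-accumulate
    pattern that SIMD units are built to speed up.
    """
    t_re = w.real * b.real - w.imag * b.imag
    t_im = w.real * b.imag + w.imag * b.real
    t = complex(t_re, t_im)
    return a + t, a - t


if __name__ == "__main__":
    # A 2-point DFT is a single butterfly with twiddle factor 1.
    print(butterfly(1 + 0j, 2 + 0j, twiddle(0, 2)))
```

An N-point radix-2 FFT performs (N/2)·log2(N) of these butterflies, which is why the throughput of this one small operation dominates the overall calculation time.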
Raw calculation performance is only a part of the story, however. Calculation latency minimization is the real aim of HPEC, and a multi-megapoint (1Kx1K or greater) 2D FFT takes a lot of calculations. Therefore, the only way to minimize calculation latency is to break the problem up across multiple processors. The problem with performing a 2D FFT/iFFT across multiple processing cores is that a distributed matrix transpose (or “corner turn” as it’s known in DSP circles) is needed halfway through the calculation. This requires a great deal of data movement between processors and core caches, which means that calculations essentially halt until the data movement is completed. Several features of the E5 v2 family lend themselves to optimizing this operation, including:
- Large last-level cache, which minimizes memory reads or flushes.
- Four independent banks of high-speed DDR3 memory, which, when managed appropriately and combined with the last-level cache, minimize memory overheads.
- Dual QPI channels, giving each core direct high-speed access to the other processor’s memory.
- Dual x16 Gen3 PCIe, to maximize data transfer performance between boards.
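The role of the corner turn is easier to see in code. The sketch below is a single-threaded, pure-Python illustration of the row-column 2D FFT: transform every row, transpose, transform every row again, and transpose back. It is not how a deployed system would implement it; the function names are hypothetical, sizes are assumed to be powers of two, and in a real multiprocessor system each row FFT would run on a different core while the transpose becomes the all-to-all data movement described above.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def corner_turn(matrix):
    """Matrix transpose: rows become columns."""
    return [list(column) for column in zip(*matrix)]


def fft2d(matrix):
    """2D FFT as two passes of independent 1D row FFTs."""
    pass1 = [fft(row) for row in matrix]   # rows are independent work items
    turned = corner_turn(pass1)            # the data-movement-heavy step
    pass2 = [fft(row) for row in turned]   # former columns, now rows
    return corner_turn(pass2)              # restore the original orientation


if __name__ == "__main__":
    impulse = [[0, 1, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]
    for row in fft2d(impulse):
        print(row)
```

Both FFT passes parallelize trivially because each row is independent. Only the corner turn forces every worker to exchange data with every other worker, which is why cache size, memory bandwidth, and interconnect speed matter as much as raw arithmetic throughput.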
Looking at the practical limits of a VPX, ATCA, or similar form factor based HPEC system, we can set our power consumption limits at 160-200 W per slot. Applying that to the natural two-processor architecture of the Intel Xeon processor E5-2400/2600 v2, we see that the Intel Xeon processor E5-2648L is a natural candidate (although the Intel Xeon processor E5-2628L or the Intel Xeon processor E5-2468L may be alternatives if cost is a driving factor). An Intel Xeon E5-2648L two-processor system provides:
- 20 cores at 1.9 GHz on a shared high-speed bus
- Eight DDR3 memory channels with 50 MB of last-level cache
- Four channels of 16-lane PCIe for inter-board communications
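As a rough sanity check on what such a two-processor system could deliver, the sketch below estimates peak single-precision throughput and megaflops-per-watt. The core count and clock rate come from the list above. The 16 FLOPs per cycle per core (one 8-wide AVX add plus one 8-wide AVX multiply) and the 70 W per-processor power figure are assumptions that do not appear in the article, so treat the result as an upper bound on paper rather than a benchmark.

```python
# Rough peak-throughput estimate for a two-processor board (illustrative).

CORES = 20               # from the list above: 2 processors x 10 cores
CLOCK_HZ = 1.9e9         # from the list above

# ASSUMPTION: one 8-wide single-precision AVX add and one 8-wide AVX
# multiply issued per core per cycle, i.e. 16 FLOPs per cycle.
FLOPS_PER_CYCLE = 16

# ASSUMPTION: 70 W thermal design power per processor.
PROCESSORS = 2
WATTS_PER_PROCESSOR = 70


def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak floating-point operations per second."""
    return cores * clock_hz * flops_per_cycle


def megaflops_per_watt(flops, watts):
    """Peak megaflops divided by processor power in watts."""
    return flops / 1e6 / watts


if __name__ == "__main__":
    flops = peak_flops(CORES, CLOCK_HZ, FLOPS_PER_CYCLE)
    watts = PROCESSORS * WATTS_PER_PROCESSOR
    print("peak GFLOPS      :", flops / 1e9)
    print("processor watts  :", watts)
    print("peak MFLOPS/watt :", round(megaflops_per_watt(flops, watts)))
```

The estimate counts processor power only. Memory, chipset, and I/O also draw from the per-slot budget, so the system-level megaflops-per-watt figure will be lower.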
We can immediately see that the Intel Xeon processor E5-2400/2600 v2 architecture provides an effective megaflops-per-watt calculation density with extremely fast memory and inter-core communications. However, GPUs, FPGAs, and DSPs also provide similar performance numbers. What, then, makes the Intel Xeon processor E5-2400/2600 v2 a compelling choice for HPEC systems over these alternatives?
We mentioned the downside of using DSPs, FPGAs, or GPUs above. This problem is largely avoided with Intel Xeon processor E5-2400/2600 v2-based systems. John Bratton, Product Marketing Manager at Mercury Systems, says that the ability to leverage traditional HPC tools and codes is a specific goal of their Intel Xeon processor E5-2400/2600 v2-based HDS6602 product, essentially “moving HPC to HPEC”. By using Intel Xeon processor E5-2400/2600 v2-based platforms, system integrators can benefit from faster time to deployment by leveraging a single SMP operating system such as Linux and other latency-reduction HPC tools. It also directly addresses the problem of “model year upgrades” by enabling better performance through future platform integration with little or no software porting effort.
Lastly, this aligns military HPEC systems with Telco and other commercial embedded multicomputer markets that have already embraced Intel Xeon processor-based platforms. Suppliers such as Trenton with their BXT7059 SBC and ADLINK with their aTCA-6250 blade serve these markets, and are now interesting solutions for the HPEC market as well. This has the potential to expand the supplier base for military embedded systems and to drive down system lifecycle costs.
Figure 2: Three embedded solutions leveraging the power of the Intel® Xeon® processor E5-2400/2600 v2 family of processors: the ADLINK aTCA-6250, the Mercury Systems HDS6602, and the Trenton Systems BXT7059. All three share the two-processor architecture that maximizes the processing and communications performance of the Intel® Xeon® processor E5-2400/2600 v2 family.
The Intel Xeon processor E5-2400/2600 v2 family is proving to be a paradigm shift in military high-performance embedded computing. By integrating tens of cores in a tightly connected high-performance communications fabric, with multiple high-speed memories, large caches, fat high-speed I/O pipes, and a friendly and widely supported development environment, these components warrant serious consideration by system integrators and are undoubtedly destined to be widely deployed.
Roving Reporter, Intel® Internet of Things Solutions Alliance