Military and aerospace systems have computationally intensive image and signal processing requirements that call for high floating-point precision. These industries continually seek solutions that can process ever-larger sets of complex data in real time while using less power. Such solutions help optimize the size, weight, and power (SWaP) of their embedded systems.
A big development last year was the introduction of the Intel® Xeon® processor E5 v3 family. These processors include the Intel® Advanced Vector Extensions 2 (Intel® AVX2) instruction set, which can provide significant performance improvements over Intel® Advanced Vector Extensions (Intel® AVX) and Intel® Streaming SIMD Extensions (Intel® SSE), as shown in Figure 1.
Figure 1. Measuring performance on the same processor using Linpack* benchmarks shows a significant performance increase delivered by Intel® Advanced Vector Extensions 2 (Intel® AVX2).
Where the first version of Intel AVX accelerated floating-point compute performance by doubling the size of the floating-point (vector) SIMD registers from 128 to 256 bits, Intel AVX2 goes one better. It extends Intel SSE and Intel AVX with 256-bit integer instructions and also adds support for floating-point fused multiply-add instructions, as well as gather operations. By doubling the number of double-precision floating-point operations per clock cycle, Intel AVX2 can theoretically double the core’s peak floating-point throughput in FLOPS (Figure 2).
Figure 2. FLOPS performance comparison by instruction set.
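The per-cycle doubling can be worked through with a little arithmetic. The sketch below is illustrative only: it assumes two floating-point execution units per core (one add and one multiply unit for Intel SSE and Intel AVX, two fused multiply-add units for Intel AVX2), which is the commonly cited arrangement but is not stated in this article. The function name `dp_flops_per_cycle` is made up for the example.

```python
def dp_flops_per_cycle(vector_bits: int, units: int, flops_per_lane: int) -> int:
    """Peak double-precision FLOPs one core can retire per clock cycle.

    vector_bits    -- SIMD register width in bits
    units          -- floating-point units issuing per cycle (assumed: 2)
    flops_per_lane -- 1 for a plain add or multiply, 2 for a fused multiply-add
    """
    lanes = vector_bits // 64  # a double-precision value occupies 64 bits
    return lanes * units * flops_per_lane


PEAK = {
    "Intel SSE": dp_flops_per_cycle(128, units=2, flops_per_lane=1),
    "Intel AVX": dp_flops_per_cycle(256, units=2, flops_per_lane=1),
    "Intel AVX2": dp_flops_per_cycle(256, units=2, flops_per_lane=2),
}

for name, flops in PEAK.items():
    print(f"{name}: {flops} DP FLOPs per cycle")
```

Under these assumptions the register widening from Intel SSE to Intel AVX accounts for the first doubling, and counting each fused multiply-add as two operations accounts for the second.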
It isn’t all processor magic, though: there is a tradeoff in processor frequency. However, the latest Intel Xeon processor E5 v3 family has a new strategy for eking out the most performance possible at any moment. Let’s look at how this works.
When a processor detects Intel AVX2 instructions, additional voltage is applied to the core. The processor may then run hotter, requiring the operating frequency to be reduced to keep operation within the thermal design power (TDP) limits. The higher voltage is maintained for 1 millisecond after the last Intel AVX instruction completes, and then the voltage returns to the nominal TDP voltage level.
Historically, Intel has specified a marked TDP frequency and a turbo frequency for all workloads. A significant advance with the Intel Xeon processor E5 v3 family is the addition of two new AVX frequencies (Figure 3):
- AVX base
- AVX max all core turbo
Figure 3. The new Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions could operate at or below the marked TDP frequency while still providing up to two times the floating-point throughput.
These two new frequencies deliver three advantages:
- Workloads making extensive use of Intel AVX2 instructions may reduce processor frequency as far down as the AVX base frequency to stay within TDP limits.
- Some workloads using Intel AVX2 instructions can deliver greater performance by achieving a turbo frequency above the AVX base frequency, all the way up to “AVX max all core turbo.”
- Workloads with no Intel AVX2 instructions can operate from the marked TDP frequency up to the “max all core turbo (non-AVX).”
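One way to picture the three cases above is as two frequency bands, one for workloads that use Intel AVX2 and one for those that do not. The sketch below is a toy model, not a description of how the processor actually selects its frequency. The `FrequencyPlan` type and all four GHz values are hypothetical and are not taken from any specific processor.

```python
from typing import NamedTuple, Tuple


class FrequencyPlan(NamedTuple):
    """The four all-core frequency points discussed above, in GHz."""
    avx_base: float
    avx_max_all_core_turbo: float
    marked_tdp: float
    max_all_core_turbo: float

    def all_core_band(self, uses_avx2: bool) -> Tuple[float, float]:
        """Return the (floor, ceiling) band for a workload."""
        if uses_avx2:
            return (self.avx_base, self.avx_max_all_core_turbo)
        return (self.marked_tdp, self.max_all_core_turbo)


# Hypothetical values chosen only to illustrate the ordering of the bands.
plan = FrequencyPlan(
    avx_base=2.1,
    avx_max_all_core_turbo=2.8,
    marked_tdp=2.5,
    max_all_core_turbo=3.0,
)

print("AVX2 workload band:", plan.all_core_band(uses_avx2=True))
print("Non-AVX workload band:", plan.all_core_band(uses_avx2=False))
```

Where a workload lands inside its band depends on the available power and thermal headroom, which this sketch does not attempt to model.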
A key technology here is Intel® Turbo Boost Technology. It orchestrates optimal performance by providing opportunistic frequency increases based on workload, number of active cores, temperature, power, and current. If there is power and thermal headroom, Intel Turbo Boost Technology enables Intel AVX2 workloads to opportunistically run at high turbo frequencies for a performance boost. On the other hand, workloads utilizing a very high percentage of Intel AVX2 instructions may operate closer to the AVX base frequency.
Either way, you’ll see a significant performance increase with Intel AVX2 compared to workloads using Intel AVX instructions on previous generation processors. Intel AVX2 can deliver up to a 1.7x increase in peak GFLOPS (Figure 4).
Figure 4. Calculating the theoretical peak FLOPS using the AVX base frequency results in an up to 1.7x increase in peak GFLOPS on Intel® Xeon® processor E5 v3 family with Intel® AVX2 compared to the previous generation processor (v2) with the same core count and similar frequency using Intel® AVX.
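To see how a figure of roughly 1.7x, rather than a full 2x, can come out of this calculation, consider a rough worked example. The frequencies below are assumptions picked for illustration (a 2.7 GHz previous-generation part and a 2.3 GHz AVX base frequency on the newer part), as are the per-cycle figures of 8 and 16 double-precision FLOPs. None of these values is quoted from the figure.

```python
def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * ghz * flops_per_cycle


# Assumed, illustrative inputs: same core count, similar marked frequency.
v2_avx = peak_gflops(cores=12, ghz=2.7, flops_per_cycle=8)    # Intel AVX
v3_avx2 = peak_gflops(cores=12, ghz=2.3, flops_per_cycle=16)  # Intel AVX2 at AVX base

speedup = v3_avx2 / v2_avx
print(f"v2 peak: {v2_avx:.1f} GFLOPS")
print(f"v3 peak: {v3_avx2:.1f} GFLOPS")
print(f"speedup: {speedup:.2f}x")
```

The per-cycle throughput doubles, but because the comparison uses the lower AVX base frequency, the net gain in this example lands near 1.7x.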
Blades and Boards Delivering These Gains
Members of the Intel® Internet of Things Solutions Alliance offer blades and boards based on the Intel Xeon processor E5 v3 family that can provide Intel AVX2 performance gains. Let’s look at a blade and a board specifically designed for mil/aero applications.
Mercury Systems offers the Ensemble* HDS6603 High Density Server (Figure 5). This powerful open systems architecture (OSA) blade delivers more than one TFLOPS of general processing power in a single OpenVPX* slot. A single-slot, 6U OpenVPX (VITA 46/65) compliant module, the HDS6603 is powered by two 1.8 GHz processors from the Intel® Xeon® processor E5-2600 v3 product family, each with 12 cores, for a total of 1.38 TFLOPS of general-purpose processing power. Configurations can include up to 32 GB of DDR4-2133 SDRAM per processor.
Figure 5. Mercury Ensemble HDS6603 high density server blade.
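The 1.38 TFLOPS figure can be reproduced with simple arithmetic if it is read as a single-precision peak. That reading is an assumption: the sketch below supposes 32 single-precision FLOPs per core per cycle (eight 32-bit lanes, two fused multiply-add units, two operations per fused multiply-add) at the quoted 1.8 GHz clock.

```python
PROCESSORS = 2
CORES_PER_PROCESSOR = 12
CLOCK_GHZ = 1.8

# Assumed per-core, per-cycle peaks with Intel AVX2 fused multiply-add.
SP_FLOPS_PER_CYCLE = 32  # single precision (assumption)
DP_FLOPS_PER_CYCLE = 16  # double precision (assumption)

total_cores = PROCESSORS * CORES_PER_PROCESSOR
sp_tflops = total_cores * CLOCK_GHZ * SP_FLOPS_PER_CYCLE / 1000
dp_tflops = total_cores * CLOCK_GHZ * DP_FLOPS_PER_CYCLE / 1000

print(f"single precision: {sp_tflops:.2f} TFLOPS")
print(f"double precision: {dp_tflops:.2f} TFLOPS")
```

On these assumptions the single-precision result matches the quoted total, and the double-precision peak would be half of it.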
Each processor includes fused multiply-add (FMA) functionality, enabling common radar functions like fast Fourier transforms (FFTs) to be performed twice as quickly. Onboard Gen 3 PCIe* pipes feed the module’s switch fabric interconnects, which are managed by dual Mellanox ConnectX*-3 devices to deliver 40 Gb/s Ethernet or InfiniBand* inter-module data rates. Native Intel® QuickPath Interconnect (Intel® QPI) inter-processor interconnects support virtual cache coherent processor cores to create a true deterministic SMP environment. For rugged applications, Mercury offers air-cooled, rugged conduction-cooled, and Air Flow-By* OpenVPX packaging options.
The HEP8225 HDEC series system host board (SHB) from Trenton Systems features two processors from the Intel® Xeon® processor E5-2600 v3 product family and the Intel® C610 Chipset. The mechanical layout of the HEP8225 SHB is similar to current PICMG* 1.3 system host boards (Figure 6). To reduce performance-robbing latency, this made-in-USA board routes all 80 PCIe links from its two processors down to the double-density PCIe card edge fingers. Additional device I/O and power pins are also available on the board to enable greater system design flexibility in a wide variety of embedded computing applications. This additional I/O also enables a greater level of cable routing efficiency within an HDEC Series system.
Figure 6. Trenton HEP8225 HDEC system host board.
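For a rough sense of what 80 PCIe links can carry, here is a back-of-the-envelope estimate. It assumes every link runs at PCIe Gen 3 rates (8 GT/s per lane with 128b/130b encoding), which the board description above does not state explicitly, and it ignores packet and protocol overhead, so real throughput would be lower.

```python
LANES = 80                     # total PCIe links routed from the two processors
GEN3_GT_PER_S = 8.0            # assumed: PCIe Gen 3 raw transfer rate per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding used by Gen 3

# Usable bits per second per lane, converted to gigabytes per second.
gb_per_s_per_lane = GEN3_GT_PER_S * ENCODING_EFFICIENCY / 8
aggregate_gb_per_s = LANES * gb_per_s_per_lane

print(f"per lane, per direction: {gb_per_s_per_lane:.3f} GB/s")
print(f"aggregate, per direction: {aggregate_gb_per_s:.1f} GB/s")
```

The point of the estimate is only to show the scale of bandwidth that reaches the card edge without passing through an intermediate PCIe switch.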
HEP8225 HDEC series system features include:
- Embedded processor options from the six-core Intel® Xeon® processor E5-2608L v3 up to the twelve-core Intel® Xeon® processor E5-2680 v3; also available is the fourteen-core Intel® Xeon® processor E5-2695 v3
- Up to 128GB of DDR4 memory with four channels per processor
- 2x 10GbE and 2x 1GbE Ethernet ports
- 6x USB 3.0 and 4x USB 2.0 interfaces
- 8x SATA revision 3.0 interfaces
- Baseboard management controller (BMC) for system management
- On-board video, audio and serial interfaces
- New mechanical packaging attributes that are suited for rugged application environments
The HEP8225 SHB can be combined with a number of HDEC Series backplanes and system platforms. These backplanes and platforms enable mil/aero system designers to create solutions that conform to industry standard 2U, 4U, and 5U 19” rackmount form factors, but deliver higher data throughput than traditional systems. Together the SHB and backplane maximize data throughput directly to the processors to reduce latency and reliance on additional PCIe switches.
Discover More Solutions
You will find more mil/aero solutions of all kinds in our Solutions Directory. You can also browse this community for additional insights on handling highly computationally intensive image processing requirements. Let me know if there is a particular topic you’d like us to address.
Roving Reporter (Intel Contractor), Intel® Internet of Things Solutions Alliance