The military and aerospace industries have some of the most demanding imaging requirements of any industry. Many of these applications are highly computationally intensive and require high levels of precision, hence the need for floating-point data formats. These industries have been pioneers in imaging for decades, influencing much of the technology in common use today. One example is the Vector Signal Image Processing Library (VSIPL), a standard set of functions and an open application programming interface (API) for signal and image processing applications.
Intel® Advanced Vector Extensions 2.0 (Intel® AVX2), introduced with the Haswell microarchitecture, take these capabilities to a new level. They double peak floating-point throughput to an impressive 307 billion single-precision floating-point operations per second (GFLOPS) at 2.4 GHz in a quad-core 4th generation Intel® Core™ processor. Fixed-point arithmetic also sees a 2x boost in peak throughput, and both fixed- and floating-point algorithms benefit from new vector gather and permute operations.
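As a worked check on the 307 GFLOPS figure, assume single-precision data and two 256-bit FMA units per Haswell core, with each fused multiply-add counted as two floating-point operations:

```latex
\begin{aligned}
\text{FLOPs per cycle per core} &= \underbrace{2}_{\text{FMA units}} \times \underbrace{8}_{\text{floats per 256-bit register}} \times \underbrace{2}_{\text{FLOPs per FMA}} = 32 \\
\text{peak} &= 4\ \text{cores} \times 2.4 \times 10^{9}\ \tfrac{\text{cycles}}{\text{s}} \times 32\ \tfrac{\text{FLOPs}}{\text{cycle}} = 307.2 \times 10^{9}\ \tfrac{\text{FLOPs}}{\text{s}} \approx 307\ \text{GFLOPS}
\end{aligned}
```

By the same accounting, an Ivy Bridge core issues one 256-bit multiply and one 256-bit add per cycle, or 16 single-precision FLOPs, which is where the 2x comes from. Double-precision peaks are half these values because a 256-bit register holds four doubles instead of eight floats.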
Improvements and Benefits of AVX2
Improvements in Intel AVX2 include:
- Extension of most integer instructions to 256 bits, for 2x higher peak integer throughput, which is particularly useful for image processing workloads.
- New vector gather, variable shift, and cross-lane permute instructions that allow more loops to be vectorized and make loads and stores more efficient. Shift amounts can be set per element by a vector operand, which is critical in vectorized loops with variable shifts.
- Fused multiply-add (FMA) instructions for 2x higher peak floating-point throughput: up to 307 single-precision GFLOPS at 2.4 GHz in a quad-core 4th generation Intel Core processor. These instructions are very useful in high-performance computing, professional-quality imaging, and face detection.
The most important benefit to the military and aerospace industry is the ability to process even larger sets of complex data in real time, faster and with less power, which helps optimize the size, weight, and power (SWaP) of embedded systems. The result is systems that are more effective, more economical, and better able to process the immense streams of real-time data collected by today’s embedded systems.
Members of the Intel® Intelligent Systems Alliance have been working hard to keep in step with the Haswell microarchitecture and to offer enhancements to their own products that bring out the best capabilities of AVX2. Here is a peek at what two members have done to leverage AVX and the improvements they have seen with AVX2.
N.A. Software Ltd (NAS) has a suite of software tools for signal processing, vector processors, and DSP-related applications for military, aerospace, and other industries requiring fast or real-time processing. NAS also develops and licenses advanced radar algorithms and low-level DSP libraries, including VSIPL. NAS has produced a highly optimized Intel AVX2 VSIPL library that performs especially well on complex vector multiply operations, sine/cosine (when the data is not range reduced), and split complex FFTs. The NAS library is standalone code that does not rely on any third-party software, so it can be recompiled quickly and easily for any operating system to get the most out of the Intel AVX2 instruction set.
NAS recently used VSIPL DSP operations on the Ivy Bridge and Haswell platforms to benchmark Intel AVX and AVX2. The results show that the Haswell platform has a significant performance advantage for all the DSP operations over most of the data sizes studied. Figure 1 shows which DSP library and platform produced the best performance.
Figure 1: N.A. Software benchmarks of Intel Advanced Vector Extensions 2.0 compare Ivy Bridge and Haswell architectures (chart: peak GFLOPS for complex vector multiply).
More details on the NAS benchmarking can be found in the following reports.
- VSIPL Benchmarks
- AVX Performance Gain Study: SAR/MTI
- Image Signal Processing Performance on 2nd Generation INTEL® CORE™ Microarchitecture
NAS provides core software to several Intel Intelligent Systems Alliance members, who use it to enhance and complement their own software libraries.
Curtiss-Wright Controls Defense Solutions (CWCDS) offers Continuum Vector, a comprehensive set of C-callable functions which have been optimized to exploit the performance of the SIMD instruction sets of Intel AVX.
“Many more pixels and bits of data requiring more precision are being processed in military and aerospace imaging applications,” commented Eran Strod, systems architect at Curtiss-Wright Controls Defense Solutions. The increases in data and precision drive demand for more processor performance to get the job done. “We expect continuous generation-to-generation improvements from Intel and Haswell has done exactly that,” continued Eran. Support for AVX2 in Continuum Vector is expected in the near future as CWCDS completes integration and optimization of the libraries to maximize the hardware potential of its boards and systems. Eran noted that in early testing, they have seen as much as a 4x improvement in execution times for certain 2D FFTs, though specific improvements are highly dependent on the function and algorithm.
Intel AVX2 in the Haswell microarchitecture continues the performance improvement trend for floating-point data formats that is so important to many imaging applications in the military and aerospace industries. Improvements in many benchmarks exceed 2x over the original AVX numbers, which has developers excited about SWaP improvements and the huge gains in real-time floating-point data processing.