Intel's opening of the Intel® Atom™ E6xx processor interface for Input/Output Hub (IOH) functions creates a new option for designers who are considering adding specialized hardware to their systems. Selecting which functions to implement in dedicated hardware in a customized E6xx chipset can be an involved task, but there are some simple ways to evaluate whether a specific function belongs in a customized chipset. If you do take the customized chipset route, there are easy-to-use software strategies that provide maximum flexibility and permit field fixes for bugs discovered in deployed, dedicated hardware.
Generally, the decision to add specialized hardware to a system is made for one of a few reasons:
- Capability,
- Capacity,
- Cost effectiveness, and
- Security
Capability: A well-known idea in computer science, the Church-Turing thesis, tells us that any computable function can be calculated by a Turing machine given unlimited memory and time. So it is with modern microprocessors: given enough time and memory, we can calculate any function we want. Practical systems have neither infinite time nor infinite memory. Those limits lead us to choose processors that combine a sufficiently compact program and data representation with enough processing power to meet real-world performance needs. As one example of adding capability, the E6xx extended the Atom-based offerings beyond its predecessors by adding hardware-accelerated full-motion video CODEC (coder and decoder) functions. By offering these expanded capabilities, the Atom E6xx widens the range of Intel Architecture solutions for embedded applications.
Capacity: Intel multi-core processors are one way to extend capacity. By adding more cores to the processor chip, Intel adds capacity without requiring any significant change to well-crafted systems software. The potential downside of simply adding more cores to a processor comes from increased power consumption.
Cost Effectiveness: High volume end products can sometimes benefit from increased integration, while other products can only meet their design objectives if customized chip sets are employed. In a recent blog (url) I wrote about how ADI Engineering (1) reduced the area and complexity footprint of an E6xx-based design by eliminating the need to employ a general purpose hub component, replacing it with a purpose-designed chip.
Security: Customized parts can make reverse engineering a design more difficult, which improves systems security and increases the difficulty for “knock off designs” to quickly displace the original design. While no security system is truly “crack proof,” replacing a standard off-the-shelf component with a custom design may increase the difficulty of simply copying the design. In the case of ADI’s hub replacement chip, an FPGA implements the necessary Atom processor I/O functions. Many FPGAs have provisions that permit the device to be programmed to disable reading the device connection details. Reverse engineering techniques exist that allow a highly skilled scientist to decode the FPGA design, but at a significant cost.
Making systems tradeoffs can quickly become a quagmire of competing requirements including comparative power consumption, board size, resource utilization, and systems security – all with interaction with the systems software. Making tradeoffs between hardware and software can be achieved using a few simplifying guidelines:
- Plan on using no more than 80% of total processor performance without careful engineering evaluation. Designs that require more than that tend to take longer and cost more to develop.
- Reserve at least the last 10% of the processor performance for unexpected engineering surprises.
- If possible, engineer the system using the mid-range performance processor in a family. For designs using the E6xx processor, that means choosing a 1.0 or 1.3 GHz processor. Selecting a mid-range performance processor provides the option of using a faster processor should it prove necessary during the initial system implementation, or when additional features are required.
- Select a Real Time Operating System (RTOS) with a lightweight performance footprint to conserve processing power for the application. OSes in general can be deceptively heavy users not only of processor performance but also of other system resources such as memory. In a purely engineering-driven choice, the least resource-demanding OS would be chosen, but other factors such as familiarity and time-to-market are often just as important as the technical requirements of the application.
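The first three guidelines can be turned into a quick sanity check. The sketch below is only an illustration: the function names are hypothetical, the 80% and 10% thresholds come from the list above, and the clock-scaling estimate assumes a CPU-bound workload that scales linearly with clock rate, which real systems only approximate.

```python
def classify_load(load, planning_cap=0.80, surprise_reserve=0.10):
    """Classify an estimated CPU load (fraction of full performance).

    planning_cap:     plan to stay at or below this without extra evaluation.
    surprise_reserve: fraction held back for unexpected engineering surprises.
    """
    if load < 0:
        raise ValueError("load must be non-negative")
    if load <= planning_cap:
        return "within plan"
    if load <= 1.0 - surprise_reserve:
        return "needs engineering review"
    return "over budget"


def load_at_clock(load, current_ghz, new_ghz):
    """Rough load estimate after moving to a different clock rate.

    Assumption: the work is CPU-bound and scales linearly with clock rate.
    Memory and I/O latency do not speed up with the core clock.
    """
    return load * current_ghz / new_ghz
```

For example, an estimated load of 0.85 on a 1.0 GHz part falls in the "needs engineering review" band, while the same workload rescaled to a 1.3 GHz part comes out near 0.65, back within the plan. That margin is the practical benefit of starting with a mid-range processor.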
Making the hardware/software tradeoff: A classic example of a hardware/software tradeoff can be found in Apple’s original floppy disk drive, which was based on the Shugart Associates SA400 5.25” floppy disk drive. Steve Wozniak recognized that part of the 40-chip drive controller could be replaced by a bit of software and some out-of-the-box hardware thinking, reducing the 40 chips to just 8 and saving a lot of money.
Today’s embedded systems software-for-hardware tradeoffs are seldom as simple as the Apple software replacement for parts of a floppy disk controller. The E6xx already has many hardware accelerators built in for common functions: video, audio, display controllers and various timers. These higher level functions serve a number of important markets such as Medical, Surveillance Systems, Digital Signage, Automotive Entertainment, Communications, Streaming Media Player, Deeply Embedded Media Player, and low cost communications products such as wireless-based embedded applications.
There’s an opportunity to include other performance-demanding functions, or to eliminate high-cost hub-based Input/Output control. ADI Engineering has pioneered using Intel’s open standard interface to the IOH for cost reductions in the hub-replacement function. Intel’s choice to repartition the Atom architecture so that the memory controller is on the CPU, but other I/O functions are housed in a second chip, gives engineers more options.
In a past blog, I wrote about implementing a PID controller in the Atom processor. That application had relatively modest performance requirements. Now consider a control system that must run substantially faster than any Atom processor can compute the control parameters. If the control algorithm uses a small number of history states to calculate new parameters, a full hardware implementation is feasible: the processor uses the PCI Express interface to set control parameters, while all coefficients and arithmetic stay in the custom chip.
By keeping the specialized memory locations and arithmetic circuits on a single chip, we minimize power requirements and bus traffic between the CPU and the chip during operation. Power is minimized because we avoid unnecessary movement of data on and off chip. Each transmission off a chip requires driving a signal line, which involves charging the line to a specific voltage, and each charge-discharge cycle of an external bus consumes far more power than an on-chip transfer. By eliminating the need to go off-chip, we both save power and improve performance.
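To make the "small number of history states" point concrete, here is a minimal software sketch of a PID controller in incremental (velocity) form. It is not the controller from the earlier blog; the class and attribute names are hypothetical. The three coefficients stand in for values the processor would write once over the bus, and the two stored errors plus the last output are the only state a hardware version would need to hold on-chip.

```python
class IncrementalPID:
    """Velocity-form PID: u[k] = u[k-1] + a0*e[k] + a1*e[k-1] + a2*e[k-2]."""

    def __init__(self, kp, ki, kd, dt):
        # Coefficients computed once; in a hardware version these would be
        # the "control parameters" written to the custom chip (assumption).
        self.a0 = kp + ki * dt + kd / dt
        self.a1 = -kp - 2.0 * kd / dt
        self.a2 = kd / dt
        # History state: two previous errors and the previous output.
        self.e1 = 0.0
        self.e2 = 0.0
        self.u = 0.0

    def step(self, setpoint, measurement):
        """Advance one sample period and return the new control output."""
        e0 = setpoint - measurement
        self.u += self.a0 * e0 + self.a1 * self.e1 + self.a2 * self.e2
        # Shift the error history for the next sample.
        self.e2, self.e1 = self.e1, e0
        return self.u
```

Each update needs three multiplies and a few adds on values that never leave the controller, which is why this kind of loop maps naturally onto a small block of dedicated arithmetic.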
The challenge for making tradeoffs occurs in the vast middle ground between a full software implementation, and a dedicated bit of circuitry to perform the entire function. To make these tradeoffs it’s best to use both analytical modeling such as using a spreadsheet to estimate resource requirements, and software performance analysis tools.
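The spreadsheet-style estimate can also be written as a few lines of code. The sketch below is one possible model, not a prescribed method: each module is given an estimated processor load when run in software and a residual load (driver and bus overhead) when moved to hardware, and modules are moved in order of largest saving until a target load is met. All names and figures are hypothetical.

```python
def offload_until_target(modules, target_load):
    """Pick modules to move to hardware until total load <= target_load.

    modules: dict mapping name -> (software_load, residual_load_in_hardware),
             both expressed as fractions of total processor performance.
    Returns (names_moved_to_hardware, resulting_total_load).
    """
    load = sum(sw for sw, _ in modules.values())
    moved = []
    # Consider the modules with the largest saving first (greedy heuristic).
    by_saving = sorted(modules.items(),
                       key=lambda item: item[1][0] - item[1][1],
                       reverse=True)
    for name, (sw, hw) in by_saving:
        if load <= target_load:
            break
        load -= sw - hw
        moved.append(name)
    return moved, load
```

With hypothetical estimates of `{"dct": (0.40, 0.02), "filter": (0.30, 0.05), "ui": (0.25, 0.25)}` and a target of 0.80, the model moves only the DCT to hardware and predicts a load of about 0.57. Estimates like these are a starting point and should be checked against measurements from profiling tools.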
Green Hills Software (2) includes a Performance Profiler as part of its MULTI software development suite for the Atom processor. Combining this tool with simulation capability permits engineers to judge the real performance advantage of one design versus another. You can start with a fully software implemented algorithm and gradually replace software modules with hardware implementations until you reach the required systems performance. The ability to save and use test data minimizes the effort required to test alternative designs.
Wind River Systems (3) also provides a Performance Profiler as part of their Workbench product.
The key to using standard off-the-shelf development tools is to implement the proposed hardware function first in software. Use the performance analysis tools of your favorite development platform to establish a benchmark for processor utilization. Next, use systems simulation tools such as Eclipse-based tools to include hardware models in the systems simulation. This allows an additional benchmark to be determined for a part-software-part-hardware implementation. Depending on the external chip used to implement the hardware accelerator, you may be able to use the software implementation as a launching pad to implement the hardware functions, since many FPGA design tools accept C or a C-like language for describing chip functions.
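As a rough illustration of the software-first step, the sketch below implements a naive discrete cosine transform as a software reference and times it to derive a utilization figure. It is a stand-in for a real profiler, not a replacement: the helper names are hypothetical, and timings taken on a development host say little about the target Atom processor, so treat the numbers as relative.

```python
import math
import time


def dct_ii(samples):
    """Naive, unnormalized DCT-II used as the software reference model."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k)
            for i, x in enumerate(samples))
        for k in range(n)
    ]


def seconds_per_call(fn, arg, repeats=1000):
    """Average wall-clock time of fn(arg) over a number of repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(arg)
    return (time.perf_counter() - start) / repeats


def utilization(sec_per_call, calls_per_second):
    """Fraction of one processor consumed at the required call rate."""
    return sec_per_call * calls_per_second
```

The same reference function doubles as a source of golden test data: outputs captured from it can later be compared against the part-software-part-hardware version.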
Field fixes: Updating or fixing hardware bugs can become a nightmare for engineers. But by using less than 80% of the Atom processor performance, engineers can buy time and space for field upgrades. Besides having enough processor performance left for fixes, part of the solution is to develop the external hardware with separable modules.
For example, a full-motion video CODEC can be engineered with only access to the input and output streams. Smart designers will include the ability to use subsets of the full external hardware to reconfigure operations to fix bugs. One way to accomplish this is to create the external functional blocks so that a variable is set in the external hardware that changes based on where in the system the functional block is initiated. A function that calculates a discrete cosine transform could be developed with a flag that is initially set to indicate that the function is initiated by the Atom processor. This variable is then reset to indicate that the function is called by the internal hardware. Software control of the external hardware simply starts the function at an alternate entry point that will result in control being returned to the processor when the function is complete. Using this approach means that an error in a specific function may be bypassed through the technique of controlling the external hardware by the Atom processor at a finer level of control.
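The entry-point idea can be modeled in a few lines. This is a behavioral sketch under stated assumptions, not a description of any real device: the `Accelerator` class, the `cpu_initiated` flag, and the block names are all hypothetical. When the flag is cleared, the hardware chains from one block to the next; when it is set, the block returns control to the processor, which lets software step around a faulty block.

```python
class Accelerator:
    """Behavioral model of external hardware built from separable blocks."""

    def __init__(self, stages):
        self.stages = list(stages)  # ordered (name, function) pairs
        self.names = [name for name, _ in self.stages]

    def start(self, entry, data, cpu_initiated):
        """Start at the named block.

        cpu_initiated=True:  run only that block, then return to the caller.
        cpu_initiated=False: the hardware chains through the remaining blocks.
        """
        index = self.names.index(entry)
        while True:
            data = self.stages[index][1](data)
            is_last = index == len(self.stages) - 1
            if cpu_initiated or is_last:
                return data
            index += 1


def run_with_software_fixes(accel, data, fixes):
    """Drive the blocks one at a time, substituting software for faulty ones.

    fixes: dict mapping a block name -> replacement software function.
    """
    for name in accel.names:
        if name in fixes:
            data = fixes[name](data)  # bypass the faulty hardware block
        else:
            data = accel.start(name, data, cpu_initiated=True)
    return data
```

The finer-grained path costs extra processor time and bus traffic for every block boundary, which is one more reason to keep processor headroom in reserve for field fixes.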
The repartitioning of the Atom processor implementation provides more alternatives for engineers.
What hardware/software tradeoffs do you have?
(1) ADI Engineering is an Associate member of the Intel Embedded Alliance
(2) Green Hills Software is an Affiliate member of the Intel Embedded Alliance
(3) Wind River Systems is an Associate member of the Intel Embedded Alliance
Roving Reporter (Intel Contractor)
Intel® Embedded Alliance