2010

Since the VirtuaLab provides remote access to Intel hardware, we decided to continue developing the control system using the Intel Development kit. Of course, the Intel development kit will eventually be replaced with an ATOM hardware design that contains only the necessary functions for the system.

 

Choosing an embedded software development tool chain may be trivially simple if your company has already standardized on a specific tool chain that supports the ATOM processor. But if the Intel architecture is new to your development team and there is not already a standard embedded development environment chosen, you have several questions to answer:

 

  • Is this design a “one off”?
  • What is the nature of future developments likely to be (large scale software, small scale software, software based on open source)?
  • Does my company have other architectures that would strongly suggest a specific vendor be chosen?
  • What is the anticipated lifespan of this software and product?
  • Does my company have a specific development methodology as a standard?

The off-grid electric controller is a one-off system and can be developed by a small team of one or two programmers in a short period of time. The software complexity is small, consisting of approximately 3,000 lines of C code. Of this C code, about 2,000 lines are preexisting Open Source or Free Software C developments. The projected life is five years, and the system is not expected to be used after that time.

 

Many software projects start with similar assumptions about the life of the software, only to find that the software lives much longer than anticipated and undergoes incremental improvement throughout its actual life. So, basing these decisions only on the specified factors may not be the best business decision. The prudent, conservative choice is to project a longer software life with significant unplanned feature extensions.

 

For the Intel ATOM processor there are three major vendors of embedded development tool chains: Intel, Green Hills Software Inc (1), and Wind River Systems (2). Intel provides a software development toolkit for ATOM that includes a C/C++ compiler for Linux, an Intel-specific debugger (optionally including the open-source Eclipse framework), a JTAG debugger featuring full access to hardware resources, and the VTune performance analyzer. In addition to the compilers, Intel’s development toolkit includes assembly language debug for getting directly at machine capabilities.

 

The Intel compiler provides faster compile and run times, with up to a 30% performance gain over GCC. Faster execution means that less power is consumed. For low-power applications this means extended battery life for battery-powered devices and lower current draw for applications operating from a fixed power supply. Faster application completion and faster execution of performance-critical code allow the ATOM-based device to return to idle mode sooner, decreasing overall power consumption. The compiler also provides an in-order scheduler for the Intel Atom processor, a hardware platform-specific optimization technique that yields an extra performance improvement.
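The link between faster execution and lower energy use can be made concrete with a small calculation. The sketch below is illustrative only: the function name and all wattage and timing figures are hypothetical, and it assumes the faster code draws the same active power as the slower code, which the source does not state.

```c
/* Energy in joules used over one period. The CPU is active for
 * active_s seconds and idle for the remainder of period_s.
 * All inputs are hypothetical illustration values, not measurements. */
double period_energy_j(double active_w, double idle_w,
                       double active_s, double period_s)
{
    return active_w * active_s + idle_w * (period_s - active_s);
}
```

With 2.0 W active, 0.5 W idle, and 0.5 s of work in each 1 s period, the result is 1.25 J. If the same work finished 30% faster (0.5 / 1.3 s active), the result drops to roughly 1.08 J under these assumptions.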

 


 

Green Hills Software Inc offers Intel ATOM support via its MULTI Integrated Development Environment (IDE). A central focus of the MULTI environment is software project management. Project Builder provides the basis for system-wide static validation of modules and build control. Automatic dependency determination further cuts development time by eliminating the need to write and debug makefiles. Application build time can be minimized with the parallel build setting, which lets defined build processes run in parallel to decrease build times on single- or multiple-processor machines. Based on its long involvement with IDE development for multiple processor families, Green Hills provides a whole suite of tightly integrated tools to aid in debugging code.

 

Green Hills Software’s compiler support for Intel’s ATOM processor includes both C and C++. In addition to C-based languages, Green Hills also offers Fortran and Ada. Where Intel offers its development toolkit under several Linux flavors, Green Hills supports Windows in addition to Linux, Solaris, and HP-UX.

 


 

Wind River Workbench includes C and C++ compilers that have application-specific optimization capabilities built in. The compiler uses a wide range of global, local, processor-specific, and application-specific optimization techniques to generate code that runs faster with a smaller footprint. In this case, application-specific means that the compiler uses actual run-time data from the application's execution profile. Whole program optimization permits the compiler to inline functions across multiple modules and source files, significantly boosting performance. Profile-driven optimizations employ the compiler’s capability to instrument the code and collect profile information specific to the application being developed. This information is then fed back into the compiler, enabling it to make optimized decisions when performing function inlining, register allocation, branch prediction, and other optimizations. Each of these optimizations improves application performance and footprint.

 

These optimizations are achieved through a two-stage compilation process. During the first stage, the compiler instruments the code. The instrumented code is then executed on the target or in a simulation environment on a preselected dataset. As it runs, the instrumented code collects execution profile information, which is then fed back into the compiler for the second stage of the compilation.

 

During the second stage, the compiler uses the profile information to further improve optimizations. These optimizations include loop unrolling, inlining, basic block reorganization, register allocation, and branch prediction. Because a typical dataset is supplied during the first phase of the compilation, these optimizations will be highly tuned.
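The two-stage flow can be tried with any compiler that supports profile feedback. The sketch below uses GCC as a stand-in, which is an assumption: the source does not give the Wind River compiler's switches. The file names and the `count_over_limit` function are hypothetical. The GCC options `-fprofile-generate` and `-fprofile-use` correspond to the first and second stages.

```c
#include <stddef.h>

/*
 * Build steps, with GCC standing in for the vendor compiler (an
 * assumption). driver.c is a hypothetical file whose main() calls
 * count_over_limit() on a representative dataset.
 *
 *   gcc -O2 -fprofile-generate pgo_demo.c driver.c -o demo   (stage 1: instrument)
 *   ./demo                                                   (run; profile data is written)
 *   gcc -O2 -fprofile-use pgo_demo.c driver.c -o demo        (stage 2: rebuild using the profile)
 */

/* Counts samples above limit. If the branch is rarely taken on
 * typical data, the profile records that, and the second stage
 * can lay out the code to favor the common path. */
size_t count_over_limit(const int *samples, size_t n, int limit)
{
    size_t over = 0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] > limit)   /* rare path in typical data */
            over++;
    }
    return over;
}
```

The quality of the result depends on how representative the stage-one dataset is, which matches the article's point about supplying a typical dataset.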

 

There are three very good selections for C compilers to produce code for the Intel ATOM processor. Each offers distinct advantages in optimization approaches. The differences for your specific application may be significant. The best way to make a determination is to compile your actual application code in final form.

 

How would you make a compiler selection under these circumstances?

 

________________________________________________________________

  1. Green Hills Software is an Affiliate Member of the Intel Embedded Alliance
  2. Wind River Systems is an Associate Member of the Intel Embedded Alliance

 

Henry Davis
Roving Reporter (Intel Contractor)
Intel(r) Embedded Alliance

Fortunately there are tools available for Intel embedded processors to optimize systems performance. The key is that these tools are geared towards optimizing performance and not maximizing performance as the default condition. Performance analysis is achieved via a series of tools that measure parameters that skilled programmers can use to achieve performance and size goals. These measurements and reports include:

A call graph provides a graphical view of the flow of an application permitting applications developers to gain a higher level view of the operation of the application. This helps to identify critical functions and timing details. Call graph profiling offers a graphical high-level, algorithmic view of program execution. This is achieved based on instrumenting the executable files used to produce function calling sequence data.

 

Time-based and event-based sampling are statistical methods for locating performance bottlenecks while imposing low overhead on the application. Time-based sampling finds “hot spots” that consume a relatively significant amount of CPU time. Event-based sampling helps identify places where cache misses, branch mispredictions, and other performance issues occur.

 


 

Source view displays sampling results line by line against the source or assembly code, helping the programmer associate the data with the program code.

 

A counter monitor provides system-level performance information. This includes resource consumption during the execution of an application.

 

The Intel Thread Profiler gives programmers a timeline view identifying what threads are doing and how they interact. It shows the distribution of work to threads and locates load imbalances.

 

A Performance Tuning Utility (PTU) is an optional function that gives VTune analyzer users access to experimental tuning technology. This includes information like Data Access Analysis that identifies memory hotspots and relates them to code hotspots.

 

Intel Parallel Amplifier is the performance profiler component of Intel Parallel Studio. A VTune user license carries access to Parallel Amplifier. Its statistical call graph, which has lower overhead than VTune's exact call graph, supports concurrency analysis.

 

How these tools are used depends on what your optimization goals are. To obtain maximum performance there are a number of tricks available to programmers. For example, consider the following pseudo code:

 

i = 1

Do

    a[i] = b[i]*c[i]

    i = i + 1

Until i > 27;

 

This code performs one arithmetic operation per iteration through the loop. Setting aside the arithmetic capability of the processor, we also have one loop branch operation per iteration. In the limiting case, this makes a full execution of the code fragment take twice as long as the arithmetic alone. So, to speed up this fragment, programmers will perform a loop unrolling operation. Some compilers can unroll loops automatically according to control switches in the source code. In the extreme case, loop unrolling produces a single line for each arithmetic operation:

 

a[1] = b[1]*c[1]

a[2] = b[2]*c[2]

...

a[27] = b[27]*c[27]

 

In the condensed form, the VTune tool kit will show the loop as a “hot spot” because the iteration causes the arithmetic operation to be counted twenty-seven times against the same line, and the time-based analysis reports that information. In the fully unrolled form, the time-based analysis will lose the hot spot because the code now consists of a series of individual lines of code.

 

Loop unrolling can speed up applications, but once code has been unrolled manually it is very difficult to identify it as a candidate for size reduction. For example, the second piece of code will look like twenty-seven unrelated lines of code. In general, automated tools do not identify such code as an iteration. So, as a practical method of optimizing embedded applications, it’s generally best to write code in a dense form first and expand it via loop unrolling and other rewrites as required. Using this approach, VTune can guide the development process to achieve performance requirements.
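Between the dense loop and the fully unrolled form there is a middle ground. The sketch below shows the dense form next to a hand-unrolled version with an unroll factor of 4. The function names and the factor are illustrative assumptions, and the arrays are indexed from 0 in the C convention rather than from 1 as in the pseudo code.

```c
#include <stddef.h>

#define N 27

/* Dense form: one multiply and one loop branch per element. */
void multiply_rolled(int *a, const int *b, const int *c)
{
    for (size_t i = 0; i < N; i++)
        a[i] = b[i] * c[i];
}

/* Unrolled by 4: one loop branch per four multiplies, plus a
 * cleanup loop for the 27 % 4 = 3 leftover elements. */
void multiply_unrolled4(int *a, const int *b, const int *c)
{
    size_t i = 0;
    for (; i + 4 <= N; i += 4) {
        a[i]     = b[i]     * c[i];
        a[i + 1] = b[i + 1] * c[i + 1];
        a[i + 2] = b[i + 2] * c[i + 2];
        a[i + 3] = b[i + 3] * c[i + 3];
    }
    for (; i < N; i++)
        a[i] = b[i] * c[i];
}
```

Keeping the dense version in the source and letting the compiler unroll is usually the easier form to maintain. Whether the manual version is actually faster depends on the compiler and target, so it is worth measuring both.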

 

Speed is the performance issue that gets the most attention from many developers. Much of this approach to performance comes from a general “memory is free” philosophy. But memory can become a significant cost in embedded systems. For a great many applications the electronic system consists of the processor, support circuits, and memory. It may be obvious that memory comes in discrete units of size, but this is what makes it a critical component of system cost: exceeding one size means paying for the next. As an example, for small data sets and simple applications, data can be stored in variable-length arrays that use simple brute-force searching in favor of simplified display. The program is small and takes few resources, but the data is loosely packed, taking more space than other techniques. An alternative is the trie. A trie is an ordered tree with associated data. No node in the tree stores the key associated with that node. Instead, its position in the tree shows what key it is associated with. In this type of data structure, information retrieval is a more complex process and takes more processing time per operation than a simple binary structure.
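A minimal trie sketch in C may help make the description concrete. It assumes keys made only of lowercase letters 'a' to 'z' and integer payloads; all names are illustrative. Note that this simple layout spends 26 pointers per node, so a memory-constrained design would likely use a more compact child representation.

```c
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26   /* lowercase 'a'..'z' only (an assumption) */

/* A node stores no key: the path from the root spells it. */
struct trie_node {
    struct trie_node *child[ALPHABET];
    bool is_key;   /* true if the path to this node is a stored key */
    int value;     /* associated data, meaningful only when is_key */
};

struct trie_node *trie_new(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Inserts key with value. Returns false on an unsupported
 * character or when memory runs out. */
bool trie_insert(struct trie_node *root, const char *key, int value)
{
    struct trie_node *n = root;
    for (; *key; key++) {
        if (*key < 'a' || *key > 'z')
            return false;
        int idx = *key - 'a';
        if (!n->child[idx]) {
            n->child[idx] = trie_new();
            if (!n->child[idx])
                return false;
        }
        n = n->child[idx];
    }
    n->is_key = true;
    n->value = value;
    return true;
}

/* Looks up key. On success stores the payload in *value. */
bool trie_lookup(const struct trie_node *root, const char *key, int *value)
{
    const struct trie_node *n = root;
    for (; *key && n; key++) {
        if (*key < 'a' || *key > 'z')
            return false;
        n = n->child[*key - 'a'];
    }
    if (!n || !n->is_key)
        return false;
    *value = n->value;
    return true;
}

/* Frees the node and everything below it. */
void trie_free(struct trie_node *n)
{
    if (!n)
        return;
    for (int i = 0; i < ALPHABET; i++)
        trie_free(n->child[i]);
    free(n);
}
```

Keys that share a prefix, such as "tank" and "tanker", share the nodes for that prefix, which is where the storage saving for the keys themselves comes from.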

 

Using performance tuning tools permits developers to try alternative representations quickly with analytical proof of the effects of the alternative representations.

 


 

There are alternative tools available for performance analysis. Green Hills Software (1) offers the Performance Profiler. The Profiler provides a view into the behavior of the program by precisely specifying:

 

  • the percentage of time spent executing each source line or instruction
  • the total number of times each line or instruction was executed
  • the total number of times each function was called
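To show what call counting involves, here is a hand-rolled sketch of the bookkeeping a profiler inserts automatically. It is not how any vendor tool is implemented. The function names, the counter table, and the `COUNT_CALL` macro are all hypothetical.

```c
/* Hypothetical hand-rolled counters. A real profiler adds
 * equivalent bookkeeping without source changes. */
enum { FN_READ_SENSOR, FN_FILTER, FN_COUNT };

static unsigned long call_count[FN_COUNT];

#define COUNT_CALL(id) (call_count[(id)]++)

/* Masks a raw reading to 10 bits (illustrative workload). */
int read_sensor(int raw)
{
    COUNT_CALL(FN_READ_SENSOR);
    return raw & 0x3ff;
}

/* Simple smoothing filter: three parts previous, one part new. */
int filter(int prev, int sample)
{
    COUNT_CALL(FN_FILTER);
    return (3 * prev + sample) / 4;
}

/* Reports how many times the given function has been called. */
unsigned long calls_to(int id)
{
    return call_count[id];
}
```

Manual counters like these change the code being measured, which is one reason the vendor tools that collect the same figures without source edits are attractive.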

Wind River Systems (2) provides a series of run-time analysis tools within its Workbench product:

 

  • System Viewer
  • Memory Analyzer
  • Performance Profiler
  • Data Monitor

Regardless of the tool suite that you use for developing embedded applications, basic tools exist within each of the mainstream development tool kits to aid in analyzing and optimizing systems performance.

 

Have you considered how you will optimize your next application? Speed/size/complexity of the application?

 

______________________________________________________________________

  1. Green Hills Software, Inc is an Affiliate Member of the  Intel® Embedded Alliance
  2. Wind River Systems is an Associate Member of the Intel® Embedded Alliance

Henry Davis
Roving Reporter (Intel Contractor)
Intel(r) Embedded Alliance

It’s often helpful to divide the “debugging” task into code that is not real-time constrained and code with real-time timing requirements. For example, a simple type-ahead typewriter application has a real-time component that is “soft” in nature. The application continues to function, within reason, regardless of how fast a single key press is processed. The real-time component can become more stringent, dropping the soft aspect, which makes the debugging process more complex because external factors control the software's functioning based on time.
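The type-ahead example can be sketched as a small ring buffer that decouples key arrival from key processing. The buffer size and all names are hypothetical. As written, the code is not safe to share between an interrupt handler and the main loop without added protection.

```c
#include <stdbool.h>
#include <stddef.h>

#define KEYBUF_SIZE 16   /* hypothetical capacity */

struct keybuf {
    char data[KEYBUF_SIZE];
    size_t head;    /* next slot to write */
    size_t tail;    /* next slot to read */
    size_t count;   /* keys currently stored */
};

/* Called when a key arrives. Returns false if the buffer is
 * full, in which case the key is dropped. */
bool keybuf_put(struct keybuf *kb, char key)
{
    if (kb->count == KEYBUF_SIZE)
        return false;
    kb->data[kb->head] = key;
    kb->head = (kb->head + 1) % KEYBUF_SIZE;
    kb->count++;
    return true;
}

/* Called by the slower processing code. Returns false if no
 * key is waiting. */
bool keybuf_get(struct keybuf *kb, char *key)
{
    if (kb->count == 0)
        return false;
    *key = kb->data[kb->tail];
    kb->tail = (kb->tail + 1) % KEYBUF_SIZE;
    kb->count--;
    return true;
}
```

The deadline is soft because a slow consumer only delays output until the buffer fills. Once `keybuf_put` starts returning false, keys are lost and timing has become a correctness issue.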

 

Debugging embedded software is a well supported activity with a wide range of tools available to perform the tasks. The tools for debugging include:

 

  • Processor probes
  • In-circuit emulators
  • Instruction set simulators
  • ROM monitors
  • Cross-debuggers
  • Integrated debugging capability with Real Time Operating Systems (RTOS)
  • In-house custom systems

 

Of these tools, many industry-standard tools based on probes and ROM monitors have similar capabilities. At a minimum these include setting simple hardware breakpoints, instruction tracing, complex compound conditions for breakpoints, turning instruction or data traces on and off, and nested conditionals for all types of instruction and data conditions.

 

When the tools are designed properly, programmers can use either an instruction set simulator based debugging tool or an actual hardware development system with no loss in general-purpose debugging capabilities, excluding external hardware functionality.

 

Let’s look at the basic debugging capabilities needed to successfully complete a software debugging activity using the debugging tool recommendation from the Nexus Forum™, an industry standards setting body. Basic control features of any embedded debugger include the ability to:

 

  • query and modify all locations available in the processor’s supervisor map. For simplicity of implementation, this is often restricted to when the processor is halted.
  • support breakpoint/watchpoint features in the debuggers. These are available as either hardware or software breakpoints, with software breakpoints used for ultra-low-cost development environments. Some processors may favor one or the other approach depending on the architecture. For simplicity of implementation, configuration of breakpoint/watchpoint features is often performed when the processor is halted.

 

 

Logic analysis requires basic facilities to:

 

  • read instruction trace information with acceptable impact to the system under development. For low cost development kits access to instruction trace data is often performed by halting execution of the application program. This system succeeds in permitting developers to interrogate and correlate instruction flow to real-world interactions.

 

  • retrieve information on how data flows through the system with acceptable impact to application under development. Central to this capability is understanding what system resources are creating and accessing data.

 

  • assess whether the embedded software meets required performance goals.

 

Debugging can be carried out using the most rudimentary facilities, but at the cost of time and perhaps the need to build special hardware to add control or reporting. The decision of what tools you need is highly dependent on the size of code and the type of application that you are developing. An embedded application that fits into 2kW of program memory has different requirements than an application of tens of thousands of lines of code or more. Regardless of the application type, there is valuable information that can be obtained by any of the debugging facilities, no matter how basic.

 


 

Many processors lack full tracing facilities due to the high bandwidth requirements of unloading the data and the large number of I/O pins required to allow full access to the address and data busses for all internal busses. For example, Intel Architecture for embedded applications has no dedicated trace port, which reduces the pin count. The architecture supports the essential tracing capabilities for branch tracing. There is a branch trace recording buffer, and a means of emitting branch trace messages in line with write data.

 

Using the debug facilities described so far we can gather valuable information that allows us to find and fix several important classes of errors. For strongly typed languages with linguistic redundancy for type declarations, many difficult-to-find bugs are eliminated by the language construction. For most common bugs the simple tools described are adequate to find and fix the errors. Commercial debug tools include facilities that make test and evaluation straightforward.

 

Green Hills Software (1) supports the Intel Architecture with the Green Hills Probe, which provides high-performance debugging.

  
  
  




The Green Hills probe is a hardware debug device that enables the MULTI debugger to load, control, debug, and test a target system. Debugging can be accomplished without the need for prior board initialization, an RTOS, or even a ROM monitor.



Wind River Systems (2) also provides a full suite of debug tools. The Wind River Probe is a portable JTAG-based probe. The Probe integrates Wind River Workbench On-Chip Debugging and the Wind River On-Chip Debugging API to provide a flexible debugging platform.

 


 

Features

 

  • USB-based portable debug unit for single-core/single-thread operation
  • Full control of target: start/stop/reset, data- and expression-based hardware and software conditional breakpoints, single step through code
  • Access to core and peripheral registers
  • Hardware diagnostics scripts to enable validation of address/data bus configurations and memory read/write verification
  • Operating system awareness to provide access to kernel objects to simplify OS and device driver stabilization for
  • Internal trace buffer for visibility of code execution and system bus (for supported processors)

While these debug (probe) capabilities are adequate for debugging small programs of up to a few thousand lines of code, modern or more complex application software requires additional capabilities. More importantly, there is a class of bugs rooted in language structures. These include errors associated with type coercion, type casting, and array bounds. Bounds checking is an extremely expensive process, often requiring twice or more the resources of the array calculation itself. Because the bounds check is relatively expensive, it is a frequently overlooked means to improve software reliability. Instead, the vast majority of applications rely on test methodologies to avoid this type of error. These errors are most often dynamic faults whose behavior may change with the size of the code space.

Where many control flow and logical errors can be diagnosed using the simple probes described above, this class of error is best discovered through instrumentation of the software. Software instrumentation, combined with some hardware capabilities, can find:

 

  • accessing an element beyond an array’s declared bounds
  • assigning an out-of-range value to a variable or field of small integral type
  • unhandled case in a switch statement
  • dividing by zero
  • accessing invalid memory through a pointer
  • leaking memory
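Several of the checks in this list can be written by hand, which also shows where their run-time cost comes from. The following is a sketch of explicit checks in C, not a description of what any vendor's instrumentation emits. All function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Checked read: refuses to read past the declared length. */
bool checked_get(const int *arr, size_t len, size_t idx, int *out)
{
    if (idx >= len)
        return false;
    *out = arr[idx];
    return true;
}

/* Checked narrowing: rejects values that do not fit in uint8_t. */
bool checked_to_u8(int value, uint8_t *out)
{
    if (value < 0 || value > UINT8_MAX)
        return false;
    *out = (uint8_t)value;
    return true;
}

/* Checked division: rejects a zero divisor and the single
 * overflowing case, INT_MIN / -1. */
bool checked_div(int num, int den, int *out)
{
    if (den == 0 || (num == INT_MIN && den == -1))
        return false;
    *out = num / den;
    return true;
}
```

Each access now costs a compare and a branch on top of the operation itself, which illustrates why the article calls bounds checking expensive and why it is often left to instrumented test builds.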

These dynamically oriented tools are supplemented by static analysis tools. But perhaps the most often overlooked debug tool provides programmers with program structure, hierarchy, dependency graphs and other related tools that aid programmers in understanding the intended function when original programmers are no longer available.

Software tools offer a wide range of capabilities for system test, diagnosis, and verification. The tool choice, complexity, and cost depend on the complexity of the application and the size of the organization. Perhaps no factor plays a bigger role than the processor architecture. For Intel embedded processors there are a number of commercial tool chains available, including offerings from Green Hills Software and Wind River Systems together with Intel’s tool kit.

 

What choice will you make?    

 

 

____________________________________________________________________

  1. Green Hills Software is an Affiliate Member of the Intel Embedded Alliance
  2. Wind River Systems is an Associate Member of the Intel Embedded Alliance
Henry Davis
Roving Reporter (Intel Contractor)
Intel(r) Embedded Alliance

The development of the off-grid electric system is based on the Intel® Atom™ Processor Z5xx Series and Intel® System Controller Hub US15W Development Kit. Although the Development Kit offers a large number of peripherals, the kit does not include wireless radio links for experimentation.

 


 

 

Adding WiFi capabilities to the Development Kit requires both hardware (the WiFi transceiver) and software (a TCP/IP protocol stack). The high-performance stack from Green Hills Software, Inc (1) is one of the alternative stacks that can satisfy a socket connection for WiFi connectivity. The foundation for all networking protocols for the Green Hills family of operating systems is the GHNet TCP/IP stack. It is a full-featured, high-performance dual mode IPv4/IPv6 stack for embedded systems, designed for minimum footprint and maximum performance. There are options for advanced routing and security protocols. The stack is integrated and validated with INTEGRITY®, INTEGRITY-178B, velOSity™, and µ-velOSity™.

GHNet is suited for use in products ranging from small-footprint consumer devices to advanced core network equipment. It has broad Internet engineering support and has been through extensive protocol conformance and interoperability testing. Like other vendor-supported protocol stacks, it is also integrated with a broad range of networking applications, management, and security protocols.

GHNet is a true dual mode IPv4/IPv6 stack and can be configured for IPv4 only, IPv6 only, or to support both protocols simultaneously. This is an important feature since the transition from IPv4 to IPv6 is expected to take several years, perhaps more. Furthermore, the IPv6 functionality has been approved by the industry standard IPv6 READY Program, which guarantees IPv6 interoperability.

The GHNet protocol suite has a modular design and is highly configurable, providing maximum size and feature scalability. When a module is not used, it is not just deactivated; it is removed entirely to save storage space, which is often limited in embedded devices. While GHNet is not the smallest stack, its completeness and existing validation can make up for its size. The GHNet stack results in footprint sizes as small as 25 kilobytes for a UDP-only configuration and 41 kilobytes for a TCP-enabled stack. GHNet can be configured to run either in the kernel’s address space or in a separate partition for maximum security. It is also possible to run multiple instances of the stack in separate partitions, enabling stacks to execute at multiple independent levels of security.

 

Like Green Hills, Wind River Systems (2) offers an integrated proprietary RTOS and a protocol stack. The tradeoff in using Wind River’s TCP/IP package is the requirement to use their Real Time Operating System (RTOS). The upside is that the TCP/IP package is fully integrated and tested with VxWorks, eliminating that task from the software development process. Wind River partners also offer a variety of TCP/IP packages including:

 

ACCESS Systems America Inc 

 

EmbVUE Inc.

 

Express Logic http://www.expresslogic.com/

 

Team F1 http://www.teamf1.com/

 


 

As often happens during development, we gained more information on data bandwidths. We are able to obtain all of the textual information from the electrical components at relatively low data rates. Information from the Outback inverter/charger is transmitted at 19.6 kbaud, a significantly lower rate than originally planned. Other sources of data (and control to the subsystem) include a remote generator controller with a similar bandwidth requirement, a water tank sensor with a data rate of bits per day, and weather monitoring, also at a very low data rate. With the significant reduction in the real data rate we can consider other alternatives. Recently a friend at TechBites wrote a review of RadioCraft’s transceiver modules. At 100kbps the RC232 series offers more than the required bandwidth. The big win for this design is a reduction in radio power requirements.
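The bandwidth argument can be checked with a rough calculation. The sketch below assumes 8N1 serial framing (10 line bits per 8 data bits), which the source does not state, and it ignores radio protocol overhead. The function names are hypothetical.

```c
/* Payload bits per second on an asynchronous serial link,
 * assuming 8N1 framing: 10 line bits carry 8 data bits.
 * The framing is an assumption, not stated in the article. */
double serial_payload_bps(double baud)
{
    return baud * 8.0 / 10.0;
}

/* Fraction of a radio channel consumed by n data sources.
 * Ignores radio protocol overhead (a simplification). */
double channel_utilization(const double *source_bps, int n,
                           double channel_bps)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += source_bps[i];
    return total / channel_bps;
}
```

Under these assumptions, the inverter/charger and a generator controller at a similar rate each carry about 15.7 kbps of payload, so together they use roughly a third of a 100 kbps channel. The water tank and weather sources are negligible by comparison.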

Although this system can meet its performance requirements using only a 100kbps radio communications channel, there remains the potential of using a WiFi connection. WiFi moves 100 to 1,000 times the data throughput of the lower-speed radio. The Intel® System Controller Hub US15W employs a USB peripheral interface. Thus, should new peripherals or WiFi data rates become a system requirement, there remains enough processing capability to service this requirement. Of course this only deals with the data movement processing requirements and not any additional processing requirements.

 

What opportunities do you have in your designs to reduce cost, power, or size by reducing an assumed performance requirement?

 

______________________________________________________________________

  1. Green Hills Software, Inc is an Affiliate Member of the  Intel® Embedded Alliance
  2. Wind River Systems is an Associate Member of the Intel® Embedded Alliance

Henry Davis
Roving Reporter (Intel Contractor)
Intel(r) Embedded Alliance