The original design philosophy for the technology underlying the Internet was founded on providing multiple alternative routes from one computer to another. By returning to that original approach, communications can continue through most outages. Internet routing falls back automatically to alternative channels, which assures continuous communications under most circumstances. Establishing redundant physical connections further improves reliability, and channels can also be bonded together to achieve faster communications. This was one approach used by some vendors of V.32 telephony modems to achieve better throughput. Today, a wide variety of physical channels is available to designers of embedded systems.

A basic tenet of systems design is to avoid all single points of failure where practical. Within embedded systems that communicate with other systems, the highest probability of failure lies with the communications channel. For any system that relies on communication with another system, the failure of a sole communications link generally leads to system failure. The answer to this single point of failure is to add another communications link.
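The benefit of a second link can be estimated with simple arithmetic. The sketch below assumes that links fail independently of one another, which is an assumption made for illustration only (real links often share failure causes such as power or a common carrier). The function name and the availability figures are hypothetical.

```python
def combined_availability(link_availabilities):
    """Probability that at least one link is up, assuming independent failures."""
    all_down = 1.0
    for availability in link_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_down *= 1.0 - availability  # every link must be down at the same time
    return 1.0 - all_down


# Hypothetical figures: a wired link that is up 99% of the time,
# backed by a cellular link that is up 95% of the time.
single = combined_availability([0.99])
redundant = combined_availability([0.99, 0.95])
print(f"single link:   {single:.4f}")
print(f"with fallback: {redundant:.4f}")
```

Under these assumptions the outage probability drops from 1 in 100 to 5 in 10,000, which is why even a modest backup link is worthwhile.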

Companies like Australia’s NewSat actively deploy backup and emergency communications to supplement their mainline offerings.  Despite the ever-shrinking size of electronics, physical limitations impose minimum footprint sizes for systems that use satellite links.


Systems that include redundant communications links can use all available channels to increase bandwidth. Or, the links may be used in fallback configurations to increase systems reliability and availability. The process of using multiple links to increase bandwidth is variously called:

  • Link aggregation
  • Trunking
  • Link bundling
  • Ethernet/network/NIC bonding
  • NIC teaming


For LAN connections, more than one physical port is configured to be bonded, but not all bonding schemes increase the channel bandwidth. For example, the Linux bonding driver aggregates multiple network interfaces into a single logical interface. How the bonded interfaces behave depends on the bonding mode selected in the driver. The Linux bonding driver modes provide hot standby, load balancing, or both:

  • Round-robin
    • Transmits packets in sequential order from the first available NIC through the last - provides both load balancing and fault tolerance.
  • Active-backup
    • Only one NIC in the bond is active. A different NIC becomes active if the active NIC fails - provides fault tolerance only.
  • XOR
    • Selects the same NIC for each destination MAC address - provides load balancing and fault tolerance.
  • Broadcast
    • Transmits on all NICs - provides fault tolerance.
  • IEEE 802.3ad Dynamic link aggregation
    • Groups NICs that share the same speed and duplex settings into an aggregate - requires a switch that supports IEEE 802.3ad.
  • Adaptive transmit load balancing
    • Channel bonding that does not require any special switch support - if the receiving NIC fails, another NIC takes over the MAC address of the failed receiving NIC.
  • Adaptive load balancing
    • The Linux bonding driver overwrites the source hardware address with the hardware address of one of the NICs in the bond - different peers use different hardware addresses.
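The difference between the first three modes comes down to how the outgoing NIC is chosen for each packet. The toy model below is a sketch for illustration, not the Linux driver's implementation: the class, its method names, and the interface names are hypothetical, and the XOR hash is a simplified version of a MAC-based policy.

```python
from itertools import count


class Bond:
    """Toy model of a bonded interface; not the Linux driver's implementation."""

    def __init__(self, nics):
        self.nics = list(nics)      # e.g. ["eth0", "eth1"]
        self.up = set(self.nics)    # NICs whose link is currently up
        self._sequence = count()    # packet counter used by round-robin

    def _available(self):
        available = [nic for nic in self.nics if nic in self.up]
        if not available:
            raise RuntimeError("no NIC in the bond has link")
        return available

    def round_robin(self):
        """Rotate through the available NICs, one packet each."""
        available = self._available()
        return available[next(self._sequence) % len(available)]

    def active_backup(self):
        """Always use the first NIC that is up; the others stand by."""
        return self._available()[0]

    def xor(self, src_mac, dst_mac):
        """Hash the MAC pair so a given peer always maps to the same NIC."""
        available = self._available()
        src = int(src_mac.replace(":", ""), 16)
        dst = int(dst_mac.replace(":", ""), 16)
        return available[(src ^ dst) % len(available)]


bond = Bond(["eth0", "eth1"])
print([bond.round_robin() for _ in range(4)])  # alternates between the NICs
bond.up.discard("eth0")                        # simulate a link failure
print(bond.active_backup())                    # the standby NIC takes over
```

Round-robin spreads traffic but can reorder packets within a flow, while the XOR policy keeps each peer on one NIC at the cost of less even balancing.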



      Bonding may take place at any one of the three lowest level OSI layers.  Wireless and power line devices are generally bonded at layer 1 while Ethernet links are typically bonded at the data link layer (layer 2). It is also possible to bond at the network protocol (layer 3) such as Internet Protocol (IP). Bonding was originally the domain of infrastructure companies using large scale hardware. But the advances in CPUs have moved the potential of bonding from near the central office towards the outer edges of the communications network.


      Edge Access is a pioneering company in the VoIP business focusing on emergency communications. Edge Access’ equipment was deployed to New Orleans, Louisiana during Hurricane Katrina. The equipment facilitated the first voice call from the devastated area. The electronics required to perform this embedded task were a bit bigger than a full size PC tower, but because the system uses a satellite link, the footprint of the complete package is defined by the size of the satellite dish. Edge Access also manufactures smaller VoIP systems based on other communication links.


      Norco (1) has another smaller, lower power approach to assuring reliable communications. The BIS-6623 can provide 3G/4G connections with voice, video, and data. Presented as a way of maintaining mission critical communications as the range of data types used in emergency communications keeps expanding, the 6625 is a fan-less design based on the Intel® Atom™ 6xx processor.


      [Figure: BIS-6625 block diagram]




      Norco’s BIS-6623 is an example of using commercially available communications channels in a redundant manner to provide critical communications for systems used by emergency services and others. The system employs standard protocol stacks to manage multiple channels simultaneously under Microsoft® Corporation’s (2) Windows® embedded Operating Systems. Fallback protocols are implemented in software to select (or even bond) between available channels.

      The hardware platform is available in several standard versions and comes standard with a 1 GHz processor clock rate and 1 GB of DDR2 memory. Alternative standard versions include one based on the 600 MHz Tunnel Creek processor and an OS-less bare hardware option.


      Norco identifies products from third party suppliers Huawei and Sierra Wireless as options for enabling wireless communications. The 3G-capable Sierra Wireless AirCard® 503 2-in-1 Data Card provides PC Card and PC Card Express form factors in one package. Redundancy for the BIS-6623 is achieved by loading two or more 3G cards (or USB modems) into the 6625. If you are using the Windows operating system, device bonding is determined by how you set up the Windows drivers. Keep in mind that although the Norco product is aimed at wireless situations, these compact products also have other connections including USB and Ethernet.


      Bare hardware options for the BIS-6625 can run any OS that supports Tunnel Creek. For example, the Wind River Systems (3) Linux product for carrier grade telecommunications applications brings high software reliability to the compact Norco form factor.


      The Intel Atom processor family enables many small form-factor designs. Intel Embedded Alliance Premiere members Advantech (4), Emerson (5), Kontron (6), and RadiSys (7) all offer a variety of configurations. You can learn more about these and other Alliance members’ Atom-based products at Intel’s convenient membership web site.


      Communications channel bonding is increasingly becoming a viable option for embedded systems, but consider the totality of the environment that your system operates within. Single points of failure may exist in switches and routers outside your embedded system.

      Can your next embedded system benefit from redundant communications?




      1. Norco is an Associate member of the Intel Embedded Alliance
      2. Microsoft Corporation is an Associate member of the Intel Embedded Alliance
      3. Wind River Systems is an Associate member of the Intel Embedded Alliance
      4. Advantech is a Premiere member of the Intel Embedded Alliance
      5. Emerson is a Premiere member of the Intel Embedded Alliance
      6. Kontron is a Premiere member of the Intel Embedded Alliance
      7. RadiSys is a Premiere member of the Intel Embedded Alliance




      Henry Davis
      Roving Reporter (Intel Contractor)
      Intel® Embedded Alliance

      Interoperability is just a fancy way of saying that systems have to work together. The concept is simple but the mechanisms required to make it work are not. Interoperability is achieved through adherence to international and other standards, combined with translation packages that homologate “nearly identical” communications channels. The first approach is exemplified by Internet standards like TCP/IP, while the second is implemented by systems like Common Object Request Broker Architecture (CORBA) and its Object Request Broker (ORB). Microsoft Corporation (1) takes a different approach from CORBA, relying instead on its own Distributed Component Object Model (DCOM) and Windows Communication Foundation. However, Microsoft has agreed to a gateway standard to translate between CORBA and DCOM. Although CORBA and DCOM were developed for the Information Technology server-client model, embedded systems are closing the complexity gap with IT systems.


      Interoperability has multiple levels of meaning, including hardware and software interoperability. Adlink’s (2) Jeff Munch presented an overview of COM Express interoperability in an Intel® Embedded Community blog that also shows the way for other hardware interoperability. While hardware is one key piece of the interoperability solution, software represents the bigger total systems challenge.

      Virtualization of interoperability enforces a new discipline on software developers. By adopting virtualization as a fundamental part of the software design process, engineers will develop software components that may be combined in different ways to easily create new products. Until recently, virtualization for microprocessor-based systems was touted as a way to control expenses and improve manageability in data centers. While virtualization does those things for data centers, embedded systems have other, more pressing concerns today and for the near future.


      Let’s take apart virtualization to see how you might be able to apply the principles to embedded systems. Virtualization comes in multiple flavors:


      • System virtual machines - sometimes called hardware virtual machines. They support sharing the underlying physical machine resources between different virtual machines, each running its own operating system. The software layer implementing the virtualization is known as a Virtual Machine Monitor or Hypervisor. A hypervisor can run on bare hardware (native VM, also called Type 1) or on an operating system (hosted VM, also called Type 2).
      • Process virtual machines - also called application virtual machines. A process virtual machine runs as an application inside an OS and supports a single process. It is created when the process is started and destroyed when it exits. Its purpose is to provide a platform-independent programming environment that hides details of the underlying hardware or operating system. Using this type of virtualization allows a program to execute in the same way on any platform.
      • Emulation of the underlying raw hardware - also called full virtualization of the hardware. Implementation is done using a Type 1 or Type 2 hypervisor. Each virtual machine can run any operating system supported by the underlying hardware. Users can run two or more different guest operating systems simultaneously. Each guest OS is resident in a separate private virtual computer.
      • Emulation of a non-native system - virtual machines can also perform the role of a software emulator. Emulating non-native hardware allows operation of software applications and operating systems written for a different processor architecture.
      • Operating system-level virtualization - can be thought of as partitioning: a single physical embedded platform is sliced into multiple small partitions (sometimes called virtual environments (VE), virtual private servers (VPS), guests, or zones). Each partition looks like a real system from the point of view of its operating software.

      All of these virtualization approaches must provide abstraction from the physical hardware peripherals and services to the operating system or application. The dividing line between the physical hardware and the systems software depends on which type of virtualization is employed. But regardless of the virtualization chosen, virtualization establishes a mindset for designers.


      Virtualization benefits:

      • Consolidation
      • Maximizing hardware cycle usage
      • Security
      • Separate development & production platforms
      • Better logical software partitioning
      • Hardware independence


      Most of the fielded systems employing virtualization have been deployed in IT-focused servers on the strength of the first two benefits. The financial benefit for server configurations is leading to industry-wide API standardization for non-embedded virtualization systems. As a natural outgrowth of this standardization, you can expect to see similar efforts take hold for embedded systems. In the meantime, virtualization for embedded systems is here and provides ways to improve a system’s ability to be re-targeted to new and different platforms.




      Virtualization tools from RTOS vendors like TenAsys (3), QNX (4), Microsoft Corporation, Green Hills Software (5), and Wind River Systems (6) provide frameworks for developers to create new software structures. Virtualization can be extended to include physically separate hardware systems using software techniques such as remote procedure calls and inter-process communications.

      Interoperability can be defined by the mechanism used to communicate between computing components – whether they are realized in a hardware platform or inside a virtual environment. By virtualizing the software components of a system you gain more flexibility and control. For systems that use Ethernet communications, virtualizing the system means creating virtual Ethernet adapters, switches, and other communications support systems. How you accomplish this depends on the virtualization approach that you adopt.


      Full virtualization gives you the easiest environment in which to implement interoperation elements. For full virtualization, software is written for a virtual Ethernet adapter without regard for other software that may require the resource. Relief from considerations for physical hardware during the design process gives programmers the freedom to develop a software structure that best relates to the problem statement.


      Programming is an intellectually complex undertaking that is difficult regardless of the techniques used to implement software. There is no “magic bullet” for embedded programming. Some tools and languages have specific advantages for some programming problems, but there really isn’t one tool to solve every programming problem. Each part of the systems programming specification carries with it tradeoffs. In theory, software written in ‘C’ is faster to develop than the same program written in assembly language, but assembly language carries with it the prospect of greater machine (platform) utilization. The rapid decrease in cost per CPU operation, with simultaneously dramatic increases in CPU performance, combined with the increase in systems complexity, has shifted the software decision point sharply to favor higher level languages like ‘C’ and Ada. The same hardware dynamics are favoring increased use of parallel programming to use multi-core processors effectively. Adopting virtualization as part of your programming bag of tricks encourages better program structure because the modules that are naturally defined by systems requirements are easier to code and test than a single monolithic block of code. This is especially important for improving interoperability.


      Virtualization helps improve program abstraction. Many popular embedded programming languages lack semantic structure to abstract concurrency, and with a few notable exceptions like Ada and Java, creating concurrency is left to explicit programming by the developers. Virtualization aids the expression of concurrency by embedding the inter-process communications and control within the virtualization mechanism. In short, the virtualization mechanism can remove the need for explicit software structure to deal with multiple modules.


      Ada intrinsically has language semantics to create concurrent or parallel programs, but ‘C’ doesn’t. Threading libraries help in this regard, but using virtualization abstracts the program structure so that there is better modularity. As part of using virtualization to its best effect:


      • Avoid side-effect programming at all costs
      • Employ dataflow and data-parallel design techniques
      • Focus on task-centric programming
      • Emphasize reliability
      • Design in architectural efficiency
      • Employ asynchronicity in your designs


      Each of the RTOS companies mentioned above includes unique capabilities in its products. But every virtualization/OS offering provides some type of Inter-Process Communication (IPC). IPC is one of the powerful tools available to improve modularity. Just as Ada intrinsically supports parallelism through its semantics, IPC mechanisms extend similar capabilities to all programming languages. IPC can be achieved through a number of mechanisms:


      • Files
      • Signals
      • Sockets
      • Message queues
      • Pipes
      • Named pipes
      • Semaphores
      • Shared memory
      • Message passing
      • Memory-mapped files
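As a minimal sketch of one of these mechanisms, the example below sends messages to a separate process over pipes and reads the replies back. It assumes a desktop-style OS with a Python interpreter available; an RTOS would expose its own IPC calls instead. The child program, the function name, and the message text are hypothetical.

```python
import subprocess
import sys

# Child program, handed to a second Python interpreter with -c.
# It reads lines from its standard input and echoes them back in upper case.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)


def send_through_child(messages):
    """Send messages to a separate process over pipes and collect its replies."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input="".join(message + "\n" for message in messages),
        capture_output=True,  # the child's stdout comes back through a pipe
        text=True,
        check=True,           # raise if the child exits with an error
    )
    return completed.stdout.splitlines()


if __name__ == "__main__":
    print(send_through_child(["valve open", "valve closed"]))
```

The two programs share no memory and know nothing about each other beyond the line-oriented message format, which is the kind of modularity IPC encourages.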


      Using virtualization to achieve interoperability encourages better code structure while providing more usable components for developers. Well defined software modules that adhere to virtualization principles result in lower error rates and more robust systems. The degree of abstraction used in a specific system is largely up to the embedded systems designers.


      Using the advanced techniques that virtualization and expanded parallel program structure provide is not without its risks. Choosing too fine-grained a program structure can burden the system with excessive compute cycles dedicated to the IPC mechanisms and parallel structures.


      For the major systems that influence interoperability, there are simple means to minimize compute overhead while maximizing module reusability:


      • Define modules according to industry standard protocols’ state diagrams
      • Choose maximum size modules that are self-contained
      • Within the self-contained maximum modules, decompose the modules to find high compute requirements that can benefit from parallel execution
      • Generalize the module operation to apply to multiple standards
      • Choose the persistence of each module
        • Create and destroy each time used?
        • Load and save select modules
      • Consider the number of threads or cores present on the minimally-capable hardware platform
        • Too much parallelism can overwhelm the processor(s) with executing virtualization and IPC mechanisms
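The last guideline can be sketched as a cap on the number of parallel workers. The example below is an illustration only: the function names, the checksum stand-in, and the default of eight requested workers are hypothetical, and in CPython threads show the program structure rather than a compute speed-up.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_worker_count(requested, cores=None):
    """Cap the number of parallel workers at the number of cores available."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, cores))


def checksum(block):
    """Stand-in for a compute-heavy, self-contained module."""
    return sum(block) % 256


def process_blocks(blocks, requested_workers=8, cores=None):
    """Run the module over every block without oversubscribing the cores."""
    workers = pick_worker_count(requested_workers, cores)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(checksum, blocks))


# Pretend the minimally-capable platform has two cores.
print(pick_worker_count(8, cores=2))
print(process_blocks([[1, 2, 3], [250, 10]], cores=2))
```

Passing the core count of the smallest supported platform, rather than the development machine, keeps the same code from overwhelming the low-end hardware.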


      Virtualization and parallelism make a powerful programming technique. The two together can offer freedom and flexibility to choose new, more capable multi-core platforms while improving interoperability.


      How will you make virtualization and parallelism decisions in your next systems design?






      1. Microsoft® Corporation is an Associate member of the Intel Embedded Alliance
      2. Adlink is an Associate member of the Intel Embedded Alliance
      3. TenAsys is an Affiliate member of the Intel Embedded Alliance
      4. QNX Software Systems, Ltd is an Associate member of the Intel Embedded Alliance
      5. Green Hills Software, Inc is an Affiliate member of the Intel Embedded Alliance
      6. Wind River Systems is an Associate member of the Intel Embedded Alliance


      Henry Davis
      Roving Reporter (Intel Contractor)
      Intel® Embedded Alliance

      Developing software for embedded systems is different and arguably more complex than software intended to be used in an Information Technology-focused PC environment. For thirty years the paradigm for Personal Computers followed the progression of mainframe and mini-computer models for computing. These models have evolved to rely heavily on abstracting the operating environment into a common programming interface in which applications are separated fully from essential underlying hardware.   Where PC software is most often completely divorced from the environment, embedded systems are intimately involved with their environment.


      Commercial software development tools have benefited from decades of tool development and millions of man-years of usage by professional programmers. These general purpose software development tools have a wide range of little-used capabilities that can empower embedded software developers. Single processor/single core CPUs are straightforward platforms for development and software debug and lack some of the complicating aspects of multi-core systems.

      Uniprocessors (systems with a single processor core) can support multiple threads, but only one thread actually executes at a time. Still, tools that trace the operation of threaded software aid both in understanding how threads work in practice and in debugging code written to use threading. Green Hills Software®, Inc (1) offers the MULTI EventAnalyzer as part of its MULTI Integrated Development Environment (IDE). Part of the Time Machine Suite, the EventAnalyzer centers on a graphical display of operating system events. These events include essential information such as kernel service calls, interrupts, exceptions, and context switches. This information is often viewed primarily as an optimization aid: the operations that take the most time are obvious, so developers can focus optimization efforts where they will have the best payback. But the EventAnalyzer can also be a good learning tool for new members of a development team.




      Being able to capture operating system events is a “nice to have” feature for uniprocessors, but when development moves to multi-processors or multi-core processors it becomes a “must have” capability.


      When developing in a threaded environment, all drivers and libraries must be “thread safe.” Adding multiple processors places an additional requirement on software: everything must be priority inversion safe as well. Priority inversion can happen when software executes on more than one processor. It is a scheduling problem in which a higher priority task is indirectly preempted by a lower priority task, which effectively inverts the relative priorities of the two tasks. Such situations violate the priority model, under which a high priority task can only be prevented from running by a task of still higher priority.

      By way of illustration, consider a low priority task called L that uses a resource called R. Assume that L is executing and gains control of R. Now suppose a high priority task H also requires R. If H starts after L has acquired R, H has to wait until L relinquishes R. Everything works as expected up to this point, but problems arise when a new task M with medium priority starts during this time. Since R is still in use by L, H cannot run. Since M is the highest priority unblocked task, it is scheduled ahead of L. Since L has been preempted by M, L cannot relinquish R. So M runs until it is finished, then L runs, at least up to the point where it can relinquish R, and only then does H run. In this scenario a task with medium priority ran before a task with high priority, which is a priority inversion.
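The L, M, H scenario can be reproduced with a small simulation. The sketch below models a single CPU with fixed-priority preemptive scheduling and one shared resource R; the step encoding, release times, and task lengths are hypothetical values chosen for illustration, and it is not a model of any particular RTOS scheduler.

```python
def simulate(tasks):
    """Run a one-CPU, fixed-priority, preemptive schedule; return finish order.

    tasks: list of (name, priority, release_tick, program). A program is a
    string of one-tick steps: 'a' acquire R, 'w' work, 'r' release R.
    A larger priority number means a more important task.
    """
    position = {name: 0 for name, _, _, _ in tasks}  # next step of each task
    owner = None                                     # task currently holding R
    finished = []
    tick = 0
    while len(finished) < len(tasks):
        runnable = []
        for name, priority, release, program in tasks:
            if tick < release or name in finished:
                continue
            wants_r = program[position[name]] == "a"
            if wants_r and owner not in (None, name):
                continue                             # blocked waiting for R
            runnable.append((priority, name, program))
        if runnable:
            _, name, program = max(runnable)         # highest priority wins
            step = program[position[name]]
            if step == "a":
                owner = name
            elif step == "r":
                owner = None
            position[name] += 1
            if position[name] == len(program):
                finished.append(name)
        tick += 1
    return finished


scenario = [
    ("L", 1, 0, "awwwrw"),  # low priority: takes R first, releases it late
    ("H", 3, 1, "awr"),     # high priority: needs R, arrives while L holds it
    ("M", 2, 2, "wwww"),    # medium priority: never touches R
]
print(simulate(scenario))   # ['M', 'H', 'L']
```

M finishes ahead of H even though H has the higher priority, because H is stuck behind L and L is stuck behind M. Removing M from the scenario lets H finish as soon as L releases R.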


      Green Hills’ EventAnalyzer shows developers the operating system calls graphically. It’s straightforward to detect this situation by looking just at the task statuses. Wind River Systems® (2) adds another capability to its tool chain called Simics.




      Wind River Simics simulates everything from a single processor, system-on-chip (SoC), or board to the most complex system conceivable. Simics can simulate an entire system including racks of platforms, with each running different operating systems on different processor architectures. One of the goals of Simics is to allow all developers, testers, and integrators to debug, test, and integrate the system as a single unit rather than working with individual system pieces as has been the case. Lest you think that these facilities are more than required for developing embedded systems, consider large scale embedded systems like telecom central office functions implemented by racks of dedicated embedded systems. Or think of alternative systems hardware partitioning such as might be found in Digital Signage, in which displays are remote but the embedded computing could be performed in a physically localized fashion.

      Experts at Green Hills, Wind River, and Intel® offer these tips for multi-core development:


      • Consolidate the hardware as much as possible.
      • Employ virtualization.
      • Choose a Symmetric Multiprocessing (SMP) operating system if building on a homogeneous processor base, or an Asymmetric Multiprocessing (AMP) operating system if using some specialty processors like Digital Signal Processors.
      • Select libraries that have been optimized for multi-core use – when operating on fewer cores the code will still operate.
      • Adopt driver software and OSes that are both thread and priority inversion safe.
      • Develop software that makes maximum use of threading since it can simplify software development and also adapts to a new platform embodying more cores with little effort.
      • Enforce coding standards that minimize the ability to violate priority models while encouraging maximum architectural parallelism consistent with the selected processor family.


      Chances are that your company has already standardized on a specific tool chain. Regardless of the vendor, most tool chains support the Eclipse standard. Green Hills tools embrace Eclipse as part of the framework. The company recommends Eclipse for extending the capabilities of the EventAnalyzer tool to meet unique requirements. Wind River’s tool chain also uses Eclipse as part of its framework. Wind River has been a strong supporter of the Eclipse efforts, contributing software to the industry wide effort.


      Embedded systems are gaining in complexity at a rate faster than implied just by the growth of Integrated Circuit complexity. Developing embedded systems is a challenging undertaking that demands a series of development techniques that are new to the embedded community.

      How will you adapt your tool chain to fit evolving embedded requirements?




      1. Green Hills Software, Inc is an Affiliate member of the Intel Embedded Alliance
      2. Wind River Systems is an Associate member of the Intel Embedded Alliance


      Henry Davis
      Roving Reporter (Intel Contractor)
      Intel® Embedded Alliance

      How do you solve embedded scalability issues to build physically dispersed, large scale, real world systems?


      Embedded systems were once relatively independent, purpose-built hardware intended to serve a fixed function within a fixed and predictable demand system. The emergence of organically growing embedded systems like streaming media and the “SmartGrid” system demands scalability on a large scale.




      Design techniques pioneered for large scale computing can be applied to embedded systems. The techniques rely on systems scalability, which is enabled by software structure. Embedded software vendors offer many of the building blocks necessary to create these complex systems.


      In a recent blog I posted about some methods available to embedded developers. But, as with any program designed to solve a specific problem, the best program structure reflects the problem statement. And that is where the software structure comes into play. Good programming languages should enable one obvious way to create the code and not easily permit many alternatives. Unfortunately, the language of choice for general embedded systems (‘C’) doesn’t inherently funnel the software creative process towards one “right” implementation. Software tool chains can be used to augment the language by enforcing programming standards and styles. For example, Green Hills Software’s (1) DoubleCheck product is aimed at finding and flagging potential errors in C and C++ programs. DoubleCheck is a tightly integrated adjunct to Green Hills’ C and C++ compilers. DoubleCheck extends traditional static analyzers to help catch a slew of errors that can become runtime reliability problems:


      • Potential NULL pointer dereferences
      • Access beyond an allocated area - otherwise known as buffer overflows and underflows
      • Potential writes to read-only memory
      • Reads of potentially uninitialized objects
      • Resource leaks including memory and file descriptor leaks
      • Use of memory that has already been deallocated
      • Out of scope memory usage such as returning the address of an automatic variable from a subroutine
      • Failure to set a return value from a subroutine


      As with the Green Hills tool chain framework, this tool may be extended by programming it to recognize uniquely defined structures with their own unique checking requirements.


      Wind River Systems’ (2) Link-Time Lint Checker is also an integrated error-checking tool. The lint facility finds common C programming mistakes at compile and link time. Typical errors flagged include:


      • unused variables and functions
      • missing return statements
      • constants out of range
      • function call mismatches.


      Link-time checking finds inconsistencies across modules, which is impossible to do at compile time.


      Maybe it’s time to consider other languages that don’t have the faults inherent in C. Ada is one such language, supported directly by Green Hills Software and through a partnership between Wind River Systems and AdaCore. Ada had its genesis in a US Department of Defense contract starting in 1977. Today, it is the language of choice in many embedded fields including aerospace and other high reliability applications. Ada is a structured, statically typed, imperative, object-oriented programming language. It has strong built-in language support for explicit concurrency, synchronous message passing, protected objects, tasks, and nondeterminism. Synchronous message passing employs a monitor-like construct with additional guards as in conditional critical regions. Nondeterminism is accomplished by the select statement. This is a language that you should seriously consider when developing large scale, advanced systems requiring high reliability.




      Even at an overview level of detail for SmartGrid Operations, it is clear that the operations environment is conceptually complex. In an earlier blog, a holistic system for ready-mix concrete incorporated some of the elements of the software complexity required for SmartGrid, but for the well defined problem of managing and controlling a batch concrete plant. One of the main differences between the complexity of a batch plant operation and the ever-growing infrastructure of power distribution and management is the dispersed nature of the American (and other) power grid. The US power “grid” started more than one hundred years ago as a series of ad hoc, local, distributed networks that supplied local consumers with relatively small amounts of power. Since then these local distribution networks have been connected together in an expanding series of power distribution cables. There may be social debate about SmartGrid, but the proliferation of residential power generation net-metered to the grid, combined with deregulation of larger scale power generation, requires a smarter mechanism to control supply and demand not only locally, but also regionally and globally.


      SmartGrid points towards a mixture of embedded systems for the US electric infrastructure: systems of varying sizes, complexity and architecture. Looking at the High Performance Computing project (HPC) gives us a look not too far into our future. In its early stages, HPC saw many of the problems that embedded systems are just starting to encounter. Embedded systems are quickly closing the gap between the pedestrian, isolated small-scale embedded system and HPC: the problems HPC faced a decade ago are our problems today, and today’s HPC problems will be on our doorstep in a few short years. We will need to deal with three broad categories:


      • Efficient use of systems with a large number of concurrent operations (scalability)
      • Reliability with large tightly coupled systems
      • Jitter based on hardware, software, and the applications


      Scalability carries with it an intrinsic requirement for improved reliability of each software component. As the number of components increases, the reliability of each component in isolation becomes critical to the continued operation of the assembled system. Although an individual component, software or hardware, may fail, design techniques are available to permit continued operation in the face of component failure.  Embedded systems can be implemented by custom hardware or collections of industry standard hardware modules combined with scalable software.
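
      To make the scaling argument concrete, here is a minimal sketch. It assumes that components fail independently, that they all share the same reliability, and that the system needs every component to work. These are simplifying assumptions chosen for illustration and do not come from any particular system described here. The second function shows how one design technique, redundancy, recovers reliability.

```python
def series_reliability(component_reliability: float, count: int) -> float:
    """Probability that all `count` independent components work."""
    return component_reliability ** count


def redundant_reliability(component_reliability: float, copies: int) -> float:
    """Probability that at least one of `copies` independent replicas works."""
    return 1.0 - (1.0 - component_reliability) ** copies


if __name__ == "__main__":
    # A component that works 99.9% of the time looks excellent in isolation,
    # but the assembled system degrades quickly as the count grows.
    for n in (1, 10, 100, 1000):
        print(n, round(series_reliability(0.999, n), 4))

    # Duplicating a 90%-reliable component raises that stage to about 99%.
    print(round(redundant_reliability(0.9, 2), 4))
```

      Under these assumptions, a thousand components that are each 99.9% reliable give a system that works only a little more than a third of the time, which is why per-component reliability and fault-tolerant design both matter at scale.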


      One of the main messages from the experience with large-scale systems such as HPC is that virtualization is a key technology for managing complexity, reliability, and multiple hardware platform types. Virtualization is a technique that separates software from the underlying hardware on which it operates. While scalability is possible without virtualization, using virtualization simplifies systems design and offers more options for systems implementation. Using an approach based on scalability systematically improves effectiveness while minimizing power consumption. With virtualization as a key component of the software architecture, embedded system providers can maintain one code base that supports a continuum of performance and efficiency.


      Operating system jitter (OSJitter) is a new concept for many embedded programmers, and indeed for most programmers. OSJitter is related to other unexpected performance degradations in systems with large numbers of computing nodes. In one recent research result on the subject, researchers at Lawrence Livermore National Laboratory found that a computer made up of 4096 elements had a 13-fold reduction in throughput based solely on jitter. This has implications for embedded systems. Looking again at the SmartGrid Operations block, you can see that there is substantial potential for large numbers of processors configured in computing clusters, which in turn means that for at least this application, we’ll be facing OSJitter issues. Researchers believe that OSJitter can best be managed by:


      • Improving interrupt routing
      • Better user and kernel thread scheduling
      • More intelligent scheduling policies
      • Synchronization of jitter sources through various co-scheduling techniques 
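
      One way to see why jitter grows with node count is a simple probability model. This is a sketch under assumptions of my own and is not the model used in the research cited above. It assumes every node does the same amount of work per step, all nodes wait at a barrier for the slowest one, and each node is independently delayed by OS activity with a small probability. The parameter values are illustrative only.

```python
def expected_step_time(nodes, work=1.0, jitter_prob=0.001, jitter_cost=5.0):
    """Expected duration of one barrier-synchronized step.

    The step lasts `work` time units unless at least one node is delayed,
    in which case every node waits an extra `jitter_cost` units.
    """
    p_any_delayed = 1.0 - (1.0 - jitter_prob) ** nodes
    return work + jitter_cost * p_any_delayed


def slowdown(nodes, work=1.0, jitter_prob=0.001, jitter_cost=5.0):
    """Expected step time relative to a jitter-free step."""
    return expected_step_time(nodes, work, jitter_prob, jitter_cost) / work


if __name__ == "__main__":
    for n in (1, 16, 256, 4096):
        print(n, round(slowdown(n), 3))
```

      In this model a delay that is almost invisible on one node becomes nearly certain on every step once thousands of nodes must wait for each other. That is also why co-scheduling helps: if the jitter sources fire at the same time on all nodes, the delays overlap instead of accumulating.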


      Virtualization allows many of these systems design decisions to be changed with minimal perturbation of the remainder of the system. Vendors of RTOS products like QNX (3) and TenAsys (4) have different takes on what is important in an RTOS. But by employing virtualization as a cornerstone of your systems design, you can minimize code rework.

      Although the subject is vast, improving software scalability boils down to a handful of points:


      • Adopt virtualization as a fundamental part of your design process
      • Consider the changing landscape of large scale embedded systems – what lessons are to be learned from them?
      • Choose a language, like Ada, that includes concurrency in the language itself
      • Employ threading
      • Identify what information is required to be used by the embedded system – minimize the span of information
      • Investigate your existing systems execution profile for bottleneck information – sometimes the resulting information is counter-intuitive
      • Evaluate the minimum number of cores that your application requires with a load low enough to NOT impact software development – usually keep loading under 80%
      • Ensure that your drivers and libraries are written for a maximum number of processors, but don’t force the use of more cores than are required
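
      The core-count guideline in the list above reduces to simple arithmetic. The sketch below assumes the application's measured load can be expressed in "core equivalents" (for example, 3.0 means three fully busy cores). The function name and the 80% default are illustrative and do not come from any vendor tool.

```python
import math


def min_cores(load_in_cores: float, max_utilization: float = 0.8) -> int:
    """Smallest core count that keeps average loading at or below the limit."""
    if not 0.0 < max_utilization <= 1.0:
        raise ValueError("max_utilization must be in (0, 1]")
    if load_in_cores < 0.0:
        raise ValueError("load_in_cores must be non-negative")
    return max(1, math.ceil(load_in_cores / max_utilization))


if __name__ == "__main__":
    # A workload that would saturate three cores needs four to stay under 80%.
    print(min_cores(3.0))
```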


      There’s a place for most every RTOS, embedded programming language, and tool chain in your future. Which will you choose?




      1. Green Hills Software, Inc  is an Affiliate member of the Intel Embedded Alliance
      2. Wind River Systems is an Associate member of the Intel Embedded Alliance
      3. QNX Software Systems, Ltd. is an Associate member of the Intel Embedded Alliance
      4. TenAsys is an Affiliate member of the Intel Embedded Alliance


      Henry Davis
      Roving Reporter (Intel Contractor)
      Intel® Embedded Alliance

      Multi-core processor technology can bring higher systems performance and lower power consumption to a broad range of embedded applications running on distributed computing elements. But with the benefits of multi-core come new challenges and complexity, not just from a hardware perspective but more importantly from the software development task. Many developers find the move from single-core to multi-core systems challenging.  Developing embedded systems to achieve scalability is a particular challenge.  How can developers migrate software between processors with different core counts without rewriting their code?  An even bigger challenge is present in distributed systems, where the processing cores are in physically separate processors.  How can developers harness these physically separate multi-processor distributed resources to work in concert for their system?


      There are alternative approaches to developing software that can be migrated between systems whose processors have differing numbers of cores. Software is usually developed either for message passing with a Single Program, Multiple Data (SPMD) model, or for shared memory with threads using OpenMP, threads plus C/C++, or Java. Software using message passing generally scales easily, while the shared memory approach is easier to program but has performance limitations.
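
      The difference between the two models is easiest to see side by side. The sketch below is a toy illustration using Python threads. It shows the structure of each approach and is not a performance comparison, since standard CPython threads do not execute Python code in parallel. All function names are hypothetical.

```python
import queue
import threading


def spmd_worker(rank, chunk, results):
    # Every worker runs this same program; it sees only its own chunk and
    # reports back by sending a message rather than touching shared state.
    results.put((rank, sum(chunk)))


def message_passing_sum(data, world_size=4):
    results = queue.Queue()
    workers = [
        threading.Thread(target=spmd_worker,
                         args=(rank, data[rank::world_size], results))
        for rank in range(world_size)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sum(results.get()[1] for _ in range(world_size))


def shared_memory_sum(data, thread_count=4):
    total = [0]               # state shared by all threads
    lock = threading.Lock()   # required to keep the shared update safe

    def worker(rank):
        partial = sum(data[rank::thread_count])
        with lock:
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


if __name__ == "__main__":
    numbers = list(range(1000))
    print(message_passing_sum(numbers), shared_memory_sum(numbers))
```

      In both versions the worker count is just a parameter, which is the property that lets the same code move between processors with different core counts.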





      Some programming languages encourage developing software architectures that employ parallel execution paths to use parallel hardware resources. Unified Parallel C (UPC) is one such language. Originally targeted at massively parallel mainframe computers, the UPC language was created based on experiences gained from three earlier languages: AC, Split-C, and Parallel C Preprocessor (PCP). UPC combines the programmability advantages of the shared memory programming approach with control over data layout and the performance of the message passing programming paradigm.


      QNX Software Systems, Ltd.’s (1) approach to supporting multi-core and multi-CPU systems is based on the idea of a microkernel. Traditional embedded operating systems are constructed as monolithic software in which every aspect of the OS is loaded whether used or not. Depending on the OS chosen, it may not be possible to reduce the memory footprint of the OS. The QNX kernel contains only support for CPU scheduling, interprocess communication, interrupt redirection and timers. All other support runs as a user process, including the special process called “proc”, which performs process creation and memory management by operating in conjunction with the microkernel. QNX achieves this functionality using two key mechanisms: subroutine-call type interprocess communication and a boot loader. The boot loader can load an image containing the kernel and any desired collection of user programs and shared libraries. QNX contains no device drivers in the kernel, which separates much of the machine-specific code from the general OS code. Like many OS functions available in the market, the network stack is based on NetBSD code. QNX supports its legacy io-net manager server and the network drivers ported from NetBSD, along with its own native device drivers.


      QNX’s interprocess communication technique works by sending a message from one process to another and waiting for a reply in one operation, called MsgSend in the OS. The message is copied by the OS kernel from the address space of the sending process into the address space of the receiving process. Context switching is streamlined by QNX’s decision to switch control directly to the receiving process if it is already waiting for the message, without invoking a pass through the CPU scheduler.
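
      The send-and-wait-for-reply pattern can be modeled in a few lines. The sketch below is a toy model in Python and is not the QNX API: the Channel class and its methods are hypothetical names, threads stand in for processes, and nothing is copied between address spaces as a real kernel would do.

```python
import queue
import threading


class Channel:
    """Toy model of a synchronous send/receive/reply channel."""

    def __init__(self):
        self._requests = queue.Queue()

    def send(self, message):
        """Deliver a message and block until the receiver replies."""
        reply_box = queue.Queue(maxsize=1)
        self._requests.put((message, reply_box))
        return reply_box.get()

    def receive(self):
        """Block until a message arrives; return it with its reply box."""
        return self._requests.get()


def server(channel, handled):
    while True:
        message, reply_box = channel.receive()
        if message is None:          # shutdown request
            reply_box.put(None)
            break
        handled.append(message)
        reply_box.put(message.upper())   # the reply unblocks the sender


if __name__ == "__main__":
    channel = Channel()
    handled = []
    worker = threading.Thread(target=server, args=(channel, handled))
    worker.start()
    print(channel.send("read block 7"))
    channel.send(None)
    worker.join()
```

      Because the sender is blocked for the whole exchange, the call behaves much like a subroutine call, which is what makes this style of message passing easy to reason about.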


      QNX can be a distributed Operating System due to its microkernel architecture. Using this approach, a logical system may be partitioned across multiple hardware instances, each of which may perform a unique function such as disk access, I/O operation, and network operations without software regard for where the actual operation is taking place.  Each of these operations may be accessed through the message passing mechanism. By taking advantage of advanced inter-process communications techniques, developers can write code that scales across different core counts and even across disparate, networked processors.

      SMP is not the only multiprocessing approach that works for embedded systems. TenAsys’ (2) INtime® Distributed RTOS (DRTOS) is a 32-bit RTOS that uses embedded virtualization technology to partition resources on a multi-core processor platform. The DRTOS enables multiple instances of the INtime RTOS running on a multi-core processor to communicate with each other.


      TenAsys takes a different approach to embedded Multi Processing. Developers work in a delivery platform using a managed Asymmetric Multi Processing (AMP) environment with the ability to distribute an application across several CPUs in a manner similar to SMP.




      TenAsys’ approach recognizes the value in assigning a specific processor to deal with critical real time I/O. In a TenAsys-based design, the critical I/O resources are explicitly dedicated to a specific processor and its associated OS. This relationship is maintained by virtue of the binding of processes within a specific processor and the dedicated connection to the I/O. QNX has a similar facility to bind a specific process to a specific processor using what the company calls “Bound Multi Processing.” Both forms of binding a process to a processor minimize the chances that a critical I/O channel will inadvertently get starved for processing cycles.
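
      The general idea of binding work to a processor can be tried on a desktop Linux system. The sketch below uses the Linux CPU affinity calls exposed by Python's os module. It is not the TenAsys or QNX mechanism, and the helper names are hypothetical. On platforms without these calls the helper simply runs the function unpinned.

```python
import os


def current_cpus():
    """CPUs the caller may run on, or None where affinity is unsupported."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


def run_pinned(func, cpu=None):
    """Run func() while the caller is restricted to one CPU, then restore."""
    if not hasattr(os, "sched_setaffinity"):
        return func()
    original = os.sched_getaffinity(0)          # pid 0 means the caller
    target = min(original) if cpu is None else cpu
    os.sched_setaffinity(0, {target})
    try:
        return func()
    finally:
        os.sched_setaffinity(0, original)       # always undo the binding


if __name__ == "__main__":
    print("before:", current_cpus())
    print("while pinned:", run_pinned(current_cpus))
    print("after:", current_cpus())
```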


      A third approach to multi-processing is embodied by a software package sponsored by an industry consortium of companies. The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ on many software architectures, including Unix and Unix-like platforms. Since the work group has been mostly focused on data processing systems, the majority of effort has been in that arena. OpenMP has been jointly defined by a group of major computer hardware and software vendors. It is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications. A GNU implementation of OpenMP is available for GNU-based tool chains and can be adapted to other tools as well. Of course, unlike the TenAsys and QNX offerings, there’s work involved in implementing an OpenMP system for your embedded systems.


      Wind River Systems’ (3) Mark Hermeling asks a pertinent question in a blog that he wrote about AMP vs SMP. There is no question that programming is easier for SMP-based systems. But AMP clearly has some performance advantages under some conditions. Since the particulars of every embedded system will be different, the answer to the question is “it depends.” Not a surprising answer, but one that provides little guidance. Wind River’s VxWorks OS puts a foot firmly in all three camps. Three camps? VxWorks can operate as a single OS in either SMP or AMP modes, or it can operate on top of Wind River’s Hypervisor to provide more options for platform configuration.


      AMP, SMP, and Hypervisors. There are powerful arguments for both AMP and SMP. Hypervisors add flexibility and power to both approaches. How will you choose what path is right for you?



      1. QNX Software Systems, Ltd is an Associate member of the Intel Embedded Alliance
      2. TenAsys is an Affiliate member of the Intel Embedded Alliance
      3. Wind River is an Associate member of the Intel Embedded Alliance


      Henry Davis

      Roving Reporter (Intel Contractor)

      Intel® Embedded Alliance

      The Intel Developer Forum is this week (September 13-15), and we have some exciting training and demonstrations prepared around firmware and boot loaders for Intel® architecture.


      Of course, the Embedded group is presenting a course on the Intel® Boot Loader Development Kit (Intel® BLDK) titled Reshaping the Intel® Architecture Firmware Landscape using Intel® Boot Loader Development Kit (Intel® BLDK) for Embedded Designs.  Cris Rhodes is a long-time BIOS development manager at Intel, and it is great to have him on the Intel BLDK team and presenting this course.  (Elmer Amaya from the Software and Services Group, who was planning to co-present, unfortunately was called away at the last minute and will not be able to co-present.)


      There are also a number of courses on UEFI (Unified Extensible Firmware Interface), too numerous to list here.  However, one other course of note will be from Pete Dice, who is currently the lead Firmware Architect in our Chipset group (and also former Architect for the Intel BLDK).  His course is titled Designing for Next Generation Best-In-Class Platform Responsiveness, and goes into a lot of detail on what Intel is doing around boot time improvement, and how to design Intel-based systems for fast boot time responsiveness.


      You are also going to see use of Intel BLDK and other Intel boot loaders for embedded devices all across the IDF event.  We will have an Intel BLDK demonstration in the Software Community exhibit area, as well as demos in the Embedded Zone that use Intel BLDK.  There are also a couple of lab courses (noted below) that will be using embedded platforms with Intel BLDK based boot loaders.  Also, you are going to see Intel architecture boot loader demos from the following exhibitors:


      • American Megatrends Inc.
      • Arium
      • Inforce Computing
      • Intelligraphics Incorporated
      • Macraigor Systems, Inc.
      • Phoenix Technologies Ltd.
      • SBS Science & Technology Co., Ltd.
      • Wind River


      Finally, you might want to check out the course on Next Generation Intel® Atom™ Processors for Embedded, and find out what we have in store for embedded boot loaders for upcoming platforms.


      I’ll be at the event all week, and would love to talk with you about what is happening with embedded firmware to improve the development experience and performance of embedded systems.  If you don’t catch me at the classes or in the Embedded Zone exhibit area, I also will be joining Cris for office hours Wednesday from 4:00 – 5:00 PM, and again on Thursday from 10:00 – 11:00 AM, in room 2000 (on the 2nd floor of Moscone).  I hope to see you there!




      p.s. Here are details on the course and labs I mentioned:


      EMBS002:  Reshaping the Intel® Architecture Firmware Landscape using Intel® Boot Loader Development Kit (Intel® BLDK) for Embedded Designs,  Wednesday 1:05 – 1:55 PM, Room 2001


      EFIS004:  Designing for Next Generation Best-In-Class Platform Responsiveness, Tuesday 4:25 - 5:15 PM, Room 2009


      EMBS001:  Designing Embedded Intelligent Devices Powered by the Next Generation Intel® Atom Processor Based Platform,  Wednesday 11:20 AM - 12:30 PM, Room 2001


      EMBL001:  Application Graphic and Video Performance with the Intel® Atom™ Processor E6XX Platform,  Tuesday 1:05 – 2:20 PM, and again on Tuesday 3:20 - 4:45 PM, Room 2012


      SFTL003:  Create a Custom Embedded Linux* OS for Any Embedded Device using the Yocto Project*, Wednesday 1:05 – 2:20 PM, and again Wednesday 3:20 - 4:45 PM, Room 2012


      p.p.s.  I’ll also be presenting at Embedded Systems Conference in Boston, from Sep. 26-29, so if you don’t catch me at IDF, perhaps I’ll see you at ESC!