Multi-core processor technology can bring higher system performance and lower power consumption to a broad range of embedded applications running on distributed computing elements. But the benefits of multi-core come with new challenges and complexity, not just in hardware but, more importantly, in software development. Many developers find the move from single-core to multi-core systems difficult, and developing embedded systems that scale is a particular challenge. How can developers migrate software between processors with different core counts without rewriting their code? An even bigger challenge arises in distributed systems, where the processing cores are in physically separate processors. How can developers harness these physically separate resources to work in concert?


There are alternative approaches to developing software that can be migrated between systems whose processors have differing numbers of cores. Software is usually developed either for message passing, using a Single Program, Multiple Data (SPMD) model, or for shared memory, using threads through OpenMP, a threads library with C/C++, or Java. Software using message passing generally scales easily, while the shared memory approach is easier to program but has performance limitations.
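
The contrast between the two styles can be sketched in a few lines. This is a minimal illustration, not production code: Python threads stand in for cores, a queue stands in for a message transport, and the function names are invented for the example. Python threads do not speed up CPU-bound work, so the sketch shows program structure only.

```python
import queue
import threading


def shared_memory_sum(data, n_workers):
    """Shared-memory style: workers add into one total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(rank):
        nonlocal total
        partial = sum(x * x for x in data[rank::n_workers])
        with lock:  # shared state needs explicit synchronization
            total += partial

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(data, n_workers):
    """SPMD style: every worker runs the same code on its own chunk and
    reports back with a message; no worker writes a shared variable."""
    results = queue.Queue()

    def worker(rank, chunk):
        results.put((rank, sum(x * x for x in chunk)))

    threads = [
        threading.Thread(target=worker, args=(r, data[r::n_workers]))
        for r in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Collect exactly one message per worker.
    return sum(results.get()[1] for _ in range(n_workers))


if __name__ == "__main__":
    samples = list(range(10))
    for cores in (1, 2, 4):  # same code, different worker counts
        print(cores, shared_memory_sum(samples, cores), message_passing_sum(samples, cores))
```

In both functions the worker count is a parameter rather than something baked into the code, which is the property that lets software move between processors with different core counts. The message-passing version also has no lock to get wrong, because each worker owns its data and communicates only by sending a result.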





Some programming languages encourage software architectures that use parallel execution paths to exploit parallel hardware resources. Unified Parallel C (UPC) is one such language. Originally targeted at massively parallel high-performance computers, UPC was created based on experience gained from three earlier languages: AC, Split-C, and Parallel C Preprocessor (PCP). UPC combines the programmability advantages of the shared memory approach with the control over data layout and the performance of the message passing paradigm.


The approach of QNX Software Systems, Ltd.(1) to supporting multi-core and multi-CPU systems is based on the idea of a microkernel. Traditional embedded operating systems are built as monolithic software in which every aspect of the OS is loaded whether it is used or not. Depending on the OS chosen, it may not be possible to reduce the memory footprint of the OS. The QNX kernel contains only support for CPU scheduling, interprocess communication, interrupt redirection and timers. All other support runs as user processes, including the special process called “proc,” which performs process creation and memory management in conjunction with the microkernel. QNX achieves this using two key mechanisms: subroutine-call type interprocess communication and a boot loader. The boot loader can load an image containing the kernel and any desired collection of user programs and shared libraries. QNX contains no device drivers in the kernel, which separates much of the machine-specific code from the general OS code. As in many operating systems on the market, the network stack is based on NetBSD code. QNX supports its legacy io-net manager server and network drivers ported from NetBSD, along with its own native device drivers.


QNX’s interprocess communication technique works by sending a message from one process to another and waiting for a reply, all in a single operation that the OS calls MsgSend. The OS kernel copies the message from the address space of the sending process into the address space of the receiving process. QNX streamlines context switching: if the receiving process is already waiting for the message, control switches directly to it without a pass through the CPU scheduler.
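
The send-and-wait-for-reply pattern described above can be modeled with ordinary threads and queues. The sketch below is a conceptual illustration only and is not the QNX API: the `Channel` class and its `msg_send`, `msg_receive`, and `msg_reply` methods are hypothetical names, and nothing here models the kernel copying data between address spaces or bypassing the scheduler.

```python
import queue
import threading


class Channel:
    """Hypothetical stand-in for a kernel message channel."""

    def __init__(self):
        self._inbox = queue.Queue()

    def msg_send(self, payload):
        """Send a message and block until the reply arrives, as one call."""
        reply_box = queue.Queue(maxsize=1)
        self._inbox.put((payload, reply_box))
        return reply_box.get()  # the sender stays blocked here

    def msg_receive(self):
        """Block until a message arrives; return it with its reply handle."""
        return self._inbox.get()

    @staticmethod
    def msg_reply(reply_box, result):
        """Unblock the sender that owns reply_box by handing it the result."""
        reply_box.put(result)


def server(channel):
    """Serve requests until a None payload asks the server to stop."""
    while True:
        payload, reply_box = channel.msg_receive()
        if payload is None:
            Channel.msg_reply(reply_box, "stopped")
            return
        Channel.msg_reply(reply_box, payload.upper())


def demo():
    channel = Channel()
    worker = threading.Thread(target=server, args=(channel,))
    worker.start()
    answer = channel.msg_send("read sensor")  # returns only after the reply
    channel.msg_send(None)                    # ask the server to exit
    worker.join()
    return answer


if __name__ == "__main__":
    print(demo())
```

From the caller's side, `msg_send` looks like a subroutine call: it passes arguments in and gets a result back. That is why the client code does not need to know whether the server runs on the same core, another core, or another machine.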


Its microkernel architecture lets QNX operate as a distributed operating system. Using this approach, a logical system may be partitioned across multiple hardware instances, each of which may perform a unique function such as disk access, I/O, or network operations, without the software needing to know where the actual operation is taking place. Each of these operations may be accessed through the message passing mechanism. By taking advantage of advanced interprocess communication techniques, developers can write code that scales across different core counts and even across disparate, networked processors.

Symmetric Multi Processing (SMP) is not the only Multi Processing approach that works for embedded systems. TenAsys’(2) INtime® Distributed RTOS (DRTOS) is a 32-bit RTOS that uses embedded virtualization technology to partition resources on a multi-core processor platform. The DRTOS enables multiple instances of the INtime RTOS running on a multi-core processor to communicate with each other.


TenAsys takes a different approach to embedded Multi Processing. Developers work in a delivery platform using a managed Asymmetric Multi Processing (AMP) environment with the ability to distribute an application across several CPUs in a manner similar to SMP.




TenAsys’ approach recognizes the value in assigning a specific processor to deal with critical real-time I/O. In a TenAsys-based design, the critical I/O resources are explicitly dedicated to a specific processor and its associated OS. This relationship is maintained by binding processes to a specific processor and by the dedicated connection to the I/O. QNX has a similar facility to bind a specific process to a specific processor using what the company calls “Bound Multi Processing.” Both forms of binding a process to a processor minimize the chances that a critical I/O channel will inadvertently get starved for processing cycles.
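
The idea of binding a process to one processor can be tried on a Linux host, where the standard library exposes the scheduler's CPU affinity mask. The sketch below uses neither the TenAsys nor the QNX mechanism; it swaps in Linux CPU affinity to show the same concept, and `run_bound` is a name invented for the example. It assumes a Linux system, since `os.sched_setaffinity` is not available on every platform.

```python
import os


def run_bound(task):
    """Run task() with the caller pinned to a single CPU (Linux only).

    Returns the chosen CPU, the affinity mask in force while the task
    ran, and the task's result. The original mask is restored afterwards.
    """
    original = os.sched_getaffinity(0)  # 0 means "the caller"
    target = min(original)              # pick a CPU we are allowed to use
    os.sched_setaffinity(0, {target})
    try:
        return target, os.sched_getaffinity(0), task()
    finally:
        os.sched_setaffinity(0, original)


if __name__ == "__main__":
    cpu, mask, result = run_bound(lambda: sum(range(1000)))
    print("ran on CPU", cpu, "with mask", mask, "result", result)
```

Choosing the target from the mask the process already has, rather than hard-coding CPU 0, keeps the sketch working inside containers and other environments that restrict which CPUs a process may use.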


A third approach to multi-processing is embodied by a programming interface sponsored by an industry consortium. The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ and Fortran on many architectures, including Unix and Unix-like platforms. Since the work group has been mostly focused on data processing systems, the majority of effort has been in that arena. OpenMP has been jointly defined by a group of major computer hardware and software vendors. It is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications. A GNU implementation of OpenMP is available for GNU-based tool chains and can be adapted to other tools as well. Of course, unlike the TenAsys and QNX offerings, OpenMP takes some work to implement on your embedded system.


Wind River Systems’(3) Mark Hermeling asks a pertinent question in a blog post that he wrote about AMP vs. SMP. There is no question that programming is easier for SMP-based systems. But AMP clearly has performance advantages under some conditions. Since the particulars of every embedded system will be different, the answer to the question is “it depends.” Not a surprising answer, but one that provides little guidance. Wind River’s VxWorks OS puts a foot firmly in all three camps. Three camps? VxWorks can operate as a single OS in either SMP or AMP modes, or it can operate on top of Wind River’s Hypervisor to provide more options for platform configuration.


AMP, SMP, and Hypervisors. There are powerful arguments for both AMP and SMP. Hypervisors add flexibility and power to both approaches. How will you choose which path is right for you?



  1. QNX Software Systems, Ltd. is an Associate member of the Intel Embedded Alliance
  2. TenAsys is an Affiliate member of the Intel Embedded Alliance
  3. Wind River is an Associate member of the Intel Embedded Alliance


Henry Davis

Roving Reporter (Intel Contractor)

Intel® Embedded Alliance