How do you solve embedded scalability issues to build physically dispersed, large-scale, real-world systems?
Embedded systems were once relatively independent, purpose-built hardware intended to serve a fixed function within a fixed and predictable demand system. The emergence of organically growing embedded systems like streaming media and the “SmartGrid” system demands scalability on a large scale.
Design techniques pioneered for large scale computing can be applied to embedded systems. The techniques rely on systems scalability, which is enabled by software structure. Embedded software vendors offer many of the building blocks necessary to create these complex systems.
In a recent blog I posted about some methods available to embedded developers. But, as with any program designed to solve a specific problem, the best program structure reflects the problem statement, and that is where software structure comes into play. Good programming languages should enable one obvious way to create the code and not easily permit many alternatives. Unfortunately, the language of choice for general embedded systems, C, doesn’t inherently funnel the software creative process towards one “right” implementation.
Software tool chains can be used to augment the language by enforcing programming standards and styles. For example, the Green Hills Software (1) DoubleCheck product is aimed at finding and flagging potential errors in C and C++ programs. DoubleCheck is a tightly integrated adjunct to Green Hills’ C and C++ compilers. It extends traditional static analyzers to help catch a slew of errors that can become runtime reliability problems:
- Potential NULL pointer dereferences
- Accesses beyond an allocated area (buffer overflows and underflows)
- Potential writes to read-only memory
- Reads of potentially uninitialized objects
- Resource leaks including memory and file descriptor leaks
- Use of memory that has already been deallocated
- Out of scope memory usage such as returning the address of an automatic variable from a subroutine
- Failure to set a return value from a subroutine
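To make a few of these defect classes concrete, here is a minimal C sketch of my own. It is not DoubleCheck output, and the function name `copy_label` is hypothetical. The comment shows the kind of code a static analyzer would flag, and the function shows one defensive way to write the same operation.

```c
#include <stddef.h>
#include <string.h>

/* Flagged pattern: returning the address of an automatic variable.
 *
 *     char *bad_label(void) { char buf[16]; return buf; }
 *
 * The variant below writes into storage owned by the caller instead. */

/* Copies src into dst (capacity dst_size), including the terminating NUL.
 * Returns 0 on success, -1 on a NULL argument or if src does not fit. */
int copy_label(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0) {
        return -1;                 /* avoids a NULL pointer dereference */
    }
    size_t len = strlen(src);
    if (len >= dst_size) {
        return -1;                 /* avoids writing past the buffer */
    }
    memcpy(dst, src, len + 1);     /* +1 copies the terminating NUL */
    return 0;                      /* every path sets a return value */
}
```

A caller passes its own buffer, for example `char name[8]; copy_label(name, sizeof name, "pump");`, and checks the return code rather than assuming the copy succeeded.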
As with the Green Hills tool chain framework, this tool may be extended by programming it to recognize uniquely defined structures with their own unique checking requirements.
The Wind River Systems (2) Link-Time Lint Checker is also an integrated error-checking tool. The lint facility finds common C programming mistakes at compile and link time. Typical errors flagged include:
- unused variables and functions
- missing return statements
- constants out of range
- function call mismatches
Link-time checking finds inconsistencies across modules, which cannot be detected while each module is being compiled on its own.
Maybe it’s time to consider other languages that don’t have the faults inherent in C. Ada is one such language, supported directly by Green Hills Software and through a partnership between Wind River Systems and AdaCore. Ada had its genesis in a US Department of Defense contract starting in 1977. Today, it is the language of choice in many embedded fields including aerospace and other high reliability applications. Ada is a structured, statically typed, imperative, object-oriented programming language. It has strong built-in language support for explicit concurrency: tasks, synchronous message passing, protected objects, and nondeterminism. Synchronous message passing takes place through task entries (the rendezvous). Protected objects are a monitor-like construct with additional guards, as in conditional critical regions. Nondeterminism is accomplished by the select statement. This is a language that you should seriously consider when developing large-scale, advanced systems requiring high reliability.
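For readers who live in C, the following is a rough analog of the monitor-like construct with guards described above, sketched with POSIX threads. It is only an illustration under my own assumptions: the `mailbox_t` type and its functions are hypothetical names, and Ada provides the locking and guard re-evaluation in the language itself rather than leaving them to the programmer as this code does.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox: a mutex plus a condition variable stand in
 * for the mutual exclusion and guards that Ada builds into the language. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    bool            full;
    int             value;
} mailbox_t;

void mailbox_init(mailbox_t *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->changed, NULL);
    m->full = false;
    m->value = 0;
}

/* Guarded operation: blocks while the slot is full. */
void mailbox_put(mailbox_t *m, int v)
{
    pthread_mutex_lock(&m->lock);
    while (m->full) {                          /* guard: slot must be empty */
        pthread_cond_wait(&m->changed, &m->lock);
    }
    m->value = v;
    m->full = true;
    pthread_cond_broadcast(&m->changed);       /* let waiters re-check guards */
    pthread_mutex_unlock(&m->lock);
}

/* Guarded operation: blocks while the slot is empty. */
int mailbox_get(mailbox_t *m)
{
    pthread_mutex_lock(&m->lock);
    while (!m->full) {                         /* guard: slot must be full */
        pthread_cond_wait(&m->changed, &m->lock);
    }
    int v = m->value;
    m->full = false;
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
    return v;
}
```

Every line of locking and signalling here is a place where a C programmer can make a mistake, which is the argument for a language that includes concurrency in its definition.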
Referring to an overview level of detail for SmartGrid Operations, it should be clear from inspection that the operations environment is conceptually complex. In an earlier blog, a holistic system for ready-mix concrete incorporated some of the elements of the software complexity required for SmartGrid, but for the well defined problem of managing and controlling a batch concrete plant. One of the main differences between the complexity of a batch plant operation and the ever-growing infrastructure of power distribution and management is the dispersed nature of the American (and other) power grids. The US power “grid” started more than one hundred years ago as a series of ad hoc, local, distributed networks supplying local consumers with relatively small amounts of power. Since then these local distribution networks have been connected together in an expanding series of power distribution cables. There may be social debate about SmartGrid, but the proliferation of residential power generation net-metered to the grid, combined with deregulation of larger scale power generation, requires a smarter mechanism to control supply and demand not only locally, but also regionally and globally.
SmartGrid points towards a mixture of embedded systems for the US electric infrastructure: systems of varying sizes, complexity and architecture. High Performance Computing (HPC) offers a view not too far into our future. In its early stages, HPC saw many of the problems that embedded systems are just starting to encounter. Embedded systems are quickly closing the gap between the traditional, isolated small-scale embedded system and HPC: the problems HPC faced a decade ago are our problems today, and today’s HPC problems will be on our doorstep in a few short years. We will need to deal with three broad categories:
- Efficient use of systems with a large number of concurrent operations (scalability)
- Reliability with large tightly coupled systems
- Jitter based on hardware, software, and the applications
Scalability carries with it an intrinsic requirement for improved reliability of each software component. As the number of components increases, the reliability of each component in isolation becomes critical to the continued operation of the assembled system. Although an individual component, software or hardware, may fail, design techniques are available to permit continued operation in the face of component failure. Embedded systems can be implemented by custom hardware or collections of industry standard hardware modules combined with scalable software.
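A little arithmetic shows why per-component reliability matters so much as systems grow. The sketch below assumes the simplest possible model, which is my assumption and not something taken from any vendor: every component is required, and components fail independently. The function name `series_reliability` is hypothetical.

```c
/* Probability that all n components are working, assuming independent
 * failures and that the system needs every one of them (series model). */
double series_reliability(double component_reliability, unsigned n)
{
    double r = 1.0;
    for (unsigned i = 0; i < n; ++i) {
        r *= component_reliability;   /* each extra component multiplies the risk in */
    }
    return r;
}
```

Under those assumptions, a thousand components that are each 99.9% reliable give a system that is fully working only about 37% of the time, which is why designs that tolerate individual component failures become a necessity at scale.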
One of the main lessons from experience with large scale systems such as HPC is that virtualization is a key technology for managing complexity, reliability, and multiple hardware platform types. Virtualization is a technique that separates software from the underlying hardware on which it operates. While scalability is possible without virtualization, using virtualization simplifies systems design and offers more options for systems implementation. Using an approach based on scalability systematically improves effectiveness while minimizing power consumption. As a key component of the software architecture, virtualization lets embedded system providers maintain one code base that supports a continuum of performance and efficiency.
Operating system jitter (OSJitter) is a new concept for many embedded programmers, and indeed for most programmers. OSJitter is related to other unexpected performance degradations seen in systems with large numbers of computing nodes. In one of the more recent research results on the subject, researchers at Lawrence Livermore National Laboratory discovered that a computer made up of 4096 elements had a 13-fold reduction in throughput based solely on jitter. This fact has implications for the future of embedded systems. Looking again at the SmartGrid Operations block, you can see that there is substantial potential for large numbers of processors configured in computing clusters, which in turn means that, for at least this application, we’ll be facing OSJitter issues. Researchers believe that OSJitter can best be managed by:
- Improving interrupt routing
- Better user and kernel thread scheduling
- More intelligent scheduling policies
- Synchronization of jitter sources through various co-scheduling techniques
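To get a feel for why jitter hurts more as node counts climb, and why co-scheduling the jitter sources helps, consider a toy model. This is my own simplification and not the model used by the researchers: each node does a fixed amount of work per step, each node is independently delayed with a small probability, and a step finishes only when the slowest node does. All names and parameters are hypothetical.

```c
/* Toy model: expected slowdown of one synchronized step.
 *   nodes - number of nodes that must all finish the step
 *   p     - chance that a given node is delayed during the step
 *   work  - time for an undisturbed node to finish the step
 *   delay - extra time added when a node is disturbed
 * Returns expected step time divided by the undisturbed step time. */
double expected_slowdown(unsigned nodes, double p, double work, double delay)
{
    double none_delayed = 1.0;
    for (unsigned i = 0; i < nodes; ++i) {
        none_delayed *= (1.0 - p);    /* all nodes must escape the delay */
    }
    double expected_step = work + delay * (1.0 - none_delayed);
    return expected_step / work;
}
```

In this model a one-in-a-thousand disturbance is invisible on a single node, but with 4096 nodes roughly 98% of steps are stretched by the full delay. If every node took its disturbance at the same moment instead, only about one step in a thousand would be affected, which is the intuition behind synchronizing jitter sources.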
Virtualization allows many of these system design decisions to be changed with minimal perturbation to the remainder of the system. Vendors of RTOS products like QNX (3) and TenAsys (4) have different takes on what is important in an RTOS. But by employing virtualization as a cornerstone of your systems design you can minimize code rework.
Although the subject is vast, improving software scalability boils down to a handful of points:
- Adopt virtualization as a fundamental part of your design process
- Consider the changing landscape of large scale embedded systems – what lessons are to be learned from them?
- Choose a language, like Ada, that includes concurrency in the language itself
- Employ threading
- Identify what information is required to be used by the embedded system – minimize the span of information
- Investigate your existing system’s execution profile for bottleneck information – sometimes the resulting information is counter-intuitive
- Evaluate the minimum number of cores that your application requires with a load low enough to NOT impact software development – usually keep loading under 80%
- Ensure that your drivers and libraries are written for a maximum number of processors, but don’t force the use of more cores than are required
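The core-count guideline in the list above reduces to a ceiling division. Here is a minimal sketch, assuming you have already measured total CPU demand as a percentage of one core; the function name `min_cores` and that unit are my own choices for illustration.

```c
#include <assert.h>

/* Smallest number of cores that keeps average per-core load at or below
 * max_load_pct. demand_pct is total demand in percent of a single core,
 * so 250 means the application needs two and a half cores' worth of CPU. */
unsigned min_cores(unsigned demand_pct, unsigned max_load_pct)
{
    assert(max_load_pct > 0 && max_load_pct <= 100);
    /* Integer ceiling of demand_pct / max_load_pct. */
    return (demand_pct + max_load_pct - 1) / max_load_pct;
}
```

For example, `min_cores(250, 80)` returns 4: three cores would each run at over 83%, while four cores run at 62.5% each and leave headroom for development and growth.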
There’s a place for most every RTOS, embedded programming language, and tool chain in your future. Which will you choose?
(1) Green Hills Software, Inc. is an Affiliate member of the Intel Embedded Alliance
(2) Wind River Systems is an Associate member of the Intel Embedded Alliance
(3) QNX Software Systems, Ltd. is an Associate member of the Intel Embedded Alliance
(4) TenAsys is an Affiliate member of the Intel Embedded Alliance
Roving Reporter (Intel Contractor)
Intel® Embedded Alliance