
As you may have noticed, code and data security is a hot topic, and one covered by a number of the Intel® Embedded Communications Alliance (Intel® ECA) Roving Reporters. Indeed, security is important in protecting financial transactions, personal data such as medical records, and of course military systems dealing with national security. Today, security is tougher than ever because almost all systems, even embedded ones, connect to networks and the Internet. Moreover, cost concerns lead designers to mix secure and non-secure applications on the same multi-core and/or multiprocessor systems. The good news is that software technology such as separation kernels, along with new security-centric features integrated in Intel® processors, enables secure system design. Features such as Intel® Trusted Execution Technology (Intel® TXT) and Intel® Virtualization Technology (Intel® VT), both unique to Intel, can be invaluable in building secure systems.

 

In this post, I'll dig deep into architectures and technology for maximum security. You might also review two recent Intel ECA blog posts on Intel TXT and Intel VT.

 

Let's start with formal security standards and specifications so you have some background. Nearly all security standards originate in military work; the commercial industry finds it easier and cheaper to leverage that work even when commercial systems require lower levels of security. The Common Criteria for Information Technology Security Evaluation (called Common Criteria or CC), defined in ISO/IEC 15408, provides a framework for designers to specify security capabilities and for testing labs to validate those claims. While conceived for IT applications, CC applies equally to embedded systems.

 

Products evaluated to CC get what is essentially a grade called the Evaluation Assurance Level (EAL), ranging from EAL1 through EAL7. Commercial operating systems (OSs) such as some versions of Windows and Linux earn the moderately secure grade of EAL4.

 

Today, the most common approach to meeting CC requirements leverages an architecture called Multiple Independent Levels of Security (MILS). A MILS implementation relies on separation to meet security requirements. Most implementations rely on a separation kernel: a thin layer of software that emulates an environment in which secure applications run as if isolated on dedicated hardware. The separation kernel can, however, host a mix of secure and non-secure OSs and applications. MILS implementations also separate resources, such as memory or I/O, and ensure that secure and non-secure data are never intermixed.

 

LynuxWorks, an Affiliate member of Intel ECA,  is one of several companies offering separation kernels based on a MILS architecture. The LynxSecure Separation Kernel also embeds a hypervisor and relies on Intel VT. The diagram below depicts the kernel architecture.

 

 

[Diagram: LynxSecure Separation Kernel architecture]

LynxSecure keeps the secure LynxOS in a partition separate from other guest OSs such as Windows or Linux.

 

Steve Blackman, Director of Business Development at LynuxWorks, points out that the separation kernel concept relies on small modular code blocks that can survive scrutiny. Blackman states, "Achieving security relies on a combination of proofs used to prove a higher-level proof."

 

The design team behind LynxSecure developed the kernel based on the Separation Kernel Protection Profile (SKPP) published by the National Security Agency. According to Blackman, SKPP requires at a minimum that a kernel separate partitions and control communications between the partitions. But SKPP also defines a more advanced implementation called a Least Privileged Separation Kernel (LPSK), and LynxSecure was architected on the LPSK model.

 

LPSK introduces separation and control at a more granular level. Specifically, LPSK defines the concepts of subjects, resources, and partitions. Executable code is an example of a subject. A resource might be a processor core, a section of memory, a network interface, or a disk drive. In the simplest case, a subject may be equivalent to a partition; a more complex partition might include multiple subjects.

 

LynxSecure relies on an XML configuration file to define the relationships between subjects, resources, and partitions. The design team can group objects using the file. For example, you might group one or more subjects along with the specific policies and privileges of those subjects. Blackman suggests that one way you might use such a group is to set scheduling policies for fast interrupt response.
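To make the subject/resource/partition model more concrete, here is a minimal C sketch of the kind of in-memory configuration a separation kernel might build after parsing such a file. The type names, fields, and values are hypothetical illustrations of the concepts Blackman describes, not LynxSecure's actual data model or configuration schema.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of an LPSK-style configuration: names and fields are
 * illustrative only, not LynxSecure's internal representation. */
typedef enum { RES_CPU_CORE, RES_MEMORY_REGION, RES_NET_IF, RES_DISK } resource_kind_t;

typedef struct {
    resource_kind_t kind;
    uintptr_t       base;    /* e.g., physical base address for a memory region */
    size_t          length;  /* size of the region, or 0 if not applicable      */
} resource_t;

typedef struct {
    const char *name;            /* e.g., executable code such as "crypto_service" */
    int         may_send_to[4];  /* IDs of subjects this subject may signal        */
    int         sched_budget_us; /* scheduling policy: per-period CPU budget       */
} subject_t;

typedef struct {
    const char       *name;
    const subject_t  *subjects;   /* one or more subjects grouped in the partition */
    size_t            n_subjects;
    const resource_t *resources;  /* resources exclusively assigned to it          */
    size_t            n_resources;
} partition_t;

/* A two-partition example: a secure partition with one subject bound to core 0,
 * plus a placeholder for a non-secure guest-OS partition on another core. */
static const subject_t secure_subjects[] = {
    { "crypto_service", { 1, -1, -1, -1 }, 500 },
};
static const resource_t secure_resources[] = {
    { RES_CPU_CORE,      0,          0          },
    { RES_MEMORY_REGION, 0x10000000, 0x00400000 },
};
static const partition_t partitions[] = {
    { "secure", secure_subjects, 1, secure_resources, 2 },
    /* ... a non-secure partition hosting a guest OS would follow ... */
};
```

The grouping Blackman mentions maps naturally onto such a structure: a group is simply a set of subjects that share the same policies, such as the scheduling budget that yields fast interrupt response.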

 

LynuxWorks is preparing to submit LynxSecure for evaluation against both EAL7 and the SKPP, which are related but separate evaluation criteria.

 

Have you mixed secure and non-secure applications on one system? How did you isolate the sensitive code or data? Do you have experience with the CC and EAL process? Design teams following the Intel ECA community would surely benefit from your comments. Let us know your thoughts.

 

 

 

 

The practice of virtualization has been around for more than a decade, but who's to say which of the three frontrunner methods is best: binary translation (runtime handling of behavior- and control-sensitive instructions), OS-assisted virtualization (also called para-virtualization), or hardware-assisted virtualization (also known as full virtualization)? Is there even a best? While some continue to rely on binary translation as the de facto method, others favor para-virtualization or hardware-assisted virtualization, which on Intel platforms typically means Intel's VT-x technology. So which approach would serve the industry best?

Virtualization at a 50,000-foot glance
Simply stated, gone are the days of unlimited rack space and costly extra computers. Embedded applications in defense, communications, and industrial markets are trying to reduce size, weight, and power consumption without sacrificing compute power. Legacy applications no longer have to run on different physical machines than their more updated counterparts. Software migrations are even possible without the hassle of taking down the entire system or application. And forget about redundant hardware: it's no longer necessary because system uptime can be increased through software failover instead.

The magic enabling panacea: virtualization, which enables several disparate OSs (and therefore their dependent applications) to execute within one physical machine through the use of a Virtual Machine Monitor (VMM). A VMM is a new software layer, sometimes referred to as a "hypervisor," that manages these disparate OSs and the applications running on them, commonly known as Virtual Machines (VMs). Virtualization works through a "context switch" that makes each separate application, on its respective OS, think it has sole control over all the hardware[1]. Applicable to both single-core and multi-core scenarios, this illusion of sole control is highly beneficial to engineers for the aforementioned reasons. But which of the three methods is the most effective?

The virtualization triad - which one wins?
Like all things technology, there's more than one way to reach the goal, but is there a perfect route for perfectionists? One can only say for certain ... well, it depends.

Binary translation

How it was developed
The binary translation method of virtualization was developed for good reason: OSs crafted for Intel® Architecture processors are designed to execute directly on native hardware and therefore assume they have sole control over computing resources. Additionally, the x86 architecture comprises various privilege levels, which presents no issue when OS code executes natively at the top privilege level. That privilege expectation became a challenge, however, once the x86 architecture was virtualized and the guest OS was relegated to a privilege level lower than the VMM (because the VMM manages shared resource allocation). Instruction semantics also differ when an OS runs natively versus in a virtualized scenario[2].

Pros and cons
With the VMM decoupling the guest OS from the hardware platform, neither OS assist nor hardware assist is needed. The primary drawback: performance is somewhat hindered because runtime modification of the guest OS code is necessary[2]. Another snag with binary translation is its complexity, says Chris Main, CTO at TenAsys Corporation, an Affiliate member of the Intel® Embedded and Communications Alliance (Intel® ECA). "Binary translation describes the technique where 'problem code' in the guest software ... is replaced by on-the-fly 'safe code.' ... It requires detailed knowledge of the guest software and thus is typically complex to implement."
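As a rough illustration of the "problem code" replaced by "safe code" idea that Main describes, the sketch below scans a guest code buffer for one privilege-sensitive x86 instruction (CLI, opcode 0xFA) and patches it with a breakpoint (INT3, opcode 0xCC) so the monitor regains control and can emulate the instruction. Real binary translators use full instruction decoders and translation caches; this naive byte scan is only a conceptual sketch, not how any production VMM does it.

```c
#include <stdint.h>
#include <stddef.h>

#define OP_CLI  0xFA  /* privilege-sensitive: clear interrupt flag        */
#define OP_INT3 0xCC  /* breakpoint: traps back into the monitor          */

/* Conceptual sketch only: replace each CLI with INT3 so the VMM's breakpoint
 * handler can emulate the instruction safely. A real binary translator uses a
 * full instruction decoder and a translation cache; a raw byte scan like this
 * would mis-patch multi-byte instructions whose operands happen to be 0xFA. */
static size_t patch_sensitive_instructions(uint8_t *code, size_t len)
{
    size_t patched = 0;
    for (size_t i = 0; i < len; i++) {
        if (code[i] == OP_CLI) {
            code[i] = OP_INT3;   /* monitor's INT3 handler emulates CLI */
            patched++;
        }
    }
    return patched;
}
```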

OS-assisted or para-virtualization

Where it fits in
Para-virtualization schemes feature collaboration between a hypervisor and a modified guest OS, in which the OS's privileged-access code paths are altered to request hypervisor action instead of executing privileged instructions directly, explains Mark Hermeling, senior product manager at Wind River, an Intel® ECA Associate member. This technique is most suitable when the real hardware environment and the guest environment are alike or quite similar. The guest OS is optimized for performance and to ensure it does not commit guest-inappropriate actions.
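As a hedged illustration of that collaboration, the sketch below shows how a para-virtualized guest kernel might replace a direct privileged operation, such as loading a new page-table base, with a call into the hypervisor. The hypercall numbers and the hypercall() wrapper are hypothetical; real para-virtualization interfaces define their own calling conventions and trap mechanisms.

```c
#include <stdint.h>

/* Hypothetical hypercall interface: the numbers and mechanism are illustrative,
 * not any particular hypervisor's ABI. */
enum { HC_SET_PAGE_TABLE = 1, HC_MASK_IRQ = 2 };

/* In a real para-virtualized guest this would be a trapping instruction or a
 * call into a shared hypercall page; here it is just a stub for illustration. */
static long hypercall(int number, uintptr_t arg)
{
    (void)number; (void)arg;
    return 0; /* pretend the hypervisor performed the operation */
}

/* Native kernel code would write CR3 directly with a privileged instruction.
 * The para-virtualized version asks the hypervisor to do it instead, letting
 * the hypervisor validate the new page table before installing it. */
static void guest_switch_address_space(uintptr_t new_page_table_phys)
{
    hypercall(HC_SET_PAGE_TABLE, new_page_table_phys);
}
```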

Plusses and minuses
Para-virtualization typically delivers the highest performance among the virtualization methods discussed here[2]. "Para-virtualization can result in good system performance but is generally applicable to situations where the guest is well-known or fixed for a given product," says Main.

"[Para-virtualization] can be done on top of any processor, and the real-time performance is the best of all three methods. Para-virtualization is generally regarded as the best option for real-time behavior. Para-virtualization can be mixed with full virtualization, for example, to execute Microsoft Windows using full virtualization in one virtual [machine] and VxWorks or Linux para-virtualized in another on top of the same processor (both single-core and multi-core)," details Hermerling.

Hardware-assisted or full virtualization
In contrast to binary translation and para-virtualization, the hardware-centric full virtualization method runs an unmodified OS in a virtual machine, without the OS knowing it is running in a virtualized environment and sharing the physical system with other OSs. The OS will of course try to execute privileged instructions, but in this case the processor sends the hypervisor an exception; the hypervisor then performs the requested behavior, Hermeling reports.

Consequently, processors such as the Intel® Core™2 Duo, Intel® Xeon®, and the latest Intel® Core™ i7 include hardware support for this method: Intel® Virtualization Technology (Intel® VT) for IA-32, Intel® 64, and Intel® Architecture (Intel® VT-x)[3].

With VT-x, the processor provides two new operation modes: VMs run in "VMX non-root mode" while the VMM executes in "VMX root mode." Here's how it works: processor behavior in VMX non-root operation is modified and restricted to facilitate virtualization. In contrast to ordinary operation, specific events and instructions cause a transition to the VMM (root mode) so it can take action. This enables the VMM to retain control of processor resources.

Meanwhile, processor behavior in VMX root operation is very similar to native operation. The primary differences are a newly available set of VMX instructions and a limitation on the values that may be loaded into certain control registers. VMX non-root operation places restrictions even on software running at Current Privilege Level (CPL) 0; guest software can therefore execute at the privilege level for which it was originally designed, simplifying VMM development.
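To give a feel for how a VMM built on VT-x is structured, here is a heavily simplified sketch of an exit-handling loop written in C. The exit-reason names and helper functions (vmx_run_guest, emulate_cpuid, and so on) are hypothetical stand-ins for the real VMX instructions and VMCS fields documented in Intel's software developer's manuals; this is a conceptual outline, not a working hypervisor.

```c
#include <stdint.h>

/* Hypothetical exit reasons and helpers: stand-ins for the architectural VMX
 * exit reasons and the VMLAUNCH/VMRESUME/VMREAD machinery a real VMM uses. */
enum vm_exit_reason { EXIT_CPUID, EXIT_IO_INSTRUCTION, EXIT_EXTERNAL_INTERRUPT, EXIT_HLT };

struct guest_state { uint64_t rip; uint64_t regs[16]; };

/* Stub: a real implementation resumes the guest in VMX non-root mode and
 * returns when the processor exits back to the VMM in root mode. */
static enum vm_exit_reason vmx_run_guest(struct guest_state *g) { (void)g; return EXIT_HLT; }

static void emulate_cpuid(struct guest_state *g)             { (void)g; /* return host-filtered CPUID data      */ }
static void emulate_io(struct guest_state *g)                { (void)g; /* emulate the trapped port/MMIO access */ }
static void inject_pending_interrupts(struct guest_state *g) { (void)g; /* queue virtual interrupts for guest   */ }

/* Conceptual VMM main loop: run the guest, handle whatever event caused the
 * VM exit, then resume the guest again. */
void vmm_run(struct guest_state *g)
{
    for (;;) {
        switch (vmx_run_guest(g)) {
        case EXIT_CPUID:              emulate_cpuid(g);             break;
        case EXIT_IO_INSTRUCTION:     emulate_io(g);                break;
        case EXIT_EXTERNAL_INTERRUPT: inject_pending_interrupts(g); break;
        case EXIT_HLT:                return;  /* guest executed HLT */
        }
    }
}
```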

Why or why not use it?
With the exploding popularity of the world's most pervasive OS, Windows, it's important to note that the source code for Windows cannot be modified for a para-virtualization scheme. However, embedded flavors of Windows, including Windows 7 Professional for Embedded Systems, Windows 7 Ultimate for Embedded Systems, Windows Embedded POSReady, Windows XP Embedded, and others, are gaining acceptance among the embedded community, especially in market segments such as medical, industrial, gaming, and retail, to name a few.

"The advantage of the hardware virtualization technique is that it requires no knowledge of the guest software other than the specific set of interaction with the [VMM]. This makes it more useful in a general-purpose solution to support many different guest runtime environments," states Main.

Hermeling has another point of view. "Full virtualization is really attractive as you don't have to modify the operating system. However, there is a significant impact due to the required emulation work when the processor throws an exception. The impact is very much noticeable in handling devices. This is measurable in throughput, as well as latency and jitter in interrupt handling." This method of virtualization also requires hardware assist, something not always found in embedded processors. However, most Intel® processors support Intel's VT-x virtualization technology, as do many competing architectures that feature their own hardware extensions for supporting virtualization in embedded designs.

 


 

Now you decide
In a fragmented embedded industry where virtualization is relatively new territory and multiple processors, all with different requirements and IP, could be used, not to mention the present cost of virtualization, standardization is likely a perplexing problem. And then there's the matter of whether to standardize on one of the two software-based methods (binary translation and para-virtualization) or on the hardware-assisted method. Should software or hardware be emphasized in potential standardization, or both? Time will tell, but for now it appears that the pros and cons of standardization are evenly weighted. Typically, having no answer means a "no" answer ... so I'm told. What are your thoughts?

Written by Sharon Schnakenburg, OpenSystems Media®, by special arrangement with Intel® ECA

References:

[1] "Virtualization for Embedded and Communications Infrastructure Applications," by Edwin Verplanke, Intel® Corporation, Oct. 31, 2006.

[2] "Intel's CPU extensions transform virtualization," by Stuart Fisher, LynuxWorks, Inc., www.mil-embedded.com/articles/id/?3733.

[3] "Intel® Virtualization Technology for Embedded Applications," by Amit Aneja, Intel,
Rev 1.0, July, 2009, http://edc.intel.com/Training/Courses.aspx?ttag=ttipt&ptag=&ftag=&sort=2

Embedded design teams often have legacy considerations that dictate system design with each revision or evolution. Software is a particular concern. Teams seek to preserve proven algorithms and code even while attempting to take advantage of the latest hardware. And the legacy code may well have been written for a number of loosely coupled specialized processors, either combined on one board or spread across multiple boards. Today's Intel® Architecture multi-core processors, based on the Core or Nehalem microarchitectures, provide a very cost-effective platform that can easily replace a dozen or more processors in a legacy system. But how do you migrate the software?

 

Of course there is no simple answer. The design team could rewrite the software from the ground up and likely realize performance gains. Such a redesign, however, adds engineering cost and time to the design cycle, resulting in undetermined but potentially significant opportunity costs. Teams looking to harness the latest processors do have options to maintain the legacy code and still consolidate their design onto a single- or dual-socket multi-core platform.

 

QNX Software Systems, an Associate member of the Intel® Embedded and Communications Alliance (Intel® ECA), offers one alternative for porting legacy code to a multi-core processor. The company supports a technique called Bound Multiprocessing (BMP) that offers many of the advantages of Symmetric Multiprocessing (SMP) on a multi-core platform, while simplifying the task of getting legacy code to work on the platform.

 

Multi-processor legacy systems often dedicated an instantiation of an operating system (OS) to each processor and dedicated each processor to a specific set of tasks or software processes, a technique often called asymmetric multiprocessing (AMP or ASMP). In a modern SMP system, a single copy of the OS hosts all processes and assigns any process needing service to the next available processor or core. But legacy code designed for a single-processor environment may not run without significant modification in an SMP system. The code may have no shared-memory mechanism, and in real-time implementations it may expect direct access to the processor at all times.

 

BMP offers a compromise between the legacy AMP system and an SMP system. The BMP implementation available in the QNX Momentics Tool Suite relies on a single copy of the OS to handle all processes. But the design team can specify that a process or set of processes, and all associated threads, be dedicated to a particular core. The legacy code gets the equivalent of a dedicated processor. Meanwhile, SMP-aware code can take full advantage of SMP with the OS scheduling processes for the next available execution resource. Below you'll see a screen shot of the memory analysis feature in Momentics.

 

[Screen shot: memory analysis feature in the QNX Momentics Tool Suite]
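If you want to experiment with the core-binding idea behind BMP, QNX Neutrino exposes a per-thread runmask. The sketch below pins the calling thread to core 0 using ThreadCtl(); treat it as a minimal illustration of the concept, not a complete migration recipe, since error handling, runmask inheritance for child threads, and the Momentics tooling are all omitted.

```c
#include <stdio.h>
#include <stdint.h>
#include <sys/neutrino.h>   /* QNX Neutrino: ThreadCtl(), _NTO_TCTL_RUNMASK */

/* Minimal illustration of BMP-style core binding on QNX Neutrino: restrict the
 * calling thread to CPU 0 so legacy, non-SMP-aware code behaves as if it owned
 * a dedicated processor, while SMP-aware processes float across other cores. */
int main(void)
{
    unsigned runmask = 0x1;   /* bit 0 = CPU 0 */

    if (ThreadCtl(_NTO_TCTL_RUNMASK, (void *)(uintptr_t)runmask) == -1) {
        perror("ThreadCtl(_NTO_TCTL_RUNMASK)");
        return 1;
    }

    /* ... legacy processing now runs bound to core 0 ... */
    printf("Bound to CPU 0; running legacy workload\n");
    return 0;
}
```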

 

For a thorough look at AMP, BMP, and SMP, you might read the article, "Software migration strategies for multi-core processors" from Embedded Control Europe magazine. In addition, Intel multi-core expert Lori Matassa recently posted an SMP article to the Intel® ECA site.

 

As for your choices in multi-core platforms, Intel has a broad variety of platforms in the company's embedded program, with guaranteed availability for many years. The platforms include options optimized for performance and others optimized for low power. The available processors range from single- and dual-core Intel® Core™2 Duo processors to dual- and quad-core Intel® Xeon® processors. And in all cases, the platforms support dual processor sockets for as many as eight cores total. For more information on the Intel embedded platforms, see the Hardware Platforms web page.

 

I'd like to hear other ideas on moving legacy multiprocessor applications to a modern multi-core platform. How did you migrate legacy software and how have you leveraged Multi-Core features? Please share your techniques with the Intel® Embedded Community via comments to this blog.

Generally, Intel® Hyper-Threading Technology (Intel® HT Technology) seeks to boost performance by making the most efficient use of the multiple execution units in a superscalar processor core. It turns out, however, that Intel HT Technology can also be leveraged in a virtualization application: running both a general-purpose OS (operating system) like Windows and an RTOS (real-time operating system) simultaneously on a single core. Such an implementation on an Intel® Atom™ processor allows the ultra-low-power, low-cost processor to respond to external events with latency under 10 microseconds, achieving what is commonly called hard real-time performance. In such an application, the Atom can eliminate the need for a processor such as a PowerPC that is often dedicated to the real-time task.

 

Hyper-Threading allows multiple software threads to execute in parallel on a single processor core; generically, the industry refers to this capability as simultaneous multithreading. Parallel threads execute simultaneously and maximize efficiency by minimizing the time an execution unit sits idle waiting for work, for example while one thread waits for external data.
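As a simple way to see two software threads sharing one hyper-threaded core, the Linux/pthreads sketch below pins two workers to two logical CPUs that are assumed to be Hyper-Threading siblings of the same physical core. The sibling numbering varies by platform (check /sys/devices/system/cpu/cpuN/topology/thread_siblings_list), and this is a generic illustration of simultaneous multithreading, not the RadiSys approach described below.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Illustrative only: pin each worker to one logical CPU. CPUs 0 and 1 are
 * assumed to be HT siblings of the same core; verify on your system. */
static void *worker(void *arg)
{
    int cpu = *(int *)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* ... thread-specific work runs here, sharing one core's execution units ... */
    printf("worker bound to logical CPU %d\n", cpu);
    return NULL;
}

int main(void)
{
    int cpus[2] = { 0, 1 };   /* assumed HT siblings */
    pthread_t t[2];

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &cpus[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```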

 

RadiSys, a Premier member of the Intel® Embedded and Communications Alliance (Intel® ECA), has leveraged Intel Hyper-Threading Technology in multiple ways. In a future blog post I'll describe how RadiSys leveraged Hyper-Threading to boost performance in an imaging application on a Nehalem-based system. But in this post, we will focus on how to implement a real-time embedded system, such as an industrial controller, on a single low-power processor while maintaining the convenience of a general-purpose OS.

 

Real-time industrial controllers often rely on two separate but closely linked computers: a ruggedized PC running Windows, or perhaps Linux, alongside an embedded system running an RTOS. The Windows system provides the convenience of the GUI for a robust user interface and, of course, support for networks like Wi-Fi and connectivity such as USB. The real-time systems often use processors such as the PowerPC that lack the high-end performance of the x86 family but can provide the needed low-latency response. RadiSys Director of Software Marketing Linda Xiao states, "We regularly see demand for sub-10-microsecond level latency and sometimes for response in as little as 2 microseconds."

 

Clearly, a single-processor approach would provide cost savings in the hypothetical industrial controller I've described. Moving to a single processor, however, would require either losing the convenience of Windows or conceiving a way to run both OSs on the same processor while preserving the fast interrupt response time.

 

RadiSys found the answer in the Hypervisor technology that the company jointly developed with Real Time Systems. RadiSys supplies the Hypervisor as middleware with its OS-9 RTOS, which is designed for hard real-time applications. Real Time Systems GmbH is an Affiliate member of the Intel ECA.

 

The Hypervisor is virtualization technology that, from a macro view, may seem similar to Intel® Virtualization Technology (Intel® VT). Both allow multiple operating systems to run simultaneously, each fully isolated from the other. According to Xiao, however, Intel VT targets high-performance applications ranging from servers in the IT space to compute-intensive embedded applications, whereas the Hypervisor targets low latency. She states, "It's not about fast but about determinism." In the case of a real-time system, the Hypervisor runs Windows on one hardware thread and the RTOS on another.

 

According to Xiao, the Virtualization Manager (VM) software layer in typical virtualized implementations adds jitter to the RTOS response. The Hypervisor approach seeks to minimize the amount of OS code that runs through a VM layer by partitioning software access to processor hardware features whenever possible. You can access more details on the technology in the Hypervisor data sheet. The block diagram below shows how the RTOS is afforded direct access to hardware.

 

 

 

[Block diagram: the RTOS given direct access to hardware under the RadiSys Hypervisor]

Of course, the designers working on industrial control applications also care about preserving their prior investment in software, and many were not using x86 processors in real-time systems. Xiao claims that porting OS-9 from the PowerPC to x86 took only three days. And the Hypervisor approach preserves compatibility with existing drivers and applications, all executing on the single Atom core.

 

Hyper-Threading Technology and the Hypervisor provide a good match for many applications. The user-interface development gets the benefit of Windows and even the graphics accelerators that are widely available. And according to Xiao, the Atom provides a good balance of cost, performance, and power consumption. She states, "You don't use Atom for performance, but for system consolidation, cost reduction, and software reuse."

I'd also point out an earlier blog post that covers the use of the Hypervisor on a specific RadiSys multi-core board based on an Intel® Core™2 Duo processor.

 

How have you combined general-purpose and real-time OSs in an application? The many followers of the Intel® Embedded Community would greatly benefit from comments sharing your techniques.

 

For engineers involved with board bring-up, BIOS programming, OS customization, driver optimization, firmware, and low-level software development, JTAG debug is an indispensable component of the developer's toolkit. Looking around the embedded space, I note that the availability and cost of JTAG tools vary widely by processor architecture. ARM seems to have the most options; I even found a JTAG debugger for ARM on amazon.com selling for $299. In contrast, high-end solutions command prices over $10,000. Where does Intel® Architecture (IA) fit into this picture? In this blog I'll explore the evolution of JTAG support for IA: what exists today, and where it's heading tomorrow. You might not find an IA debugger on amazon.com today, but big changes are afoot.

 

To paint the complete picture, let me start with a bit of history. In the days of the 386EX, 486, and early Pentium processors, several companies offered IA emulators. Those tools needed special bond-out versions of the processors and required a socketed processor, since they needed access to all the processor pins. Besides those limitations, they were often very expensive. Over time the list of companies offering IA emulators dwindled to a list of one.

 

Meanwhile, enter JTAG. Considering its pioneering role in processors, I wasn't too surprised to learn that Intel is credited with driving JTAG's adoption by electronics manufacturers worldwide, leading with the release of the 80486, the industry's first processor with JTAG. That was in 1990, the same year the Joint Test Action Group (JTAG) finalized the IEEE 1149.1 standard, entitled Standard Test Access Port and Boundary-Scan Architecture. Look for the word "debug" in that standard's title and you won't find it; that's because JTAG originated as a method to test populated circuit boards after manufacture. But since it provided developers with a convenient "back door" into processors and other ICs, JTAG's use as a debug aid was soon to follow.

 

The first commercial JTAG debugger for IA appeared in 1992 from Arium, an Affiliate member of the Intel® Embedded and Communications Alliance (Intel® ECA). No surprise here: Arium was that "list of one" I mentioned earlier. According to its website, the company is entirely focused on development and debug tools, with products supporting Intel and ARM processors and various software environments, both host and target. The hardware emulators are platform-specific, but the software is common, which means that if you're now developing with ARM it should be an easy transition for you to debug IA. Of course you'll have to learn the unique features of IA, but the user interface to the core debug functions, such as setting breakpoints, inspecting registers and memory, and writing flash, will be virtually identical. Click here to check out the specs.

Almost since the inception of JTAG, Arium had been your only commercial choice for an IA debugger. In June 2009, however, along came Wind River, an Intel® ECA Associate member. You may have seen the previews in our community of the new Wind River JTAG Tools for Intel Processors. Wind River's solution integrates JTAG debug hardware with its Eclipse-based integrated development environment (IDE), Wind River Workbench. Hardware adapters are available in two configurations: an entry-level "portable" unit, and one that is multi-core, multi-thread capable.

 

 


Consistent with Wind River's legacy of cross-platform support, its JTAG offering covers most popular embedded processors, including ARM, MIPS, Power Architecture, ColdFire, RMI, and now IA. The first Intel processor supported is the Atom, and Wind River's collateral states that support for Core 2 Duo and certain Xeon chips will follow "in the near future." That makes total sense considering that Intel has just completed its acquisition of Wind River. In case you hadn't heard that news, you can read the announcement here. As with Arium, if you're familiar with Wind River JTAG on one of its other supported processors, it should be an easy transition for you to IA. Read the full specs.

 

For well over a decade IA developers had a single choice for JTAG debug. And now there are two. What's in store for the future? Will we wait another decade for additional options? The short answer is no. My sources tell me that Intel is quickly closing the gap by working with other software and tools companies to expand JTAG support for embedded IA. At the time of writing this blog, these projects haven't been publicly launched, so I'm not privy to the details. I am told, however, that the announcement timeframe will be weeks or months, not years.

 

As these new products roll out, I'll be interested in the price points. The IA processor family spans a wide cost spectrum. Is there a "one-size-fits-all" debugger, or rather a one-price-fits-all? My experience tells me that many developers scale their tool budget expectations to the target product's cost. While customers will gladly pay x for tools in support of a big ATCA blade with dual Xeon processors, they might balk at that same x figure to support a low-cost Atom-based handheld device, even though the debug effort may be similar. We will just have to wait and see how this scenario unfolds.

 

Will it be long before you can buy JTAG for IA on amazon.com? More importantly, what are your favorite JTAG debug products that you would like to see Intel-enabled? Let me know your thoughts.

 

Felix

 

 

J. Felix McNulty
Community Moderator
Intel® Embedded Design Center Community
(Intel contractor)