We all hear about the tough operating environment in IT data centers, with servers densely packed in racks and operating continuously. I'd bet that a lot of you in the embedded space chuckle a bit when the IT-centric guys complain about their tough design task. Embedded designers have to design for the worst of environments, such as the heat and vibration found in a military vehicle, on the factory floor, or even in a server-like setting in a communications application. Many embedded applications - say real-time vision inspection in a factory - require the maximum performance that today's processors have to offer, and must operate reliably around the clock. Intel® Architecture (IA) processors offer embedded teams a leg up in reliable system design compared to other general-purpose processors, or even other processors that use the x86 instruction set. That advantage comes from precise on-chip temperature sensors and automatic thermal protection capabilities integrated into the processor.


Most designers of complex digital ICs include some form of thermal protection these days, but often it's a simple diode that doesn't accurately measure temperature. Intel has been at the forefront of thermal protection for years. You might peruse this IDF paper that describes the evolution of the thermal protection features in the transition from the P6 microarchitecture used in Intel® Pentium III processors to that in the Intel® NetBurst microarchitecture used in the Pentium 4.


The Intel® Core microarchitecture that's the basis for the Intel® Core™ 2 family of processors introduced more advanced thermal protection. This HTML version of an article from the Intel Technology Journal describes the integration of digital temperature sensors localized to each core, as opposed to the single analog sensor used previously. The image below depicts the sensors.


Embedded design teams can rest easy knowing that the integrated thermal management features automatically protect their system. They can also proactively develop software that monitors temperatures and actively manages the system in the face of hot conditions.
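
For a feel of what monitoring software sees, here is a minimal Python sketch that decodes a raw per-core digital sensor reading. It assumes the value comes from the IA32_THERM_STATUS model-specific register and that the bit positions match my reading of Intel's software developer's manual, so please check them against the documentation for your processor. The sensor reports degrees below the maximum junction temperature (TjMax) rather than an absolute temperature, so TjMax is passed in as a parameter. The function and class names are hypothetical, and the sketch does not cover how the raw value is obtained, which depends on the OS.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed MSR address of the per-core thermal status register.
IA32_THERM_STATUS = 0x19C


@dataclass
class CoreThermalReading:
    valid: bool                  # bit 31: digital readout is valid
    throttling_now: bool         # bit 0: thermal protection currently active
    throttled_since_clear: bool  # bit 1: sticky log of a past thermal event
    degrees_below_tjmax: int     # bits 22:16: distance from TjMax in deg C


def decode_therm_status(raw: int) -> CoreThermalReading:
    """Split a raw register value into the fields assumed above."""
    return CoreThermalReading(
        valid=bool((raw >> 31) & 1),
        throttling_now=bool(raw & 1),
        throttled_since_clear=bool((raw >> 1) & 1),
        degrees_below_tjmax=(raw >> 16) & 0x7F,
    )


def core_temperature_c(raw: int, tjmax_c: int) -> Optional[int]:
    """Return an absolute core temperature, or None if the reading is invalid."""
    reading = decode_therm_status(raw)
    if not reading.valid:
        return None
    return tjmax_c - reading.degrees_below_tjmax
```

With a TjMax of 100, a raw value of 0x80140000 (valid bit set, readout of 20) decodes to 80 degrees C.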


Generally, IA processors respond to temperature issues in similar ways across generations. When the core temperature exceeds a preset value indicating a hot condition, the processor's TCC (Thermal Control Circuit) automatically takes steps to lower the core temperature. Early TCC implementations simply modulated, or throttled, the core clock - making the core intermittently inactive. The modulation process does not cause the processor to lose state information, but it does reduce performance. If the temperature continues to rise past a threshold deemed catastrophic, the TCC places the processor in thermal shutdown, ceasing operation until the next reset cycle. The drastic shutdown saves the hardware from permanent failure.
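
The behavior described above can be pictured as a small state machine. The Python sketch below is a toy model only, not Intel's implementation: the two threshold values are hypothetical numbers picked for illustration, and real trip points are set by the processor and documented in its datasheet. The point it shows is that throttling is recoverable while thermal shutdown latches until a reset.

```python
from enum import Enum


class TccState(Enum):
    NORMAL = "normal"
    THROTTLING = "throttling"  # clock modulated, state retained
    SHUTDOWN = "shutdown"      # operation ceased until reset


# Hypothetical thresholds for illustration; not taken from any datasheet.
THROTTLE_AT_C = 95.0
CATASTROPHIC_AT_C = 125.0


def next_state(state: TccState, temp_c: float, reset: bool = False) -> TccState:
    """Advance the toy TCC model by one temperature sample."""
    if state is TccState.SHUTDOWN:
        # Shutdown is latched: only a reset brings the processor back.
        return TccState.NORMAL if reset else TccState.SHUTDOWN
    if temp_c >= CATASTROPHIC_AT_C:
        return TccState.SHUTDOWN
    if temp_c >= THROTTLE_AT_C:
        return TccState.THROTTLING
    return TccState.NORMAL
```

Note that once the model enters shutdown, a cooler sample alone does not restore operation, which mirrors the "until the next reset cycle" behavior.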


The latest IA processors have an even more elaborate two-stage reaction to potential temperature issues. Processors such as the Intel® Xeon® 5500 series, which are based on the microarchitecture codenamed Nehalem, first automatically reduce the operating clock frequency and the input voltage in an attempt to eliminate the thermal problem. If the problem persists, the processor takes the additional step of modulating the clock. You will find details of TCC operation in processor datasheets. For example, check the Thermal Chapter in this Nehalem-based processor datasheet.
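
To make the escalation concrete, here is a hedged Python sketch of a two-stage policy. It is an illustration of the idea, not the processor's actual logic: the trip temperature, the number of hot samples tolerated before escalating, and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoStageController:
    trip_c: float = 95.0    # hypothetical hot-condition threshold
    persist_limit: int = 3  # hypothetical hot samples tolerated in stage 1
    hot_samples: int = 0    # consecutive samples at or above trip_c

    def update(self, temp_c: float) -> str:
        """Return the action for this sample, escalating if heat persists."""
        if temp_c < self.trip_c:
            self.hot_samples = 0
            return "full-speed"
        self.hot_samples += 1
        if self.hot_samples <= self.persist_limit:
            # Stage 1: lower the clock frequency and voltage.
            return "reduce-frequency-and-voltage"
        # Stage 2: the problem persisted, so also modulate the clock.
        return "reduce-and-modulate-clock"
```

Feeding the controller a run of hot samples yields the first-stage action until the limit is exceeded, then the second-stage action, and a cool sample returns it to full speed.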


Embedded designers have access to the data from the digital temperature sensors so long as their system implements a BIOS with ACPI (Advanced Configuration & Power Interface) support. American Megatrends, an Affiliate member of the Intel® Embedded Alliance (Intel® ECA), supports ACPI in its AMIBIOS8 product, which targets everything from PCs to embedded applications. Phoenix Technologies, another Affiliate member of the Alliance, supports ACPI in a variety of BIOS products.


Design teams can leverage the ACPI capabilities for a variety of uses. For instance, ACPI supports power management, allowing systems to enter deep sleep mode between periods of activity. For reliability and thermal protection needs, embedded teams can constantly monitor the health of the system and proactively take steps to avoid dangerous thermal conditions. Indeed, ACPI allows system software to read case temperatures and the temperatures of devices such as disk drives - in addition to processor temperatures.
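
As one example of proactive monitoring, the Python sketch below polls temperatures on a Linux system. It assumes the kernel's thermal framework exposes zones under /sys/class/thermal, with a "type" file naming each zone and a "temp" file holding millidegrees Celsius; ACPI thermal zones typically appear there with the type "acpitz". Other operating systems expose ACPI data differently. The function names and the 85-degree limit in the usage note are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def read_thermal_zones(root: str = "/sys/class/thermal") -> Dict[str, float]:
    """Return {"zone_name:type": degrees_celsius} for every readable zone."""
    readings: Dict[str, float] = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Skip zones that are missing files or report unparsable values.
            continue
        readings[f"{zone.name}:{kind}"] = millideg / 1000.0
    return readings


def zones_over(readings: Dict[str, float], limit_c: float) -> List[str]:
    """List the zones at or above a caller-chosen warning limit."""
    return [name for name, temp in readings.items() if temp >= limit_c]
```

A monitoring loop might call zones_over(read_thermal_zones(), 85.0) every few seconds and shed load or raise an alert when the list is not empty, well before the processor's own protection has to step in.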


Is thermal management a part of your core competency? How do you protect your system? Has the Intel TCC implementation ever saved one of your designs from failure? Please share your experience via a comment with other followers of the Intel® Embedded Community.