4 Posts authored by: dsandy_emerson

It wasn't that long ago that all eyes were on processor clock speeds. It seemed like every week a faster processor was announced. First it was a mad scramble to 1GHz. Next, the race was on for 2GHz. When could we expect 10GHz? It was mesmerizing. Sadly, even though raw megahertz is an unreliable predictor of performance, many design decisions were driven almost entirely by these numbers. Fortunately, we've learned a bit from this, right?



We've now entered an era of focus on the number of cores. First it was dual-core, then quad. Who will be the first company to offer one hundred cores? When can we expect a thousand-core processor? Suddenly it's all about cores. Again, I feel the familiar draw to adopt a "more is better" approach to product comparison. Perhaps you do too. Let me offer the following tips to help resist this temptation.



Not All Processors Are Created Equal



First and foremost, processors are designed with a purpose in mind: DSPs are designed to perform signal and image processing tasks, packet processors are best suited for manipulating network headers and routing traffic, etc. Processor specialization comes from unique instruction sets, architectural tuning, and hardware acceleration engines. The bottom line is that processors will perform best on the applications for which they were designed, and might perform quite poorly in other uses.



This isn't to say that a packet processor or communications processor can't run your server-class application; however, this should be approached with caution. You probably have an uphill battle ahead of you regardless of the number of cores available. Processor type, not number of cores, should be the first criterion in selecting a solution.



You Can't Get Something for Nothing



Secondly, processor designers have a limited number of transistors they can use (determined by the semiconductor process), and more cores come at the expense of something else. A fifty-core processor designed using the same semiconductor technology as a four-core processor must have gotten rid of something. In many cases, the processor cache is the first thing to go. After that, floating point, machine word width, and instruction set depth are all candidates for the chopping block. In order to make a reasonable comparison, you need to determine how important each of these features is to your design.



Selecting the right multicore processor for your application doesn't need to be a complicated process. Guided by these tips, you can probably avoid the pitfalls associated with simply counting cores and hoping for the best. If, however, you would like to delve into this topic in more depth, I invite you to post to this blog or contact me directly.

King Demandius tasked his two royal advisors, Max and Rupert, with fixing the problems with the road between the kingdom's two most prosperous cities: Hither and Yon. Max, the head of the ministry of taxes and tolls, realized that royal revenue could be improved if traffic on the road were increased. Horse and chariot speeds had already been pushed to their limits. His solution: widen the road in order to allow multiple travelers to pass simultaneously. Max's proposal was warmly accepted by the king, giving rise to the kingdom's first multilane tollway.


Rupert, on the other hand, had received complaints from the king that travel time between the royal palace in Hither and the vacation home in Yon was just too long. During these trips the road was closed to everyone but the royal caravan so additional lanes were of no benefit. Horse and chariot speeds couldn't be increased. What was Rupert to do? In the end he made two proposals: Move Hither closer to Yon or add in-chariot entertainment to the royal caravan to help pass the time. Rupert now enjoys full accommodations in the dungeon.


In the same way, it is the best of times and the worst of times for embedded processor users. If you are looking to increase the throughput of your application, like Max, adding more cores is equivalent to creating a multilane expressway. More cores can be a very effective solution for increasing total application capacity. However, if your problem is more like Rupert's, your options are a bit more limited. Core speeds are not increasing, and more cores by themselves, like more highway lanes, might do very little to improve application speed.
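To put rough numbers on the difference between Max's problem and Rupert's, here is a minimal Python sketch. It is a toy model of my own, not a measurement: it assumes Max's tasks are fully independent and Rupert's task cannot be split at all, and the function names and the two-second task time are made up for illustration.

```python
def throughput_tasks_per_sec(cores: int, seconds_per_task: float) -> float:
    """Max's problem: independent tasks spread evenly over the cores (assumed ideal)."""
    return cores / seconds_per_task


def single_task_seconds(cores: int, seconds_per_task: float) -> float:
    """Rupert's problem: one task that cannot be split, so extra cores are ignored."""
    return seconds_per_task


# Hypothetical workload: every task takes 2 seconds on one core.
for cores in (1, 2, 4, 8):
    print(cores,
          throughput_tasks_per_sec(cores, 2.0),
          single_task_seconds(cores, 2.0))
```

In this idealized model the throughput column grows with the core count while the single-task time never moves. Real applications fall somewhere in between, which is where the less obvious techniques come in.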


Fortunately, there can be a happy ending to our story. Although less obvious, most multicore processors do offer possibilities for improving application speed. I'll be talking about some of these on November 13 in a webinar hosted by Open Systems Publishing. I'd love to have you there. In the meantime, if you have any success stories or tricks to increase application speed on Intel multicore processors, I invite you to share them here.

Plumbing repairs – Argh! I never have all the parts that I need to complete my project. Even though I keep a supply of valves and fittings in my garage, I always seem to need one more trip to the store to get the job done. At least in my household, the time to fix a sink has more to do with driving to and from the hardware shop than the actual repairing.


Strangely, microprocessor performance is a lot like my plumbing repairs. If a processor doesn't have the data or instructions that it needs, it must get the information it requires before it can continue. If the needed information is in its caches (equivalent to the collection of parts in my garage) then the delay will be fairly insignificant. However, if it needs to fetch the information from main memory (a trip to the hardware store) then overall performance will sag. In many applications it is this access timing (or latency) that determines performance more than any other factor.


The latency problem becomes even more interesting as we add more processor cores into the mix. Continuing with the plumbing analogy, imagine if seven other do-it-yourselfers arrive at the hardware store at the same time as I do -- this is the same as eight processor cores attempting to access memory at the same time. Chances are pretty good that some of us are going to have to wait in line while the others are paying for their goods. The average time for each of us increases.


What if everyone in town decided to visit the same hardware store at the same time as me? I could easily spend days waiting in traffic before I even reached the store parking lot. Of course, this never happens because it’s extremely unlikely that everyone will go to the store at the same time. Furthermore, there are many hardware stores in my town – this tends to balance the load during peak hours. The equivalent multiprocessing solution for limiting congestion is to have multiple memory channels. More memory channels help to balance the load and reduce the average access latency.
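The checkout-line picture can be sketched in a few lines of Python. This is a deliberately simple model under my own assumptions: every core issues one request at the same instant, each memory channel serves one request at a time, requests are spread evenly across channels, and the 60 ns service time is only an illustrative figure. Real memory controllers are far more sophisticated.

```python
def average_latency_ns(cores: int, channels: int, service_ns: float) -> float:
    """Average completion time when `cores` requests arrive at once (toy model)."""
    completion_times = []
    for core in range(cores):
        # Core i is sent to channel i % channels; its place in that
        # channel's line is i // channels (0 means it is served first).
        position_in_line = core // channels
        completion_times.append((position_in_line + 1) * service_ns)
    return sum(completion_times) / len(completion_times)


# Eight cores, illustrative 60 ns per request, varying channel counts.
for channels in (1, 2, 4, 8):
    print(channels, average_latency_ns(8, channels, 60.0))
```

With one channel, the eighth core waits behind seven others; each added channel shortens the line, which is the same effect as having more hardware stores in town.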


Just to underscore the importance of watching latency, let’s look at a few numbers. For most microprocessors, accessing the L1 cache takes between two and four nanoseconds. Contrast this with main memory latencies of sixty to one hundred eighty nanoseconds. Programs with poor cache utilization (and hence, long latencies) can easily run fifteen to ninety times slower!
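For anyone who wants to check the arithmetic, the sketch below reproduces the fifteen-to-ninety range from the latencies quoted above. The `effective_latency_ns` helper is my own addition: it uses the common simplified two-level weighting (hit rate times cache latency plus miss rate times memory latency) and ignores intermediate cache levels, so treat it as a rough estimate only.

```python
L1_NS = (2.0, 4.0)         # L1 cache latency range quoted in the text
MEMORY_NS = (60.0, 180.0)  # main memory latency range quoted in the text

# Smallest gap: fastest memory versus slowest L1 (60 / 4).
best_case_slowdown = MEMORY_NS[0] / L1_NS[1]
# Largest gap: slowest memory versus fastest L1 (180 / 2).
worst_case_slowdown = MEMORY_NS[1] / L1_NS[0]


def effective_latency_ns(hit_rate: float, cache_ns: float, memory_ns: float) -> float:
    """Simplified average access time for a single cache level (assumption)."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


print(best_case_slowdown, worst_case_slowdown)
# Hypothetical hit rates, using the 2 ns and 180 ns endpoints.
for hit_rate in (1.0, 0.9, 0.5, 0.0):
    print(hit_rate, effective_latency_ns(hit_rate, 2.0, 180.0))
```

Even a modest drop in hit rate pulls the average access time toward the main memory figure, which is why cache utilization deserves attention before core count does.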


Okay, latency is important. But where do we go from here? As a starting point I recommend obtaining a copy of “Intel® 64 and IA-32 Architectures Optimization Reference Manual”. This can be freely downloaded from Intel's website and contains a wealth of information about optimizing for memory accesses. Emerson Network Power Embedded Computing also has sophisticated tools for estimating multi-core processing performance. I will gladly address whatever questions you might have. I look forward to hearing from you.

Which is the best embedded multicore processing chip? Let me start by way of analogy.


Suppose you are the owner of a car dealership. One day a customer comes in who wants to buy “the best” car on your lot. Smiling, you begin asking questions. One thing you know from experience is that there is no one “best car.” Some customers are looking for no-frills while others want luxury. Customers wanting the highest levels of fuel efficiency are only satisfied with hybrid vehicles. When it comes to “performance,” “best” can mean the fastest time from zero to sixty, or the highest towing capacity. Asking the right questions is essential to matching a car with each potential driver.


The same is true of processor chips. Unfortunately, there is no “best” for all applications. Power consumption, integration, cost, and performance characteristics all come into play. Likewise, application structure, programming model and scalability could also be factors. What is best for your application might not be right for mine.


Why, then, do we gravitate immediately to the number of processor cores? Assuming that more cores are better is a lot like insisting that a V12 engine is always better than a four-cylinder. It’s exciting, but it just isn’t so. Let’s take a break from core counting for a moment and get back to asking questions more relevant to embedded computing. I for one know of several applications that would benefit from something other than ever-increasing core counts. I suspect you do, too.
