Compared with a single core, multi-core parallelism makes it possible to process a constant volume of data in less time (quicker turnaround), more data within a constant time (increased throughput), or a combination of both. Symmetric Multiprocessing (SMP) describes a computer system with multiple CPUs that share the same Operating System (OS) and main memory. SMP OSs are well suited to multi-core because they were designed for parallel processing from the start. An SMP OS treats the multi-core hardware as a shared resource, with a single OS image running across all cores. Processes are dynamically assigned to run on the available cores in a truly parallel manner.


Let me define a few terms. Process is a term generally used to describe the "heavyweight" unit of execution: a collection of the resources required to execute program instructions, such as virtual memory, I/O descriptors, the runtime stack, signal handlers, and other control resources. A thread of execution is associated with a process and is viewed as the "lightweight" unit of execution because threads share the process's environment, which makes context switches between threads efficient. Threads also share an address space with the other threads of the same process. Task is a term commonly used interchangeably with "process" and "thread," but more accurately it is simply a group of instructions that are part of a program, and the term is associated more with real-time operating systems.
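To make the process/thread distinction concrete, here is a minimal sketch in Python. The function names are made up for illustration, and the second function assumes a Unix-like OS where `os.fork` is available. A thread writes into its parent's memory, while a forked child process writes only into its own copy.

```python
import os
import threading


def thread_sees_shared_state():
    """A thread runs in its process's address space, so its write is visible."""
    box = {"value": 0}

    def worker():
        box["value"] = 42  # writes to the same memory the caller reads

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return box["value"]  # 42: the thread's write is visible here


def child_process_has_private_copy():
    """A forked child gets its own copy of memory; the parent's copy is untouched.

    Returns None on platforms without os.fork (an assumption of this sketch).
    """
    if not hasattr(os, "fork"):
        return None
    box = {"value": 0}
    pid = os.fork()
    if pid == 0:
        box["value"] = 42  # changes only the child's private copy
        os._exit(0)        # leave the child immediately, skipping cleanup handlers
    os.waitpid(pid, 0)     # wait for the child to finish
    return box["value"]    # 0: the parent never sees the child's write
```

This sharing is what makes threads cheap to switch between, and it is also why threads that touch the same data need synchronization.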


Existing applications that already break out processing into concurrent jobs can realize multi-core benefits with few, if any, changes. For example, a networked printer application with separate threads for image processing and network protocols should have higher performance if those threads can run in parallel. Serial code can be optimized by multi-threading the compute- or data-intensive portion of a program to extract its parallelism. Although this can be tricky, if done correctly it produces the best performance and scalability: write the code once, and performance should scale with the number of cores available, up to the limit set by whatever part of the program remains serial. Various Intel® Software Development Products are available to support this effort, including performance analysis, thread debugging and profiling, performance libraries, and C compilers.
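As a rough illustration of extracting parallelism from a compute-intensive loop, the sketch below splits a sum of squares into one chunk per core. It is only an example under stated assumptions: the function names are hypothetical, and because CPython threads do not run Python bytecode in parallel, it uses worker processes from the standard `concurrent.futures` module rather than threads. In C or C++ the same decomposition would normally be done with threads.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def sum_squares_chunk(bounds):
    """The serial kernel: sum of i*i over one chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_squares(n, workers=None):
    """Run one chunk per worker; defaults to one worker per reported core."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares_chunk, split_range(n, workers)))


if __name__ == "__main__":
    n = 2_000_000
    # The parallel result should match the serial one.
    print(parallel_sum_squares(n) == sum_squares_chunk((0, n)))
```

Because the worker count comes from `os.cpu_count()`, the same code uses however many cores the machine reports. The actual speedup depends on how much work each chunk carries relative to the cost of starting and coordinating the workers.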



To achieve optimal results, the software developer will benefit from understanding a few subtleties of the multi-core architecture and tuning the SMP implementation to take advantage of features that are specific to the platform's processor architecture, such as shared L2 cache. The Intel® multi-core processor family comprises uni-processor systems, in which all cores share a common L2 cache, as well as dual- and multi-processor systems, in which cores in different packages do not share a cache. These variants can affect software performance.



The SMP OS normally assigns processes to the available cores on a first-available basis. At some point the process relinquishes control to the OS, e.g., while waiting on an I/O request or when the OS gives a time slice to another process. When execution resumes, the OS might well assign that same process to a different core from the one where it left off. In the uni-processor case, since all cores share a common cache, the caching effect is the same regardless of where the process executes. However, in "multi-package" systems (multi-core processors that do not share a last-level cache), if a process running on one core is suspended and then resumes on a core served by another cache, the data it cached earlier is no longer close at hand and cache misses become more likely, so the performance benefit of the shared L2 cache is lost. This condition can be avoided with the SMP technique known as "processor affinity," where the programmer manually "pins," or restricts, process execution to a specific subset of cores (in this case, the cores of a package that share a cache), thus preserving the shared L2 cache benefit for increased overall performance. This technique is also useful for threads that frequently share data.
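Here is a minimal sketch of processor affinity, assuming Linux. It uses Python's `os.sched_setaffinity` and `os.sched_getaffinity`, which are not available on macOS or Windows, and it reads the kernel's sysfs cache topology files to find which cores share an L2 cache. The helper names are hypothetical, and other operating systems expose affinity through different APIs.

```python
import os

CPU_SYSFS = "/sys/devices/system/cpu"  # Linux sysfs; assumed to be mounted here


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def cpus_sharing_cache(cpu, level=2):
    """Return the CPUs sharing the given cache level with `cpu`, or None if unknown."""
    cache_dir = os.path.join(CPU_SYSFS, f"cpu{cpu}", "cache")
    try:
        entries = sorted(os.listdir(cache_dir))
    except OSError:
        return None  # no sysfs cache info (non-Linux, container, VM, ...)
    for index in entries:
        base = os.path.join(cache_dir, index)
        try:
            with open(os.path.join(base, "level")) as f:
                if int(f.read()) != level:
                    continue
            with open(os.path.join(base, "shared_cpu_list")) as f:
                return parse_cpu_list(f.read())
        except (OSError, ValueError):
            continue  # not a cache index directory, or unreadable
    return None


def pin_current_process(cpus):
    """Restrict the calling process to `cpus`; return the resulting CPU set.

    Returns None where the platform has no os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    target = set(cpus) & os.sched_getaffinity(0)  # keep only CPUs we may use
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        here = min(os.sched_getaffinity(0))
        group = cpus_sharing_cache(here) or {here}
        print("pinned to", sorted(pin_current_process(group)))
```

Pinning takes scheduling freedom away from the OS, so it is worth measuring before and after: it helps when cache reuse dominates, and it can hurt when the pinned cores are busy while others sit idle.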



SMP is a great software implementation for multi-core systems. It can take advantage of the additional cores by running multiple applications simultaneously, whether they are serial or multi-threaded. For optimum performance and scalability, though, the software should be programmed for parallelism and be aware of the cache architecture of the specific processor platform.



While SMP is great, it's not the only system design for multiprocessing. Stay tuned for more choices!



  • Lori


Message Edited by serenajoy on 03-11-2009 08:42 PM