Platform hardware and software tuning help you extract the last bit of performance out of a system, but hardware configuration ultimately determines a platform’s performance potential. Nowhere is this truer than in memory configuration. The number of memory banks, the operating speed of the memory interface, and the FSB operating frequency all play a role in determining overall system memory performance.
For example, the Intel® 5100 MCH chipset supports FSB speeds of 667, 1067, and 1333 Mtransfers/sec. All else being equal, processors supporting the 1333-Mtransfers/sec FSB speed deliver the best performance, because higher FSB operating frequencies result in higher effective FSB bandwidth and lower memory-access latency. However, there is some interplay among FSB frequency, memory channel count, and memory transfer rate that bears close examination if you want to maximize system performance with any component set.
Note: The following information applies directly to processor boards based on the Intel® 5100 MCH chipset, such as Kontron’s CP6014 Dual Intel® Quad-Core LV Xeon® 6U CompactPCI Processor Board. (Kontron is a Premier member of the Intel® Embedded Alliance.) These performance numbers suggest similar results for processor boards based on other chipsets, but only performance testing can confirm the actual numbers for those chipsets.
The memory subsystem plays a vital role in platform performance, and memory performance often becomes the limiting factor for benchmark throughput. It’s critical to populate memory with end performance in mind. When populating memory, you must pick the right mix of:
- The number of memory channels
- The number of memory DIMMs per channel
- Dual-rank versus single-rank DIMMs
- Memory operating frequency
All of these factors play a role in determining memory performance.
Memory Channel Population
The Intel® 5100 MCH chipset implements two independent DDR2 memory channels. Each DDR2 channel has its own independent memory controller. You’ll get maximum performance from the Intel® 5100 MCH chipset’s memory system by populating both channels. For memory configurations with multiple DIMMs, divide the DIMMs equally between the two channels.
Figure 1 compares the performance of two DIMMs placed in one channel with that of one DIMM placed in each of the two channels. These memory-bandwidth test results are based on an Intel internal benchmark that behaves much like the STREAM benchmark, but with higher memory efficiency. Two-channel operation with 1-Gbyte, dual-rank, DDR2-667 modules delivers a 92% performance increase over one-channel operation (with the CPU issuing 66% memory-read and 33% memory-write requests). Clearly, using both of the Intel® 5100 MCH chipset’s memory channels boosts memory performance.
Figure 1: Memory performance – 1 vs 2 channels
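To put the 92% figure in context, here is a minimal sketch that compares it with the nominal peak bandwidth of one and two DDR2-667 channels. It assumes a 64-bit (8-byte) data path per DDR2 channel and uses the nominal 667-Mtransfers/sec rating. The function name `peak_bandwidth_mb_s` is hypothetical, and the result is a theoretical ceiling, not a measured value.

```python
def peak_bandwidth_mb_s(transfer_rate_mts, channels=1, bus_width_bytes=8):
    """Nominal peak bandwidth in Mbytes/sec.

    Assumes each channel moves bus_width_bytes per transfer
    (8 bytes for a 64-bit DDR2 data path; an assumption of this sketch).
    """
    return transfer_rate_mts * bus_width_bytes * channels


one_channel = peak_bandwidth_mb_s(667, channels=1)
two_channels = peak_bandwidth_mb_s(667, channels=2)

# Theoretical gain from adding the second channel, in percent.
theoretical_gain_pct = (two_channels / one_channel - 1) * 100

# Measured gain quoted in the article for 1-Gbyte, dual-rank DDR2-667 modules.
measured_gain_pct = 92

print(f"1 channel : {one_channel} Mbytes/sec nominal peak")
print(f"2 channels: {two_channels} Mbytes/sec nominal peak")
print(f"Theoretical gain: {theoretical_gain_pct:.0f}%, measured gain: {measured_gain_pct}%")
```

Under these assumptions, the measured 92% gain comes close to the ideal doubling, which suggests the second channel is used almost fully by this read/write mix.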
DIMMs per Channel
Each of the Intel® 5100 MCH chipset’s memory channels supports as many as three DDR2 DIMMs. The number of DIMMs placed in each channel also affects performance. Consider the estimated performance gains for one-, two-, and three-DIMM configurations: there’s a 4.5 percent performance improvement when going from one to two DIMMs per channel using 1-Gbyte, dual-rank, DDR2-667 memory modules, while populating a memory channel with three DIMMs could potentially yield higher application performance for capacity-limited usage but does not actually improve memory bandwidth. Generally speaking, populating both memory channels improves memory performance much more than installing multiple DIMMs in one channel.
For applications requiring the highest possible memory throughput, you should install two DIMMs per channel. For applications with strict power and cost limits, install only one DIMM in each of the two channels.
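The population rules above (two channels, at most three DIMMs per channel, DIMMs divided as evenly as possible) can be sketched as a small helper. The function name `distribute_dimms` and its defaults are hypothetical, chosen only to illustrate the rule. It is not a vendor tool.

```python
def distribute_dimms(total_dimms, channels=2, max_per_channel=3):
    """Return the DIMM count per channel, spread as evenly as possible.

    Defaults mirror the limits described in the article for the
    Intel 5100 MCH: two channels, up to three DIMMs per channel.
    """
    if total_dimms < 0 or total_dimms > channels * max_per_channel:
        raise ValueError("DIMM count does not fit the available slots")
    base, extra = divmod(total_dimms, channels)
    # The first `extra` channels receive one additional DIMM.
    return [base + (1 if i < extra else 0) for i in range(channels)]


# Highest-throughput layout recommended in the article: two DIMMs per channel.
print(distribute_dimms(4))
# Power- and cost-constrained layout: one DIMM per channel.
print(distribute_dimms(2))
```

An odd DIMM count necessarily leaves the channels unbalanced, so an even count is the better fit for the equal-division rule.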
Dual-Rank versus Single-Rank
Dual-rank DIMMs (with a separate memory rank installed on each side of the DIMM) enable full-rank interleaving of 4:1 and deliver superior performance. Figure 2 shows the performance benefit delivered by dual-rank DIMMs (with 66% read and 33% write traffic) for various memory configurations. For the maximum configuration, dual-rank memory provides an additional 6.5 percent throughput.
Figure 2: Memory performance – Single- vs Dual-Rank
DDR2-533 versus DDR2-667
Memory operating frequency alone does not determine overall system performance. The FSB frequency also plays a role due to memory gearing, which is the frequency ratio between the front side bus and memory interface operating frequencies. When the memory-gearing ratio is not an integer, additional memory latency occurs due to the need to force an integer ratio by slowing the memory controller. Table 1 shows possible frequency ratios related to memory gearing.
Table 1: Memory Gearing Ratios
However, benchmark tests show that the reduced memory latency of DDR2-667 memory compared to DDR2-533 memory makes up for the negative impact of memory gearing. A simple rule of thumb is that higher memory frequency provides higher performance. Figure 3 shows the relative CPU memory bandwidth recorded with each combination of FSB and memory operating frequency for a dual-socket configuration.
Figure 3: Memory performance – Memory Gearing
Note that there is no performance gain for a 1333/533 configuration relative to a 1067/533 configuration, but there’s an estimated 22% bandwidth improvement for either DDR2-667 configuration versus either DDR2-533 configuration. For these tests, memory gearing played no role in performance.
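To make the gearing ratios concrete, the sketch below computes the ratio between the FSB base clock and the DDR2 I/O clock for the four FSB/memory combinations discussed above. It assumes a quad-pumped FSB (four transfers per base clock) and double-data-rate memory (two transfers per I/O clock), with the nominal ratings mapped to exact clock values in thirds of a MHz. The dictionaries and function names are hypothetical, and the ratios may be expressed differently in the article’s Table 1.

```python
from fractions import Fraction

# Assumed exact clocks in MHz behind the nominal ratings.
# FSB: quad-pumped, so base clock = transfer rate / 4.
FSB_BASE_CLOCK_MHZ = {1067: Fraction(800, 3), 1333: Fraction(1000, 3)}
# DDR2: double data rate, so I/O clock = transfer rate / 2.
DDR2_IO_CLOCK_MHZ = {533: Fraction(800, 3), 667: Fraction(1000, 3)}


def gearing_ratio(fsb_mts, ddr2_mts):
    """FSB base clock divided by DDR2 I/O clock, as an exact fraction."""
    return FSB_BASE_CLOCK_MHZ[fsb_mts] / DDR2_IO_CLOCK_MHZ[ddr2_mts]


def is_integer_ratio(ratio):
    """True when the gearing ratio is a whole number (for example 1:1)."""
    return ratio.denominator == 1


for fsb in sorted(FSB_BASE_CLOCK_MHZ):
    for ddr2 in sorted(DDR2_IO_CLOCK_MHZ):
        ratio = gearing_ratio(fsb, ddr2)
        kind = "integer" if is_integer_ratio(ratio) else "non-integer"
        print(f"FSB {fsb} / DDR2-{ddr2}: {ratio.numerator}:{ratio.denominator} ({kind})")
```

Under these assumptions, the 1067/533 and 1333/667 pairings run at 1:1, while the mixed pairings yield non-integer ratios, which are the cases where the article says additional latency can occur.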
What’s your experience with memory? Do these benchmark tests reflect your reality?
1. Perry Taylor, Configuring and Tuning for Performance on Intel® 5100 Memory Controller Hub Chipset Based Platforms, Intel® Corp, Intel® Technology Journal, Volume 13, Issue 1, 2009, pages 16-28.
Roving Reporter (Intel Contractor)
Intel® Embedded Alliance