ECC Memory Isn't Everything -- It's the Only Thing
by Bob Day (bobday.nh@verizon.net)
Copyright (C) January, 2005 by Bob Day. All rights reserved.
First, I have to state my total bias on this topic. I have been using ECC memory in my computers since I built my first one in about 1997. I went to a lot of effort back then to find a mainboard that supported ECC and finally found a Megatrends HX-83 board that incorporated the Intel 82430HX Triton II chipset. Why do I insist on ECC memory? I just don't like the possibility of undetected errors occurring -- errors that could corrupt my hard drive, the registry in Windows XP, or files downloaded from the internet -- with no way to know that the corruption had happened or what might have caused it. Also, memory errors can cause "mysterious random crashes" of a computer, and, once again, there is no way of knowing what caused them.
In the past few years, the standard memory size of desktop PCs has increased from about 32MB to 256MB or more. At the same time, memory chips have become denser (more bits per chip) and faster. And software uses increasing amounts of memory. Adobe Photoshop CS, for example, can put 1GB to good use. All of these factors can only act to increase the memory error rate. For that reason, ECC memory is becoming increasingly important.
According to a white paper published in January 2004 by Tezzaron Semiconductor, a PC with 512MB of memory running 24 hours a day will sustain a memory error about every 10 days. (Reference:
http://www.tezzaron.com/about/papers/Soft%20Errors%201_1%20secure.pdf
See Appendix B, Calculations, on page 6.) Each memory error stands a chance of causing your computer to crash or, worse yet, corrupting a program or data on your hard drive. Any process that passes data through memory is vulnerable to memory errors. Such processes include downloading files from the internet, reading or writing CDs or DVDs, heavy number crunching, and defragmenting your hard drive.
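To get a feel for what that figure means for other memory sizes, here is a rough Python sketch. It takes the white paper's number (one error per 10 days for 512MB running around the clock) at face value and assumes that errors scale linearly with memory size and with hours of uptime. That linear scaling is an assumption made here for illustration, not a claim from the paper, and the function name is hypothetical.

```python
# Rough scaling of the white paper's figure: about one memory error
# every 10 days for a PC with 512MB running 24 hours a day.
# Assumption (not from the paper): errors scale linearly with memory
# size and with hours of uptime.

BASELINE_MB = 512     # memory size in the white paper's example
BASELINE_DAYS = 10    # days per error at that size, running 24 hours a day


def expected_errors(memory_mb, days, hours_per_day=24):
    """Estimate the number of memory errors over a period of use."""
    errors_per_mb_day = 1.0 / (BASELINE_MB * BASELINE_DAYS)
    uptime_days = days * hours_per_day / 24.0
    return errors_per_mb_day * memory_mb * uptime_days


for mb in (256, 512, 1024):
    estimate = expected_errors(mb, 365)
    print(f"{mb}MB, 24 hours a day for a year: about {estimate:.1f} errors")
```

Under those assumptions, doubling the memory or the hours of use doubles the expected number of errors.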
Will ECC memory detect all memory errors? No. But it will go a long way in that direction. It will detect and correct all single-bit errors, detect (but not correct) all two-bit errors, and detect most errors that involve an even number of bits greater than two. Errors that involve an odd number of bits greater than one will generally be mistaken for a single-bit error, erroneously "corrected," and consequently go undetected.
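The behavior described above can be tried out on a toy code. The Python sketch below implements an extended Hamming (8,4) code: 4 data bits, 3 Hamming check bits, and 1 overall parity bit. ECC memory modules typically protect 64 data bits with 8 check bits, so this is a scaled-down illustration of the same single-error-correcting, double-error-detecting idea, not the code any particular chipset uses. The function names are made up for this example.

```python
from itertools import combinations

# Toy SECDED code: extended Hamming (8,4).
# Position 0 holds the overall parity bit, positions 1, 2, 4 hold the
# Hamming check bits, and positions 3, 5, 6, 7 hold the 4 data bits.


def encode(nibble):
    """Encode 4 data bits (0..15) as a list of 8 bits."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Check bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity over positions 1..7
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i
    odd_parity = sum(code) % 2  # 1 means an odd number of bits flipped
    if odd_parity:
        # Treated as a single-bit error at position `syndrome`
        # (syndrome 0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        # Even number of flips with a nonzero syndrome: detected only.
        return "uncorrectable", None
    else:
        status = "ok"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


def flip(code, positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


word = encode(0b1011)
print(decode(flip(word, [5])))        # one flipped bit: corrected
print(decode(flip(word, [2, 5])))     # two flipped bits: flagged, not corrected
print(decode(flip(word, [1, 2, 5])))  # three flipped bits: "corrected" to a wrong value

patterns = list(combinations(range(8), 4))
missed = sum(1 for pos in patterns if decode(flip(word, pos))[0] == "ok")
print(f"{missed} of {len(patterns)} four-bit errors go undetected")
```

In this small code every three-bit error is miscorrected. In the wider codes used on real memory, some odd-bit errors produce a syndrome that points at no valid bit position and can be flagged instead, which is why "generally" is the safer word.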
Does use of ECC memory affect a computer's performance? ECC obviously won't make a computer go faster -- after all, the computer must do some work to do the ECC checking, and that takes time. On my computer, which has a 1GHz Celeron (Pentium III-based) CPU, a 100MHz front side bus, and 512MB of SDRAM, I have found that ECC checking adds one CPU cycle per RAM access. With ECC checking disabled, a read of a random 32-bit word in RAM (if it is not in cache) averages a little over 21 CPU cycles. Thus, for a random 32-bit read, ECC checking adds about 4.6 percent to the time required. That's quite a lot.
However, RAM accesses are typically not random, and very often the data required is already in cache, which eliminates the need to read from RAM. Better still, a read from RAM brings not just a single word into cache but an entire cache line, which is typically 32 or 64 bytes depending on the CPU. (Filling a whole cache line, along with an occasional translation lookaside buffer access, is why my computer averages about 21 CPU cycles to read a random 32-bit word from RAM.) And the single cycle required for ECC checking covers all the bytes in the cache line (at least it does on my computer, which has 32-byte cache lines).
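The arithmetic behind the 4.6 percent figure, and the way cache hits dilute it, can be sketched in a few lines of Python. The value of 21.7 cycles is an assumed stand-in for "a little over 21" that matches the stated 4.6 percent. The cache hit rate and hit cost in the second call are purely hypothetical numbers chosen for illustration, not measurements.

```python
def ecc_overhead_percent(miss_cycles, ecc_cycles=1.0, hit_rate=0.0, hit_cycles=1.0):
    """Percent increase in average memory access time due to ECC checking.

    The ECC cost is paid only on accesses that actually go to RAM
    (cache misses); cache hits are unaffected.
    """
    miss_rate = 1.0 - hit_rate
    base = hit_rate * hit_cycles + miss_rate * miss_cycles
    with_ecc = base + miss_rate * ecc_cycles
    return 100.0 * (with_ecc - base) / base


# Every access goes to RAM (the random-read case described above).
print(f"{ecc_overhead_percent(21.7):.1f} percent")

# Hypothetical: 90 percent of accesses hit a cache that costs 3 cycles.
print(f"{ecc_overhead_percent(21.7, hit_rate=0.9, hit_cycles=3.0):.1f} percent")
```

This covers memory access time only. A program also spends time on work that never touches memory, so the effect on total run time would be smaller still.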
So, in practice, how much does ECC checking cost in performance? I'm not sure, but I've surfed the internet quite a bit and I've seen estimates that ECC causes a performance reduction of between 1 and 4 percent, with most estimates hovering around 3 percent. I don't know where these figures come from, though. I haven't found a post that refers to actual studies or gives sources, and I wonder how many of the posters are just rewording what they have read in other posts.
I do have one comparison of ECC memory to standard memory that has a little bit of solidity to it. On the Standard Performance Evaluation Corporation (SPEC) website, take a look at the results for a Dell Precision WorkStation 340 (2.4GHz P4):
http://www.specbench.org/osg/cpu2000/results/res2002q3/cpu2000-20020909-01625.html
and an Intel computer with a D850EMV2 mainboard (2.4GHz P4):
http://www.specbench.org/osg/cpu2000/results/res2002q3/cpu2000-20020827-01587.html
Both systems have the same type and speed of CPU, the same chipset (the Intel 850E), the same front side bus speed, and the same type (RDRAM) and speed (PC800-40) of memory. But the Dell has ECC memory, and the Intel does not. The result was that the SPECint2000 rating of the Dell computer, with ECC memory, was actually a little better (faster) than that of the Intel computer. The same was true of the SPECfp2000 rating. Lots of little differences between the two systems could account for this outcome. But it seems clear that ECC memory is not a big negative.
So the conclusion I think we can draw from the internet posts I referred to, and the comparison in the previous paragraph, is that ECC memory does affect performance, but not much.
Given all of the above, it seems to me that ECC memory is the only memory that is appropriate for today's computers. Remember, every memory error can cause hard drive or registry corruption, or a system crash, depending on where it occurs. To me, the choice is not between ECC and non-ECC memory, but between ECC and Chipkill (which is a more advanced and stronger type of ECC).
If you want to test your computer's memory, there is a memory diagnostic tool called memtest86+, which can be downloaded from http://www.memtest.org.