Intel Woodcrest, AMD's Opteron and Sun's UltraSparc T1: Server CPU Shoot-out
by Johan De Gelas on June 7, 2006 12:00 PM EST - Posted in IT Computing
The Official SPEC Numbers
SPEC FP and Int 2000 are the standard benchmarks to evaluate CPU performance. However, the benchmark numbers are highly dependent on the compiler. SPEC FP and Int show best-case performance, as the CPU runs aggressively compiled and highly optimized code. In the real world, code is compiled in a more conservative, less optimized way.
In practice this means that Intel's SPEC numbers, thanks to its highly capable compiler team, are (slightly) higher than what you will see in real applications. Nevertheless, SPEC CPU 2000 is a good starting point to understand what a CPU is capable of. As mentioned earlier, the Xeon 5100 is the Xeon Woodcrest, based on the new Core architecture.
SPECfp
CPU | Clockspeed (MHz) | SPEC fp 2000 |
POWER5+ | 2200 | 3271 |
Itanium 2 | 1666 | 2851 |
Xeon 5160 | 3000 | 2783 |
Opteron | 2800 | 2256 |
Pentium 4 E | 3733 | 2232 |
The new Woodcrest is about 20-25% faster than the fastest dual-core Opteron in SPEC fp 2000. Part of that lead comes from a 7% clockspeed advantage, which is most likely a result of the fact that Woodcrest is built on a newer 65nm process. If AMD manages to keep up with Intel when it comes to clockspeed, the advantage of Intel's newest CPU might shrink to 15% or less. However, Woodcrest will have a much bigger advantage in all applications that make heavy use of 64-bit and 128-bit SSE.
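The percentages in this paragraph can be checked against the SPECfp table with a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are ours, and the clock-normalized figure assumes that the score scales linearly with clockspeed, which real CPUs only approximate.

```python
# Scores and clockspeeds (MHz) copied from the SPECfp table above.
xeon_5160 = {"mhz": 3000, "specfp": 2783}
opteron = {"mhz": 2800, "specfp": 2256}

# Raw advantage: ratio of the two published scores.
score_ratio = xeon_5160["specfp"] / opteron["specfp"]

# Clockspeed advantage of the Xeon over the Opteron.
clock_ratio = xeon_5160["mhz"] / opteron["mhz"]

# ASSUMPTION: performance scales linearly with clockspeed, so dividing
# out the clock ratio estimates the lead at equal clocks.
per_clock_ratio = score_ratio / clock_ratio

for label, ratio in [
    ("SPECfp advantage", score_ratio),
    ("clockspeed advantage", clock_ratio),
    ("advantage at equal clocks", per_clock_ratio),
]:
    print(f"{label}: {100 * (ratio - 1):.1f}%")
```

Under that linear-scaling assumption the three figures work out to roughly 23%, 7% and 15%, which is in line with the numbers quoted in the text.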
SPECint
CPU | Clockspeed (MHz) | SPEC Int 2000 |
Xeon 5160 | 3000 | 3057 |
Pentium 4 E | 3733 | 1870 |
Opteron | 2800 | 1837 |
Pentium 4 Xeon | 3733 | 1813 |
POWER5+ | 2200 | 1705 |
Itanium 2 | 1666 | 1502 |
When it comes to integer performance, the Woodcrest numbers are simply stunning and vastly superior to any other architecture: the Xeon 5160 scores 3057, against 1870 for the next-fastest CPU in the table. Let us find out if this integer performance in SPEC Int 2000 pays off in server applications.
Latencies...
LMBench is a set of micro-benchmarks which can be helpful for determining memory latency and instruction latencies. We tested with LMBench 3.0a-5. It must be said that LMBench is usually right, but not always. If the benchmark is not aware of some of the particularities of a certain architecture, it can measure wrong values. So we have to double check if the values measured make sense.
LMBench
CPU | Clockspeed (MHz) | L1 (ns) | L1 (cycles) | L2 (ns) | L2 (cycles) | RAM (ns) | RAM (cycles) |
Xeon 5160 3 GHz | 3000 | 1.01 | 3 | 4.7 | 14 | 117.3 | 345 |
Pentium-M 1.6 GHz | 1593 | 2 | 3 | 6 | 10 | 92.1 | 147 |
Sun T1 1 GHz | 980 | 3 | 3 | 22.1 | 22 | 107.5 | 105 |
Opteron 275 | 2209 | 1 | 3 | 5.5 | 12 | 73 | 161 |
Xeon Irwindale 3.6 GHz | 3594 | 1 | 4 | 8 | 28 | 48.8 | 175 |
The massive 4 MB L2 cache has an amazingly low latency of 14 cycles. This seems to be the worst case, as we have measured 12 cycles with other benchmarking tools such as ScienceMark. Nevertheless, even 14 cycles at 3 GHz is pretty amazing. The Core Duo, a.k.a. Yonah, accesses a shared cache that's half as large in 14 cycles at a substantially lower 2.33 GHz.
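To see why the same cycle count is more impressive at a higher clock, it helps to convert cycles into nanoseconds. The helper names below are ours and purely illustrative; the inputs come from the text and the LMBench table.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Time in nanoseconds taken by `cycles` clock cycles at `clock_ghz` GHz."""
    return cycles / clock_ghz


def ns_to_cycles(ns, clock_ghz):
    """Number of clock cycles that fit in `ns` nanoseconds at `clock_ghz` GHz."""
    return ns * clock_ghz


# 14-cycle L2 access on Woodcrest (3 GHz) versus Yonah (2.33 GHz).
woodcrest_l2_ns = cycles_to_ns(14, 3.0)
yonah_l2_ns = cycles_to_ns(14, 2.33)

# Cross-check one table row: Opteron 275, 73 ns RAM latency at 2209 MHz.
opteron_ram_cycles = ns_to_cycles(73, 2.209)

print(f"Woodcrest L2: {woodcrest_l2_ns:.2f} ns")
print(f"Yonah L2:     {yonah_l2_ns:.2f} ns")
print(f"Opteron RAM:  {opteron_ram_cycles:.0f} cycles")
```

The Woodcrest figure comes out at about 4.7 ns, matching the L2 (ns) column, while the same 14 cycles on Yonah take about 6 ns.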
On the other hand, the memory latency is very high; luckily the 4 MB L2 cache will minimize that effect. The problem seems to be the FB-DIMMs. The Advanced Memory Buffer introduces extra latency, and of course the registered DDR2-533 chips with a CAS latency of 4 have a higher latency by themselves. This results in a memory subsystem with a pretty high 115 ns latency, while the Opteron has access to the RAM in only 73 ns.
ScienceMark didn't agree completely and reported about 65-70 ns latency on the Opteron system and 70-76 ns (230 cycles) on the Woodcrest system. We have reason to believe that Woodcrest's latency is closer to what LMBench reports: the excellent prefetchers are hiding the true latency numbers from ScienceMark. It must also be said that the measurements on the Opteron are only for the local memory, not the remote memory.
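Why would prefetchers fool one tool and not another? Latency benchmarks typically walk a chain of dependent loads, and the order of that chain decides how predictable the accesses are. The sketch below is our own illustration of the two access patterns, not the code of LMBench or ScienceMark; in Python the interpreter overhead dominates, so it shows the idea rather than producing trustworthy hardware numbers.

```python
import random
import time


def build_chain(n, shuffled, seed=0):
    """Return a list where chain[i] is the next index to visit.

    The chain forms one cycle over all n slots. With shuffled=False the
    walk is sequential (easy to prefetch); with shuffled=True the visiting
    order is random, so the next address is hard to predict.
    """
    order = list(range(n))
    if shuffled:
        random.Random(seed).shuffle(order)
    chain = [0] * n
    for i in range(n):
        chain[order[i]] = order[(i + 1) % n]
    return chain


def walk(chain, steps):
    """Follow the chain; every load depends on the result of the previous one."""
    idx = 0
    for _ in range(steps):
        idx = chain[idx]
    return idx


def time_walk(chain, steps):
    """Average nanoseconds per step (includes interpreter overhead)."""
    start = time.perf_counter()
    walk(chain, steps)
    return (time.perf_counter() - start) / steps * 1e9


if __name__ == "__main__":
    n = 1 << 20
    for shuffled in (False, True):
        ns = time_walk(build_chain(n, shuffled), n)
        print(f"shuffled={shuffled}: {ns:.1f} ns per step")
```

A benchmark whose access pattern resembles the sequential chain lets the prefetchers fetch data ahead of time and so reports a flattering latency, which is consistent with the explanation given above for the lower ScienceMark figure on Woodcrest.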
91 Comments
zsdersw - Thursday, June 8, 2006 - link
I'm not saying the board is particularly stellar.. I'm saying that it's referred to by MSI as a "server" product.
ashyanbhog - Thursday, June 8, 2006 - link
Irrespective of what MSI says, the fact is there were better mainstream boards for Anandtech to choose from if an honest, independent review was their intention
zsdersw - Thursday, June 8, 2006 - link
I.E., your comment belongs under someone else's.. not mine.
zsdersw - Thursday, June 8, 2006 - link
And that's completely irrelevant to what I was saying.
ashyanbhog - Thursday, June 8, 2006 - link
all I was saying is, it's nice to see Intel finally making a comeback
but Anandtech seems to have conducted a skewed benchmark that favours Intel, that unfairly increases the performance gap between Opteron and Woodcrest
In the final summary of the review he says
"In one word: Woodcrest rocks!"
There are quite a few holes in the review, the motherboard is just one of them,
I quoted MySQL number errors in my posts above,
just search for "ashyanbhog" in the page and read my earlier comments if you are interested.
AnandThenMan - Thursday, June 8, 2006 - link
What you're saying in general is irrelevant. Intel calls their integrated graphics "high performance" but that doesn't make it so.
MSI calling that a server board is just marketing, it does not represent what a true, high performance server class mobo is all about. Not that it's a bad piece of hardware, it is good for the price to be sure. But it is NOT a server class product.
zsdersw - Thursday, June 8, 2006 - link
At a certain price point, it could certainly be a nice entry-level server board.
Performance alone isn't what makes a server-class motherboard a server-class motherboard.
ashyanbhog - Thursday, June 8, 2006 - link
One of the motherboards used in this review is a cheap piece that trades performance to keep price low.
Why was that motherboard selected over mainstream server/workstation boards that are proven to offer slightly better performance? Why pick a $250 MSI board for Opteron over $500 boards from Tyan, Iwill, Supermicro or others? The Intel Xeon "Irwindale" gets a $500 board, so price could not have been the issue.
So what's the point in using a Single Channel board for this benchmark, when price was not a limitation?
Single memory channel boards like the one from MSI are known to offer lower performance than dual / dedicated memory channel boards when used in 2P Opteron configurations. Dual Channel boards are the mainstream boards for 2P Opteron systems. There are plenty of Server boards available in Dual / dedicated memory lane configuration. There are enough reviews on the net to show the performance diff b/w single memory channel boards and dual memory channel boards
The issue is not about the MSI or its class, the issue is why did Anandtech pick a Single memory channel board instead of a more mainstream dual memory channel board.
Hope that clears up "zsdersw"'s query
zsdersw - Thursday, June 8, 2006 - link
I'm not making excuses for the choices that were made regarding this comparison test. I'm talking about what constitutes a "server-class" motherboard.
ashyanbhog - Thursday, June 8, 2006 - link
Game PC review link for the above comment: http://www.gamepc.com/labs/view_content.asp?id=tig...