Intel Woodcrest, AMD's Opteron and Sun's UltraSparc T1: Server CPU Shoot-out
by Johan De Gelas on June 7, 2006 12:00 PM EST- Posted in
- IT Computing
The Official SPEC Numbers
SPEC FP and Int 2000 are the standard benchmarks to evaluate CPU performance. However, the benchmark numbers are highly dependant on the compiler. SPEC fp and Integer show the best case performance as the CPU runs on the aggressively compiled and highly optimized code. In the real world, code is compiled in a more conservative/less optimized way.
In practice this means that Intel's SPEC numbers - thanks to it's highly capable compiler team - are (slightly) higher than in real applications. Nevertheless, SPEC CPU 2000 is a good starting point to understand what a CPU is capable off. As mentioned earlier, the Xeon 5100 is the Xeon Woodcrest, based on the new core architecture.
The new Woodcrest is about 20-25% faster than the fastest dual-core Opteron. The 7% clockspeed advantage is most likely a result of the fact that the Woodcrest was baked with a newer 65nm process. If AMD manages to keep up with Intel when it comes to clockspeed, the advantage of their newest CPU might shrink to 15% or less. However, Intel's Woodcrest will have a much bigger advantage in all applications that make heavy use of 64 and 128-bit SSE.
When it comes to integer performance, the Woodcrest numbers are simply stunning and vastly superior to any other architecture. Let us find out if this vastly superior integer performance in SPEC Int 2000 pays off in server applications.
Latencies...
LMBench is a set of micro-benchmarks which can be helpful for determining memory latency and instruction latencies. We tested with LMBench 3.0a-5. It must be said that LMBench is usually right, but not always. If the benchmark is not aware of some of the particularities of a certain architecture, it can measure wrong values. So we have to double check if the values measured make sense.
The massive 4 MB L2 cache has an amazingly low latency of 14 cycles. This seems to be the worst case, as we have measured 12 cycles with other benchmarking tools such as ScienceMark. Nevertheless, even 14 cycles at 3 GHz is pretty amazing. The Core Duo, a.k.a. Yonah, accesses a shared cache that's half as large in 14 cycles at a substantially lower 2.33 GHz.
On the other hand, the memory latency very high; luckily the 4 MB L2 cache will minimize that effect. The problem seems to be the FB-DIMMs. The Advanced Memory Buffer introduces extra latency, and of course the registered DDR-2 533 chips with a CAS latency of 4 have a higher latency by themselves. This results in a memory subsystem with pretty high 115 ns latency, while the Opteron has access to the RAM in only 73 ns
ScienceMark didn't agree completely and reported about 65-70 ns latency on the Opteron system and 70-76 ns (230 cycles) on the Woodcrest system. We have reason to believe that Woodcrest's latency is closer to what LMBench reports: the excellent prefetchers are hiding the true latency numbers from Sciencemark. It must also be said that the measurements for the Opteron on the Opteron are only for the local memory, not the remote memory.
SPEC FP and Int 2000 are the standard benchmarks to evaluate CPU performance. However, the benchmark numbers are highly dependant on the compiler. SPEC fp and Integer show the best case performance as the CPU runs on the aggressively compiled and highly optimized code. In the real world, code is compiled in a more conservative/less optimized way.
In practice this means that Intel's SPEC numbers - thanks to it's highly capable compiler team - are (slightly) higher than in real applications. Nevertheless, SPEC CPU 2000 is a good starting point to understand what a CPU is capable off. As mentioned earlier, the Xeon 5100 is the Xeon Woodcrest, based on the new core architecture.
SPECfp | ||
Clockspeed | SPEC fp 2000 | |
POWER5+ | 2200 | 3271 |
Itanium 2 | 1666 | 2851 |
Xeon 5160 | 3000 | 2783 |
Opteron | 2800 | 2256 |
Pentium 4 E | 3733 | 2232 |
The new Woodcrest is about 20-25% faster than the fastest dual-core Opteron. The 7% clockspeed advantage is most likely a result of the fact that the Woodcrest was baked with a newer 65nm process. If AMD manages to keep up with Intel when it comes to clockspeed, the advantage of their newest CPU might shrink to 15% or less. However, Intel's Woodcrest will have a much bigger advantage in all applications that make heavy use of 64 and 128-bit SSE.
SPECint | ||
Clockspeed | SPEC Int 2000 | |
Xeon 5160 | 3000 | 3057 |
Pentium 4 E | 3733 | 1870 |
Opteron | 2800 | 1837 |
Pentium 4 Xeon | 3733 | 1813 |
POWER5+ | 2200 | 1705 |
Itanium 2 | 1666 | 1502 |
When it comes to integer performance, the Woodcrest numbers are simply stunning and vastly superior to any other architecture. Let us find out if this vastly superior integer performance in SPEC Int 2000 pays off in server applications.
Latencies...
LMBench is a set of micro-benchmarks which can be helpful for determining memory latency and instruction latencies. We tested with LMBench 3.0a-5. It must be said that LMBench is usually right, but not always. If the benchmark is not aware of some of the particularities of a certain architecture, it can measure wrong values. So we have to double check if the values measured make sense.
LMBench | |||||||
Clockspeed | L1 (ns) | L1 (cycles) | L2 (ns) | L2 (cycles) | RAM (ns) | RAM (cycles) | |
Xeon 5160 3 GHz | 3000 | 1.01 | 3 | 4.7 | 14 | 117.3 | 345 |
Pentium- M 1.6 GHz | 1593 | 2 | 3 | 6 | 10 | 92.1 | 147 |
Sun T1 1 GHz | 980 | 3 | 3 | 22.1 | 22 | 107.5 | 105 |
Opteron 275 | 2209 | 1 | 3 | 5.5 | 12 | 73 | 161 |
Xeon Irwindale 3.6 GHz | 3594 | 1 | 4 | 8 | 28 | 48.8 | 175 |
The massive 4 MB L2 cache has an amazingly low latency of 14 cycles. This seems to be the worst case, as we have measured 12 cycles with other benchmarking tools such as ScienceMark. Nevertheless, even 14 cycles at 3 GHz is pretty amazing. The Core Duo, a.k.a. Yonah, accesses a shared cache that's half as large in 14 cycles at a substantially lower 2.33 GHz.
On the other hand, the memory latency very high; luckily the 4 MB L2 cache will minimize that effect. The problem seems to be the FB-DIMMs. The Advanced Memory Buffer introduces extra latency, and of course the registered DDR-2 533 chips with a CAS latency of 4 have a higher latency by themselves. This results in a memory subsystem with pretty high 115 ns latency, while the Opteron has access to the RAM in only 73 ns
ScienceMark didn't agree completely and reported about 65-70 ns latency on the Opteron system and 70-76 ns (230 cycles) on the Woodcrest system. We have reason to believe that Woodcrest's latency is closer to what LMBench reports: the excellent prefetchers are hiding the true latency numbers from Sciencemark. It must also be said that the measurements for the Opteron on the Opteron are only for the local memory, not the remote memory.
91 Comments
View All Comments
JohanAnandtech - Thursday, June 8, 2006 - link
I should have mentioned this: most of the tests have also been done on SUSE linux SLES9. The reason why we use Gentoo is that we are able to use the latest kernel and to tune the kernel specifically for the AMD or Intel architecture.With SUSE Enterprise you need to wait for SUSE to use a new kernel. Your suggestion is noted, and from now on I will include the SUSE SLES numbers too.
But to call our numbers useless, well that is a heavy exageration. There was about 1-2% difference between running on Gentoo than on SUSE. It is only natural: they both use more or less the same kernel, only the tools are different.
ashyanbhog - Thursday, June 8, 2006 - link
"Two months of testing and tweaking"so thats the time you took to make sure you could say
"In one word: Woodcrest rocks!"
and suprisingly your emotions were quite tepid when AMD processors where showing similar performance advantages over Intel processors earlier!
http://www.anandtech.com/IT/showdoc.aspx?i=2447&am...">http://www.anandtech.com/IT/showdoc.aspx?i=2447&am...
ashyanbhog - Thursday, June 8, 2006 - link
What 1-2% difference b/w SLES and Gentoo are you talking about? Anand's own earlier benchmarks show SLES performance as 9-17% better than than Gentoo!http://www.anandtech.com/IT/showdoc.aspx?i=2447&am...">http://www.anandtech.com/IT/showdoc.aspx?i=2447&am...
If you have specifically used Gentoo for the optimization options that it provides, why didn't you list the specific compile time optimizations for Intel and AMD that were finally used to run the benchmarks? The purpose of a independent benchmark is to ensure a setup that is neutral and verifiable by any third party using similar hardware and software. Does your review report provide the info necessary for the same?
Your earlier benchmarks using Linux + DB2 show dual dual core opterons gaining 50% - 80% improvement over dual single core opteron when more than 5 threads come into picture, and a mere 1% to 2% gain in case of one or two threads. Okay, I know DB2 was not part of this benchmarks this around, but shoudn't these figures have setoff enough alarm bells to force inclusion of something other than MySQL for database benchmarks?
Even MySQL on gentoo shows a modest 10% to 17% gain with concurrency numbers from 5 and higher in Sinle Core + Dual CPU vs Dual Core + Dual CPU. Strange that Linux and MySQL misbehave on Opteron this time and show a 10% performance degradation! You deserve a award for this! How can somebody contradict their own earlier benchmarks?
http://www.anandtech.com/IT/showdoc.aspx?i=2447&am...">http://www.anandtech.com/IT/showdoc.aspx?i=2447&am...
The MSI motherboard you used for the benchmarks has only a single channel to the memory bank for both the processors, a comprise made to cut its price and compete in the lowest market segment for 2P Opteron boards. A major design feature of the Opteron is its ability to use seperate memory channels for each procesor giving it NUMA capabilities, and dedicated memory lanes also cut lantencies when accessing the memory. Did you specifically choose this motherboard to negate opterons advantage? The Intel board used for "Irwindale" retails for around $500, the price for one used for woodcrest is not known, the MSI board is available for $250, so even the price range is different! "relatively cheap workstation board" as you noted in your earlier benchmarks. Were Tyan K8WE, ASUS K8N-DL, Supermicro H8DCi or the Iwill DK8EW that are more popular, so hard to come by? Also you dont specify which of the three opteron systems was used for which benchmark, or was it a average of three. The extreme attention to details usually found at Anand is suprisingly lacking for this review
http://www.ocforums.com/showthread.php?t=459111">http://www.ocforums.com/showthread.php?t=459111
http://forums.amd.com/lofiversion/index.php/t56855...">http://forums.amd.com/lofiversion/index.php/t56855...
http://geek.pricegrabber.com/search_getprod.php/ma...">http://geek.pricegrabber.com/search_getprod.php/ma...
http://geek.pricegrabber.com/search_getprod.php/ma...">http://geek.pricegrabber.com/search_get...php/mast...
Arent temprature readings also important, specially for a new xeon chip, as earlier ones had forced admins to double their AC capacities and discard covers of rack cabinets for better cooling.
Its good to know Intel is back on track, but this review seems to have only one purpose - Show woodcrest in favorable light against opterons.
rayl - Wednesday, June 7, 2006 - link
It doesn't take a server to compute that "Woodcrest rocks" = 2 words. :pmerlinm - Wednesday, June 7, 2006 - link
where are the postgresql quad core benchmarks? My experience is that 2-4 cores on postgresql gives you 1.7x the power on non i/o constrained databases. This would have been a huge upset to have PostgreSQL blow out mysql in a quad core configuration.also a postgresql.conf containing non-default values would have been nice.
blackbrrd - Wednesday, June 7, 2006 - link
Where does it say they are using the default postgresql.conf? Actually I can't find any information on what kind of tweaking that has been done here at all?Is there any special reason for only running postgresql on a single cpu instead of a dual dual core setup like you did for the rest of the tests? There are no commends about it on http://www.anandtech.com/IT/showdoc.aspx?i=2772&am...">page 9 atleast...
merlinm - Wednesday, June 7, 2006 - link
right...what I meant was, could you please supply .conf entries which where edited and changed from the stock configuration. Actually, for this type of benchmark (90% read), there's not a whole lot to change in postgresql.conf...generally the more writes there are the more you have to tweak.the major tweak in postgresql is to use prepared statements over the parameterized interface...
merlinm - Wednesday, June 7, 2006 - link
oh, and postgresql 8.1 is about 20+% faster than 8.0 in most read operations involving very small (one statement) transactions.squash - Wednesday, June 7, 2006 - link
Hello,With the recent "official" support in Ubuntu for that Niagra server, would it be possible to also include performance numbers for that server running Linux?
I have seen other benchmarks showing Linux to have improved performance on the same hardware compared to Solaris. Filesystem performance is typically much higher in ext2 vs ufs+logging, and if you scan Sun's issue tracking database, there are many entries for libc and kernel operations which are much slower than Linux.
Maybe as a seperate article....
Squash
OddTSi - Wednesday, June 7, 2006 - link
The Verify/s graph (the last one on the page) doesn't have a line for the Dual Opteron yet the author still claims "Again, the Opteron takes the lead." Does the Dual Opteron take the lead and there was just an error in showing up on the graph?Also in the signs/s chart the Dual Woodcrest tops out at just over 6,000 and the Dual Opteron tops out at just over 5,000, which is a 20% lead, yet the author writes "The Opteron at 2.4 GHz has no trouble keeping up with the 3 GHz Woodcrest." I'm not trying to be a nitpicky fanboy here but being beaten by 20% isn't "keeping up," at least not in my defition of the expression.
Finally, I have a question. Why are there no Windows-based tests? I know that LAMP is very popular in the webserving part of the server world but in most other server/enterprise areas it's mostly Windows, SQL Server, Visual Studio, .NET, etc. I'd like to see some benchmarks that use software that those of us in the non-webserving community are most likely to use. I know there's no chance of running the UltraSPARC in a Windows configuration but quite frankly who cares. I would like to see which of the x86 offerings (which is FAR more likely to be used) is better.