AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look
by Johan De Gelas on November 27, 2007 6:00 AM EST - Posted in IT Computing
Conclusion
When it comes to floating-point performance, we feel we have a very good picture of what AMD's and Intel's best are capable of. The Barcelona floating-point architecture is able to beat the 53xx in quite a few benchmarks, but the Xeon 5472 shows that AMD's third generation Opteron is late to the party. Our FLOPS, LINPACK, and rendering benchmarks show that the Xeon 5472 is at least as good as, if not better than, AMD's latest in raw FP performance on a clock-for-clock basis.
We have less data on "pure" integer performance, with the exception of our Fritz Chess benchmark. This benchmark gave us a first hint that the improvement in integer performance from the Opteron 22xx to the Opteron 23xx is probably rather small. The single-threaded SPECint2006 numbers published by IBM are probably not optimal, but they also confirm this:
- A 1.9GHz Opteron 2347 gets a score of 11.3, 9.97 base
- A 2GHz Opteron 2212 gets a score of 10.8, 9.77 base
- A 2GHz Xeon E5335 gets a score of 15.5, 14 base
This indicates that, clock for clock, the Opteron 23xx is about 10% faster in integer tasks than the 22xx series. Considering that the best SPECint_rate2006 score of AMD's quad-core at 2.5GHz is 102 while Intel's 5460 (3.16GHz) is already at 138, we think it is safe to assume that the integer performance of AMD's Barcelona is still not up to Intel Core levels. The Xeon 5365 at 3GHz is also able to deliver a significantly higher score (117). This, together with our own benchmark data, makes us believe that the Xeon 54xx based on the Penryn architecture will beat the best AMD chips in every aspect of raw processing performance: integer, legacy x87 FP, and SIMD (SSE). It is clear now why Intel's CPUs are so dominant in desktop and workstation workloads.
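The 1.9GHz Opteron 2347 and the 2GHz parts do not run at the same clock, so the scores above only line up once they are divided by clock speed. The following is a minimal sketch of that normalization, using only the three published scores listed above; the helper names `per_ghz` and `advantage` are ours, and dividing by GHz assumes performance scales linearly with clock, which is only a rough approximation.

```python
# Published single-threaded SPECint2006 scores quoted above:
# chip -> (clock in GHz, peak score, base score).
scores = {
    "Opteron 2347": (1.9, 11.3, 9.97),
    "Opteron 2212": (2.0, 10.8, 9.77),
    "Xeon E5335": (2.0, 15.5, 14.0),
}


def per_ghz(chip, column="peak"):
    """Score divided by clock speed (assumes linear scaling with clock)."""
    clock, peak, base = scores[chip]
    return (peak if column == "peak" else base) / clock


def advantage(chip_a, chip_b, column="peak"):
    """Percentage by which chip_a leads chip_b on a per-GHz basis."""
    return (per_ghz(chip_a, column) / per_ghz(chip_b, column) - 1) * 100


for column in ("peak", "base"):
    lead = advantage("Opteron 2347", "Opteron 2212", column)
    print(f"Opteron 2347 vs. 2212 per GHz ({column}): {lead:+.1f}%")

lead = advantage("Xeon E5335", "Opteron 2347")
print(f"Xeon E5335 vs. Opteron 2347 per GHz (peak): {lead:+.1f}%")
```

On the peak scores, the 23xx comes out roughly 10% ahead of the 22xx per GHz, and a little less on the base scores. The same normalization puts the Xeon E5335 about 30% ahead of the Opteron 2347.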
Add to this a significant clock advantage: there is already a 3.2GHz Xeon 5485 (150W). If you prefer a less power-hungry CPU, Intel can provide a 3GHz 5472 that is still clocked 20% higher than what AMD will be able to deliver 2 to 3 months later. Although the 3GHz models are quite pricey (>$1000), you can already find a 2.5GHz quad-core Xeon for $316. That's the same price as a 1.9GHz Opteron 2347 chip. There is little doubt in our mind that a 2.5GHz Xeon is faster in almost every application we can think of, so Intel's newest Xeon does have the price/performance crown as well.
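The clock-speed comparisons in this paragraph are simple ratios. The sketch below is illustrative only: it assumes that "what AMD will be able to deliver" refers to a 2.5GHz part (the clock of the Opteron 2360SE in our benchmarks), and the helper name `pct_higher` is ours. Clock speed is not performance, so the second figure says only how much extra frequency the same $316 buys, not how much faster the chip is.

```python
def pct_higher(a_ghz, b_ghz):
    """Percentage by which clock a_ghz exceeds clock b_ghz."""
    return (a_ghz / b_ghz - 1) * 100


# Assumption: AMD's upcoming top clock is 2.5GHz (the Opteron 2360SE).
xeon_5472_ghz = 3.0
opteron_top_ghz = 2.5
print(f"Xeon 5472 vs. fastest Opteron: {pct_higher(xeon_5472_ghz, opteron_top_ghz):.0f}% higher clock")

# Both chips are quoted at $316 in the text above.
xeon_316_ghz = 2.5
opteron_2347_ghz = 1.9
print(f"At $316: {pct_higher(xeon_316_ghz, opteron_2347_ghz):.1f}% higher clock for the Xeon")
```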
While AMD loses quite a few battles, the war is far from lost. The server/HPC situation is entirely different from the desktop scene where the Core 2 Quad overpowers the Phenom in almost every benchmark. There is more to server and HPC performance than simple raw processing power. Intel's flagship still has an Achilles heel: the platform it is running on has higher latency and much lower bandwidth than AMD's platform. Once you really stress all those cores with many threads, AMD's platform starts to pay off.
Look at the summary of our benchmarking below. (Blue numbers mean Intel is faster; green numbers show a victory for the AMD chip.)
AMD vs. Intel Performance Summary
Benchmark | Opteron 2360SE vs. Xeon E5365 | Opteron 2360SE vs. Xeon 5472 |
General applications | | |
WinRAR 3.62 | 23% faster | 6% faster |
Fritz Chess engine | 24% slower | 26% slower |
HPC applications | | |
LINPACK | 4% slower* | 9% slower* |
3D Applications | | |
3DS Max 9 | 19% slower | 25% slower |
zVisuel 3D Kribi Engine | 7% faster | 14% slower |
zVisuel 3D Kribi Engine (AA) | 2% slower | 23% slower |
Server applications | | |
SPECjbb (Sun) | 28% faster | 11% faster |
SPECjbb (BEA) | 12% faster | 12% slower |
MySQL | 14% faster | Equal |
* Faster LINPACK binaries from Intel were available at the time that we finished this article.
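For readers who want a quick scoreboard, the table can be tallied per comparison. This is a minimal sketch that only counts wins and losses from the table entries; the names `summary`, `outcome`, and `tally` are ours. A simple count ignores the size of each lead and weights every benchmark equally, so it is no substitute for the individual results.

```python
from collections import Counter

# Rows from the summary table: (benchmark, vs. Xeon E5365, vs. Xeon 5472).
# Each entry is the Opteron 2360SE result relative to the named Xeon.
summary = [
    ("WinRAR 3.62", "23% faster", "6% faster"),
    ("Fritz Chess engine", "24% slower", "26% slower"),
    ("LINPACK", "4% slower*", "9% slower*"),
    ("3DS Max 9", "19% slower", "25% slower"),
    ("zVisuel 3D Kribi Engine", "7% faster", "14% slower"),
    ("zVisuel 3D Kribi Engine (AA)", "2% slower", "23% slower"),
    ("SPECjbb (Sun)", "28% faster", "11% faster"),
    ("SPECjbb (BEA)", "12% faster", "12% slower"),
    ("MySQL", "14% faster", "Equal"),
]


def outcome(cell):
    """Classify a table cell as an AMD win, an Intel win, or a tie."""
    text = cell.rstrip("*").lower()
    if text.endswith("faster"):
        return "AMD"
    if text.endswith("slower"):
        return "Intel"
    return "tie"


def tally(column):
    """Count outcomes in one comparison column (1 = E5365, 2 = 5472)."""
    return Counter(outcome(row[column]) for row in summary)


vs_e5365 = tally(1)
vs_5472 = tally(2)
print("vs. Xeon E5365:", dict(vs_e5365))
print("vs. Xeon 5472: ", dict(vs_5472))
```

Against the Xeon E5365 the Opteron 2360SE takes five of the nine tests. Against the Xeon 5472 it takes two, loses six, and ties one.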
To put it in car terms, our SPECjbb, LINPACK, and MySQL benchmarks have shown that Intel's "powerful CPU engines" sometimes have problems putting the "massive torque" to the "wheels". You may feel, for example, that using four instances in our SPECjbb test favors AMD too much, but there is no denying that using more virtual machines on fewer physical servers is what is happening in the real world. Intel's best have a solid lead over AMD's quad-core in rendering benchmarks, but some HPC, Java, and MySQL benchmarks show that the 2.5GHz Barcelona is able to keep up with (or come close to) a 3GHz Xeon 5472. That is impressive, on the condition that we finally see some higher clocked Opteron 23xx chips in commercially available servers.
We still cannot draw any solid conclusion on the server performance of AMD's quad-core as no MS Exchange, SAP ERP, TPC-C, or TPC-H results have been published. In fact, with the exception of the SPECjbb and MySQL numbers in this article, all server benchmarks on AMD's third generation Opteron are MIA. This situation will probably continue for a few more months as most of these benchmark results traditionally come from OEMs and not AMD.
43 Comments
befair - Friday, November 28, 2008 - link
ok .. getting tired of this! Intel loving Anandtech employs very unfair & unreasonable tactics to show AMD processors in bad light every single time. And most readers have no clue about the jargon Anandtech uses every time.
1 - HPL needs to be compiled with appropriate flags to optimize code for the processor. Anandtech always uses the code that is optimized for Intel processors to measure performance on AMD processors. As much as AMD and Intel are binary compatible, when measuring performance even a college grad who studies HPC knows the code has to be recompiled with the appropriate flags
2 - Clever words: sometimes even 4 GFLOPS is described as significant performance difference
3 - "The Math Kernel Libraries are so well optimized that the effect of memory speed is minimized." - So ... MKL use is justified because Intel processors need optimized libraries for good performance. However, they don't want to use ACML for AMD processors. Instead they want to use MKL optimized for Intel on AMD processors. What's more ... Intel codes optimize only for Intel processors and disable everything for every other processor. They have corrected it now but who knows!! read here http://techreport.com/discussions.x/8547
I am not saying anything bad about either processor but an independent site that claims to be fair and objective in bringing facts to the readers is anything but fair and just!!! what a load!
DonPMitchell - Friday, December 7, 2007 - link
I think a lot of us are intrigued by AMD's memory architecture, its ability to support NUMA, etc. A lot of benchmarks test how fast a small application runs with a high cache-hit rate, and that's not necessarily interesting to everyone.
The MySQL test is the right direction, but I'd rather see numbers for a more sophisticated application that utilizes multiple cores -- Oracle or MS SQL Server, for example. These are products designed to run on big iron like Unisys multi-proc servers, so what happens when they are running on these more economical Harpertown or Barcelona?
kalyanakrishna - Thursday, November 29, 2007 - link
http://scalability.org/?p=453
kalyanakrishna - Thursday, November 29, 2007 - link
a much better review than the original one. But I still see some cleverly put sentences, wish it were otherwise.
Viditor - Thursday, November 29, 2007 - link
Nice review Johan!
On the steppings note you made, it's not the B2 stepping that is supposed to perform better, it's the BA stepping...
The BA stepping was the improved form for B1s, and the B3 stepping is the improved form of the B2. BA and B2 came out at the same time in Sept (though BA was the one launched, B1 was what was reviewed), B2 for Phenom and performance clockspeeds, BA for standard and low power chips.
Do you happen to have a BA chip to test (those are the production chips)?
BitByBit - Wednesday, November 28, 2007 - link
Despite K10's rather extensive architectural improvements, it looks like its core performance isn't too different to K8. In fact, the gains we've seen so far could easily be attributable to the improved memory controller and increased cache bandwidth. It seems that introducing load reordering, a dedicated stack, improved branch prediction, 32B instruction fetch, and improved prefetching has had little impact, certainly far less than expected. The question is, why?
JohanAnandtech - Wednesday, November 28, 2007 - link
Well, we are still seeing 5-10% better integer performance on applications that are running in the L2, so it is more than just a K8 with a better IMC. But you are right, I expected more too.
However, the MySQL benchmark deserves more attention. In this case the Barcelona core is considerably faster than the previous generation (+25%). This might be a case where 32-byte fetch and load reordering are helping big time. But unfortunately our CodeAnalyst failed to give all the numbers we needed.
BaronMatrix - Wednesday, November 28, 2007 - link
At any rate, it was the most in-depth review I've seen, especially with the code analysis. I too, thought it would be higher, but remember that Barcelona is NOT HT3 and doesn't have the advantage of "ganging/unganging." There was an interesting article recently that showed perf CAN be improved by unganging (maybe it was ganging, can't find it) the HT3 links.
I really hate that OEMs decided to stand up to the big, bad AMD and DEMAND that Barcelona NOT have HT3 with ALL OF ITS BENEFITS.
I mean people complain that Barcelona uses more power, but HT3 would cut that somewhat. At least in idle mode, and even in cases where IMC is used more than the CPU or vice versa.
I also may as well use this to CONDEMN all of these "analysts" who insist on crapping on the underdog that keeps prices reasonable and technology advancing.
INSERT SEVERAL EXPLETIVES. REPEATEDLY. FOR A FEW DAYS. A WEEK. FOR A YEAR.
INSERT MORE EXPLETIVES.
donaldrumsfeld - Wednesday, November 28, 2007 - link
Conjecture regarding why AMD went quad core on the same die... and this has nothing to do with performance. I think one place where Intel is way ahead of AMD is package technology. Remember they were doing a type of Multichip module with the P6. Having 2 dice instead of a single die allows them to have an overall lower defect rate, higher yield, and higher GHz. This is vs. AMD's lower GHz but (it was hoped) greater data efficiency using an L3 die and lower latency of on-die communications amongst cores vs. Intel's solution of die to die communication.
Can anyone confirm/deny this?
thanks
tshen83 - Tuesday, November 27, 2007 - link
Seriously, can you buy the 2360SE? Newegg doesn't even stock the 1.7GHz 2344HEs.
The same situation exists on the Phenom line of CPUs. I don't see the value of reviewing Phenom 9700, 9900s when AMD cannot deliver them. I have trouble locating Phenom 9500s.