Intel Xeon 3.6 2MB vs AMD Opteron 252 Database Test
by Jason Clark & Ross Whitehead on February 14, 2005 8:00 AM EST
Posted in IT Computing
"Order Entry" Stress Test: Measuring Enterprise Class Performance
One complaint that we've historically received about our Forums database test is that it isn't strenuous enough for some Enterprise customers to make a good decision based on the results.
In our infinite desire to please everyone, we worked very closely with a company that could provide us with a truly Enterprise Class SQL stress application. We cannot reveal the identity of the Corporation that provided us with the application because of non-disclosure agreements in place. As a result, we will not go into specifics of the application, but rather provide an overview of its database interaction so that you can grasp the profile of this application, and understand the results of the tests better (and how they relate to your database environment).
We will use an Order Entry system as an analogy for how this test interacts with the database. All interaction with the database is via stored procedures. The main stored procedures used during the test are:
sp_AddOrder - inserts an Order
sp_AddLineItem - inserts a Line Item for an Order
sp_UpdateOrderShippingStatus - updates a status to "Shipped"
sp_AssignOrderToLoadingDock - inserts a record to indicate from which Loading Dock the Order should be shipped
sp_AddLoadingDock - inserts a new record to define an available Loading Dock
sp_GetOrderAndLineItems - selects all information related to an Order and its Line Items
The above is intended only as an overview of the stored procedure functionality; the stored procedures also perform other validation and audit operations.
Each Order had a random number of Line Items, ranging from one to three. The Line Items chosen for each Order were also randomized, drawn from a pool of approximately 1500 line items.
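As a rough illustration of that randomization (not the actual test code, which was written in C# and is under NDA), a minimal Python sketch might look like the following. The function name `make_order`, the exact pool size of 1500, and the choice to sample without duplicates are all assumptions made for the example.

```python
import random

# "Approximately 1500" in the article; the exact size is an assumption.
LINE_ITEM_POOL_SIZE = 1500
MIN_ITEMS, MAX_ITEMS = 1, 3


def make_order(rng):
    """Return the line-item IDs for one hypothetical order."""
    # randint is inclusive on both ends, so this yields 1, 2, or 3.
    count = rng.randint(MIN_ITEMS, MAX_ITEMS)
    # Sample without replacement; whether the real test allowed the same
    # line item twice in one order is not stated, so this is an assumption.
    return rng.sample(range(1, LINE_ITEM_POOL_SIZE + 1), count)


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        print(make_order(rng))
```

Passing in the random generator keeps the sketch easy to seed if repeatable orders are wanted.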
Each test was run for 10 minutes and was repeated three times. The average of the three runs was used. The ratio of Reads to Writes was maintained at 10 reads for every write. We debated for a long while about which ratio of reads to writes would best serve the benchmark, and we decided that there was no correct answer. So, we went with 10.
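To make the arithmetic concrete, here is a small Python sketch of a 10:1 read/write mix and of averaging three runs. How the real application enforced the ratio is not described, so the fixed "10 reads, then 1 write" pattern below is only one possible approach, and the names `operation_mix` and `average_of_runs` are hypothetical.

```python
from itertools import cycle, islice
from statistics import mean

READS_PER_WRITE = 10  # ratio stated in the article
RUNS_PER_TEST = 3     # each test was repeated three times


def operation_mix(total_ops):
    """Return a list of 'read'/'write' labels holding a 10:1 ratio."""
    # Assumed scheme: repeat a fixed pattern of 10 reads followed by 1 write.
    pattern = ["read"] * READS_PER_WRITE + ["write"]
    return list(islice(cycle(pattern), total_ops))


def average_of_runs(run_scores):
    """Average the scores of the three runs that make up one test."""
    if len(run_scores) != RUNS_PER_TEST:
        raise ValueError("expected exactly three run scores")
    return mean(run_scores)


if __name__ == "__main__":
    ops = operation_mix(110)
    print(ops.count("read"), ops.count("write"))
    # Placeholder scores for illustration only, not measured results.
    print(average_of_runs([100.0, 110.0, 120.0]))
```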
The application was developed using C#, and all database connectivity was accomplished using ADO.NET and 20 threads - 10 for reading and 10 for inserting.
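The shape of such a client can be sketched in Python, with the caveat that the real application used C# and ADO.NET and its source is not available. The function `run_stress` and the stand-in operations `fake_read` and `fake_write` are hypothetical; in the real test they would correspond to stored procedure calls such as sp_GetOrderAndLineItems and sp_AddOrder. This sketch does not attempt to enforce the 10:1 read/write ratio.

```python
import threading
import time

READER_THREADS = 10  # thread counts stated in the article
WRITER_THREADS = 10


def run_stress(read_op, write_op, duration_s):
    """Run reader and writer threads for duration_s seconds; return op counts."""
    counts = {"read": 0, "write": 0}
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker(kind, op):
        done = 0
        # Each thread loops on its operation until the shared deadline passes.
        while time.monotonic() < deadline:
            op()
            done += 1
        # Merge the per-thread tally once, under a lock.
        with lock:
            counts[kind] += done

    threads = [threading.Thread(target=worker, args=("read", read_op))
               for _ in range(READER_THREADS)]
    threads += [threading.Thread(target=worker, args=("write", write_op))
                for _ in range(WRITER_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


def fake_read():
    time.sleep(0.001)  # stand-in for a database read


def fake_write():
    time.sleep(0.001)  # stand-in for a database insert


if __name__ == "__main__":
    # The article's runs lasted 10 minutes (600 s); 0.5 s keeps the demo short.
    print(run_stress(fake_read, fake_write, duration_s=0.5))
```

Counting operations per thread and merging at the end keeps lock contention out of the timed loop.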
To ensure that IO was not the bottleneck, each test was started with an empty database that had already been expanded, so that auto-grow activity did not occur during the test. Additionally, a gigabit switch was used between the client and the server. During the execution of the tests, no other applications or monitoring software were running on the server. Task Manager, Profiler, and Performance Monitor were used when establishing the baseline for the test, but never during execution of the tests.
At the beginning of testing on each platform, both the server and client workstation were rebooted to ensure a clean and consistent environment. The database was always copied to the 8-disk RAID 0 array with no other files present to ensure that file placement and fragmentation were consistent between runs. Between each of the three tests, the database was deleted, and the empty one was copied again to the clean array. SQL Server was not restarted.
97 Comments
Viditor - Monday, February 14, 2005 - link
"DMA operations initiated by a peripheral device that does not directly support 64-bit addressing will have performance issues"
I'm not sure you are correct in this...I believe the issue is
"physical addresses above 4GB (32 bits) cannot reliably be the source or destination of DMA operations"
I found another article that explains my concern quite well...
http://www.spodesabode.com/content/article/nocona/...
"Unlike the Itanium, which is solely a 64-Bit processor, these chips have the ability to run in both 32-Bit and 64-Bit mode. Some devices, such as a large majority of PCI cards cannot directly access memory above the 4GB point. To solve this, the software has to ensure the physical memory address is below the 4GB point. AMD solved this solution by using a hardware IOMMU, which is effectively a "bounce buffer" or look-up table of physically memory addresses corresponding to a virtual address that is given to the incompatible hardware, allowing it to use memory above the 4GB barrier.
Intels solution isn't quite as elegant. If a device needs to access memory above the 4GB point, the data is just copied from wherever it is, to a fixed location below the 4GB point. This takes time and can reduce performance. In extreme cases we have heard there could be as much as 30-50% decrease in performance on the Nocona platform"
This does not appear to be a 64bit driver issue to me as no mention in any of the access scenarios is described as 32bit...
Accord99 - Monday, February 14, 2005 - link
It's an Intel thing I think, for why they don't have an IOMMU. Even their chipsets for the Itanium 2 don't have one while HP and SGI's chipsets do. Or perhaps Intel just wants (and has the power to force) peripheral manufacturers to make proper 64-bit devices and drivers.
Viditor - Monday, February 14, 2005 - link
OIC what you are saying...and yes, it's a problem with the chipset. Of course that is exactly what I said in the first place..."Because there is still no hardware IOMMU on Xeon chipsets"
The big question is, why hasn't Intel fixed this?
I can only assume that it is a design problem for them that is inherent to EM64T...
I can't imagine that they would just let this slide on their chipset development.
What that problem is, I have no idea...I would just like to see what effect it has on system function.
Accord99 - Monday, February 14, 2005 - link
The linuxhardware article supports what I'm saying: DMA operations initiated by a peripheral device that does not directly support 64-bit addressing will have performance issues. Server-level peripherals typically support 64-bit addressing, and it is not a problem with the CPU or the EM64T instruction set; it is a problem with the chipset. It does not affect the Xeon's ability to address flatly >4GB of memory.
Viditor - Monday, February 14, 2005 - link
I don't believe I do...I have read that post before, and I don't see your point.
Try reading this article to understand what I'm saying:
http://www.linuxhardware.org/article.pl?sid=04/10/...
“Software IOTLB — Intel EM64T does not support an IOMMU in hardware while AMD64 processors do. This means that physical addresses above 4GB (32 bits) cannot reliably be the source or destination of DMA operations. Therefore, the Red Hat Enterprise Linux 3 Update 2 kernel "bounces" all DMA operations to or from physical addresses above 4GB to buffers that the kernel pre-allocated below 4GB at boot time. This is likely to result in lower performance for IO-intensive workloads for Intel EM64T as compared to AMD64 processors.”
"Although this shouldn't affect people that run with under 4GB of memory, this is an important point to note. If you do ever need the extra memory, you may take a performance hit. Unfortunately, we do not have over 4GB of DDR2 memory here today so we will not be able to test how much of a hit you would take if any"
The bottom line is that many believe (including myself) the physical addressing will be a significant problem, and many (including you) don't.
That's why I have requested that AT do an actual test...nothing like reality to settle a discussion...:-)
BTW, thanks for correcting my typo...
Dubb - Monday, February 14, 2005 - link
/taps fingers impatiently waiting on rendering benchmarks...which hopefully include (hint hint)
mental ray
brazil
renderman
Accord99 - Monday, February 14, 2005 - link
Your understanding of the IOMMU is wrong. Please refer to this thread:
http://realworldtech.com/forums/index.cfm?action=d...
Also, the Xeon supports 36-bits.
Viditor - Monday, February 14, 2005 - link
Accord99 - "The IOMMU is only used for peripherals that don't support 64-bit addressing"
The IOMMU is a memory mapping unit sitting between the I/O bus and physical memory. While the memory controller of the Xeon can address 64bit, it uses PAE to do so because current chipsets only address 32 bits. The on-die memory controller for AMD64 chips address 40 bits...
Accord99 - Monday, February 14, 2005 - link
The hardware IOMMU has no impact on the Xeon's ability to flat address >4GB of memory. The IOMMU is only used for peripherals that don't support 64-bit addressing, i.e. USB 2.0 cards, EIDE controllers, soundcards, some network controllers and will reduce IO performance for these devices. High-end 64-bit SCSI controllers, gigabit network controllers and newer SATA controllers all support 64-bit addressing and run at full performance.
Viditor - Monday, February 14, 2005 - link
One other request (if it's possible)...Just so we can get a well-rounded view on the results, is it possible for you to do a Solaris/Oracle and Linux/MySQL (or Linux/Postgres) test?
I realise that I'm asking a lot, but if you have the time...:-)