Oracle VM Review
VMware ESX vs. Oracle VM


This is a very simple post to show the results of some recent testing that Tom and I ran using Oracle SLOB on Violin to determine the impact of using virtualization. But before we get to that, I am duty bound to write a paragraph of text featuring lots of long sentences peppered with industry buzz words. Forgive me, it’s just the way I’m wired.

It is increasingly common these days to find database environments running in virtual machines – even large, business critical ones. The driver is the trend to commoditize I.T. services and build consolidated, private-cloud style solutions in order to control operational expense and increase agility (not to mention reduce exposure to Oracle licenses). But, as I’ve said in previous posts, the catalyst has been the unblocking of I/O as legacy disk systems are replaced by flash memory. In the past, virtual environments caused a kind of I/O blender effect whereby I/O calls become increasingly randomized – and this sucked for the performance of disk drives. Flash memory arrays on the other hand can deliver random I/O all day long because… well, if you don’t know the reasons by now can I just recommend starting at the beginning. The outcome is that many large and medium-sized organisations are now building database-as-a-service platforms with Oracle databases (other database products are available) running in virtual machines. It’s happening right now.

Phew. Anyway, that last paragraph was just a wordy way of telling you that I’m often seeing Oracle running in virtual machines on top of hypervisors. But how much of a performance impact do those hypervisors have? Step this way to find out.

The Contenders

When it comes to running Oracle on a hypervisor using Intel x86 hardware (for that is what I have available), I only know of three real contenders:

  • VMware vSphere (ESXi)
  • Oracle VM (OVM)
  • Microsoft Hyper-V

Hyper-V has been an option for a couple of years now, but I’ll be honest – I have neither the time nor the inclination to test it today. It’s not that I don’t rate it as a product, it’s just that I’ve never used it before and don’t have enough time to learn something new right now. Maybe someday I’ll come back and add it to the mix.

In the meantime, it’s the big showdown: VMware versus Oracle VM. Not that Oracle VM is really in the same league as VMware in terms of market share… but you know, I’m trying to make this sound exciting.

The Test

This is going to be an Oracle SLOB sustained throughput test. In other words, I’m going to build an Oracle database and then shovel a massive amount of I/O through it (you can read all about SLOB here and here). SLOB will be configured to run with 25% of statements being UPDATEs (the remainder are SELECTs) and will run for 8 hours straight. What we want to see is a) which hypervisor configuration allows the greatest I/O bandwidth, and b) which hypervisor configuration exhibits the most predictable performance.
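For reference, those two choices map directly onto SLOB configuration parameters. The snippet below is a minimal, illustrative slob.conf excerpt rather than the full configuration actually used (the complete settings are listed on the SLOB sustained throughput test page mentioned later):

    UPDATE_PCT=25      # 25% of SLOB operations are UPDATEs, the remainder SELECTs
    RUN_TIME=28800     # sustained run of 8 hours (28,800 seconds)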

This is the configuration. First the hardware:

  • 1x Dell PowerEdge R720 server
  • 2x Intel Xeon CPU E5-2690 v2 10-core @ 3.00GHz [so that’s 2 sockets, 20 cores, 40 threads for this server]
  • 128GB DRAM
  • 1x Violin Memory 6616 (SLC) flash memory array [the one that did this]
  • 8Gb fibre-channel connectivity

And the software:

  • Hypervisor: VMware ESXi 5.5.1
  • Hypervisor: Oracle VM for x86 3.3.1
  • VM: Oracle Linux 6 Update 5 (with the Unbreakable Enterprise Kernel Release 3, 3.8.13)
  • Oracle Grid Infrastructure 11.2.0.4 (for Automatic Storage Management)
  • Oracle Database Enterprise Edition 11.2.0.4

Each VM is configured with 20 vCPUs and is using Linux Device Mapper Multipath and Oracle ASMLib. ASM is configured to use a single +DATA diskgroup comprising 8 ASM disks (LUNs from Violin) with external redundancy. The database parameters and SLOB settings are all listed on the SLOB sustained throughput test page.
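To make that storage layout concrete, here is a minimal sketch of how such a diskgroup is created. Note that the disk names below are hypothetical ASMLib labels, not the actual LUN names used in the test:

    -- Hypothetical example: one external redundancy diskgroup over 8 ASMLib disks
    CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
      DISK 'ORCL:DATA01', 'ORCL:DATA02', 'ORCL:DATA03', 'ORCL:DATA04',
           'ORCL:DATA05', 'ORCL:DATA06', 'ORCL:DATA07', 'ORCL:DATA08';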

Results: Bare Metal (Baseline)

First let’s see what happens when we don’t use a hypervisor at all and just run OL6.5 on bare metal:

IO Profile             Read+Write/Second   Read/Second   Write/Second
Total Requests                 232,431.0     194,452.3       37,978.7
DB Requests                    228,909.4     194,447.9       34,461.5
Optimized Requests                   0.0           0.0            0.0
Redo Requests                    3,515.1           0.3        3,514.8
Total (MB)                       1,839.6       1,519.2          320.4

Ok so we’re looking at 1519 MB/sec of read throughput and 320 MB/sec of write throughput. Crucially, throughput was nice and consistent over the whole eight-hour run, with very little deviation from the mean. By dividing the total time spent waiting on db file sequential read (i.e. random physical reads) by the number of waits, we can calculate that the average latency for random reads was 438 microseconds.
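If you want to pull the same figures yourself, the raw inputs for that calculation are available in V$SYSTEM_EVENT (or, for a specific test window, from the deltas between two AWR snapshots). A rough sketch:

    -- Average random read latency in microseconds (cumulative since instance startup;
    -- for a test window, diff the values between two AWR snapshots instead)
    SELECT event,
           total_waits,
           ROUND(time_waited_micro / total_waits) AS avg_wait_us
    FROM   v$system_event
    WHERE  event = 'db file sequential read';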

Results: VMware vSphere

VMware is configured to use Raw Device Mapping (RDM) which essentially gives the benefits of raw devices… read here for more details on that. Here are the test results:

IO Profile             Read+Write/Second   Read/Second   Write/Second
Total Requests                 173,141.7     145,066.8       28,075.0
DB Requests                    170,615.3     145,064.0       25,551.4
Optimized Requests                   0.0           0.0            0.0
Redo Requests                    2,522.8           0.1        2,522.7
Total (MB)                       1,370.0       1,133.4          236.7

Average read throughput for this test was 1133 MB/sec and write throughput averaged 237 MB/sec. Average read latency was 596 microseconds, an increase of 36% over the 438 microsecond bare metal baseline.

In comparison to the bare metal test, we see that total bandwidth dropped by around 25% (from 1,839.6 MB/sec to 1,370.0 MB/sec). That might seem like a lot but remember, we are absolutely hammering this system. A real database is unlikely to ever create this level of sustained I/O. In my role at Violin I’ve been privileged to work on some of the busiest databases in Europe – nothing is ever this crazy (although a few do come close).

Results: Oracle VM

Oracle VM is based on the Xen hypervisor and therefore uses Xen virtual disks to present block devices. For this test I downloaded the Oracle Linux 6 Update 5 template from Oracle’s eDelivery site. You can see more about the way this VM was configured here. Here are the test results:

IO Profile             Read+Write/Second   Read/Second   Write/Second
Total Requests                 160,563.8     134,592.9       25,970.9
DB Requests                    158,538.1     134,587.3       23,950.8
Optimized Requests                   0.0           0.0            0.0
Redo Requests                    2,017.2           0.2        2,016.9
Total (MB)                       1,273.4       1,051.6          221.9

This time we see average read bandwidth of 1052 MB/sec and average write bandwidth of 222 MB/sec, with the average read latency at 607 microseconds, which is 39% higher than the 438 microsecond baseline.

Meanwhile, total bandwidth dropped by 31% (from 1,839.6 MB/sec to 1,273.4 MB/sec). That’s slightly worse than VMware, but what’s really interesting is the deviation: the OVM test exhibited a much higher degree of variance in throughput over the course of the run than the VMware test did.
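For context on how those Xen virtual disks are presented: in OVM the mapping lives in the guest’s vm.cfg, and a LUN passed through with the phy: backend appears inside the VM as a /dev/xvd* device. The excerpt below is purely illustrative – the repository path and multipath device names are made up, not taken from this test:

    # Hypothetical vm.cfg excerpt: OS image as a virtual disk file, database LUNs
    # passed through as physical (phy:) devices, visible in the guest as /dev/xvd*
    disk = ['file:/OVS/Repositories/MyRepo/VirtualDisks/ol65_os.img,xvda,w',
            'phy:/dev/mapper/mpathb,xvdb,w',
            'phy:/dev/mapper/mpathc,xvdc,w',
            'phy:/dev/mapper/mpathd,xvdd,w']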

Conclusion

This is only one test so I’m not claiming it’s conclusive. VMware does appear to deliver slightly better performance than OVM in my tests, but it’s not a huge difference. However, I am very much concerned by the variance of the OVM test in comparison to VMware. Look, for example, at the wait event histograms for db file sequential read:

Wait Event Histogram
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> % of Waits: value of .0 indicates value was <.05%; value of null is truly 0
-> % of Waits: column heading of <=1s is truly <1024ms, >1s is truly >=1024ms
-> Ordered by Event (idle events last)

                                                        % of Waits
Hypervisor   Event                     Total Waits   <1ms  <2ms  <4ms  <8ms  <16ms  <32ms  <=1s  >1s
Bare Metal   db file sequential read         5557    98.7   1.3   0.0   0.0    0.0    0.0
VMware ESX   db file sequential read         4164    92.2   6.7   1.1   0.0    0.0    0.0
Oracle VM    db file sequential read         3834    95.6   4.1   0.1   0.1    0.0    0.0   0.0   0.0

The OVM test shows occasional results in the two highest buckets, meaning once or twice there were waits in excess of 1 second! However, to be fair, OVM also had a higher proportion of sub-millisecond waits than VMware.
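These histograms come straight out of the AWR reports, but the same buckets can also be queried from DBA_HIST_EVENT_HISTOGRAM if you want to narrow in on a particular part of the run. A rough sketch (the snapshot range shown is hypothetical):

    -- Wait counts per latency bucket for db file sequential read, per AWR snapshot.
    -- Counts are cumulative since instance startup, so diff consecutive snapshots
    -- to get per-interval figures.
    SELECT snap_id,
           wait_time_milli,   -- upper bound of the latency bucket in ms (1, 2, 4, 8, ...)
           wait_count
    FROM   dba_hist_event_histogram
    WHERE  event_name = 'db file sequential read'
    AND    snap_id BETWEEN 100 AND 110
    ORDER  BY snap_id, wait_time_milli;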

Anyway, for now – and for this setup at least – I’m sticking with VMware. You should of course test your own workloads before choosing which hypervisor works for you…

Thanks as always to Kevin for bringing Oracle SLOB to the community.

Disclosure: My company has a business relationship with this vendor other than being a customer: I work for Violin Memory

4 Comments

it_user90075 (Vendor)

Why did you use RDMs for the VMware test but virtual disks for OVM? Oracle makes it very clear in all its literature that raw LUNs should be used with ODB on OVM; in fact they are required for production support in the case of RAC. Raw disks are as easy to use as virtual disks in OVM – you just add the physical disks/LUNs in the same place you would add virtual disks.

19 May 15
flashdba (Vendor)

Maybe the language I used to describe that could have been better. I did use raw devices, it's just that when you view raw devices in the VM they appear as /dev/xvd* devices, which stands for Xen Virtual Device. However, they are the OVM equivalent of RDMs in VMware.

23 May 15
it_user244857 (Vendor)

Nicely done - I was also curious why you would introduce a variable as Wayne pointed out.

However, I see you were being very literal in your use of the term "Xen virtual disk". I also read your explanation of how you implemented the storage elements, your detailed description of information returned from VPD queries and use of I/O scheduler.

Not knowing much about VMware, I would really like to see the same in-depth explanation and analysis of how you configured the raw disks for the test. Do you have a link to something similar explaining how you configured the storage for VMware as you did for Oracle VM?

I followed the links to the VMware documentation you provided, but I'm really looking for the same type of information you presented in your own words (it is quite good, by the way).

26 May 15
Jari Raatikainen (Real User)

On OVM, you just need a few tunable parameters on a PVM virtual machine to make it almost as fast as bare metal. Perhaps someday I'll share those with the world for free.

04 May 16