KVM: Optimizing performance on virtual machines (VMs)

After having set up quite a few VMs in my career, I have picked up a couple of tips on how to get the most power out of your VMs:

Get new/correct drivers for your VMs
Make sure that you have all the correct drivers; this matters most for I/O (disk) and network devices. Windows has many devices that will work with basic Microsoft drivers, but that does not mean that the performance will be good. After making sure that the correct drivers were in place, I went from ~900 Mbit/s to 9.9 Gbit/s on a 10 Gbit network between a Linux (Red Hat) and a Windows 2k8 server. (Tested with iperf, which exists for both Windows and Linux.)

Turn off Power saving options in BIOS / Hardware
More or less all servers, whether they are brick servers, home servers or blade servers, have BIOS settings that enable or disable power saving mode. I know from experience that at least all HP blades come with power saving enabled by default. Turn this off to make sure that your VMs get the performance they expect. (I have had VMs simply be sluggish with this feature turned on; turning it off brought CPU performance back to normal.)

Turn off CPU throttling on the VM host machine
I have also had issues with slow VMs even after the power options were fixed in the BIOS. I then realized that some Linux distributions (Ubuntu) have a default CPU frequency governor that throttles down the CPU when it is not needed. After making sure that the host did NOT do this, the VMs finally started acting as they should. Check your Linux distribution's guides on how to change this.
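A quick way to see what the host is currently doing is to read the governor of each core from sysfs. This is a minimal sketch, assuming a Linux host that exposes the cpufreq interface under /sys/devices/system/cpu; the function name list_cpu_governors and its optional directory argument are made up for this example.

```shell
# Print "cpuN governor" for every CPU core that exposes cpufreq.
# The optional first argument is the sysfs base directory (an assumption
# added for this sketch so the function can be pointed somewhere else).
list_cpu_governors() {
    base="${1:-/sys/devices/system/cpu}"
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue    # no cpufreq support on this machine
        cpu=$(basename "$(dirname "$(dirname "$f")")")
        printf '%s %s\n' "$cpu" "$(cat "$f")"
    done
}

list_cpu_governors

# To change a core by hand (as root), write a governor name to the same file:
#   echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```

If the output shows something like "ondemand" or "powersave" on a VM host, that is the throttling described above. A change made by writing to sysfs is lost on reboot, so the distribution's own tooling is still the place to make it permanent.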

 

Any other tips I should add to the list? Feel free to add a comment below! 🙂

HP iLO: Controller firmware revision 2.07.00 Embedded media manager failed initialization

I just received a notice during boot of an HP G8 server regarding iLO issues: “iLO 4 has detected a self test error. For details consult the iLO 4 server and iLO 4 diagnostics page”.

I went into the iLO diagnostics page (in the Blade Center admin) and found the error message “Controller firmware revision 2.07.00 Embedded media manager failed initialization”. Google does not give much info about this, so it is a bit hard to see what is going on. I tried to restart both the blade and the iLO itself, but the error did not go away.

I had to contact HP regarding this, and the conclusion (after having to try both a firmware upgrade and a firmware reinstall) was that it was a hardware error on the iLO storage, so HP had to replace the whole motherboard for that server.

Xerox ColorQube error code 001-535

New round of problems with a Xerox ColorQube; this time I was greeted with error code “001-535”. This “translates” to problems with the power supply. It was not resolved until a technician came on site with a new power supply to replace the old one.

Xerox-printer giving error code 016-799

I just figured out that some Xerox printers simply return the error code 016-799 when they get a job with an unknown paper format or other strange settings.

The only solution I have found for this so far is to make sure that the print jobs actually have a paper format that the printer supports. (And of course, no other strange settings…)

PDF files, for example, can in some cases have print settings embedded, so it is extra smart to check those before printing them.

The effects of old NIC drivers on Data Protector

During the Data Protector migration we noticed that the performance on the new Data Protector Cell Manager was a lot lower than the performance on the old solution. Speeds of around 100 Mbit/s were the best we could ever get from it, although the machine is on a gigabit network.

To make the case even stranger, performance on other network tasks was quite normal, with the gigabit connection maxed out right away.

After some digging, the problem turned out to be that the server was running the old default drivers for the NIC (Broadcom BCM5708S NetXtreme II GigE). Windows Update has become better at delivering driver updates in the latest Windows versions, but that was not the case this time.

After installing the newest drivers from Broadcom's website, the performance jumped straight to gigabit, even on Data Protector.

Expanding a KVM disk image

I had to expand a KVM virtual machine today. Luckily, that's pretty straightforward. You simply create a new disk image with the extra size needed, append it to the original disk and voila. Then you just need to extend the partition into the extra space and you are good to go. (Appending like this only works for raw disk images.)

How-to:

1: Halt your virtual machine.

You need to stop your virtual machine before going wild with the drive: virsh shutdown <vm name>, or virsh destroy <vm name> if it somehow won't stop.

2: Create a disk with the extra space needed:

qemu-img create -f raw 5gig.img 5G

3: Merge it into the disk you are working with

cat 5gig.img >> yourdisk.img

4: Boot up and partition your drive.

Then start up your virtual machine again with virsh start <vm name>. If you use Windows Server, all you need to do is open Disk Management, right-click the drive that is short on free space and choose “Extend Volume”. The job takes seconds and does not require a reboot.
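For raw images, steps 2 and 3 can also be done in one go by growing the file in place instead of creating and appending a second image. The sketch below uses truncate from GNU coreutils, which is a different technique from the cat approach above but gives the same result on a raw image; grow_raw_image is a made-up helper name, and the demonstration runs on a throwaway file rather than a real disk. (qemu-img resize with a +size argument is another way to do the same job.)

```shell
# Grow a raw disk image in place by extending the file with zero bytes.
# Only valid for raw images, and the VM must be shut down first.
# "grow_raw_image" is a hypothetical helper name for this sketch.
grow_raw_image() {
    image="$1"    # path to the raw image, e.g. yourdisk.img
    extra="$2"    # amount to add, e.g. 5G
    [ -f "$image" ] || { echo "no such image: $image" >&2; return 1; }
    truncate -s "+$extra" "$image"
}

# Demonstration on a temporary file instead of a real disk image.
demo=$(mktemp)
truncate -s 1M "$demo"        # pretend this is a 1M disk
grow_raw_image "$demo" 1M     # add another 1M
wc -c < "$demo"               # size in bytes after growing
rm -f "$demo"
```

On a real image the call would be grow_raw_image yourdisk.img 5G, followed by step 4 as usual. Take a copy of the image first if the data matters.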

Cannot open exchanger control device ([2] The system cannot find the file specified)

You will receive this error from (among others) Data Protector if the D2D/tape unit has gone offline or is not reachable from the machine controlling it, which is normally the Data Protector Cell Manager. In Windows Server 2008 you can check that Windows is connected correctly in the iSCSI Initiator, which you can find in the Control Panel.

Tuning Ubuntu mdadm RAID5/6

If you are using mdadm RAID 5 or 6 with Ubuntu, you might notice that the performance is not always great. The reason is that Ubuntu's default tuning settings are rather modest. Luckily, they are easy to tune. In this article I will increase some settings until the read and write performance of my RAID 6 has improved a lot.

My setup:
CPU: Intel(R) Core(TM)2 Quad CPU Q9300
RAM: 16G
Drives: 11 drives in one RAID6, split over two cheap PCI-E x4 controllers and the motherboard's internal controller.

I will test my system between each tuning step by using dd for read and write testing. Since I have a nice amount of RAM available, I will use a test file of 36G (bs=16k). Between each test (both read and write), I clear the OS disk cache with the command:

sync;echo 3 > /proc/sys/vm/drop_caches
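The exact dd invocations are not shown in this article, so the following is only a sketch of what such a test could look like. The function names, the conv=fdatasync flag (GNU dd) and the path /mnt/raid/testfile are assumptions for the example; only the 16k block size and the 36G file size come from the text above.

```shell
# Write test: stream zeros into a file in 16k blocks.
# conv=fdatasync (GNU dd) flushes to disk before dd reports its speed.
dd_write_test() {
    dd if=/dev/zero of="$1" bs=16k count="$2" conv=fdatasync
}

# Read test: stream the file back and throw the data away.
dd_read_test() {
    dd if="$1" of=/dev/null bs=16k
}

# A 36G file at bs=16k is 36 * 1024 * 1024 / 16 = 2359296 blocks:
#   dd_write_test /mnt/raid/testfile 2359296
#   sync; echo 3 > /proc/sys/vm/drop_caches    # as root, between tests
#   dd_read_test /mnt/raid/testfile

# Small demonstration run (64 blocks = 1 MiB) on a temporary file:
tmpfile=$(mktemp)
dd_write_test "$tmpfile" 64
dd_read_test "$tmpfile"
rm -f "$tmpfile"
```

dd prints the throughput on its last line of output. The test file should be clearly larger than the RAM in the machine, which is why 36G is used on a 16G system.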

Tuning stripe_cache_size

stripe_cache_size controls how much RAM md uses for caching stripes, which mainly affects write performance. Ubuntu's default value is 256; you can check your value with:

cat /sys/block/md0/md/stripe_cache_size

And change it with:

echo *number* > /sys/block/md0/md/stripe_cache_size

Test results with stripe_cache_size=256
- Write performance: 174 MB/s

Not too good, so I increased it in steps. The result for each step is listed below:

Test results with stripe_cache_size=512
- Write performance: 212 MB/s

Test results with stripe_cache_size=1024
- Write performance: 237 MB/s

Test results with stripe_cache_size=2048
- Write performance: 254 MB/s

Test results with stripe_cache_size=4096
- Write performance: 295 MB/s

Test results with stripe_cache_size=8192
- Write performance: 362 MB/s

Test results with stripe_cache_size=16384
- Write performance: 293 MB/s

Test results with stripe_cache_size=32768
- Write performance: 326 MB/s

So going from 256 to 8192 roughly doubled my write performance (from 174 to 362 MB/s), while the two larger values were slower again. Not bad! 🙂
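One thing to keep in mind when raising stripe_cache_size is that the cache costs RAM. According to the Linux kernel md documentation, the memory used is page size x number of disks x stripe_cache_size. The helper below is a small sketch of that arithmetic; the function name is made up and a 4 KiB page size is assumed unless another value is passed.

```shell
# Estimate the RAM used by the md stripe cache, in MiB.
# Formula (from the Linux kernel md documentation):
#   memory = page_size * number_of_disks * stripe_cache_size
stripe_cache_mib() {
    entries="$1"              # value written to stripe_cache_size
    disks="$2"                # number of member drives in the array
    page_size="${3:-4096}"    # assumed 4 KiB; check with: getconf PAGESIZE
    echo $(( entries * disks * page_size / 1024 / 1024 ))
}

stripe_cache_mib 256 11     # default value on the 11-drive array
stripe_cache_mib 8192 11    # the best-performing value above
```

For the 11-drive array in this article that works out to 11 MiB at the default and 352 MiB at 8192, which is no problem with 16G of RAM but worth checking on a smaller box.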

Tuning Read Ahead

Time to adjust read ahead, which should impact read performance. The default read ahead value is “1536”, and you can change it with the command:

blockdev --setra *number* /dev/md0

Test results with Read Ahead @ 1536
- Read performance: 717 MB/s

Test results with Read Ahead @ 4096
- Read performance: 746 MB/s

Test results with Read Ahead @ 32768
- Read performance: 731 MB/s

Test results with Read Ahead @ 262144
- Read performance: 697 MB/s

Test results with Read Ahead @ 524288
- Read performance: 630 MB/s

So, as opposed to the write performance tuning, most of the settings actually made this worse. 4096 is the best for my system.
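The numbers given to blockdev --setra are counted in 512-byte sectors, not in bytes or kilobytes, which is easy to forget when comparing values. The small helper below (a made-up name, just for illustration) converts a setting to KiB.

```shell
# Convert a read ahead value (in 512-byte sectors, as used by
# blockdev --setra / --getra) to KiB.
readahead_kib() {
    echo $(( $1 * 512 / 1024 ))
}

readahead_kib 1536    # the default in this article
readahead_kib 4096    # the value that worked best here

# The current value of the array can be read with:
#   blockdev --getra /dev/md0
```

So the default of 1536 corresponds to 768 KiB of read ahead, and the best value of 4096 to 2 MiB.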

In conclusion

This is just an example of how different settings can have a rather large impact on a system, both for the better and for the worse. If you are going to tune your system, you have to test different settings yourself and see what works best for your setup. Higher values do not automatically mean better results. I ended up with “stripe_cache_size=8192” and “Read Ahead @ 4096” for my system.

If you want to make sure that your changes are kept when rebooting the system, remember to add these commands (with your values) to /etc/rc.local.
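As a sketch of what that could look like, the helper below prints the two lines for a given array and pair of values. The function name print_md_tuning is made up for this example, and md0 with 8192/4096 are simply the values from this article; use your own.

```shell
# Print the tuning commands to place in /etc/rc.local
# (above the final "exit 0" line, if your rc.local has one).
# "print_md_tuning" is a hypothetical helper name for this sketch.
print_md_tuning() {
    dev="$1"          # md device name, e.g. md0
    cache="$2"        # stripe_cache_size value
    readahead="$3"    # read ahead in 512-byte sectors
    cat <<EOF
echo $cache > /sys/block/$dev/md/stripe_cache_size
blockdev --setra $readahead /dev/$dev
EOF
}

print_md_tuning md0 8192 4096
```

The function only prints the lines, it does not change anything on the system, so the output can be reviewed before it is pasted into /etc/rc.local.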

Two Crucial C300 in RAID0 on an M4A87TD motherboard cause BSODs and hangs in Win7

So I have tried to use two Crucial C300 drives in RAID0 on an M4A87TD motherboard for a while now. After a while I started experiencing BSODs, hangs, crashes, freezes etc. I tried to debug -everything- until I came across multiple complaints about the same thing on the Crucial forums. (Not exactly the same, but close enough to check it out.)

I decided to simply remove the RAID and only use one drive for Windows 7. So far it appears to be very stable again, so hopefully the issues were related to the combination of two SSDs in RAID0 on an AMD RAID controller.

Time will tell.

Under 4 months left before we run out of IPv4

The time to start implementing IPv6 is getting closer and closer. The “counters” for IPv4 now say that there are just under four months left before the “apocalypse” arrives. It will of course not be a noticeable problem for end users, but ISPs should now really start planning for IPv6 if they haven’t done so yet.

HP R5500 XR UPS: Low battery warning light

I have noticed that the manual for the HP R5500 XR UPS does not explain all possible solutions when you want to get rid of the low battery warning light, so here is a quick how-to for resolving the issue on your UPS.

  1. Start off by checking the software for the UPS, or install it if it is missing (HP Power Manager).
  2. Consider resetting the battery fuse (switch it off and on again). It is the fuse at the back, up to the right among the outlets. (Leave it off for at least 15 seconds.)*
  3. If you still don't have any luck, replace your battery/batteries.
  4. If you are using ERMs, make sure that the UPS knows about them. (Read the manual.)

PS: Remember to have redundant power set up, since you might have to give the whole UPS a little kick if it's acting very silly 🙂

* = When turning off the battery fuse, the outlets will still get power from your input power, it just disables the batteries. When turning it on again, the UPS will start a recharge of them.

Cannot load exchanger medium (Target drive is busy.)

If you receive the error message:

Cannot load exchanger medium (Target drive is busy.)

from Data Protector Manager (or from the backup reports), you have a tape “stuck” in your tape station. This can happen, for example, if the tape station loses its connection to the software using it (for example after a crash on the server) and the software is not set up to automatically correct the issue itself.

If you want to eject the tape in use, you will normally find an option for it on the tape station itself. A good idea, though, is to set up Data Protector to automatically eject a tape that is occupying a drive.

1) Start Data Protector Manager GUI
2) Select Devices and Media
3) Right-click on the library, select Properties
4) Go to the Control tab and select “Eject medium” under the busy drive handling topic