Tuning Ubuntu mdadm RAID5/6

If you are using mdadm RAID 5 or 6 with Ubuntu, you might notice that the performance is not always great. The reason is that Ubuntu's default tuning settings are rather modest. Luckily, they are easy to tune. In this article I will increase some settings until the read and write performance of my RAID 6 has improved considerably.

My setup:
CPU: Intel(R) Core(TM)2 Quad CPU Q9300
RAM: 16G
Drives: 11 drives in one RAID6, split over two cheap PCI-E x4 controllers and the motherboard's internal controller.

I will test my system between each tuning step, using dd for read and write testing. Since I have a fair amount of RAM available, I will use a 36G test file (bs=16k), well above the 16G of RAM, so the file cannot simply be served from cache. Between each test (both read and write), I clear the OS disk cache with the command:

sync;echo 3 > /proc/sys/vm/drop_caches
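The exact dd invocations are not shown above, so the following is only a sketch of how such a test could look. The test file path, the use of /dev/zero as the data source, and conv=fsync are my assumptions; only the 36G size, bs=16k and the cache-dropping command come from the text.

```shell
#!/bin/sh
# Sketch of a dd write/read test. TEST_FILE is a hypothetical path on the array.
TEST_FILE="${TEST_FILE:-/mnt/raid/ddtest.bin}"

# Number of dd blocks needed for a file of $1 GiB with a block size of $2 KiB.
dd_block_count() {
    size_gib=$1
    bs_kib=$2
    echo $(( size_gib * 1024 * 1024 / bs_kib ))
}

# Flush dirty pages, then drop the page cache (needs root).
drop_caches() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}

# Dry run: print the write and read commands instead of executing them.
# Run them by hand as root, calling drop_caches before each one.
print_benchmark() {
    count=$(dd_block_count 36 16)
    echo "dd if=/dev/zero of=$TEST_FILE bs=16k count=$count conv=fsync"
    echo "dd if=$TEST_FILE of=/dev/null bs=16k"
}

print_benchmark
```

dd prints the throughput itself when it finishes, which is the kind of MB/s number reported in the results below.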

Tuning stripe_cache_size

stripe_cache_size sets the size of the stripe cache, the RAM that the RAID 5/6 driver uses when writing data. Ubuntu's default value is 256; you can verify your value with:

cat /sys/block/md0/md/stripe_cache_size

And changing it with:

echo *number* > /sys/block/md0/md/stripe_cache_size
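Before raising the value it can help to know what it costs in RAM. As far as I know, the commonly cited estimate from the Linux md documentation is page size × number of member disks × stripe_cache_size. The sketch below applies that estimate, assuming 4 KiB pages; the function name is just for illustration.

```shell
# Estimate the RAM used by the stripe cache, in MiB.
# Estimate: memory = page_size * nr_disks * stripe_cache_size
stripe_cache_mib() {
    entries=$1                 # value written to stripe_cache_size
    disks=$2                   # number of member devices in the array
    page_bytes=${3:-4096}      # assumed 4 KiB pages; check with: getconf PAGESIZE
    echo $(( entries * disks * page_bytes / 1024 / 1024 ))
}

stripe_cache_mib 256 11      # default value on an 11-drive array: 11
stripe_cache_mib 8192 11     # 352
stripe_cache_mib 32768 11    # 1408
```

So on an 11-drive array the largest value tested here would take well over 1 GiB of RAM away from the page cache.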

Test results with stripe_cache_size=256
– Write performance: 174 MB/s

Not too good, so I increased it in steps. The result for each step is listed below:

Test results with stripe_cache_size=512
– Write performance: 212 MB/s

Test results with stripe_cache_size=1024
– Write performance: 237 MB/s

Test results with stripe_cache_size=2048
– Write performance: 254 MB/s

Test results with stripe_cache_size=4096
– Write performance: 295 MB/s

Test results with stripe_cache_size=8192
– Write performance: 362 MB/s

Test results with stripe_cache_size=16384
– Write performance: 293 MB/s

Test results with stripe_cache_size=32768
– Write performance: 326 MB/s

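So, going from 256 to 8192 roughly doubled my write performance (174 MB/s to 362 MB/s), not bad! 🙂 Note that the two largest values, 16384 and 32768, were slower again.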
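With this many measurements it is easy to misread the list, so here is a small sketch that picks the best line from "setting MB/s" pairs. The helper name is made up; the numbers are the write results from above.

```shell
# Print the "setting throughput" line with the highest throughput.
# Input: one "setting MB/s" pair per line on stdin.
best_result() {
    awk 'NR == 1 || $2 + 0 > best { best = $2 + 0; line = $0 } END { print line }'
}

best_result <<'EOF'
256 174
512 212
1024 237
2048 254
4096 295
8192 362
16384 293
32768 326
EOF
```

For the write results this prints "8192 362".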

Tuning Read Ahead

Time to adjust read ahead, which should affect read performance. The default read ahead value is “1536” (blockdev counts in 512-byte sectors), and you can change it with the command:

blockdev --setra *number* /dev/md0
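Since blockdev --setra (and --getra, which reads the current value) work in 512-byte sectors, the numbers are easier to judge when converted to KiB. A minimal sketch, with a function name of my own choosing:

```shell
# Convert a read-ahead value in 512-byte sectors to KiB.
ra_sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

ra_sectors_to_kib 1536      # the default: 768 KiB
ra_sectors_to_kib 4096      # 2048 KiB, i.e. 2 MiB
ra_sectors_to_kib 524288    # 262144 KiB, i.e. 256 MiB
```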

Test results with Read Ahead @ 1536
– Read performance: 717 MB/s

Test results with Read Ahead @ 4096
– Read performance: 746 MB/s

Test results with Read Ahead @ 32768
– Read performance: 731 MB/s

Test results with Read Ahead @ 262144
– Read performance: 697 MB/s

Test results with Read Ahead @ 524288
– Read performance: 630 MB/s

So, unlike the write performance tuning, most of these settings actually made things worse. 4096 is the best value for my system.

In conclusion

This is just an example of how different settings can have a rather large impact on a system, both for better and for worse. If you are going to tune your system, you have to test different settings yourself and see what works best for your setup. Higher values do not automatically mean better results. I ended up with “stripe_cache_size=8192” and “Read Ahead @ 4096” for my system.

If you want your changes to survive a reboot, remember to add these commands (with your values) to /etc/rc.local.
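As a sketch, this helper prints the two lines for a given array so they can be pasted into /etc/rc.local. The helper name is made up, and I am assuming your rc.local ends with "exit 0", in which case the lines must go above it.

```shell
# Print the two tuning commands for an array, ready to paste into /etc/rc.local.
# Arguments: md device name, stripe_cache_size, read ahead (512-byte sectors).
print_rc_local_lines() {
    md=$1
    stripe=$2
    ra=$3
    echo "echo $stripe > /sys/block/$md/md/stripe_cache_size"
    echo "blockdev --setra $ra /dev/$md"
}

# The values I ended up with for my system:
print_rc_local_lines md0 8192 4096
```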

Guide: Solr performance tuning

Introduction

I have been working a lot with the Solr search engine over the last year, figuring out how to get the best performance from a Solr instance. It is almost funny how much impact the little things have on search performance. In this article I describe the points I have noticed myself that can be worked on to get just a little more juice out of Solr. (If you are tuning Solr yourself, remember to also look at the Solr Wiki for some extra hints.)

 

Test data

For this article, I will be using the Wikipedia database as test content. I have downloaded a version from Wikimedia that only contains the current version of all English articles. (download link)

I have generated XML documents from 633654 pages to get a decent amount of test data. (This has given me a Solr index of 6.7 GB.) I then collected 25000 random words from those pages to use as search terms. I will run each search 5 times, for a total of 125 000 search queries against Solr. Solr will be restarted before each test so that every test starts from the same cache state. When tests are performed on the same disk device, the OS disk cache is also cleared from RAM to get a fair test each time. (sync;echo 3 > /proc/sys/vm/drop_caches)
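The test script itself is not part of this article, so the following is only a rough sketch of how such a suite could be driven from the shell. The words file name, the /solr/select URL and running the word list in 5 consecutive passes are my assumptions.

```shell
# Hypothetical settings; adjust them to your own setup.
SOLR_URL="${SOLR_URL:-http://solrserver:8983/solr/select}"
WORDS_FILE="${WORDS_FILE:-words.txt}"   # one plain search term per line
PASSES=5

# Build the request URL for a single plain search term (no URL-encoding done).
solr_query_url() {
    echo "$SOLR_URL?q=$1"
}

# Send every term PASSES times and throw away the responses.
run_suite() {
    pass=1
    while [ "$pass" -le "$PASSES" ]; do
        while IFS= read -r word; do
            curl -s -o /dev/null "$(solr_query_url "$word")"
        done < "$WORDS_FILE"
        pass=$(( pass + 1 ))
    done
}

# Total number of queries for a word list with $1 entries.
total_queries() {
    echo $(( $1 * PASSES ))
}

total_queries 25000    # 125000
```

Running "time run_suite" would then give a wall-clock duration in the same format as the timings reported below.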

The search queries will be simple search terms without any wildcards etc., since I will discuss wildcards and the use of ngrams in Solr in more detail in a later article.

My test server

My test server is an Ubuntu Linux machine with a single SATA drive for the OS and a RAID6 spanning 11 SATA disks. The server started out with a single dual core AMD FX-62 processor and 8 GB of RAM; this was later replaced with a quad core Q9300 CPU and 16 GB of RAM. The hardware changes are described in the hardware chapter, to show how Solr actually responds to relatively small hardware changes.

My schema.xml

The test schema uses all the default field types and has the following data fields.

<fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="text_rev" type="text_rev" indexed="true" stored="false" multiValued="true"/>
</fields>

Tuning part one: Hardware

Starting out
I started Solr on a single drive at first, without tweaking anything. The time taken to run all the queries was 168m43.705s. A quick calculation gives 12 queries per second. I suspect it is possible to speed that up a lot, so let's try moving our Solr instance over to our RAID setup.
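The quick calculation is just the 125 000 queries divided by the elapsed seconds. Here is a small sketch that does it straight from a time-style duration; the function name is my own, and the result is truncated to whole queries per second.

```shell
# Convert a `time`-style duration such as "168m43.705s" into whole
# queries per second for a given number of queries.
qps() {
    queries=$1
    elapsed=$2
    echo "$elapsed" | awk -v q="$queries" -F'[ms]' \
        '{ printf "%d\n", q / ($1 * 60 + $2) }'
}

qps 125000 168m43.705s    # 12
```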

RAID performance
I moved the Solr instance over to the RAID set and ran the same test again. This gave a performance gain of just over 100%: the whole test now took only 82m51.199s, which equals 25 queries per second. But there is no need to stop here; time to try a hardware upgrade before turning to software tweaking.

Time for more juice!
Time to try one last hardware change. I replaced the dual core AMD CPU and its 8GB of RAM with a quad core Intel CPU and 16GB of RAM. This time the test took only 20m41.202s, a massive improvement from the first 168 minutes. We have now actually reached 100 queries per second, and that is even before tuning Solr itself.

Tuning part two: Tuning the solr cache (in solrconfig.xml)

I have now tuned as much as I can with the hardware I have available, so the next step is to look at our solrconfig.xml, which has not been touched yet. I will focus on the caches that you can track via the statistics page of Solr admin (http://solrserver:8983/solr/admin/stats.jsp#cache). Each cache listed there shows its size, the number of elements inserted, and the number of elements evicted to make room for new ones. If you have many evictions, you should look into increasing the size of that cache so all elements can fit (but don't overdo it; adjust it and see what fits your setup). Likewise, it is a good idea to decrease the size of caches that have a lot of unused slots. A goal should be to get the hit ratio as close to 1.00 as possible (1.00 being a 100% hit ratio).
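The hit ratio on the stats page is simply hits divided by lookups. If you want to work it out yourself from the two counters, a minimal sketch could look like this (the function name and the example numbers are made up):

```shell
# Hit ratio = hits / lookups, printed with two decimals.
# Prints 0.00 when there have been no lookups yet.
hit_ratio() {
    awk -v h="$1" -v l="$2" \
        'BEGIN { if (l == 0) print "0.00"; else printf "%.2f\n", h / l }'
}

hit_ratio 900 1000    # 0.90
```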

For my setup, with simple search queries and no use of filters, I only have two caches that I need to adjust: queryResultCache and documentCache.

queryResultCache
The queryResultCache is used to store ordered sets of document IDs. After running the test suite I noticed that the number of evictions had already reached several thousand. So I started by adjusting it to 122880 (both size and initialSize), quite an increase from the default 512. This cache does not have many lookups compared to inserts, but the change still brought the test suite down to 17m17.550s. (120 queries per second)

documentCache
The document cache had over 2800000 cache inserts with a default of only 512 slots; that won't do for long. So I increased it from 512 to 2900000. This brought the test suite down a bit more, to 16m15.414s (128 queries per second).
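For reference, here is a sketch that prints what the two cache elements in solrconfig.xml could look like with these sizes. The class="solr.LRUCache" and autowarmCount="0" attributes are assumptions on my part (keep whatever your own config already uses), and so is setting initialSize equal to size for the documentCache.

```shell
# Print a solrconfig.xml cache element.
# Arguments: element name, size, optional initialSize (defaults to size).
# class and autowarmCount are assumed values, not taken from my config.
cache_element() {
    name=$1
    size=$2
    initial=${3:-$2}
    printf '<%s class="solr.LRUCache" size="%s" initialSize="%s" autowarmCount="0"/>\n' \
        "$name" "$size" "$initial"
}

cache_element queryResultCache 122880
cache_element documentCache 2900000
```

The printed lines replace the existing queryResultCache and documentCache elements in the query section of solrconfig.xml; Solr has to be restarted to pick them up.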

Other
Solr has a couple of other cache settings you can tweak too, but these depend on your setup and how you use Solr. See http://wiki.apache.org/solr/SolrCaching# for more information.

Tuning part three: Java parameters
There are a lot of settings you can tune to optimize Java, and I will not go into depth on them here. But I would like to point out that one of the most important parameters to tweak is how much memory Java can use. If you give it too little, Java has to work hard (garbage collecting) to make sure it has enough memory, while too much causes Java to hog memory that would be better used for disk caching.

I have done a couple of tests to show how different memory settings affect my search suite:

“-Xmx14336m -Xms4096m” (14G/4G)
My test suite was down to 16 minutes; after giving Java too much memory, the test went up to 26m50.914s. I suspect the reason is that Java hogged so much memory that the OS could no longer keep the index data in cache, causing more disk access.

“-Xmx2048m -Xms512m” (2G/512M)
I aborted the test after it had run for a staggering 1331m33.322s. I suspect that after the test had run for a while, Java/Solr had to spend so many resources on keeping enough memory available that it eventually died or hung.

Skipping memory settings
I then tried to let Solr run without any memory limit (i.e. letting Java decide for itself at startup, based on the memory available on the machine).
This caused it to use around 4G of RAM after running for a while, quite a bit more than it was allowed in the previous test. This of course did wonders for the response from Solr, sending it back to 16m26.850s.

To keep some control over your server, I suggest running without a limit first, then setting a limit once Solr has been running for a while and you can see for yourself how much memory it wants to use.
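A rough sketch of that workflow: check how much resident memory the running Java process uses, then build the heap flags from what you see. The function names and the example limits are hypothetical; the flag format is the same as in the tests above.

```shell
# Resident memory of a process in MiB (Linux ps reports RSS in KiB).
# Usage: rss_mib PID, where PID is the process id of the Solr JVM.
rss_mib() {
    rss_kib=$(ps -o rss= -p "$1" | tr -d ' ')
    echo $(( rss_kib / 1024 ))
}

# Build heap flags in the form used above: -Xmx<max>m -Xms<start>m
heap_flags() {
    echo "-Xmx${1}m -Xms${2}m"
}

# Example only: a maximum a bit above the ~4G that Solr settled on.
heap_flags 5120 4096
```

The printed flags go on the java command line that starts Solr.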

Tuning part four: Tuning the search queries

If you have a schema with multiple fields that you can filter on when doing queries, then use them! (With the help of the filter query (fq) parameter.) If Solr can remove X % of the documents before searching the remaining ones, you can expect a pretty good performance boost for those queries.
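As an illustration only (my test suite does not use filters), a filtered query could be sent with curl like this. The search term and the title:linux filter are made-up examples against the schema above, and the helper just prints the command instead of running it.

```shell
SOLR_URL="${SOLR_URL:-http://solrserver:8983/solr/select}"

# Print a curl command that sends a query (q) together with one filter query (fq).
# -G with --data-urlencode makes curl append URL-encoded parameters to a GET request.
fq_query_cmd() {
    echo "curl -s -G '$SOLR_URL' --data-urlencode 'q=$1' --data-urlencode 'fq=$2'"
}

fq_query_cmd 'raid' 'title:linux'
```

Solr caches the result of each fq separately in the filterCache, so a filter that is repeated across many queries gets cheaper after the first use.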

In conclusion:

The numbers
Running test suite against default solr instance on single drive with 8G RAM and 2 cores: 168m43.705s
Running test suite against default solr instance on RAID6 with 8G RAM and 2 cores:  82m51.199s
Running test suite against default solr instance on RAID6 with 16G RAM and 4 cores: 20m41.202s
Running test suite with tuned solr cache on the same hardware: 16m15.414s

And finally
As you can see, it is rather easy to get either very bad or very good performance from Solr; it all depends on your setup and what your needs are. You have to analyze and test to see which setup is best for your needs, since there are no simple answers that fit every need.

Hardware will give you a lot of the needed performance, but if you have configured something wrong, all the hardware in the world won't help. (32G of RAM won't help with a 4GB index when Solr can only use 512MB…)