Best Practices for Setting MySQL Server Runtime Parameters

Since we’ll be exposing MySQL to significantly more traffic (due mainly to the transition to a database-driven version of Benchmarks Online [dbBOL]), I decided to spend some time optimizing the server’s settings. There are a number of settings that can be tweaked to improve performance. I based my decisions on the information available from the references cited and on the performance statistics reported by MySQL (SHOW STATUS, or use phpMyAdmin). MySQL has been running for 131 days as of the writing of this post (see the cached copy of the runtime stats), so I expect the data to be a fairly good indication of how MySQL performs under its current usage. Unfortunately, I expect the usage pattern to change significantly once dbBOL is released, so some of the settings used will be based on expected usage patterns. At specific intervals after dbBOL is released we should examine MySQL’s runtime stats to determine whether additional tweaking is needed. I recommend the following schedule: 1 week, 1 month, 3 months, then every 6 months.

MyISAM Recovery

MyISAM is a fast format and supports functionality not available in other MySQL storage engines (such as full-text search). Unfortunately, MyISAM is not nearly as robust as InnoDB. Since the data files are written directly, without transactions, a system crash can cause table corruption and loss of data (particularly if an INSERT/UPDATE operation was in progress). To ensure that the tables have not been corrupted at any time we can set myisam-recover=BACKUP,FORCE. This tells MySQL to check a MyISAM table when it is opened, repair it if necessary, and make a backup of the table before repairing it.
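In my.cnf the setting would look something like the following sketch (the usual [mysqld] section is assumed):

```ini
[mysqld]
# Check each MyISAM table when it is opened and repair it if needed.
# BACKUP keeps a copy of the data file before any rows are changed;
# FORCE runs the repair even if more than one row would be lost.
myisam-recover = BACKUP,FORCE
```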

There are some drawbacks to this setting. First, if a row is corrupted the data in that row could be lost during the repair; that’s why we use the BACKUP option. There can also be a performance hit from the recovery operations, particularly if a large number of tables have to be repaired simultaneously. Not to mention that the check is performed every time a table is opened.

Another method of checking the MyISAM tables we should consider is a cron job that checks the tables outside of MySQL. This would give us the benefit of automated repair (or at least notification) while mitigating possible performance bottlenecks.
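As a sketch, such a job could be built around the mysqlcheck client; the schedule, script path, and credentials file below are hypothetical, and the script assumes a running server:

```shell
#!/bin/sh
# Hypothetical nightly table check, run from cron.
# --check scans the tables; --auto-repair fixes any found corrupted.
# Credentials come from a defaults file rather than the command line.
mysqlcheck --defaults-extra-file=/root/.my.cnf \
  --check --auto-repair --all-databases
```

Cron mails the command’s output to the job’s owner, which gives us the notification side for free.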


Thread Cache

MySQL assigns a thread to each connection made by a client. Thread creation/destruction can be a relatively expensive process, but MySQL gets around this by implementing thread caching. Thread caching allows MySQL to reuse a thread once a connection is finished with it. A thread is not destroyed unless the number of active threads exceeds the thread cache limit. The thread cache does not have to be large enough to handle all simultaneous connections, particularly since maintaining a thread uses up system resources. The cache should be large enough, however, such that the number of threads created is small. Check the Threads_created server status variable.

We currently do not have thread caching enabled. Our Threads_created counter is at 60,000, which is extremely high (~460 per day). Unless you set up persistent connections to MySQL using PHP’s mysql_pconnect() function, PHP will open and close a connection each time a web page is loaded. I’m going to go with a value of 20. Assuming maximum concurrent connections could reach 25, this provides plenty of cached threads for average usage. If the number of threads created does not budge past the cache limit we should consider lowering the value somewhat to free up resources.
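The corresponding my.cnf entry is a one-liner ([mysqld] section assumed):

```ini
[mysqld]
# Keep up to 20 finished threads around for reuse rather than
# destroying them when their connection closes.
thread_cache_size = 20
```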

For a hint of how important thread caching is, see MySQL, Linux, and Thread Caching and Optimize MySQL: The Thread Cache.


Table Cache

Opening a table can be a slow process as file descriptors are created and file headers are read. To mitigate this MySQL uses a table cache that keeps a table in an open state for future use. A unique entry in the table cache is required for each concurrent access to a table (multiple users, or multiple accesses of the same table within a single query). Every time a table is opened an unused table cache entry for that table is sought; if none is found, a new one is created. Once the table cache reaches the limit specified by the table_cache variable MySQL has to close old cached entries before opening new ones, adding even more time to table access. MySQL recommends sizing the table cache to handle the largest number of concurrent connections multiplied by the largest number of tables accessed by a single query. This is the high end; you can start lower and watch Opened_tables to see if the cache is constantly swapping out tables. The faster Opened_tables rises, the more urgently the table cache needs to be increased.

One caveat to consider when setting the table cache is the operating system’s open file limit. Each cache entry holds file descriptors open, and if the number of files held open by MySQL exceeds the limit allowed by the operating system no further files can be opened. MySQL does not fail gracefully in this situation and may, according to the documentation, “refuse connections, fail to perform queries, and be very unreliable.” You can find the system-wide file limit by issuing cat /proc/sys/fs/file-max (the per-process limit is reported by ulimit -n). It’s very unlikely we’ll have a problem; the current value indicated by this command is 50569. But the results of going over this limit appear to be fairly severe for MySQL, so it’s a good idea to check.

The current table cache is set to 160 and is full, but the value of Opened_tables is rising slowly. Still, our maximum concurrent connections count has already hit ten, so I believe we could easily see the table cache get overwhelmed once the database is exposed to a larger traffic base. If we assume a maximum of 25 concurrent connections and a crazy join of 10 tables, then we’re looking at a table cache of around 250. I’ll start with this number and watch the Opened_tables stat to see whether the cache needs to go higher or can be lowered.
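The sizing logic above, expressed as a my.cnf sketch:

```ini
[mysqld]
# 25 expected concurrent connections x 10 tables in the worst-case
# join = ~250 concurrently open tables.
table_cache = 250
```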


Temporary Tables

Temporary tables may be used by MySQL when performing queries. By default these tables are created in memory. However, there are two situations in which a temporary table will be written to disk, resulting in a bit of a performance hit: when a temporary table grows beyond the maximum allowed size, and when a condition exists that prevents the use of an in-memory table. The former threshold is set by the tmp_table_size/max_heap_table_size parameters; the latter is determined by table and query structure. tmp_table_size is specific to temporary tables while max_heap_table_size applies to all memory tables, so make sure that max_heap_table_size is at least as large as tmp_table_size.

The allowable size of temporary tables should be large enough to avoid writing to disk where possible, but small enough that memory is not eaten up. There is no provision to limit the number of temporary tables stored in memory. If there are many simultaneous connections and each connection is working with a large temporary table memory could be filled rather quickly.

You can determine whether or not your temporary tables are being created in memory by looking at the number of temporary tables that had to be written to disk (Created_tmp_disk_tables). Ours is hovering around 50% of the total number of temporary tables (Created_tmp_tables), but this isn’t enough information to make a decision about the optimal setting for tmp_table_size. What we don’t know is the reason a table is written to disk. That’s something that can only be determined using the EXPLAIN statement.
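As a hypothetical illustration (the table and columns are made up), EXPLAIN reports temporary-table use in its Extra column:

```sql
-- "Using temporary" in the Extra column means MySQL builds a
-- temporary table for this query; "Using filesort" means the rows
-- must additionally be sorted in a separate pass.
EXPLAIN
SELECT category, COUNT(*) AS hits
FROM page_views
GROUP BY category
ORDER BY hits DESC;
```

Note that some temporary tables go to disk regardless of size; for example, a result set containing BLOB or TEXT columns cannot be held in a memory table.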

Since the number of disk-based temporary tables is relatively high I’m going to increase the maximum size allowed for memory-based tables and see if that improves things.
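In my.cnf the pair should be raised together; the 32M figure below is illustrative:

```ini
[mysqld]
# The effective limit for an implicit temporary table is the
# smaller of these two values, so keep them in sync.
tmp_table_size = 32M
max_heap_table_size = 32M
```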


Query Cache

For SELECT queries, the speed of the response can be affected by factors such as query structure and which columns are indexed. MySQL can provide fast results for often-run queries by storing the results in the query cache. The first time a query is run, the query and its result set are stored in the cache; subsequent runs of the exact same query pull the results from the cache (so long as the cached entry is still valid). There are some qualifications for the query cache to be used successfully. First, a query must match in a binary manner for MySQL to use a cached result: it must match the previous run character for character, case-sensitively. Second, the query result must not exceed a defined size (1MB by default) or it will not be cached.

To enable the query cache, set query-cache-type = 1 and give query_cache_size a value. The best size for the query cache is a guessing game; I’m starting with a value of 32M. Though we’re likely to see little performance improvement in the web-based applications due to constant table updates (which invalidate any cached results for the affected table), we should see a decent improvement for more static tables such as those used to serve dbBOL. As a result, judging the efficiency of the cache from the usage data MySQL provides will be somewhat difficult. Still, I would expect to see a relatively high value for Qcache_hits compared to Qcache_inserts, and a low value for Qcache_lowmem_prunes. It’s also important to keep an eye on Qcache_free_blocks to ensure the cache memory is not fragmented. This number should remain low and can be improved temporarily by issuing a FLUSH QUERY CACHE statement, which will “defragment” the query cache.
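The relevant my.cnf lines:

```ini
[mysqld]
# 1 = cache every cacheable SELECT unless SQL_NO_CACHE is specified.
query_cache_type = 1
query_cache_size = 32M
```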

One thing to keep in mind is that any cached query will be dropped if the table it references is updated. Tables that are updated often should include the SQL_NO_CACHE attribute in any SELECT queries to prevent caching. This will help prevent the extra overhead of storing and dropping queries from the cache when those cached results can rarely be reused.
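A hypothetical example (the table and columns are invented) of opting a hot table out of the cache:

```sql
-- This table is written constantly, so any cached result would be
-- invalidated almost immediately; skip the cache entirely.
SELECT SQL_NO_CACHE id, status
FROM job_queue
WHERE status = 'pending';
```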


Key Buffer

MySQL says this is one of the most important performance tuning variables and recommends allocating as much memory as possible. However, since the key buffer is stored in RAM a setting should be used that (in consideration with other settings) won’t cause the server to page memory. MySQL also recommends that the key_reads/key_read_requests be less than 0.01.

The MySQL Performance Blog recommends up to 40% of your system memory, taking into account the size of the MyISAM table indexes and available memory.

Currently this parameter is set to 16M and our Key_reads/Key_read_requests ratio is very low (0.0002). So right now we seem to be doing great in regard to this value, but since MySQL recommends a high value I’m going to increase this to 24M, or about 4.7% of system memory. Though this value is below the roughly 58M that the table indexes add up to, the maximum portion of the key buffer in use at any one time so far is only about 70% (based on a block size of 1K and Key_blocks_used showing 11K, meaning roughly 11M of the 16M in the current setup).
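In my.cnf:

```ini
[mysqld]
# ~4.7% of system memory. The table indexes total ~58M, but only
# the index blocks actually in use must fit in the buffer.
key_buffer_size = 24M
```

The read ratio can be checked at any time with SHOW GLOBAL STATUS LIKE 'Key_read%'.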

Due to how MySQL uses the indexes, we should be O.K. with a key buffer size smaller than the sum of the indexes. As I understand it, MySQL divides indexes into blocks which allows MySQL to only access the part of an index it needs. Only used portions of a particular index need to be stored in the key buffer. Since some of the tables in the database are rarely (if ever) used their indexes won’t add to the overall key buffer usage.

Special note: some tables have rather large indexes. These tables should be reviewed to determine if any optimizations to the table structure can be made.


Sort Buffer

The sort buffer is used whenever ORDER BY or GROUP BY appears in a query. If the amount of memory required to perform an operation of this type exceeds the value of sort_buffer_size, MySQL has to sort the current working set, write the data out to disk, and start another working set. So not only are parts of the result set sorted separately, but those groupings then have to be merged on disk and sorted again. Conceivably, then, a larger value would be beneficial. You can see how often a sort buffer had to be written out to disk and merged with additional buffered data by looking at the Sort_merge_passes status variable.

The sort buffer is a per-connection setting, meaning that each connection can allocate the amount specified by this value. As a result, care should be taken when setting this value in order to avoid eating up too much memory. Also, the MySQL Performance Blog has benchmarks showing a larger sort buffer actually hurting performance. With all this in mind it may be wise to use a relatively low value in the server settings and specify a larger value when necessary for a specific connection. Also, it may be advisable to perform some testing in various scenarios to see if there is an optimal minimum to cover most situations.
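A sketch of the per-connection approach (the query and the 4M figure are hypothetical):

```sql
-- Raise the sort buffer for this connection only, run the known
-- sort-heavy query, then drop back to the server default.
SET SESSION sort_buffer_size = 4194304;  -- 4M
SELECT author, COUNT(*) AS n
FROM benchmarks
GROUP BY author
ORDER BY n DESC;
SET SESSION sort_buffer_size = DEFAULT;
```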

Still, MySQL recommends using a larger value to help improve sorting. Our current value is 512K, and our merge passes appear to be rising steadily (though not severely). So for now I’m doubling the value to 1M.


Read Buffers

MySQL uses read buffers when accessing table data. There are two settings to pay attention to here: read_buffer_size and read_rnd_buffer_size. MySQL allocates read_buffer_size when a sequential scan is performed; a sequential scan reads every row in a table and is typically done when an index can’t be used to satisfy a query. MySQL allocates read_rnd_buffer_size when table rows are read in sorted order following a key sort.

As with the sort buffer, the read buffers are a per-connection setting. So setting this value with consideration of our available memory is important. Plus, the MySQL Performance Blog has found some performance issues with larger values for these settings, similar to the issues with the sort buffer. Once again these are variables that may be best increased as needed. And again, testing should be done to find an optimal minimum.

Since these settings can show something of a benefit we want to increase the values if possible. Until further testing can be done on our own system, however, I’ll keep them at or below the theorized threshold of diminishing returns, 256K.
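As a my.cnf sketch:

```ini
[mysqld]
# Allocated per connection, so kept modest; 256K is the point
# beyond which the benchmarks cited above showed diminishing or
# negative returns.
read_buffer_size = 256K
read_rnd_buffer_size = 256K
```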


InnoDB

The MySQL Performance Blog recommends a number of settings for enhancement of InnoDB performance. After much consideration, however, I believe it may be best if we forgo use of the InnoDB storage engine where appropriate. I have a few reasons for this opinion:

  1. InnoDB tables are more resource intensive due to their use of transactions.
  2. Fulltext indexing, a feature of MySQL that we have come to depend on for a number of applications, is not currently available for InnoDB tables.
  3. Our current applications do not generally require the added safety of a transactional database.

If we do decide in the future to implement InnoDB for a future application (such as an online ordering system) we should revisit the optimization settings.

To ensure that tables are MyISAM by default I have set default-storage-engine = MyISAM. This can be overridden by specifying the table engine when creating a new table or by altering an existing table.
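For example (hypothetical tables), the default applies only when no engine is named:

```sql
-- Created as MyISAM via default-storage-engine = MyISAM:
CREATE TABLE access_log (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  msg VARCHAR(255)
);

-- An explicit ENGINE clause overrides the default:
CREATE TABLE orders (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  total DECIMAL(10,2)
) ENGINE = InnoDB;

-- An existing table can be converted after the fact:
ALTER TABLE orders ENGINE = MyISAM;
```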

Fin

Once settings changes have been made to /etc/my.cnf, restart the server and check the server variables (SHOW GLOBAL VARIABLES, or use phpMyAdmin) to ensure all settings have taken effect.
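For example:

```sql
-- Confirm the new settings took effect after the restart:
SHOW GLOBAL VARIABLES LIKE 'table_cache';
SHOW GLOBAL VARIABLES LIKE 'query_cache%';

-- Then watch the matching counters over time:
SHOW GLOBAL STATUS LIKE 'Opened_tables';
SHOW GLOBAL STATUS LIKE 'Qcache%';
```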

Also, see my earlier post on optimization.

4 thoughts on “Best Practices for Setting MySQL Server Runtime Parameters”

  1. 11-day report:
    The settings I’ve implemented definitely seem to be helping with database performance.

    My assumption of the max number of concurrent connections appears to be fairly close. So far we’ve hit a concurrent connection max of 16. I’m not sure how MySQL determines when to cache a thread, but our thread cache is only at 8. Probably related to thread age, but something that should be researched.

    I’m going to lower the maximum connections to 50 (from the default of 100). We’re not coming anywhere near 100 and since the purpose of this setting is to prevent resource depletion it’s better to play safe than sorry. At some point in the future we should calculate the maximum per-connection resource usage and adjust this setting based on that information.

    The query cache seems to be performing well with what appears to be a high hit/insert ratio (1,179K/31K). It has plenty of free memory after 11 days of usage and no low memory prunes. It does show some modest fragmentation as indicated by qcache_free_blocks having a value of 47.

    The key buffer setting also appears to be acceptable. Our key_reads/key_read_requests is still significantly below the .01 ceiling recommended by MySQL.

    For the number of temporary tables created (148K) we’re not looking too bad on the number that had to be written to disk (1,840).

    The table cache appears to be doing well, though still not being used fully. Again something that might relate to pruning by MySQL based on age, but I’ll need to research this more as well.

The one number that is disconcerting in the stats is Handler_read_rnd_next with a value of 538M. The documentation states that a high value here probably indicates that table indexes are not optimal for the queries being run. This is also borne out by the values of Select_full_join and Select_scan. We should spend some time analyzing our queries to determine where optimizations can be made.

    Next update at around one month.

  2. 32-day report:
A lot of the numbers appear to be worse than I was expecting given the early update. However, I won’t be making any changes until the three-month update, mainly because the import of the pilot test data may be having a detrimental effect on the numbers reported.

    Cached threads is still pushing upward (currently at 20 created; 17 cached, 3 in use), but I’m going to leave this at 25 for now. This is something that won’t really hurt us if we go over, so I’ll wait until we get closer to the setting.

    Opened tables is still rising, despite a table cache of 256. Right now it’s at 556. I think it may be worthwhile to increase the table cache, but we’ll see how things look at the next review.

The number of temporary tables being written to disk is a little over 10% of the total number of temporary tables. This isn’t as good as the numbers seen in the previous update, so it’s worth watching. If the numbers don’t improve, the value of tmp_table_size/max_heap_table_size should be increased.

    The query cache is not performing quite as well as in the previous update. It’s at roughly a 4:1 hit-to-insert ratio (1,652k/427k). Plus, the value in lowmem_prunes is high (19K) and the number of free blocks is also high (368). I’m going to flush the cache and see if the numbers improve. There’s no need to increase the size of the cache, though, since it’s not being fully used. These numbers may have more to do with caching of queries that would best be left out, which is something that can be controlled when issuing a query.

    The key buffer still appears to be at an appropriate value. However, the number of blocks used is starting to push against the size of the buffer. This should probably be increased on the next update.

    The sort buffer appears to be acceptable.

    The read handlers are still fairly high, but queries have not yet been optimized so this is something that may see improvement without further changes to the server settings.

  3. Update after 164 days uptime:

    After today’s update I won’t be providing further review. I think, for the most part, the decisions I made regarding the server settings have worked out well. There are a few settings that can still be tweaked, but I don’t think any further discussion is necessary at this point. I will, however, make any notes on any extraordinary changes in server performance. I will also report on the effects of any query optimizations.

    The thread cache is still at a good level. Current stats say 20 maximum concurrent connections with 18 cached and 100% hit rate.

    I think the table cache may actually be at an acceptable level. The number of opened tables is still rising (currently at 870), but not as quickly as it appeared in the last review. It may still be worthwhile to increase the cache slightly, but I’m not as concerned as I was initially.

    Temporary tables still appears to be problematic. The percentage of temporary tables written to disk continues to rise (now at around 17.5%). Some of this is likely due to lack of query optimization, meaning a code review needs to be performed. That will have to wait until I have a chance to outline best practices for code development. In the meantime I’m going to raise the values of tmp_table_size/max_heap_table_size from 32M to 48M.

    The query cache performance is doing a little better than in the previous update with a 7:1 hit-to-insert ratio. We’re still seeing a large number of prunes (35K) and free blocks (908). Still, these numbers aren’t increasing horribly. This is still likely a problem of leaving out some queries, plus we should set up a cron job to flush the cache on a regular basis in order to keep the number of free blocks low.

    The key buffer is fully used, but the key_reads/key_read_requests is still at an acceptable level (.001). It might be prudent to increase the buffer size, but I’ll wait until next review to see how the buffer is holding up.

    The read handlers are still alarmingly high. But again, until we spend some time optimizing queries I don’t expect to make any changes to these settings.
