Slow on the draw
Some of our users were complaining of slow performance on one of our web-based utilities. There are, unfortunately, a number of factors that can affect system performance. I haven’t spent much time learning the best practices for profiling a system to discover the source of bottlenecks, but my hunch was that the problem was in MySQL.
The users in question were having problems with slow UPDATE statements. I used the status and variables tabs of PHPMyAdmin (PMA) to try to find any possible bottlenecks in MySQL. These tabs are equivalent to the SHOW STATUS and SHOW VARIABLES SQL statements, respectively. The nice thing about PMA is that it will highlight any values it thinks may be causing a problem.
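For anyone working without PMA, the same information is available straight from the MySQL client; a LIKE pattern narrows the output to the values of interest:

```sql
-- Everything PMA shows on its status and variables tabs
SHOW STATUS;
SHOW VARIABLES;

-- Narrowed to the specific counters and settings discussed below
SHOW STATUS LIKE 'Opened_tables';
SHOW VARIABLES LIKE 'table_cache';
```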
At first I thought locking might be the source of the slowdown, but after some testing I don’t think that’s the case. In my informal click test (repeatedly clicking the save button on the form in question) the worst delay I ran into was about five seconds, while the delays the users were experiencing were in the neighborhood of thirty seconds.
When I thought about the delay more I remembered a similar problem I’ve experienced myself when logging in to one of the utilities. The login could easily take up to thirty seconds at times. It took a while to nail down the possible culprit, but I believe it may be the MySQL table cache. These are the tables MySQL has accessed that it keeps open in memory in order to speed up future access. After a bit more testing I’ve come to the conclusion that a significant slowdown may be occurring while MySQL removes one table from the cache and loads another one. By upping the value of the table_cache parameter in my.cnf I believe the delay caused by this operation can be avoided.
I upped the table_cache from 64 to 80. I thought about doubling the value to have a bit of a safety net, but the drawback would be higher RAM usage. I’ll have to watch the Opened_tables status variable and see if the new cache value is sufficient.
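The change itself is a one-line edit to my.cnf (the server needs a restart to pick it up):

```ini
# my.cnf -- bump the table cache up from the default of 64
[mysqld]
table_cache = 80
```

If Opened_tables keeps climbing steadily after the change, the cache is still too small and the value needs to go up again.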
I won’t be able to see the effects until next week. The users say the problem seems to be worse in the mornings, which is likely when the table cache sees the most churn, with tables going in and out based on usage.
Staying off the disk
Table locking isn’t totally out of the question, though. With up to five people entering data at the same time it’s possible that a user could get stuck at the end of a locking queue waiting for the others to finish. There’s not a lot that can be done to prevent lock conflicts. We can raise the priority of SELECTs or decrease the priority of UPDATEs. Otherwise, switching to InnoDB might show something of a performance increase (from the user’s perspective). Since InnoDB uses row-level locking, the overall time a user spends waiting on locks could decrease significantly. Of course, InnoDB isn’t without its drawbacks in the form of higher overhead and the inability to perform full text searches. (Full text search is not a concern for the table involved in this instance.)
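As a sketch of the options above (the table name `entries` here is hypothetical):

```sql
-- Let a read jump ahead of queued writes on a MyISAM table...
SELECT HIGH_PRIORITY * FROM entries WHERE user_id = 42;

-- ...or let a write yield to pending reads instead
UPDATE LOW_PRIORITY entries SET status = 'saved' WHERE id = 7;

-- The bigger hammer: convert the table to InnoDB for row-level locking
ALTER TABLE entries ENGINE = InnoDB;
```

The priority modifiers only matter for table-locking engines like MyISAM; after the ALTER TABLE they become no-ops, since InnoDB no longer queues whole-table locks.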
While I was modifying the operational parameters of MySQL I went ahead and updated the tmp_table_size option as well. The number of temporary tables being written to disk was relatively high compared to the total number of temporary tables created (almost 50%). This can cause sluggish response times as well, so I’m hoping that raising this value (from 32M to 48M) will also show an improvement in server responsiveness. I have a feeling I’ll need to push this even higher, but again I need to keep an eye on overall RAM usage. All this optimization will prove pointless if the system has to start paging out to disk because RAM is full.
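This one is the same kind of one-line my.cnf edit:

```ini
# my.cnf -- allow larger in-memory temporary tables before they spill to disk
[mysqld]
tmp_table_size = 48M
```

The ratio to watch afterward is Created_tmp_disk_tables divided by Created_tmp_tables in SHOW STATUS. One caveat worth noting: the max_heap_table_size variable also caps in-memory temporary tables, so it may need to be raised in step with tmp_table_size for the change to take full effect.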
Keeping it real (fast)
One issue to keep in mind with regard to server performance is indexes. A join should always be done on an indexed field; otherwise there can be serious lag as the system scans for matching records. We appear to have a large number of join operations running that do not use indexes. Finding the offending SQL statements will require a bit of investigation, however, so I don’t expect to address them quickly. I also don’t believe they’re a factor in the user-experienced sluggishness, so I’ll probably fix these as I come across them.
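A sketch of how the offending joins can be hunted down, using hypothetical table and column names:

```sql
-- A join column with no index shows up in EXPLAIN as join type ALL
-- (a full table scan) with nothing listed under possible_keys
EXPLAIN SELECT u.name, o.total
FROM users u JOIN orders o ON o.user_id = u.id;

-- Indexing the join column lets MySQL seek matching rows directly
ALTER TABLE orders ADD INDEX idx_user_id (user_id);
```

The Select_full_join status counter is what flags these index-less joins in the first place; rerunning EXPLAIN after adding the index should show the new key in use.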