Often, web developers turned database designers begin their coding endeavors by picking up whatever snippets are printed in the O’reilly title at hand or whatever Google throws their way. Building a database-driven website is easy. Making it fast is not. A little attention to schema, indexes, query optimization, and benchmarking could give your app the performance boost it needs to reach greatness.
- Benchmark to measure success
- Get familiar with the EXPLAIN command
- Know how the storage engines differ
- Narrow, non-redundant indexes are better
- Smaller data types are better
- Fewer reads = faster results: Use the query cache for read intensive apps
- Learn to use joins properly
- Use calculated fields only when necessary
- Increase memory
Benchmark to measure success
In a Google TechTalks video recorded on April 28, 2006, Jay Pipes gave some very sound benchmarking advice. First, you should record everything to do with your benchmarks: hardware, OS, database version, etc. Keep your benchmarking data because you never know when you may need it again. Set a goal such as to obtain a 20% increase in performance. Disable the query cache when running benchmarks. And, change only one thing at a time.
Some benchmarking tools to check out are the MySQL Benchmark Suite for performance testing and MySQL super-smack for stress testing.
Get familiar with the EXPLAIN command
EXPLAIN lets you quickly gauge which indexes were used and how many reads MySQL estimates the query will need. Using EXPLAIN, you can easily see if indexes are NOT getting used, in which case you need to check your query syntax and make sure indexes are setup correctly.
Here is the EXPLAIN command in the MySQL Reference Manual for version 4.1.
Know how the storage engines differ
MyISAM table types, or engines, are better for read intensive apps. They have full-text indexing, compression, and more, but they don’t offer transactions. Only MyISAM provides full-text search capabilities. MyISAM has very low overhead and can handle inserting thousands of records. InnoDB is modeled after Oracle. InnoDB can do lightning fast record lookups for queries that use a primary key.
Narrow, non-redundant indexes are better
Indexes are a trade off between disk space and processing time. More indexes take more time to write. Redundant indexes just add unneeded overhead. Just know that fewer, quality indexes is the best practice. When it comes to InnoDB, the smallest possible primary clustering key is really important because it is appended to each secondary index record (narrow is better). Look for covering index opportunities (i.e. queries that just hit index records). If you just access the index(es) in memory, your query won’t require any reads and will be very fast.
Smaller data types are better
Numbers are faster than strings. Short strings are faster than long strings. String processing time is in direct relationship to string length. Fixed-length is faster than variable-length. Fixed length columns are faster, but they take more space. Like indexes, you have to set them according to your data. An inefficient schema can kill MySQL performance. Know your data. If you don’t know enough about the data to create a good schema, then learn.
Here’s a nifty procedure that examines the result from a query and returns an analysis of the results that suggests optimal data types for each column. Check out PROCEDURE ANALYSE.
Fewer reads = faster results: Use the query cache for read intensive apps
The query cache is turned off by default. Most hosting providers should have turned the query cache on. Use the following SQL to check:
show variables like 'query_cache_%';
To enable the query cache (version 4.0.1+), set query_cache_type=1 in my.cnf. Make sure the query cache is getting utilized by optimizing your queries. Query cache is case sensitive and white space sensitive, so make sure queries referenced more than once in your code are exactly the same. Tip: using a calculation like CURRENT_DATE() would render the query cache useless because it would need to calculate the date each time the query runs. Instead, code today’s date into the SQL.
Learn to use joins properly
Joins are good. Correlated subqueries are bad. Join order does matter.
Use calculated fields only when necessary
Using calculated fields can slow your queries. For example, using a calculation such as CURRENT_DATE() obliterates the usefulness of the query cache because that date calculation has to be done each time the query is run. Code the actual date into the query instead. Even if your calculation can be cached, too much processing can just slow you down in some cases. Check your SQL for unnecessary calculations.
Memory is the cheapest, fastest, easiest way to better MySQL performance. For InnoDB, one MySQL employee recommends that 50-80% of the total memory on the box should be allocated to innodb_buffer_pool_size.
Inspiration for this article came from:
- Mac Newbold’s 11/16/2006 UPHPU presentation on database design
- Performance Tuning Best Practices for MySQLGoogle TechTalks presentation by Jay Pipes, 43 min – Apr 28, 2006
- MySQL by Paul Dubois, ISBN: 0672326736
- High Performance MySQL by Jeremy Zawodny and Derek Balling, ISBN: 0596003064
Software versions used during the time this post was written:
- MySQL 4.1.21
- phpMyAdmin 188.8.131.52.