Friday, December 4, 2009

Some performance experience with MyISAM

When I worked on a SQL statement optimization, once I found a same sql statement will
produce two huge different query time. One is 5 secs, the other is 17 mins. Oh my god,
what's going on in MyISAM. The reason is simple, MyISAM is using pread/pwrite to read
and write data to a file descriptor, from Vadim's post I found that Peter was ever
create a function that allow MyISAM to use mmap to cache table if the table size is
smaller than 2GB ( 32 bit limitation ). The original post and experimental testing
is here: http://www.mysqlperformanceblog.com/2006/05/26/myisam-mmap-feature-51/
Therefore, MyISAM is all dealing with real files no matter it is cached in file system cache or not.

Yeah, so I found some truth:
1. If your table is not being cached into file system memory, damn, you have to pray your hard drive is quick enough to read data and put into file system cache.

2. If you are lucky and running a sql statement like this:
SELECT * FROM your_table;

Yes, it is full table scan, but no worry, this is not the worst case. It will spit out data much faster than you thought.

3. If you are doing some joining in your sql statement like this:
SELECT * FROM A JOIN B;

You will have a very bad response time if your A and B tables are not cached into
file system cache. Remember that how MySQL join tables? "Nested Loop JOIN" check google what is this. MySQL will read couple rows from outer table and join with couple inner rows. I watch the IO behavior with iostat. It did show how "nested loop join" work, MySQL read couple blocks of 'outer loop rows' and then read couple blocks of 'inner loop rows' and do the join and then keep going until meet the condtion. I can see a lot of small IO back and forth. But from 2, I can see a small amount of IOPS but couple big IO size request in iostat. For some disks like SATA, it is good for large IO throughput, but not good at IOPS. Further, if your table is frequently delete and insert. Your data will not be continuously. So once you hit condition 3, you even will have worse performance since your disk will have longer seeking time to find the right data.

A trick you can avoid this condition 3 happen. You can have "SELECT * FROM" from your A and B tables. It force MySQL to cache your tables into memory within small amount of IOPS and large chunks of data read from disk. (for sure you have to have enough memory to host table files.) And once tables are in cache, you will be fine with any kind of operation, since you are dealing with your table all in memory now.

Back to how Peter done with mmap, I can not find this feature in mysql.com right now. Only using myisampack to compress your tables and having your tables as read only
table as this, MyISAM will use mmap to cache your tables into memory. I hope that in coming future, MyISAM team will consider to implement it, since from Vadim's post,
Soaris and Linux are not really good in pread call as what he said in the post.
Oracle is smart that it allocates share memory when system startup, it manage its own
memory, cache tables etc. It flush cache by its own.

No comments:

Post a Comment