Friday, December 4, 2009

MyISAM table file size growing fast?!

Recently I found out that when you delete rows from a MyISAM table, MyISAM only marks the rows as 'deleted' and re-uses those blocks later. That means the actual table file sizes (.MYI, .MYD) will not shrink even if you delete a lot of data.
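
You can see this wasted space for yourself. A quick check, assuming a hypothetical MyISAM table named orders with a created_at column; in SHOW TABLE STATUS, the Data_free column reports the bytes held by deleted rows:

    -- Delete a chunk of rows, then look at the table status.
    DELETE FROM orders WHERE created_at < '2009-01-01';

    -- Data_free grows by roughly the size of the deleted rows,
    -- while the .MYD/.MYI files on disk stay the same size.
    SHOW TABLE STATUS LIKE 'orders';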

There are several ways to force MyISAM to shrink the table files, which is really a kind of table optimization:

1. You can create a new index or add a new column on the table. MyISAM will then create a temporary table with the new structure, copy the data from the old table into it, and finally rename the temporary table to the name of the table you are working on. As a bonus, it also sorts your data while inserting it into the temporary table. The drawback is that it takes a lot of time: you might think creating an index is an easy thing, but it actually locks the table for a while. Yeah, table lock, and that will kill you if your application cannot tolerate long locking. In my experience, 5 million rows take 8~10 minutes, more or less, depending on your hardware performance and on whether the table is cached in filesystem memory or not. (A sketch of this follows after the list.)

2. You can shut down your MySQL server and run myisamchk -r to repair your table. That will shrink the table files as well (see the second sketch below).

3. A nicer way to shrink the files smoothly, without long table locking, can be like this: first you create a new table with the same structure as your old table, and then copy over your data to the new table; once it is done, you swap the tables by running RENAME TABLE org_table TO old_table, new_table TO org_table. The trick in this process is that while you copy your data from the old table to the new one, your application is still writing new data into the old table, so you have to use a loop that inserts the data incrementally in batches (so that a SELECT on the old table will not lock it for long and block writes; concurrent writes can avoid locking altogether in some conditions). The third sketch below shows the whole dance.

From here you have two choices. First, you can pause your application's writes into the old table for a very short moment, immediately swap the tables, and then resume the application; downtime will be practically zero in this case. Second, you can keep your application running, but then, between the last incremental insert and the table swap, there is a chance that new data is still being written into the old table. To handle this, write down the latest primary key or unique key that made it into the new table once you finish your insertion (and remember to adjust the AUTO_INCREMENT id to the latest id + 1), and after that swap your tables. Once new writes start going into the new table, you have to go back to your old table and compare against the latest primary or unique id you recorded. If you find any id that is larger than the one you recorded, you must copy those rows over to the new table after the swap. There is a problem with this second choice: any rows written into the old table in between, and copied over to the new table afterwards, will be out of sequence. So if your application relies on the sequence of the inserted data, or on the order of the primary key, then you probably do not want to follow this way; instead you must pause your application for a short moment.
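
For method 1, a minimal sketch, assuming the same hypothetical orders table; the particular index does not matter much, the point is just to force the rebuild:

    -- Adding an index (or a column) makes MyISAM copy the whole
    -- table into a temporary file and rename it back, which also
    -- reclaims the space of deleted rows. The table stays locked
    -- for the entire rebuild.
    ALTER TABLE orders ADD INDEX idx_created_at (created_at);

For what it is worth, OPTIMIZE TABLE orders triggers essentially the same rebuild without changing the schema, and it comes with the same long-lock drawback.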
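
For method 2, if you cannot take the server down at all, the SQL-level counterpart of myisamchk -r is REPAIR TABLE; a one-line sketch for the same hypothetical table, though note it still locks the table while it runs:

    -- Rebuilds the .MYD/.MYI files and reclaims deleted-row space,
    -- much like myisamchk -r, but from inside a running server.
    REPAIR TABLE orders;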
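
And for method 3, a rough sketch of the copy-and-swap dance, again using the hypothetical orders table, this time assuming an AUTO_INCREMENT primary key named id; the batch boundaries and the recorded id are placeholders your script would track:

    -- 1. Create an empty copy with the same structure and indexes.
    CREATE TABLE orders_new LIKE orders;

    -- 2. Copy the data over in small batches, in a loop, so each
    --    SELECT locks the old table only briefly.
    INSERT INTO orders_new
    SELECT * FROM orders WHERE id > 0 AND id <= 10000;
    -- ...repeat with the next id range until you reach the current max.

    -- 3. Record the highest id you copied (say 5000000) and make sure
    --    the new table hands out ids above it. In the zero-downtime
    --    variant, pause application writes right here.
    ALTER TABLE orders_new AUTO_INCREMENT = 5000001;

    -- 4. Swap the tables; RENAME TABLE is atomic, so the application
    --    simply starts reading and writing the new table.
    RENAME TABLE orders TO orders_old, orders_new TO orders;

    -- 5. If you kept the application writing during steps 3 and 4,
    --    copy over whatever slipped into the old table meanwhile.
    --    As noted above, these rows arrive out of order, and their
    --    ids can collide with rows inserted after the swap.
    INSERT INTO orders
    SELECT * FROM orders_old WHERE id > 5000000;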

Why do we need to shrink the table files at all? It is all about the filesystem cache: with smaller table files, we can cache more tables in memory, which speeds up database reads and writes.
