"froggy000" <EMAIL REMOVED> wrote:
> I'm trying to import a dump of the most interesting Wikipedia tables.
> Specifically page.sql (560MB), categorylinks.sql (584MB) and
> pagelinks.sql (3GB) from [download.wikimedia.org].
This is a 3GB uncompressed SQL dump?
> I've successfully imported both page.sql and categorylinks.sql in about
> 30 minutes each but I haven't had the same success with pagelinks.sql.
> I've alotted as much memory as is available (1.5GB) but it fills up and
> CPU usage drops to about 2-5% with gratuitous disk usage.
Seems your system is I/O bound now. This was foreseeable with only
1.5GB for InnoDBs buffer pool.
> A couple notes. pagelinks.sql begins with:
>
> DROP TABLE IF EXISTS `pagelinks`;
> CREATE TABLE `pagelinks` (
> `pl_from` int(8) unsigned NOT NULL default '0',
> `pl_namespace` int(11) NOT NULL default '0',
> `pl_title` varchar(255) binary NOT NULL default '',
> UNIQUE KEY `pl_from` (`pl_from`,`pl_namespace`,`pl_title`),
> KEY `pl_namespace` (`pl_namespace`,`pl_title`)
> ) TYPE=InnoDB;
Bah! Bad one!
I ***ume we're talking about 3GB of uncompressed SQL Dump. Since it
uses the multi-value INSERT syntax, there is not much overhead from
SQL in the dump. ***uming we have 6 digits for the numbers and k
characters in the VARCHAR() column we have 6+6+k+7 characters in the
dump and get 4+4+k bytes raw data from it. Estimating k~30 this leads
to approx. 2.3GB of raw data. But rows are small with 38 bytes and I
guess InnoDB adds another 10 Bytes of overhead so we're back at 3GB
for the rows alone.
Now for indexes: the first index is on *all* columns in the table
and therefore duplicates the amount of memory used. The second index
duplicates the bigger part of the data again. So overall you will
have approx. 8GB of active data in InnoDB (when you create an index
on a table you can consider the whole table "active"). With just
1.5GB of memory this will clearly become slow.
You may try to remove both indexes from the table definition, loading
the raw data and then creating the indexes. The following line
> /*!40000 ALTER TABLE `pagelinks` DISABLE KEYS */;
does that for the second index already, but not for the first one.
But I guess index creation will still take very long.
There is another workaround: switch the engine to MyISAM. MyISAM uses
much less memory than InnoDB. You should put your memory in key_buffer
instead of the innodb_buffer_pool then.
Wikipedia uses InnoDB because they have a lot of writes. Unless you
want to run a Wikipedia system yourself, MyISAM should be good.
XL
--
Axel Schwenke, Support Engineer, MySQL AB
Online User Manual:
http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums:
http://forums.mysql.com/