在虚拟机上测评了下MySQL 和 PostgreSQL 的各种LOAD FILE方式以及时间。 因为是虚拟机上的测评,所以时间只做参考,不要太较真, 看看就好了。

MySQL 工具:

   1. 自带mysqlimport工具。

   2. 命令行 load data infile ...

   3. 利用mysql-connector-python Driver来写的脚本。

PostgreSQL 工具:

   1. pgloader 第三方工具。

   2. 命令行 copy ... from ...

   3. 利用psycopg2写的python 脚本。

测试表结构:

mysql> desc t1;+----------+-----------+------+-----+-------------------+-------+| Field    | Type      | Null | Key | Default           | Extra |+----------+-----------+------+-----+-------------------+-------+| id       | int(11)   | NO   | PRI | NULL              |       || rank     | int(11)   | NO   |     | NULL              |       || log_time | timestamp | YES  |     | CURRENT_TIMESTAMP |       |+----------+-----------+------+-----+-------------------+-------+3 rows in set (0.00 sec)mysql> select count(*) from t1;+----------+| count(*) |+----------+|  1000000 |+----------+1 row in set (6.80 sec)

测试CSV文件:

t1.csv

MySQL 自身的loader: (时间24妙)

mysql> load data infile '/tmp/t1.csv' into table t1 fields terminated by ',' enclosed by '"' lines terminated by '\r\n';Query OK, 1000000 rows affected (24.21 sec)Records: 1000000  Deleted: 0  Skipped: 0  Warnings: 0

MySQL python 脚本:(时间23秒)

>>>

Running 23.289 Seconds

MySQL 自带mysqlimport:(时间23秒)

[root@mysql56-master ~]# time mysqlimport t_girl '/tmp/t1.csv' --fields-terminated-by=',' --fields-enclosed-by='"' --lines-terminated-by='\r\n' --use-threads=2 -uroot -proott_girl.t1: Records: 1000000  Deleted: 0  Skipped: 0  Warnings: 0real    0m23.664suser    0m0.016ssys     0m0.037s

PostgreSQL 自身COPY:(时间7秒)

t_girl=# copy t1 from '/tmp/t1.csv' with delimiter ',';COPY 1000000Time: 7700.332 ms

Psycopg2 驱动copy_to方法:(时间6秒)

[root@postgresql-instance scripts]# python load_data.pyRunning 5.969 Seconds.

Pgloader 导入CSV:(时间33秒)

[root@postgresql-instance ytt]# pgloader commands.load                         table name       read   imported     errors            time                        ytt.t1    1000000    1000000          0         33.514s------------------------------  ---------  ---------  ---------  --------------------------------------------  ---------  ---------  ---------  --------------             Total import time    1000000    1000000          0         33.514sPgloader 直接从MySQL 拉数据:(时间51秒)[root@postgresql-instance ytt]# pgloader commands.mysql                    table name       read   imported     errors            time               fetch meta data          2          2          0          0.138s------------------------------  ---------  ---------  ---------  --------------                            t1    1000000    1000000          0         51.136s------------------------------  ---------  ---------  ---------  --------------------------------------------  ---------  ---------  ---------  --------------------------------------------  ---------  ---------  ---------  --------------             Total import time    1000000    1000000          0         51.274s

附上commands.load和commands.mysql

commands.load:LOAD CSV    FROM '/tmp/ytt.csv' WITH ENCODING UTF-8        (            id, rank, log_time         )    INTO postgresql://t_girl:t_girl@127.0.0.1:5432/t_girl?ytt.t1   WITH skip header = 0,         fields optionally enclosed by '"',         fields escaped by backslash-quote,         fields terminated by ','     SET work_mem to '32 MB', maintenance_work_mem to '64 MB';commands.mysql:LOAD DATABASE      FROM mysql://python_user:python_user@192.168.1.131:3306/t_girl?t1     INTO postgresql://t_girl:t_girl@127.0.0.1:5432/t_girl?ytt.t1 with data only  SET maintenance_work_mem to '64MB',       work_mem to '3MB',       search_path to 'ytt';

附pgloader 手册:

http://pgloader.io/howto/pgloader.1.html