2012年9月2日日曜日

カラム指向DB比較 - InfiniDB パフォーマンス検証の巻。


カラム指向DB比較 - MonetDB インストールの巻。
カラム指向DB比較 - MonetDB パフォーマンス検証の巻。 
カラム指向DB比較 - InfiniDB インストールの巻。
カラム指向DB比較 - InfiniDB(番外編) - はまりポイント挙げるの巻。
の続き。

Infinidb をインストールしたので MonetDB の時と同じクエリでパフォーマンスを検証してみる。

  1. 検証用DB作成
  2. 検証データ投入
  3. クエリによるパフォーマンス

実施環境

  • CloudCore
  • OS: Ubuntu 12.04
  • CPU: AMD Phenom(tm) 9550 Quad-Core Processor 2.2GHz
  • Memory: 2G


1. 検証用DB作成

まずはデータベースの作成。
$ idbmysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3485
Server version: 5.1.39 MySQL Community / Calpont InfiniDB Community 2.2.9-2 Final (COSS LA)

mysql> create database performance;
Query OK, 1 row affected (0.00 sec)

mysql> use performance;
Database changed

2. 検証データ投入
テストデータとして下記3ファイルをバルクロードする。
  • users: 100万件
  • payments: 5000万件
  • products: 26件
続いてテーブル作成。
# users
mysql> create table users(id int, prop1 int, prop2 int, prop3 int, prop4 varchar(8), registered_date date) engine=infinidb;
Query OK, 0 rows affected (0.81 sec)

# payments
mysql> create table payments(id int, user_id int, product_id int, quantity int, sale int, buy_at datetime) engine=infinidb;
Query OK, 0 rows affected (0.57 sec)

# products
mysql> create table products(id int, name varchar(8), price int) engine=infinidb;
Query OK, 0 rows affected (0.27 sec)

# fact
mysql> create table fact(user_id int, prop1 int, prop2 int, prop3 int, prop4 varchar(8), registered_date date, quantity int, sales int, first_buy_at datetime, last_buy_at datetime) engine=infinidb;
Query OK, 0 rows affected (0.44 sec)

mysql> show tables;
+-----------------------+
| Tables_in_performance |
+-----------------------+
| fact                  |
| payments              |
| products              |
| users                 |
+-----------------------+
4 rows in set (0.00 sec)
続いてデータ投入
# users
$ cpimport performance users /usr/local/var/db/data/no_header/users.csv -s ","
~省略~
2012-09-01 22:42:00 (16816) INFO : No of Read Threads Spawned = 1 # ここの設定がパフォーマンスに影響しそう
2012-09-01 22:42:00 (16816) INFO : No of Parse Threads Spawned = 3 # ここの設定がパフォーマンスに影響しそう
~省略~
2012-09-01 22:38:27 (16752) INFO : For table performance.users: 1000000 rows processed and 1000000 rows inserted.
2012-09-01 22:38:27 (16752) INFO : Bulk load completed, total run time : 5.50646 seconds

# payments
$ cpimport performance payments /usr/local/var/db/data/no_header/payments.csv -s ","
~省略~
2012-09-01 22:44:01 (16816) INFO : For table performance.payments: 50000000 rows processed and 50000000 rows inserted.
2012-09-01 22:44:01 (16816) INFO : Bulk load completed, total run time : 120.644 seconds

# products
$ cpimport performance products /usr/local/var/db/data/no_header/products.csv -s ","
~省略~
2012-09-01 22:45:18 (16863) INFO : For table performance.products: 26 rows processed and 26 rows inserted.
2012-09-01 22:45:18 (16863) INFO : Bulk load completed, total run time : 0.113074 seconds
1秒で約41万件以上(=5000万件 / 120s)インポートしている。相当速いが MonetDB よりは遅め。

3. クエリによるパフォーマンス検証
# all count: users
mysql> select count(id) from users;
+-----------+
| count(id) |
+-----------+
|   1000000 |
+-----------+
1 row in set (0.16 sec)

# all payments: count
mysql> select count(id) from payments;
+-----------+
| count(id) |
+-----------+
|  50000000 |
+-----------+
1 row in set (5.95 sec)

# all products: count
mysql> select count(id) from products;
+-----------+
| count(id) |
+-----------+
|        26 |
+-----------+
1 row in set (0.03 sec)

# 100 records: users
mysql> select * from users order by id limit 100;
+------+-------+-------+-------+--------+-----------------+
| id   | prop1 | prop2 | prop3 | prop4  | registered_date |
+------+-------+-------+-------+--------+-----------------+
|    1 |    41 |   857 |  7976 | prop_W | 2012-11-09      |
…
|  100 |    75 |   580 |  4855 | prop_X | 2011-05-14      |
+------+-------+-------+-------+--------+-----------------+
100 rows in set (3.03 sec)

# 100 records: payments
mysql> select * from payments order by id limit 100;
+------+---------+------------+----------+--------+---------------------+
| id   | user_id | product_id | quantity | sale   | buy_at              |
+------+---------+------------+----------+--------+---------------------+
|    1 |  817590 |          6 |       12 |  87960 | 2011-05-03 21:12:16 |
…
|  100 |  468994 |         14 |       39 | 300300 | 2012-08-29 07:10:37 |
+------+---------+------------+----------+--------+---------------------+
100 rows in set (2 min 9.46 sec)

# 100 records: products
mysql> select * from products order by id limit 100;
+------+--------+-------+
| id   | name   | price |
+------+--------+-------+
|    1 | item_A |  3800 |
…
|   26 | item_Z |  8240 |
+------+--------+-------+
26 rows in set (0.05 sec)

# a user: users
mysql> select * from users where id = 500000;
+--------+-------+-------+-------+--------+-----------------+
| id     | prop1 | prop2 | prop3 | prop4  | registered_date |
+--------+-------+-------+-------+--------+-----------------+
| 500000 |    23 |   371 |  5839 | prop_N | 2012-08-22      |
+--------+-------+-------+-------+--------+-----------------+
1 row in set (0.40 sec)

# a user: payments
mysql> select * from payments where user_id = 500000;
+----------+---------+------------+----------+--------+---------------------+
| id       | user_id | product_id | quantity | sale   | buy_at              |
+----------+---------+------------+----------+--------+---------------------+
|  1316372 |  500000 |          5 |       48 | 415680 | 2012-11-07 03:52:05 |
…
| 46098674 |  500000 |          9 |       32 |   7680 | 2011-04-01 11:17:47 |
+----------+---------+------------+----------+--------+---------------------+
40 rows in set (3.18 sec)

# group by: users
mysql> select
    ->   registered_date,
    ->   count(id) as uu
    -> from users
    -> group by registered_date
    -> order by registered_date;
+-----------------+------+
| registered_date | uu   |
+-----------------+------+
| 2010-01-01      |  974 |
…
| 2012-12-31      |  940 |
+-----------------+------+
1096 rows in set (0.42 sec)

# group by: payments
mysql> select
    ->   product_id,
    ->   count(id) as cnt,
    ->   sum(sale) as sale
    -> from payments
    -> group by product_id
    -> order by product_id;
+------------+---------+--------------+
| product_id | cnt     | sale         |
+------------+---------+--------------+
|          1 | 1922437 |  92240910320 |
…
|         26 | 1924741 | 189485558780 |
+------------+---------+--------------+
26 rows in set (17.75 sec)

# data join
mysql> select
    ->   usr.id as user_id,
    ->   pdt.name as product_name,
    ->   pmt.quantity,
    ->   pmt.sale,
    ->   pmt.buy_at
    -> from users as usr
    ->   left outer join payments as pmt
    ->   on
    ->     pmt.user_id = usr.id
    ->   left outer join products as pdt
    ->   on
    ->     pdt.id = pmt.product_id
    -> limit 100;
+---------+--------------+----------+--------+---------------------+
| user_id | product_name | quantity | sale   | buy_at              |
+---------+--------------+----------+--------+---------------------+
|  662139 | item_Q       |       42 | 399840 | 2012-03-24 07:13:09 |
…
|  823105 | item_F       |       25 | 183250 | 2011-10-26 21:08:21 |
+---------+--------------+----------+--------+---------------------+
100 rows in set (1.40 sec)

# data join & sort
mysql> select
    ->   usr.id as user_id,
    ->   pdt.name as product_name,
    ->   pmt.quantity,
    ->   pmt.sale,
    ->   pmt.buy_at
    -> from users as usr
    ->   left outer join payments as pmt
    ->   on
    ->     pmt.user_id = usr.id
    ->   left outer join products as pdt
    ->   on
    ->     pdt.id = pmt.product_id
    -> order by usr.id
    -> limit 100;
+---------+--------------+----------+--------+---------------------+
| user_id | product_name | quantity | sale   | buy_at              |
+---------+--------------+----------+--------+---------------------+
|       1 | item_V       |       13 |  30290 | 2011-08-21 01:30:40 |
…
|       2 | item_N       |       48 | 369600 | 2011-11-23 09:00:20 |
+---------+--------------+----------+--------+---------------------+
100 rows in set (3 min 56.20 sec)

# join count
mysql> select count(usr.id)
    -> from users as usr
    ->   left outer join payments as pmt
    ->   on
    ->     pmt.user_id = usr.id
    ->   left outer join products as pdt
    ->   on
    ->     pdt.id = pmt.product_id;
+---------------+
| count(usr.id) |
+---------------+
|      50000000 |
+---------------+
1 row in set (1 min 32.36 sec)

# delete a user: users
mysql> delete from users where id = 500000;
Query OK, 1 row affected (0.42 sec)

# delete a user: payments
mysql> delete from payments where user_id = 500000;
Query OK, 40 rows affected (3.27 sec)

# delete whole data
mysql> delete from users;
Query OK, 999999 rows affected (1.52 sec)
mysql> delete from payments;
Query OK, 49999960 rows affected (1 min 35.67 sec)
mysql> delete from products;
Query OK, 26 rows affected (0.32 sec)
select-insert が実施出来ないので select 文でデータ抽出&バルクロードという形で fact のデータ作成を実施。
# データ抽出
$ time idbmysql -u root -h localhost -D performance -e "
select
  usr.id,
  min(usr.prop1),
  min(usr.prop2),
  min(usr.prop3),
  min(usr.prop4),
  min(usr.registered_date),
  sum(pmt.quantity),
  sum(pmt.sale),
  min(pmt.buy_at),
  max(pmt.buy_at)
from users as usr
  left outer join payments as pmt
  on
    pmt.user_id = usr.id
group by usr.id;
" > result.tsv
 
real 4m24.022s
user 0m2.212s
sys 0m0.400s

# データ投入
cpimport performance fact ./result.tsv -s "\t"
~省略~
2012-09-01 23:53:32 (19885) INFO : For table performance.fact: 1000000 rows processed and 1000000 rows inserted.
2012-09-01 23:53:32 (19885) INFO : Bulk load completed, total run time : 8.37294 seconds

全体的に MonetDB よりもパフォーマンスは遅めなのかなという印象。ただ、クエリ実施中に cpu が 100% 近く達している事からもっと性能の良いマシンであれば結果は変わってくるかもしれない。またメモリの量も効いてくるはず。
あくまでも『今回の環境』だと上記の結果という事しか言えない。
※DB設定はインストール時のまま検証を行なっている。DWH用途にある程度のチューニングは実施済みだと思うが頑張ればもっとパフォーマンスは上がるかもしれない

0 件のコメント:

コメントを投稿