Õë¶Ô·ÖÇø±í£¬ÎÒÃÇҪΪÿһ¸ö·ÖÇø·Ö±ð×°ÔØÊý¾Ý£¬ÈçÏÂËùʾ£ºÎÒÃÇΪ sls_product_dim_part
·ÖÇø±í×°ÔØÁË product_line_code Ϊ 991£¬992,993 Èý¸ö·ÖÇøµÄÊý¾Ý£¬Big SQL
»áÔÚ sls_product_dim_part Ŀ¼ÏÂΪÿһ¸ö·ÖÇø´´½¨Ò»¸öĿ¼£º
Çåµ¥ 18. ×°ÔØ·ÖÇø±í
sls_product_dim_load_991.sql
load hive data local inpath '../samples/data/sls_product_prt_991.txt' overwrite into table gosalesdw. sls_product_dim_part partition (product_line_code=991);
sls_product_dim_load_992.sql
load hive data local inpath '../samples/data/sls_product_prt_992.txt' into table gosalesdw.sls_product_dim_part partition (product_line_code=992); ~
sls_product_dim_load_993.sql
load hive data local inpath '../samples/data/sls_product_prt_993.txt' into table gosaleSDW.sls_product_dim_part partition (product_line_code=993);
biadmin@imtebi1:/opt/ibm/biginsights/bin> hadoop fs -ls /biginsights/hive/warehouse/gosalesdw.db/sls_product_dim_part Found 3 items drwxr-xr-x - biadmin supergroup 0 2013-12-15 07:52 /biginsights/hive/warehouse/gosalesdw.db/sls_product_dim_part/product_line_code=991 drwxr-xr-x - biadmin biadmgrp 0 2013-12-15 07:52 /biginsights/hive/warehouse/gosalesdw.db/sls_product_dim_part/product_line_code=992 drwxr-xr-x - biadmin biadmgrp 0 2013-12-15 07:53 /biginsights/hive/warehouse/gosalesdw.db/sls_product_dim_part/product_line_code=993 |
ÎÒÃÇÒ²¿ÉÒÔͨ¹ýÖ´ÐÐ http://172.16.42.202:8080/
À´Æô¶¯ BigInsights Console À´²é¿´·ÖÇø±í´´½¨µÄĿ¼½á¹¹£¬ÈçÏÂËùʾ£º

ͼ 10. ʹÓà BigInsights
Console ²é¿´·ÖÇø±íĿ¼½á¹¹
Ŀǰ£¬Big SQL Ôݲ»Ö§³Ö Hive µÄ Bucketed table¡£
Big SQL ²»½öÖ§³Ö»ù±¾µÄÊý¾ÝÀàÐÍ£¬Ëü»¹Ö§³Ö Array Êý×é¼° Struct
½á¹¹¸´ÔÓÊý¾ÝÀàÐÍ¡£Êý×éÀàÐÍÊÇÓÉÒ»×éÏàͬµÄÊý¾ÝÀàÐÍ×é³É£¬ÎÒÃÇ¿ÉÒÔʹÓÃϱêÀ´·ÃÎÊÊý¾ÝÖеÄÔªËØ£¬Æäϱê´Ó 1
¿ªÊ¼£¬Èç phone[1]£»½á¹¹ÊÇÓÉһϵÁв»Í¬ÀàÐÍÊý¾Ý×é³É£¬ÎÒÃÇ¿ÉÒÔʹÓá±.¡±À´·ÃÎʽṹÖеÄÔªËØ£¬Èç address.city£¬Ä¿Ç°£¬²»Ö§³Ö½á¹¹µÄǶÌס£ÈçÏÂËùʾ£¬ÎÒÃÇ´´½¨ÁË
customer ±í£¬Ëü°üº¬Êý×é¼°½á¹¹Êý¾ÝÀàÐÍ£º
Çåµ¥ 19. ´´½¨¸´ÔÓÊý¾ÝÀàÐͱí
db25.sql
CREATE TABLE customer ( id int, name VARCHAR(100), phones ARRAY (20)>, address STRUCT(100), city:VARCHAR(100), state:VARCHAR(10), Country:VARCHAR(10),zip:VARCHAR(5)> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' COLLECTION ITEMS TERMINATED BY ':';
biadmin@imtebi1:/opt/ibm/biginsights/bigsql/samples/queries> jsqsh --user=biadmin --password=password --server localhost --port 7052 --driver=bigsql -i db25.sql 0 rows affected (total: 0.73s)
Data.txt
1|David Li|63618888-12345:63618888|PGP25F:Beijing:Beijing:China:100101 2|David Wang|63618888-12346:63618888|PGP25F:Beijing:Beijing:China:100101 3|Jason Zhang|63618888-12347:63618888|PGP25F:Beijing:Beijing:China:100101 4|David Yang|63618888-12348:63618888|PGP25F:Beijing:Beijing:China:100101 5|Jason Wang|63618888-12349:63618888|PGP25F:Beijing:Beijing:China:100101
loadcomp.sql
load hive data local inpath '../samples/data/data.txt' overwrite into table customer;
biadmin@imtebi1:/opt/> $BIGSQL_HOME/bin/jsqsh --user=biadmin --password=password --server localhost --port 7052 --driver=bigsql -i loadcomp.sql ok. (total: 15.17s)
biadmin@imtebi1:/opt/ibm/biginsights/bigsql/samples/queries> jsqsh --user=biadmin --password=password --server localhost --port 7052 --driver=bigsql JSqsh Release 1.5-ibm, Copyright (C) 2007-2013, Scott C. Gray Type \help for available help topics. Using JLine. [localhost][biadmin] 1> select name,phones[1],address.country from customer; +-------------+----------------+---------+ | name | | Country | +-------------+----------------+---------+ | David Li | 63618888-12345 | China | | David Wang | 63618888-12346 | China | | Jason Zhang | 63618888-12347 | China | | David Yang | 63618888-12348 | China | | Jason Wang | 63618888-12349 | China | +-------------+----------------+---------+ 5 rows in results(first row: 0.28s; total: 0.40s) |
¼ÓÔØ Hive ±í
Big SQL LOAD ÃüÁî¿ÉÖ±½Ó´Ó´æ´¢ÔÚ±¾µØ»ò BigInsights
·Ö²¼Ê½ÎļþϵͳÖеÄÎļþ»ò´Ó¶àÖÖ¹ØÏµÊý¾Ý¿âϵͳ£¨ÊÜ Netezza ¼¼ÊõÖ§³ÖµÄ IBM PureData
Systems for Analytics¡¢DB2 ºÍ Teradata£©ÖжÁÈ¡Êý¾Ý×°ÔØµ½ Hive ±íÖС£
ÈçÏÂËùʾ£¬ÎÒÃÇʹÓà Big SQL LOAD ÃüÁî´Ó±¾µØÎļþ×°ÔØÊý¾Ýµ½
Hive ±íÖУº
Çåµ¥ 20. ´ÓÎļþ×°ÔØÊý¾Ýµ½ Hive ±í
--GOSALESDW.SLS_SALES_FACT load hive data local inpath '../samples/data/GOSALESDW.SLS_ORDER_METHOD_DIM.txt' overwrite into table SLS_ORDER_METHOD_DIM; load hive data local inpath '../samples/data/GOSALESDW.SLS_SALES_FACT.txt' overwrite into table SLS_SALES_FACT; load hive data local inpath '../samples/data/GOSALESDW.SLS_ORDER_METHOD_DIM.txt' overwrite into table SLS_SALES_ORDER_DIM; |
ÎÒÃÇÒ²¿ÉÒÔ½« DB2 Êý¾Ý¿â staff ±íÖеÄÊý¾Ý×°ÔØµ½ÎÒÃÇ´´½¨µÄ staff_sales
·ÖÇø±íÖУ¬ÈçÏÂËùʾ£¬ÔÚʹÓùØÏµÊý¾Ý¿â×°ÔØÊý¾Ýµ½ Hive ±íʱ£¬ÎÒÃÇÐèÒª½«Ïà¹ØµÄ JAR °ü¸´ÖƵ½ /opt/ibm/biginsights/sqoop/lib
Ŀ¼Ï£¬²¢ÖØÆô Big SQL ·þÎñ¡£±¾´ÎÃüÁîÖУ¬ÎÒÃǽ« DB2 µÄ db2jcc.jar, db2jcc_license_cu.jar
Îļþ¸´ÖƵ½ /opt/ibm/biginsights/sqoop/lib£¬²¢ÖØÆô Big SQL ·þÎñ¡£
Çåµ¥ 21. ´ÓÊý¾Ý¿â×°ÔØÊý¾Ýµ½ Hive ±í
db21.sql
create table staff_sales ( id smallint, name varchar(100), years smallint, salary decimal, comm decimal) partitioned by (dept smallint, job char(5)) ;
LOAD USING JDBC CONNECTION URL 'jdbc:db2://imtebi1:50001/SAMPLE' WITH PARAMETERS (user = 'db2inst1',password='password') FROM TABLE STAFF WHERE "dept=66 and job='Sales'" INTO TABLE staff_sales PARTITION ( dept=66 , job='Sales') APPEND WITH LOAD PROPERTIES (bigsql.load.num.map.tasks = 1) ;
biadmin@imtebi1:/opt/ibm/biginsights/bigsql/samples/queries> jsqsh --user=biadmin --password=password --server localhost --port 7052 --driver=bigsql -i db21.sql 0 rows affected (total: 0.65s) 3 rows affected (total: 1m28.84s)
biadmin@imtebi1:/opt/ibm/biginsights/bigsql/samples/queries> jsqsh --user=biadmin --password=password --server localhost --port 7052 --driver=bigsql JSqsh Release 1.5-ibm, Copyright (C) 2007-2013, Scott C. Gray Type \help for available help topics. Using JLine. [localhost][biadmin] 1> select * from staff_sales; +-----+----------+-------+--------+------+------+-------+ | id | name | years | salary | comm | dept | job | +-----+----------+-------+--------+------+------+-------+ | 280 | Wilson | 9 | 78674 | 811 | 66 | Sales | | 310 | Graham | 13 | 71000 | 200 | 66 | Sales | | 320 | Gonzales | 4 | 76858 | 844 | 66 | Sales | +-----+----------+-------+--------+------+------+-------+ 3 rows in results(first row: 0.36s; total: 0.37s)
|
´´½¨¡¢¼ÓÔØ Hbase ±í
´´½¨ Hbase ±í
ÎÒÃÇ¿ÉÒÔͨ¹ýÖ´ÐÐ CREATE HBASE TABLE ÃüÁîÀ´´´½¨ HBase
±í£¬Big SQL ½«Ê¹ÓÃ×Ô¼º×¨ÊôµÄ HBase ´æ´¢¹ÜÀí³ÌÐòÀ´¹ÜÀí HBase Êý¾ÝµÄ´æ´¢¹ÜÀí£¬¸Ã´æ´¢¹ÜÀí³ÌÐòÌṩÁËÔöÇ¿µÄ
HBase ¹ÜÀí¹¦ÄÜ£¬°üÀ¨¿ÉÒÔ´´½¨×éºÏÐн¡¼°×éºÏ×ֶΣ¬´´½¨¸¨ÖúË÷Òý£¬Îª²»Í¬ÁÐÖ¸¶¨²»Í¬µÄѹËõËã·¨£¬Ìṩ¼òÃ÷µÄ
HBase ±í´´½¨Óï·¨µÈ¡£ÈçÏÂËùʾ£¬ÎÒÃÇ´´½¨ÁË SLS_HBASE_SALES_FACT ±í£º
Çåµ¥ 22. ´´½¨ HBase ±í
use gosalesdw; CREATE HBASE TABLE SLS_HBASE_SALES_FACT ( ORDER_DAY_KEY int, ORGANIZATION_KEY int, EMPLOYEE_KEY int, RETAILER_KEY int, RETAILER_SITE_KEY int, PRODUCT_KEY int, PROMOTION_KEY int, ORDER_METHOD_KEY int, SALES_ORDER_KEY int, SHIP_DAY_KEY int, CLOSE_DAY_KEY int, QUANTITY int, UNIT_COST decimal(19,2), UNIT_PRICE decimal(19,2), UNIT_SALE_PRICE decimal(19,2), GROSS_MARGIN double, SALE_TOTAL decimal(19,2), GROSS_PROFIT decimal(19,2) ) column mapping (key mapped by (ORDER_DAY_KEY,ORGANIZATION_KEY, EMPLOYEE_KEY, RETAILER_KEY, RETAILER_SITE_KEY, PRODUCT_KEY, PROMOTION_KEY, ORDER_METHOD_KEY) encoding binary, cf1:sales_order mapped by (SALES_ORDER_KEY, SHIP_DAY_KEY,CLOSE_DAY_KEY) separator '|' encoding string, cf2:sales_data mapped by (QUANTITY, UNIT_COST, UNIT_PRICE, UNIT_SALE_PRICE, GROSS_MARGIN, SALE_TOTAL, GROSS_PROFIT) separator '|' encoding string ) column family options (cf1 compression(gz) bloom filter( none) in memory, cf2 compression(gz) bloom filter( none) in memory) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; |
ÎÒÃÇ¿ÉÒÔΪ gosalesdw.SLS_HBASE_SALES_FACT
±í´´½¨¸¨ÖúË÷Òý£¬ÈçÏÂËùʾ£º
Çåµ¥ 23. ´´½¨ HBase ¸¨ÖúË÷Òý
CREATE INDEX index1 ON TABLE gosalesdw.SLS_HBASE_SALES_FACT(SALES_ORDER_KEY) AS 'HBASE'; |
ÎÒÃÇÒ²¿ÉÒÔʹÓà Hive ±¾ÉíÌṩµÄ HBase ´æ´¢¹ÜÀí³ÌÐòÀ´´´½¨ HBase
±í£¬µ«ÊÇ£¬Big SQL ²»ÄÜÖ±½Ó¶ÁÈ¡ÕâЩ±í£¬ÎÒÃÇÐèÒªÔÚ Big SQL Öд´½¨Íⲿ±íÀ´·ÃÎÊËü¡£ÔÚ Big
SQL ÖУ¬Ê¹ÓÃÍⲿ±íµÄÏÞÖÆÊDz»ÄÜÔÚÍⲿ±íÉÏ´´½¨¸¨ÖúË÷Òý¡£ÎÒÃÇ¿ÉÒÔΪÒѾ´æÔÚµÄ HBase ±í´´½¨¶à¸öÍⲿ±í£¬Ï൱ÓÚ´´½¨¶à¸öÊÓͼ¡£ÈçÏÂËùʾ£¬ÎÒÃÇ´´½¨ÁË
EXTERNAL_SALES_FACT HBase Íⲿ±í£º
Çåµ¥ 24. ´´½¨ HBase Íⲿ±í
CREATE external HBASE TABLE EXTERNAL_SALES_FACT ( ORDER_DAY_KEY int, ORGANIZATION_KEY int, EMPLOYEE_KEY int, RETAILER_KEY int, RETAILER_SITE_KEY int, PRODUCT_KEY int, PROMOTION_KEY int, ORDER_METHOD_KEY int, QUANTITY int, UNIT_COST decimal(19,2), UNIT_PRICE decimal(19,2), UNIT_SALE_PRICE decimal(19,2), GROSS_MARGIN double, SALE_TOTAL decimal(19,2), GROSS_PROFIT decimal(19,2) ) column mapping (key mapped by (ORDER_DAY_KEY,ORGANIZATION_KEY, EMPLOYEE_KEY, RETAILER_KEY, RETAILER_SITE_KEY, PRODUCT_KEY, PROMOTION_KEY, ORDER_METHOD_KEY) encoding binary, cf1:sales_data mapped by (QUANTITY, UNIT_COST, UNIT_PRICE, UNIT_SALE_PRICE, GROSS_MARGIN, SALE_TOTAL, GROSS_PROFIT) separator '|' encoding string ) hbase table name 'gosalesdw.sls_hbase_sales_fact'; |
¼ÓÔØ HBase ±í
ÎÒÃÇ¿ÉÒÔʹÓà insert Óï¾äΪ HBase ±í²åÈëÊý¾Ý£¬ÈçÏÂËùʾ£º
Çåµ¥ 25. ʹÓà insert Óï¾äΪ HBase ±í²åÈëÊý¾Ý
create hbase table hbase_staff ( id smallint, name varchar(100)) COLUMN MAPPING ( key mapped by (id), cf:name mapped by (name) ); insert into hbase_staff(id,name) values(202,'David Li'); |
ͬÑù£¬Big SQL LOAD ÃüÁî¿ÉÖ±½Ó´Ó´æ´¢ÔÚ±¾µØ»ò BigInsights
·Ö²¼Ê½ÎļþϵͳÖеÄÎļþ»ò´Ó¶àÖÖ¹ØÏµÊý¾Ý¿âϵͳ£¨ÊÜ Netezza ¼¼ÊõÖ§³ÖµÄ IBM PureData
Systems for Analytics¡¢DB2 ºÍ Teradata£©ÖжÁÈ¡Êý¾Ý×°ÔØµ½ HBase
±íÖС£
ÈçÏÂËùʾ£¬ÎÒÃÇʹÓà Big SQL LOAD ÃüÁî´Ó BigInsights
·Ö²¼Ê½ÎļþϵͳÖеÄÎļþ×°ÔØÊý¾Ýµ½ HBase ±íÖУº
Çåµ¥ 26. ´ÓÎļþ×°ÔØÊý¾Ýµ½ HBase ±í
load hbase data inpath 'hdfs://imtebi1.imte.com:9000/biginsights/GOSALESDW.SLS_SALES_FACT.txt' DELIMITED FIELDS TERMINATED BY '\t' into table gosalesdw.SLS_HBASE_SALES_FACT; |
ÎÒÃÇÒ²¿ÉÒÔ½« DB2 Êý¾Ý¿â staff ±íÖеÄÊý¾Ý×°ÔØµ½ÎÒÃÇ´´½¨µÄ hbase_staff
HBase ±íÖУ¬ÈçÏÂËùʾ£¬ÔÚʹÓùØÏµÊý¾Ý¿â×°ÔØÊý¾Ýµ½ HBase ±íʱ£¬ÎÒÃÇÐèÒª½«Ïà¹ØµÄ JAR °ü¸´ÖƵ½
/opt/ibm/biginsights/sqoop/lib Ŀ¼Ï£¬²¢ÖØÆô Big SQL ·þÎñ¡£±¾´ÎÃüÁîÖУ¬ÎÒÃǽ«
DB2 µÄ db2jcc.jar, db2jcc_license_cu.jar Îļþ¸´ÖƵ½ /opt/ibm/biginsights/sqoop/lib£¬²¢ÖØÆô
Big SQL ·þÎñ¡£
Çåµ¥ 27. ´ÓÊý¾Ý¿â×°ÔØÊý¾Ýµ½ HBase ±í
db22.sql
create hbase table hbase_staff ( id smallint, name varchar(100)) COLUMN MAPPING ( key mapped by (id), cf:name mapped by (name) );
LOAD USING JDBC CONNECTION URL 'jdbc:db2://imtebi1:50001/SAMPLE' WITH PARAMETERS (user = 'dbinst1',password = 'password' ) FROM TABLE STAFF COLUMNS (ID, NAME) WHERE " ID > 100 and NAME like 'S%' " INTO hbase TABLE hbase_staff APPEND WITH LOAD PROPERTIES (bigsql.load.num.map.tasks = 1);
biadmin@imtebi1:/opt/ibm/biginsights/bigsql/samples/queries> jsqsh --user=biadmin --password=password --server localhost --port 7052 --driver=bigsql -i db22.sql 0 rows affected (total: 5.58s) 3 rows affected (total: 1m34.47s)
biadmin@imtebi1:/opt/ibm/biginsights/bigsql/samples/queries> jsqsh --user=biadmin --password=password --server localhost --port 7052 --driver=bigsql JSqsh Release 1.5-ibm, Copyright (C) 2007-2013, Scott C. Gray Type \help for available help topics. Using JLine. [localhost][biadmin] 1> select * from hbase_staff1; +-----+----------+ | id | name | +-----+----------+ | 190 | Sneider | | 200 | Scoutten | | 220 | Smith | +-----+----------+ 3 rows in results(first row: 1.70s; total: 1.70s) |
ÎÒÃÇ»¹¿ÉÒÔͨ¹ý select Óï¾ä½« DB2 Êý¾Ý¿â staff ±íÖеÄÊý¾Ý×°ÔØµ½ÎÒÃÇ´´½¨µÄ
hbase_staff1 HBase ±íÖУ¬ÈçÏÂËùʾ£º
Çåµ¥ 28. ͨ¹ý select Óï¾ä×°ÔØÊý¾Ýµ½ HBase ±í
create hbase table hbase_staff1 ( id smallint, name varchar(100), dept smallint, job string, years smallint, salary string, comm string) COLUMN MAPPING ( key mapped by (ID), cf:name mapped by (name), cf:dept mapped by (dept), cf:job mapped by (job), cf:years mapped by (years), cf:salary mapped by (salary), cf:comm mapped by(comm) );
LOAD USING JDBC CONNECTION URL 'jdbc:db2://imtebi1:50001/SAMPLE' WITH PARAMETERS (user = 'db2inst1',password = 'password' ) FROM SQL QUERY 'select * from staff where $CONDITIONS' INTO hbase TABLE hbase_staff1 APPEND WITH LOAD PROPERTIES (bigsql.load.num.map.tasks = 1);
|
²éѯÊý¾Ý
Big SQL Ìṩ·ûºÏ ANSI SQL 92 ±ê×¼µÄ SQL ½Ó¿Ú£¬Ö§³ÖÔ¤²â¡¢ÏÞÖÆ¡¢Á¬½Ó¡¢ÁªºÏ¡¢ÅÅÐòºÍ·Ö×éÊý¾ÝµÄ
SELECT Óï¾ä¡£×Ó²éѯºÍ³£¼ûµÄ±í±í´ïʽ£¨ÒÔ WITH ×Ӿ俪ͷµÄ²éѯ£©Ò²Êܵ½Ö§³Ö¡£Big SQL ÌṩÁËÊýÊ®¸öÄÚÖõĺ¯Êý£¬°üÀ¨Ò»Ð©×¨ÓÃÓÚʵÏÖ
Hive ¼æÈÝÐԵĺ¯Êý¡£Ëü»¹Ö§³Ö´°¿Úº¯Êý¡£Èç¹ûÐèÒª£¬SQL ³ÌÐòÔ±»¹¿ÉÒÔÏÞ֯ij¸ö¸ø¶¨²éѯ·µ»ØµÄÐÐÊý¡£Big
SQL Ö§³Ö±ê×¼µÄ SQL Êý¾ÝÀàÐÍ£¬°üÀ¨ tinyint, smallint, bigint, varchar(),
binary(), decimal(), timestamp, struct, array µÈ¡£ÊìϤ SQL
µÄÓû§£¬¿ÉÒÔͨ¹ý Big SQL Ö±½Ó·ÃÎÊ»ùÓÚ Hadoop µÄ BigInsights ´óÊý¾Ý£¬¶ø²»ÐèÒªÐÞ¸ÄÒÑÓеÄÓ¦ÓóÌÐò£¬Îª
SQL Óû§·ÃÎÊ Hadoop ´óÊý¾ÝÌṩÁËÒ»¸ö¼ò±ã¡¢ÊìϤµÄ·½·¨¡£
ÁíÍ⣬Big SQL ÌṩÁ˱ê×¼µÄ JDBC/ODBC Çý¶¯³ÌÐò£¬Óû§ÒÑÓеÄ
SQL ¹ÜÀí¹¤¾ßÒ²¿ÉÒÔÖ±½Ó·ÃÎÊ BigInsights ´óÊý¾Ý¡£
Big SQL Ö§³Ö²éѯ²Ù×÷£¬µ«²»Ö§³Ö SQL UPDATE »ò DELETE
Óï¾ä¡£INSERT Óï¾ä½öÖ§³ÖÓÃÓÚ HBase ±í¡£
´Ë°æ±¾²»Ö§³ÖÊÓͼºÍÓû§¶¨ÒåµÄÔ¼Êø£¬Õâ¶þÕßÔÚ¹ØÏµÊý¾Ý¿âÖкܳ£¼û¡£²ÎÕÕÍêÕûÐÔÔ¼ÊøºÍÌØ¶¨ÓÚÓòµÄÔ¼ÊøÓ¦ÔÚÓ¦ÓóÌÐò¼¶±ðÉÏÖ´ÐС£ÎÞÐèʹÓÃ
GRANT ºÍ REVOKE Óï¾äÀ´ÏÞÖÆÊý¾Ý·ÃÎÊ£¬¹ÜÀíÔ±¿ÉʹÓñê×¼µÄ Hadoop ÃüÁîÖ¸¶¨ Hive
Êý¾ÝµÄÎļþϵͳ·ÃÎÊÌØÈ¨¡£Òò´Ë£¬Ó¦ÔÚ±í¼¶±ðÉÏ¿¼ÂÇÌØÈ¨£¬¶ø²»ÊÇÔÚÐлòÁм¶±ðÉÏ¡£
´«Í³ÊÂÎñ¹ÜÀí²»ÊÇ Hadoop Éú̬ϵͳµÄÒ»²¿·Ö£¬ËùÒÔ Big SQL
µÄÔËÐÐÎ´Éæ¼°µ½ÊÂÎñ»òËø¹ÜÀí£¬Õâ±íÃ÷Ìá½»ºÍ»Ø¹ö²Ù×÷²»ÊÜÖ§³Ö¡£
ÈçÏÂËùʾ£¬ÎÒÃÇ¿ÉÒÔʹÓà Big SQL Ö´Ðи´ÔӵĶà±íÁ¬½Ó²éѯ²Ù×÷£º
Çåµ¥ 29. ¶à±íÁ¬½Ó²éѯ
GOSALESDW_Counts_With_Joins.sql
SELECT count(*) FROM GOSALESDW.SLS_ORDER_METHOD_DIM AS MD, GOSALESDW.SLS_PRODUCT_DIM AS PD, GOSALESDW.EMP_EMPLOYEE_DIM AS ED, GOSALESDW.SLS_SALES_FACT AS SF WHERE PD.PRODUCT_KEY = SF.PRODUCT_KEY /*+ joinMethod = 'mapSideHash', buildTable = 'pd' +*/ AND PD.PRODUCT_NUMBER > 10000 AND PD.BASE_PRODUCT_KEY > 30 AND MD.ORDER_METHOD_KEY = SF.ORDER_METHOD_KEY /*+
joinMethod = 'mapSideHash', buildTable = 'md' +*/ AND ED.EMPLOYEE_KEY = SF.EMPLOYEE_KEY /*+ joinMethod = 'mapSideHash', buildTable = 'ed' +*/ AND ED.MANAGER_CODE1 > 20 AND MD.ORDER_METHOD_CODE > 5 ; |
ÎÒÃÇ¿ÉÒÔʹÓà with Óï¾äÖØÓà SQL ²éѯ»ò SQL ²éѯƬ¶ÎÀ´Ö´Ðиü¸´ÔӵIJéѯ²Ù×÷£¬²¢Ê¹ÓÃ
limit À´ÏÞÖÆÊä³ö½á¹û¼¯£¬ÈçÏÂËùʾ£º
Çåµ¥ 30. with Óï¾ä
GOSALESDW_starSchemaJoin.sql
WITH SALES AS (SELECT SF.* FROM GOSALESDW.SLS_ORDER_METHOD_DIM AS MD, GOSALESDW.SLS_PRODUCT_DIM AS PD, GOSALESDW.EMP_EMPLOYEE_DIM AS ED, GOSALESDW.SLS_SALES_FACT AS SF WHERE PD.PRODUCT_KEY = SF.PRODUCT_KEY /*+ joinMethod =
'mapSideHash', buildTable = 'pd' +*/ AND PD.PRODUCT_NUMBER > 10000 AND PD.BASE_PRODUCT_KEY > 30 AND MD.ORDER_METHOD_KEY = SF.ORDER_METHOD_KEY /*+
joinMethod = 'mapSideHash', buildTable = 'md' +*/ AND MD.ORDER_METHOD_CODE > 5 AND ED.EMPLOYEE_KEY = SF.EMPLOYEE_KEY /*+ joinMethod = 'mapSideHash', buildTable = 'ed' +*/ AND ED.MANAGER_CODE1 > 20), INVENTORY AS (SELECT IF.* FROM GOSALESDW.GO_BRANCH_DIM AS BD, GOSALESDW.DIST_INVENTORY_FACT AS IF WHERE IF.BRANCH_KEY = BD.BRANCH_KEY /*+ joinMethod = 'mapSideHash', buildTable = 'bd' +*/ AND BD.BRANCH_CODE > 20) SELECT SALES.PRODUCT_KEY AS PROD_KEY, SUM(CAST
(INVENTORY.QUANTITY_SHIPPED AS BIGINT)) AS INV_SHIPPED, SUM(CAST (SALES.QUANTITY AS BIGINT)) AS PROD_QUANTITY, RANK() OVER ( ORDER BY SUM(CAST (SALES.QUANTITY AS BIGINT)) DESC) AS PROD_RANK FROM SALES, INVENTORY WHERE SALES.PRODUCT_KEY = INVENTORY.PRODUCT_KEY GROUP BY SALES.PRODUCT_KEY Limit 20;
|
ÎÒÃÇ»¹¿ÉÒÔʹÓô°¿Úº¯ÊýÖ´Ðи´ÔÓµÄͳ¼Æ²éѯ£¬ÈçÏÂËùʾ£º
Çåµ¥ 31. ʹÓô°¿Úº¯Êý
GOSALESDW_rank.sql
SELECT EXTRACT(YEAR FROM CAST(CAST (order_day_key AS varchar(100)) AS timestamp)) AS year, SUM (sale_total) AS total_sales, RANK () OVER (ORDER BY SUM (sale_total) DESC) AS ranked_sales FROM gosalesdw.sls_sales_fact GROUP BY EXTRACT(YEAR FROM CAST(CAST (order_day_key AS varchar(100)) AS timestamp)); |
ÁíÍ⣬Big SQL »¹ÌṩÁËÒ»×éÔªÊý¾Ý±í£¬°üÀ¨ syscat.tables¡¢syscat.columns¡¢syscat.schemas¡¢syscat.indexcolumns¡£
ÈçÏÂËùʾ£¬ÎÒÃÇ¿ÉÒÔͨ¹ý²éѯÕâЩԪÊý¾Ý±íÀ´Á˽â Big SQL Öж¨ÒåµÄ±í¡¢ÁÐÐÅÏ¢£º
Çåµ¥ 32. Big SQL ÔªÊý¾Ý±í
[localhost][biadmin] 1> select * from syscat.tables where tablename='sls_sales_order_dim'; +------------+---------------------+ | schemaname | tablename | +------------+---------------------+ | gosalesdw | sls_sales_order_dim | +------------+---------------------+ 1 row in results(first row: 0.17s; total: 0.17s)
[localhost][biadmin] 1> select * from syscat.columns where tablename='sls_sales_order_dim'; +------------+------------+-------------+------ | schemaname | tablename | name | type | precision | scale | fulltype | +------------+------------+-------------+------ | gosalesdw | sls_sales_ | sales_order | INT | 10 | 0 | int | | | order_dim | _key | | | | | | gosalesdw | sls_sales_ | order_detai | INT | 10 | 0 | int | | | order_dim | l_code | | | | | | gosalesdw | sls_sales_ | order_numbe | INT | 10 | 0 | int | | | order_dim | r | | | | | | gosalesdw | sls_sales_ | warehouse_b | INT | 10 | 0 | int | | | order_dim | ranch_code | | | | | +------------+------------+-------------+----- 4 rows in results(first row: 2.8s; total: 2.10s) |
²éѯÓÅ»¯
±¾µØ²éѯ¼° MapReduce ÈÎÎñ
ÎÒÃÇÖªµÀ£¬Hadoop µÄ MapReduce ¿ò¼Ü¿ÉÒÔͨ¹ý·ÖÅä¶à¸ö Mapper¡¢Reducer
ÈÎÎñ²¢Ðд¦Àí´ó¹æÄ£Êý¾Ý¼¯À´ÊµÏÖ¸ßЧÊý¾Ý²éѯ¡£µ«ÊÇ£¬MapReduce Ò²»á²úÉúһЩÔËÐпªÏú£¬Ã¿·ÖÅäÒ»¸ö
mapper »ò reducer ÈÎÎñʱ£¬¶¼»áÉæ¼°µ½ JVM Æô¶¯¼°Í£Ö¹²Ù×÷£»mapper »á½«ÔËÐеÄÖмä½á¹ûдµ½´ÅÅÌ£¬reducer
ͨ¹ý¶ÁÈ¡´ÅÅÌÖмä½á¹û½øÐÐ×îÖÕÔËË㣬ÕâÑù¿ÉÒÔÖØÆô²¿·ÖʧЧ²Ù×÷£»ÈÎÎñÖ®¼äµÄµ÷¶ÈÒ²ÐèÒªÒ»¶¨µÄ¿ªÏú£¬Í¨³£À´½²£¬Ã¿¸öÈÎÎñ¶¼ÐèÒª¸ß´ï
20-30 ÃëµÄ¿ªÏú£¬Õâ¶ÔÓÚÒ»¸ö½Ï´óÊý¾Ý¼¯µÄ²Ù×÷£¬¿ÉÒÔºöÂÔ²»¼Æ£¬µ«¶ÔÓÚÒ»¸öСÊý¾Ý¼¯µÄ²Ù×÷£¬»òʹÓÃÐн¡À´²éÑ¯ÌØ¶¨ÐÐÊý¾Ý¼¯µÄ
HBase ²Ù×÷À´Ëµ£¬ÕâЩ¿ªÏúÏÔÈ»¾Í²»Ì«ºÏÊÊ£¬Îª½â¾öÉÏÊöÎÊÌ⣬Big SQL ¿ÉÒÔ¸ù¾Ý²éѯµÄÐÔÖÊ¡¢Êý¾ÝÁ¿ºÍÆäËûÒòËØ£¬Ñ¡ÔñʹÓÃ
Hadoop µÄ MapReduce ¿ò¼Ü²¢Ðд¦Àí¸÷ÖÖ²éѯÈÎÎñ£¬»òÕßÔÚµ¥¸ö½ÚµãÉ쵀 Big SQL ·þÎñÆ÷Éϱ¾µØÖ´ÐÐÄúµÄ²éѯ£¬
Ò²¿ÉÒÔ²¿·Ö²éѯ¹¤×÷ÔÚ Hadoop µÄ MapReduce ¿ò¼ÜÉÏÍê³É£¬²¿·Ö²éѯ¹¤×÷ÔÚ Big SQL
·þÎñÆ÷ÉÏÍê³É¡£
Big SQL ¿ÉÒÔÖ´ÐÐÈç϶¯Ì¬ÔËÐÐÓÅ»¯²ßÂÔÀ´Ìá¸ß²éѯÐÔÄÜ£º
Õë¶Ô¼òµ¥µÄ SQL ²éѯÓï¾ä£¬Èç SELECT c1,c2 FROM T1£¬¿ÉÒÔ×Ô¶¯Ñ¡Ôñ²ÉÓñ¾µØ²éѯ»¹ÊÇʹÓÃ
Hadoop µÄ MapReduce ¿ò¼ÜÔËÐÐ
Õë¶Ô²éѯµÄÿһ²½ÈÎÎñ£¬Èç¹ûËüµÄÊäÈëÊý¾Ý¼¯ºÜС£¬Ëü½«ÔÚ Big SQL ·þÎñÆ÷±¾µØÖ´ÐУ¬ÎÒÃÇ¿ÉÒÔͨ¹ýÉèÖÃ
$BIGSQL_HOME/conf/bigsql-site.xml ÎļþÖÐµÄ bigsql.localmode.size
ÊôÐÔÀ´½øÐпØÖÆ
Èç¹ûÒ»ÕűíºÜС£¬»áʹÓà map-side hash join Á¬½Ó·½Ê½Ìá¸ß²éѯЧÂÊ£¬ÎÒÃÇ¿ÉÒÔͨ¹ýÉèÖÃ
$BIGSQL_HOME/conf/bigsql-site.xml ÎļþÖÐµÄ bigsql.memoryjoin.size
ÊôÐÔÀ´½øÐпØÖÆ
Õë¶ÔÏà¶Ô¸´ÔÓ SQL ²éѯÓï¾ä£¬ÎÒÃÇ¿ÉÒÔͨ¹ýʹÓà SQL ²éѯÌáʾÀ´ÓÅ»¯²éѯЧÂÊ£¬±ÈÈ磬ÎÒÃÇ¿ÉÒÔÉèÖÃ
/*+ accessmode='local' +*/ ²éѯÌáʾ¸æËß Big SQL£¬Õë¶Ô t1 ±í²ÉÓñ¾µØ²éѯģʽ¡£ÈçÏÂËùʾ£º
Çåµ¥ 33. ²ÉÓñ¾µØ²éѯģʽ
SELECT c1 FROM t1 /*+ accessmode='local' +*/ WHERE c2 > 10 |
ÎÒÃÇÒ²¿ÉÒÔÉèÖà session ¼¶±ðµÄÓÅ»¯ÌáʾÀ´ÓÅ»¯²éѯЧÂÊ£¬ÈçÏÂËùʾ£º
Çåµ¥ 34. ÉèÖà session ¼¶±ðÓÅ»¯Ìáʾ
set force local on; SELECT c1 FROM t1 WHERE c2 > 10; |
´´½¨ HBase ¸¨ÖúË÷Òý
Big SQL Ö§³ÖʹÓà CREATE INDEX Óï¾äΪ HBase
´´½¨¸¨ÖúË÷Òý¡£ÕâЩË÷Òý¿É¸Ä½øÔÚ¼ÓÈëË÷ÒýµÄÁÐÉϽøÐйýÂ˵IJéѯµÄÔËÐÐÐÔÄÜ¡£Õë¶ÔË÷Òý×ֶεĹýÂ˲éѯ»á×Ô¶¯Ê¹ÓÃË÷Òý£¬ÎÒÃÇÒ²¿ÉÒÔͨ¹ý
SQL ²éѯÌáÊ¾Ç¿ÖÆÊ¹Óà HBase ¸¨ÖúË÷Òý£¬ÈçÏÂËùʾ£º
Çåµ¥ 35. HBase ¸¨ÖúË÷ÒýÌáʾ
CREATE INDEX index1 ON TABLE gosalesdw.SLS_HBASE_SALES_FACT(SALES_ORDER_KEY) AS 'HBASE';
select count(*) from gosalesdw.sls_hbase_sales_fact /*+ rowcachesize=2000,useindex='index1' +*/ where sales_order_key<195000; |
ÎÒÃÇ¿ÉÒÔ»ùÓÚµ¥¸ö¼ü»ò¸´ºÏ¼üÀ´´´½¨ HBase ¸¨ÖúË÷Òý¡£
ÔÚ´´½¨ HBase ¸¨ÖúË÷Òýʱ£¬Ë÷Òý±»±£´æÔÚÏàÓ¦µÄ HBase Ë÷Òý±íÖУ¬Big
SQL ʹÓà Map Reduce index builder Ïß³ÌÀ´´´½¨²¢¼ÓÔØ HBase Ë÷Òý±í£¬Í¨¹ý
synchronous coprocessor Ïß³ÌÀ´Í¬²½Ë÷ÒýÐÅÏ¢£¬ÈçÏÂͼËùʾ£º

ͼ 11. HBase ¸¨ÖúË÷Òý¼Ü¹¹
µ±Ê¹Óà Big SQL ½«Êý¾Ý²åÈë HBase ±íÖлò½«À´×ÔÒ»¸öÎļþµÄÊý¾Ý¼ÓÔØµ½
HBase ±íÖÐʱ»á×Ô¶¯¸üÐÂËüµÄË÷Òý¡£µ«ÊÇ£¬ÔÚ BigInsights 2.1 ÖУ¬½«À´×ÔÔ¶³Ì¹ØÏµÊý¾Ý¿âµÄÊý¾Ý¼ÓÔØµ½
HBase ±íÖв»»á×Ô¶¯¸üбíµÄ¸¨ÖúË÷Òý¡£Ïà·´£¬¹ÜÀíÔ±ÐèÒª¶ªÆú²¢ÖØÐ´´½¨±ØÒªµÄË÷Òý¡£
SQL ÓÅ»¯Ìáʾ
Big SQL µÄ²éѯÓÅ»¯Æ÷»á¶¯Ì¬µØ²Î¿¼Ä³Ð©Í³¼ÆÐÅÏ¢À´È·¶¨Ò»ÖÖ¸ßЧµÄÊý¾Ý·ÃÎʲßÂÔ¡£µ«ÊÇ£¬ÔÚһЩÇé¿öÏ£¬Big
SQL ¿ÉÄÜûÓÐ×ã¹»µÄͳ¼ÆÐÅÏ¢¿ÉÓá£ÀýÈ磬ËüµÄ»ù´¡Êý¾ÝÔ´¿ÉÄÜδÌṩÕâЩÐÅÏ¢¡£ÔÚÕâЩÇé¿öÏ£¬ Big SQL
³ÌÐòÔ±½«ÓÅ»¯ÌáʾǶÈëÆä²éѯÖпÉÄÜÓÐËù°ïÖú£¬ÒòΪÕâô×ö¿Éʹ Big SQL Éú³É¸üºÃµÄÖ´Ðмƻ®¡£Ìáʾ¿ÉÓë²éѯִÐÐģʽ£¨±¾µØ»ò²¢ÐУ©¡¢Á¬½Ó·½·¨¡¢Ë÷ÒýʹÓõÈÏà¹Ø¡£
±íÁ¬½Ó˳Ðò
ÎÒÃÇÔÚ FROM ×Ó¾äÖÐÌṩµÄ±íµÄ˳Ðò»á×öΪÒþº¬ÌáʾÀ´¾ö¶¨±íÁ¬½ÓµÄ²ßÂÔ¡£ÔÚ´ó¶àÊýÇé¿öÏ£¬Big
SQL »á°´ÕÕÎÒÃÇÌṩµÄ˳Ðò½øÐбíÁ¬½Ó²Ù×÷¡£Òò´Ë£¬ÎÒÃÇËùÖ¸¶¨µÄ±íµÄ˳ÐòÖÁ¹ØÖØÒª¡£µ±ÎÒÃÇÖ¸¶¨±íµÄ˳Ðòʱ£¬ÐèÒª¿¼ÂÇ£º¾¡¿ÉÄÜÔçµØÏÞÖÆ½á¹û¼¯¡£ÎÒÃÇÐèÒª½«¸ßÑ¡ÔñÐÔµÄ±í£¨ÅжÏν´Ê»á¹ýÂ˵ô´óÁ¿Êý¾Ý£¬»òÕß±íÁ¬½Ó²Ù×÷»áÈ¥³ý´óÁ¿Êý¾Ý£©·Åµ½²éѯµÄǰ±ß£¬ÕâÑù»á¼õÉÙÊä³öµ½²éѯÏÂÒ»½×¶ÎµÄÊý¾ÝÁ¿¡£
±í·ÃÎÊÌáʾ
±í·ÃÎÊÌáʾ»áÓ°ÏìÈçºÎ¶ÁÈ¡±í¡¢ÈçºÎÈ·¶¨Êý¾ÝÔ´ÒÔ¼°ÈçºÎÓÅ»¯²éѯµÄ²ßÂÔ¡£±í·ÃÎÊÌáʾ»áÓ°Ïì¶Á±íµÄÐÐΪ£¬ÊÇʹÓÃ
MapReduce Ö´ÐбíÁ¬½Ó²Ù×÷»¹ÊÇʹÓà hash Á¬½Ó²Ù×÷¡£Ëü»¹»áÓ°ÏìÈçºÎ¶ÁÈ¡ÌØ¶¨Êý¾ÝÔ´µÄ²ßÂÔ¡£
accessmode Ìáʾ
accessmode ÌáʾÓÃÀ´¾ö¶¨ÊÇʹÓà MapReduce À´´¦ÀíÊý¾Ý£¬»¹ÊÇÔÚ
Big SQL ·þÎñÆ÷±¾µØ´¦ÀíÊý¾Ý¡£accessmode Ìáʾ֧³ÖÏÂÁÐÑ¡Ôñ£º
Local - Ö¸¶¨ÔÚ Big SQL ·þÎñÆ÷±¾µØ´¦ÀíÊý¾Ý£¬²»²ÉÓà mapreduce
´¦Àí·½Ê½¡£
mapreduce »ò mr ¨C Ö¸¶¨Ê¹Óà MapReduce À´´¦ÀíÊý¾Ý¡£MapReduce
ÊÇ Big SQL ĬÈϵĴ¦Àí·½Ê½£¬Èç¹ûÈκÎÊý¾ÝÔ´µÄ´¦Àí·½Ê½²»Ö§³Ö MapReduce£¬Big SQL
»á×Ô¶¯Ñ¡Ôñ Big SQL ·þÎñÆ÷±¾µØ´¦Àí·½Ê½¡£
ϱߵÄÀý×ÓÖУ¬accessmode ÌáÊ¾Ç¿ÖÆ Big SQL ½«Õû¸ö T1
±í˳Ðò¶ÁÈë Big SQL ·þÎñÆ÷±¾µØÄÚ´æ²¢Ö´ÐоۺϲÙ×÷¡£
Çåµ¥ 36. accessmode Ìáʾ
SELECT c1,count(*) from T1 /*+ accessmode='local' +*/ group by c1; |
accessmode= ¡® local ¡¯Ìáʾһ°ãÓÃÓÚÒÔϳ¡¾°£º
Èç¹û±í×㹻С£¬´´½¨ MapReduce ×÷ÒµµÄ¿ªÏúÔ¶´óÓÚ˳Ðò¶ÁÈë Big
SQL ·þÎñÆ÷±¾µØÄÚ´æ½øÐд¦ÀíµÄ¿ªÏú¡£
ʹÓà TOP »ò LIMIT ×Ó¾ä´ó·ù¶È¼õÉÙÊý¾Ý·µ»Ø½á¹û¼¯µÄ´óС¡£×¢Ò⣬Èç¹ûʹÓÃÁË
GROUP BY »ò ORDER BY ×Ӿ䣬»áÉæ¼°µ½È«±í²Ù×÷£¬´Ëʱ£¬»áÓ°Ïì TOP »ò LIMIT ×Ó¾äµÄÔ¤ÆÚЧ¹û¡£
Big SQL ·þÎñÆ÷±¾µØÓÐ×ã¹»µÄÄÚ´æÀ´ÔËÐвéѯ¡£ÔÚÉϱߵÄÀý×ÓÖУ¬¾ÛºÏ²Ù×÷»áÍêÈ«ÔÚ
Big SQL ·þÎñÆ÷±¾µØÄÚ´æÖÐÍê³É£¬Í¬Ñù£¬Èç¹ûʹÓÃÁË ORDER BY ×Ӿ䣬Êý¾Ý»áÔÚÄÚ´æÖÐÅÅÐòÖ®ºó·µ»Ø£¬Ò²ÐèÒª
Big SQL ·þÎñÆ÷±¾µØÓÐ×ã¹»µÄÄÚ´æ¡£
×¢Ò⣬µ±Ê¹Óà accessmode= ¡® local ¡¯Ìáʾʱ£¬¿ÉÄÜҪʹÓôóÁ¿µÄ±¾µØÄÚ´æ£¬ÌØ±ðÊǵ±¶à¸öÓû§²¢·¢Ö´Ðжà¸ö²éѯʱ¡£
ÎÒÃÇÒ²¿ÉÒÔ²ÉÓÃÈçÏ accessmode ÌáÊ¾Ç¿ÖÆ²ÉÓà MapReduce
·½Ê½À´´¦ÀíÊý¾Ý£¬ÈçÏÂËùʾ :
Çåµ¥ 37. ²ÉÓà MapReduce ·½Ê½
select * from gosalesdw.sls_sales_order_dim /*+ accessmode = 'mr' +*/ where sales_order_key<100020; |
´ËÍ⣬ÎÒÃÇ»¹¿ÉÒÔÉèÖà session ¼¶±ð accessmode Ìáʾ£¬Ëü»áÔÚÕû¸ö»á»°¼¶±ðÉúЧ£¬ÈçÏÂËùʾ£º
Çåµ¥ 38. »á»°¼¶ accessmode Ìáʾ
set force local on; select * from gosalesdw.sls_sales_order_dim where sales_order_key<100020; |
¶à±íÁ¬½Ó¼°±¾µØ·ÃÎÊÌáʾ
µ±Óжà¸ö±í½øÐÐÁ¬½Ó²Ù×÷ʱ£¬±¾µØ·ÃÎÊģʽ£¨accessmode= ¡¯ local
¡¯£©»áÇ¿ÖÆºóÐø²ÎÓëÁ¬½ÓµÄ±í²ÉÓñ¾µØ·ÃÎÊģʽ¡£ ÈçÏÂËùʾ£¬ÎÒÃÇÉèÖà t1 ±íΪ±¾µØ·ÃÎÊģʽ£¬µ¼ÖÂÕû¸ö SQL
Óï¾ä¶¼»áÒÔ±¾µØ·½Ê½ÔËÐУ¬¼´Ê¹ÎÒÃÇÏÔʽָ¶¨ t2 ±íÒÔ mapreduce ģʽÔËÐУº
Çåµ¥ 39. ¶à±íÁ¬½Ó±¾µØ·ÃÎÊÌáʾ
SELECT t1.c1, count(*) FROM t1 /*+ accessmode='local' +*/, t2 WHERE t1.c2 = t2.c2 GROUP BY t1.c1; |
µ±²éѯ²Ù×÷Éæ¼°¶àÓÚÁ½¸ö±íµÄÁ¬½Ó²Ù×÷ʱ£¬Èç¹û²»Äܰ´ÕÕÎÒÃÇÖ¸¶¨µÄ˳Ðò½øÐбíÁ¬½Ó²Ù×÷»áµ¼ÖÂÖØÐµ÷Õû±íµÄÁ¬½Ó˳Ðò£¬Õâʱ£¬¿ÉÄܻᵼÖ±»±ê¼ÇΪ±¾µØ·ÃÎÊÌáʾµÄ±íÖ®ºóµÄ±í²ÉÓÃ
MapReduce ·½Ê½ÔËÐУ¬ÎÒÃÇ¿ÉÒÔͨ¹ýÉèÖÃËùÓбíΪ±¾µØ·ÃÎÊģʽ£¨accessmode= ¡¯ local
¡¯£©À´Ç¿ÖÆÕû¸ö²éѯÒÔ±¾µØ·½Ê½ÔËÐС£
tablesize Ìáʾ
µ±±íÁ¬½Ó²Ù×÷ÖÐÉæ¼°µ½Ð¡±íʱ£¬Big SQL ÍùÍù»áʹÓÃÒ»¸ö MapReduce
×÷ÒµÀ´´¦Àí¿ÉÒÔÔÚÄÚ´æÖнøÐÐÁ¬½Ó²Ù×÷µÄ±í´Ó¶øÌá¸ß²éѯЧÂÊ¡£ ÈçÏÂËùʾ£¬Í¨³££¬Õâ¸ö²éѯÐèÒªÁ½¸ö MapReduce
×÷Òµ£ºÒ»¸öÓÃÓÚ±íÁ¬½Ó²Ù×÷£¬Ò»¸öÓÃÓÚ·Ö×é²Ù×÷£º
Çåµ¥ 40. ±íÁ¬½Ó²Ù
SELECT c1, count(*) from T1, T2 where T1.c2 = T2.c2 GROUP BY c1; |
Èç¹û T2 ±íºÜС£¬ÎÒÃÇ¿ÉÒÔʹÓÃÈçÏÂÓï¾äÀ´Ìá¸ß²éѯЧÂÊ£º
Çåµ¥ 41. tablesize Ìáʾ
SELECT c1, count(*) FROM T1, T2 /*+ tablesize='small' +*/ where T1.c2 = T2.c2 GROUP BY c1; |
¸ÃÓï¾äֻʹÓÃÒ»¸ö MapReduce ×÷ÒµÀ´É¨Ãè T1 ±í£¬Ã¿Ò»¸ö mapper
ÈÎÎñ»áÔÚÄÚ´æÖÐÖ´ÐÐºÍ T2 ±íÁ¬½Ó²Ù×÷£¬·Ö×é²Ù×÷ÔÚͬһ¸öÈÎÎñµÄ reduce ½×¶ÎÍê³É¡£
tablesize ÌáʾÓÃÀ´Ö¸¶¨²ÉÓÃÄÚ´æÁ¬½Ó£¨hash join£©·½Ê½»¹ÊDzÉÓÃ
MapReduce Ö´ÐбíÁ¬½Ó²Ù×÷¡£tablesize Ìáʾ֧³ÖÒÔÏÂÑ¡Ôñ£º
small ¨C Ö¸¶¨¸Ã±í×㹻С£¬¿ÉÒÔ×°ÈëÄÚ´æ²¢Ö´ÐÐ hash ²Ù×÷¡£
large ¨C ĬÈÏÉèÖã¬Ëü±ÜÃâ²ÉÓÃÄÚ´æÁ¬½Ó£¨hash join£©·½Ê½¡£
ϱßÀý×Ó£¬ÎÒÃÇÖ¸¶¨ GOSALESDW.GO_BRANCH_DIM ±íÊÇÒ»¸ö´ó±í£¬Òò´Ë£¬»á±ÜÃâ²ÉÓÃÄÚ´æÁ¬½Ó£¨hash
join£©·½Ê½£º
Çåµ¥ 42. ±ÜÃâ²ÉÓÃÄÚ´æÁ¬½Ó
SELECT count(*) FROM GOSALESDW.GO_BRANCH_DIM /*+ tablesize='large' +*/ AS BD , GOSALESDW.DIST_INVENTORY_FACT AS IF WHERE IF.BRANCH_KEY = BD.BRANCH_KEY AND BD.BRANCH_CODE > 20; |
ϱßÀý×Ó£¬ÎÒÃÇÖ¸¶¨ GOSALESDW.GO_BRANCH_DIM ±íÊÇÒ»¸öС±í£¬Òò´Ë£¬»á¾¡Á¿²ÉÓÃÄÚ´æÁ¬½Ó£¨hash
join£©·½Ê½£º
Çåµ¥ 43. ²ÉÓÃÄÚ´æÁ¬½Ó
SELECT BD.BRANCH_KEY,count(*) FROM GOSALESDW.GO_BRANCH_DIM /*+ tablesize='small' +*/ AS BD , GOSALESDW.DIST_INVENTORY_FACT AS IF WHERE IF.BRANCH_KEY = BD.BRANCH_KEY AND BD.BRANCH_CODE > 20 GROUP BY BD.BRANCH_KEY; |
HBase Ìáʾ
Õë¶Ô HBase£¬Big SQL ÌṩÈçÏ SQL ÓÅ»¯Ìáʾ£º
1.rowcachesize Ìáʾ
2.colbatchsize Ìáʾ
3.useindex Ìáʾ
rowcachesize Ìáʾ
rowcachesize ÌáʾÓÃÀ´Ö¸¶¨Êý¾Ý´Ó region servers
·µ»Øµ½ Big SQL ( ·µ»Ø¸ø Big SQL ·þÎñÆ÷±¾µØÔËÐлò mappers or reducers
²¢Ðд¦Àí ) ¼Ç¼¼¯µÄ´óС¡£Õâ¸öÉèÖÿÉÒÔÏÔÖøÌá¸ß HBase Êý¾Ý¿â²éѯµÄÐÔÄÜ¡£µ±È»£¬Èç¹û¸ÃÖµÉèÖùý¸ß£¬Ò²»áÏûºÄ¸ü¶àµÄ
region servers ¼°²éѯÔËÐб¾µØ£¨Big SQL ·þÎñÆ÷»ò mappers or reducers£©µÄÄÚ´æ¡£Èç¹ûûÓÐÉèÖã¬rowcachesize
ĬÈÏΪ 2000 ÐС£
ÈçÏÂËùʾ£¬ÎÒÃÇÉèÖà rowcachesize ´óСΪ 4000 ÐУº
Çåµ¥ 44. ÉèÖà rowcachesize ´óС
SELECT c1, c3 FROM hbase_tab /*+ rowcachesize=4000
+*/ WHERE c2 > 10;
×¢Ò⣬Õë¶Ô¿í±í£¬ÎÒÃÇ¿ÉÄÜÐèÒª¼õС rowcachesize ÉèÖÃÖµÀ´¼õÉÙÿ´Î¶ÁÈ¡µÄ´«ÊäÊý¾ÝÁ¿¡£
colbatchsize Ìáʾ
colbatchsize ÌáʾÖ÷ÒªÓÃÀ´±ÜÃâÒòijһ¸ö»òij¼¸¸öÌØ±ð´óµÄ×ֶζøÏûºÄÌ«¶àµÄÄÚ´æ¡£colbatchsize
Ìáʾָ¶¨Ã¿Ò»´Î¶ÁÈ¡²Ù×÷´Ó region server ´«ÊäµÄ×ֶθöÊý£¬Ä¬ÈÏΪ -1£¬±íʾ¶ÁÈ¡ËùÓеÄ×ֶΡ£
ÈçÏÂËùʾ£¬ÎÒÃÇÉèÖà colbatchsize ´óСΪ 5£º
Çåµ¥ 45. ÉèÖà colbatchsize ´óС
SELECT c1, c3 FROM hbase_tab /*+ colbatchsize=5 +*/ WHERE c2 > 10; |
useindex Ìáʾ
useindex ÌáʾÓÃÀ´Ö¸¶¨²éѯÓï¾äËùʹÓÃµÄ HBase ¸¨ÖúË÷ÒýÃû³Æ¡£ÌرðÊǵ±Óжà¸öºòÑ¡Ë÷Òý´æÔÚʱ£¬ÎÒÃÇ¿ÉÒÔÖ¸¶¨Ê¹ÓÃÄĸöÌØ¶¨Ë÷Òý¡£ÈçÏÂËùʾ£¬ÎÒÃÇΪ
gosalesdw.SLS_HBASE_SALES_FACT HBase ±í´´½¨ÁË index1 ¸¨ÖúË÷Òý£¬²¢Ö¸¶¨Ï±߲éѯÓï¾äʹÓÃÎÒÃÇ´´½¨µÄ
index1 Ë÷Òý£º
Çåµ¥ 46. ÉèÖà HBase ¸¨ÖúË÷Òý
CREATE INDEX index1 ON TABLE gosalesdw.SLS_HBASE_SALES_FACT(SALES_ORDER_KEY) AS 'HBASE';
select count(*) from gosalesdw.sls_hbase_sales_fact /*+ rowcachesize=2000,useindex='index1' +*/ where sales_order_key<195000; |
Èç¹û useindex Ìáʾָ¶¨µÄË÷Òý²»´æÔÚ£¬Big SQL ½«»á±¨´í¡£
ʹÓÃË÷Òý»áÏÔÖøÌá¸ß²éѯÐÔÄÜ£¬µ«ÊÇ£¬ÔÚ HBase ÖУ¬·ÃÎÊË÷ÒýÐèÒªÊ×ÏȲéÕÒË÷Òý±í£¬Ö®ºóÔÙ²éÕÒÊý¾Ý±í£¬µ±ÐèҪɨÃè´óÁ¿Êý¾Ýʱ£¬ÏµÍ³¿ªÏúÒ²±È½Ï´ó£¬ÕâÖÖÇé¿öÏ£¬²ÉÓÃ
MapReduce ×÷ҵɨÃèÊý¾Ý±í¿ÉÄÜЧÂʸü¸ß¡£
Join Ìáʾ
ÎÒÃÇ¿ÉÒÔʹÓà Join ÌáʾÀ´Ìá¸ß MapReduce ×÷ÒµµÄÖ´ÐÐЧÂÊ¡£
joinmethod Ìáʾ
joinmethod ÌáʾÓÃÀ´Ö¸¶¨Ê¹ÓñíÁ¬½Ó²Ù×÷µÄ·½Ê½¡£Ä¿Ç°£¬ joinmethod
Ìáʾֻ֧³Ö mapsidehash Öµ¡£joinmethod ÌáʾµÄ¡± mapsidehash¡±ÉèÖñÈ
tablesize ÌáʾµÄ"small"ÉèÖøü¾ßÌ壬Ëü¿ÉÒÔ¿ØÖÆÊ¹ÓÃÄÄЩ×ֶνøÐÐ Hash
Á¬½Ó²Ù×÷¡£
µ±ÎÒÃÇʹÓà joinmethod ÌáʾµÄ¡± mapsidehash¡±ÉèÖÃʱ£¬ÎÒÃÇҪͬʱʹÓÃ
buildtable ÌáʾÀ´Ö¸¶¨Õë¶ÔÄÄЩ±í½øÐÐ hash ²Ù×÷¡£ÈçÏÂËùʾ£¬¸ÃÓï¾äÓÃÀ´Ö¸¶¨Õë¶Ô T2 ±íµÄ
c2 ×ֶνøÐÐ Hash Á¬½Ó²Ù×÷£º
Çåµ¥ 47. ÉèÖà joinmethod Ìáʾ
SELECT a.c1 FROM T1 as a, T2 as b WHERE a.c2 = b.c2 /*+ joinmethod='mapsidehash', buildtable='b' +*/ AND a.c3 = b.c3; |
buildtable Ìáʾ
µ±ÎÒÃÇʹÓà joinmethod ÌáʾµÄ¡± mapsidehash¡±ÉèÖÃʱ£¬ÎÒÃÇҪͬʱʹÓÃ
buildtable ÌáʾÀ´Ö¸¶¨Õë¶ÔÄÄЩ±í½øÐÐ hash ²Ù×÷¡£Èç¹ûδͬʱָ¶¨¸ÃÌáʾ½«±¨´í¡£
Subquery Ìáʾ
Big SQL µÄ SQL ²éѯÒýÇæ¿ÉÒÔͨ¹ý¸ÄдÏà¹ØµÄ SQL Óï¾äÀ´Ìá¸ß²éѯÐÔÄÜ£¬ÌرðÊÇÕë¶Ô
SQL ×Ó²éѯ£¬ËüÍùÍù»á¸ÄдΪ±íÁ¬½Ó²Ù×÷£¬ ÈçÏÂËùʾ£¬Õë¶ÔϱߵIJéѯÓï¾ä£º
Çåµ¥ 48. ×Ó²éѯÓï¾ä
SELECT * FROM T1 WHERE c2 IN (SELECT c2 FROM T2 WHERE T2.c3 = T1.c3 AND t2.c4 = 'Fred'); |
Big SQL »á¸ÃдΪ£º
Çåµ¥ 49. ¸Ãд×Ó²éѯÓï¾ä
SELECT T1.* FROM T1, (SELECT c2, c3 FROM T2 WHERE t2.c4 = 'Fred') as D0 WHERE T1.c3 = D0.c3; |
ÔÚһЩÇé¿öÏ£¬×Ó²éѯµÄÊý¾ÝÁ¿ºÜС£¬±ÈÈç˵£¬100 ÐмǼ£¬Õâʱ£¬²»¸Äд¸ÃÓï¾äÍùÍùЧÂʸü¸ß£¬ÎÒÃÇ¿ÉÒÔͨ¹ý
rewrite Ìáʾ½ûÖ¹×Ó²éÑ¯ÖØÐ´¹¦ÄÜ£¬ÈçÏÂËùʾ£º
Çåµ¥ 50. ½ûÖ¹×Ó²éÑ¯ÖØÐ´
SELECT * FROM T1 WHERE c2 IN (/*+ rewrite=false +*/ SELECT c2 FROM T2 WHERE T2.c3 = T1.c3 AND t2.c4 = 'Fred'); |
Big SQL ·þÎñÆ÷ÓÅ»¯
ÄÚ´æÓÅ»¯
ĬÈÏÇé¿öÏ£¬Big SQL »á³¢ÊÔʹÓÃϵͳÄÚ´æµÄ 1/3£¬ÎÒÃÇ¿ÉÒÔͨ¹ýÉèÖû·¾³±äÁ¿À´¸Ä±äÄÚ´æÊ¹ÓÃÁ¿£¬ÓÅ»¯ÏµÍ³ÐÔÄÜ¡£²ÎÊýÉèÖÃÍê³Éºó£¬ÐèÒªÖØÐÂÆô¶¯
Big SQL ·þÎñÆ÷ʹÆäÉúЧ¡£
ÈçÏÂËùʾ£¬ÎÒÃÇ¿ÉÒÔʹÓà VERBOSE=true $BIGSQL_HOME/bin/bigsql
status ÃüÁî²é¿´µ±Ç° Big SQL ·þÎñÆ÷ÄÚ´æÊ¹ÓÃÇé¿ö£º
Çåµ¥ 51. ²é¿´ÄÚ´æÊ¹ÓÃÇé¿ö
biadmin@imtebi1:/opt/ibm/biginsights> VERBOSE=true $BIGSQL_HOME/bin/bigsql status Total memory on the machine = 3958600 KB Max memory allocated to bigsql is: 1288m BigSQL server is running (pid 8243) |
ÎÒÃÇ¿ÉÒÔͨ¹ýÈçÏ»·¾³±äÁ¿À´ÉèÖà Big SQL ·þÎñÆ÷ʹÓÃÄÚ´æµÄ×î´óÖµ¼°×îСֵ£¬²¢ÖØÆô
Big SQL ·þÎñÆ÷ÉúЧеÄÉèÖã¬ÈçÏÂËùʾ£º
Çåµ¥ 52. ÉèÖÃÄÚ´æÊ¹ÓÃÁ¿
/opt/ibm/biginsights> export BIGSQL_CONF_INSTANCE_INITIAL_MEM=2g
/opt/ibm/biginsights> export BIGSQL_CONF_INSTANCE_MAX_MEM=4g biadmin@imtebi1:/opt/ibm/biginsights> $BIGSQL_HOME/bin/bigsql stop BigSQL pid 12611 stopped.
biadmin@imtebi1:/opt/ibm/biginsights> $BIGSQL_HOME/bin/bigsql start BigSQL running, pid 12811.
/opt/ibm/biginsights> VERBOSE=true $BIGSQL_HOME/bin/bigsql status Max memory allocated to bigsql is: 4g BigSQL server is running (pid 12811) |
²ÉÓà Big SQL ·þÎñÆ÷±¾µØÔËÐÐģʽ£¬ÄÚ´æ´óСµÄÉèÖÃÖÁ¹ØÖØÒª£¬ÒòΪ±¾µØÔËËãÖ÷ҪʹÓÃÄÚ´æÀ´Íê³É¡£Èç¹û±¾µØÔËËãÉæ¼°µÄÊý¾Ý¿ÉÒÔ×°ÈëÄÚ´æÖУ¬½«¼«´óµØÌá¸ß²éѯЧÂÊ¡£Í¬Ê±£¬ÎÒÃÇÒ²Òª¿¼Âǵ½£¬Õë¶Ô¶à¸ö²¢·¢ÇëÇó£¬Ã¿Ò»¸öÇëÇó¶¼ÐèÒªÉêÇëÄÚ´æ½øÐб¾µØ´¦Àí¡£
·þÎñÆ÷ÉèÖÃ
ÔÚ Hadoop ÖУ¬ÎÒÃÇ¿ÉÒÔͨ¹ýÐÞ¸Ä×÷ÒµÊôÐÔÀ´µ÷ÓÅÔËÐÐʱÐÔÄÜ¡£¾¡¹Ü Big
SQL »á³¢ÊÔÑ¡Ôñ×îÓŵÄÊôÐÔ£¬Èç¹ûÐèÒªµÄ»°£¬ÎÒÃÇ¿ÉÒÔÔÚ·þÎñÆ÷»ò²éѯ¼¶±ðÉϵ÷ÕûÕâЩÊôÐÔ¡£ÔÚ $BIGSQL_HOME/conf/bigsql-site.xml
ÎļþÖаüº¬ÁË Big SQL ·þÎñÆ÷¼¶±ðµÄĬÈÏ×÷ÒµÊôÐÔÉèÖã¬ÎÒÃÇÒ²¿ÉÒÔʹÓà SET ÃüÁ̬µ÷Õû×÷ÒµÊôÐÔµÄÉèÖá£Ï±ߣ¬ÎÒÃǼòÒª½éÉܼ¸¸ö³£ÓõÄ
Big SQL ×÷ÒµÊôÐÔÉèÖ㬹ØÓÚ Big SQL ×÷ÒµÊôÐÔÖµµ÷ÕûµÄ¾ßÌåÄÚÈÝ£¬´ó¼Ò¿ÉÒԲο¼ Big SQL
ÐÅÏ¢ÖÐÐÄ¡£
Çåµ¥ 53. µ÷Õû Big SQL ×÷ÒµÊôÐÔÖµ
bigsql.reducers.autocalculate [true] |
ĬÈÏֵΪ True£¬Big SQL »á×Ô¶¯¼ÆËã MapReduce ×÷ҵʹÓÃ
reducers µÄÊýÁ¿¡£
bigsql.reducers.bytes.per.reducer [1GB]
|
ĬÈÏֵΪ 1GB£¬Ö¸¶¨Ã¿Ò»¸ö reducer ¿ÉÒÔ´¦ÀíµÄÊý¾ÝÁ¿¡£
bigsql.memoryjoin.size [10MB] |
ĬÈÏֵΪ 10MB£¬µ±±íµÄ´óСµÍÓÚ¸Ãֵʱ£¬Big SQL ½«Ê¹Óà map-side
memory join ·½Ê½¡£
bigsql.localmode.size [200MB] |
ĬÈÏֵΪ 200MB£¬µ±ËùÓбíµÄÊý¾ÝÁ¿Ð¡ÓÚ¸ÃÖµ£¬Big SQL ½«×Ô¶¯ÔÚ·þÎñÆ÷±¾µØÔËÐС£
ÎÒÃÇÒ²¿ÉÒÔͨ¹ý SET ÃüÁ̬¸Ä±ä×÷ÒµÊôÐÔÖµµÄÉèÖã¬ÈçÏÂËùʾ£º
Çåµ¥ 54. ¶¯Ì¬¸Ä±ä×÷ÒµÊôÐÔÖµ
biadmin@imtebi1:/opt/ibm/biginsights> $BIGSQL_HOME/bin/jsqsh -U biadmin -P password JSqsh Release 1.5-ibm, Copyright (C) 2007-2013, Scott C. Gray Type \help for available help topics. Using JLine.
[localhost][biadmin] 1> set bigsql.reducers.byte.per.reducer = 209715200; 1 row affected (total: 0.8s)
[localhost][biadmin] 1> SELECT * FROM T1 WHERE c2 IN (SELECT c2 FROM T2 WHERE T2.c3 = T1.c3 AND t2.c4 = 'Fred'); +----+----+----+ | c1 | c2 | c3 | +----+----+----+ | 2 | 2 | 2 | +----+----+----+ 1 row in results(first row: 14.69s; total: 14.69s)
|
|