1. Hive Partitioned Tables
A Hive SELECT query generally scans the entire table, spending a lot of time on unnecessary work. Often only a portion of the table's data is of interest, which is why Hive introduces the notion of a partition at table-creation time: a partitioned table declares its partition space when the table is created.
Hive can manage data partitioned by one or more columns. The following example illustrates what partitioning means.
Internet applications today store large volumes of log files every day; several gigabytes, tens of gigabytes, or more are all possible. One attribute these logs necessarily carry is the date on which they were produced, so the data can be divided along the log-date column, making each day's logs one partition.
Organizing data into partitions mainly improves query speed. Which partition each stored record ends up in is decided by the user: when loading data, the user must explicitly specify which partition that batch of data belongs to.
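As a hedged sketch of the log example above (the table and column names here are hypothetical, not from the original text):

create table web_logs (ip string, request string)
partitioned by (log_date string);

load data local inpath '/tmp/logs-2013-08-16.txt'
into table web_logs partition (log_date = '2013-08-16');

Each distinct log_date value becomes its own directory under the table's directory, so a query filtered on log_date = '2013-08-16' only reads that day's files.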
1.1 Implementation Details
1. A table can have one or more partitions; each partition exists as a separate folder under the table's folder.
2. Table and column names are case-insensitive.
3. A partition appears as a field in the table schema: the describe table command shows the field, but the field does not store actual data; it is merely a marker for the partition (a pseudo-column).
1.2 Syntax
1. Create a partitioned table with ds as its partition column:
create table invites (id int, name string)
partitioned by (ds string)
row format delimited fields terminated by '\t'
stored as textfile;
2. Load data into the partition for 2013-08-16:
load data local inpath '/home/hadoop/Desktop/data.txt'
overwrite into table invites partition (ds='2013-08-16');
3. Load data into the partition for 2013-08-20:
load data local inpath '/home/hadoop/Desktop/data.txt'
overwrite into table invites partition (ds='2013-08-20');
4. Query data from one partition:
select * from invites where ds = '2013-08-12';
5. Insert data into one partition of a partitioned table:
insert overwrite table invites partition (ds='2013-08-12')
select id,max(name) from test group by id;
¿ÉÒԲ鿴·ÖÇøµÄ¾ßÌåÇé¿ö£¬Ê¹ÓÃÃüÁ
hadoop fs -ls /home/hadoop.hive/warehouse/invites
or:
show partitions tablename;
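With the two loads above, the output of show partitions would look roughly like this (illustrative, not captured from a real session):

hive> show partitions invites;
ds=2013-08-16
ds=2013-08-20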
2¡¢Hive Ͱ
¶ÔÓÚÿһ¸ö±í£¨table£©»òÕß·ÖÇø£¬ Hive¿ÉÒÔ½øÒ»²½×éÖ¯³ÉͰ£¬Ò²¾ÍÊÇ˵ͰÊǸüΪϸÁ£¶ÈµÄÊý¾Ý·¶Î§»®·Ö¡£HiveÒ²ÊÇ
Õë¶ÔijһÁнøÐÐͰµÄ×éÖ¯¡£Hive²ÉÓöÔÁÐÖµ¹þÏ££¬È»ºó³ýÒÔͰµÄ¸öÊýÇóÓàµÄ·½Ê½¾ö¶¨¸ÃÌõ¼Ç¼´æ·ÅÔÚÄĸöͰµ±ÖС£
There are two reasons to organize a table (or partition) into buckets:
(1) Higher query-processing efficiency. Buckets add extra structure to the table, and Hive can exploit that structure for some queries. In particular, a join of two tables bucketed on the same columns (including the join column) can be implemented efficiently as a map-side join. For a JOIN on a column the two tables share, if both tables have been bucketed on it, then only buckets holding the same column values need to be joined against each other, which can greatly reduce the volume of data the JOIN touches (see the sketch after this list).
(2) More efficient sampling. When working with large datasets, it is very convenient, while developing and refining queries, to be able to trial-run a query against a small slice of the data.
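To make reason (1) concrete, here is a hedged sketch of a bucketed map-side join; the table names are hypothetical, and hive.optimize.bucketmapjoin is the switch that enables the bucket-aware map join:

set hive.optimize.bucketmapjoin = true;
SELECT /*+ MAPJOIN(b) */ a.id, a.name, b.order_total
FROM bucketed_a a JOIN bucketed_b b ON a.id = b.id;

Because both tables are bucketed on the join column id, each mapper only has to pull the matching bucket of b rather than the whole table.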
1. Create a bucketed table:
create table bucketed_user(id int, name string)
clustered by (id) sorted by (name) into 4 buckets
row format delimited fields terminated by '\t'
stored as textfile;
First, let's see how to tell Hive that a table should be divided into buckets. We use the CLUSTERED BY clause to specify the column to bucket on and the number of buckets:
CREATE TABLE bucketed_user (id INT, name STRING) CLUSTERED BY (id) INTO 4 BUCKETS;
Here we use the user ID to determine the bucketing (Hive hashes the value and takes the remainder after dividing by the number of buckets). This way, any given bucket holds a random set of users. (PS: well, you could call it random, couldn't you?)
For a map-side join, the two tables are bucketed in the same way. A mapper processing a bucket of the left table knows that the matching rows of the right table are in the corresponding bucket, so it only needs to fetch that bucket (a small fraction of the data stored in the right table) to perform the join. This optimization does not strictly require equal bucket counts in the two tables; it also works when one bucket count is a multiple of the other. For joining two bucketed tables in HiveQL, see the "map joins" section (P400).
ͰÖеÄÊý¾Ý¿ÉÒÔ¸ù¾ÝÒ»¸ö»ò¶à¸öÁÐÁíÍâ½øÐÐÅÅÐò¡£ÓÉÓÚÕâÑù¶Ôÿ¸öͰµÄÁ¬½Ó±ä³ÉÁ˸ßЧµÄ¹é²¢ÅÅÐò(merge-sort),
Òò´Ë¿ÉÒÔ½øÒ»²½ÌáÉýmap¶ËÁ¬½ÓµÄЧÂÊ¡£ÒÔÏÂÓï·¨ÉùÃ÷Ò»¸ö±íʹÆäʹÓÃÅÅÐòͰ£º
CREATE TABLE bucketed_users (id INT, name STRING) CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS;
How do we make sure the data in a table really is bucketed? Loading data generated outside Hive into a bucketed table is certainly possible, but it is easier to let Hive do the bucketing, usually from an existing table. Hive does not check that the buckets in the data files are consistent with the table definition (in either the number of buckets or the bucketing column). If the two don't match, queries may raise errors or return undefined results. It is therefore advisable to let Hive perform the bucketing.
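For example, here is a sketch of the two loading paths (Hive never re-hashes rows during a LOAD):

-- Risky: LOAD DATA merely copies the file into the table directory;
-- the rows are NOT redistributed into buckets.
LOAD DATA LOCAL INPATH '/tmp/users.txt' INTO TABLE bucketed_users;

-- Safe: INSERT ... SELECT lets Hive hash every row into the right bucket.
INSERT OVERWRITE TABLE bucketed_users SELECT * FROM users;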
ÓÐÒ»¸öûÓл®·ÖͰµÄÓû§±í£º
hive> SELECT * FROM users;
0    Nat
2    Joe
3    Kay
4    Ann
2. Force output through multiple reducers:
To populate a bucketed table, set the hive.enforce.bucketing property to true.① Hive then knows to create the number of buckets declared in the table definition, and a plain INSERT command does the rest. Note that clustered by and sorted by have no effect on how data is imported; the user is responsible for the way data is loaded, including its bucketing and sorting. 'set hive.enforce.bucketing = true' automatically adjusts the reducer count of the final stage to match the number of buckets. A user can instead set mapred.reduce.tasks to match the bucket count, but 'set hive.enforce.bucketing = true' is the recommended approach.
3. Insert data into the table:
INSERT OVERWRITE TABLE bucketed_users SELECT * FROM
users;
Physically, each bucket is just one file in the table (or partition) directory. The file name is not significant, but bucket n is the nth file in lexicographic order. In fact, buckets correspond to MapReduce output-file partitions: a job produces as many buckets (output files) as it has reduce tasks. We can see this by inspecting the layout of the bucketed_users table we just created. Run the following command:
4. View the table's layout:
hive> dfs -ls /user/hive/warehouse/bucketed_users;
This shows that 4 new files were created, named as follows (the names contain timestamps generated by Hive, so they change from run to run):
attempt_201005221636_0016_r_000000_0
attempt_201005221636_0016_r_000001_0
attempt_201005221636_0016_r_000002_0
attempt_201005221636_0016_r_000003_0
The first bucket contains user IDs 0 and 4, because the hash of an INT is the integer itself, here taken modulo the number of buckets (4):②
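As a quick check of that rule with 4 buckets: 0 % 4 = 0 and 4 % 4 = 0, so users 0 (Nat) and 4 (Ann) land in the first bucket, while 2 % 4 = 2 (Joe) and 3 % 4 = 3 (Kay) fall into the third and fourth buckets, leaving the second bucket empty.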
5. Read the data and look at each file's contents:
hive> dfs -cat /user/hive/warehouse/bucketed_users/*0_0;
0    Nat
4    Ann
We can obtain the same result with the TABLESAMPLE clause, which restricts the query to a portion of the table's buckets rather than the whole table:
6. ¶ÔͰÖеÄÊý¾Ý½øÐвÉÑù£º
hive> SELECT * FROM bucketed_users
    > TABLESAMPLE(BUCKET 1 OUT OF 4 ON id);
0    Nat
4    Ann
ͰµÄ¸öÊý´Ó1¿ªÊ¼¼ÆÊý¡£Òò´Ë£¬Ç°ÃæµÄ²éѯ´Ó4¸öͰµÄµÚÒ»¸öÖлñÈ¡ËùÓеÄÓû§¡£ ¶ÔÓÚÒ»¸ö´ó¹æÄ£µÄ¡¢¾ùÔÈ·Ö²¼µÄÊý¾Ý¼¯£¬Õâ»á·µ»Ø±íÖÐÔ¼ËÄ·ÖÖ®Ò»µÄÊý¾ÝÐС£ÎÒÃÇ
Ò²¿ÉÒÔÓÃÆäËû±ÈÀý¶ÔÈô¸É¸öͰ½øÐÐÈ¡Ñù(ÒòΪȡÑù²¢²»ÊÇÒ»¸ö¾«È·µÄ²Ù×÷£¬Òò´ËÕâ¸ö ±ÈÀý²»Ò»¶¨ÒªÊÇͰÊýµÄÕûÊý±¶)¡£ÀýÈ磬ÏÂÃæµÄ²éѯ·µ»ØÒ»°ëµÄͰ£º
7. Sample half of the buckets:
hive> SELECT * FROM bucketed_users
    > TABLESAMPLE(BUCKET 1 OUT OF 2 ON id);
0    Nat
4    Ann
2    Joe
Because the query only has to read the buckets that match the TABLESAMPLE clause, sampling a bucketed table is a very efficient operation. Sampling a non-bucketed table with the rand() function, by contrast, scans the entire input dataset even when only a tiny sample is needed:

hive> SELECT * FROM users
    > TABLESAMPLE(BUCKET 1 OUT OF 4 ON rand());
2    Joe
① As of Hive 0.6.0. In earlier versions, you must set mapred.reduce.tasks to the number of buckets the table is to be filled with. If the buckets are sorted, you also need to set hive.enforce.sorting to true.
② When the raw file is displayed, the fields run together, because the delimiter is a non-printing control character.
3. A complete worked example:
(1) Create the student and student1 tables:
create table student(id INT, age INT, name STRING)
partitioned by (stat_date STRING)
clustered by (id) sorted by (age) into 2 buckets
row format delimited fields terminated by ',';

create table student1(id INT, age INT, name STRING)
partitioned by (stat_date STRING)
clustered by (id) sorted by (age) into 2 buckets
row format delimited fields terminated by ',';
£¨2£©ÉèÖû·¾³±äÁ¿£º
set hive.enforce.bucketing = true;
(3) Insert data:
cat bucket.txt
1,20,zxm
2,21,ljz
3,19,cds
4,18,mac
5,22,android
6,23,symbian
7,25,wp

LOAD DATA local INPATH '/home/lijun/bucket.txt' OVERWRITE INTO TABLE student partition(stat_date="20120802");

from student
insert overwrite table student1 partition(stat_date="20120802")
select id,age,name where stat_date="20120802" sort by age;
(4) Check the file directory:
hadoop fs -ls /hive/warehouse/test.db/student1/stat_date=20120802
Found 2 items
-rw-r--r--   2 lijun supergroup   31 2013-11-24 19:16 /hive/warehouse/test.db/student1/stat_date=20120802/000000_0
-rw-r--r--   2 lijun supergroup   39 2013-11-24 19:16 /hive/warehouse/test.db/student1/stat_date=20120802/000001_0
(5) Check the sampled data:
hive> select * from student1 tablesample(bucket 1 out of 2 on id);
Total MapReduce jobs = 1
Launching Job 1 out of 1
.......
OK
4    18    mac        20120802
2    21    ljz        20120802
6    23    symbian    20120802
Time taken: 20.608 seconds
Note: tablesample is the sampling clause. Syntax: TABLESAMPLE(BUCKET x OUT OF y). y must be a multiple or a factor of the table's total bucket count; Hive decides the sampling proportion from y. For example, if the table is split into 64 buckets, y=32 samples (64/32 =) 2 buckets' worth of data, while y=128 samples (64/128 =) 1/2 of one bucket. x indicates which bucket to start sampling from. For example, with 32 buckets in total, tablesample(bucket 3 out of 16) samples (32/16 =) 2 buckets of data: the 3rd bucket and the (3+16 =) 19th.
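As a hedged illustration of that rule in query form (assuming a table t bucketed 32 ways on id):

SELECT * FROM t TABLESAMPLE(BUCKET 3 OUT OF 16 ON id);
-- reads 32/16 = 2 of the 32 buckets: bucket 3 and bucket 3+16 = 19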
1¡¢ÃüÁîÐвÙ×÷
(1) Print query headers; this must be enabled explicitly:
set hive.cli.print.header=true;
(2) Everything after "--" is treated as a comment, but the CLI does not parse comments; a file containing comments can only be executed this way:
hive -f script_name
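A minimal sketch of such a script (the file name is hypothetical):

-- report.q: this comment is fine under hive -f, but the interactive CLI would choke on it
SELECT * FROM mytable LIMIT 3;

hive -f report.q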
(3) -e takes a quoted Hive command or query; -S suppresses extraneous output:
hive -S -e "select * FROM mytable
LIMIT 3" > /tmp/myquery
(4) A query that traverses every partition turns into one enormous MapReduce job. If you have very many partitions and directories, it is therefore advisable to use strict mode: whenever partitions exist, a WHERE clause must be specified.
hive> set hive.mapred.mode=strict;
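For example (a hedged sketch reusing the partitioned invites table from the first section):

set hive.mapred.mode=strict;
-- rejected in strict mode: no partition filter on a partitioned table
SELECT * FROM invites;
-- accepted: the WHERE clause prunes the scan to a single partition
SELECT * FROM invites WHERE ds = '2013-08-16';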
(5) Show the current database:
set hive.cli.print.current.db=true;
£¨6£©ÉèÖà Hive Job ÓÅÏȼ¶
set mapred.job.priority=VERY_HIGH | HIGH | NORMAL | LOW | VERY_LOW
(VERY_LOW=1,LOW=2500,NORMAL=5000,HIGH=7500,VERY_HIGH=10000)
set mapred.job.map.capacity=M    (run at most M map tasks at the same time)
set mapred.job.reduce.capacity=N    (run at most N reduce tasks at the same time)
£¨7£©Hive ÖеÄMapper¸öÊýµÄÊÇÓÉÒÔϼ¸¸ö²ÎÊýÈ·¶¨µÄ£º
mapred.min.split.size, mapred.max.split.size, dfs.block.size
splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
The number of maps also depends on the number of input files: with 2 input files, 2 maps are produced even if their combined size is below the block size.
mapred.reduce.tasks sets the number of reducers.
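A quick worked example of the formula, with assumed values: if mapred.min.split.size = 1, mapred.max.split.size = 256MB, and dfs.block.size = 128MB, then splitSize = max(1, min(256MB, 128MB)) = 128MB, so a single 1GB input file yields roughly 1024/128 = 8 mappers.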
2. Table Operations
(1) Show all partitions of a table:
SHOW PARTITIONS ext_trackflow
Query one specific partition:
SHOW PARTITIONS ext_trackflow PARTITION(statDate='20140529');
(2) View the full, formatted table structure:
desc formatted ext_trackflow;
DESCRIBE EXTENDED ext_trackflow;
(3) Drop a partition: the partition's metadata and data are deleted together; for external tables, only the metadata is deleted:
ALTER TABLE ext_trackflow DROP PARTITION
(statDate='20140529');
(4) Check whether a table is external or internal:
DESCRIBE EXTENDED tablename
(5) Copy a table schema:
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3
LIKE mydb.employees
LOCATION '/path/to/data';
Note: if you omit the EXTERNAL keyword, the new table is external or internal depending on what employees is; if you include it, the table is definitely EXTERNAL, and you must also supply a LOCATION.
(6) Load data into a partition of an internal table; Hive creates the directory and copies the data into the partition:
LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
INTO TABLE employees
PARTITION (country = 'US', state = 'CA');
(7) Add data to a partition of an external table:
ALTER TABLE log_messages ADD IF NOT EXISTS
PARTITION(year = 2012, month = 1, day = 2)
LOCATION 'hdfs://master_server/data/log_messages/2012/01/02';
Note: Hive does not care whether the partition directory exists or contains any data; if it doesn't, queries simply return no results.
£¨8£©ÐÞ¸Ä±í£ºÔÚÈκÎʱºòÄã¶¼¿ÉÒÔÐÞ¸Ä±í£¬µ«ÊÇÄã½ö½öÐ޸ĵÄÊDZíµÄÔªÊý¾Ý£¬¶¼Êµ¼ÊÊý¾Ý²»»áÔì³ÉÈκÎÓ°Ïì
ÀýÈç¸ü¸Ä·ÖÇøÖ¸¶¨Î»Öã¬Õâ¸öÃüÁî²»»áɾ³ý¾ÉµÄÊý¾Ý
ALTER TABLE log_messages
PARTITION(year = 2011, month = 12, day = 2)
SET LOCATION 's3n://ourbucket/logs/2011/01/02';
(9) Change table properties:
ALTER TABLE log_messages SET TBLPROPERTIES (
  'notes' = 'The process id is no longer captured; this column is always NULL'
);
(10) Change storage properties:
ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1)
SET FILEFORMAT SEQUENCEFILE;
Note: if the table is partitioned, the PARTITION clause is required.
(11) Specify a new SerDe:
ALTER TABLE table_using_JSON_storage
SET SERDE 'com.example.JSONSerDe'
WITH SERDEPROPERTIES (
  'prop1' = 'value1',
  'prop2' = 'value2'
);
Note: SERDEPROPERTIES tells the SerDe which model to use; property names and values are both strings, which makes it convenient to record, for users, which SerDe was specified and what it applies to.
To set properties on the current SerDe:
ALTER TABLE table_using_JSON_storage
SET SERDEPROPERTIES (
  'prop3' = 'value3',
  'prop4' = 'value4'
);
(12) Change the bucketing storage properties:
ALTER TABLE stocks
CLUSTERED BY (exchange, symbol)
SORTED BY (symbol)
INTO 48 BUCKETS;
(13) Complex table alterations: add hooks for various operations with ALTER TABLE ... TOUCH:
ALTER TABLE log_messages TOUCH
PARTITION(year = 2012, month = 1, day = 1);
µäÐ͵ÄÓ¦Óó¡¾°¾ÍÊǵ±·ÖÇøÓи͝µÄʱºò£¬ÄÇô½«´¥·¢
hive -e 'ALTER TABLE log_messages TOUCH
PARTITION(year = 2012, month = 1, day = 1);'
(14) ALTER TABLE ... ARCHIVE PARTITION
Capture the partition's files into a Hadoop archive file (HAR):
ALTER TABLE log_messages ARCHIVE
PARTITION(year = 2012, month = 1, day = 1);
(usable only on partitioned tables)
(15) Protect a partition from being dropped or queried:
ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1) ENABLE NO_DROP;

ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1) ENABLE OFFLINE;
Note: the counterpart of ENABLE is DISABLE; neither can be applied to an unpartitioned table.
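The reverse form uses DISABLE with the same syntax, e.g.:

ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1) DISABLE NO_DROP;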
(16) List tables matching a pattern (regular expression):
hive> SHOW TABLES '.*s';
(17) Convert between internal and external tables:
alter table tablePartition set TBLPROPERTIES ('EXTERNAL'='TRUE');   // internal table to external
alter table tablePartition set TBLPROPERTIES ('EXTERNAL'='FALSE');  // external table to internal
(18) Partitions vs. buckets:
partition (partitioning: files are stored by directory; each partition corresponds to one directory), for example:
CREATE EXTERNAL TABLE table1 (
  column1 STRING,
  column2 STRING,
  column3 STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

ALTER TABLE table1 ADD IF NOT EXISTS PARTITION (dt=20090105);
ALTER TABLE table1 ADD IF NOT EXISTS PARTITION (dt=20090102);
ALTER TABLE table1 ADD IF NOT EXISTS PARTITION (dt=20081231);
bucket (bucketing: the specified column is hashed; each bucket corresponds to one file):
CREATE TABLE VT_NEW_DATA (
  column1 STRING,
  column2 STRING,
  column3 STRING
)
CLUSTERED BY (column1)
SORTED BY (column1)
INTO 48 BUCKETS
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
  LINES TERMINATED BY '\n'
STORED AS RCFILE;
3. Column Operations
(1) Rename a column; change its position, type, and comment:
ALTER TABLE log_messages
CHANGE COLUMN hms hours_minutes_seconds INT
COMMENT 'The hours, minutes, and seconds part of the timestamp'
AFTER severity;
Renaming takes the form: old_column new_column type (hms is the old name above, hours_minutes_seconds the new one).
The COMMENT is optional; you can add one.
AFTER changes the column's position.
Only the metadata is modified; the underlying data is not touched.
(2) Add new columns:
ALTER TABLE log_messages ADD COLUMNS (
  app_name STRING COMMENT 'Application name',
  session_id BIGINT COMMENT 'The current session id'
);
(3) Drop and replace columns: use with caution!
ALTER TABLE table_name ADD|REPLACE COLUMNS
(col_name data_type [COMMENT col_comment], ...)

ADD appends a new column after all existing columns (but before the partition columns).
REPLACE COLUMNS removes all existing columns and adds the new set of columns; it can therefore also be used to drop columns. For example:
ALTER TABLE test_change REPLACE COLUMNS (a int, b int);
will remove column `c` from test_change's schema. Note that this does not delete the underlying data; it just changes the schema.
(4) REGEX column specification
A SELECT statement can use a regular expression for column selection; the following query returns all columns except ds and hr:
SELECT `(ds|hr)?+.+` FROM test
4. Inspecting Variables
hive> set;
...
hive> set -v;
... even more output! ...
'set' prints all variables in the hivevar, hiveconf, system, and env namespaces.
'set -v' additionally prints all of the variables defined by Hadoop.
hive> set hivevar:foo=hello;
hive> set hivevar:foo;
hivevar:foo=hello
Using the variable:
hive> create table toss1(i int, ${hivevar:foo}
string);
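Since the variable is substituted before execution, and foo was set to hello above, this is equivalent to: create table toss1(i int, hello string); in other words, the new table gets a string column named hello.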
5. A Complete Database and Table Example
-- Create the database
create database ecdata WITH DBPROPERTIES ('creator' = 'June', 'date' = '2014-06-01');
-- Alternatively, use the COMMENT keyword

-- View the database description
DESCRIBE DATABASE ecdata;
DESCRIBE DATABASE EXTENDED ecdata;

-- Switch databases
use ecdata;

-- Drop the table
drop table ext_trackflow;

-- Create the table
create EXTERNAL table IF NOT EXISTS ext_trackflow (
  cookieId    string COMMENT '05dvOVC6Il6INhYABV6LAg==',
  cate1       string COMMENT '4',
  area1       string COMMENT '102',
  url         string COMMENT 'http://cd.ooxx.com/jinshan-mingzhan-1020',
  trackTime   string COMMENT '2014-05-25 23:03:36',
  trackURLMap map<string,string> COMMENT '{"area":"102","cate":"4,29,14052"}'
)
PARTITIONED BY (statDate STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/DataWarehouse/ods/TrackFlowTable';

-- Add a partition
ALTER TABLE ext_trackflow ADD PARTITION (statDate='20140525')
LOCATION '/DataWarehouse/ods/TrackFlowTable/20140525';

-- Create the partition for each day
yesterday=`date -d '1 days ago' +'%Y%m%d'`
hive -e "use ecdata; ALTER TABLE ext_trackflow ADD PARTITION (statDate='$yesterday') LOCATION '/DataWarehouse/ods/TrackFlowTable/$yesterday';"