Hive Basics
 
×÷Õߣºxrzs À´Ô´£º¿ªÔ´ÏîÄ¿ ·¢²¼ÓÚ£º 2015-8-31
 

1. Hive Partitioned Tables

A Hive SELECT query normally scans the entire table, which wastes a lot of time on unnecessary work. Often you only need to scan the subset of the data you care about, which is why the partition concept was introduced at table-creation time. A partitioned table is one for which a partition space is specified when the table is created.

Hive can manage data partitioned by one column or by several columns. The following example explains what a partition is.

Internet applications today store huge volumes of log files every day; several GB, tens of GB, or even more are all possible. Log records necessarily carry an attribute for the date the log was produced, so partitions can be created by dividing the data on that date column, treating each day's logs as one partition.

Organizing data into partitions mainly improves query speed. Which partition each stored record goes into is decided by the user; that is, when loading data the user must explicitly specify which partition that portion of the data belongs to.

1.1 Implementation Details

1. A table can have one or more partitions; each partition exists as a separate folder under the table's directory.

2. Table and column names are case-insensitive.

3. A partition appears as a field in the table schema; it shows up in the output of the describe table command, but the field stores no actual data content. It is merely a marker for the partition (a pseudo-column).

1.2 Syntax

1. Create a partitioned table with ds as the partition column:

create table invites (id int, name string) partitioned by (ds string) row format delimited fields terminated by '\t' stored as textfile;
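
As a quick check, describe lists the partition column alongside the regular columns (a sketch; the exact output format varies by Hive version):

hive> describe invites;
id      int
name    string
ds      string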

2. Load data into the partition for 2013-08-16:

load data local inpath '/home/hadoop/Desktop/data.txt' overwrite into table invites partition (ds='2013-08-16');

3. Load data into the partition for 2013-08-20:

load data local inpath '/home/hadoop/Desktop/data.txt' overwrite into table invites partition (ds='2013-08-20');

4. Query data from one partition:

select * from invites where ds ='2013-08-12';

5. Insert data into a specific partition of a partitioned table:

insert overwrite table invites partition (ds='2013-08-12') select id,max(name) from test group by id;

¿ÉÒԲ鿴·ÖÇøµÄ¾ßÌåÇé¿ö£¬Ê¹ÓÃÃüÁ

hadoop fs -ls /home/hadoop.hive/warehouse/invites

»òÕߣº

show partitions tablename;
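
With the two partitions loaded above, the output would look roughly like this:

hive> show partitions invites;
ds=2013-08-16
ds=2013-08-20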

2. Hive Buckets

For every table or partition, Hive can further organize the data into buckets; that is, buckets divide the data at a finer granularity. Bucketing is likewise organized around a particular column: Hive hashes the column value and takes the remainder after dividing by the number of buckets to decide which bucket a record is stored in.
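
As a quick worked example of that rule (illustrative values): with 4 buckets on an INT column, whose hash value is the integer itself,

id = 0   ->  0 % 4 = 0  ->  bucket 1
id = 17  -> 17 % 4 = 1  ->  bucket 2
id = 4   ->  4 % 4 = 0  ->  bucket 1 again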

There are two reasons to organize a table (or partition) into buckets:

(1) Higher query-processing efficiency. Buckets add extra structure to a table, and Hive can exploit that structure for certain queries. Specifically, joining two tables that are bucketed on the same columns (which include the join columns) can be implemented efficiently as a map-side join. Take a JOIN on a column the two tables share: if both tables are bucketed on that column, then only buckets holding the same column values need to be joined against each other, which can greatly reduce the volume of data involved in the JOIN.

(2) More efficient sampling. When working with large datasets, it is very convenient during query development and refinement to be able to trial-run a query on a small fraction of the data.

1. Create a bucketed table:

create table bucketed_user(id int,name string) clustered by (id) sorted by(name) into 4 buckets row format delimited fields terminated by '\t' stored as textfile;

First, let's look at how to tell Hive that a table should be divided into buckets. We use the CLUSTERED BY clause to specify the column to bucket on and the number of buckets to create:

CREATE TABLE bucketed_user (id INT, name STRING) 
CLUSTERED BY (id) INTO 4 BUCKETS;

Here we use the user ID to determine how to divide into buckets (Hive hashes the value and takes the remainder after dividing by the number of buckets). This way, each bucket will contain a random set of users (PS: well, you could indeed call it random, couldn't you?).

For a map-side join, the two tables must be bucketed in the same way. A mapper processing a bucket of the left table knows that the matching rows of the right table are in the corresponding bucket, so the mapper only needs to fetch that one bucket (a small fraction of the data stored in the right table) to perform the join. This optimization does not strictly require the two tables to have the same number of buckets; it also works when one table's bucket count is a multiple of the other's. For joining two bucketed tables in HiveQL, see the "map joins" section (p. 400).
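
As a sketch of how such a join is typically enabled (hive.optimize.bucketmapjoin is a real Hive setting; bucketed_orders is a hypothetical second table clustered by id in the same way as bucketed_users):

set hive.optimize.bucketmapjoin = true;
SELECT /*+ MAPJOIN(o) */ u.id, u.name, o.amount
FROM bucketed_users u JOIN bucketed_orders o ON (u.id = o.id);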

ͰÖеÄÊý¾Ý¿ÉÒÔ¸ù¾ÝÒ»¸ö»ò¶à¸öÁÐÁíÍâ½øÐÐÅÅÐò¡£ÓÉÓÚÕâÑù¶Ôÿ¸öͰµÄÁ¬½Ó±ä³ÉÁ˸ßЧµÄ¹é²¢ÅÅÐò(merge-sort), Òò´Ë¿ÉÒÔ½øÒ»²½ÌáÉýmap¶ËÁ¬½ÓµÄЧÂÊ¡£ÒÔÏÂÓï·¨ÉùÃ÷Ò»¸ö±íʹÆäʹÓÃÅÅÐòͰ£º

CREATE TABLE bucketed_users (id INT, name STRING) 
CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS;

How do we ensure that the data in the table is actually divided into buckets? It is certainly possible to load data generated outside Hive into a bucketed table, but it is actually easier to let Hive do the bucketing, usually from an existing table.

Hive does not check whether the buckets in the data files are consistent with the table definition (in either the number of buckets or the bucketing column). If they do not match, queries may hit errors or return undefined results. It is therefore recommended to let Hive perform the bucketing.

Here is a users table that is not bucketed:

hive> SELECT * FROM users; 
0 Nat
2 Joe
3 Kay
4 Ann

2. Force output across multiple reducers:

To populate a bucketed table, you must set the hive.enforce.bucketing property to true.① Hive then knows to create the number of buckets declared in the table definition, and a plain INSERT command can be used. Note that CLUSTERED BY and SORTED BY do not affect how data is loaded; the user is responsible for loading the data correctly, including its bucketing and sorting.

'set hive.enforce.bucketing = true' automatically controls the number of reducers in the final stage to match the number of buckets. Alternatively, the user can set mapred.reduce.tasks manually to match the bucket count, but 'set hive.enforce.bucketing = true' is recommended.
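
Concretely, for the 4-bucket table above, either of the following works (a minimal sketch; the first form is the recommended one):

set hive.enforce.bucketing = true;

or, manually matching the reducer count to the declared bucket count:

set mapred.reduce.tasks = 4;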

3. Insert data into the table:

INSERT OVERWRITE TABLE bucketed_users SELECT * FROM users;

Physically, each bucket is just a file in the table (or partition) directory. The file name is unimportant, but bucket n is the nth file in lexicographic order. In fact, buckets correspond to MapReduce output file partitions: a job produces as many buckets (output files) as it has reduce tasks. We can see this by inspecting the layout of the bucketed_users table we just created. Run the following command:

4. View the table's layout:

hive> dfs -ls /user/hive/warehouse/bucketed_users;

This will show 4 newly created files, named as follows (the names contain a timestamp generated by Hive, so they change on every run):

attempt_201005221636_0016_r_000000_0 
attempt_201005221636_0016_r_000001_0
attempt_201005221636_0016_r_000002_0
attempt_201005221636_0016_r_000003_0

The first bucket contains user IDs 0 and 4, because an INT's hash value is the integer itself, here taken modulo the number of buckets (4):②

5. Read the data, looking at each file's contents:

hive> dfs -cat /user/hive/warehouse/bucketed_users/*0_0; 
0 Nat
4 Ann

Sampling the table with the TABLESAMPLE clause gives the same result. The clause restricts the query to a subset of the table's buckets rather than the whole table:

6. Sample the data in the buckets:

hive> SELECT * FROM bucketed_users 
> TABLESAMPLE(BUCKET 1 OUT OF 4 ON id);
0 Nat
4 Ann

ͰµÄ¸öÊý´Ó1¿ªÊ¼¼ÆÊý¡£Òò´Ë£¬Ç°ÃæµÄ²éѯ´Ó4¸öͰµÄµÚÒ»¸öÖлñÈ¡ËùÓеÄÓû§¡£ ¶ÔÓÚÒ»¸ö´ó¹æÄ£µÄ¡¢¾ùÔÈ·Ö²¼µÄÊý¾Ý¼¯£¬Õâ»á·µ»Ø±íÖÐÔ¼ËÄ·ÖÖ®Ò»µÄÊý¾ÝÐС£ÎÒÃÇ Ò²¿ÉÒÔÓÃÆäËû±ÈÀý¶ÔÈô¸É¸öͰ½øÐÐÈ¡Ñù(ÒòΪȡÑù²¢²»ÊÇÒ»¸ö¾«È·µÄ²Ù×÷£¬Òò´ËÕâ¸ö ±ÈÀý²»Ò»¶¨ÒªÊÇͰÊýµÄÕûÊý±¶)¡£ÀýÈ磬ÏÂÃæµÄ²éѯ·µ»ØÒ»°ëµÄͰ£º

7. Sample half of the buckets:

hive> SELECT * FROM bucketed_users 
> TABLESAMPLE(BUCKET 1 OUT OF 2 ON id);
0 Nat
4 Ann
2 Joe

Because the query only has to read the buckets matched by the TABLESAMPLE clause, sampling a bucketed table is a very efficient operation. If you sample an unbucketed table with the rand() function, the entire input dataset is scanned even when only a very small sample needs to be read:

hive> SELECT * FROM users 
> TABLESAMPLE(BUCKET 1 OUT OF 4 ON rand());
2 Joe

① As of Hive 0.6.0. In earlier versions, you had to set mapred.reduce.tasks to the number of buckets being populated. If the buckets are sorted, you also need to set hive.enforce.sorting to true.

② When the raw file is displayed, the fields run together because the delimiter is a non-printing control character.

3. A complete little example:

(1) Create the student & student1 tables:

create table student(id INT, age INT, name STRING)
partitioned by(stat_date STRING)
clustered by(id) sorted by(age) into 2 buckets
row format delimited fields terminated by ',';

create table student1(id INT, age INT, name STRING)
partitioned by(stat_date STRING)
clustered by(id) sorted by(age) into 2 buckets
row format delimited fields terminated by ',';

(2) Set the environment variable:

set hive.enforce.bucketing = true; 

(3) Insert data:

cat bucket.txt

1,20,zxm
2,21,ljz
3,19,cds
4,18,mac
5,22,android
6,23,symbian
7,25,wp

LOAD DATA local INPATH '/home/lijun/bucket.txt' OVERWRITE INTO TABLE student partition(stat_date="20120802");

from student
insert overwrite table student1 partition(stat_date="20120802")
select id,age,name where stat_date="20120802" sort by age;

(4) View the file directory:

hadoop fs -ls /hive/warehouse/test.db/student1/stat_date=20120802 
Found 2 items
-rw-r--r-- 2 lijun supergroup 31 2013-11-24 19:16 /hive/warehouse/test.db/student1/stat_date=20120802/000000_0
-rw-r--r-- 2 lijun supergroup 39 2013-11-24 19:16 /hive/warehouse/test.db/student1/stat_date=20120802/000001_0

(5) Inspect the sampled data:

hive> select * from student1 tablesample(bucket 1 out of 2 on id);
Total MapReduce jobs = 1
Launching Job 1 out of 1
.......
OK
4 18 mac 20120802
2 21 ljz 20120802
6 23 symbian 20120802
Time taken: 20.608 seconds

Note: tablesample is the sampling clause. Syntax: TABLESAMPLE(BUCKET x OUT OF y)

y must be a multiple or a factor of the table's total bucket count; Hive decides the sampling proportion based on y. For example, if the table is divided into 64 buckets in total: when y=32, the data of (64/32=) 2 buckets is sampled; when y=128, (64/128=) 1/2 of one bucket is sampled. x indicates the bucket at which sampling starts. For example, if the table has 32 buckets in total, tablesample(bucket 3 out of 16) samples (32/16=) 2 buckets in total, namely the 3rd bucket and the (3+16=) 19th bucket.
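
Expressed as queries against a hypothetical table t32 bucketed 32 ways on id, the following reads (32/16=) 2 buckets, the 3rd and the 19th:

select * from t32 tablesample(bucket 3 out of 16 on id);

while this one reads (32/64=) half of the first bucket:

select * from t32 tablesample(bucket 1 out of 64 on id);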

1. Command-Line Operations

(1) Print query headers; this requires an explicit setting:

set hive.cli.print.header=true;

(2) Everything after "--" is treated as a comment, but the CLI does not parse comments. A file containing comments can only be executed this way:

hive -f script_name

(3) -e takes a quoted Hive command or query; -S strips the extraneous output:

hive -S -e "select * FROM mytable LIMIT 3" > /tmp/myquery

(4) If you have very many partitions and directories, a query that traverses all of them produces one enormous MapReduce job. It is therefore advisable to use strict mode: when partitions exist, a WHERE clause on the partition column is mandatory:

hive> set hive.mapred.mode=strict;
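
Under strict mode, a query such as select * from invites is rejected because it has no partition predicate, while the following is accepted since the WHERE clause prunes to a single partition (a sketch using the invites table from part 1):

select * from invites where ds = '2013-08-16';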

(5) Display the current database in use:

set hive.cli.print.current.db=true;

(6) Set Hive job priority:

set mapred.job.priority=VERY_HIGH | HIGH | NORMAL | LOW | VERY_LOW

(VERY_LOW=1, LOW=2500, NORMAL=5000, HIGH=7500, VERY_HIGH=10000)
set mapred.job.map.capacity=M  (run at most M map tasks at the same time)
set mapred.job.reduce.capacity=N  (run at most N reduce tasks at the same time)

(7) The number of mappers in Hive is determined by the following parameters:

mapred.min.split.size £¬mapred.max.split.size £¬dfs.block.size
splitSize = Math.max(minSize, Math.min(maxSize, blockSize));

The number of maps is also related to the number of input files: with 2 input files, even if their total size is smaller than the block size, 2 maps are produced.
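
Plugging in typical values (assumed for illustration): with dfs.block.size=128MB, mapred.min.split.size=1 and mapred.max.split.size=256MB,

splitSize = Math.max(1, Math.min(256MB, 128MB)) = 128MB

so a single 300MB input file is cut into roughly ceil(300/128) = 3 splits, i.e. 3 map tasks.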

mapred.reduce.tasks sets the number of reduce tasks.

2. Table Operations

(1) List all partitions of a table:

SHOW PARTITIONS ext_trackflow

Query one specific partition:

SHOW PARTITIONS ext_trackflow PARTITION(statDate='20140529');

(2) View the formatted, complete table schema:

desc formatted ext_trackflow; 
DESCRIBE EXTENDED ext_trackflow;

(3) Drop a partition: the partition's metadata and data are deleted together, but for an external table only the metadata is deleted:

ALTER TABLE ext_trackflow DROP PARTITION (statDate='20140529');

(4) Check whether a table is external or internal:

DESCRIBE EXTENDED tablename

(5) Copy a table schema:

CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3  
LIKE mydb.employees
LOCATION '/path/to/data';

Note: if you omit the EXTERNAL keyword, the new table is external or internal according to what employees is; if you include it, the table is definitely EXTERNAL, and a LOCATION is required.

(6) Load data into a partition of an internal table; Hive creates the directory and copies the data into the partition:

LOAD DATA LOCAL INPATH '${env:HOME}/california-employees' 
INTO TABLE employees
PARTITION (country = 'US', state = 'CA');

(7) Add data to a partition of an external table:

ALTER TABLE log_messages ADD  IF NOT EXISTS PARTITION(year = 2012, month = 1, day = 2) 
LOCATION 'hdfs://master_server/data/log_messages/2012/01/02';

Note: Hive does not care whether the partition directory exists or whether it holds data; if it does not, queries simply return no results.

(8) Alter a table: you can alter a table at any time, but you are only modifying the table's metadata; the actual data is unaffected.

For example, to change the location a partition points to (this command does not delete the old data):

ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2) 
SET LOCATION 's3n://ourbucket/logs/2011/01/02';

(9) Change table properties:

ALTER TABLE log_messages SET TBLPROPERTIES (
'notes' = 'The process id is no longer captured; this column is always NULL'
);

(10) Change storage properties:

ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1)
SET FILEFORMAT SEQUENCEFILE;

Note: if the table is partitioned, the PARTITION clause is mandatory.

(11) Specify a new SerDe:

ALTER TABLE table_using_JSON_storage
SET SERDE 'com.example.JSONSerDe'
WITH SERDEPROPERTIES (
'prop1' = 'value1',
'prop2' = 'value2'
);

Note: SERDEPROPERTIES describes which model the SerDe uses; property names and values are both strings, which makes it easy to tell users which SerDe is specified and which model it applies to.
To set properties on the current SerDe:

ALTER TABLE table_using_JSON_storage
SET SERDEPROPERTIES (
'prop3' = 'value3',
'prop4' = 'value4'
);

(12) Change bucketing storage properties:

ALTER TABLE stocks
CLUSTERED BY (exchange, symbol)
SORTED BY (symbol)
INTO 48 BUCKETS;

(13) Complex table alterations: ALTER TABLE … TOUCH adds a hook for various operations:

ALTER TABLE log_messages TOUCH
PARTITION(year = 2012, month = 1, day = 1);

µäÐ͵ÄÓ¦Óó¡¾°¾ÍÊǵ±·ÖÇøÓи͝µÄʱºò£¬ÄÇô½«´¥·¢

hive -e 'ALTER TABLE log_messages TOUCH PARTITION(year = 2012, month = 1, day = 1);'

(14) ALTER TABLE … ARCHIVE PARTITION captures the partition's files into a Hadoop archive file (HAR):

ALTER TABLE log_messages ARCHIVE
PARTITION(year = 2012, month = 1, day = 1); (only usable on partitioned tables)

(15) Protect a partition from being dropped or queried:

ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1) ENABLE NO_DROP;

ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1) ENABLE OFFLINE;

Note: ENABLE's counterpart is DISABLE; neither can be applied to a non-partitioned table.

(16) List tables matching a regular expression:

hive> SHOW TABLES '.*s';

£¨17£©Íⲿ±í¡¢ÄÚ²¿±í»¥×ª

alter table tablePartition set TBLPROPERTIES ('EXTERNAL'='TRUE');  // convert an internal table to external 
alter table tablePartition set TBLPROPERTIES ('EXTERNAL'='FALSE'); // convert an external table to internal
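
To verify the switch, DESCRIBE FORMATTED (used in (2) above) reports the table type; the Table Type field flips between MANAGED_TABLE and EXTERNAL_TABLE:

DESCRIBE FORMATTED tablePartition;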

(18) Partitions vs. buckets:

partition (partitioning: files are stored by directory; each partition corresponds to one directory). For example:

CREATE EXTERNAL TABLE table1 (
column1 STRING,
column2 STRING,
column3 STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
ALTER TABLE table1 ADD IF NOT EXISTS PARTITION (dt=20090105);
ALTER TABLE table1 ADD IF NOT EXISTS PARTITION (dt=20090102);
ALTER TABLE table1 ADD IF NOT EXISTS PARTITION (dt=20081231);

bucket (bucketing: the specified column is hashed; each bucket corresponds to one file):

CREATE TABLE VT_NEW_DATA
(
column1 STRING,
column2 STRING,
column3 STRING
)
CLUSTERED BY (column1)
SORTED BY (column1)
INTO 48 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS RCFILE;

3¡¢ÁвÙ×÷

(1) Rename a column, or change its position, type, and comment:

ALTER TABLE log_messages 
CHANGE COLUMN hms hours_minutes_seconds INT
COMMENT 'The hours, minutes, and seconds part of the timestamp'
AFTER severity;

Renaming: the syntax is CHANGE COLUMN old_column new_column type

The COMMENT is optional; you can add one to document the column

AFTER changes the column's position

Only the metadata is modified; the underlying data is not changed at all

(2) Add new columns:

ALTER TABLE log_messages ADD COLUMNS ( 
app_name STRING COMMENT 'Application name',
session_id BIGINT COMMENT 'The current session id');

(3) Drop and replace columns: use with caution!

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

ADD adds a new field, positioned after all existing columns (but before partition columns).

REPLACE COLUMNS removes all existing columns and adds the new set of columns.  
REPLACE COLUMNS can also be used to drop columns. For example: "ALTER TABLE test_change REPLACE COLUMNS (a int, b int);" will remove column `c' from test_change's schema. Note that this does not delete underlying data, it just changes the schema.

(4) REGEX Column Specification

A SELECT statement can use a regular expression to choose columns. The following query selects all columns except ds and hr:

SELECT `(ds|hr)?+.+` FROM test
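
Note that in newer Hive releases (0.13 and later), backquoted names are treated as quoted identifiers by default, so the regex column form may first require:

set hive.support.quoted.identifiers=none;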

4. Viewing Variables

hive> set; 
…
hive> set -v;
… even more output! …

'set' prints all variables in the hivevar, hiveconf, system, and env namespaces.

'set -v' additionally prints all of the variables defined by Hadoop.

hive> set hivevar:foo=hello; 
hive> set hivevar:foo;
hivevar:foo=hello

Using the variable:

hive> create table toss1(i int, ${hivevar:foo} string);
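
With foo set to hello as above, the variable is substituted before execution, so the statement actually run is:

create table toss1(i int, hello string);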

5. A Complete Database and Table Example

-- create the database
create database ecdata WITH DBPROPERTIES ('creator' = 'June', 'date' = '2014-06-01');
-- or use the COMMENT keyword
-- view the database description
DESCRIBE DATABASE ecdata;
DESCRIBE DATABASE EXTENDED ecdata;
-- switch to the database
use ecdata;

-- drop the table
drop table ext_trackflow;

-- create the table
create EXTERNAL table IF NOT EXISTS ext_trackflow (
cookieId string COMMENT '05dvOVC6Il6INhYABV6LAg==',
cate1 string COMMENT '4',
area1 string COMMENT '102',
url string COMMENT 'http://cd.ooxx.com/jinshan-mingzhan-1020',
trackTime string COMMENT '2014-05-25 23:03:36',
trackURLMap map<string,string> COMMENT '{"area":"102","cate":"4,29,14052"}'
)
PARTITIONED BY (statDate STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/DataWarehouse/ods/TrackFlowTable'
;

-- add a partition
ALTER TABLE ext_trackflow ADD PARTITION (statDate='20140525')
LOCATION '/DataWarehouse/ods/TrackFlowTable/20140525';


--ÿÌ콨Á¢·ÖÇø
yesterday=`date -d '1 days ago' +'%Y%m%d'`
hive -e "use ecdata; ALTER TABLE ext_trackflow ADD PARTITION (statDate='$yesterday') LOCATION '/DataWarehouse/ods/TrackFlowTable/$yesterday';"
   