Hive Getting Started Guide
 
Source: life | Published: 2015-9-2
 

1. Installation and Configuration

HiveÊǽ¨Á¢ÔÚHadoopÉϵÄÊý¾Ý²Ö¿âÈí¼þ£¬ÓÃÓÚ²éѯºÍ¹ÜÀí´æ·ÅÔÚ·Ö²¼Ê½´æ´¢ÉϵĴó¹æÄ£Êý¾Ý¼¯¡£ËüÌṩ£º

£¨1£©Ò»ÏµÁеŤ¾ß£¬¿ÉÒÔ·½±ãµØ¶ÔÊý¾Ý½øÐÐÌáÈ¡/ת»¯/¼ÓÔØ£¨ETL£©£»

£¨2£©Ò»ÖÖ¿ÉÒÔ¶Ô¸÷ÖÖÊý¾Ý¸ñʽÉϽøÐнṹ»¯µÄ»úÖÆ£»

£¨3£©´æÈ¡´æ·ÅÔÚApache HDFS»òÆäËû´æ´¢ÏµÍ³ÈçApache HBaseÉϵÄÎļþ£»

£¨4£©Í¨¹ýMapReduceÖ´Ðвéѯ¹¦ÄÜ¡£

Hive ¶¨ÒåÁ˼òµ¥µÄÀàSQL²éѯÓïÑÔ£¬³ÆÎªQL£¬ËüÔÊÐíÊìϤSQLµÄÓû§²éѯÊý¾Ý¡£Í¬Ê±£¬Õâ¸öÓïÑÔÒ²ÔÊÐíÊìϤMapReduceµÄ¿ª·¢Õß¿ª·¢×Ô¶¨ÒåµÄmapperºÍreducerÀ´´¦ÀíÄÚ½¨µÄmapperºÍreducerÎÞ·¨Íê³ÉµÄ¸´ÔÓ·ÖÎö¹¤×÷¡£QLÒ²¿ÉÒÔ±»À©Õ¹£¬ÒÔÈÃÓû§×Ô¶¨Òå±êÁ¿º¯Êý£¨UDF's£©¡¢¾ÛºÏ£¨UDAF's£©ºÍ±íº¯Êý£¨UDTF's£©¡£

Hive does not mandate a "Hive format" for reading or writing data — there is no such thing; Hive has no special data format of its own. Hive works equally well on Apache Thrift, control-delimited, or user-specified data formats.

Hive is not designed for OLTP workloads and offers neither real-time queries nor row-level updates. It is best suited to batch jobs over large, append-only data sets (such as web logs). Hive's core design values are scalability (scale out by dynamically adding machines to the Hadoop cluster), extensibility (via the MapReduce framework and UDF/UDAF/UDTF), fault tolerance, and loose coupling with input formats.

Download the latest stable release, hive-0.9.0.tar.gz, from http://hive.apache.org/releases.html. Next, unpack the tarball, which will create a subdirectory named hive-x.y.z:

$ tar -xzvf hive-x.y.z.tar.gz

Set the HIVE_HOME environment variable to point to the Hive installation directory:

$ cd hive-x.y.z
$ export HIVE_HOME=$(pwd)

Finally, add $HIVE_HOME/bin to your PATH environment variable:

$ export PATH=$HIVE_HOME/bin:$PATH

2. Running Hive

HiveʹÓÃHadoop£¬ÕâÒâζ×ÅÄã±ØÐëÔÚPATHÀïÃæÉèÖÃÁËhadoop·¾¶£¬»òÕßµ¼³öexport HADOOP_HOME=<hadoop-install-dir>Ò²¿ÉÒÔ¡£ÁíÍ⣬Äã±ØÐëÔÚ´´½¨Hive¿â±íǰ£¬ÔÚHDFSÉÏ´´½¨/tmpºÍ/user/hive/warehouse£¨Ò²³ÆÎªhive.metastore.warehouse.dir£©£¬²¢ÇÒ½«ËüÃǵÄȨÏÞÉèÖÃΪchmod g+w¡£Íê³ÉÕâ¸ö²Ù×÷µÄÃüÁîÈçÏ£º

$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse

ÎÒͬÑù·¢ÏÖÉèÖÃHIVE_HOMEÊǺÜÖØÒªµÄ£¬µ«²¢·Ç±ØÐë¡£

$ export HIVE_HOME=<hive-install-dir>

To use the Hive command-line interface (cli) from the shell:

$ $HIVE_HOME/bin/hive

3. Configuration Management Overview

£¨1£©HiveĬÈϵÄÅäÖÃÎļþ±£´æÔÚ <install-dir>/conf/hive-default.xml£¬Äã¿ÉÒÔÐÞ¸ÄÆäÖеÄÅäÖ㬲¢ÖØÃüÃûÕâ¸öÎļþΪ <install-dir>/conf/hive-site.xml£¨×¢£ºÎÒ½¨ÒéÄ㻹ÊDZ£ÁôԭʼÅäÖÃÎļþ£©¡£

£¨2£©HiveÅäÖÃÎļþµÄλÖÿÉÒÔͨ¹ýÉèÖÃHIVE_CONF_DIR»·¾³±äÁ¿À´¸Ä±ä¡£

£¨3£©Log4jµÄÅäÖñ£´æÔÚ<install-dir>/conf/hive-log4j.properties¡£

£¨4£©HiveµÄÅäÖôæÔÚÓÚHadoopÖ®ÉÏ£¬ÕâÒâζ×ÅHadoopµÄÅäÖÃĬÈϻᱻ¼Ì³Ðµ½HiveÖС£

£¨5£©HiveÅäÖÿÉÒÔ±»ÈçÏ·½Ê½¿ØÖÆ£º

±à¼­hive-site.xml²¢¶¨ÒåÈκÎÐèÒªµÄ±äÁ¿ (°üÀ¨hadoopµÄ±äÁ¿)

´ÓcliģʽʹÓÃʹÓÃsetÃüÁʹÓÃÈçÏ·½Ê½£º

$ bin/hive -hiveconf x1=y1 -hiveconf x2=y2

This example sets the variable x1 to y1 and x2 to y2. Setting the HIVE_OPTS environment variable to "-hiveconf x1=y1 -hiveconf x2=y2" accomplishes the same thing.
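For reference, variables defined in hive-site.xml use the standard Hadoop property format. A sketch of what one entry looks like (the property name appears elsewhere in this guide; the value shown is only illustrative):

```xml
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>Location of the default database for the warehouse</description>
  </property>
</configuration>
```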

Hive²éѯÊÇÖ´ÐÐmap-reduce²éѯ£¬²¢ÇÒÕâЩ²éѯÊÇ¿ÉÒÔ±»hadoopµÄÅäÖÃËù¿ØÖƵġ£ÃüÁîÐÐÃüÁî 'SET' ¿ÉÒÔ±»ÓÃÀ´ÉèÖÃÈκÎhadoop£¨»òÕßhive£©µÄÅäÖñäÁ¿£¬ÀýÈ磺

hive> SET mapred.job.tracker=myhost.mycompany.com:50030;
hive> SET -v;

The latter, SET -v, lists all current settings. Without the -v option, only the variables that differ from the base Hadoop configuration are listed.

4. Hive, Map-Reduce, and Local Mode

The Hive compiler generates map-reduce jobs for most queries. These jobs are submitted to the Map-Reduce cluster indicated by the variable:

mapred.job.tracker

ÓÉÓÚÕâͨ³£ÊÇÔÚÒ»¸ö¶à½ÚµãµÄmap-reduce¼¯ÈºÖб»Ö¸³ö£¬HadoopͬÑùÓиö¾Í½üµÄ·½Ê½À´ÔÚÓû§µÄ¹¤×÷»úÉÏÔËÐÐmap-reduce jobs¡£Õâ¾ÍÔÚСÊý¾Ý¼¯µÄ²éѯÉÏÏԵ÷dz£ÓÐÓà - ÔÚÕâÖÖÇé¿öÏ£¬±¾µØÄ£Ê½ÔËÐÐͨ³£»á±È½«ÈÎÎñÌá½»µ½Õû¸ö´ó¼¯ÈºÖвéѯ¸ü¿ì¡£Êý¾Ý´ÓHDFSÉÏ»ñÈ¡ÊÇ͸Ã÷µÄ¡£Í¬ÑùµÄ£¬ÓÉÓÚ±¾µØÄ£Ê½½öÔËÐÐÒ»¸öreducer£¬ÕâÑù£¬ÔÚ´óÊý¾Ý²éѯÉÏÊǷdz£ÂýµÄ¡£

Starting with release 0.7, Hive fully supports local-mode execution. To enable it, set the following:

hive> SET mapred.job.tracker=local;

ÁíÍ⣬mapred.local.dirÓ¦¸ÃÖ¸¶¨Ò»¸öºÏ·¨µÄ±¾»ú·¾¶£¨×¢£º°²×°hiveµÄÄÇ̨·þÎñÆ÷£©£¨ÀýÈ磺/tmp/<username>/mapred/local£©¡£·ñÔòÓû§½«»ñȡһ¸ö¶¨Î»±¾µØ¿Õ¼äµÄÒì³£Å׳ö£©¡£

Also starting with release 0.7, Hive supports automatically deciding whether to run map-reduce jobs in local mode. The relevant option is:

hive> SET hive.exec.mode.local.auto=false;

Note that this feature is disabled by default. When enabled, Hive analyzes the size of each map-reduce job in a query and runs it locally if all of the following thresholds are satisfied:

(1) the total input size of the job is below hive.exec.mode.local.auto.inputbytes.max (128MB by default);

(2) the total number of map tasks is below hive.exec.mode.local.auto.tasks.max (4 by default);

(3) the total number of reduce tasks is 1 or 0.
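Taken together, the three thresholds amount to a simple predicate. A minimal Python sketch of the decision (the function name and argument list are illustrative, not Hive's actual internals):

```python
# Illustrative sketch of Hive's automatic local-mode check (not the real implementation).
DEFAULT_INPUT_BYTES_MAX = 128 * 1024 * 1024  # hive.exec.mode.local.auto.inputbytes.max
DEFAULT_TASKS_MAX = 4                        # hive.exec.mode.local.auto.tasks.max

def can_run_locally(input_bytes, num_map_tasks, num_reduce_tasks,
                    input_bytes_max=DEFAULT_INPUT_BYTES_MAX,
                    tasks_max=DEFAULT_TASKS_MAX):
    """Return True when a job fits all three thresholds for local-mode execution."""
    return (input_bytes < input_bytes_max
            and num_map_tasks < tasks_max
            and num_reduce_tasks in (0, 1))
```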

¶ÔÓÚСÊý¾Ý¼¯ÉϵIJéѯ£¬»òÕß´øÓжà¸ömap-reduce jobsµÄ²éѯ£¬µ«ÊÇÕâЩjobµÄÊäÈëÊǺÜСµÄ(×¢£ºÐ¡ÓÚÉÏÊöÌõ¼þ)£¬jobsÈÔ¿ÉÄÜʹÓñ¾µØÄ£Ê½À´ÔËÐС£

×¢Ò⣬¿ÉÄÜhadoop·þÎñÆ÷½ÚµãºÍhive¿Í»§¶ËµÄÔËÐÐʱ»·¾³²»Í¬£¨ÓÉÓÚ²»Í¬µÄjvm°æ±¾»òÕß²»Í¬µÄÈí¼þ¿â£©¡£Õâ¿ÉÄܻᵼÖÂÔËÐб¾µØÄ£Ê½Ê±³öÏÖһЩÒâÍâµÄ´íÎó¡£Í¬ÑùÐèҪעÒâµÄÊÇ£¬±¾µØÔËÐÐģʽÊÇÔËÐÐÔÚÒ»¸ö¶ÀÁ¢µÄ×ÓjvmÖУ¨hive ¿Í»§¶ËµÄ×Ó½ø³Ì£©¡£Èç¹ûÓû§Ô¸Ò⣬×ÓjvmËùÄÜʹÓõÄ×î´óÄÚ´æÊý¿ÉÒÔͨ¹ýÑ¡Ïîhive.mapred.local.memÀ´½øÐпØÖÆ¡£Ä¬ÈÏÉèÖÃÊÇ0£¬ÕâʱHiveÈÃHadoopÀ´¾ö¶¨×ÓjvmµÄĬÈÏÄÚ´æÏÞÖÆ¡£

5. Error Logs

HiveʹÓÃlog4jÀ´¼Ç¼ÈÕÖ¾¡£Ä¬ÈÏÀ´Ëµ£¬ÈÕÖ¾²»»á±»·µ»Øµ½CLIģʽµÄ¿ØÖÆÌ¨ÉÏ¡£Ä¬ÈϵÄÈÕÖ¾¼Ç¼µÈ¼¶ÊÇWARN£¬²¢±»±£´æµ½ÒÔÏÂÎļþ¼ÐÖУº

/tmp/<user.name>/hive.log

Èç¹ûÓû§Ô¸Ò⣬ÈÕÖ¾¿ÉÒÔͨ¹ýÐÞ¸ÄÏÂÃæµÄ²ÎÊýÀ´·µ»Øµ½¿ØÖÆÌ¨ÉÏ£º

bin/hive -hiveconf hive.root.logger=INFO,console

ÁíÍ⣬Óû§¿ÉÒԸıä¼Ç¼µÈ¼¶£º

bin/hive -hiveconf hive.root.logger=INFO,DRFA

×¢Ò⣬ÅäÖÃÏîhive.root.loggerÔÚhive³õʼ»¯ÒԺ󣬲»ÄÜͨ¹ýʹÓÃ'set'ÃüÁîÀ´¸Ä±äÁË¡£

Hive also stores query logs for each Hive session under /tmp/<user.name>/; this location can be changed via the hive.querylog.location property in hive-site.xml.

Hive's execution logs on a Hadoop cluster are governed by the Hadoop configuration. Typically Hadoop creates one log file per map and reduce task, stored on the cluster machine where the task ran. The log files can be browsed from the Task Details page of the Web UI provided by the Hadoop JobTracker.

ÔËÐб¾µØÄ£Ê½Ê±£¨mapred.job.tracker=local£©£¬Hadoop/Hive»á½«Ö´ÐÐÈÕÖ¾·ÅÔÚ±¾»úÉÏ£¬´Ó0.6°æ±¾¿ªÊ¼£¬HiveʹÓÃhive-exec-log4j.properties £¨Èç¹û²»´æÔÚ£¬ÔòÊÇʹÓÃhive-log4j.propertiesÎļþ£©À´¾ö¶¨Ä¬ÈÏÈÕÖ¾µÄ±£´æ·½Ê½¡£Ä¬ÈϵÄÅäÖÃÎļþ½«ÎªÃ¿¸öͨ¹ý±¾µØÄ£Ê½Ö´ÐеIJéѯÉú³ÉÒ»¸öÈÕÖ¾£¬²¢´æ·Åµ½/tmp/<user.name>Ï¡£ÕâÑù×öµÄÄ¿µÄÊÇΪÁ˽«ÅäÖõ¥¶À¹ÜÀí£¬²¢¿ÉÒÔ½«ÈÕÖ¾¼¯Öдæ·Åµ½Ò»¸öÓû§ÐèÒªµÄλÖÃÉÏ£¨ÀýÈçÒ»¸öNFSÎļþ·þÎñÆ÷£©¡£Ö´ÐÐÈÕÖ¾¶Ôµ÷ÊÔÔËÐÐʱ´íÎó²¢ÎÞ°ïÖú¡£

Error logs are very useful for pinpointing problems. Please send any bugs you find to hive-dev@hadoop.apache.org.

6. DDL Operations

Creating Hive tables and browsing them:

hive> CREATE TABLE pokes (foo INT, bar STRING);

Creates a table called pokes with two columns; the first is an INT (integer) and the second a STRING.

hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

´´½¨Ò»¸öÃûΪinvitesµÄ±í£¬ÆäÖаüº¬Á½¸ö×ֶκÍÒ»¸ö½Ð×ödsµÄ·ÖÇø(partition)×ֶΡ£·ÖÇø×Ö¶ÎÊÇÒ»¸öÐéÄâµÄ×ֶΣ¬±¾Éí²¢²»°üº¬ÔÚÊý¾ÝÖУ¬µ«ÊÇÊÇ´Ó¼ÓÔØ½øÀ´µÄÊý¾ÝÖÐÌØ±ðÑÜÉú³öÀ´µÄÊý¾Ý¼¯¡£

ĬÈÏÇé¿öÏ£¬±í±»¼Ù¶¨ÊÇ´¿Îı¾µÄ¸ñʽ£¬²¢ÇÒʹÓÃ^A(ctrl-a)À´×÷ΪÊý¾Ý·Ö¸îµÄ¡£

hive> SHOW TABLES;

ÁгöËùÓеıí

hive> SHOW TABLES '.*s';

ÁгöËùÓнáβ°üº¬ 's' µÄ±íÃû¡£Æ¥Å䷽ʽʹÓÃJavaÕýÔò±í´ïʽ£¬²é¿´ÏÂÁÐÁ¬½Ó»ñÈ¡¹ØÓÚJavaÕýÔòµÄÐÅÏ¢http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html

hive> DESCRIBE invites;

Shows all the columns of the invites table.

¿ÉÒÔÐ޸ıíÃû£¬Ôö¼Óɾ³ýеÄ×ֶεȣº

hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
hive> ALTER TABLE events RENAME TO 3koobecaf;

ɾ³ý±í£º

hive> DROP TABLE pokes;

7. Metadata Store

ÔªÊý¾ÝĬÈÏʹÓÃDerbyÊý¾Ý¿â±£´æÔÚ±¾µØÎļþϵͳÖУ¬²¢±£´æÔÚ./metastore_dbÏ¡£Í¨¹ýÐÞ¸Äconf/hive-default.xmlÖеÄjavax.jdo.option.ConnectionURL±äÁ¿Ð޸ġ£

µ±Ç°£¬ÔÚĬÈÏÅäÖÃÏ£¬ÔªÊý¾Ýÿ´ÎÖ»ÄÜͬʱ±»Ò»¸öÓû§ËùʹÓá£

The metastore can be stored in any database supported by JPOX. The connection URL and the type of relational database are controlled by two variables: javax.jdo.option.ConnectionURL and javax.jdo.option.ConnectionDriverName.

Refer to the JDO (or JPOX) documentation of your database for more details.
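As an illustration, the two variables would typically be set in hive-site.xml. The values below are Hive's stock settings for the embedded Derby database; for another RDBMS, substitute your database's JDBC URL and driver class:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
</property>
```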

The database schema is defined in the JDO metadata annotation file package.jdo, located at src/contrib/hive/metastore/src/model.

¼Æ»®ÔÚδÀ´£¬ÔªÊý¾Ý´æ´¢ÒýÇæ¿ÉÒÔ³ÉΪһ¸ö¶ÀÁ¢µÄ·þÎñ¡£

Èç¹ûÄãÏ뽫ԪÊý¾Ý×÷Ϊһ¸öÍøÂçµÄ·þÎñÀ´ÔÚ¶à¸ö½ÚµãÖзÃÎÊ£¬Çë³¢ÊÔHiveDerbyServerMode¡£

8. DML Operations

½«ÎļþÖеÄÊý¾Ý¼ÓÔØµ½HiveÖÐ:

hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

The file loaded into the pokes table contains two columns separated by the ctrl-a character. 'LOCAL' means the file is loaded from the local file system; without 'LOCAL', the file is loaded from HDFS.

The keyword 'OVERWRITE' means that any data already present in the table will be deleted.

If 'OVERWRITE' is omitted, the data files are appended to the table's existing data set.

Note that data loaded with the load command is not validated in any way. If the file is on HDFS, it is moved into the Hive-managed file system namespace. The root of the Hive directory tree is set by the variable hive.metastore.warehouse.dir in hive-default.xml.

We advise users to create this directory before trying to create tables via Hive.

hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' 
OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
hive> LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08');

The two LOAD statements above load data into two different partitions of the invites table. The invites table must have been created with a partition keyed on ds for this to succeed.

hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');

ÉÏÊöÃüÁîÊǽ«HDFSÉϵÄÎļþ¼ÓÔØµ½±íÖС£

×¢Òâ´ÓHDFSÖмÓÔØÊý¾Ý£¬½«»á°ÑÊý¾ÝÒÆ¶¯µ½Ä¿Â¼Ï¡£Õ⼸ºõÊÇ˲¼äÍê³ÉµÄ¡££¨×¢£ºÒòΪֻÊÇÔÚHDFSÔªÊý¾ÝÖÐÐÞ¸ÄÁËÎļþ·¾¶µÄÖ¸Ïò¡££©

9¡¢SQL ²éѯ

£¨1£©²éѯʾÀý

ÏÂÃæ»áÑÝʾһЩ²éѯ·¶Àý£¬ÔÚbuild/dist/examples/queriesÖпÉÒÔÕÒµ½¡£¸ü¶àµÄ£¬¿ÉÒÔÔÚhiveÔ´ÂëÖеÄql/src/test/queries/positiveÖпÉÒÔÕÒµ½¡£

(2) SELECTS and FILTERS

hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';

Selects column 'foo' from all rows of the invites table that belong to partition ds=2008-08-15. The results are not stored anywhere; they are displayed on the console only.

Note that in the examples that follow, INSERT (into a Hive table, a local directory, or an HDFS directory) is optional.

hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='2008-08-15';

Selects all rows from partition ds=2008-08-15 of the invites table and puts them into an HDFS directory. The result data ends up in files under /tmp/hdfs_out (multiple files; the number of files depends on the number of mappers).

Partitioned tables must always have at least one partition selected in the WHERE clause when filtering.

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a;

Selects all rows from the pokes table into a local (note: on the machine running Hive) directory.

hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;

hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;

hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' select a.invites, a.pokes FROM profiles a;

hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15';

hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;

×ֶμÆËãºÍ£¬×î´óÖµ£¬×îСֵͬÑù¿ÉÒÔʹÓã¬×¢Òâ¶Ô²»°üº¬HIVE-287µÄHive°æ±¾£¬ÄãÐèҪʹÓÃCOUNT(1) À´´úÌæ COUNT(*)¡£

(3) GROUP BY

hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, count(*) WHERE a.foo > 0 GROUP BY a.bar;

hive> INSERT OVERWRITE TABLE events SELECT a.bar, count(*) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;

Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).

(4) JOIN

hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;

(5) MULTITABLE INSERT

FROM src
INSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100
INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 and src.key < 200
INSERT OVERWRITE TABLE dest3 PARTITION(ds='2008-04-08', hr='12') SELECT src.key WHERE src.key >= 200 and src.key < 300
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/dest4.out' SELECT src.value WHERE src.key >= 300;
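The statement above scans src once and routes each row by key range. The single-scan idea can be sketched in Python (the row data and destination lists are illustrative, not Hive internals):

```python
# Sketch of a multi-table insert: one pass over the source, with each row
# routed to the destination whose key range it falls into.
def multi_insert(src_rows):
    dest1, dest2, dest3, dest4 = [], [], [], []
    for key, value in src_rows:       # single scan, like FROM src ... many INSERTs
        if key < 100:
            dest1.append((key, value))
        elif key < 200:
            dest2.append((key, value))
        elif key < 300:
            dest3.append(key)         # dest3 keeps only src.key
        else:
            dest4.append(value)       # dest4 keeps only src.value
    return dest1, dest2, dest3, dest4
```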

(6) STREAMING

hive> FROM invites a INSERT OVERWRITE TABLE events
SELECT TRANSFORM(a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2008-08-09';

ÔÚmapÖÐʹÓýű¾/bin/cat¶ÔÊý¾ÝµÄÁ÷ʽ·ÃÎÊ£¨ÀàËÆÓÚhadoop streaming£©¡£Í¬ÑùµÄ£¬ Á÷ʽ·ÃÎÊÒ²¿ÉÒÔʹÓÃÔÚreduce½×¶Î£¨Çë²é¿´Hive Tutorial·¶Àý£©¡£

10. Simple Example Use Cases

£¨1£©Óû§¶ÔµçÓ°µÄͶƱͳ¼Æ

First, create a table backed by tab-delimited text files:

CREATE TABLE u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

Then, download the data file and extract it:

wget http://www.grouplens.org/system/files/ml-data.tar+0.gz
tar xvzf ml-data.tar+0.gz

½«Õâ¸öÎļþ¼ÓÔØµ½¸Õ¸Õ´´½¨µÄ±íÖУº

LOAD DATA LOCAL INPATH 'ml-data/u.data' OVERWRITE INTO TABLE u_data;

Count the number of rows in table u_data:

SELECT COUNT(*) FROM u_data;

Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).

Now we can do some more complex data analysis on table u_data.

Create weekday_mapper.py:

import sys
import datetime

for line in sys.stdin:
  line = line.strip()
  userid, movieid, rating, unixtime = line.split('\t')
  weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
  print '\t'.join([userid, movieid, rating, str(weekday)])

ʹÓÃmapper½Å±¾£º

CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

add FILE weekday_mapper.py;

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;

SELECT weekday, COUNT(*)
FROM u_data_new
GROUP BY weekday;

Note that for Hive releases 0.5.0 and earlier, you'll need to use COUNT(1) in place of COUNT(*).

(2) Apache web log data

ApacheÈÕÖ¾¸ñʽÊÇ¿ÉÒÔ×Ô¶¨ÒåµÄ£¬×÷Ϊ´ó¶àÊýÍø¹ÜÀ´Ëµ¶¼ÊÇʹÓÃĬÈÏÉèÖá£

ÎÒÃÇ¿ÉÒÔ¸øÄ¬ÈϵÄApacheÈÕÖ¾´´½¨Ò»¸öÈçÏÂµÄ±í¡£

add jar ../build/contrib/hive_contrib.jar;

CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^]*) ([^]*) ([^]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;
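To sanity-check a regex of this kind outside Hive, a rough Python translation can be run against a sample line. The pattern below is my own Python-flavored equivalent and the log line is made up; RegexSerDe itself takes Java regex syntax:

```python
import re

# Approximate Python translation of a combined-log-format regex like the SerDe's.
LOG_RE = re.compile(
    r'(\S*) (\S*) (\S*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'(-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|".*") ([^ "]*|".*"))?'
)

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /index.html HTTP/1.0" 200 2326 '
        '"http://example.com/" "Mozilla/4.08"')

# fullmatch mirrors Java's matches(), which anchors at both ends.
m = LOG_RE.fullmatch(line)
host, identity, user, time, request, status, size, referer, agent = m.groups()
```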

11. Basic Hive Concepts

The following is condensed from the Hive Tutorial: https://cwiki.apache.org/confluence/display/Hive/Tutorial

(1) What is Hive?

Hive is a data warehousing infrastructure built on top of Hadoop. By providing a series of tools, it enables users to conveniently perform data ETL, structure their data, and query and analyze the huge data sets stored in Hadoop.

Hive defines a simple SQL-like query language, QL, which makes it easy for users familiar with SQL to query the data. In addition, Hive allows developers familiar with map-reduce to plug in custom map-reduce programs to perform more sophisticated analysis. Hive works well with Thrift and control-delimited data, and also supports user-defined delimiters.

(2) What Hive is NOT

Hive is based on Hadoop, which is a batch processing system that cannot guarantee low latency; consequently, Hive queries cannot guarantee low latency either.

HiveµÄ¹¤×÷ģʽÊÇÌá½»Ò»¸öÈÎÎñ£¬µÈµ½ÈÎÎñ½áÊøÊ±±»Í¨Öª£¬¶ø²»ÊÇʵʱ²éѯ¡£Ïà¶ÔÓ¦µÄÊÇ£¬ÀàËÆÓÚoracleÕâÑùµÄϵͳµ±ÔËÐÐÓÚСÊý¾Ý¼¯µÄʱºò£¬ÏìÓ¦·Ç³£¿ì£¬¿Éµ±´¦ÀíµÄÊý¾Ý¼¯·Ç³£´óµÄʱºò£¬¿ÉÄÜÐèÒªÊýСʱ¡£ÐèҪ˵Ã÷µÄÊÇ£¬hive¼´Ê¹ÔÚºÜСµÄÊý¾Ý¼¯ÉÏÔËÐУ¬Ò²¿ÉÄÜÐèÒªÊý·ÖÖÓ²ÅÄÜÍê³É¡£

In short, low latency is not Hive's primary objective. Hive's design goals are scalability, extensibility, fault tolerance, and loose coupling with input formats.

(3) Data units

By granularity, Hive data can be organized into:

1) databases: namespaces that avoid naming conflicts between tables;

2) tables: homogeneous units of data sharing the same schema;

3) partitions: each table can have one or more partition keys that determine how the data is stored;

4) buckets (or clusters): within each partition, data can be divided into buckets based on the hash of some column. Partitions and buckets are not required, but they can greatly speed up data queries.
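The bucketing idea can be illustrated in a few lines of Python. The hash function and rows below are purely illustrative; Hive uses its own hash over the clustering column:

```python
# Illustrative bucketing: rows within a partition are assigned to N buckets
# by hashing one column. (Hive's real hash differs; this just shows the idea.)
def bucket_of(column_value, num_buckets):
    return hash(column_value) % num_buckets

def split_into_buckets(rows, key_index, num_buckets):
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key_index], num_buckets)].append(row)
    return buckets
```

Because every row with the same column value lands in the same bucket, a query that filters or samples on that column only needs to read a fraction of the files.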

12. Data Types

(1) Primitive types:

TINYINT - 1-byte integer

SMALLINT - 2-byte integer

INT - 4-byte integer

BIGINT - 8-byte integer

BOOLEAN - TRUE/FALSE

FLOAT - single precision

DOUBLE - double precision

STRING - sequence of characters

(2) Complex types:

Structs: the fields within a struct are accessed using DOT (.) notation. For example, if a column c of a table is of type STRUCT{a INT; b INT}, field a is accessed by the expression c.a.

Maps (key-value pairs): a designated field is accessed with ['element name']. For example, if a map M contains a group -> gid pair, the value of gid can be obtained with M['group'].

Arrays: the elements of an array must all be of the same type. For example, if an array A contains the elements ['a','b','c'], then A[1] returns 'b'.
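These three access patterns map directly onto Python's own containers, which can make them easier to remember (this is purely an analogy, not Hive syntax):

```python
# Rough Python analogues of Hive's complex-type accessors.
c = {"a": 1, "b": 2}          # STRUCT{a INT; b INT};  Hive: c.a
M = {"group": 1001}           # MAP<STRING, INT>;      Hive: M['group']
A = ["a", "b", "c"]           # ARRAY<STRING>;         Hive: A[1]

assert c["a"] == 1            # like c.a
assert M["group"] == 1001     # like M['group']
assert A[1] == "b"            # arrays are 0-indexed in Hive as well
```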

13. Built-in Operators and Functions

These include relational operators (A=B, A!=B, A<B, and so on), arithmetic operators (A+B, A*B, A&B, A|B, and so on), logical operators (A&&B, A||B, and so on), operators on complex types (A[n], M[key], S.x), and a variety of built-in functions.

14. Language Capabilities

hive²éѯÓïÑÔÌṩ»ù±¾µÄÀàsql²Ù×÷£¬ÕâЩ²Ù×÷»ùÓÚtableºÍpartition£¬°üÀ¨£º

1. ʹÓÃwhereÓï¾ä¹ýÂËÖÆ¶¨ÐÐ

2. ʹÓÃselect²éÕÒÖ¸¶¨ÁÐ

3. joinÁ½ÕÅtable

4. group by

5. Ò»¸ö±íµÄ²éѯ½á¹û´æÈëÁíÒ»Õűí

6. ½«Ò»¸ö±íµÄÄÚÈÝ´æÈë±¾µØÄ¿Â¼

7. ½«²éѯ½á¹û´æ´¢µ½hdfsÉÏ

8. ¹ÜÀítableºÍpartition(creat¡¢drop¡¢alert)

9. ÔÚ²éѯÖÐǶÈëmap-reduce³ÌÐò

   