
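Hive is a data warehouse tool built on Hadoop. It can organize, query ad hoc, and analyze datasets stored in files on HDFS. It provides HiveQL (HQL), a query language similar to SQL, in which simple MapReduce (MR) statistics can be expressed. Hive converts HQL statements into MR jobs for execution.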
I. Overview
1-1 Data Warehouse Concepts
A data warehouse is a subject-oriented, integrated, relatively stable (non-volatile), and time-variant collection of data used to support management decision-making.
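A data warehouse architecture typically has four layers: data sources, data storage and management, data services, and data applications. Data integration (ETL) connects the source layer to the storage layer.
Data sources: where the warehouse's data comes from, including external data, existing business systems, and documents;
Data integration: extracts, cleans, transforms, and loads the data. ETL (Extract-Transform-Load) tools load data from the sources into the warehouse on a fixed schedule;
Data storage and management: covers storing and managing the data, including the data warehouse itself, data marts, warehouse monitoring, operation and maintenance tools, and metadata management;
Data services: serve data to front ends and applications, either directly from the warehouse or through an OLAP (OnLine Analytical Processing) server for more complex data services;
Data applications: face users directly and include data query tools, ad hoc reporting tools, data analysis tools, data mining tools, and various application systems.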
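The ETL step in the data integration layer can be sketched as a small pipeline of extract, clean, transform and load stages. This is a minimal illustration, assuming in-memory source records. The record fields, the cleaning rule (drop rows whose amount cannot be parsed) and the day-partitioned dict standing in for the warehouse are all hypothetical.

```python
from datetime import date

# Hypothetical raw records from a source system (input to the extract step).
RAW_ORDERS = [
    {"order_id": "101", "uid": "1", "amount": "25.50", "day": "2020-01-03"},
    {"order_id": "102", "uid": "1", "amount": "n/a", "day": "2020-01-03"},
    {"order_id": "103", "uid": "2", "amount": "7.00", "day": "2020-01-04"},
]


def extract(source):
    """Extract: read records from a source (here, an in-memory list)."""
    yield from source


def clean(records):
    """Clean: drop records whose amount cannot be parsed as a number."""
    for r in records:
        try:
            float(r["amount"])
        except ValueError:
            continue
        yield r


def transform(records):
    """Transform: convert string fields into the warehouse's typed schema."""
    for r in records:
        yield {
            "order_id": int(r["order_id"]),
            "uid": int(r["uid"]),
            "amount": float(r["amount"]),
            "day": date.fromisoformat(r["day"]),
        }


def load(records, warehouse):
    """Load: append rows into a store partitioned by day (a dict of lists)."""
    for r in records:
        warehouse.setdefault(r["day"], []).append(r)
    return warehouse


# One scheduled ETL run (a real system would trigger this on a fixed period).
warehouse = load(transform(clean(extract(RAW_ORDERS))), {})
print({d.isoformat(): len(rows) for d, rows in warehouse.items()})
```

Because each stage is a generator, records stream through the pipeline one at a time rather than being materialized between stages.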
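1-2 Problems with Traditional Data Warehouses
They cannot meet fast-growing storage demands for massive data. Traditional data warehouses are built on relational databases, which scale out poorly and can scale up only to a limited extent.
They cannot handle different types of data. Traditional data warehouses store only structured data, while data sources take ever more varied formats as enterprise businesses grow.
Their computing and processing power is insufficient. Because they are built on relational databases, they can hardly deliver good performance once data volumes reach the TB level.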
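1-3 Hive
Hive is a data warehouse built on top of Hadoop and originally developed by Facebook. To some extent it can be viewed as a programming interface for users. It does not store or process data itself; it relies on HDFS for storage and on MR for processing. Its SQL-like language, HiveQL, does not fully support the SQL standard. For example, classic Hive did not support row-level updates or transactions, offered only limited indexing, and placed many restrictions on subqueries and joins. ACID tables with UPDATE and DELETE arrived in Hive 0.14.
After converting HQL statements into MR jobs, Hive processes massive data in batches. A data warehouse stores static data, which is well suited to MR batch processing. Hive also provides a set of tools for extracting, transforming, and loading data, and it can store, query, and analyze data held in HDFS.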
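1-4 Relationship Between Hive and Other Hadoop Ecosystem Components
Hive relies on HDFS to store data and on MR to process it;
Pig can serve as an alternative to Hive. It is a dataflow language and runtime environment suited to querying semi-structured datasets on the Hadoop platform. It is typically used as part of an ETL process: it loads external data into a Hadoop cluster and converts it into the format users need;
HBase is a column-oriented, distributed, scalable database that provides real-time data access, whereas Hive can only process static data, mainly for BI reporting. Hive was created to reduce the effort of writing complex MR applications, while HBase was created to provide real-time access to data.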

1-5 Comparison Between Hive and Traditional Databases

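1-6 Deployment and Applications of Hive
1-6-1 Hive in Enterprise Big Data Analytics Platforms
The big data analytics platforms currently deployed in enterprises use Hadoop's basic components, HDFS and MR. They combine these with Hive, Pig, HBase, and Mahout to meet the needs of different business scenarios.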

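The figure above shows a common big data analytics platform deployment in enterprises. In this architecture:
Hive and Pig serve the reporting center: Hive analyzes reports, and Pig transforms the data used in reports.
HBase serves online business. HDFS does not support random reads and writes of individual records, and HBase was developed precisely to fill that gap, so it supports real-time data access well.
Mahout provides scalable implementations of classic machine learning algorithms for building business intelligence (BI) applications.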
II. Hive System Architecture
The figure below shows Hive's main modules, how Hive works with Hadoop, and several typical ways of accessing Hive from outside.

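Hive consists mainly of the following three modules:
User interface module: includes the CLI, HWI, JDBC, Thrift Server, and so on, and provides access to Hive. The CLI is Hive's built-in command-line interface, and HWI is a simple web interface for Hive. JDBC, ODBC, and Thrift Server offer programming interfaces. Thrift Server is built on the Thrift software framework and provides Hive's RPC interface.
Driver module: includes the compiler, optimizer, executor, and so on, and converts HiveQL statements into a series of MR jobs. Every command and query enters the driver, which parses and compiles it, optimizes the computation, and then executes it in the prescribed steps.
Metastore module: an independent relational database. It is usually a MySQL instance that Hive connects to, but it can also be Hive's built-in Derby instance. It stores table schemas and other system metadata, such as table names, columns and their attributes, partitions and their attributes, table properties, and the locations of table data.
Users who prefer graphical interfaces can use typical external access tools such as Karmasphere, Hue, and Qubole.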
III. How Hive Works
3-1 Basic Principles of Converting SQL Statements into MapReduce Jobs
3-1-1 Implementing a Join with MapReduce
Suppose the two tables being joined are the user table User(uid, name) and the order table Order(uid, orderid). The SQL command is:
SELECT u.name, o.orderid FROM `User` u JOIN
`Order` o ON u.uid = o.uid;
(ORDER and USER are reserved words in HiveQL, hence the backticks.)
The figure above describes how the join is executed as a MapReduce job.
First, in the Map phase:
The User table is mapped with uid as the key and with name plus a table tag as the value (here, User's tag is 1). Each record becomes a key-value (KV) pair. For example, the User record (1, Lily) becomes (1, <1, Lily>). The first "1" is the uid, and the second "1" is User's tag, which identifies the pair as coming from the User table;
Similarly, the Order table is mapped with uid as the key and with orderid plus a table tag as the value (here, Order's tag is 2). Each record becomes a KV pair;
Next, in the Shuffle phase, the KV pairs from User and Order are hashed by key and sent to the corresponding reducer. For example, (1, <1, Lily>), (1, <2, 101>), and (1, <2, 102>) all go to the same reducer. When the reducer receives these pairs, it also sorts them by table tag to make the join more efficient;
Finally, in the Reduce phase, each reducer uses the table tag in the value to separate the rows from User and Order. It then computes their Cartesian product per key to produce the final result. For example, joining (1, <1, Lily>) with (1, <2, 101>) and (1, <2, 102>) yields (Lily, 101) and (Lily, 102).
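The tagged reduce-side join described above can be simulated in a single Python process. This is a minimal sketch, not Hive's implementation. The reducer count and the use of Python's `hash` for partitioning are assumptions. The rows (2, Lucy) and (2, 103) are made up so that more than one key appears.

```python
from collections import defaultdict

# Table tags from the text: 1 marks User rows, 2 marks Order rows.
USER_TAG, ORDER_TAG = 1, 2

users = [(1, "Lily"), (2, "Lucy")]       # User(uid, name); uid 2 is made up
orders = [(1, 101), (1, 102), (2, 103)]  # Order(uid, orderid); 103 is made up


def map_phase():
    """Emit (uid, (tag, payload)) pairs for both tables."""
    for uid, name in users:
        yield uid, (USER_TAG, name)
    for uid, orderid in orders:
        yield uid, (ORDER_TAG, orderid)


def shuffle(pairs, num_reducers=2):
    """Route each pair to a reducer by hash(key), grouping values per key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    for part in partitions:
        for values in part.values():
            values.sort(key=lambda v: v[0])  # User rows (tag 1) first
    return partitions


def reduce_phase(partitions):
    """Per key, pair every User row with every Order row (Cartesian product)."""
    for part in partitions:
        for key in sorted(part):
            values = part[key]
            names = [p for tag, p in values if tag == USER_TAG]
            order_ids = [p for tag, p in values if tag == ORDER_TAG]
            for name in names:
                for orderid in order_ids:
                    yield name, orderid


result = sorted(reduce_phase(shuffle(map_phase())))
print(result)  # [('Lily', 101), ('Lily', 102), ('Lucy', 103)]
```

Sorting by tag puts the User rows ahead of the Order rows for each key. A real reducer can then buffer only the smaller User side and stream the Order rows past it.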
3-1-2 Implementing Grouping with MR
Suppose a score table Score(rank, level) has two attributes, rank and level. We need a Group By operation that merges the different fragments of Score by their (rank, level) combinations and counts the records for each combination. The SQL statement is:
SELECT rank,level,count(*) as value
FROM score GROUP BY rank,level;

The figure above describes how the grouping operation is executed as a MapReduce job.
First, in the Map phase, Score is mapped into a series of KV pairs. Each key is <rank, level>, and each value is the number of records with that <rank, level> combination, which is a map-side partial count. For example, the first fragment of Score contains two (A, 1) records, so the Map step produces the pair (<A,1>, 2);
Next, in the Shuffle phase, the pairs are hashed by key and sent to the corresponding reducer. For example, (<A,1>, 2) and (<A,1>, 1) go to the same reducer, while (<B,2>, 1) goes to another one. Each reducer then sorts the pairs it receives by key;
In the Reduce phase, the values of all pairs that share a key are summed to produce the final grouped result. For example, (<A,1>, 2) and (<A,1>, 1) on the same reducer produce the output (A, 1, 3).
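The grouping job can be simulated the same way. The sketch assumes a hypothetical split of Score into two fragments. As in the example above, each map task counts records within its own fragment before the shuffle.

```python
from collections import Counter, defaultdict

# Hypothetical split of Score(rank, level) into two input fragments.
fragments = [
    [("A", 1), ("A", 1)],
    [("A", 1), ("B", 2)],
]


def map_phase(fragment):
    """Count each (rank, level) within one fragment: emits ((rank, level), n)."""
    return list(Counter(fragment).items())


def shuffle(mapped, num_reducers=2):
    """Send each pair to a reducer chosen by hash(key); collect counts per key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped:
        for key, count in pairs:
            partitions[hash(key) % num_reducers][key].append(count)
    return partitions


def reduce_phase(partitions):
    """Sum the partial counts of each key, visiting keys in sorted order."""
    for part in partitions:
        for rank, level in sorted(part):
            yield rank, level, sum(part[(rank, level)])


result = sorted(reduce_phase(shuffle(map_phase(f) for f in fragments)))
print(result)  # [('A', 1, 3), ('B', 2, 1)]
```

Python randomizes string hashing per process, so which reducer receives a key can change between runs. The final sorted result does not change.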
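3-2 How Hive Converts SQL Queries into MR Jobs
When Hive receives an HQL statement, it works with Hadoop to complete the operation. The statement first enters the driver module. There the compiler parses and compiles it, the optimizer optimizes it, and the executor runs it. The executor usually launches one or more MR jobs, but sometimes launches none. For example, SELECT * FROM tb1 is a plain full-table scan with no projection or selection, so the data can be read directly.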

The figure above shows in detail how Hive converts an HQL statement into MR jobs and executes them:
The compiler in the driver module uses the Antlr language recognition tool to perform lexical and syntactic analysis on the user's statement, converting it into an abstract syntax tree (AST);
The AST is traversed and converted into QueryBlocks, because the AST is too complex to translate directly into an MR program. A QueryBlock is the most basic unit of an SQL statement and has three parts: the input source, the computation, and the output;
The QueryBlocks are traversed to generate an OperatorTree made of many logical operators, such as TableScanOperator, SelectOperator, FilterOperator, JoinOperator, GroupByOperator, and ReduceSinkOperator. Each of these operators performs one specific operation in the Map or Reduce phase;
The logical optimizer in the driver module transforms the OperatorTree and merges redundant operators. This reduces the number of MR jobs and the amount of data shuffled;
The optimized OperatorTree is traversed, and the MR jobs to run are generated from its logical operators;
The physical optimizer in the driver module optimizes the generated MR jobs and produces the final MR execution plan;
Finally, the executor in the driver module runs the final MR jobs and outputs the results.
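An OperatorTree can be pictured as operators that push each row on to their children. The toy classes below reuse operator names from the text, but they are a hypothetical simplification, not Hive's Java classes. CollectSink is an invented stand-in for the final output operator, and the query runs in memory without MR.

```python
class Operator:
    """Base operator: subclasses implement process(row) and call forward()."""

    def __init__(self, *children):
        self.children = list(children)

    def forward(self, row):
        # Push the row to every child operator in the tree.
        for child in self.children:
            child.process(row)


class TableScanOperator(Operator):
    def scan(self, rows):
        # Root of the tree: read rows from the "table" and push them down.
        for row in rows:
            self.forward(row)


class FilterOperator(Operator):
    def __init__(self, predicate, *children):
        super().__init__(*children)
        self.predicate = predicate

    def process(self, row):
        if self.predicate(row):
            self.forward(row)


class SelectOperator(Operator):
    def __init__(self, columns, *children):
        super().__init__(*children)
        self.columns = columns

    def process(self, row):
        # Projection: keep only the requested columns.
        self.forward({c: row[c] for c in self.columns})


class CollectSink(Operator):
    """Hypothetical stand-in for the final output operator."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def process(self, row):
        self.rows.append(row)


# Roughly: SELECT name FROM user WHERE uid > 1
sink = CollectSink()
plan = TableScanOperator(
    FilterOperator(lambda r: r["uid"] > 1, SelectOperator(["name"], sink))
)
plan.scan([{"uid": 1, "name": "Lily"}, {"uid": 2, "name": "Lucy"}])
print(sink.rows)  # [{'name': 'Lucy'}]
```

A logical optimization would reshape such a tree, for example by moving a filter closer to the scan, before MR jobs are generated from it.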
When the executor runs the final MR jobs, Hive itself does not generate MR program code. Instead, an XML file that represents the "job execution plan" drives Hive's built-in, native Mapper and Reducer modules. Hive initializes MR jobs by communicating with the JobTracker, so it does not need to run on the management node where the JobTracker lives. Large clusters usually host Hive on dedicated gateway machines, whose main role is to communicate remotely with the JobTracker on the management node to run jobs. The data files Hive processes are usually stored in HDFS, which is managed by the NameNode.
Components involved: JobTracker/TaskTracker (MapReduce) and NameNode/DataNode (HDFS).
IV. Basic Principles of Hive HA
In practice Hive has also shown instability: in rare cases a port stops responding or a process is lost. Hive HA (High Availability) addresses this class of problems.

In Hive HA, the data warehouse built on the Hadoop cluster is managed by multiple Hive instances, which are placed in a resource pool behind a unified external interface provided by HAProxy. Client queries first reach HAProxy, which forwards them. On receiving a request, HAProxy polls the available Hive instances in the pool in turn and runs a logical availability test:
If a Hive instance is logically available, the client request is forwarded to it;
If an instance is unavailable, it is put on a blacklist, and the next Hive instance is taken from the pool and tested;
Hive HA periodically processes the blacklisted instances in a batch. It first tries to restart each one and, if the restart succeeds, returns it to the pool.
Because HAProxy provides a single external access interface, developers can treat the whole setup as one very powerful "Hive".
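The HAProxy behavior described above can be sketched as a small class. It rotates through the pool, blacklists instances that fail the check, and periodically restarts blacklisted instances. HiveInstance and its health and restart methods are hypothetical stand-ins; a real deployment would probe a Hive server endpoint and restart a process.

```python
from collections import deque


class HiveInstance:
    """Hypothetical stand-in for one Hive server in the resource pool."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def is_logically_available(self):
        # A real check might open a connection and run a trivial query.
        return self.healthy

    def restart(self):
        # Assumption: restarting always succeeds in this sketch.
        self.healthy = True
        return self.healthy

    def execute(self, query):
        return f"{self.name} ran: {query}"


class HiveHAProxy:
    def __init__(self, instances):
        self.pool = deque(instances)
        self.blacklist = []

    def submit(self, query):
        """Try pool instances in turn; blacklist any that fail the check."""
        for _ in range(len(self.pool)):
            inst = self.pool.popleft()
            if inst.is_logically_available():
                self.pool.append(inst)  # rotate so the next request moves on
                return inst.execute(query)
            self.blacklist.append(inst)  # removed from the pool until repaired
        raise RuntimeError("no available Hive instance")

    def repair_blacklist(self):
        """Periodic job: restart blacklisted instances and re-add the good ones."""
        still_down = []
        for inst in self.blacklist:
            if inst.restart():
                self.pool.append(inst)
            else:
                still_down.append(inst)
        self.blacklist = still_down


proxy = HiveHAProxy([HiveInstance("hive-a", healthy=False), HiveInstance("hive-b")])
first = proxy.submit("SELECT 1")  # hive-a fails the check, hive-b answers
proxy.repair_blacklist()          # hive-a is restarted and returns to the pool
print(first, [i.name for i in proxy.pool])
```

Here the client only ever calls `submit`, which mirrors how the proxy hides the individual instances behind one interface.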
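V. Impala
5-1 Introduction to Impala
Impala, developed by Cloudera, provides SQL semantics and can query PB-scale data stored in Hadoop (HDFS) and HBase. Hive also provides SQL semantics, but it still relies on MR to execute jobs underneath, so its real-time performance is poor and its query latency is high.
Impala is a new-generation open-source big data analytics engine, originally modeled on Dremel, an interactive data analysis system developed by Google. It supports real-time computation and provides functionality similar to Hive's, with performance reported to be 3 to 30 times higher. Impala may overtake Hive in adoption and become the most popular real-time computing platform on Hadoop. It uses a distributed query engine similar to those of commercial parallel relational databases and queries data in HDFS and HBase directly with SQL. Because statements are not converted into MR jobs, latency is lower and real-time query needs are met well.
Impala does not replace Hive; rather, it provides a unified platform for real-time queries. Impala depends on Hive's metadata (Metastore) to run. Impala and Hive use the same SQL syntax, ODBC driver, and user interface, so the two can be deployed together to support both batch processing and real-time queries.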
5-2 Impala System Architecture

The figure above shows Impala's system structure; the dashed modules are Impala components. Impala is deployed on the Hadoop platform together with Hive, HDFS, and HBase. It consists of three parts: Impalad, State Store, and CLI.
Impalad: an Impala process that coordinates the execution of queries submitted by clients. It assigns tasks to other Impalads and collects and aggregates their results. An Impalad also executes tasks assigned to it by other Impalads, mainly operating on its share of the data in local HDFS and HBase. The Impalad process contains three modules: the Query Planner, Query Coordinator, and Query Exec Engine. It runs on the same node as the HDFS DataNode and runs fully distributed on an MPP (massively parallel processing) architecture.
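State Store: collects resource information from the Impalad processes across the cluster for query scheduling. It creates a statestored process that tracks the health and location of the Impalads in the cluster. statestored uses multiple threads to handle Impalad registrations and subscriptions and to keep heartbeat connections with many Impalads. In addition, every Impalad caches a copy of the State Store's information. If the State Store goes offline, an Impalad that notices this enters recovery mode and keeps trying to re-register. Once the State Store rejoins the cluster, the Impalads return to normal automatically and update their cached data.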
CLI: a command-line tool that lets users run queries. Impala also offers Hue, JDBC, and ODBC interfaces.
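The State Store bookkeeping can be sketched as a membership table refreshed by heartbeats, with each Impalad keeping its own cached copy. This is a hypothetical single-process model with integer timestamps. In the real system statestored pushes updates to its subscribers; here the Impalad pulls them for simplicity.

```python
class StateStore:
    """Hypothetical single-process model of statestored's membership view."""

    def __init__(self):
        self.last_heartbeat = {}  # impalad id -> time of last heartbeat
        self.online = True

    def heartbeat(self, impalad_id, now):
        # Registration and heartbeats share one call in this simplified model.
        if not self.online:
            raise ConnectionError("state store offline")
        self.last_heartbeat[impalad_id] = now

    def live_members(self, now, timeout):
        return sorted(m for m, t in self.last_heartbeat.items() if now - t <= timeout)


class Impalad:
    def __init__(self, impalad_id, store):
        self.id = impalad_id
        self.store = store
        self.cached_members = []  # local copy of the State Store's view
        self.recovering = False

    def tick(self, now, timeout=10):
        try:
            self.store.heartbeat(self.id, now)
            self.cached_members = self.store.live_members(now, timeout)
            self.recovering = False
        except ConnectionError:
            # Keep the stale cache and retry registration on the next tick.
            self.recovering = True


store = StateStore()
a, b = Impalad("a", store), Impalad("b", store)
a.tick(0)
b.tick(0)
a.tick(1)
store.online = False
a.tick(2)
print(a.recovering, a.cached_members)  # True ['a', 'b']
store.online = True
a.tick(3)
print(a.recovering, a.cached_members)  # False ['a', 'b']
```

While the State Store is offline, the Impalad keeps working from its cached membership. This matches the recovery behavior described above.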
5-3 Impala Query Execution Process

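Registration and subscription. Before a user submits a query, Impala creates an Impalad process to coordinate the client's queries. This process submits registration and subscription information to the State Store. The State Store creates a statestored process, which handles the Impalads' registrations and subscriptions using multiple threads.
Query submission. A query is submitted through the CLI to an Impalad process. The Impalad's Query Planner parses the SQL statement into a parse tree, turns the parse tree into several PlanFragments, and sends them to the Query Coordinator. A PlanFragment is made up of PlanNodes and can be dispatched to a single node for execution. Each PlanNode represents one relational operation together with the information needed to optimize its execution.
Fetching metadata and data locations. The Query Coordinator obtains the metadata from the MySQL metastore database, telling it which data the query needs. It obtains the data locations from the HDFS NameNode, telling it which DataNodes hold that data. Together these identify all DataNodes that store data relevant to the query.
Dispatching query tasks. The Query Coordinator initializes tasks on the corresponding Impalads, that is, it assigns the query tasks to all DataNodes that store data relevant to the query.
Aggregating results. The Query Executors exchange intermediate output in a streaming fashion, and the Query Coordinator aggregates the results from all Impalads.
Returning results. The Query Coordinator returns the aggregated results to the CLI client.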
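The dispatch and aggregation steps above amount to a scatter-gather over the nodes that hold the data. The sketch below assumes a hypothetical block-to-node map and a SUM-by-key query. It runs the fragments one after another, whereas real Impalads execute them in parallel and stream their results.

```python
from collections import defaultdict

# Hypothetical metadata: which DataNode holds each HDFS block of a table.
BLOCK_LOCATIONS = {"blk_1": "node1", "blk_2": "node2", "blk_3": "node1"}
# Hypothetical (key, value) rows stored in each block.
BLOCK_ROWS = {
    "blk_1": [("A", 3), ("B", 1)],
    "blk_2": [("A", 2)],
    "blk_3": [("B", 4)],
}


def plan_fragments(block_locations):
    """Group blocks by the node that stores them: one scan fragment per node."""
    fragments = defaultdict(list)
    for block, node in block_locations.items():
        fragments[node].append(block)
    return dict(fragments)


def execute_fragment(blocks):
    """Runs on one Impalad: scan local blocks and pre-aggregate SUM(value) by key."""
    partial = defaultdict(int)
    for block in blocks:
        for key, value in BLOCK_ROWS[block]:
            partial[key] += value
    return partial


def coordinate():
    """Coordinator: dispatch fragments, then merge the partial aggregates."""
    final = defaultdict(int)
    for blocks in plan_fragments(BLOCK_LOCATIONS).values():
        for key, value in execute_fragment(blocks).items():
            final[key] += value
    return dict(sorted(final.items()))


print(coordinate())  # {'A': 5, 'B': 5}
```

Each node aggregates only its local blocks, so the coordinator merges small partial results rather than raw rows.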
5-4 Impala vs. Hive

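Differences:
Hive suits long-running batch queries and analysis, while Impala suits interactive SQL queries.
Hive relies on the MR framework and executes its plan as a pipeline of MR jobs. Impala represents the plan as one complete execution plan tree, which can be distributed more naturally to the Impalads for query execution.
If memory cannot hold all the data during execution, Hive spills to external storage so that the query can still finish. Impala, as described here, does not use external storage when data exceeds memory, which limits the queries it can handle; later Impala releases added spill-to-disk for some operators.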
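Similarities:
Both use the same storage pool and can store data in HDFS and HBase. HDFS can hold data in formats such as TEXT, RCFILE, PARQUET, and AVRO, while HBase stores table records.
Both use the same metadata.
Both process SQL in a similar way, generating execution plans through lexical and syntactic analysis.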