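Editor's recommendation:
This article introduces Impala, a high-performance ad-hoc query engine, and analyzes its usage, internals, and deployment in detail. Our test results at the end also confirm its performance. We hope it helps your study.
This article comes from 数据管理 (Data Management) and was edited and recommended by Alice of 火龙果软件.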
Background
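With the arrival of the big data era, Hadoop has spent the past few years handling ETL and analytical query work in an almost dominant fashion. Many teams have drifted toward big data without much thought: even when the daily data volume is only tens or hundreds of megabytes, it is put on Hadoop for analysis, which is counterproductive. Yet when facing truly big data, Hadoop exposes its weakness in supporting analytical queries, to the point of prompting extreme complaints such as "MapReduce: A major step backwards". Hadoop is hardly to blame: it was designed for batch processing, and implementing SQL queries on the MR programming model is bound to perform poorly. So I usually treat Hive only as a tool that translates SQL semantics into MR jobs, especially for ETL.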
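After the Dremel paper was published, the open-source community produced a number of SQL-on-Hadoop (HDFS) query engines based on the MPP architecture. Typical examples are Apache Impala, Presto, Apache Drill, and Apache HAWQ. The features and implementation approaches of these engines look broadly similar. This article uses Impala's usage and implementation to introduce the steadily maturing class of HDFS-based MPP query engines.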
Introduction to Impala
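Apache Impala is an MPP SQL engine on top of HDFS/HBase, developed and open-sourced by Cloudera. It scales like Hadoop, offers a SQL-like (HiveQL-like) syntax, and delivers high response speed and throughput even in multi-user scenarios. It is implemented in Java and C++: Java provides the query interaction interfaces and their implementation, and C++ implements the query engine. In addition, Impala can share the Hive Metastore (which is gradually becoming a standard), can even be queried directly with Hive's JDBC jar and beeline, and supports a rich set of storage formats (Parquet, Avro, and others). Unless there is a clear reason to do otherwise, Parquet is the first choice when using Impala.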
From the User's Perspective
Users of systems like Impala fall into two groups: data developers who import and manage data, and data analysts who run queries. The former usually need to store data in HDFS, create a schema that matches the data with CREATE TABLE, and then associate the table with the data through LOAD DATA or ADD PARTITION. Chaining these steps together is fairly tedious, but Hive helps here: because Impala can share Hive's Metastore, this kind of ETL work can be done in Hive and the querying handed to Impala, which greatly simplifies the workflow (as far as I know, most data developers are more familiar with Hive anyway). For data analysts, the task is then to write correct SQL that expresses their query and analysis needs, which is what they do best. Impala can usually answer queries on TB-scale data within seconds, so it may take you from Hive's sluggish responses to the speed you were hoping for.
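Beyond simple types, Impala supports STRING, TIMESTAMP, DECIMAL, and other types. Users can implement user-defined functions (UDFs) and user-defined aggregate functions (UDAFs) for special logic. The former can be written in Java or C++, while the latter currently support only C++. All other schema operations can be done in Hive. Because Impala's storage is provided by HDFS, it cannot implement UPDATE or DELETE statements. If you need them, you have to recompute the data of an entire partition and overwrite the old data, which cannot satisfy workloads that require modifications to take effect in near real time. For those, look forward to Kudu support, or try a traditional MPP database such as Greenplum.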
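After data import completes, the user needs to run COMPUTE STATS <table> to collect and update the table's statistics. These statistics feed the cost-based optimizer (CBO) and are used to generate a better physical execution plan. In our tests this operation was fairly fast, so it can be treated as part of data import. Note that the statement is not run automatically, so users should run it manually once after loading data.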
System Architecture
In terms of usage, Impala and Hive are very similar and can share one copy of the metadata, which greatly simplifies onboarding. Next we look at how Impala works from the implementation side. The figure below shows Impala's system architecture and the query execution flow.

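As the figure above shows, Impala itself consists of three modules: Impalad, Statestore, and Catalog. It also depends on the Hive Metastore and HDFS. Impalad accepts query requests from users, which means a user can send a request to any Impalad process. That process acts as the coordinator for the query: it generates the execution plan, distributes it to the other Impalad processes for execution, and finally gathers the results and returns them to the user. The current Impalad and the other Impalad processes are also executors of the query: they read data, run physical operators, and return results to the coordinator Impalad. This design, with no central query node, maximizes fault tolerance and makes load balancing easy. As the figure shows, one Impalad process is usually deployed on each HDFS DataNode. Since HDFS usually stores multiple replicas of the data, this deployment preserves data locality: queries read from local disk rather than over the network whenever possible. From this we can infer that Impalad reads local data by reading local files directly rather than by calling the HDFS interfaces. So that the subtasks a query is split into can read local data as much as possible, Impalad needs to obtain the table's data storage paths from the Metastore and the block distribution of each file from the NameNode.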
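The locality idea above can be sketched roughly as follows. This is a minimal illustration, not Impala's actual scheduler: the function name `assign_scan_ranges`, the shape of the block-to-replica map, and the least-loaded tie-breaking rule are all assumptions made for the example.

```python
def assign_scan_ranges(block_replicas, impalad_hosts):
    """Assign each block to an Impalad, preferring hosts that hold a replica.

    block_replicas: dict of block id -> list of hosts holding a replica
                    (hypothetical shape of what the NameNode would report).
    impalad_hosts:  list of hosts running an Impalad process.
    Returns a dict of block id -> (chosen host, is_local_read).
    """
    load = {host: 0 for host in impalad_hosts}
    assignment = {}
    for block, replicas in sorted(block_replicas.items()):
        # Hosts that can read this block from their own disk.
        local = [h for h in replicas if h in load]
        # Fall back to any Impalad (a remote read) when no replica is local.
        candidates = local if local else list(impalad_hosts)
        # Pick the least-loaded candidate; break ties by host name.
        chosen = min(candidates, key=lambda h: (load[h], h))
        load[chosen] += 1
        assignment[block] = (chosen, bool(local))
    return assignment


blocks = {
    "part-0/blk_1": ["node1", "node2", "node3"],
    "part-0/blk_2": ["node1", "node2", "node4"],
    "part-1/blk_3": ["node2", "node3", "node4"],
    "part-1/blk_4": ["node9"],  # replica only on a host without an Impalad
}
hosts = ["node1", "node2", "node3"]
plan = assign_scan_ranges(blocks, hosts)
for block, (host, is_local) in plan.items():
    print(block, host, "local" if is_local else "remote")
```

With three replicas per block and an Impalad on every DataNode, almost every block has a local candidate, which is why the co-located deployment described above keeps most reads on local disk.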
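The Catalog service provides metadata and runs as a single instance. It pulls metadata from external systems (for example the HDFS NameNode and the Hive Metastore) and also submits DDL statements executed in Impala to the Metastore. Since Impala has no UPDATE/DELETE operations, it does not need to modify anything in HDFS. As described earlier, there are two ways to import data (DDL) into Impala: through Hive or through Impala. Going through Hive changes the state of the Hive Metastore, so you then need to run REFRESH in Impala to notify it of the metadata update. If the operation is done in Impala, Impalad notifies Catalog of the update, and Catalog notifies the other Impalad processes by broadcast. By default Catalog loads metadata asynchronously, so a query may have to wait until the metadata has finished loading (on first load) before it can proceed. This service separates metadata from the Impalad process, which simplifies the Impalad implementation and reduces coupling between Impalads.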
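Besides the Catalog service, Impala provides the StateStore service, which does two jobs: message subscription and state monitoring. The metadata in Catalog is broadcast through the StateStore service, which implements a Pub-Sub service. Impalads register the event types they want to receive, and StateStore periodically sends two kinds of messages to each Impalad process. The first is an update for the events that Impalad has registered for. These are version-based incremental updates (only changes since the last successful update are sent), which reduces the size of each message. The second is a heartbeat. StateStore tracks the state of every Impalad process, and each Impalad uses this to learn the state of the others and decide which nodes to assign query tasks to. Because pushes are periodic and the push frequency differs per node, the state seen by different Impalad processes may be inconsistent. Since each query relies only on the state held by its coordinator Impalad for task assignment, and no further coordination among processes is needed, it is not necessary for all Impalads to see a consistent state. In addition, the StateStore process is a single point and persists nothing to disk. If the service goes down, Impalad assigns tasks based on the last metadata state it received. There is no official high-availability deployment scheme. A common workaround is to bind multiple service instances behind DNS to cope with a single instance failing.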
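The version-based incremental update can be illustrated with a small in-memory sketch. It is not StateStore's real protocol or API: `TopicSketch`, `SubscriberSketch`, and their methods are hypothetical names, and real updates travel over RPC rather than direct method calls.

```python
class TopicSketch:
    """Hypothetical in-memory topic that stamps every entry with a version."""

    def __init__(self):
        self.version = 0
        self.entries = {}  # key -> (version of last change, value)

    def publish(self, key, value):
        self.version += 1
        self.entries[key] = (self.version, value)

    def delta_since(self, last_version):
        """Return the current version and only the entries changed after last_version."""
        changed = {
            key: value
            for key, (version, value) in self.entries.items()
            if version > last_version
        }
        return self.version, changed


class SubscriberSketch:
    """Hypothetical subscriber (an Impalad) that keeps a local cache."""

    def __init__(self):
        self.last_version = 0
        self.cache = {}

    def receive(self, topic):
        new_version, changed = topic.delta_since(self.last_version)
        self.cache.update(changed)
        self.last_version = new_version
        return changed


catalog_topic = TopicSketch()
catalog_topic.publish("db.t1", "schema-v1")
catalog_topic.publish("db.t2", "schema-v1")

impalad_a = SubscriberSketch()
impalad_b = SubscriberSketch()

first = impalad_a.receive(catalog_topic)    # full state on first contact
catalog_topic.publish("db.t1", "schema-v2")
second = impalad_a.receive(catalog_topic)   # only the changed entry
stale_view = dict(impalad_b.cache)          # impalad_b has not been updated yet
impalad_b.receive(catalog_topic)
```

Between pushes, `impalad_a` and `impalad_b` hold different views of the same topic, which mirrors the inconsistency the text describes. It is tolerable because only the coordinator's own view is used to place a query's tasks.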
The Impalad Module
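The modules of Impalad show that the main query processing happens inside the Impalad process, while StateStore and Catalog help Impalad with metadata management, load monitoring, and similar work. One could go a step further and move the Query Planner and Query Coordinator modules out of Impalad into a separate entry service, leaving Impalad responsible only for data reads and writes and subtask execution.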
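When Impalad optimizes execution, the basic principle is to read data locally as much as possible and reduce network communication. Ignoring in-memory caching, reading data from a remote node takes the path disk -> memory -> NIC -> local NIC -> local memory, whereas reading locally only takes local disk -> local memory. On the same hardware, therefore, reading data from another node is always slower than reading from local disk.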
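The Impalad service consists of three modules: Query Planner, Query Coordinator, and Query Executor. The first two form the frontend, which receives SQL query requests, parses the SQL, converts it into an execution plan, and hands the plan to the backend for execution. In terms of syntax, it supports the basic operations (select, project, join, group by, filter, order by, limit, and so on), correlated and uncorrelated subqueries, all kinds of outer joins, and window functions. This part follows the usual pipeline of query parsing -> syntax analysis -> query optimization and finally produces a physical execution plan.
The Query Planner generates the physical execution plan in two steps: it first generates a single-node execution plan and then derives from it a partitioned, parallelizable execution plan. The first step resembles execution optimization in an RDBMS: deciding join order, pushing predicates down through joins, applying transformations based on relational algebra, and so on. This step relies on the statistics of Impala tables and partitions. The second step derives the distributed execution plan from the single-node plan, along the lines of Dremel's execution process. The join order was decided in the first step, so this step decides the join strategy: hash join or broadcast join. The former is generally used for two large tables, which are hash-partitioned on the join key so that rows with the same id land on the same node to be joined. The latter broadcasts the entire small table to all nodes. Impala picks whichever strategy minimizes network communication.
For aggregation, each node usually performs a pre-aggregation first, then the results are hashed on the aggregation key and scattered to multiple nodes for a merge, and the coordinator node performs the final combination (only merging is needed there). For aggregations without group by, the pre-aggregated result of each node can be handed to a single node to merge. Sort and top-n operations work similarly.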
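The choice between the two join strategies can be sketched with a deliberately simplified cost comparison. The cost formulas below (broadcast ships the small side to every node, a hash-partitioned join ships both sides once) are an assumption chosen for illustration, not a statement of Impala's exact cost model. `choose_join_strategy` and `node_for` are hypothetical helpers.

```python
import zlib


def choose_join_strategy(left_bytes, right_bytes, num_nodes):
    """Pick the strategy with the smaller estimated network transfer.

    Assumed costs (illustrative only):
      broadcast:        the right (smaller) table is copied to every node
      hash-partitioned: both tables are shuffled across the network once
    """
    broadcast_cost = right_bytes * num_nodes
    partitioned_cost = left_bytes + right_bytes
    if broadcast_cost <= partitioned_cost:
        return "broadcast", broadcast_cost
    return "hash-partitioned", partitioned_cost


def node_for(join_key, num_nodes):
    """Map a join key to a node so equal keys always meet on the same node."""
    return zlib.crc32(str(join_key).encode()) % num_nodes


# A 1 TB fact table joined with a 1 MB dimension table on 20 nodes.
small_dim = choose_join_strategy(10**12, 10**6, 20)
# Two large tables: broadcasting 500 GB to 20 nodes would cost far more.
two_large = choose_join_strategy(10**12, 5 * 10**11, 20)
print(small_dim, two_large)
```

The estimate depends on table sizes, which is one reason the planner needs the table and partition statistics mentioned above.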
The following query illustrates the execution logic:
select t1.n1, t2.n2, count(1) as c from t1
join t2 on t1.id = t2.id join t3 on t1.id = t3.id
where t3.n3 between 'a' and 'f' group by t1.n1, t2.n2
order by c desc limit 100;
First the Query Planner generates a single-node physical execution plan, shown in the figure below:

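As in most database implementations, the first step produces a single-node execution plan. With columnar storage such as Parquet, the SCAN operation can read only the columns it needs, and predicates can be pushed down into the SCAN, which greatly reduces the amount of data read. The plan then performs the join, aggregation, sort, and limit operations. This plan must then be converted into a distributed execution plan, as in the figure below.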
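Column pruning and predicate pushdown can be shown on a toy columnar table. This is a sketch under simplifying assumptions: the table is a plain dict of column lists, `scan` is a hypothetical function, and the real Parquet reader works on row groups and column chunks rather than Python lists.

```python
def scan(table, needed_columns, predicate=None):
    """Scan a columnar table, touching only the needed columns.

    table:          dict of column name -> list of values (columnar layout)
    needed_columns: columns the rest of the plan actually uses
    predicate:      optional filter evaluated inside the scan itself
    """
    num_rows = len(next(iter(table.values())))
    # Column pruning: unused columns are never read.
    columns = {name: table[name] for name in needed_columns}
    rows = []
    for i in range(num_rows):
        row = {name: values[i] for name, values in columns.items()}
        # Predicate pushdown: rows are filtered before leaving the scan.
        if predicate is None or predicate(row):
            rows.append(row)
    return rows


t3 = {
    "id": [1, 2, 3, 4],
    "n3": ["a", "k", "c", "z"],
    "payload": ["p1", "p2", "p3", "p4"],  # never read by this query
}
matching = scan(t3, ["id", "n3"], lambda row: "a" <= row["n3"] <= "f")
ids = [row["id"] for row in matching]
print(ids)  # [1, 3]
```

For the example query, only `id` and `n3` of t3 are needed, and the `between 'a' and 'f'` filter is applied during the scan, so later operators see far fewer rows.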

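This query execution flow resembles Dremel's. First, the join method is chosen by weighing the sizes of the three tables. Here T1 and T2 use a hash join: T1 and T2 are each distributed across different Impalad processes by the value of id, with equal ids hashing to the same Impalad process, so the output of each join is one part of the full result. The join with T3 uses broadcast: every node receives all of T3's data (only the id column is needed). After the joins, each node can run a local pre-aggregation according to the group by. Each node's pre-aggregated result is only part of the final result (different nodes may hold the same group by values), so a global aggregation is still required. The global aggregation also needs to run in parallel, so the partial results are hash-distributed by the aggregation columns to different nodes for a merge (which is really another aggregation). To reduce network transfer, the intermediate nodes are usually also worker nodes. After this aggregation, each key exists on only one node. Each node then sorts and computes its TopN, and finally every worker's result is returned to the coordinator, which merges, sorts, applies the limit, and returns the result to the user.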
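The pre-aggregate, shuffle, merge, and TopN steps can be sketched end to end. This is an in-process simulation under stated assumptions: workers are plain lists, the shuffle uses `zlib.crc32` as a stand-in hash, and every function name is hypothetical.

```python
import heapq
import zlib
from collections import Counter


def pre_aggregate(rows):
    """Local pre-aggregation on one worker: count rows per group key."""
    return Counter(rows)


def shuffle_and_merge(partials, num_nodes):
    """Hash-partition partial counts so each key ends up on exactly one node."""
    buckets = [Counter() for _ in range(num_nodes)]
    for partial in partials:
        for key, count in partial.items():
            node = zlib.crc32(repr(key).encode()) % num_nodes
            buckets[node][key] += count  # the merge is itself an aggregation
    return buckets


def top_n(items, n):
    """Largest n (key, count) pairs by count, ties broken by key."""
    return heapq.nlargest(n, items, key=lambda kv: (kv[1], kv[0]))


def distributed_count_top_n(worker_rows, num_nodes, n):
    partials = [pre_aggregate(rows) for rows in worker_rows]
    merged = shuffle_and_merge(partials, num_nodes)
    # Each node computes its own TopN; the coordinator merges the candidates.
    candidates = [item for bucket in merged for item in top_n(bucket.items(), n)]
    return top_n(candidates, n)


# Joined rows already reduced to their (n1, n2) group keys, spread over 3 workers.
worker_rows = [
    [("x", "p"), ("x", "p"), ("y", "q")],
    [("x", "p"), ("y", "q"), ("z", "r")],
    [("z", "r"), ("x", "p")],
]
result = distributed_count_top_n(worker_rows, num_nodes=2, n=2)
print(result)
```

Because every key lives on exactly one node after the shuffle, any group in the global TopN is also in its node's local TopN, so the coordinator only needs to merge at most n candidates per node.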
Impalad Optimizations
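The above covers the overall query execution flow. Impalad's backend is implemented in C++, which lets it apply hardware-specific optimizations and achieve better resource utilization than SQL engines implemented in Java. The backend also uses LLVM, a compiler framework, to generate and compile code at execution time. Official tests found that dynamic code generation can improve backend execution performance by a factor of 1 to 5.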
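For data access, Impalad does not use the generic HDFS read path. Since Impalad is usually deployed on the DataNode, local data does not need to travel through the normal remote read path. Instead it uses the Short-Circuit Local Reads mechanism provided by HDFS, which lets a client on the same machine as the DataNode read the data directly. See the official Hadoop documentation and HDFS-347 for details.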
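Finally, the Impalad backend can read multiple file formats and compressed data, including Avro, RC, Sequence, and Parquet, with snappy, gzip, bz2, and other compression codecs. It appears that Impala does not support ORC and may not intend to, since it has its own preferred format in Parquet, while ORC is widely used with Presto.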
Deployment
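We usually consider two ways of deploying the cluster: mixed deployment and standalone deployment. The figure below shows the node layout for each. Mixed deployment means deploying the Impala cluster on top of the Hadoop cluster and sharing the resources of the whole Hadoop cluster. Standalone deployment sets aside some machines that run only HDFS and Impala. The advantage of the former is that Impala shares data with the Hadoop cluster, so no data copying is needed, but Impala and the Hadoop cluster then compete for resources, which can hurt Impala's query performance (MR jobs can also be affected by Impala). The latter provides stable high performance but requires continuously copying data from the Hadoop cluster to the Impala cluster, which adds to ETL complexity.
Each approach has pros and cons, but the first raises the question of how to allocate resources. In a mixed deployment, the Impalad processes can no longer hold resources permanently (that would amount to carving a share out of every NodeManager's resources and would prevent full use of the cluster). However, YARN's resource allocation has too much latency, which significantly affects Impala's query speed. So early on Impala designed a scheme for scheduling Impala resources on YARN: Llama (Low Latency Application MAster), which effectively plays the role of an AM. Impala's requirement is that the resources a query needs must be confirmed available before execution starts. Otherwise a single blocked Impalad can slow the response of the whole query (the bucket principle: the slowest participant sets the pace). Llama requests enough resources before an Impala query runs and caches them as far as possible after the query completes, releasing them only when YARN needs them for other work. Although Llama holds on to resources as much as it can, in a mixed deployment an Impala query may still fail to obtain resources. To guarantee high performance, standalone deployment is therefore still recommended.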

Testing
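Colleagues in our group ran a performance test of Impala on the TPC-DS dataset at 1 TB and 10 TB. The results show that its query performance improves on Hive by orders of magnitude and on Spark SQL by several times. COMPUTE STATS brings Impala some query optimization, but it occasionally misleads the query optimizer and degrades performance. Finally we also tested Impala on Kudu and found that it fell short of the expected performance (a gap of several times). The one shortcoming is that we did not test multi-user concurrent scenarios. Judging from per-query resource consumption, however, the C++-based Impala also consumed the least resources, so we can infer that it would still respond quickly under multiple users. Last are the official comparison results for the multi-user scenario (which feel somewhat slanted against Presto).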

Test results on the 1 TB dataset compared with Spark

Test results on the 10 TB dataset compared with Spark

Test results for Impala on Parquet compared with Impala on Kudu

Concurrency test results
Summary
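This article introduced Impala, a high-performance ad-hoc query engine, and analyzed its usage, internals, and deployment in detail. Our test results also confirmed its performance. Unlike the MPP solutions of traditional DBMSs, such as Greenplum, Vertica, and Teradata, Impala integrates better with the big data (Hadoop/Spark) ecosystem and makes it easier for data to flow between systems, whereas traditional MPP databases tend to keep data self-contained. The HDFS-based implementation does mean that Impala cannot update individual rows in real time and can only append or overwrite data in batches. Cloudera does offer Impala support for Kudu, but judging from our performance tests, query performance is not yet satisfactory. Traditional MPP databases not only support real-time single-row updates but can even support fairly complex transactions while maintaining query performance, which SQL-on-Hadoop query engines cannot match. Nevertheless, this class of query engine is after all a SQL engine rather than a complete database system. It gives users a high-performance query service within the big data ecosystem, and that is enough to meet the needs of most users.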