Fast SQL Engine for the Big Data Era - Impala

Author: Feng Yu
 
2020-7-9 
 
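Editor's note:
This article introduces Impala, a high-performance ad-hoc query engine. It analyzes Impala in detail from the angles of usage, internals, and deployment, and our own test results confirm its high performance. We hope it helps your learning.
This article comes from the Data Management account and was edited and recommended by Alice of Huolongguo Software.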

Background

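With the arrival of the big data era, Hadoop has spent the past few years taking over ETL and analytical query workloads in an almost dominant way, and many people have drifted toward "big data" tooling without really deciding to. Putting data on Hadoop for analysis when each day brings only tens or hundreds of MB is counterproductive; and when facing genuinely big data, Hadoop exposes its weakness in supporting analytical queries. There have even been extreme critiques such as "MapReduce: A major step backwards". Hadoop is hardly to blame: it was designed for batch processing, so implementing SQL queries on the MR programming model is bound to perform poorly. That is why I usually treat Hive only as a tool that translates SQL semantics into MR jobs, especially for ETL.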

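After the Dremel paper was published, the open-source community produced a wave of MPP-based SQL-on-Hadoop (HDFS) query engines, notably Apache Impala, Presto, Apache Drill, and Apache HAWQ. These engines look broadly similar in the functionality they offer and in how they implement it, so this article uses Impala's usage and implementation to introduce the fast-growing family of HDFS-based MPP query engines.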

Introduction to Impala

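Apache Impala is an MPP SQL engine over HDFS/HBase, developed and open-sourced by Cloudera. It scales out like Hadoop, offers an SQL-like syntax (similar to HiveQL), and keeps response times low and throughput high even with many concurrent users. It is implemented in Java and C++: Java provides the query-interaction interface and its implementation, while C++ implements the query engine itself. Impala can also share the Hive Metastore (which has gradually become a de facto standard), and you can even query Impala directly with Hive's JDBC jar and beeline. It supports a rich set of storage formats (Parquet, Avro, and others); unless you have a clear reason to do otherwise, Parquet is always the first choice with Impala.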

From the user's perspective

Users of systems like Impala fall into two groups: data engineers who load and manage data, and data analysts who run queries. Data engineers usually store data in HDFS, create a schema matching the data with CREATE TABLE, and then attach the data to the table with LOAD DATA or ADD PARTITION. Chaining these steps together is tedious, but thanks to Hive, and because Impala can share Hive's Metastore, this ETL work can be done in Hive and the querying handed to Impala, which greatly simplifies the workflow (as far as I know, most data engineers are more familiar with Hive anyway). For data analysts, the job is then to write correct SQL that expresses their query and analysis needs, which is what they do best. Impala can usually return results in seconds on TB-scale data, so it can take you from Hive's snail-paced responses straight to the speed you expect.

Besides simple types, Impala supports STRING, TIMESTAMP, DECIMAL, and other types. For special logic, users can write user-defined functions (UDFs) and user-defined aggregate functions (UDAFs); UDFs can be implemented in Java or C++, UDAFs currently only in C++. All other schema operations can be done in Hive. Because Impala's storage is HDFS, it cannot implement UPDATE or DELETE statements. If you need them, you have to recompute the data of the entire partition and overwrite the old data, which cannot satisfy requirements for near-real-time modifications. For those, look to Kudu support, or try a traditional MPP database such as Greenplum.
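To make "recompute the whole partition and overwrite it" concrete, here is a minimal Python sketch of the append/overwrite-only model. PartitionedTable and emulate_update are hypothetical names; the two methods only mimic the semantics of INSERT INTO ... PARTITION and INSERT OVERWRITE ... PARTITION and are not an Impala or HDFS API.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model of an HDFS-backed table: partitions are appended to or replaced, never edited."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition key -> list of rows (dicts)

    def append(self, part, rows):
        # Mimics INSERT INTO ... PARTITION: new data is added next to the old data.
        self.partitions[part] = self.partitions[part] + list(rows)

    def overwrite(self, part, rows):
        # Mimics INSERT OVERWRITE ... PARTITION: the whole partition is replaced.
        self.partitions[part] = list(rows)


def emulate_update(table, part, predicate, change):
    """Emulate 'UPDATE ... SET ... WHERE predicate' by rewriting the entire partition."""
    rewritten = [change(row) if predicate(row) else row for row in table.partitions[part]]
    table.overwrite(part, rewritten)


t = PartitionedTable()
t.append("2020-07-09", [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}])
emulate_update(t, "2020-07-09",
               predicate=lambda r: r["id"] == 2,
               change=lambda r: {**r, "status": "done"})
print(t.partitions["2020-07-09"])
# [{'id': 1, 'status': 'new'}, {'id': 2, 'status': 'done'}]
```

Even a one-row change costs a rewrite of the whole partition, which is why frequent, latency-sensitive updates are better served by Kudu or a traditional MPP database.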

After loading data, run COMPUTE STATS <table> to collect and update the table's statistics. These statistics feed the cost-based optimizer (CBO), which uses them to produce better physical execution plans. In our tests this operation was fairly fast, so it can be treated as part of data loading. Note that the statement is not run automatically, so we recommend running it manually once after each data load.
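Why do statistics matter? A cost-based planner turns row counts and per-column distinct-value counts (NDVs), the kind of numbers COMPUTE STATS records, into cardinality estimates. The sketch below uses the classic textbook estimate for an equi-join, |L join R| ≈ |L|·|R| / max(NDV_L, NDV_R), assuming uniformly distributed keys. Impala's planner has its own, more detailed formulas; both function names here are hypothetical.

```python
def estimate_equi_join_rows(rows_left, rows_right, ndv_left, ndv_right):
    """Textbook estimate of |L JOIN R ON L.k = R.k| from row counts and NDVs of k.

    Assumes keys are uniformly distributed and that every key value on the
    side with fewer distinct values also appears on the other side.
    """
    if rows_left == 0 or rows_right == 0:
        return 0
    return rows_left * rows_right // max(ndv_left, ndv_right, 1)


def pick_build_side(left, right):
    """A hash join builds its in-memory table from the input estimated to be smaller.

    left and right are (name, estimated_rows) pairs.
    """
    return right[0] if right[1] <= left[1] else left[0]


# A 1M-row fact table joined to a 10k-row dimension on a key with 10k distinct values.
est = estimate_equi_join_rows(1_000_000, 10_000, ndv_left=10_000, ndv_right=10_000)
print(est)  # 1000000
print(pick_build_side(("fact", 1_000_000), ("dim", 10_000)))  # dim
```

Without statistics the planner has much less to go on for inputs like these, which is why running COMPUTE STATS after loading matters.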

System architecture

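From a user's point of view, Impala and Hive are quite similar and can share a single copy of metadata, which greatly simplifies onboarding. Let's now look at how Impala works from the implementation side. The figure below shows Impala's system architecture and query execution flow.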

As the figure shows, Impala itself consists of three components, Impalad, Statestore, and Catalog, and it also depends on the Hive Metastore and HDFS. Impalad accepts user queries, which means a user can send a request to any Impalad process. That process acts as the coordinator for the query: it generates the execution plan, distributes it to the other Impalad processes, and finally gathers the results and returns them to the user. The coordinating Impalad and the other Impalad processes are also the executors of the query: they read data, run the physical operators, and return results to the coordinator. Having no central query node maximizes fault tolerance and makes load balancing easy. As the figure shows, one Impalad process is usually deployed on each HDFS DataNode. Because HDFS usually stores several replicas of the data, this deployment provides data locality: queries read from local disk rather than over the network whenever possible. From this one can infer that Impalad reads local data by reading local files directly rather than through the regular HDFS read path. To let each subtask of a query read local data as much as possible, Impalad fetches each table's storage paths from the Metastore and the block distribution of each file from the NameNode.
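The locality idea can be sketched as a greedy scheduler: given each block's replica locations (as reported by the NameNode) and the hosts that run an Impalad, send each block to the least-loaded Impalad that holds a replica, and fall back to a remote read only when none does. This is a simplified Python illustration with a hypothetical function name, not Impala's actual scheduling code.

```python
def assign_scan_ranges(block_replicas, impalad_hosts):
    """Greedily assign HDFS blocks to impalads, preferring hosts that hold a replica.

    block_replicas: dict mapping block id -> list of DataNode hosts with a replica
    impalad_hosts:  hosts running an impalad process
    Returns (assignment dict block -> host, list of blocks that must be read remotely).
    """
    load = {host: 0 for host in impalad_hosts}
    assignment, remote = {}, []
    for block, replicas in sorted(block_replicas.items()):
        local = [h for h in replicas if h in load]
        if local:
            # A co-located replica exists: pick the least-loaded local host.
            host = min(local, key=lambda h: (load[h], h))
        else:
            # No impalad next to any replica: this block is read over the network.
            host = min(impalad_hosts, key=lambda h: (load[h], h))
            remote.append(block)
        assignment[block] = host
        load[host] += 1
    return assignment, remote


blocks = {"b1": ["dn1", "dn2"], "b2": ["dn1", "dn3"], "b3": ["dn2", "dn3"], "b4": ["dn9"]}
assignment, remote = assign_scan_ranges(blocks, ["dn1", "dn2", "dn3"])
print(assignment)  # {'b1': 'dn1', 'b2': 'dn3', 'b3': 'dn2', 'b4': 'dn1'}
print(remote)      # ['b4']
```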

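The Catalog service provides metadata and runs as a single instance. It pulls metadata from external systems (such as the HDFS NameNode and the Hive Metastore), and it is also responsible for committing DDL statements executed in Impala to the Metastore. Since Impala has no UPDATE/DELETE operations, the Catalog never needs to modify HDFS. As mentioned earlier, there are two ways to bring data into Impala (DDL): through Hive or through Impala. Going through Hive changes the state of the Hive Metastore, so you must run REFRESH in Impala for it to pick up the metadata change. Going through Impala, the Impalad notifies the Catalog of the change, and the Catalog broadcasts it to the other Impalad processes. By default the Catalog loads metadata asynchronously, so a query may have to wait until the metadata has finished loading (the first time it is loaded). Separating metadata into its own service simplifies the Impalad implementation and reduces coupling between Impalads.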

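Besides Catalog, Impala provides the StateStore service, which does two jobs: message subscription and health monitoring. Catalog metadata is distributed through the StateStore, which implements a pub-sub service: Impalads register the topics they want to receive, and the StateStore periodically sends them two kinds of messages. The first kind carries updates to the topics an Impalad has subscribed to; version-based incremental updates (only the changes since the last successful update) keep each message small. The second kind is heartbeats: the StateStore tracks the state of every Impalad process, which lets each Impalad learn the state of the others and decide which nodes to assign query work to. Because pushes are periodic and their timing differs from node to node, different Impalads may hold inconsistent views of cluster state. Since each query assigns work based only on the view held by its coordinating Impalad, with no further coordination among processes, these views do not need to be consistent. The StateStore is also a single process and persists nothing to disk. If it goes down, Impalads keep assigning work based on the last state they received. There is no official highly available deployment; a common workaround is to bind multiple instances behind DNS so that the failure of a single instance can be tolerated.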
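The version-based incremental update can be illustrated with a toy pub-sub in Python: every published entry gets a new version number, and each push to a subscriber carries only entries newer than the highest version that subscriber has already received. This sketch makes simplifying assumptions (delivery always succeeds, no deletions, no heartbeats); ToyStatestore and the topic name "catalog" are hypothetical, and the real StateStore is a C++ service that talks to Impalads over RPC.

```python
class ToyStatestore:
    """Pub-sub with versioned topics: each push carries only entries a subscriber has not seen."""

    def __init__(self):
        self.version = 0      # global, monotonically increasing update version
        self.topics = {}      # topic -> {key: (version, value)}
        self.watermarks = {}  # subscriber -> {topic: highest version already delivered}

    def subscribe(self, subscriber, topic):
        self.watermarks.setdefault(subscriber, {})[topic] = 0

    def publish(self, topic, key, value):
        self.version += 1
        self.topics.setdefault(topic, {})[key] = (self.version, value)

    def push(self, subscriber):
        """One periodic round for one subscriber; assumes the delivery succeeds."""
        delta = {}
        for topic, seen in list(self.watermarks[subscriber].items()):
            entries = self.topics.get(topic, {})
            changed = {k: v for k, (ver, v) in entries.items() if ver > seen}
            if changed:
                delta[topic] = changed
            # Advance the watermark so the next push only sends newer entries.
            self.watermarks[subscriber][topic] = max(
                (ver for ver, _ in entries.values()), default=seen)
        return delta


ss = ToyStatestore()
ss.subscribe("impalad-1", "catalog")
ss.publish("catalog", "db.t1", "schema v1")
ss.publish("catalog", "db.t2", "schema v1")
first = ss.push("impalad-1")   # both tables
ss.publish("catalog", "db.t1", "schema v2")
second = ss.push("impalad-1")  # only the changed table
third = ss.push("impalad-1")   # nothing new
print(first, second, third)
```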

Impalad components

Looking at Impalad's components, most query processing happens inside the Impalad process, while StateStore and Catalog help Impalad with metadata management and load monitoring. One could go further and move the Query Planner and Query Coordinator out of Impalad into a separate entry-point service, leaving Impalad responsible only for reading and writing data and executing subtasks.

The fundamental principle behind Impalad's execution optimizations is to read data locally whenever possible and minimize network traffic. Ignoring in-memory caching, reading remote data goes disk -> memory -> NIC -> local NIC -> local memory, whereas reading local data only goes local disk -> local memory. So on identical hardware, reading another node's data is always slower than reading from the local disk.

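The Impalad service consists of three modules: Query Planner, Query Coordinator, and Query Executor. The first two form the frontend, which receives SQL queries, parses them, converts them into execution plans, and hands those plans to the backend for execution. On the language side, it supports the basic operations (select, project, join, group by, filter, order by, limit, etc.), both correlated and uncorrelated subqueries, all kinds of outer joins, and window functions. This part follows the usual pipeline of parsing -> syntax analysis -> query optimization and finally produces a physical execution plan. The Query Planner builds the physical plan in two steps: first a single-node plan, then a partitioned, parallelizable plan derived from it. The first step resembles optimization in an RDBMS: it decides the join order, pushes predicates down through joins, and applies rewrites based on relational algebra; it relies on statistics for Impala tables and partitions. The second step turns the single-node plan into a distributed plan, much like Dremel's execution model. With the join order already fixed, this step picks each join's strategy: a partitioned hash join or a broadcast join. The former is typically used for two large tables: both sides are hash-partitioned on the join key so that equal ids land on the same node for the join. The latter broadcasts the entire small table to every node. Impala chooses whichever strategy minimizes network traffic. For aggregation, each node usually pre-aggregates first; the partial results are then hash-partitioned on the grouping key to several nodes, which merge them, and the coordinator does the final combination (a simple union of those results is enough). For aggregations without GROUP BY, each node's pre-aggregated result can simply be sent to a single node for the merge. Sort and top-n work in a similar way.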
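A simplified version of "minimize network traffic" when choosing between the two join strategies: broadcasting costs roughly the build side's size times the number of nodes, while partitioning costs roughly the size of both inputs. The function below is a hypothetical Python sketch of that comparison; Impala's planner works from its own size estimates rather than this exact formula.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def choose_join_strategy(probe_bytes, build_bytes, num_nodes):
    """Pick the join distribution that ships fewer bytes (simplified network-cost model).

    broadcast:   every node receives a full copy of the build side.
    partitioned: both inputs are hash-partitioned on the join key, so each row
                 crosses the network about once (ignores rows that happen to stay local).
    """
    broadcast_cost = build_bytes * num_nodes
    partitioned_cost = probe_bytes + build_bytes
    if broadcast_cost <= partitioned_cost:
        return "broadcast", broadcast_cost
    return "partitioned", partitioned_cost


# Large fact table joined to a small dimension table on a 20-node cluster.
print(choose_join_strategy(100 * GB, 50 * MB, 20))  # ('broadcast', 1048576000)
# Two large tables: broadcasting 80 GB to 20 nodes would ship 1,600 GB.
print(choose_join_strategy(100 * GB, 80 * GB, 20))  # ('partitioned', 193273528320)
```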

The figure below shows the execution logic of the query select t1.n1, t2.n2, count(1) as c from t1 join t2 on t1.id = t2.id join t3 on t1.id = t3.id where t3.n3 between 'a' and 'f' group by t1.n1, t2.n2 order by c desc limit 100; First, the Query Planner generates a single-node physical plan, as shown in the figure below:

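As in most database implementations, the first step produces a single-node plan. With columnar formats such as Parquet, a SCAN reads only the columns it needs, and predicates can be pushed down into the SCAN, greatly reducing the amount of data read. Join, aggregation, sort, and limit follow. This plan then has to be converted into a distributed plan, as shown in the figure below.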

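The execution flow of this kind of query is similar to Dremel's. First, the join method is chosen by weighing the sizes of the three tables. Here T1 and T2 use a partitioned hash join: T1 and T2 are each spread across Impalad processes by the value of id, with equal ids hashed to the same Impalad, so each node's join output is one slice of the full result. The join with T3 uses broadcast: every node receives all of T3 (after filtering, only the id column is needed). After the joins, each node pre-aggregates locally by the GROUP BY keys. Each node's partial aggregate is only part of the final result (different nodes may hold the same group-by values), so a global aggregation is still needed. To parallelize it, rows are hash-distributed on the grouping columns to different nodes, which run a merge (really just another aggregation). To reduce network transfer, the intermediate nodes are usually the worker nodes themselves. After this aggregation each key lives on exactly one node, so each node sorts its data and computes a local top-N. Finally, each worker returns its results to the coordinator, which merges, sorts, applies the limit, and returns the result to the user.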
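The whole flow can be simulated in a few dozen lines of Python. The sketch below runs the example query over in-memory lists, with Python's hash() standing in for hash partitioning across NUM_NODES hypothetical nodes: T3 is filtered during its scan and its ids are broadcast, T1 and T2 are shuffled on id and hash-joined per node, each node pre-aggregates, partial aggregates are shuffled on the group key and merged, and the coordinator combines the per-node top-N lists. It then checks the result against a naive single-node evaluation. It only illustrates data movement; there is no networking or Impala code involved.

```python
import heapq
import random
from collections import Counter, defaultdict

NUM_NODES = 3
LIMIT = 100


def node_of(key):
    """Hash partitioning: equal keys always map to the same node."""
    return hash(key) % NUM_NODES


def top_n(counts, n):
    """ORDER BY c DESC LIMIT n; ties are broken by the group key for determinism."""
    return heapq.nsmallest(n, counts.items(), key=lambda kv: (-kv[1], kv[0]))


def distributed_query(t1, t2, t3):
    # Broadcast side: filter T3 inside its scan and ship only the id column to every node.
    t3_ids = Counter(i for i, n3 in t3 if "a" <= n3 <= "f")

    # Partitioned side: shuffle T1 and T2 on the join key so equal ids meet on one node.
    t1_parts, t2_parts = defaultdict(list), defaultdict(list)
    for i, n1 in t1:
        t1_parts[node_of(i)].append((i, n1))
    for i, n2 in t2:
        t2_parts[node_of(i)].append((i, n2))

    # On each node: hash join T1 with T2, join with broadcast T3, then pre-aggregate.
    partials = []
    for node in range(NUM_NODES):
        build = defaultdict(list)  # hash table keyed on T2.id
        for i, n2 in t2_parts[node]:
            build[i].append(n2)
        local = Counter()
        for i, n1 in t1_parts[node]:
            matches_t3 = t3_ids[i]  # each matching T3 row yields one joined row
            if matches_t3 == 0:
                continue
            for n2 in build.get(i, []):
                local[(n1, n2)] += matches_t3
        partials.append(local)

    # Shuffle partial aggregates on the group key; each node merges (sums) its keys.
    merged = [Counter() for _ in range(NUM_NODES)]
    for local in partials:
        for key, c in local.items():
            merged[node_of(key)][key] += c

    # Each node computes a local top-N; the coordinator merges them and applies the limit.
    candidates = {}
    for m in merged:
        candidates.update(top_n(m, LIMIT))
    return top_n(candidates, LIMIT)


def single_node_query(t1, t2, t3):
    """Reference: evaluate the same query naively on one node."""
    counts = Counter()
    for i1, n1 in t1:
        for i2, n2 in t2:
            if i1 != i2:
                continue
            for i3, n3 in t3:
                if i3 == i1 and "a" <= n3 <= "f":
                    counts[(n1, n2)] += 1
    return top_n(counts, LIMIT)


rng = random.Random(7)
t1 = [(rng.randrange(50), rng.randrange(30)) for _ in range(300)]
t2 = [(rng.randrange(50), rng.randrange(10)) for _ in range(300)]
t3 = [(rng.randrange(50), rng.choice("abcdefgh")) for _ in range(60)]
result = distributed_query(t1, t2, t3)
print(result == single_node_query(t1, t2, t3))
```

Merging only the local top-N lists is safe because, after the shuffle on the group key, every key's full count lives on a single node.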

Impalad optimizations

That covers the overall execution flow of a query. Impalad's backend is written in C++, which lets it apply hardware-specific optimizations and use resources more efficiently than SQL engines written in Java. The backend also uses LLVM, a compiler framework, to generate and compile code at runtime. Official tests found that runtime code generation makes backend execution 1 to 5 times faster.
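The idea behind runtime code generation can be shown with a Python analogy. Impala emits LLVM IR and compiles it to native code; here, Python's built-in compile() plays that role. An interpreter walks a generic expression tree for every row, while the "generated" version is specialized once per query and then called per row, which removes the per-row dispatch on node types. The expression format and helper names are hypothetical, and the snippet says nothing about actual speedups.

```python
# Expression trees: ("col", name), ("const", value), or (op, left, right).
BINARY_OPS = {"+", "*", "<"}


def interpret(expr, row):
    """Interpreted evaluation: walk the tree again for every single row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    left, right = interpret(expr[1], row), interpret(expr[2], row)
    if kind == "+":
        return left + right
    if kind == "*":
        return left * right
    if kind == "<":
        return left < right
    raise ValueError(f"unknown operator {kind!r}")


def to_source(expr):
    """Translate the tree into a Python expression string (done once per query)."""
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "const":
        return repr(expr[1])
    if kind not in BINARY_OPS:
        raise ValueError(f"unknown operator {kind!r}")
    return f"({to_source(expr[1])} {kind} {to_source(expr[2])})"


def compile_expr(expr):
    """'Codegen': compile the specialized expression once, then call it per row."""
    code = compile(f"lambda row: {to_source(expr)}", "<codegen>", "eval")
    return eval(code)


# a + b * 2 < 10
expr = ("<", ("+", ("col", "a"), ("*", ("col", "b"), ("const", 2))), ("const", 10))
rows = [{"a": 3, "b": 2}, {"a": 5, "b": 4}]
fn = compile_expr(expr)
print([fn(r) for r in rows])               # [True, False]
print([interpret(expr, r) for r in rows])  # [True, False]
```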

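For data access, Impalad does not use the generic HDFS read path. Since Impalad is usually deployed on the DataNodes themselves, local data does not need to be streamed through the DataNode process; instead Impalad uses the Short-Circuit Local Reads mechanism provided by HDFS, which lets a client read the DataNode's local block files directly, bypassing the DataNode. See the official Hadoop documentation and HDFS-347 for details.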

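Finally, the Impalad backend can read a variety of file formats and compressed data, including Avro, RCFile, SequenceFile, and Parquet, with compression codecs such as Snappy, gzip, and bzip2. At the time of writing it seems Impala does not support ORC and may not plan to; after all, it has its own favored format, Parquet, while ORC is widely used with Presto.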

Deployment

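We usually consider two ways to deploy the cluster: co-located deployment and dedicated deployment. The figures below show the node layout for each. Co-located deployment puts the Impala cluster on top of the Hadoop cluster, sharing all of the Hadoop cluster's resources; dedicated deployment uses a separate set of machines that run only HDFS and Impala. Co-located deployment lets Impala share data with the Hadoop cluster without copying it, but Impala and Hadoop compete for resources, which can hurt Impala's query performance (and Impala can slow down MR jobs as well). Dedicated deployment delivers stable, high performance but requires continuously copying data from the Hadoop cluster to the Impala cluster, which makes ETL more complex. Each approach has pros and cons. For co-located deployment, resource allocation has to be solved. First, Impalad processes can no longer stay resident (that would carve a share out of every NodeManager's resources and prevent the cluster's resources from being fully used). But YARN's resource allocation latency is too high and would significantly slow down Impala queries, so Impala designed, early on, a way to schedule Impala's resources on YARN: Llama (Low Latency Application MAster), which acts as an ApplicationMaster (AM) for Impala. Its requirement is that all resources a query needs must be available before the query runs; otherwise a single blocked Impalad could slow down the whole query (the weakest-link effect). Llama requests enough resources before an Impala query starts and caches them as long as possible after the query finishes, releasing them only when YARN needs them for other work. Even though Llama holds on to resources as long as it can, an Impala query in a co-located deployment may still fail to obtain resources, so a dedicated deployment is still recommended when high performance must be guaranteed.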
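The two properties attributed to Llama above, reserving all of a query's resources before it starts and caching them afterwards until YARN wants them back, can be captured in a small Python model. Everything here is hypothetical: resources are abstract units per node, ToyLowLatencyAM is not Llama's interface, and no real YARN calls are made.

```python
class ToyLowLatencyAM:
    """Toy model of the Llama idea: gang-reserve before a query, cache after it.

    yarn_free stands in for the capacity YARN can still hand out on each node;
    cached is capacity the AM holds but is not currently using.
    """

    def __init__(self, yarn_free):
        self.yarn_free = dict(yarn_free)               # node -> free units in YARN
        self.cached = {node: 0 for node in yarn_free}  # node -> idle units held by the AM

    def reserve(self, demand):
        """All-or-nothing: start a query only if every node can get its share."""
        for node, units in demand.items():
            if node not in self.cached or self.cached[node] + self.yarn_free[node] < units:
                return False  # one short node would stall the whole query
        for node, units in demand.items():
            from_cache = min(units, self.cached[node])
            self.cached[node] -= from_cache
            self.yarn_free[node] -= units - from_cache  # the rest comes from YARN
        return True

    def finish(self, demand):
        """After the query, keep the resources cached instead of returning them."""
        for node, units in demand.items():
            self.cached[node] += units

    def yarn_reclaim(self, node, units):
        """Release cached units only when YARN asks for them back."""
        released = min(units, self.cached[node])
        self.cached[node] -= released
        self.yarn_free[node] += released
        return released


am = ToyLowLatencyAM({"n1": 4, "n2": 4})
q = {"n1": 2, "n2": 3}
r1 = am.reserve(q)                           # True: units come from YARN
am.finish(q)                                 # cached: n1=2, n2=3
r2 = am.reserve(q)                           # True: served entirely from the cache
am.finish(q)
reclaimed = am.yarn_reclaim("n2", 3)         # YARN takes back n2's cached units
r3 = am.reserve({"n1": 1, "n2": 5})          # False: n2 can offer only 4 units
print(r1, r2, reclaimed, r3)
```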

Testing

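Colleagues on our team ran a performance test of Impala on the TPC-DS dataset at 1 TB and 10 TB. The results show that Impala's query performance is an order of magnitude better than Hive's and several times better than Spark SQL's. COMPUTE STATS brings some query-optimization benefit, but occasionally it misleads the optimizer and performance drops. We also tested Impala on Kudu and found that it fell short of expectations (by a factor of several). The one gap is that we did not test multi-user concurrent scenarios. However, judging from per-query resource consumption, the C++-based Impala also used the fewest resources, from which we infer that it can still respond quickly with many users. Finally, the official comparison results for multi-user scenarios are shown below (they feel a little deliberately unfavorable to Presto).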

Test results: 1 TB dataset, compared with Spark

Test results: 10 TB dataset, compared with Spark

Test results: Impala on Parquet vs. Impala on Kudu

Concurrency test results

Summary

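This article introduced Impala, a high-performance ad-hoc query engine, and analyzed it in detail from the perspectives of usage, internals, and deployment; our test results confirmed its high performance. Unlike MPP solutions built on traditional DBMSs, such as Greenplum, Vertica, and Teradata, Impala integrates better into the big data (Hadoop/Spark) ecosystem and makes it easier for data to flow between systems, whereas traditional MPP databases lean toward keeping their data self-contained. Of course, being built on HDFS means Impala cannot update individual records in real time; it can only append or overwrite data in batches. Cloudera does offer Impala support for Kudu, but judging from our performance tests its query performance is still unsatisfactory for now. Traditional MPP databases, by contrast, not only support real-time single-row updates but can even support fairly complex transactions while maintaining query performance, which SQL-on-Hadoop engines cannot come close to. Nevertheless, engines of this kind are SQL engines rather than complete database systems; they give users high-performance query services within the big data ecosystem, and that is enough to meet most users' needs.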

 
   