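Editor's recommendation:
This article explains in detail how Impala fetches data from HDFS during query execution, that is, the implementation details of HdfsScanNode in Impala.
This article comes from the WeChat public account 数据管理 (Data Management); it was edited and recommended by Alice of 火龙果软件.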
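Impala is a high-performance OLAP engine. Impala itself is only an OLAP SQL engine; the data it queries lives in third-party storage engines, including HDFS, HBase, and Kudu. For data on HDFS, Impala supports several file formats and can currently read Parquet, TEXT, Avro, SequenceFile, and others. Impala does not support update operations on HDFS file formats, mainly because HDFS itself has weak support for updates. This article focuses on how Impala accesses HDFS data. That access falls into three types: 1. data access (queries); 2. data writes (inserts); 3. data operations (renaming files, moving files, and so on). The performance of the underlying storage layer directly determines how fast SQL queries run, and the good query performance of Impala on Parquet files rests on some specific implementation choices. This article explains in detail how Impala fetches data from HDFS during query execution, that is, the implementation details of HdfsScanNode in Impala.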
Data partitioning
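When Impala executes a query, the FE first parses the query and generates a physical execution plan, which is then split into multiple Fragments (sub-queries) and handed to the Coordinator for task distribution. When distributing tasks, the Coordinator has to take data locality into account, which depends on where each file is stored (on which DataNode). This is why impalad nodes are usually deployed on the same machines as the DataNodes. To lift the veil on how Impala accesses HDFS, we start with how Impala assigns scan tasks.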
As is well known, both MapReduce jobs and Spark jobs split their input files on the client before execution, and each Task then processes one split of the data, which is how they achieve parallelism. Impala works on a similar principle: when generating the physical execution plan, it assigns Fragments to multiple backend impalad nodes according to where the data lives. This raises two core questions:
How does Impala obtain the location of every file?
How does it assign sub-tasks based on data location?
As the earlier overview of Impala's overall architecture showed, the catalogd node manages the metadata of the whole system. Metadata is organized per table and forms a hierarchy, as shown in the figure below.

Impala table metadata structure
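Each table contains the following metadata (only the items this article needs are listed):
1. Schema information: which columns the table contains, the type of each column, and so on.
2. Table properties: owner, database name, partition columns, the table's root path, and the table's storage format.
3. Table statistics: mainly the total number of rows in the table and the total size of all files.
4. Partition information: the details of every partition.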
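Each partition contains the following information:
1. Partition name: uniquely determined by all partition columns and the value of each of those columns.
2. Partition file format: each partition can be stored in a different file format. Parsing uses this format rather than the table's storage format; if no format is specified when the partition is created, the table's storage format is used.
3. Information on all files in the partition: the details of every file under the partition are stored, which is also why the table must be REFRESHed after new data is written.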
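Each file contains the following information:
1. Basic file information: stored in a FileStatus object, including file name, file size, last modification time, and so on.
2. The file's compression format: determined by the file-name suffix.
3. Information on every BLOCK in the file: HDFS stores files split into BLOCKs, so Impala likewise stores the information of every block.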
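Each BLOCK contains the following information:
1. The BLOCK's offset within the file and the BLOCK's length.
2. The DataNodes holding the BLOCK: by default every BLOCK is stored as several replicas spread across different DataNodes.
3. The disk information on each DataNode holding the BLOCK: which disk of that DataNode stores the BLOCK. If this cannot be determined, -1 is returned to mean unknown.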
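The hierarchy above (table, partition, file, BLOCK) can be pictured with a few plain data classes. This is only a minimal sketch for illustration: the class names, field names, and the suffix-to-codec table are assumptions made here, not Impala's actual catalog classes.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class BlockReplica:
    datanode: str        # host that stores this replica
    disk_id: int = -1    # -1 means the disk could not be determined


@dataclass
class BlockMeta:
    offset: int          # byte offset of the block within its file
    length: int
    replicas: List[BlockReplica] = field(default_factory=list)


@dataclass
class FileMeta:
    name: str
    size: int
    mtime: int
    blocks: List[BlockMeta] = field(default_factory=list)

    @property
    def compression(self) -> str:
        # The codec is inferred from the file-name suffix.
        # This suffix table is a hypothetical example, not Impala's list.
        suffixes = {".gz": "gzip", ".bz2": "bzip2", ".snappy": "snappy"}
        for suffix, codec in suffixes.items():
            if self.name.endswith(suffix):
                return codec
        return "none"


@dataclass
class PartitionMeta:
    name: str            # e.g. "year=2018/month=1"
    file_format: str     # may differ from the table's format
    files: List[FileMeta] = field(default_factory=list)


@dataclass
class TableMeta:
    db: str
    name: str
    partition_cols: List[str]
    file_format: str
    partitions: List[PartitionMeta] = field(default_factory=list)


def blocks_to_scan(table: TableMeta,
                   wanted: Iterable[str]) -> Iterator[Tuple[str, BlockMeta]]:
    """Yield (file name, block) for every block in the selected partitions."""
    wanted = set(wanted)
    for part in table.partitions:
        if part.name in wanted:
            for f in part.files:
                for block in f.blocks:
                    yield f.name, block
```

Walking this structure for the partitions that survive pruning yields exactly the per-BLOCK location information that the scheduling step described next relies on.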
Task distribution
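The metadata described above answers our first question. All file information for a table is cached by Impala when the table is loaded and is synchronized through statestored into the cache of every impalad node. When an impalad generates an HdfsScanNode, it first uses the table's filter predicates to drop unnecessary partitions (partition pruning). It then walks every file in the partitions that remain, collects the basic and location information of each BLOCK that must be processed, and returns it to the Coordinator as the input for assigning HdfsScanNode work. One question remains: how large is each assigned range? That depends on the query option MAX_SCAN_RANGE_LENGTH, which gives the maximum length of each scan unit. Based on this option, the size of each range is: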
1. MAX_SCAN_RANGE_LENGTH: if the option is set and its value is smaller than the BLOCK size.
2. The BLOCK size: if MAX_SCAN_RANGE_LENGTH is set but its value is larger than the HDFS BLOCK size.
3. The BLOCK size: if MAX_SCAN_RANGE_LENGTH is not set.
4. The whole file size: if the file is smaller than one HDFS BLOCK.
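The four rules above can be sketched as a small function that cuts one file into (offset, length) ranges. The function name and signature are hypothetical, and the sketch assumes that a range never crosses a BLOCK boundary and that a value of 0 means the option is not set.

```python
def split_into_ranges(file_size, block_size, max_scan_range_length=0):
    """Return the (offset, length) scan ranges for one file.

    max_scan_range_length == 0 is treated as "option not set".
    """
    ranges = []
    block_start = 0
    while block_start < file_size:
        # The last block of a file may be shorter than block_size.
        block_len = min(block_size, file_size - block_start)
        # The option only matters when it is smaller than the block.
        step = block_len
        if 0 < max_scan_range_length < block_len:
            step = max_scan_range_length
        offset = block_start
        block_end = block_start + block_len
        while offset < block_end:
            length = min(step, block_end - offset)
            ranges.append((offset, length))
            offset += length
        block_start = block_end
    return ranges
```

For example, a 300-byte file with a 128-byte block size and no limit gives three ranges, one per BLOCK, while a file smaller than one BLOCK always yields a single range covering the whole file.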
At this point we have the list of ranges that each HdfsScanNode must scan. Each range records the file it belongs to, its starting offset and length, the addresses of the DataNodes holding its BLOCK, the disk ID on each DataNode, and whether the BLOCK has already been cached by HDFS.
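Once SQL parsing is complete, the Coordinator distributes tasks based on the assigned sub-tasks (this article only cares about HdfsScanNode) and the data distribution. The distribution logic is implemented in the Coordinator's Scheduler::ComputeScanRangeAssignment function. Because every range carries its storage locations, Impala first checks whether each BLOCK is already cached, or whether it is stored on a node that also runs an impalad. The former means the data can be read directly from the cache (memory); the latter means the HDFS data can be read through short-circuit local reads. This brings in the notion of read distance, which Impala orders from nearest to farthest as follows: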
1. CACHE_LOCAL: the range is cached, and the DataNode caching it is also an impalad node.
2. CACHE_RACK: the range is cached, and it is cached on a DataNode in the same rack. Not used at present.
3. DISK_LOCAL: the range can be read locally, meaning the DataNode holding the BLOCK and the impalad processing it are on the same machine.
4. DISK_RACK: the range can be read from a disk in the same rack. Not used at present.
5. REMOTE: the range cannot be read locally and can only be fetched through a remote HDFS read.
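When issuing a query, the client can set the REPLICA_PREFERENCE query option, which states which replica distance the query prefers. The default is 0, meaning CACHE_LOCAL; the other supported settings select DISK_LOCAL and REMOTE (items 3 and 5 in the list above). The DISABLE_CACHED_READS option controls whether reads from the cache are allowed, and the default read distance can also be set through SQL hints. Finally, SQL hints can specify whether replicas are chosen at random. With these two settings, the preferred distance and the random-replica flag, Impala can compute from each range's locations which impalad should process it.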
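Assigning a range starts with computing the shortest distance available for that range. There are two cases:
1. If the shortest distance is REMOTE, no impalad is deployed on any DataNode that holds the range. Such a range goes to whichever impalad, among all impalads, currently has the fewest range bytes assigned.
2. Otherwise the range is local. The difference between CACHE_LOCAL and DISK_LOCAL is that the former allows random selection: if random selection is enabled, one impalad is picked at random from all replicas whose distance equals the shortest distance; if not, the range goes to the impalad with the fewest bytes assigned so far.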
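The two cases above can be condensed into a toy scheduler. This is a sketch of the logic as described in this article, not a port of Scheduler::ComputeScanRangeAssignment: the function names, the input shape, the tie-break by host name, and the numeric enum values (which only encode the near-to-far order) are all assumptions.

```python
import random
from enum import IntEnum


class Distance(IntEnum):
    # Values only encode the near-to-far ordering used in this sketch.
    CACHE_LOCAL = 0
    CACHE_RACK = 1
    DISK_LOCAL = 2
    DISK_RACK = 3
    REMOTE = 4


def replica_distance(host, cached, impalads):
    """Distance of one replica, ignoring the unused rack-level distances."""
    if host not in impalads:
        return Distance.REMOTE
    return Distance.CACHE_LOCAL if cached else Distance.DISK_LOCAL


def assign_ranges(ranges, impalads, random_replica=False, rng=None):
    """ranges: list of (length, [(replica_host, is_cached), ...]).

    Returns {impalad: [index of each range assigned to it]}.
    """
    rng = rng or random.Random(0)
    assigned_bytes = {h: 0 for h in impalads}
    assignment = {h: [] for h in impalads}
    for idx, (length, replicas) in enumerate(ranges):
        dists = [(replica_distance(h, c, impalads), h) for h, c in replicas]
        best = min((d for d, _ in dists), default=Distance.REMOTE)
        if best == Distance.REMOTE:
            # No replica is co-located with an impalad: any impalad may read it.
            candidates = list(impalads)
        else:
            candidates = [h for d, h in dists if d == best]
        if best == Distance.CACHE_LOCAL and random_replica:
            target = rng.choice(candidates)
        else:
            # Least-loaded candidate; ties are broken by host name (assumption).
            target = min(candidates, key=lambda h: (assigned_bytes[h], h))
        assigned_bytes[target] += length
        assignment[target].append(idx)
    return assignment
```

Tracking assigned bytes rather than the number of ranges is what keeps the load balanced when ranges differ in size.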
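This also answers the second question above. Impala assigns each range to an impalad according to where the range is stored, keeping the assignment as balanced as possible while reading from local disk, or even from cache, whenever it can. Next we look at how HdfsScanNode actually runs.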
HdfsScanNode implementation
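As mentioned earlier, the job of HdfsScanNode is to read data from files of a particular format stored on HDFS, parse it into individual records, and pass those records to its parent exec node. The process described below is therefore about returning all required records once it is known which data to scan. Before that, it helps to look at the class structure of ScanNode in the BE module: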

Impala exec node class hierarchy
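Combining the figure above with Impala's execution logic: every node in the physical execution plan generated from the SQL is a subclass of ExecNode, and that class provides six interfaces: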
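Init: called when the ExecNode is created. Its arguments are the detailed description of this exec node and the context of the whole Fragment. During initialization, HdfsScanNode parses the runtime filter information and the filter predicates that the query specifies for the table. It also initializes some statistics counters for the node.
Prepare: when the Fragment runs its Prepare step, the Prepare function of every node in the subtree is called recursively. HdfsScanNode's Prepare initializes the table descriptor and the set of columns that must be read and handed to the parent node, and it initializes the information needed to scan each range (creating the HDFS handler and so on).
Codegen: implements code generation for each node. Impala uses LLVM for codegen, which reduces virtual function calls and improves query performance to some extent. In Codegen, HdfsScanNode generates code for each file format.
Open: called before execution to complete the remaining initialization. HdfsScanNode's Open initializes the maximum number of scanner threads and registers the ThreadTokenAvailableCb function, which is used to start new scanner threads.
GetNext: outputs one row_batch per call and takes an eos variable that is set to indicate whether the node has finished. HdfsScanNode is called in a loop by its parent node and returns one row_batch each time.
Close: called on completion to release resources and finalize statistics.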
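To make the call order of the six interfaces concrete, here is a toy model of the lifecycle. It is a sketch only: the Python classes, the `ListScanNode` stand-in, and the `run` driver are hypothetical and merely mirror the roles described above.

```python
class ExecNode:
    """Toy counterpart of the six-method interface described above."""

    def __init__(self, children=()):
        self.children = list(children)
        self.calls = []  # records the lifecycle order, for illustration only

    def init(self, plan_node, state):
        self.calls.append("init")

    def prepare(self, state):
        for child in self.children:  # recurse into the subtree
            child.prepare(state)
        self.calls.append("prepare")

    def codegen(self, state):
        for child in self.children:
            child.codegen(state)
        self.calls.append("codegen")

    def open(self, state):
        for child in self.children:
            child.open(state)
        self.calls.append("open")

    def get_next(self, state):
        """Return (row_batch, eos)."""
        raise NotImplementedError

    def close(self, state):
        for child in self.children:
            child.close(state)
        self.calls.append("close")


class ListScanNode(ExecNode):
    """Stand-in for a scan node: serves rows from a list in fixed-size batches."""

    def __init__(self, rows, batch_size):
        super().__init__()
        self.rows, self.batch_size, self.pos = rows, batch_size, 0

    def get_next(self, state):
        batch = self.rows[self.pos:self.pos + self.batch_size]
        self.pos += len(batch)
        return batch, self.pos >= len(self.rows)


def run(root, state=None):
    """Drive one node the way a parent would: set up, pull until eos, close."""
    root.init(None, state)
    root.prepare(state)
    root.codegen(state)
    root.open(state)
    rows, eos = [], False
    while not eos:
        batch, eos = root.get_next(state)
        rows.extend(batch)
    root.close(state)
    return rows
```

The pull loop in `run` is the part that matters for the rest of the discussion: the parent keeps calling GetNext until the child reports eos.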
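For every ExecNode, the real execution logic generally lives in Open and GetNext, and HdfsScanNode is no exception. As just mentioned, Open registers a callback; when that callback is invoked, it decides whether a new scanner thread needs to be started. So what is a scanner thread? To answer that we need to introduce the model impalad uses to scan data. During execution, impalad separates data reading from data scanning. Data reading means fetching data from remote HDFS or from local disk; data scanning means transforming the raw data that was read, and the result of that transformation is individual records. Their thread model and relationship are shown in the figure below: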

Impala data processing thread model
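Let us read this model from the bottom up. The lowest layer is the HDFS data I/O thread pool, which is started and initialized when impalad initializes. impalad divides these threads into local-disk threads and remote-access threads. A group of local-disk threads is started for every disk, with the group size controlled by the num_threads_per_disk startup option. By default one thread is started per spinning disk, which avoids large numbers of random reads (and therefore disk seeks); for flash disks (SSDs), eight threads are started per disk by default. The number of remote-access threads is controlled by the num_remote_hdfs_io_threads startup option and defaults to eight. Every thread has a blocking queue. Scanner threads hand work to the I/O threads through a shared ScanRange object, which carries the inputs of the read: the file, the offset and length of the range, the disk ID, and so on. During the read, the object is filled with memory blocks one after another. The size of a memory block determines how much data is read from HDFS at a time; the default is 8 MB (set by the read_size startup option). The ScanRange object also records how many bytes were read locally and how many remotely, which feeds the query's statistics.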
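A stripped-down version of this bottom layer can be sketched with one request queue per disk and worker threads that fill a shared ScanRange object in fixed-size chunks. The sketch makes several assumptions: in-memory bytes stand in for an HDFS file, the remote-read queue is left out, the range is assumed to lie within the data, and the names `DiskIoPool`, `io_worker`, and `drain` are hypothetical.

```python
import queue
import threading

READ_SIZE = 8 * 1024 * 1024  # mirrors the 8 MB default mentioned above


class ScanRange:
    def __init__(self, data, offset, length, disk_id):
        self.data = data              # stand-in for the file contents
        self.offset, self.length, self.disk_id = offset, length, disk_id
        self.buffers = queue.Queue()  # filled by an I/O thread, drained by a scanner


def io_worker(requests, read_size):
    while True:
        rng = requests.get()
        if rng is None:               # sentinel: shut this worker down
            break
        pos, end = rng.offset, rng.offset + rng.length
        while pos < end:
            chunk = rng.data[pos:min(pos + read_size, end)]
            if not chunk:             # range extends past the data
                break
            rng.buffers.put(chunk)
            pos += len(chunk)
        rng.buffers.put(None)         # marks the end of this range


class DiskIoPool:
    def __init__(self, num_disks, threads_per_disk=1, read_size=READ_SIZE):
        self.queues = [queue.Queue() for _ in range(num_disks)]
        self.workers = []
        for q in self.queues:
            for _ in range(threads_per_disk):
                t = threading.Thread(target=io_worker, args=(q, read_size),
                                     daemon=True)
                t.start()
                self.workers.append((t, q))

    def submit(self, rng):
        # Route the request to the queue of the disk that holds the range.
        self.queues[rng.disk_id % len(self.queues)].put(rng)

    def shutdown(self):
        for _, q in self.workers:
            q.put(None)               # one sentinel per worker
        for t, _ in self.workers:
            t.join()


def drain(rng):
    """What a scanner thread does: block until each chunk is available."""
    chunks = []
    while True:
        chunk = rng.buffers.get()
        if chunk is None:
            return chunks
        chunks.append(chunk)
```

With one worker per disk, the requests queued for a disk are served one after another, which is the property the single-thread default for spinning disks is meant to preserve.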
Separating data reading from data parsing keeps local disk reads sequential and keeps remote reads from consuming too much CPU. Scanner threads depend on the disk threads, and they are started by the ThreadTokenAvailableCb callback, which we cover below. When GetNext is called to fetch row_batches one at a time, HdfsScanNode checks whether this is the first call. If it is, it issues the requests for all ranges that must be scanned to the Disk I/O thread pool. Which regions are scanned depends on the file type; for Parquet, for example, the file footer always has to be read. One aside: if the table needs a runtime filter, the node waits for the runtime filter to arrive before scanning files (the default timeout is 1 s).
We can assume that after the first GetNext call all of the data has been read. Some ranges may in fact be blocked (not yet scheduled, or memory usage has hit its upper limit), but this is transparent to the scanner threads. A scanner thread only needs to fetch the data that has been read from the reader context object (a fetch that may block) and parse it. At this point the data has been read by the I/O threads. So when are the scanner threads started?
Data parsing and processing
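As noted above, scanner threads are started by the ThreadTokenAvailableCb function. It is triggered every time a ScanRange request is submitted to the disk thread pool, and it decides how many scanner threads to start based on the current Fragment and on resource usage in the system. Whenever a scanner thread finishes, the callback is triggered again to start new scanner threads. Each scanner thread is assigned one ScanRange object, which holds all the data of one split. Finally the ProcessSplit function is called to parse the data of that split.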

HDFS file scanner class hierarchy
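The figure above shows the Scanner class structure for the different HDFS file types; each file type is scanned and parsed by its own Scanner. We use the relatively simple TEXT format to illustrate the flow. A TEXT table must declare metadata such as its row delimiter and column delimiter when it is created, and parsing a split depends on those delimiters. To speed up parsing, Impala uses codegen and SSE4 instructions. However, splits are cut along BLOCK boundaries, and in the vast majority of cases a BLOCK both starts and ends in the middle of a record. On top of that, data is read in 8 MB buffers, and a buffer boundary may also fall in the middle of a record. All of these cases need special handling. When Impala processes a split, it first scans forward to the first record of the split; when it reaches the end of the split and the split ends with an incomplete record, it keeps scanning until that record ends.
In the normal case, the Scanner only has to cut out each row using the row delimiter, keep the columns that need to be parsed, and skip the columns that do not. For a row-oriented format such as TEXT, though, all of the data must first be read and then traversed in full. For a columnar format such as Parquet, the data of each split still has to be read, but because the values of each column are stored together, the scan only has to touch the columns it needs. This is where columnar storage saves work: it reduces the data scanned, not the data read. Parquet files also generally use compression, which makes them far smaller than the same data in TEXT format.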
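The boundary handling for TEXT splits can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not Impala's text scanner: it works on in-memory bytes, assumes a single-byte row delimiter, and uses the convention that a record belongs to the range containing its first byte. The name `scan_text_range` is hypothetical.

```python
def scan_text_range(data: bytes, offset: int, length: int, delim: bytes = b"\n"):
    """Return the records owned by the range [offset, offset + length)."""
    end = offset + length
    if offset == 0:
        pos = 0
    else:
        # Skip the tail of a record that started in the previous range.
        # Searching from offset - 1 keeps a record that starts exactly at offset.
        i = data.find(delim, offset - 1)
        if i == -1:
            return []
        pos = i + 1
    records = []
    # A record that starts inside the range is read to its end,
    # even if that end lies beyond the range boundary.
    while pos < end and pos < len(data):
        i = data.find(delim, pos)
        stop = len(data) if i == -1 else i
        records.append(data[pos:stop])
        pos = stop + 1
    return records
```

Under this convention, cutting a file into two ranges at any byte position yields every record exactly once across the two ranges, which is the property the special handling has to guarantee.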
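Whatever the file format, the parser produces records one by one, and each record contains only the columns that must be read from the table. Once a record is assembled, the table's filter predicates and the runtime filters decide whether it should be discarded. In other words, the ScanNode performs both projection and predicate pushdown. All surviving records are packed together in the row_batch structure. A row_batch holds 1024 rows by default, and the client can change this with the BATCH_SIZE query option. An overly large row_batch uses more memory and may reduce the concurrency between ExecNodes, because an ExecNode has to wait for its child to finish assembling a row_batch before it can do its own work. Since the scan is carried out in Scanner threads, each assembled row_batch is placed in a BlockingQueue, where it waits for the parent node to fetch it and apply its own processing logic. The parent and child may of course run at different rates, so the BlockingQueue can fill up; when that happens the Scanner threads block, and no new Scanner threads are created.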
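The project, filter, and batch steps and the back-pressure from a full queue can be sketched with a bounded `queue.Queue` between one scanner thread and its consumer. This is an illustration only: rows are plain dictionaries, the predicate is an ordinary Python callable, and the names `scanner_thread` and `collect_batches` are hypothetical.

```python
import queue
import threading


def scanner_thread(rows, wanted_cols, predicate, batch_size, out):
    batch = []
    for row in rows:
        # Projection: materialize only the columns the query needs.
        record = {c: row[c] for c in wanted_cols}
        # Filtering: discard records that fail the pushed-down predicate.
        if not predicate(record):
            continue
        batch.append(record)
        if len(batch) == batch_size:
            out.put(batch)        # blocks while the queue is full
            batch = []
    if batch:
        out.put(batch)
    out.put(None)                 # end-of-stream marker


def collect_batches(rows, wanted_cols, predicate, batch_size=1024, queue_depth=2):
    """Play the parent node: pull row batches until the scanner is done."""
    out = queue.Queue(maxsize=queue_depth)
    t = threading.Thread(target=scanner_thread,
                         args=(rows, wanted_cols, predicate, batch_size, out))
    t.start()
    batches = []
    while True:
        batch = out.get()
        if batch is None:
            break
        batches.append(batch)
    t.join()
    return batches
```

The `maxsize` bound is what produces the blocking behaviour described above: once the queue holds `queue_depth` batches, the scanner's `put` waits until the consumer takes one.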
Data compression
×îºóÎÒÃǼòµ¥µÄÁÄÒ»ÏÂÎļþѹËõ£¬Í¨³£ÔÚÁĵ½OLAPÓÅ»¯·½Ê½µÄʱºò¶¼»áÌáµ½Êý¾ÝѹËõ£¬ÏàͬµÄÊý¾ÝѹËõÖ®ºó¿ÉÒÔÓкܴó³Ì¶ÈµÄÊý¾ÝÌå»ýµÄ½µµÍ£¬µ«ÊÇͨ¹ýѧϰimpalaµÄÊý¾Ý¶ÁÈ¡Á÷³Ì£¬impalaͨ¹ýÎļþÃûµÄºó׺ÅжÏÎļþʹÓÃÁËÄÄÖÖѹËõËã·¨£¬¶ÔÓÚʹÓÃÁËѹËõµÄÎļþ£¬ËäÈ»¶ÁÈ¡µÄÊý¾ÝÁ¿¼õÉÙÁËÐí¶à£¬µ«ÊÇÐèÒªÏûºÄ´óÁ¿µÄCPU×ÊÔ´½øÐнâѹËõ£¬½âѹËõÖ®ºóµÄÊý¾ÝÆäʵºÍ·ÇѹËõµÄÊý¾ÝÊÇÒ»ÑùµÄ£¬Òò´Ë¶ÔÓÚ½âÎö²Ù×÷´¦ÀíµÄÊý¾ÝÁ¿Á½Õß²¢Ã»ÓÐÈκβîÒì¡£Òò´ËʹÓÃÊý¾ÝѹËõÖ»²»¹ýÊÇÒ»¸öI/O×ÊÔ´»»È¡CPU×ÊÔ´µÄ³£ÓÃÊֶΣ¬µ±Ò»¸ö¼¯ÈºÖÐI/O¸ºÔرȽϸ߿ÉÒÔ¿¼ÂÇʹÓÃÊý¾ÝѹËõ½µµÍI/OÏûºÄ£¬¶øÏà·´CPU¸ºÔرȽϸߵÄϵͳÔòͨ³£²»ÐèÒª½øÐÐÊý¾ÝѹËõ¡£
Summary
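Before closing, let us summarize how Impala reads HDFS data. First, Impala separates data scanning from data reading into different threads. At startup, impalad initializes the threads for all disks and for remote HDFS access, and these threads are responsible for reading every data split. For each SQL query, Impala uses the table metadata to divide the data to be scanned in each table into splits (after partition pruning) and records the location of every split. The BE assigns sub-tasks according to those locations, trying to keep reads local and the assignment balanced. Each sub-task is handed to a different Backend for execution. The Backend first builds an execution tree for the sub-task; the HdfsScanNode, usually a child node of that tree, is responsible for reading and scanning the data. During execution, the HdfsScanNode first submits the splits it must scan to the Disk I/O thread pool for reading, then creates Scanner threads to scan and parse the data. A different Scanner object is created for each file type; it parses the data and assembles it into row_batch objects that are handed to the parent node. This continues until all splits have been read, scanned, and parsed.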
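In short, Impala's HdfsScanNode performs well for specific reasons, which is why data reading and scanning are generally not the bottleneck of a query in Impala, whereas aggregation and JOIN operations sometimes slow queries down. From the analysis in this article, Impala applies the following optimizations when processing HDFS data sources: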
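Data locations are stored as table metadata, and task assignment takes full account of both data locality and load balance.
Data-reading threads and data-processing threads are separated, so the two can work in parallel.
Local data is read through the HDFS short-circuit read mechanism, which improves local read performance.
Projection and filtering are performed in the Scan node, which reduces memory copies and network transfer for the nodes above it.
Codegen reduces the runtime cost of virtual function calls and produces specialized code, and SSE instructions speed up data processing.
Data is processed in batches, which reduces the number of function calls.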
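This article has described in detail how Impala implements the HdfsScanNode exec node. That node is where every SQL query gets its data, which makes it very important. The set of HDFS formats Impala supports is still fairly limited: ORC is not supported, and for JSON we have completed an internal development version of a scanner that still needs further performance tuning. The article also noted that during the scan, data is filtered by the filter predicates and by runtime filters. This kind of predicate pushdown is a key performance optimization in all kinds of big-data engines, and the runtime filter can fairly be called Impala's own secret weapon.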