Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Model Center   Code  
»áÔ±   
   
 
     
   
 ¶©ÔÄ
  ¾èÖú
BigData£­¡®»ùÓÚ´ú¼ÛÓÅ»¯¡¯¾¿¾¹ÊÇÔõôһ»ØÊ£¿
 
×÷Õߣº ·¶ÐÀÐÀ À´Ô´£ºhbasefly.com ·¢²¼ÓÚ£º2017-6-8
  2553  次浏览      27
 

CBO»ù±¾Ô­Àí

Ìáµ½CBO£¬¾Í²»µÃ²»ÌáÆðһλ¡¯ÀÏÊìÈË¡¯ ¨C »ùÓÚ¹æÔòÓÅ»¯£¨Rule-Based Optimization£¬¼ò³ÆRBO£©¡£RBOÊÇÒ»ÖÖ¾­Ñéʽ¡¢Æô·¢Ê½µÄÓÅ»¯Ë¼Â·£¬ÓÅ»¯¹æÔò¶¼ÒѾ­Ô¤Ïȶ¨ÒåºÃ£¬Ö»ÐèÒª½«SQLÍùÕâЩ¹æÔòÉÏÌ׾ͿÉÒÔ£¨¶ÔRBO»¹²»Á˽âµÄͯЬ£¬¿ÉÒԲο¼±ÊÕßµÄÁíһƪÎÄÕ ¨C ¡¶´Ó0µ½1ÈÏʶCatalyst¡·)¡£Ëµ°×ÁË£¬RBO¾ÍÏñÊÇÒ»¸ö¾­Ñé·á¸»µÄÀÏ˾»ú£¬»ù±¾Ì×·ȫ¶¼ÖªµÀ¡£

È»¶øÊÀ½çÉÏÓÐÒ»ÖÖ¶«Î÷½Ð×ö ¨C ²»°´Ì×·À´£¬ÓëÆä˵Ëü²»°´Ì×·À´£¬µ¹²»Èç˵Ëü±¾Éí²¢Ã»ÓÐʲôÌ×·¡£×îµäÐ͵ÄιýÓÚ¸´ÔÓJoinËã×ÓÓÅ»¯£¬¶ÔÓÚÕâЩJoinÀ´Ëµ£¬Í¨³£ÓÐÁ½¸öÑ¡ÔñÌâÒª×ö£º

1. JoinÓ¦¸ÃÑ¡ÔñÄÄÖÖËã·¨²ßÂÔÀ´Ö´ÐУ¿BroadcastJoin or ShuffleHashJoin or SortMergeJoin£¿²»Í¬µÄÖ´ÐвßÂÔ¶ÔϵͳµÄ×ÊÔ´ÒªÇó²»Í¬£¬Ö´ÐÐЧÂÊÒ²ÓÐÌìÈÀÖ®±ð£¬Í¬Ò»¸öSQL£¬Ñ¡Ôñµ½ºÏÊʵIJßÂÔÖ´ÐпÉÄÜÖ»ÐèÒª¼¸ÃëÖÓ£¬¶øÈç¹ûûÓÐÑ¡Ôñµ½ºÏÊʵÄÖ´ÐвßÂԾͿÉÄܻᵼÖÂϵͳOOM¡£

2. ¶ÔÓÚÑ©»¨Ä£ÐÍ»òÕßÐÇÐÍÄ£ÐÍÀ´½²£¬¶à±íJoinÓ¦¸ÃÑ¡ÔñʲôÑùµÄ˳ÐòÖ´ÐУ¿²»Í¬µÄJoin˳ÐòÒâζ×Ų»Í¬µÄÖ´ÐÐЧÂÊ£¬±ÈÈçA join B join C£¬A¡¢B±í¶¼ºÜ´ó£¬C±íºÜС£¬ÄÇA join BºÜÏÔÈ»ÐèÒª´óÁ¿µÄϵͳ×ÊÔ´À´ÔËË㣬ִÐÐʱ¼ä±ØÈ»²»»á¶Ì¡£¶øÈç¹ûʹÓÃA join C join BµÄÖ´ÐÐ˳Ðò£¬ÒòΪC±íºÜС£¬ËùÒÔA join C»áºÜ¿ìµÃµ½½á¹û£¬¶øÇÒ½á¹û¼¯»áºÜС£¬ÔÙʹÓÃСµÄ½á¹û¼¯ join B£¬ÐÔÄÜÏÔ¶øÒ×¼û»áºÃÓÚǰһÖÖ·½°¸¡£

´ó¼ÒÏëÏ룬ÕâÓÐʲô¹Ì¶¨µÄÓÅ»¯¹æÔòô£¿²¢Ã»ÓС£Ëµ°×ÁË£¬ÄãÐèÒªÖªµÀ¸ü¶à¹ØÓÚ±íµÄ»ù´¡ÐÅÏ¢£¨±í´óС¡¢±í¼Ç¼×ÜÌõÊýµÈ£©£¬ÔÙͨ¹ýÒ»¶¨¹æÔò´ú¼ÛÆÀ¹À²ÅÄÜ´ÓÖÐÑ¡ÔñÒ»Ìõ×îÓŵÄÖ´Ðмƻ®¡£CBOÒâΪ»ùÓÚ´ú¼ÛÓÅ»¯²ßÂÔ£¬¾ÍÊÇ´Ó¶à¸ö¿ÉÄܵÄÓï·¨Ê÷ÖÐÑ¡ÔñÒ»Ìõ´ú¼Û×îСµÄÓï·¨Ê÷À´Ö´ÐУ¬»»¸ö˵·¨£¬CBOµÄºËÐÄÔÚÓÚÆÀ¹À³öÒ»Ìõ¸ø¶¨Óï·¨Ê÷µÄʵ¼Ê´ú¼Û¡£±ÈÈçÏÂÃæÕâ¿ÅSQLÓï·¨Ê÷£º

ÒªÆÀ¹À¸ø¶¨Õû¿ÃÊ÷µÄ´ú¼Û£¬·Ö¶øÖÎÖ®Ö»ÐèÒªÆÀ¹Àÿ¸ö½ÚµãÖ´ÐеĴú¼Û£¬×îºó½«ËùÓнڵã´ú¼ÛÀÛ¼Ó¼´¿É¡£¶øÒªÆÀ¹Àµ¥¸ö½ÚµãÖ´ÐÐʵ¼Ê´ú¼Û£¬ÓÖÐèÒªÖªµÀÁ½µã£¬ÆäÒ»ÊÇÕâÖÖËã×ӵĴú¼Û¹æÔò£¬Ã¿ÖÖËã×ӵĴú¼Û¼ÆËã¹æÔò±ØÈ»¶¼²»Í¬£¬±ÈÈçMerge-Sort Join¡¢Shuffle Hash Join¡¢GroupBy¶¼ÓÐ×Ô¼ºµÄÒ»Ì×´ú¼Û¼ÆËãËã·¨¡£Æä¶þÊDzÎÓë²Ù×÷µÄÊý¾Ý¼¯»ù±¾ÐÅÏ¢£¨´óС¡¢×ܼǼÌõÊý£©£¬±ÈÈçʵ¼Ê²ÎÓëMerge-Sort JoinµÄÁ½±í´óС£¬×÷Ϊ½Úµãʵ¼ÊÖ´Ðдú¼ÛµÄÒ»¸öÖØÒªÒòËØ£¬µ±È»·Ç³£ÖØÒª¡£ÊÔÏ룬ͬÑùÊÇTable Scan²Ù×÷£¬´ó±íºÍС±íµÄÖ´Ðдú¼Û±ØÈ»²»Í¬¡£

Ϊ¸ø¶¨Ëã×ӵĴú¼Û½øÐÐÆÀ¹À˵µ½µ×Ò²ÊÇÒ»ÖÖËã·¨£¬Ëã·¨¶¼ÊÇËÀµÄ£¬ÔÝÇÒ²»±í£¬ÏÂÎÄÏêÊö¡£¶ø²ÎÓëµÄÊý¾Ý¼¯»ù±¾ÐÅϢȴÊÇ»îµÄ£¬ÎªÊ²Ã´Èç´Ë˵£¬ÒòΪÕâЩÊý¾Ý¼¯¶¼ÊÇԭʼ±í¾­¹ý¹ýÂË¡¢¾ÛºÏÖ®ºóµÄÖмä½á¹û£¬Ã»ÓйæÔòÖ±½Ó¸æËßÄãÕâ¸öÖмä½á¹ûÓжàÉÙÊý¾Ý£¡ÄÇÖмä½á¹ûµÄ»ù±¾ÐÅÏ¢ÈçºÎÆÀ¹ÀÄØ£¿ÍƵ¼£¡¶Ô£¬Ô­Ê¼±í»ù±¾ÐÅÏ¢ÎÒÃÇÊÇ¿ÉÒÔÖªµÀµÄ£¬Èç¹ûÄܹ»Ò»²ãÒ»²ãÏòÉÏÍÆµ¼£¬ÊDz»ÊǾÍÓпÉÄÜÖªµÀËùÇóÖмä½á¹ûÐÅÏ¢£¡

ÕâÀïÓÖ½«ÈÎÒâ½ÚµãÖмä½á¹ûÐÅÏ¢ÆÀ¹À²ð·ÖΪÁ½¸ö×ÓÎÊÌ⣺Ê×ÏÈÆÀ¹ÀÒ¶×ӽڵ㣨ԭʼ±í£©µÄ»ù±¾ÐÅÏ¢£¬Æä´ÎÒ»²ãÒ»²ãÍùÉÏÍÆµ¼¡£ÆÀ¹Àԭʼ±í»ù±¾ÐÅÏ¢ÏëÏë×ÜÊÇÓа취µÄ£¬´Ö±©µã¾ÍÈ«±íɨÃ裬»ñÈ¡¼Ç¼ÌõÊý¡¢×î´óÖµ¡¢×îСֵ£¬×ÜÖ®ÊÇ¿ÉÒÔ×öµ½µÄ¡£ÄÇ»ù±¾ÐÅÏ¢ÈçºÎÒ»²ãÒ»²ãÍùÉÏÍÆµ¼ÄØ£¿¹æÔò£¡±ÈÈçԭʼ±í¾­¹ý id = 12Õâ¸öFilter¹ýÂËÖ®ºóµÄÊý¾Ý¼¯ÐÅÏ¢£¨Êý¾Ý¼¯´óСµÈ£©¾Í¿ÉÒÔ¾­¹ýÒ»¶¨µÄ¹æÔòÍÆµ¼³öÀ´£¬²»Í¬Ëã×ÓÓв»Í¬µÄ¹æÔò£¬ÏÂÎÄÏêÊö£¡

ºÃ°É£¬ÉÏÎÄ»¨·ÑÁË´óÁ¿Ê±¼ä½«Ò»¸öÍêÕûµÄCBO½âÆÊµÄÁãÁãËéË飬±ä³ÉÁËÒ»¶Ñ¹æÔò¼Óԭʼ±íµÄɨÃè¡£ÏàÐÅ´ó¼Ò¶¼ÓеããÂãµġ£Äª»Å£¬ÎÒÃÇÔÙÀ´ÀíÒ»±é£º

1. »ùÓÚ´ú¼ÛÓÅ»¯£¨CBO£©Ô­ÀíÊǼÆËãËùÓÐÖ´Ðз¾¶µÄ´ú¼Û£¬²¢ÌôÑ¡´ú¼Û×îСµÄÖ´Ðз¾¶¡£ÎÊÌâת»¯Îª£ºÈçºÎ¼ÆËãÒ»Ìõ¸ø¶¨Ö´Ðз¾¶µÄ´ú¼Û

2. ¼ÆËã¸ø¶¨Â·¾¶µÄÖ´Ðдú¼Û£¬Ö»ÐèÒª¼ÆËãÕâÌõ·¾¶ÉÏÿ¸ö½ÚµãµÄÖ´Ðдú¼Û£¬×îºóÏà¼Ó¼´¿É¡£ÎÊÌâת»¯Îª£ºÈçºÎ¼ÆËãÆäÖÐÈÎÒâÒ»¸ö½ÚµãµÄÖ´Ðдú¼Û

3. ¼ÆËãÈÎÒâ½ÚµãµÄÖ´Ðдú¼Û£¬Ö»ÐèÒªÖªµÀµ±Ç°½ÚµãËã×ӵĴú¼Û¼ÆËã¹æÔòÒÔ¼°²ÎÓë¼ÆËãµÄÊý¾Ý¼¯£¨Öмä½á¹û£©»ù±¾ÐÅÏ¢£¨Êý¾ÝÁ¿´óС¡¢Êý¾ÝÌõÊýµÈ£©¡£ÎÊÌâת»¯Îª£ºÈçºÎ¼ÆËãÖмä½á¹ûµÄ»ù±¾ÐÅÏ¢ÒÔ¼°¶¨ÒåËã×Ó´ú¼Û¼ÆËã¹æÔò

4. Ëã×Ó´ú¼Û¼ÆËã¹æÔòÊÇÒ»ÖÖËÀµÄ¹æÔò£¬¿É¶¨Òå¡£¶øÈÎÒâÖмä½á¹û»ù±¾ÐÅÏ¢ÐèҪͨ¹ýԭʼ±í»ù±¾ÐÅϢ˳×ÅÓï·¨Ê÷Ò»²ãÒ»²ãÍùÉÏÍÆµ¼µÃ³ö¡£ÎÊÌâת»¯Îª£ºÈçºÎ¼ÆËãԭʼ±í»ù±¾ÐÅÏ¢ÒÔ¼°¶¨ÒåÍÆµ¼¹æÔò

ºÜÏÔÈ»£¬ÉÏÊö¹ý³ÌÊÇ˼ά¹ý³Ì£¬ÕæÕý¹¤³Ìʵ¼ùÊÇ·´×ÅÓÉÏÂÍùÉÏÒ»²½Ò»²½Ö´ÐУ¬×îÖյõ½´ú¼Û×îСµÄÖ´Ðз¾¶¡£ÏÖÔÚÔÙ°ÑËü´ÓÒ»¸ö¸öÁã¼þ×é×°ÆðÀ´£º

1. Ê×ÏȲɼ¯Ô­Ê¼±í»ù±¾ÐÅÏ¢

2. ÔÙ¶¨ÒåÿÖÖËã×ӵĻùÊýÆÀ¹À¹æÔò£¬¼´Ò»¸öÊý¾Ý¼¯¾­¹ý´ËËã×ÓÖ´ÐÐÖ®ºó»ù±¾ÐÅÏ¢±ä»¯¹æÔò¡£ÕâÁ½²½Íê³ÉÖ®ºó¾Í¿ÉÒÔÍÆµ¼³öÕû¸öÖ´Ðмƻ®Ê÷ÉÏËùÓÐÖмä½á¹û¼¯µÄÊý¾Ý»ù±¾ÐÅÏ¢

3. ¶¨ÒåÿÖÖËã×ÓµÄÖ´Ðдú¼Û£¬½áºÏÖмä½á¹û¼¯µÄ»ù±¾ÐÅÏ¢£¬´Ëʱ¿ÉÒԵóöÈÎÒâ½ÚµãµÄÖ´Ðдú¼Û

4. ½«¸ø¶¨Ö´Ðз¾¶ÉÏËùÓÐËã×ӵĴú¼ÛÀۼӵõ½Õû¿ÃÓï·¨Ê÷µÄ´ú¼Û

5. ¼ÆËã³öËùÓпÉÄÜÓï·¨Ê÷´ú¼Û£¬²¢Ñ¡³öÒ»Ìõ´ú¼Û×îСµÄ

CBO»ù±¾ÊµÏÖ˼·

ÉÏÎÄ´ÓÀíÂÛ²ãÃæ·ÖÎöÁËCBOµÄʵÏÖ˼·£¬½«ÍêÕûµÄCBO¹¦Äܲð·ÖΪÁ˶à¸ö×Ó¹¦ÄÜ£¬½ÓÏÂÀ´ÁÄÁĶÔÿһ¸ö×Ó¹¦ÄܵÄʵÏÖ¡£

µÚÒ»²½£º²É¼¯²Îԭʼ±í»ù±¾ÐÅÏ¢

Õâ¸ö²Ù×÷ÊÇCBO×î»ù´¡µÄÒ»Ï×÷£¬²É¼¯µÄÖ÷ÒªÐÅÏ¢°üÀ¨±í¼¶±ðÖ¸±êºÍÁм¶±ðÖ¸±ê£¬ÈçÏÂËùʾ£¬estimatedSizeºÍrowCountΪ±í¼¶±ðÐÅÏ¢£¬basicStatsºÍHistogramsΪÁм¶±ðÐÅÏ¢£¬ºóÕßÁ£¶È¸üϸ£¬¶ÔÓÅ»¯¸ü¼ÓÖØÒª¡£

estimatedSize: ÿ¸öLogicalPlan½ÚµãÊä³öÊý¾Ý´óС£¨½âѹ£©

rowCount: ÿ¸öLogicalPlan½ÚµãÊä³öÊý¾Ý×ÜÌõÊý

basicStats: »ù±¾ÁÐÐÅÏ¢£¬°üÀ¨ÁÐÀàÐÍ¡¢Max¡¢Min¡¢number of nulls, number of distinct values, max column length, average column lengthµÈ

Histograms: Histograms of columns, i.e., equi-width histogram (for numeric and string types) and equi-height histogram (only for numeric types).

ÕâÀïÓÐÁ½¸öÎÊÌâÖµµÃ˼¿¼£º

1. ΪʲôҪ²É¼¯ÕâЩÐÅÏ¢£¿Ã¿¸ö¶ÔÏóÔÚÓÅ»¯¹ý³ÌÖÐÆðµ½Ê²Ã´×÷Óã¿

2. ʵ¼Ê¹¤³ÌÒ»°ãÊÇÈçºÎʵÏÖÕâЩÊý¾Ý²É¼¯µÄ£¿

ΪʲôҪ²É¼¯ÕâЩÐÅÏ¢£¿ºÜÏÔÈ»£¬estimatedSizeºÍrowCountÕâÁ½¸öÖµÊÇËã×Ó´ú¼ÛÆÀ¹ÀµÄÖ±¹ÛÌåÏÖ£¬ÕâÁ½¸öÖµÔ½´ó£¬¸ø¶¨Ëã×ÓÖ´Ðдú¼Û±ØÈ»Ô½´ó£¬ËùÒÔÕâÁ½¸öÖµºóÐø»áÓÃÀ´ÆÀ¹Àʵ¼ÊËã×ÓµÄÖ´Ðдú¼Û¡£ÄÇbasicStatsºÍHistogramsÕâÁ©ÓÃÀ´¸ÉɶĨ£¬Òª²»Íü³õÐÄ£¬Ö®ËùÒԲɼ¯Ô­Ê¼±íµÄÕâЩÐÅÏ¢£¬ÊÇΪÁË˳×ÅÖ´ÐÐÓï·¨Ê÷ÍùÉÏÒ»²ãÒ»²ãÍÆµ¼³öËùÓÐÖмä½á¹ûµÄ»ù±¾ÐÅÏ¢£¬ÕâÁ©¾ÍÊÇÀ´¸ÉÕâ¸öµÄ£¬ÖÁÓÚÔõôʵÏֵģ¬ÏÂһС½Ú»á¾Ù¸öÀý×Ó½âÊÍ¡£

ʵ¼Ê¹¤³ÌÈçºÎʵÏÖÕâЩÊý¾Ý²É¼¯£¿Ò»°ãÓÐÁ½ÖֱȽϿÉÐеķ½°¸£º´ò¿ªËùÓбíɨÃèÒ»±é£¬ÕâÑù×î¼òµ¥£¬¶øÇÒͳ¼ÆÐÅϢ׼ȷ£¬È±µãÊǶÔÓÚ´ó±íÀ´Ëµ´ú¼Û±È½Ï´ó£»Õë¶ÔһЩ´ó±í£¬É¨ÃèÒ»±é´ú¼ÛÌ«´ó£¬¿ÉÒÔ²ÉÓòÉÑù£¨sample£©µÄ·½Ê½Í³¼Æ¼ÆËã¡£

Ö§³ÖCBOµÄϵͳ¶¼ÓÐÃüÁî¶ÔԭʼÊý¾ÝÐÅÏ¢½øÐÐͳ¼Æ£¬±ÈÈçHiveµÄAnalyzeÃüÁî¡¢ImpalaµÄComputeÃüÁî¡¢GreenplumµÄAnalyzeÃüÁîµÈ£¬µ«ÊÇÐèҪעÒâÕâЩÃüÁî²¢²»ÊÇËæÊ±¶¼Ó¦¸ÃÖ´Ðеģ¬Ê×ÏÈÔÚ±íÊý¾ÝûÓдó±ä¶¯µÄÇé¿öÏÂû±ØÒªÖ´ÐУ¬Æä´ÎÔÚϵͳ²éѯ¸ß·¢ÆÚÒ²²»Ó¦¸ÃÖ´ÐС£ÕâÀïÓиö×î¼Ñʵ¼ù£º¾¡¿ÉÄÜÔÚÒµÎñµÍ·åÆÚ¶Ô±íÊý¾ÝÓнϴó±ä¶¯µÄ±íµ¥¶ÀÖ´ÐÐͳ¼ÆÃüÁÕâ¾ä»°ÓÐÈý¸öÖØµã£¬²»ÖªµÀÄã¿´³öÀ´Ã»ÓУ¿

µÚ¶þ²½£º¶¨ÒåºËÐÄËã×ӵĻùÊýÍÆµ¼¹æÔò

¹æÔòÍÆµ¼Òâ˼ÊÇ˵ÔÚµ±Ç°×Ó½Úµãͳ¼ÆÐÅÏ¢µÄ»ù´¡ÉÏ£¬¼ÆË㸸½ÚµãÏà¹ØÍ³¼ÆÐÅÏ¢µÄÒ»Ì×ÍÆµ¼¹æÔò¡£¶ÔÓÚ²»Í¬Ëã×Ó£¬ÍƵ¼¹æÔò±ØÈ»²»Ò»Ñù£¬±ÈÈçfliter¡¢group by¡¢limitµÈµÈµÄÆÀ¹ÀÍÆµ¼ÊDz»Í¬µÄ¡£ÕâÀïÒÔfilterΪÀý½øÐн²½â¡£ÏÈÀ´¿´¿´ÕâÑùÒ»¸öSQL£ºselect * from A , C where A.id = C.c_id and C.c_id > N £¬¾­¹ýRBOÖ®ºóµÄÓï·¨Ê÷ÈçÏÂͼËùʾ£º

ÎÊÌⶨÒåΪ£º¼ÙÈçÏÖÔÚÒѾ­ÖªµÀ±íCµÄ»ù±¾Í³¼ÆÐÅÏ¢£¨estimatedSize¡¢rowCount¡¢basicStatsÒÔ¼°histograms£©£¬ÈçºÎÍÆµ¼³ö¾­¹ýC.c_id > N¹ýÂ˺óÖмä½á¹ûµÄ»ù±¾Í³¼ÆÐÅÏ¢¡£ÎÒÃÇÀ´¿´¿´£º

1. ¼ÙÉèÒÑÖªCÁеÄ×îСֵc_id.Min¡¢×î´óÖµc_id.MaxÒÔ¼°×ÜÐÐÊýc_id.Distinct£¬Í¬Ê±¼ÙÉèÊý¾Ý·Ö²¼¾ùÔÈ£¬ÈçÏÂͼËùʾ£º

2. ÏÖÔÚ·Ö±ðÓÐÈýÖÖÇé¿öÐèҪ˵Ã÷£¬ÆäÒ»ÊÇNСÓÚc_id.Min£¬Æä¶þÊÇN´óÓÚc_id.Max£¬ÆäÈýÊÇN½éÓÚc_id.MinºÍc_id.MaxÖ®¼ä¡£Ç°Á½ÖÖ³¡¾°ÊǵÚÈýÖÖ³¡¾°µÄÌØÊâÇé¿ö£¬ÕâÀï¼òµ¥µÄÕë¶ÔµÚÈýÖÖ³¡¾°ËµÃ÷¡£ÈçÏÂͼËùʾ£º

ÔÚC.c_id > N¹ýÂËÌõ¼þÏ£¬c_id.Min»áÔö´óµ½N£¬c_id.Max±£³Ö²»±ä¡£¶ø¹ýÂ˺ó×ÜÐÐÊýc_id.distinct(after filter) £½ (c_id.Max ¨C N) / (c_id.Max ¨C c_id.Min) * c_id.distinct(before filter)

¼òµ¥°É£¬µ«ÊÇ×¢Òâ¹þ£¬ÉÏÃæ¼ÆËãÊÇÔÚ¼ÙÉèÊý¾Ý·Ö²¼¾ùÔȵÄǰÌáÏÂÍê³ÉµÄ£¬¶øÊµ¼Ê³¡¾°ÖÐÊý¾Ý·Ö²¼ºÜÏÔÈ»²»¿ÉÄܾùºâ¡£Êý¾Ý·Ö²¼Í¨³£³É¸ÅÂÊ·Ö²¼£¬histogramsÔÚÕâÀï¾ÍÒªµÇ³¡ÁË£¬Ëµ°×ÁËËü¾ÍÊÇÒ»¸öÖù×´·Ö²¼Í¼£¬ÈçÏÂͼ£º

Öù״ͼºá×ø±ê±íʾÁÐÖµ´óС·Ö²¼£¬×Ý×ø±ê±íʾƵÂÊ¡£¼ÙÉèNÔÚÈçͼËùʾλÖã¬ÄǹýÂ˺ó×ÜÐÐÊýc_id.distinct(after filter) £½ height(>N) / height(All) * c_id.distinct(before filter)

µ±È»£¬ÉÏÊöËùÓмÆËã¶¼Ö»ÊÇʾÒâÐÔ¼ÆËã£¬ÕæÊµËã·¨»á¸´ÔӺܶࡣÁíÍ⣬Èç¹û´ó¼Ò¶Ôgroup by ¡¢limitµÈν´ÊµÄÆÀ¹À¹æÔò±È½Ï¸ÐÐËȤµÄ»°£¬¿ÉÒÔÔĶÁSparkSQL CBOÉè¼ÆÎĵµ£¬Ôڴ˲»ÔÙ׸Êö¡£ÖÁ´Ë£¬Í¨¹ý¸÷ÖÖÆÀ¹À¹æÔòÒÔ¼°Ô­Ê¼±íͳ¼ÆÐÅÏ¢¾Í¿ÉÒÔ¼ÆËã³öÓï·¨Ê÷ÖÐËùÓÐÖмä½ÚµãµÄ»ù±¾Í³¼ÆÐÅÏ¢ÁË£¬ÕâÊÇÍòÀﳤÕ÷µÄµÚ¶þ²½£¬Ò²ÊÇÖÁ¹ØÖØÒªµÄÒ»²½¡£½ÓÏÂÀ´¼ÌÐøÍùǰ×ߣ¬¿´¿´ÈçºÎ¼ÆËãÿÖÖºËÐÄËã×ÓµÄʵ¼Ê´ú¼Û¡£

µÚÈý²½£ººËÐÄËã×Óʵ¼Ê´ú¼Û¼ÆËã

´òÎÄÕÂÒ»¿ªÊ¼¾Í¿ª¿Ú±Õ¿Ú´ú¼Û´ú¼ÛµÄ£¬¿Éµ½µ×ʲôÊÇ´ú¼Û£¬Ôõô¶¨Òå´ú¼Û£¿Õâô˵°É£¬Ã¿¸öϵͳ¶Ô´ú¼ÛµÄ¶¨Òå²¢²»·Ç³£Ò»Ö£¬ÓеÄÒòΪʵÏÖµÄÔ­ÒòÉèÖõıȽϼòµ¥£¬ÓеĻá±È½Ï¸´ÔÓ¡£ÕâÒ»½ÚÖ÷ÒªÀ´¼òµ¥ÁÄÁÄÿ¸ö½ÚµãµÄÖ´Ðдú¼Û£¬ÉÏÎÄ˵ÁË£¬Ò»ÌõÖ´Ðз¾¶µÄ×Ü´ú¼Û¾ÍÊÇÕâÌõ·¾¶ÉÏËùÓнڵãµÄ´ú¼ÛÀÛ¼ÓÖ®ºÍ¡£

ͨ³£À´½²£¬½Úµãʵ¼ÊÖ´Ðдú¼ÛÖ÷Òª´ÓÁ½¸öά¶ÈÀ´¶¨Ò壺CPU CostÒÔ¼°IO Cost¡£ÎªºóÐø½²½â·½±ãÆð¼û£¬ÐèÒªÏÈÐж¨ÒåһЩ»ù±¾²ÎÊý£º

Hr£º´ÓHDFSÉ϶ÁÈ¡1byteÊý¾ÝËùÐè´ú¼Û

Hw£ºÍùHDFSÉÏдÈë1byteÊý¾ÝËùÐè´ú¼Û

Tr£ºÊý¾Ý×ÜÌõÊý£¨the number of tuples in the relation £©

Tsz£ºÊý¾Ýƽ¾ù´óС£¨Average size of the tuple in the relation £©

CPUc£ºÁ½Öµ±È½ÏËùÐèCPU×ÊÔ´´ú¼Û£¨CPU cost for a comparison in nano seconds £©

NEt£º1byteÊý¾Ýͨ¹ýÍøÂçÔÚ¼¯Èº½Úµã¼ä´«Ê仨·Ñ´ú¼Û£¨the average cost of transferring 1 byte over network in the Hadoop cluster from any node to any node £©

¡­¡­

ÉÏÎÄ˵¹ý£¬Ã¿ÖÖËã×ÓµÄʵ¼ÊÖ´Ðдú¼Û¼ÆË㷽ʽ¶¼²»Í¬£¬Ôڴ˲»¿ÉÄÜÁоÙËùÓÐËã×Ó£¬¾ÍÌôÁ½¸ö±È½Ï¼òµ¥¡¢ÈÝÒ×Àí½âµÄÀ´·ÖÎö£¬µÚÒ»¸öÊÇTable ScanËã×Ó£¬µÚ¶þ¸öÊÇHash JoinËã×Ó¡£

Table ScanËã×Ó

ScanËã×ÓÒ»°ãλÓÚÓï·¨Ê÷µÄÒ¶×Ó½áµã£¬Ö±¹ÛÉÏÀ´½²ÕâÀàËã×ÓÖ»ÓÐIO Cost£¬CPU CostΪ0¡£Table Scan Cost = IO Cost = Tr * Tsz * Hr£¬ºÜ¼òµ¥£¬Tr * Tsz±íʾÐèÒªscanµÄÊý¾Ý×Ü´óС£¬ÔÙ³ËÒÔHr¾ÍÊÇËùÐè´ú¼Û¡£OK£¬ºÜÖ±¹Û£¬ºÜ¼òµ¥¡£

Hash JoinËã×Ó

ÒÔBroadcast Hash JoinΪÀý£¨Èç¹û¿´¹Ù¶ÔBroadcast Hash Join¹¤×÷Ô­Àí»¹²»Á˽⣬¿É´ÁÕâÀ£¬¼ÙÉè´ó±í·Ö²¼ÔÚn¸ö½ÚµãÉÏ£¬Ã¿¸ö½ÚµãµÄÊý¾ÝÌõÊý\ƽ¾ù´óС·Ö±ðΪTr(R1)\Tsz(R1)£¬Tr(R2)\Tsz(R2), ¡­ Tr(Rn)\Tsz(Rn)£¬Ð¡±íÊý¾ÝÌõÊýΪTr(Rsmall)\Tsz(Rsmall)£¬ÄÇôCPU´ú¼ÛºÍIO´ú¼Û·Ö±ðΪ£º

CPU Cost = С±í¹¹½¨Hash Table´ú¼Û £« ´ó±í̽²â´ú¼Û £½ Tr(Rsmall) * CPUc + (Tr(R1) + Tr(R2) + ¡­ + Tr(Rn)) * N * CPUc£¬´Ë´¦¼ÙÉèHashTable¹¹½¨ËùÐèCPU×ÊÔ´Ô¶Ô¶¸ßÓÚÁ½Öµ¼òµ¥±È½Ï´ú¼Û£¬ÎªN * CPUc

IO Cost = С±íscan´ú¼Û £« С±í¹ã²¥´ú¼Û £« ´ó±íscan´ú¼Û £½ Tr(Rsmall) * Tsz(Rsmall) * Hr + n * Tr(Rsmall) * Tsz(Rsmall) * NEt + (Tr(R1)* Tsz(R1) + ¡­ + Tr(Rn) * Tsz(Rn)) * Hr

ºÜÏÔÈ»£¬Hash JoinËã×ÓÏà±ÈTable ScanËã×ÓÀ´½²ÉÔÉÔ¸´ÔÓÁËÒ»µã£¬µ«ÊÇÎÞÂÛÄÄÖÖËã×Ó£¬´ú¼Û¼ÆËã¶¼ºÍ²ÎÓëµÄÊý¾Ý×ÜÌõÊý¡¢Êý¾Ýƽ¾ù´óСµÈÒòËØÖ±½ÓÏà¹Ø£¬ÕâÒ²¾ÍÊÇΪʲôÔÚ֮ǰÁ½¸ö²½ÖèÖÐÒª²»Ð¸ÓàÁ¦µØ¼ÆËãÖмä½á¹ûÏà¹ØÏêϸµÄÕæÕýÔ­Òò¡£¿ÉνÊDz½²½ÎªÓª¡¢»·»·Ïà¿Û¡£ÕâϺÃÁË£¬ÈÎÒâ½ÚµãµÄʵ¼Ê´ú¼Û¶¼ÄÜÆÀ¹À³öÀ´£¬ÄÇô¸ø¶¨ÈÎÒâÖ´Ðз¾¶µÄ´ú¼Û±ØÈ»Ò²¾ÍºÜ¼òµ¥à¶¡£

µÚËIJ½£ºÑ¡Ôñ×îÓÅÖ´Ðз¾¶£¨´ú¼Û×îСִÐз¾¶£©

Õâ¸ö˼·ºÜÈÝÒ×Àí½âµÄ£¬¾­¹ýÉÏÊöÈý²½µÄŬÁ¦£¬¿ÉÒÔºÜÈÝÒ׵ؼÆËã³öÈÎÒâÒ»Ìõ¸ø¶¨Â·¾¶µÄ´ú¼Û¡£ÄÇôÄãÖ»ÐèÒªÕÒ³öËùÓпÉÐеÄÖ´Ðз¾¶£¬Ò»¸öÒ»¸ö¼ÆË㣬¾Í±ØÈ»ÄÜÕÒµ½Ò»¸ö´ú¼Û×îСµÄ£¬Ò²¾ÍÊÇ×îÓŵÄÖ´Ðз¾¶¡£

ÕâÌõ·¿´ÆðÀ´È·ÊµºÜ¼òµ¥£¬µ«Êµ¼Ê×öÆðÀ´È´²¢²»ÄÇôÈÝÒ×£¬ÎªÊ²Ã´£¿ËùÓпÉÐеÄÖ´Ðз¾¶ÊµÔÚÌ«¶à£¬ËùÓз¾¶¶¼¼ÆËãÒ»±é£¬»Æ»¨²Ë¶¼Á¹ÁË¡£ÄÇôÓÐʲôºÃµÄ½â¾ö·½°¸Ã´£¿µ±È»£¬Æäʵ¿´µ½Õâ¸ö±êÌ⣭ѡÔñ´ú¼Û×îСִÐз¾¶£¬¾ÍÓ¦¸ÃºÜÈÝÒ×Ïëµ½£­¶¯Ì¬¹æ»®£¬Èç¹ûÄãûÓÐÏëµ½£¬ÄÇÖ»ÄÜ˵Ã÷ÄãûÓжÁ¹ý¡¶Êýѧ֮ÃÀ¡·¡¢Ã»Ë¢¹ýLeetCode¡¢Ã»Íæ¹ýACM£¬ACM¡¢LeetCodeÈç¹û¾õµÃÌ«¿ÝÔÄǾÍÈ¥¿´¿´¡¶Êýѧ֮ÃÀ¡·£¬Ëü»á¸æËßÄã´Óµ±Ç°Õâ¸öÄãËùÔڵĵط½¿ª³µÈ¥±±¾©£¬ÈçºÎʹÓö¯Ì¬¹æ»®Ñ¡ÔñÒ»Ìõ×î¶ÌµÄ·Ïß¡£Ôڴ˲»ÔÙ׸Êö¡£

ÖÁ´Ë£¬±ÊÕß´ÖÏßÌõµØ½éÉÜÁ˵±Ç°Ö÷Á÷SQLÒýÇæÊÇÈçºÎ½«CBOÕâôһ¸ö¿´ËƸßÉîµÄ¼¼ÊõÒ»²½Ò»²½Â䵨µÄ¡£½ÓÏÂÀ´£¬±ÊÕß½«»á½èÓÃHive¡¢ImpalaÕâÁ½´óSQLÒýÇæ¿ªÆôCBOÖ®ºóµÄÓÅ»¯Ð§¹ûÈôó¼Ò¶ÔCBOÓÐÒ»¸ö¸üÖ±¹ÛµÄÀí½â¡£

Hive ¨C CBOÓÅ»¯Ð§¹û

Hive±¾ÉíûÓÐÈ¥´ÓͷʵÏÖÒ»¸öSQLÓÅ»¯Æ÷£¬¶øÊǽèÖúÓÚApache Calcite £¬CalciteÊÇÒ»¸ö¿ªÔ´µÄ¡¢»ùÓÚCBOµÄÆóÒµ¼¶SQL²éѯÓÅ»¯¿ò¼Ü£¬Ä¿Ç°°üÀ¨Hive¡¢Phoniex¡¢KylinÒÔ¼°FlinkµÈÏîÄ¿¶¼Ê¹ÓÃÁËCalcite×÷ΪÆäÖ´ÐÐÓÅ»¯Æ÷£¬ÕâÒ²ºÜºÃÀí½â£¬Ö´ÐÐÓÅ»¯Æ÷±¾À´¾Í¿ÉÒÔ³éÏó³ÉÒ»¸öϵͳģ¿é£¬²¢Ã»ÓбØÒª»¨·Ñ´óÁ¿Ê±¼äÈ¥ÖØ¸´ÔìÂÖ×Ó¡£

hortonworksÔø¾­¶ÔHiveµÄCBOÌØÐÔ×öÁËÏà¹ØµÄ²âÊÔ£¬²âÊÔ½á¹ûÈÏΪCBOÖÁÉÙ¶Ô²éѯÓÐÈý¸öÖØÒªµÄÓ°Ï죺Join ordering optimization¡¢Bushy join supportÒÔ¼°Join simplification£¬±¾ÎÄÖ»¼òµ¥½éÉÜÒ»ÏÂJoin ordering optimization£¬ÓÐÐËȤµÄͬѧ¿ÉÒÔ¼ÌÐøÔĶÁÕâÆªÎÄÕÂÀ´¸ü¶àµØÁ˽âÆäËûÁ½¸öÖØÒªÓ°Ïì¡££¨ÏÂÃæÊý¾ÝÒÔ¼°Ê¾ÒâͼҲÀ´×ÔÓÚ¸ÃÆªÎÄÕ£¬ÌØ´Ë×¢Ã÷£©

hortonworks¶ÔTPCDSµÄ²¿·ÖQuery½øÐÐÁËÑо¿£¬·¢ÏÖ¶ÔÓڴ󲿷ÖÐÇÐÍ\Ñ©»¨Ä£ÐÍ£¬¶¼´æÔÚ¶àJoinÎÊÌ⣬ÕâЩJoin˳ÐòÈç¹û×éÖ¯²»ºÃ£¬ÐÔÄܾͻáºÜ²î£¬Èç¹û×éÖ¯µÃµ±£¬ÐÔÄܾͻáºÜºÃ¡£±ÈÈçQuery Q3£º

select
dt.d_year,
item.i_brand_id brand_id,
item.i_brand brand,
sum(ss_ext_sales_price) sum_agg
from
date_dim dt,
store_sales,
item
where
dt.d_date_sk = store_sales.ss_sold_date_sk
and store_sales.ss_item_sk = item.i_item_sk
and item.i_manufact_id =436
and dt.d_moy =12
groupby dt.d_year , item.i_brand , item.i_brand_id
order by dt.d_year , sum_agg desc , brand_id
limit 10

ÉÏÊöQueryÉæ¼°µ½3ÕÅ±í£¬Ò»ÕÅÊÂʵ±ístore_sales£¨Êý¾ÝÁ¿´ó£©ºÍÁ½ÕÅά¶È±í£¨Êý¾ÝÁ¿Ð¡£©£¬Èý±íÖ®¼äµÄ¹ØÏµÈçÏÂͼËùʾ£º

ÕâÀï¾ÍÉæ¼°ÉÏÎÄÌáµ½µÄJoin˳ÐòÎÊÌ⣬´Óԭʼ±íÀ´¿´£¬date_dimÓÐ73049Ìõ¼Ç¼£¬¶øitemÓÐ462000Ìõ¼Ç¼¡£ºÜÏÔÈ»£¬Èç¹ûûÓÐÆäËû°µÊ¾µÄ»°£¬Join˳Ðò±ØÈ»ÊÇstore_sales join date_dim join item¡£µ«ÊÇ£¬whereÌõ¼þÖл¹´øÓÐÁ½¸öÌõ¼þ£¬CBO»á¸ù¾Ý¹ýÂËÌõ¼þ¶Ô¹ýÂ˺óµÄÊý¾Ý½øÐÐÆÀ¹À£¬½á¹ûÈçÏ£º

¸ù¾ÝÉϱíËùʾ£¬¹ýÂ˺óµÄÊý¾ÝÁ¿itemÃ÷ÏÔ±Èdate_dimСµÄ¶à£¬¾çÇ鷴תµÄÓеã¿ì¡£ÓÚÊǺõ£¬¾­¹ýCBOÖ®ºóJoin˳Ðò¾Í±ä³ÉÁËstore_sales join item join date_time£¬ÎªÁ˽øÒ»²½È·ÈÏ£¬¿ÉÒÔÔÚ¿ªÆôCBOǰºó·Ö±ð¼Ç¼¸ÃSQLµÄÖ´Ðмƻ®£¬ÈçÏÂͼËùʾ£º

×óͼÊÇ먦ÆôCBOÌØÐÔʱQ3µÄÖ´Ðмƻ®£¬store_salesÏÈÓëdate_dim½øÐÐjoin£¬joinºóµÄÖмä½á¹ûÊý¾Ý¼¯ÓÐ140ÒÚÌõ¡£¶øÔÙ¿´ÓÒͼ£¬store_salesÏÈÓÚitem½øÐÐjoin£¬Öмä½á¹ûÖ»ÓÐ8200wÌõ¡£ºÜÏÔÈ»£¬ºóÕßÖ´ÐÐЧÂÊ»á¸ü¸ß£¬Êµ¼ù³öÕæÖª£¬À´¿´¿´Á½ÕßµÄʵ¼ÊÖ´ÐÐʱ¼ä£º

ÉÏͼºÜÃ÷ÏԵĿ´³öQ3ÔÚCBOµÄÓÅ»¯ÏÂÐÔÄܽ«½üÌáÉýÁË1±¶£¬Óë´Ëͬʱ£¬CPU×ÊԴʹÓÃÂÊÒ²½µµÍÁËÒ»°ë×óÓÒ¡£²»µÃ²»Ëµ£¬TPCDSÖÐÓкܶàÏàËÆµÄQuery£¬ÓÐÐËȤµÄͬѧ¿ÉÒÔÉîÈë½øÒ»²½ÉîÈëÁ˽⡣

Impala ¨C CBOÓÅ»¯Ð§¹û

ºÍHiveÓÅ»¯µÄÔ­ÀíÏàͬ£¬Ò²ÊÇÕë¶Ô¸´ÔÓjoinµÄÖ´ÐÐ˳Ðò¡¢JoinµÄÖ´ÐвßÂÔÑ¡ÔñÓÅ»¯µÈ·½Ãæ½øÐеÄÓÅ»¯£¬±¾ÈËʹÓÃTPC-DS¶ÔImpalaÔÚ¿ªÆôCBOÌØÐÔǰºóµÄ²¿·ÖQuery½øÐÐÁËÐÔÄܲâÊÔ£¬²âÊÔ½á¹ûÈçÏÂͼËùʾ£º

CBO×ܽá

ÕâÆªÎÄÕÂÆäʵºÜÔç¾Í¿ªÊ¼¹¹Ë¼ÁË£¬Ç°Ç°ºóºó»¨Á˽«½ü3¸öÔÂʱ¼ä¶Ï¶ÏÐøÐøÀ´Ð´£¬Ð´ÁËɾ¡¢É¾ÁËд£¬¼ÇµÃµÚ¶þ¸åÒѾ­Ð´Á˺ܶàÄÚÈÝ£¬ÓÐÌìÒ»´óÔçÐÑÀ´ÍêÍêÕûÕûµØ¿´ÁËÒ»±é£¬·¢ÏÖдµÄ¶«Î÷²¢²»ÊÇ×Ô¼ºÏëÒªµÄ£¬×¼È·Ëµ£¬Ð´µÄȱÉÙÄÇôһЩЩÌõÀíÐÔ£¬¸ÄÓÖ²»ºÃ¸Ä£¬Ë÷ÐÔ¾ÍȫɾÁË¡£ÁíÒ»·½Ã棬ҲÓÐÒòΪµ±Ç°ÍøÂçÉϲ¢Ã»ÓÐÌ«¶à¹ØÓÚCBOµÄÍêÕû½éÉÜ£¬µ¹ÊÇÕÒµ½Ò»Ð©Ó¢ÎÄ×ÊÁÏ£¬µ«×ܸоõ»¹ÊÇȱ·¦ÌõÀíÐÔ£¬ºÜÄÑÀí½â¡£±¾ÎĵÚÒ»½ÚÖØµã´Ó˼άÉÏ´ø´ó¼ÒÈÏʶCBO£¬µÚ¶þ½Ú¸ü¶àµÄ´ÓʵÏÖµÄÊÓ½ÇÒ»²½Ò»²½½«Õû¸öÔ­Àí´ÖÏßÌõµØÂ䵨£¬µÚÈý½ÚÌôÑ¡HiveÓëImpalaÁ½¿î²úÆ·¶Ô±È½éÉÜ¿ªÆôCBOÖ®ºóµÄÓÅ»¯Ð§¹û£¬Ê¹´ó¼ÒÓÐÒ»¸ö¸üÖ±¹ÛµÄ¸ÐÊÜ¡£

   
2553 ´Îä¯ÀÀ       27
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ