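Editor's recommendation:
This article covers Presto's architecture, the principles behind its low latency, its storage plugins, the query execution process, and a comparison with other engines.
The article comes from Cnblogs (博客园) and was edited and recommended by Anna of 火龙果软件 (Huolongguo Software).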
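Presto is a distributed SQL query engine for big data, developed in Java and released by Facebook. It supports interactive queries over data sets ranging from gigabytes to petabytes at speeds comparable to commercial data warehouses, and the engine is reported to be more than 10 times faster than Hive. Presto can query Hive, Cassandra, and even some commercial data stores, and a single Presto query can combine data from multiple sources for unified analysis. Presto's goal is to return query results within a predictable response time. Facebook uses Presto for interactive queries against several internal data stores, including a 300 PB data warehouse. More than 1,000 Facebook employees use Presto every day, running over 30,000 queries and scanning more than 1 PB of data daily.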
Presto architecture
The Presto query engine uses a master-slave architecture made up of three parts:
One Coordinator node
One Discovery Server node
Multiple Worker nodes
Coordinator: parses SQL statements, generates execution plans, and distributes execution tasks to the Worker nodes
Discovery Server: usually embedded in the Coordinator node
Worker node: executes the actual query tasks and reads data from HDFS
After a Worker node starts, it registers with the Discovery Server, and the Coordinator obtains the list of healthy Worker nodes from the Discovery Server. If a Hive Connector is configured, a Hive MetaStore service must also be configured to provide Presto with Hive metadata.
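The registration step can be pictured with a small in-process sketch. This is not Presto's discovery code: the class name `DiscoveryServer`, the heartbeat-style re-announcement, and the 30-second expiry are all assumptions made for illustration.

```python
import time


class DiscoveryServer:
    """Toy registry: workers announce themselves, the coordinator reads the list."""

    def __init__(self, max_age_seconds=30.0):
        # Assumed expiry window; not a Presto setting.
        self.max_age_seconds = max_age_seconds
        self._last_seen = {}  # worker id -> time of last announcement

    def announce(self, worker_id, now=None):
        # Workers call this at startup and (by assumption) periodically afterwards.
        self._last_seen[worker_id] = time.time() if now is None else now

    def active_workers(self, now=None):
        # The coordinator only schedules work on recently seen workers.
        now = time.time() if now is None else now
        return sorted(
            worker_id
            for worker_id, seen in self._last_seen.items()
            if now - seen <= self.max_age_seconds
        )


discovery = DiscoveryServer(max_age_seconds=30.0)
discovery.announce("worker-1", now=0.0)
discovery.announce("worker-2", now=0.0)
discovery.announce("worker-1", now=40.0)  # only worker-1 keeps announcing
print(discovery.active_workers(now=45.0))  # ['worker-1']
```

Keeping the registry separate from the scheduler means the Coordinator never needs a static list of Workers; it simply asks for whoever is currently alive.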
A more illustrative architecture diagram:
Why Presto has low latency
Fully in-memory parallel computation
Pipelined computation jobs
Data-local computation
Dynamically compiled execution plans
GC control
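Of these points, pipelining is the easiest to show in a few lines. The sketch below uses Python generators to stream rows from a scan through a filter into an aggregation without materializing intermediate results. The operator names and the sample rows are invented for illustration and do not correspond to Presto classes.

```python
def scan(rows):
    # Source operator: yields rows one at a time instead of returning a list.
    for row in rows:
        yield row


def filter_rows(rows, predicate):
    # Filter operator: passes matching rows downstream as soon as they arrive.
    for row in rows:
        if predicate(row):
            yield row


def count_by(rows, key):
    # Aggregation operator: keeps only the running counts in memory.
    counts = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts


table = [
    {"id": 5, "rank": "A"},
    {"id": 11, "rank": "A"},
    {"id": 12, "rank": "B"},
    {"id": 13, "rank": "A"},
]
pipeline = filter_rows(scan(table), lambda row: row["id"] > 10)
result = count_by(pipeline, "rank")
print(result)  # {'A': 2, 'B': 1}
```

Each row flows through every operator before the next row is read, so no stage waits for the previous one to finish its whole input.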
Presto storage plugins
Presto defines a simple abstraction layer over data storage so that SQL queries can run on top of different storage systems.
A storage plugin (a connector) only needs to implement an interface for a few operations: extracting metadata, obtaining the location of the stored data, and reading the data itself.
In addition to the Hive/HDFS backend that we mainly use, we have also developed Presto connectors for other systems, including HBase, Scribe, and custom-built systems.
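The three responsibilities of a connector can be sketched as an abstract interface. Real Presto connectors are written in Java against Presto's SPI; the Python class and method names below (`Connector`, `list_columns`, `get_splits`, `read_split`) are hypothetical and only mirror the three operations described above.

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical interface; real Presto connectors are written in Java."""

    @abstractmethod
    def list_columns(self, table):
        """Metadata: return the column names of a table."""

    @abstractmethod
    def get_splits(self, table):
        """Data location: return opaque handles for chunks of the table."""

    @abstractmethod
    def read_split(self, split):
        """Data access: yield the rows stored in one split."""


class InMemoryConnector(Connector):
    """Toy connector backed by a dict of table name -> list of row dicts."""

    def __init__(self, tables, rows_per_split=2):
        self.tables = tables
        self.rows_per_split = rows_per_split

    def list_columns(self, table):
        rows = self.tables[table]
        return list(rows[0].keys()) if rows else []

    def get_splits(self, table):
        n = len(self.tables[table])
        step = self.rows_per_split
        return [(table, start, min(start + step, n)) for start in range(0, n, step)]

    def read_split(self, split):
        table, start, end = split
        yield from self.tables[table][start:end]


connector = InMemoryConnector({"city": [{"id": i, "rank": i % 3} for i in range(5)]})
splits = connector.get_splits("city")
rows = [row for split in splits for row in connector.read_split(split)]
print(connector.list_columns("city"), len(splits), len(rows))  # ['id', 'rank'] 3 5
```

Because the engine only sees columns, splits, and rows, supporting a new storage system comes down to implementing these three operations.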
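Plugin structure diagram:
Presto execution process
Execution process diagram:

Submitting a query: after a user submits a query through the Presto CLI, the CLI communicates with the Coordinator over HTTP. On receiving the request, the Coordinator calls SqlParser to parse the SQL statement into a Statement object, wraps the Statement in a QueryStarter object, and places it in a thread pool to await execution, as shown in the figure below. The example SQL is: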

SELECT c1.rank, count(*)
FROM dim.city c1
JOIN dim.city c2 ON c1.id = c2.id
WHERE c1.id > 10
GROUP BY c1.rank
LIMIT 10;
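To make the HTTP submission step concrete, here is a minimal sketch that builds, but does not send, the kind of request a client could post to the Coordinator. It assumes Presto's client REST endpoint `/v1/statement` and the `X-Presto-User` header; check both against the documentation for your Presto version. The host name and user are placeholders.

```python
import urllib.request

SQL = (
    "select c1.rank, count(*) from dim.city c1 join dim.city c2 "
    "on c1.id = c2.id where c1.id > 10 group by c1.rank limit 10"
)


def build_statement_request(coordinator_url, sql, user):
    # The query text is the POST body; the user is passed in a request header.
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},
        method="POST",
    )


# Hypothetical coordinator address and user name.
request = build_statement_request("http://coordinator.example.com:8080", SQL, "analyst")
print(request.get_method(), request.full_url)
# To actually submit: urllib.request.urlopen(request), then follow the "nextUri"
# field of each JSON response until it is absent.
```

The request is only constructed here so the example runs without a cluster; sending it requires a reachable Coordinator.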
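Logical execution process diagram:

The dashed lines in the logical execution plan diagram above mark the points where Presto splits the logical plan. The SubPlans generated from the logical Plan fall into four parts, and each SubPlan is submitted to one or more Worker nodes for execution.
A SubPlan has several important properties: planDistribution, outputPartitioning, and partitionBy. The whole execution process is also summarized in a flowchart.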
PlanDistribution: the distribution mode of a query stage. The four SubPlans in the diagram above use three different PlanDistribution modes:
Source: the SubPlan is a data source; the number of nodes assigned to a Source task is determined by the size of the data source
Fixed: the SubPlan is assigned a fixed number of nodes (set by the query.initial-hash-partitions parameter in Config; the default is 8)
None: the SubPlan is assigned to a single node
OutputPartitioning: whether the SubPlan's output is shuffled by the hash of the partitionBy key. It has only two values, HASH and NONE.
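These properties map naturally onto a small data structure. The sketch below is a Python stand-in, not Presto's Java classes. The per-stage values for the four SubPlans are inferred from this article's description of the example query rather than taken from Presto output, so treat them as assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PlanDistribution(Enum):
    SOURCE = "source"  # reads from a data source; node count follows data size
    FIXED = "fixed"    # runs on a fixed number of nodes
    NONE = "none"      # runs on a single node


class OutputPartitioning(Enum):
    HASH = "hash"  # output is shuffled by the hash of the partitionBy key
    NONE = "none"  # output is not hash-partitioned


@dataclass(frozen=True)
class SubPlan:
    name: str
    plan_distribution: PlanDistribution
    output_partitioning: OutputPartitioning
    partition_by: Optional[str] = None


# Assumed assignment for the example query (inferred, not read from Presto).
plans = [
    SubPlan("SubPlan1", PlanDistribution.SOURCE, OutputPartitioning.NONE),
    SubPlan("SubPlan0", PlanDistribution.SOURCE, OutputPartitioning.HASH, "rank"),
    SubPlan("SubPlan2", PlanDistribution.FIXED, OutputPartitioning.NONE),
    SubPlan("SubPlan3", PlanDistribution.NONE, OutputPartitioning.NONE),
]
distinct = {plan.plan_distribution for plan in plans}
print(len(plans), len(distinct))  # 4 3
```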

In the execution plan above, SubPlan1 and SubPlan0 have PlanDistribution=Source; both are stages that supply source data. The data read by every SubPlan1 node is sent to every SubPlan0 node. SubPlan2 is assigned 8 nodes to perform the final aggregation, and SubPlan3 only outputs the final computed data, as shown in the figure below:

As Source nodes, SubPlan1 and SubPlan0 read HDFS file data by calling the HDFS InputSplit API, and each InputSplit is then assigned to a Worker node for execution. The maximum number of InputSplits assigned to each Worker node is configurable through the query.max-pending-splits-per-node parameter in Config; the default is 100.
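The effect of a per-node cap on pending splits can be sketched as follows. The round-robin policy is an assumption chosen for simplicity and is not a description of Presto's actual scheduler; only the idea of an upper bound per Worker comes from the text.

```python
from collections import deque

MAX_PENDING_SPLITS_PER_NODE = 100  # default value quoted in the article


def assign_splits(splits, workers, max_pending=MAX_PENDING_SPLITS_PER_NODE):
    """Hand out splits round-robin; leave the rest queued once every worker is full."""
    pending = {worker: [] for worker in workers}
    queue = deque(splits)
    while queue:
        free = [w for w in workers if len(pending[w]) < max_pending]
        if not free:
            break  # every worker is at its limit; remaining splits wait
        for worker in free:
            if not queue:
                break
            pending[worker].append(queue.popleft())
    return pending, list(queue)


pending, waiting = assign_splits(range(10), ["w1", "w2", "w3"], max_pending=3)
print({w: len(s) for w, s in pending.items()}, len(waiting))
# {'w1': 3, 'w2': 3, 'w3': 3} 1
```

In a real cluster the waiting splits would be handed out as Workers finish earlier ones; the sketch stops at the first full assignment.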
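Each SubPlan1 node reads the data of one Split, filters it, and distributes it to every SubPlan0 node for the Join and Partial Aggr operations
When a SubPlan0 node finishes computing, it distributes its data to different SubPlan2 nodes according to the hash of the GroupBy key
When all SubPlan2 nodes finish computing, they send their data to the SubPlan3 node
When the SubPlan3 node finishes computing, it notifies the Coordinator that the query is complete and sends the data to the Coordinator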
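The four stages can be imitated in a single process to show how the data moves. This is a simplified sketch, not Presto's execution code: the function names, the synthetic `city` table, and the choice of which side of the join is broadcast are assumptions. The stage roles follow the steps above.

```python
from collections import Counter

NUM_FINAL_NODES = 8  # Fixed stage size, matching the default quoted in the article


def source_stage(rows):
    """SubPlan1-style work: read one split and apply the filter id > 10."""
    return [row for row in rows if row["id"] > 10]


def join_and_partial_aggregate(local_rows, broadcast_rows):
    """SubPlan0-style work: join the local split with the broadcast rows on id,
    then count the joined rows per rank."""
    local_ids = Counter(row["id"] for row in local_rows)
    counts = Counter()
    for row in broadcast_rows:
        matches = local_ids[row["id"]]
        if matches:
            counts[row["rank"]] += matches
    return counts


def shuffle(partials, num_nodes):
    """Route each (rank, count) pair to a node chosen by hashing the group key."""
    inboxes = [[] for _ in range(num_nodes)]
    for partial in partials:
        for rank, count in partial.items():
            inboxes[hash(rank) % num_nodes].append((rank, count))
    return inboxes


def final_aggregate(inbox):
    """SubPlan2-style work: sum the partial counts for the keys this node owns."""
    totals = Counter()
    for rank, count in inbox:
        totals[rank] += count
    return totals


city = [{"id": i, "rank": i % 4} for i in range(1, 41)]  # synthetic table
broadcast = source_stage(city)            # filtered rows sent to every join node
local_splits = [city[:20], city[20:]]     # two join nodes, one split each
partials = [join_and_partial_aggregate(split, broadcast) for split in local_splits]
inboxes = shuffle(partials, NUM_FINAL_NODES)
merged = Counter()
for inbox in inboxes:  # SubPlan3-style output: gather results, then apply the limit
    merged.update(final_aggregate(inbox))
result = sorted(merged.items())[:10]
print(result)  # [(0, 8), (1, 7), (2, 7), (3, 8)]
```

Hashing on the group key guarantees that all partial counts for one rank land on the same final node, which is why each final node can sum its inbox independently.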
Presto engine comparison
Comparison results against Hive and SparkSQL (chart):
