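Abstract: After more than a year of continuous innovation and improvement, Transwarp has deployed dozens of commercial Inceptor projects in China. This article is a technical walkthrough of Transwarp's Spark solution, and its optimization approach is one that Spark users can follow.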
Transwarp began developing a Spark-based SQL execution engine in June 2013, released Transwarp Inceptor 1.0 at the end of 2013, and delivered the first 7x24 commercial deployment of its kind in China.
The original article follows.
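Hadoop technology is developing rapidly, and platforms that tackle the hard problems of big data analytics are emerging. Thanks to its strong performance, high fault tolerance, and flexible scheduling, Spark has gradually become a mainstream technology, and most vendors in the industry now offer Spark-based solutions and products. According to Databricks, there are currently 11 commercial Spark distributions.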
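Solutions that use Spark as the compute platform follow one of two mainstream programming models: one is based on the Spark API or languages derived from it, the other on SQL. As the de facto standard language of the database world, SQL has inherent advantages over APIs (such as the MapReduce API or the Spark API) for building big data analytics solutions. First, the ecosystem is mature, so reporting tools, ETL tools, and the like integrate well. Second, developing in SQL has a lower technical barrier. Third, it lowers the cost of migrating existing systems. As a result, SQL is gradually becoming the mainstream technical standard for big data analytics as well. This article examines Inceptor's architecture, programming models, and compiler optimization techniques in depth, and presents benchmark performance comparisons across several platforms.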
1. Inceptor Architecture
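Transwarp Inceptor is a Spark-based analytics engine. As Figure 1 shows, it has three layers from the bottom up. The bottom layer is the storage layer, which includes a distributed in-memory columnar store (Transwarp Holodesk) that can be built on memory or SSD. The middle layer is the Spark compute engine, which Transwarp has modified extensively for high performance and robustness. The top layer includes a complete SQL 99 and PL/SQL compiler, a statistics library, and a machine learning library, and it provides a complete R language interface.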
Figure 1: Transwarp Inceptor architecture
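Transwarp Inceptor can analyze data stored in HDFS, HBase, or the Transwarp Holodesk distributed cache. It handles data volumes from gigabytes to tens of terabytes, and it remains efficient even when the data source or the intermediate results are far larger than memory. Transwarp Inceptor also improves the manageability of Spark by improving the way Spark and YARN work together. Transwarp does more than use Spark as the default compute engine: it has also rewritten the SQL compiler to provide more complete SQL support.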
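Transwarp Inceptor has also modified Spark to integrate more closely with HBase. It gives HBase complete SQL support, including batch SQL statistics, OLAP analysis, and high-concurrency, low-latency SQL queries. HBase applications can therefore grow from simple online lookups into hybrid workloads that combine complex analysis with online serving, which greatly widens the range of HBase use cases.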
2. Programming Models
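Transwarp Inceptor provides two programming models. The first is SQL-based and targets conventional data analysis and data warehouse applications. The second is a data mining model, in which R or Spark MLlib is used to build models for workloads such as deep learning and data mining.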
2.1 SQL Model
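Transwarp Inceptor implements its own SQL parsing and execution engine. It is compatible with both SQL 99 and HiveQL and detects the dialect automatically, so existing applications built on Hive continue to work. Because Transwarp Inceptor fully supports the SQL 99 standard, workloads running on traditional databases can be migrated to it with little effort. Transwarp Inceptor also supports PL/SQL extensions, so data warehouse applications built on PL/SQL stored procedures (such as ETL tools) can run concurrently on Inceptor. In addition, Transwarp Inceptor supports part of the SQL 2003 standard, such as window functions and security auditing, and ships function libraries developed for several specific industries to meet their particular requirements.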
2.2 Data Mining Model
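Transwarp Inceptor implements a machine learning library and a statistics library with parallelized versions of common algorithms, and it runs them on Spark to take advantage of Spark's strengths in iterative and in-memory computation. The machine learning library includes logistic regression, naive Bayes, support vector machines, clustering, linear regression, association mining, and recommendation algorithms. The statistics library includes mean, variance, median, histogram, and box plot. With Transwarp Inceptor, users can build many kinds of analytical applications on the platform in R or with the Spark API, for example user behavior analysis, precision marketing, user tagging, and classification.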
3. SQL Compilation and Optimization
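Transwarp Inceptor includes a complete SQL compiler with a HiveQL parser, a standard SQL parser, and a PL/SQL parser. Each parser translates its SQL dialect into a common intermediate representation. After parsing, the logical optimizer produces the intermediate representation, and the physical optimizer then turns that representation into the final physical execution plan. Architecturally, both the logical optimizer and the physical optimizer contain a rule-based optimization module and a cost-based optimization module.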
ΪÁ˺ÍHadoopÉú̬¸üºÃµÄ¼æÈÝ£¬InceptorΪһ¸öSQL²éѯÉú³ÉMap ReduceÉϵÄÖ´Ðмƻ®ºÍSparkÉϵÄÖ´Ðмƻ®£¬²¢ÇÒ¿ÉÒÔͨ¹ýÒ»¸öSETÃüÁîÔÚÁ½ÖÖÖ´ÐÐÒýÇæÖ®¼äÇл»¡£
Figure 2: SQL compilation framework
3.1 SQL Compilation and Parsing
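The Transwarp Inceptor SQL compiler chooses a parser automatically according to the type of the incoming query. A PL/SQL stored procedure goes to the PL/SQL parser, which produces a DAG of Spark RDDs that is computed in parallel on Spark. A standard SQL query goes to the standard SQL parser, which produces a Spark or Map Reduce execution plan. Because HiveQL differs from standard SQL in places, Transwarp Inceptor keeps a HiveQL parser for compatibility, and it can generate Spark or Map Reduce execution plans for Hive queries that are not standard SQL.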
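The dispatch step described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the function name `choose_parser`, the returned labels, and the keyword heuristics are hypothetical and are not taken from Inceptor, whose actual dialect detection is not described in this article. The HiveQL constructs used as markers (LATERAL VIEW, DISTRIBUTE BY, CLUSTER BY) are real HiveQL syntax that standard SQL lacks.

```python
import re

def choose_parser(query: str) -> str:
    """Pick a parser label from the shape of a statement (illustrative heuristics only)."""
    text = query.strip().upper()
    # Stored procedures and anonymous PL/SQL blocks go to the PL/SQL parser.
    if re.match(r"(CREATE\s+(OR\s+REPLACE\s+)?PROCEDURE|DECLARE|BEGIN)\b", text):
        return "plsql"
    # A few HiveQL-only constructs route the query to the HiveQL parser.
    if re.search(r"\b(LATERAL\s+VIEW|DISTRIBUTE\s+BY|CLUSTER\s+BY)\b", text):
        return "hiveql"
    # Everything else is treated as standard SQL.
    return "sql_standard"

print(choose_parser("SELECT a FROM t WHERE a > 1"))
print(choose_parser("SELECT k FROM t LATERAL VIEW explode(xs) e AS k"))
print(choose_parser("BEGIN UPDATE t SET a = 1; END;"))
```

A real compiler would more likely attempt a full parse and fall back on failure rather than match keywords; the keyword test is used here only to keep the example short.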
3.1.1 Standard SQL Parser
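Transwarp Inceptor has its own standard SQL parser, developed in-house, which parses SQL 99 and SQL 2003 queries and generates Spark and Map Reduce execution plans. The lexical and syntax analysis layer defines its grammar in Antlr and uses Antlr to generate the abstract syntax tree, resolving conflicts from contextual semantics so that the tree is correct. The semantic analysis layer walks the abstract syntax tree produced by the layer above, builds a logical execution plan from the context, and passes it to the optimizer. Transwarp Inceptor first breaks the SQL into the main logical blocks, such as TABLE SCAN, SELECT, FILTER, JOIN, UNION, ORDER BY, and GROUP BY, and then refines the execution plan of each block using metadata. A TABLE SCAN, for example, is split into several execution steps: block read, block filtering, row-level filtering, and serialization.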
3.1.2 PL/SQL Parser
PL/SQL is Oracle's modular, procedural extension of SQL. It is used at large scale in many industries and is an important programming language in the data warehouse field.
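To make stored procedures perform well on Spark, the PL/SQL parser builds a DAG of SQL statements from the contextual relationships inside the stored procedure. It then compiles a second time over the RDDs produced by each statement's execution plan: the physical optimizer merges RDDs that have no dependencies on one another and produces one final RDD DAG. A stored procedure is therefore compiled into a single large DAG whose stages can run with a high degree of concurrency. This avoids the startup overhead of executing each SQL statement separately and preserves the system's concurrent performance.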
Parse and generate SQL-level execution plans

Resolve the dependencies among the SQL statements to build a DAG, then generate the stored procedure's final Spark RDD DAG from each statement's execution plan
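The idea of turning a stored procedure into a DAG of statements can be sketched as follows. This is a minimal illustration, not Inceptor's compiler: the names `schedule_levels` and `proc`, and the table names, are hypothetical, and the sketch tracks only read-after-write dependencies (a statement depends on the most recent earlier statement that wrote a table it reads). A real implementation would also have to respect write-after-read and write-after-write ordering.

```python
from collections import defaultdict

def schedule_levels(statements):
    """statements: ordered list of (name, tables_read, tables_written).
    Returns groups of statement names; statements in one group have no
    dependencies on each other and could run concurrently."""
    last_writer = {}
    deps = defaultdict(set)
    for name, reads, writes in statements:
        for table in reads:
            if table in last_writer:
                deps[name].add(last_writer[table])
        for table in writes:
            last_writer[table] = name
    levels, done = [], set()
    remaining = [name for name, _, _ in statements]
    while remaining:
        # A statement is ready once everything it depends on has finished.
        ready = [n for n in remaining if deps[n] <= done]
        levels.append(ready)
        done.update(ready)
        remaining = [n for n in remaining if n not in done]
    return levels

proc = [
    ("s1", {"orders"}, {"tmp_a"}),
    ("s2", {"customers"}, {"tmp_b"}),
    ("s3", {"tmp_a", "tmp_b"}, {"result"}),
]
print(schedule_levels(proc))  # [['s1', 's2'], ['s3']]
```

Here s1 and s2 touch unrelated tables and land in the same group, while s3 waits for both, which mirrors the merging of independent work into one DAG described above.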

3.2 SQL Optimizer
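Transwarp Inceptor uses Spark as its default compute engine and has a full-featured SQL optimizer. In performance tests across a large number of customer cases, Transwarp Inceptor has been 10 to 100 times faster than Map Reduce and has outperformed some open-source MPP databases. The SQL optimizer accounts for much of this gain.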
3.2.1 Rule Based Optimizer
So far Transwarp Inceptor implements more than one hundred optimization rules, and new ones are added continually. By function, the rules fall mainly into the following groups:
Filtering at file read time
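Filtering data while the file is being read minimizes the amount of data that enters the computation, which makes it the most effective way to improve performance. Transwarp Inceptor therefore provides several rules that generate filter conditions for tables. For explicit conditions in the SQL, Transwarp Inceptor pushes the filter down into the table read wherever possible. For implicit conditions, such as filters that can be derived from a join key, Inceptor generates the rule only when the query semantics guarantee that the result stays correct.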
Filter pushdown
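Transwarp Inceptor can pick out, from a complex compound filter, the conditions that apply to a specific table, and then use the SQL semantics to decide whether each condition can be pushed forward to run as early as possible. If there are subqueries, filter conditions can be pushed recursively into the innermost subquery, so that all redundant data is discarded early.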
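The first half of that step, separating a compound filter into per-table conditions, can be sketched like this. The function `split_conjuncts` and the sample predicates are hypothetical, and the sketch assumes the WHERE clause has already been split into AND-ed conjuncts with the tables each one references. It is safe as written only for inner joins; with outer joins, the semantic check mentioned above decides whether a condition may move.

```python
def split_conjuncts(conjuncts):
    """conjuncts: list of (tables_referenced, predicate_text), implicitly AND-ed.
    Returns (pushed, residual): predicates that touch exactly one table are
    grouped by that table so they can run at scan time; the rest stay above
    the join."""
    pushed, residual = {}, []
    for tables, predicate in conjuncts:
        if len(tables) == 1:
            (table,) = tables
            pushed.setdefault(table, []).append(predicate)
        else:
            residual.append(predicate)
    return pushed, residual

where = [
    ({"o"}, "o.amount > 100"),
    ({"c"}, "c.region = 'EU'"),
    ({"o", "c"}, "o.cust_id = c.id"),
]
pushed, residual = split_conjuncts(where)
print(pushed)    # {'o': ['o.amount > 100'], 'c': ["c.region = 'EU'"]}
print(residual)  # ['o.cust_id = c.id']
```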
Read filtering for very wide tables
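When processing tables with a very large number of columns, Transwarp Inceptor first determines from the SQL semantics which columns need to be read, and skips the other columns during the read to reduce IO and memory consumption. If the table also has a filter condition, Inceptor goes a step further: it first reads only the columns involved in the filter to decide whether a row qualifies, and if the row does not qualify it skips every column of that row. This keeps data reads to a minimum. In some commercial deployments these rules have delivered a 5x to 10x performance improvement.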
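The two-pass read described above can be illustrated on a toy columnar table held in a Python dict. The function name `scan_wide_table` and the sample data are hypothetical; the point is only that the filter column is touched first and the remaining projected columns are materialized solely for qualifying rows.

```python
def scan_wide_table(columns, filter_col, predicate, projected):
    """columns: dict mapping column name -> list of values (columnar layout)."""
    # Pass 1: read only the filter column to find qualifying row positions.
    keep = [i for i, value in enumerate(columns[filter_col]) if predicate(value)]
    # Pass 2: read only the projected columns, and only at those positions.
    return {name: [columns[name][i] for i in keep] for name in projected}

table = {
    "id":   [1, 2, 3, 4],
    "age":  [15, 42, 37, 8],
    "city": ["a", "b", "c", "d"],
    "note": ["w", "x", "y", "z"],   # never read by the query below
}
print(scan_wide_table(table, "age", lambda a: a >= 18, ["id", "city"]))
# {'id': [2, 3], 'city': ['b', 'c']}
```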
Optimizing and eliminating shuffle stages
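Spark's shuffle implementation, at the time of writing, is very inefficient: results have to be written to disk and then transferred over HTTP. Transwarp Inceptor adds shuffle-elimination rules that remove or merge shuffle stages in the SQL DAG that are unnecessary or can be combined. For computations that must shuffle, Inceptor uses the DAGScheduler to make the shuffle more efficient. A Map Task returns its result directly to the DAGScheduler, which hands it straight to the Reduce Task instead of waiting for all Map Tasks to finish. This improves shuffle-stage performance noticeably.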
Partition elimination
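Transwarp Inceptor provides single-value partitions and range partitions, and supports creating buckets on a partition for multi-level partitioning. When there are too many partitions, performance suffers from memory consumption and scheduling overhead. Inceptor therefore provides several rules that eliminate unnecessary partitions. If the context contains an implicit filter condition on the partition, Inceptor also generates a filter rule for the partition.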
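For range partitions, elimination amounts to an interval-overlap test between each partition's key range and the range the query asks for. The sketch below is illustrative only: `prune_partitions`, the partition names, and the date-like integer keys are all hypothetical.

```python
def prune_partitions(partitions, low, high):
    """partitions: dict mapping partition name -> (min_key, max_key), inclusive.
    Returns the names of partitions whose range overlaps [low, high]."""
    return [name for name, (pmin, pmax) in partitions.items()
            if pmax >= low and pmin <= high]

parts = {
    "p2014_q1": (20140101, 20140331),
    "p2014_q2": (20140401, 20140630),
    "p2014_q3": (20140701, 20140930),
}
# A query restricted to 2014-05-15 .. 2014-07-20 never needs to open p2014_q1.
print(prune_partitions(parts, 20140515, 20140720))  # ['p2014_q2', 'p2014_q3']
```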
3.2.2 Cost Based Optimizer
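Rule-based optimizations are derived from static information, so many properties that depend on the actual data cannot be handled by rules alone. Transwarp Inceptor therefore provides a cost-based optimizer (CBO) for a second round of optimization. Its inputs come mainly from table statistics in the Meta-store, information about the RDDs, and statistics from the SQL context. Using this dynamic data, the CBO computes the physical cost of candidate execution plans and chooses the most efficient one. Some of the most effective optimizations are the following: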
JOIN order tuning
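In real workloads, joins consume the most computation, so optimizing them is critical. For multi-table joins, Transwarp Inceptor uses statistics to estimate the size of each intermediate join result and chooses the join order that produces the least intermediate data as the execution plan.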
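A toy version of this choice is sketched below. It is not Inceptor's cost model: it assumes the textbook estimate that joining two inputs yields rows(A) x rows(B) x selectivity, enumerates every left-deep order by brute force, and scores an order by the sum of its estimated intermediate result sizes. The function `best_join_order`, the table names, the row counts, and the selectivities are all made up for illustration.

```python
from itertools import permutations

def best_join_order(rows, selectivity):
    """rows: table -> row count.
    selectivity: frozenset({a, b}) -> join selectivity (1.0 if no predicate).
    Returns (order, cost) for the left-deep order with the smallest total
    estimated intermediate size."""
    best = None
    for order in permutations(rows):
        joined, size, cost = [order[0]], rows[order[0]], 0.0
        for table in order[1:]:
            size *= rows[table]
            for prev in joined:
                size *= selectivity.get(frozenset((prev, table)), 1.0)
            joined.append(table)
            cost += size  # accumulate the estimated intermediate result size
        if best is None or cost < best[0]:
            best = (cost, order)
    return best[1], best[0]

rows = {"orders": 1_000_000, "customers": 10_000, "regions": 10}
selectivity = {
    frozenset(("orders", "customers")): 1e-4,
    frozenset(("customers", "regions")): 0.1,
}
order, cost = best_join_order(rows, selectivity)
print(order, round(cost))
```

With these numbers the cheapest plans join the two small tables first and bring in the large orders table last. Brute-force enumeration grows factorially with the number of tables, so a practical optimizer would use dynamic programming or heuristics instead.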
JOIN type selection
Transwarp Inceptor supports both the Left-most Join Tree and the Bush Join Tree, and uses statistics to decide which join shape will perform best. Based on the size of the source tables or the intermediate data, it also decides whether to enable special optimizations for skewed data. For HBase tables, depending on whether an index exists, Transwarp Inceptor weighs an ordinary join against a Look-up Join.
Controlling the degree of parallelism
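Spark improves performance through thread-level concurrency, but too much concurrency can introduce unnecessary scheduling overhead, so each workload performs best at a different degree of parallelism. Transwarp Inceptor estimates the best setting from properties of the RDDs, which yields a 2x to 3x performance improvement in many cases.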
4. Transwarp Holodesk In-Memory Computing Engine
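To reduce the latency of SQL analysis and limit the impact of disk IO on system performance, Transwarp developed Transwarp Holodesk, a storage and compute engine based on memory or SSD. Table data is built directly in memory or on SSD so that SQL queries are computed entirely in memory. Transwarp Holodesk also adds data indexing and supports indexes on multiple columns, which lowers SQL query latency further.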
4.1 Storage Format
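Transwarp Holodesk builds on columnar storage with many original improvements that bring higher performance and a lower data expansion ratio. First, data is serialized before it is stored in memory or on SSD, to save resources. As Figure 3 shows, the data of each table is stored as a number of Segments, each Segment is divided into a number of Blocks, and each Block is stored in column-oriented form on SSD or in memory. In addition, the header of every Block carries a Min-Max Filter and a Bloom Filter, which are used to skip irrelevant blocks and keep unnecessary data out of the compute stage.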
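How a block header with a Min-Max Filter and a Bloom Filter lets a scan skip a block can be shown with a small sketch. It is illustrative only: the class `BlockHeader`, the 256-bit filter size, the three hash probes, and the use of SHA-256 are assumptions made for the example and say nothing about Holodesk's actual on-disk layout.

```python
import hashlib

class BlockHeader:
    """Per-block summary of one column: min/max plus a small Bloom filter."""

    def __init__(self, values, bits=256, hashes=3):
        self.lo, self.hi = min(values), max(values)
        self.bits, self.hashes, self.bitmap = bits, hashes, 0
        for value in values:
            for pos in self._positions(value):
                self.bitmap |= 1 << pos

    def _positions(self, value):
        # Derive several bit positions per value from seeded SHA-256 digests.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def may_contain(self, value):
        # Min-Max filter: anything outside [lo, hi] cannot be in the block.
        if value < self.lo or value > self.hi:
            return False
        # Bloom filter: a False answer is definite, True means "maybe".
        return all((self.bitmap >> pos) & 1 for pos in self._positions(value))

header = BlockHeader([10, 20, 30])
print(header.may_contain(20))  # True: the value is stored in the block
print(header.may_contain(5))   # False: rejected by the min-max range
```

Both filters can only produce false positives, never false negatives, so a block is skipped only when it is certain not to hold the value.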
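Transwarp Holodesk builds a data index on the relevant column of each data block according to the predicates in the query conditions. Indexed columns are stored in a Trie structure developed by Transwarp, and non-indexed columns are stored with dictionary encoding. A Trie compresses strings that share a common prefix and also keeps the input strings sorted, so a binary search can locate the required data quickly and the query can be answered fast.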
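The dictionary encoding used for non-indexed columns, and the benefit of keeping keys sorted for binary search, can be sketched as follows. Note the substitution: this example uses a sorted Python list with `bisect` as a stand-in for the sorted-key property, not a Trie, and it does not reproduce Holodesk's structure. The function names and sample values are hypothetical.

```python
from bisect import bisect_left

def dictionary_encode(values):
    """Replace each value by its position in a sorted dictionary of distinct values."""
    dictionary = sorted(set(values))
    codes = [bisect_left(dictionary, value) for value in values]
    return dictionary, codes

def lookup_code(dictionary, value):
    """Binary-search the sorted dictionary; return the code, or None if absent."""
    i = bisect_left(dictionary, value)
    return i if i < len(dictionary) and dictionary[i] == value else None

dictionary, codes = dictionary_encode(["shanghai", "beijing", "shanghai", "shenzhen"])
print(dictionary)  # ['beijing', 'shanghai', 'shenzhen']
print(codes)       # [1, 0, 1, 2]
print(lookup_code(dictionary, "shenzhen"))  # 2
print(lookup_code(dictionary, "hangzhou"))  # None
```

A filter such as city = 'shenzhen' can then be evaluated by comparing small integer codes instead of strings, and a value missing from the dictionary rules out the whole block without scanning it.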
Figure 3: Holodesk storage format
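HDFS 2.6 supports Storage Tiers, which let applications choose disk or SSD as the storage layer. Without a purpose-built storage format, however, the read/write throughput and low latency of SSD cannot be used effectively, so neither the existing Text format nor the hybrid row-column formats (ORC/Parquet) can exploit the high performance of SSD. To verify how storage structure affects performance, we built HDFS on SSD and ran a benchmark for a further comparison. The results are shown in Figure 4. With the text format, PCI-E SSD gives only a 1.5x speedup. With Holodesk columnar storage, which is designed for memory and SSD, performance is up to 6x that of HDFS on SSD.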
Figure 4: Speedup of Holodesk over HDFS on SSD
4.2 Performance Advantages
A telecom operator deployed Transwarp Inceptor on 12 x86 servers with Transwarp Holodesk configured on PCIE-SSD, and compared its performance with ordinary disk tables and with DB2. The final test data is shown in Figure 5:
Figure 5: Holodesk performance test results at a telecom operator
In the pure count test, Holodesk was up to 32 times faster than disk tables. In the join test, Transwarp Holodesk was up to 12 times faster than disk tables. In the single-table aggregation test, Holodesk was 10 to 30 times faster. Transwarp Holodesk also performed well against DB2: two complex SQL queries that take more than an hour in DB2 returned in minutes or seconds with Transwarp Holodesk.
Memory costs roughly ten times as much as SSD of the same capacity. To offer enterprises a more cost-effective compute option, Transwarp Holodesk has been heavily optimized for SSD, so that applications running on SSD achieve performance close to that of running in memory.
In tests of the IO-intensive TPC-DS queries, Holodesk was an order of magnitude faster than disk tables whether it was built on PCI-E SSD or in memory, and Holodesk on SSD was only about 10% slower than Holodesk in memory.

Figure 6: Performance of data on disk, on SSD, and in memory
5. A Stable Spark Execution Engine
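The main obstacles enterprises face when adopting open-source Spark today are stability, manageability, and a limited feature set. Open-source Spark still has many stability problems. When processing large data volumes, jobs may fail to finish or may run out of memory. Performance is erratic and is sometimes worse than Map/Reduce, which makes it unsuitable for complex data analysis workloads.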
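Transwarp Inceptor addresses these failure scenarios in several ways: the cost-based optimizer chooses the most suitable execution plan, the memory efficiency of data structures is managed more tightly, and data is backed up to disk to guard against common out-of-memory failures. Together these measures greatly improve the functional and performance stability of Spark. The problems described above have been resolved and the fixes have been proven in commercial deployments. Transwarp Inceptor runs stably 7x24 and performs a wide range of statistical analyses efficiently and reliably on TB-scale data.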
6. Validating SQL Engine Performance
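TPC-DS is a benchmark designed by the TPC organization for Decision Support Systems. It covers complex workloads over large data sets, including statistics, report generation, online queries, and data mining. The test data has a variety of distributions and skews and is very close to real-world scenarios. As representative Hadoop distribution vendors in China and abroad have adopted TPC-DS to evaluate their products, it has gradually become the accepted benchmark for Hadoop systems.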
6.1 Test Platforms and Configuration
We built two clusters, one for testing Transwarp Inceptor and one for testing Cloudera Data Hub/Impala. Each cluster consists of 4 ordinary two-socket x86 servers, with the following hardware configuration per server:

Given the disk capacity and the HDFS replication scheme, we chose a total data volume of 500GB. For the SQL test cases, Cloudera Impala used the TPC-DS subset modified by Cloudera, while Transwarp Inceptor used the test set that TPC-DS generates for Oracle, which keeps all of the original complex SQL and therefore reflects Inceptor's SQL support objectively.

6.2 Transwarp Inceptor VS Cloudera Impala
Because of its complete SQL support, Transwarp Inceptor can run all 99 SQL queries. The TPC-DS test set officially released by Cloudera contains only 19 SQL cases, so we could run only those 19 on Impala. All of them ran to completion on Impala.
Figure 7 compares performance across the whole test set. A value below 1 on the vertical axis means that Cloudera Impala outperformed Transwarp Inceptor on that test case, and a value above 1 means that Transwarp Inceptor performed better. For SQL that Cloudera Impala cannot support, we mark the performance ratio as 100. As the figure shows, of the 19 SQL queries supported by Cloudera Impala, Impala was faster on 8, the two systems were on par on 2, and Transwarp Inceptor was faster on the other 9.

Figure 7: Performance comparison of Transwarp Inceptor and Cloudera Impala
6.3 Transwarp Inceptor VS Map Reduce
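Using the same hardware and software configuration, we compared execution efficiency with open-source Hive. Transwarp Inceptor delivers a 10x to 100x performance improvement. Figure 8 shows the speedup of Inceptor over CDH 5.1 Hive for a subset of the TPC-DS SQL queries. The largest speedup reaches 123x.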
Figure 8: Performance comparison of Transwarp Inceptor and open-source Hive
7. Conclusion
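Now that China and the rest of the world are starting from the same line in big data, we believe that representative Chinese Hadoop distribution vendors such as Transwarp will grow substantially in China's vast market and, tempered by the intense competition there, will gradually build technology and strength that surpass leading vendors abroad.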