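With the development of big data technology, real-time stream computing, machine learning, and graph computing have become popular research areas. Spark, a powerful tool for big data processing, has a fairly mature ecosystem that can solve problems in all of these scenarios within a single stack. Do you know which components make up the Spark ecosystem? This article walks through these indispensable components. It is excerpted from the book 《图解Spark：核心技术与案例实战》 (Illustrated Spark: Core Technology and Case Studies).

The Spark ecosystem is built around Spark Core. It can read data from sources such as traditional files (e.g., text files), HDFS, Amazon S3, Alluxio, and NoSQL stores, and it relies on resource managers such as Standalone, YARN, and Mesos to schedule the analysis and processing of applications. These applications come from different Spark components: interactive or batch processing through Spark Shell or Spark Submit, real-time stream processing with Spark Streaming, ad hoc queries with Spark SQL, accuracy/latency trade-off queries with the sampling-based approximate query engine BlinkDB, machine learning with MLbase/MLlib, graph processing with GraphX, and mathematical computing with SparkR, as shown in the figure below. It is this ecosystem that realizes the goal of "One Stack to Rule Them All".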

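Spark Core

Spark Core is the core component of the entire BDAS (Berkeley Data Analytics Stack) ecosystem and is a distributed big data processing framework. It supports multiple resource schedulers, uses in-memory computing and a directed acyclic graph (DAG) execution model to make distributed computation fast, and introduces the RDD abstraction to make data highly fault tolerant. Its key features are described below.

Spark Core offers multiple run modes. It can run tasks with its own deployment modes, such as local mode and Standalone, or with third-party resource management frameworks such as YARN and Mesos. By comparison, third-party resource managers can manage resources at a finer granularity.

Spark Core provides a DAG-based distributed parallel computing framework and keeps data in memory to support repeated iterative computation and data sharing. This greatly reduces the cost of re-reading data between iterations, which substantially improves performance for data mining and analysis jobs that iterate many times.

During task processing, Spark moves computation rather than data: each RDD partition can read the nearby blocks of a distributed file system into the memory of its own node for computation.

Spark introduces the RDD (Resilient Distributed Dataset) abstraction, a read-only collection of objects partitioned across a set of nodes. These collections are resilient: if part of a dataset is lost, it can be rebuilt from its lineage, which gives the data high fault tolerance.

Spark Streaming

Spark Streaming is a high-throughput, fault-tolerant stream processing system for real-time data streams. It can apply complex operations such as Map, Reduce, and Join to data from many sources (such as Kafka, Flume, Twitter, and ZeroMQ) and save the results to external file systems or databases, or push them to real-time dashboards, as shown in the figure below.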

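Other processing engines either focus only on stream processing or handle only batch processing (offering stream-processing API interfaces that must be implemented externally). The biggest advantage of Spark Streaming is that its processing engine and RDD programming model support both batch and stream processing. Whereas traditional stream processing handles one record at a time, Spark Streaming discretizes the stream (Discretized Streams), which lets it process data in batches at sub-second latency. During processing, Receivers receive data in parallel and cache it in the memory of Spark worker nodes. With latency optimizations, the Spark engine can batch-process short tasks (tens of milliseconds) and output the results to other systems. In the traditional continuous operator model, each operator is statically assigned to a single node for computation; Spark, by contrast, assigns tasks dynamically to worker nodes based on where the data resides and which resources are available.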
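The two ideas in the paragraph above, cutting a stream into small batches and handing work to whichever node is free, can be illustrated with a small standard-library Python sketch. It is a conceptual model only, not Spark Streaming's API: `micro_batches` and `assign_tasks` are hypothetical names, task costs are made-up numbers, and the "earliest-free worker gets the next task" rule is a simplified stand-in for Spark's scheduler.

```python
# Hypothetical illustration of micro-batching and dynamic task assignment; not Spark's API.
import heapq
from typing import Dict, List, Tuple

Record = Tuple[float, str]  # (arrival time in seconds, payload)


def micro_batches(records: List[Record], interval: float) -> Dict[int, List[str]]:
    """Group timestamped records into discrete batches of `interval` seconds.

    Batch k holds the payloads whose timestamps fall in [k * interval, (k + 1) * interval);
    each batch plays the role that one RDD plays in a DStream. Empty intervals are omitted.
    """
    batches: Dict[int, List[str]] = {}
    for ts, payload in sorted(records):
        batches.setdefault(int(ts // interval), []).append(payload)
    return batches


def assign_tasks(task_costs: List[float], workers: List[str]) -> Dict[str, List[int]]:
    """Give each task, in order, to the worker that becomes free earliest.

    A worker busy with a slow task stays occupied longer and therefore receives
    fewer tasks, instead of every task being pinned to a fixed node in advance.
    """
    free_at = [(0.0, w) for w in workers]  # (time the worker becomes free, worker name)
    heapq.heapify(free_at)
    plan: Dict[str, List[int]] = {w: [] for w in workers}
    for task_id, cost in enumerate(task_costs):
        t, w = heapq.heappop(free_at)
        plan[w].append(task_id)
        heapq.heappush(free_at, (t + cost, w))
    return plan


if __name__ == "__main__":
    stream = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (2.7, "d"), (2.9, "e")]
    print(micro_batches(stream, interval=1.0))
    # One expensive task followed by four cheap ones on two workers.
    print(assign_tasks([5.0, 1.0, 1.0, 1.0, 1.0], ["w1", "w2"]))
```

With these inputs the first call yields `{0: ['a', 'b'], 1: ['c'], 2: ['d', 'e']}`, and the second gives `w1` only the expensive task while `w2` picks up the four cheap ones.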
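Using discretized streams (DStreams), Spark Streaming has the following properties.

Dynamic load balancing: Spark Streaming divides data into small batches, which allows resources to be allocated at a finer granularity. For example, in a traditional record-at-a-time streaming system where the input stream is partitioned by key, a node whose computational load exceeds its capacity becomes a bottleneck and slows down the whole system. In Spark Streaming, tasks are instead balanced dynamically across nodes, as shown in the figure: a node whose tasks take longer is assigned fewer tasks, and a node whose tasks finish quickly is assigned more.

Fast failure recovery: when a node fails, a traditional stream processing system restarts the failed continuous operator on another node and may replay part of the earlier stream to recover lost data. During this process only that one replacement node reprocesses the failed work, and the rest of the system cannot move on until the new node has redone all computation from before the failure. In Spark, computation is split into many small tasks that can run on any node and still be merged correctly. Therefore, when a node fails, its tasks are spread evenly across the other nodes in the cluster, so recovery is faster than with the traditional mechanism.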

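Unified batch, streaming, and interactive analysis: Spark Streaming breaks a streaming computation into a series of short batch jobs. Its input data is split by the batch interval (for example, a few seconds) into a sequence of discrete chunks, the discretized stream (DStream); each chunk is converted into a Spark RDD, so the stream operations on a DStream become batch operations on RDDs. In addition, the stream data is kept in the memory of Spark nodes, so users can query it interactively as needed. This mechanism is what lets Spark combine batch processing, stream processing, and interactive work.

Spark SQL

Spark SQL's predecessor is Shark. When Shark was released, Hive was practically the only choice for SQL on Hadoop (Hive compiles SQL into scalable MapReduce jobs). Given Hive's performance limitations and the need for compatibility with Spark, Shark was born. Shark, that is, Hive on Spark, essentially uses Hive to parse HQL and translates it into the corresponding RDD operations on Spark, uses the Hive metadata to obtain the table information in the database (the actual data and files live on HDFS), and finally has Shark fetch the data and run the computation on Spark. Shark's main strengths are speed and full compatibility with Hive. In shell mode it also offers APIs such as rdd2sql, which let users continue processing HQL result sets in the Scala environment and write simple machine learning or analysis functions to analyze the HQL results further.

At the Spark Summit on July 1, 2014, Databricks announced that it would stop developing Shark and focus on Spark SQL instead. Databricks explained that Shark was mostly a modification of Hive that replaced Hive's physical execution engine to achieve faster processing. However, Shark inherited a large amount of Hive code, which made optimization and maintenance very troublesome. As performance optimization and the integration of advanced analytics went further, the parts designed around MapReduce became the bottleneck of the whole project. Therefore, to allow further development and give users a better experience, Databricks ended the Shark project and put more effort into Spark SQL.

Spark SQL allows developers to work with RDDs directly while also querying external data stored in Hive. An important characteristic of Spark SQL is that it handles relational tables and RDDs uniformly, so developers can easily use SQL commands for external queries while performing more complex data analysis.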
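The features of Spark SQL are as follows.

It introduces a new RDD type, SchemaRDD, which can be defined the way a table is defined in a traditional database. A SchemaRDD consists of row objects with defined column data types. A SchemaRDD can be converted from an existing RDD, read from a Parquet file, or obtained from Hive with HiveQL.

It embeds the Catalyst query optimization framework. After SQL is parsed into a logical execution plan, classes and interfaces from the Catalyst package apply some simple plan optimizations, and the plan is finally turned into RDD computations.

Applications can mix data from different sources; for example, data obtained through HiveQL can be joined with data obtained through SQL.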
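The Catalyst step described above (parse SQL into a logical plan, apply optimization rules, then execute) can be sketched with a toy expression tree. This is not Catalyst's API: the classes `Lit`, `Col`, `Add`, `Mul` and the single constant-folding rule are hypothetical stand-ins for the kind of tree-rewriting rule Catalyst applies (Catalyst itself is written in Scala and pattern-matches over immutable trees).

```python
# Hypothetical toy optimizer illustrating a rule-based tree rewrite; not Catalyst's API.
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add, Mul]


def fold_constants(e: Expr) -> Expr:
    """Rewrite the tree bottom-up, replacing an operation on two literals by its result."""
    if isinstance(e, (Add, Mul)):
        left, right = fold_constants(e.left), fold_constants(e.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            if isinstance(e, Add):
                return Lit(left.value + right.value)
            return Lit(left.value * right.value)
        return type(e)(left, right)  # rebuild the node with optimized children
    return e


def evaluate(e: Expr, row: Dict[str, int]) -> int:
    """Evaluate an expression against one row (a dict of column values)."""
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, Col):
        return row[e.name]
    l, r = evaluate(e.left, row), evaluate(e.right, row)
    return l + r if isinstance(e, Add) else l * r


if __name__ == "__main__":
    plan = Add(Col("x"), Mul(Lit(2), Lit(3)))  # logical plan for the expression x + 2 * 3
    optimized = fold_constants(plan)           # becomes x + 6
    print(optimized)
    print(evaluate(optimized, {"x": 4}))
```

The rewritten plan prints as `Add(left=Col(name='x'), right=Lit(value=6))` and evaluates to 10 for `x = 4`, the same result as the unoptimized plan but with the multiplication done once at planning time instead of once per row.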
Shark made SQL-on-Hadoop 10 to 100 times faster than Hive. Freed from Hive's constraints, how does Spark SQL perform? Although its improvement is not as striking as Shark's gain over Hive, it still performs very well, as shown in the figure (the data on the right is for Spark SQL).

Why has Spark SQL's performance improved so much? Mainly because Spark SQL includes the following optimizations.

In-Memory Columnar Storage: Spark SQL does not store in-memory table data as native JVM objects; it uses in-memory columnar storage instead.

Bytecode Generation: Spark 1.1.0 added a Codegen module to the Expressions part of the Catalyst module. It uses dynamic bytecode generation to compile specialized code for matching expressions at runtime, and code generation (CG) optimizations were applied to SQL expressions in general. The CG optimization relies mainly on the runtime reflection mechanism (Runtime Reflection) of Scala 2.10.

Scala code optimization: when writing Spark SQL in Scala, the developers avoided inefficient code and code that puts pressure on the garbage collector. This makes the code harder to write, but the interface stays uniform for users.

BlinkDB

BlinkDB is a massively parallel query engine for running interactive SQL queries on large volumes of data. It lets users trade data accuracy for faster query response times, while keeping the accuracy within an acceptable error bound. To achieve this, BlinkDB uses the following core ideas:
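An adaptive optimization framework that builds and maintains a set of multi-dimensional samples from the raw data over time.

A dynamic sample selection strategy that chooses a sample of appropriate size based on the query's accuracy requirements and response-time constraints.

Unlike a traditional relational database, BlinkDB is an interactive query system that works like a seesaw: users trade off query accuracy against query time. If users want results faster, they sacrifice accuracy; conversely, if they want more accurate results, they must accept longer response times. The figure below shows the BlinkDB architecture.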

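The accuracy-versus-latency trade-off described above can be illustrated with a plain uniform sample. This is a simplified sketch, not BlinkDB's algorithm (BlinkDB maintains stratified, multi-dimensional samples and chooses among them at query time): the hypothetical `approx_mean` estimates an AVG over a random fraction of the rows and reports an approximate 95% confidence half-width, so a smaller fraction touches fewer rows but yields a wider error bar. The normal approximation and the 1.96 z-value are assumptions of the sketch.

```python
# Hypothetical sampling-based AVG estimator; a simplification, not BlinkDB's implementation.
import math
import random
import statistics
from typing import List, Tuple


def approx_mean(values: List[float], fraction: float, seed: int = 0) -> Tuple[float, float]:
    """Estimate mean(values) from a uniform random sample without replacement.

    Returns (estimate, half_width), where half_width approximates a 95% confidence
    interval using the normal approximation and the finite-population correction.
    """
    n_total = len(values)
    if n_total < 2:
        raise ValueError("need at least two values")
    n = min(n_total, max(2, round(n_total * fraction)))
    sample = random.Random(seed).sample(values, n)
    estimate = statistics.fmean(sample)
    # Sampling without replacement shrinks the error as n approaches n_total.
    fpc = math.sqrt((n_total - n) / (n_total - 1))
    std_err = statistics.stdev(sample) / math.sqrt(n) * fpc
    return estimate, 1.96 * std_err


if __name__ == "__main__":
    data = [float(i % 97) for i in range(100_000)]
    print("exact:", statistics.fmean(data))
    for frac in (0.001, 0.01, 0.1):
        est, hw = approx_mean(data, frac)
        print(f"fraction={frac}: {est:.2f} +/- {hw:.2f}")
```

Scanning the full table (fraction 1.0) returns the exact mean with a zero-width interval; the exact widths at smaller fractions depend on the data and the seed.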
MLBase/MLlib

MLBase is the component of the Spark ecosystem dedicated to machine learning. Its goal is to lower the barrier to machine learning so that users who may know little about it can still use MLBase easily. MLBase has four parts: MLRuntime, MLlib, MLI, and ML Optimizer.
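MLRuntime: the distributed in-memory computing framework provided by Spark Core. It runs the algorithms optimized by the Optimizer on the data and outputs the analysis results.

MLlib: Spark's implementation of common machine learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, and underlying optimization primitives. The algorithms are extensible.

MLI: an API or platform for feature extraction and for implementing algorithms with high-level ML programming abstractions.

ML Optimizer: selects the internally implemented machine learning algorithm and parameters it considers most suitable, uses them to process the user's input data, and returns a model or other results that help with analysis.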
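To make the ML Optimizer's role concrete, here is a toy sketch of automatic model selection: given a few candidate algorithms and parameter settings, it fits each on a training split and keeps whichever scores best on a held-out split, so the user only supplies data. The candidates (a constant-mean predictor and one-variable ridge regression with a small grid of penalty values) and the names `fit_mean`, `fit_ridge`, and `choose_model` are hypothetical; MLBase's actual optimizer searches a much richer space of learning plans.

```python
# Hypothetical model-selection loop in the spirit of an ML optimizer; not MLBase's API.
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (feature x, target y)
Model = Callable[[float], float]
Fitter = Callable[[List[Point], float], Model]


def fit_mean(train: List[Point], _alpha: float) -> Model:
    """Baseline: always predict the mean target (the parameter is ignored)."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_ridge(train: List[Point], alpha: float) -> Model:
    """Closed-form 1-D ridge regression y = a*x + b, penalizing only the slope a."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    a = sxy / (sxx + alpha)
    b = my - a * mx
    return lambda x: a * x + b


def mse(model: Model, data: List[Point]) -> float:
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def choose_model(
    train: List[Point], valid: List[Point], candidates: List[Tuple[str, Fitter, float]]
) -> Tuple[str, float, float]:
    """Fit every (name, fitter, parameter) candidate; keep the lowest validation MSE."""
    best: Optional[Tuple[str, float, float]] = None
    for name, fit, alpha in candidates:
        score = mse(fit(train, alpha), valid)
        if best is None or score < best[2]:
            best = (name, alpha, score)
    if best is None:
        raise ValueError("no candidates given")
    return best


if __name__ == "__main__":
    data = [(float(x), 2.0 * x + 1.0) for x in range(15)]
    train, valid = data[:10], data[10:]
    grid = [("mean", fit_mean, 0.0)] + [("ridge", fit_ridge, a) for a in (0.0, 1.0, 10.0)]
    print(choose_model(train, valid, grid))
```

On this noise-free linear data the unpenalized ridge candidate fits exactly, so the call prints `('ridge', 0.0, 0.0)`.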
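The core of MLBase is its optimizer (ML Optimizer), which turns declarative tasks into sophisticated learning plans and finally produces the best model and computation results. MLBase differs from other machine learning systems such as Weka and Mahout; each of the three has its own characteristics, as follows.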
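MLBase is based on Spark and uses distributed in-memory computation; Weka is a single-machine system; Mahout processes data with MapReduce (Mahout is moving toward using Spark for data processing).

MLBase automates the process, whereas Weka and Mahout both require users to have machine learning skills to choose the algorithms and parameters they want.

MLBase provides interfaces at different levels of abstraction, through which users can extend the algorithms.

GraphX

GraphX began as a distributed graph computing framework project at UC Berkeley's AMPLab and was later integrated into Spark as a core component. It is Spark's API for graphs and graph-parallel computation and can be seen as a rewrite and optimization of GraphLab and Pregel on Spark. Compared with other distributed graph computing frameworks, GraphX's biggest advantage is that it provides a one-stack data solution on top of Spark, so a complete graph-computing pipeline can be run efficiently.

GraphX's core abstraction is the Resilient Distributed Property Graph, a directed multigraph in which both vertices and edges carry properties. GraphX extends Spark's RDD abstraction with two views, Table and Graph, backed by a single physical copy of the data. Each view has its own operators, which provides both flexible operations and efficient execution. Most of GraphX's overall architecture is built around optimizing partitions, which to some extent shows that vertex-cut storage and the corresponding computation optimizations are indeed the focus and the main difficulty of graph computing frameworks.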
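The idea of one physical store exposed through a Table view and a Graph view can be illustrated with plain Python collections. In this sketch (not GraphX's API; `PropertyGraph`, `triplets`, and `map_vertices` are hypothetical names loosely modeled on GraphX's vertex/edge/triplet views and its `mapVertices` operator), vertices and edges are stored once as tables, the graph-style triplet view is derived by joining each edge with its endpoint properties, and a transformation returns a new graph instead of mutating the old one.

```python
# Hypothetical property-graph sketch with table and graph views; not GraphX's API.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

VertexId = int
Edge = Tuple[VertexId, VertexId, str]  # (source id, destination id, edge property)


@dataclass(frozen=True)
class PropertyGraph:
    vertices: Dict[VertexId, str]  # Table view: vertex id -> vertex property
    edges: Tuple[Edge, ...]        # Table view: one row per edge

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Graph view: join each edge row with the properties of both endpoints."""
        return [(self.vertices[s], prop, self.vertices[d]) for s, d, prop in self.edges]

    def map_vertices(self, f: Callable[[VertexId, str], str]) -> "PropertyGraph":
        """Return a new graph with transformed vertex properties.

        The original graph is left untouched, and the unchanged edge table is
        shared with the new graph rather than copied.
        """
        return PropertyGraph({v: f(v, attr) for v, attr in self.vertices.items()}, self.edges)


if __name__ == "__main__":
    g = PropertyGraph({1: "alice", 2: "bob", 3: "carol"}, ((1, 2, "follows"), (2, 3, "likes")))
    print(g.triplets())
    g2 = g.map_vertices(lambda vid, name: name.upper())
    print(g2.triplets(), g.vertices[1], g2.edges is g.edges)
```

The second print shows the upper-cased triplets of the new graph, the unchanged `'alice'` in the old graph, and `True` because both graphs share the same edge table.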
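The low-level design of GraphX has the following key points.

(1) Every operation on the Graph view is ultimately converted into RDD operations on its associated Table view. Logically, then, a graph computation is equivalent to a series of RDD transformations. As a result, a Graph has the three key properties of an RDD: Immutable, Distributed, and Fault-Tolerant. The most important of these is immutability: logically, every graph transformation or operation produces a new graph; physically, GraphX reuses unchanged vertices and edges to some extent, transparently to the user.

(2) The physical data shared by the two views consists of two RDDs, RDD[VertexPartition] and RDD[EdgePartition]. Vertices and edges are not actually stored as a Collection[tuple] table; instead, each VertexPartition/EdgePartition internally stores a sharded data block with an index structure that speeds up traversal in the different views. These immutable index structures are shared across RDD transformations, which lowers computation and storage costs.

(3) Distributed graph storage uses the vertex-cut model, and the user specifies a partitioning strategy (PartitionStrategy) through the partitionBy method. The strategy assigns edges to EdgePartitions and vertex masters to VertexPartitions; each EdgePartition also caches ghost replicas of the vertices attached to its local edges. The choice of strategy affects how many ghost replicas must be cached and how evenly edges are distributed across EdgePartitions, so the best strategy should be chosen according to the structure of the graph.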
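The effect of the partitioning strategy on replicas and balance can be measured with a small simulation. The sketch below is a simplification under stated assumptions: it places each edge with one of two hypothetical rules (`by_source`, roughly in the spirit of GraphX's `EdgePartition1D`, and `by_edge`, roughly in the spirit of `RandomVertexCut`), uses simple modular arithmetic instead of GraphX's hash functions, and counts how many vertex copies the edge partitions need and how many edges each partition receives.

```python
# Hypothetical vertex-cut simulation; the placement rules are simplified, not GraphX's code.
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]  # (source vertex id, destination vertex id)
Strategy = Callable[[Edge, int], int]


def by_source(edge: Edge, n_parts: int) -> int:
    """Place an edge by its source vertex: all out-edges of a vertex share a partition."""
    return edge[0] % n_parts


def by_edge(edge: Edge, n_parts: int) -> int:
    """Place an edge by a simple hash of (src, dst), spreading a vertex's edges out."""
    return (edge[0] * 31 + edge[1]) % n_parts


def partition_stats(edges: List[Edge], n_parts: int, strategy: Strategy) -> Tuple[int, List[int]]:
    """Return (vertex copies needed across all edge partitions, edge count per partition).

    Every partition must hold a copy of each endpoint of its edges; copies beyond
    one per vertex roughly correspond to the ghost replicas described above.
    """
    local_vertices: Dict[int, Set[int]] = defaultdict(set)
    counts = [0] * n_parts
    for edge in edges:
        p = strategy(edge, n_parts)
        counts[p] += 1
        local_vertices[p].update(edge)
    copies = sum(len(vs) for vs in local_vertices.values())
    return copies, counts


if __name__ == "__main__":
    star = [(0, i) for i in range(1, 9)]  # one hub vertex with eight out-edges
    for name, strategy in (("by_source", by_source), ("by_edge", by_edge)):
        print(name, partition_stats(star, 4, strategy))
```

On this star graph `by_source` needs only 9 vertex copies but puts all 8 edges into one partition, while `by_edge` balances the edges (2 per partition) at the cost of 12 copies, because the hub vertex is replicated into every partition.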
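SparkR

R is open-source, free software released under the GNU license and widely used for statistical computing and graphics, but it runs only on a single machine. To let R users analyze large-scale distributed data, UC Berkeley's AMPLab developed SparkR, which was added to Spark in version 1.4. With SparkR, users can analyze large datasets and run jobs on SparkR interactively from the R shell. SparkR's features are as follows:

It provides an API for Spark's Resilient Distributed Datasets (RDDs), so users can run Spark jobs on a cluster interactively from the R shell.

It supports closure serialization: variables referenced inside a user-defined function are automatically serialized and sent to the other machines in the cluster.

SparkR can also call R packages easily: you only need to load an R package with includePackage before running operations on the cluster.

The figure below shows SparkR's processing flow.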
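The closure-serialization feature listed above can be illustrated in Python, where a function's captured variables are available through `__closure__`. This is only an analogy for what SparkR does with R closures, not SparkR's implementation: the hypothetical `ship` packs a function's bytecode and the values of the variables it closes over into bytes (standing in for sending them to a worker), and `receive` rebuilds a callable from those bytes. It relies on `marshal`, whose format is CPython-version specific, so both sides are assumed to run the same interpreter version, and it ships captured variables only, not module-level globals.

```python
# Hypothetical analogy for closure shipping; not SparkR's implementation.
import builtins
import marshal
import pickle
import types
from typing import Callable


def ship(fn: Callable) -> bytes:
    """Pack a function's bytecode and the current values of its captured variables."""
    captured = [cell.cell_contents for cell in (fn.__closure__ or ())]
    return pickle.dumps({
        "code": marshal.dumps(fn.__code__),  # CPython-version-specific format
        "name": fn.__name__,
        "captured": captured,                # the captured values must be picklable
    })


def receive(blob: bytes) -> Callable:
    """Rebuild a callable from shipped bytes, e.g. on a worker process."""
    payload = pickle.loads(blob)
    code = marshal.loads(payload["code"])
    cells = tuple(types.CellType(v) for v in payload["captured"]) or None
    # Only builtins are available as globals; module-level names are not shipped.
    return types.FunctionType(code, {"__builtins__": builtins}, payload["name"], None, cells)


def make_scaler(factor: float) -> Callable[[float], float]:
    offset = 1.0

    def scale(x: float) -> float:
        return x * factor + offset  # `factor` and `offset` are captured variables

    return scale


if __name__ == "__main__":
    remote_scale = receive(ship(make_scaler(3.0)))
    print(remote_scale(2.0))  # same result as make_scaler(3.0)(2.0)
```

The rebuilt function prints 7.0, because the values of `factor` and `offset` travelled inside the serialized payload together with the code.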
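Alluxio

Alluxio is a distributed in-memory file system. It is a highly fault-tolerant distributed file system that allows files to be shared reliably at memory speed across cluster computing frameworks such as Spark and MapReduce. Alluxio is middleware that sits between the underlying distributed file storage and the various computing frameworks above it. Its main job is to place files that do not need to be persisted to a DFS into the distributed in-memory file system, so that memory can be shared and efficiency improved. It can also reduce memory redundancy and garbage-collection time.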
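The middleware role described above, with files shared from memory and only selectively persisted to the storage underneath, can be sketched as a small in-memory file layer in front of an under-storage interface. Everything here is hypothetical and greatly simplified (no distribution, eviction, or fault tolerance): `UnderStore`, `DictUnderStore`, and `MemoryFileLayer` are illustrative names, not Alluxio's API.

```python
# Hypothetical memory layer over a backing store; a simplification, not Alluxio's API.
from typing import Dict, Optional, Protocol


class UnderStore(Protocol):
    """Interface a backing file system (HDFS, S3, local disk, ...) would implement."""

    def read(self, path: str) -> Optional[bytes]: ...

    def write(self, path: str, data: bytes) -> None: ...


class DictUnderStore:
    """Stand-in backing store that keeps files in a dict and counts reads."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.reads = 0

    def read(self, path: str) -> Optional[bytes]:
        self.reads += 1
        return self.files.get(path)

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data


class MemoryFileLayer:
    """Memory tier shared by several jobs; persisting to the under store is optional."""

    def __init__(self, under: UnderStore) -> None:
        self.under = under
        self.memory: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes, persist: bool = False) -> None:
        self.memory[path] = data
        if persist:  # intermediate files can stay memory-only
            self.under.write(path, data)

    def read(self, path: str) -> Optional[bytes]:
        if path in self.memory:  # served from memory, no backing-store access
            return self.memory[path]
        data = self.under.read(path)
        if data is not None:
            self.memory[path] = data  # keep a copy for later readers
        return data


if __name__ == "__main__":
    under = DictUnderStore()
    layer = MemoryFileLayer(under)
    layer.write("/tmp/stage1", b"intermediate")        # shared in memory only
    under.files["/data/input"] = b"raw"
    print(layer.read("/tmp/stage1"), layer.read("/data/input"), layer.read("/data/input"))
    print("backing-store reads:", under.reads)          # second read came from memory
```

Here the intermediate file never reaches the backing store, and the input file is fetched from it only once; later reads are served from the memory layer.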
Like Hadoop, Alluxio uses a traditional master-slave architecture. All Alluxio Workers are managed by the Alluxio Master, which uses the heartbeats that Workers send periodically to determine whether a Worker has crashed and how much free memory each Worker has left. To avoid a single point of failure, ZooKeeper is used to provide high availability (HA). Alluxio has the following features.
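Java-like File API: Alluxio provides an API similar to Java's File class.

Compatibility: Alluxio implements the HDFS interface, so Spark and MapReduce programs can run on it without any modification.

Pluggable under file system: Alluxio provides fault tolerance by recording in-memory data to an underlying file system, and that underlying file system is pluggable. A generic interface makes it easy to plug in different underlying file systems. HDFS, S3, GlusterFS, and single-node local file systems are currently supported, and more file systems will be supported in the future. The applications supported by Alluxio are as follows.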
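The master/worker heartbeat scheme described earlier in this section can be sketched as a master that records the last heartbeat and reported free memory of each worker and treats a worker as lost once its heartbeat is older than a timeout. This is a hypothetical, single-process simplification (timestamps are passed in explicitly so the behavior is deterministic, and ZooKeeper-based HA is not modeled); `Master` and `WorkerInfo` are illustrative names, not Alluxio classes.

```python
# Hypothetical heartbeat tracking sketch; not Alluxio's master implementation.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WorkerInfo:
    last_heartbeat: float  # time of the most recent heartbeat, in seconds
    free_bytes: int        # free memory the worker reported in that heartbeat


class Master:
    """Tracks workers through periodic heartbeats."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.workers: Dict[str, WorkerInfo] = {}

    def heartbeat(self, worker: str, now: float, free_bytes: int) -> None:
        self.workers[worker] = WorkerInfo(now, free_bytes)

    def _alive(self, info: WorkerInfo, now: float) -> bool:
        return now - info.last_heartbeat <= self.timeout

    def lost_workers(self, now: float) -> List[str]:
        """Workers whose last heartbeat is older than the timeout."""
        return sorted(w for w, info in self.workers.items() if not self._alive(info, now))

    def free_memory(self, now: float) -> int:
        """Total free memory reported by workers that are still considered alive."""
        return sum(info.free_bytes for info in self.workers.values() if self._alive(info, now))


if __name__ == "__main__":
    master = Master(timeout=10.0)
    master.heartbeat("w1", now=0.0, free_bytes=100)
    master.heartbeat("w2", now=5.0, free_bytes=200)
    print(master.lost_workers(now=12.0), master.free_memory(now=12.0))
```

At time 12 the worker last heard from at time 0 has exceeded the 10-second timeout, so the master reports `['w1']` as lost and counts only the 200 bytes reported by `w2`.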