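Spark is a distributed project for fast data analysis developed by the AMP Lab at the University of California, Berkeley (UC Berkeley AMP). Its core technology is the Resilient Distributed Dataset (RDD), which offers a richer model than Hadoop's MapReduce and can iterate over a dataset in memory many times quickly, supporting complex data mining and graph computation algorithms.
Spark is written in Scala and uses Mesos as its underlying scheduling framework. It integrates closely with Hadoop and EC2: it can read files directly from HDFS or S3, compute on them, and write the results back to HDFS or S3, which makes it part of both the Hadoop and the Amazon cloud computing ecosystems. Spark is a small, compact project. Its core consists of only 63 Scala files, a good illustration of the beauty of simplicity.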

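Spark's dependencies
1. The MapReduce model: as a distributed computing framework, Spark adopts the MapReduce model. The marks of Google's MapReduce and of Hadoop are plain to see. Spark is clearly not a major innovation but an incremental one: keeping the basic ideas unchanged, it borrows from, imitates, and depends on its predecessors, and adds a few improvements that greatly raise the efficiency of MapReduce.
2. Functional programming: Spark is written in Scala, and Scala is also its primary programming language. One reason is that Scala supports functional programming. This keeps Spark's own code concise and also makes programs built on Spark very concise. For one complete MapReduce job, Hadoop requires you to create a Mapper class and a Reducer class, whereas Spark only needs a corresponding map function and reduce function, so the amount of code drops sharply.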
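3. Mesos: Spark hands everything that has to be considered for distributed execution over to Mesos and does not deal with it itself. This is one reason its code can stay so compact.
4. HDFS and S3: Spark supports two distributed storage systems, HDFS and S3, which are probably the two most mainstream ones today. Reading from and writing to the file system is provided by Spark itself and is distributed with the help of Mesos.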
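The point about functional programming (item 2) can be illustrated with a minimal sketch in plain Python. This is not Spark code. The function names `map_words` and `merge_counts` and the sample lines are made up for illustration. The sketch only shows that a whole word count can be expressed as one map function plus one reduce function, without defining separate Mapper and Reducer classes.

```python
from collections import Counter
from functools import reduce


def map_words(line):
    """Map step: turn one line of text into a Counter of its words."""
    return Counter(line.split())


def merge_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


# Hypothetical input; in a real job these lines would come from HDFS or S3.
lines = ["spark uses scala", "spark runs on mesos"]

word_counts = reduce(merge_counts, map(map_words, lines), Counter())
print(word_counts["spark"])  # 2
```

Because `merge_counts` is associative, partial counts could be merged in any grouping, which is the property that lets a reduce step be spread over several machines.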
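Spark compared with Hadoop
1. Spark keeps intermediate data in memory, which makes iterative computation more efficient. Thanks to its RDD abstraction, Spark is better suited to machine learning (ML) and data mining (DM) workloads, which involve many iterations.
2. Spark is more general than Hadoop.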
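Spark provides many kinds of dataset operations, unlike Hadoop, which provides only Map and Reduce. Examples include map, filter, flatMap, sample, groupByKey, reduceByKey, union, join, cogroup, mapValues, sort, and partitionBy. Spark calls these operations Transformations. It also provides a number of Actions such as count, collect, reduce, lookup, and save.
This variety of dataset operations is convenient for users who develop higher-level applications. The communication model between processing nodes is no longer limited to the single data shuffle pattern of Hadoop. Users can name and materialize intermediate results and control how they are stored and partitioned. The programming model can fairly be called more flexible than Hadoop's.
However, because of the properties of RDDs, Spark is not suitable for applications that update state asynchronously and at fine granularity, such as the storage layer of a web service or an incremental web crawler and indexer. In other words, it does not fit application models based on incremental modification.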
3. Fault tolerance. Computation on distributed datasets achieves fault tolerance through checkpointing, which comes in two forms: checkpointing the data and logging the updates. The user can control which of the two is used.
4. Usability. Spark improves usability by providing rich Scala, Java, and Python APIs and an interactive shell.
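Combining Spark with Hadoop
Spark can read and write data on HDFS directly, and Spark on YARN is also supported. Spark can run in the same cluster as MapReduce and share storage and compute resources with it. The data warehouse Shark borrows from Hive in its implementation and is almost fully compatible with Hive.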
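Core concepts of Spark
1. Resilient Distributed Dataset (RDD)
The RDD is Spark's most basic abstraction. It is an abstraction over distributed memory that lets a program operate on a distributed dataset the same way it would operate on a local collection. The RDD is the heart of Spark: it represents a partitioned, immutable collection of data that can be operated on in parallel, and different dataset formats correspond to different RDD implementations. An RDD must be serializable. An RDD can be cached in memory. The result of each operation on an RDD can be kept in memory so that the next operation reads its input directly from memory, which avoids the heavy disk I/O of MapReduce. For machine learning algorithms, where iteration is common, and for interactive data mining, the efficiency gain is considerable.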
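Characteristics of an RDD:
1. It is an immutable, partitioned collection object spread over the nodes of a cluster.
2. It is created through parallel transformations (such as map, filter, join, etc.).
3. It is rebuilt automatically on failure.
4. Its storage level (memory, disk, etc.) can be controlled for reuse.
5. It must be serializable.
6. It is statically typed.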
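Benefits of RDDs:
1. An RDD can only be created from persistent storage or through Transformations. Compared with distributed shared memory (DSM), this allows more efficient fault tolerance: a lost data partition can be recomputed from its lineage alone, with no need for a dedicated checkpoint.
2. Because RDDs are immutable, speculative execution in the style of Hadoop MapReduce is possible.
3. Because RDDs are partitioned, data locality can be used to improve performance, just as in Hadoop MapReduce.
4. RDDs are serializable, so when memory is insufficient they can automatically degrade to disk storage. Performance then drops considerably but is no worse than today's MapReduce.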
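RDD storage and partitioning:
Users can choose among different storage levels for an RDD so that it can be reused.
Currently an RDD is stored in memory by default, but when memory is insufficient it spills to disk.
When an RDD needs to be partitioned to distribute its data across the cluster, it is partitioned by the key of each record (for example with hash partitioning), which ensures that two datasets can be joined efficiently.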
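The reason key-based partitioning helps joins can be sketched in a few lines of plain Python. This is a toy model, not Spark's partitioner. The names `hash_partition` and `join_partition` and the sample records are assumptions made for illustration. When both datasets are partitioned by the same function of the key, matching keys always land in partitions with the same index, so each pair of partitions can be joined locally.

```python
from collections import defaultdict


def hash_partition(records, num_partitions):
    """Place each (key, value) record in the partition chosen by its key's hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def join_partition(left, right):
    """Join two partitions that share an index; no other partition is needed."""
    index = defaultdict(list)
    for key, value in left:
        index[key].append(value)
    return [(key, (lv, rv)) for key, rv in right for lv in index.get(key, [])]


# Hypothetical datasets keyed by an integer id.
users = [(1, "ann"), (2, "bob"), (3, "cat")]
orders = [(1, "book"), (3, "pen"), (3, "ink")]

num_partitions = 2
joined = []
for left, right in zip(hash_partition(users, num_partitions),
                       hash_partition(orders, num_partitions)):
    joined.extend(join_partition(left, right))

print(sorted(joined))
```

If the two datasets were partitioned differently, every partition of one side might need records from every partition of the other, which is exactly the data movement that co-partitioning avoids.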
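Internal representation of an RDD:
A list of partitions (a list of data blocks)
A function for computing each split (deriving this RDD from its parent RDD)
A list of dependencies on parent RDDs
A Partitioner for key-value RDDs [optional]
A list of preferred locations for each data split (for example the addresses of data blocks on HDFS) [optional]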
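RDD storage levels: combinations of the four parameters useDisk, useMemory, deserialized, and replication give RDDs 11 storage levels. RDD defines the various operations. Different types of data are represented by different RDD classes, and the different operations are likewise implemented by RDD classes.
There are two ways to create an RDD:
From input in a Hadoop file system (or another Hadoop-compatible storage system), for example HDFS.
By transforming a parent RDD into a new RDD.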
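2. Spark on Mesos
Spark supports two modes, local invocation and a Mesos cluster. An algorithm developed on Spark can be debugged in local mode and then switched directly to a Mesos cluster. Apart from the location where files are saved, the algorithm in theory needs no changes. Spark's local mode supports multithreading and offers some single-machine concurrency, though it is not very powerful. Local mode can save results either locally or in a distributed file system, whereas Mesos mode must save them in a distributed or shared file system.
To run on the Mesos framework, Spark implements two classes that follow the Mesos specification and design. One is the Spark scheduler, whose class name in Spark is MesosScheduler. The other is the Spark executor, whose class name in Spark is Executor. With these two classes Spark can perform distributed computation through Mesos. Spark converts RDDs and MapReduce functions into a standard Job and a series of Tasks and submits them to the Spark scheduler. The scheduler submits the Tasks to the Mesos Master, which assigns them to different Slaves. Finally the Spark Executor in each Slave executes its assigned Tasks one by one and returns the results, which either form a new RDD or are written directly to the distributed file system.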

3. Transformations & Actions
There are two kinds of computation on an RDD: transformations (the return value is still an RDD) and actions (the return value is not an RDD).
Transformations (such as map, filter, groupBy, join, etc.) are lazy. An operation that turns one RDD into another is not executed immediately. When Spark encounters a Transformation it only records that the operation is needed and does not run it. The computation actually starts only when an Action is invoked.
Actions (such as count, collect, save, etc.) return a result or write the RDD's data to a storage system. Actions are what trigger Spark to start computing.
The essential difference is this. A Transformation returns an RDD. It uses a chained-call design: a computation on one RDD yields another RDD, which can in turn be transformed again, and the process is distributed. An Action does not return an RDD. It returns an ordinary Scala collection, a single value, or nothing, and in the end either hands a result back to the Driver program or writes the RDD to the file system. Both kinds of operation are described in more detail in the Spark development guide, and they are the core of developing on Spark. A slightly adapted figure from the official Spark slides illustrates the difference between the two.
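Lazy evaluation can be made concrete with a toy model in plain Python. This is only a sketch of the idea described above, not Spark's implementation. The class `LazyDataset` and the helper `square` are hypothetical names. Transformations just record a step and return a new object, and only an action runs the recorded steps.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source  # the original data (a plain list here)
        self._steps = steps    # recorded transformations, not yet executed

    # Transformations: return a new LazyDataset and do no work.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Actions: run the recorded steps and return a plain Python value.
    def collect(self):
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)

    def count(self):
        return len(self.collect())


calls = []


def square(x):
    calls.append(x)  # side effect that reveals when work actually happens
    return x * x


squares = LazyDataset([1, 2, 3, 4]).map(square).filter(lambda v: v > 4)
calls_before_action = len(calls)
print(calls_before_action)  # 0: the transformations have only been recorded
print(squares.collect())    # [9, 16]
```

Each transformation returns a new object instead of modifying the old one, which mirrors the immutability and chained-call style described in the text.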

4. Lineage
Using memory to speed up data loading is also found in many other in-memory databases and cache systems. What sets Spark apart is the approach it takes to data fault tolerance (node failure or data loss) in a distributed computing environment. To keep the data in an RDD robust, an RDD remembers through its lineage how it was derived from other RDDs. Other systems rely on backups or log mechanisms at the level of fine-grained in-memory updates, whereas an RDD's lineage records coarse-grained data transformations (filter, map, join, etc.). When some partitions of an RDD are lost, the lineage provides enough information to recompute and recover the lost partitions. This coarse-grained data model limits where Spark can be applied, but compared with a fine-grained model it also brings a performance gain.
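Lineage-based recovery can be sketched with a toy partitioned dataset in plain Python. This is an illustration under simplifying assumptions, not Spark code. The class `ToyRDD` and its method names are hypothetical, each derived partition depends on exactly one parent partition, and the base data is assumed to live in reliable storage. Each derived dataset remembers its parent and the function applied to it, so a lost partition is rebuilt by replaying that function on the matching parent partition only.

```python
class ToyRDD:
    """Toy partitioned dataset that remembers its parent and transformation."""

    def __init__(self, partitions=None, parent=None, fn=None):
        self.parent = parent  # lineage: the dataset this one was derived from
        self.fn = fn          # lineage: per-partition function applied to the parent
        if partitions is not None:
            self.cache = dict(enumerate(partitions))
            self.num_partitions = len(partitions)
        else:
            self.cache = {}
            self.num_partitions = parent.num_partitions

    def map_partitions(self, fn):
        return ToyRDD(parent=self, fn=fn)

    def get_partition(self, i):
        if i not in self.cache:
            # Lost or never computed: rebuild only this partition from the parent.
            self.cache[i] = self.fn(self.parent.get_partition(i))
        return self.cache[i]


base = ToyRDD(partitions=[[1, 2], [3, 4], [5, 6]])
doubled = base.map_partitions(lambda part: [x * 2 for x in part])
multiples_of_four = doubled.map_partitions(lambda part: [x for x in part if x % 4 == 0])

n = multiples_of_four.num_partitions
before = [multiples_of_four.get_partition(i) for i in range(n)]

del multiples_of_four.cache[1]  # simulate losing one partition on a failed node
del doubled.cache[1]            # its parent partition is lost as well

after = [multiples_of_four.get_partition(i) for i in range(n)]
print(before == after)  # True
```

Only partition 1 is recomputed, and partitions 0 and 2 are served from the cache. No copy of the data itself was ever logged, only the two functions that make up the lineage.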
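For lineage, RDD dependencies are divided into two kinds, Narrow Dependencies and Wide Dependencies, which makes fault recovery efficient.
A Narrow Dependency means that each partition of the parent RDD is used by at most one partition of the child RDD. Either one parent partition corresponds to one child partition, or several parent partitions correspond to one child partition. In other words, a single parent partition can never correspond to several child partitions.
A Wide Dependency means that a partition of the child RDD depends on several or all partitions of the parent RDD, so there is a parent partition that corresponds to several child partitions. With a Wide Dependency the input and output of the computation sit on different nodes. If the input nodes are intact and an output node goes down, recomputing through the lineage is an effective way to recover. Otherwise it is not, because the computation cannot simply be retried and the ancestors have to be traced upward to see whether they can be recomputed (this is what lineage means). The cost of recomputing data under Narrow Dependencies is far smaller than under Wide Dependencies.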
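The two definitions can be turned into a small check, shown here as a plain Python sketch. The function `classify_dependency` and the dictionary format it takes (child partition index mapped to the parent partition indices it reads) are assumptions made for illustration, not part of Spark.

```python
def classify_dependency(child_to_parents):
    """Classify a dependency from a map: child partition -> parent partitions it reads.

    Narrow: every parent partition is read by at most one child partition.
    Wide: some parent partition is read by several child partitions.
    """
    readers = {}
    for child, parents in child_to_parents.items():
        for parent in parents:
            readers.setdefault(parent, set()).add(child)
    is_narrow = all(len(children) <= 1 for children in readers.values())
    return "narrow" if is_narrow else "wide"


# map-like: child partition i reads only parent partition i
print(classify_dependency({0: [0], 1: [1], 2: [2]}))  # narrow
# several parent partitions feed one child partition
print(classify_dependency({0: [0, 1], 1: [2, 3]}))    # narrow
# shuffle-like: every child partition reads every parent partition
print(classify_dependency({0: [0, 1], 1: [0, 1]}))    # wide
```

The third case shows why recovery is expensive under a wide dependency: rebuilding a single child partition needs data from every parent partition.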
RDD computation achieves fault tolerance through checkpointing, which can be done in two ways: checkpointing the data and logging the updates. The user can control which one is used. The default is logging the updates: Spark tracks all the transformations that produced an RDD, that is, it records each RDD's lineage, and uses this record to recompute lost partition data.
Introduction to Spark's shuffle process
1. Shuffle Writer
Spark has a richer set of task types. Some tasks pass data to each other without a shuffle, but others still need a shuffle to transfer data, for example a group by key with a wide dependency.
In Spark, a Map task that produces shuffle output creates a bucket for each Reduce. Each result produced by the Map is assigned a bucketId by the configured partitioner and is then put into the corresponding bucket. The output of each Map may contain data needed by every Reduce, so each Map creates R buckets (R is the number of reduces), and M Maps create M*R buckets in total.
A bucket created by a Map actually corresponds to a file on disk, so writing Map results into a bucket means writing them into that disk file. This file is also called a blockFile. The Disk Block Manager creates it in a subdirectory of a local directory chosen by the hash of the file name. Each Map has to create R disk files on its node for its output. Map results are written directly to the disk files, and a 100KB memory buffer is used to create a Fast Buffered OutputStream. One problem with this approach is that there are too many shuffle files.

To address the excessive number of files, Spark has an improved shuffle process, consolidation shuffle, which aims to reduce the number of shuffle files significantly. In consolidation shuffle a bucket does not correspond to a whole file but to one segment of a file. When a map of a job runs on a node for the first time, it creates an output file for the bucket of each reduce and organizes these files into a ShuffleFileGroup. When that map finishes, the ShuffleFileGroup is released and can be reused. When another map later runs on the same node, it does not need to create new bucket files. Instead it takes the files already created in the earlier ShuffleFileGroup and appends a new segment to each. If the earlier map has not finished and its ShuffleFileGroup has not been released, a new map running on that node cannot reuse it and has to create new bucket files, forming a new ShuffleFileGroup for its output.

For example, take a job with 3 Maps and 2 reduces:
(1) If 3 nodes in the cluster have free slots, with one idle core each, the 3 Maps are scheduled onto these 3 nodes. Each Map creates 2 shuffle files, for a total of 6 shuffle files.
(2) If 2 nodes have free slots, with one idle core each, 2 Maps are first scheduled onto these 2 nodes and each creates 2 shuffle files. When one of the nodes finishes its Map, it is scheduled to run the remaining Map, which creates no new shuffle files and appends its output to the files created by the earlier Map. A total of 4 shuffle files is created.
(3) If 2 nodes have free slots, one with 2 idle cores and one with 1 idle core, one node is scheduled 2 Maps and the other 1 Map. On the node with 2 Maps, one Map creates shuffle files and the other Map still creates new ones, because the first Map is still writing and its ShuffleFileGroup has not been released. A total of 6 shuffle files is created.
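The three scenarios above can be checked with a small simulation in plain Python. This is a sketch of the reuse rule as described in the text, not Spark's code. The function `count_shuffle_files`, the event tuples, and the node and map names are hypothetical. A map that starts on a node reuses a released file group if one is available there and otherwise creates a new group with one file per reducer.

```python
def count_shuffle_files(events, num_reducers, consolidate=True):
    """Count shuffle files created for a sequence of map start/finish events.

    events: ("start", map_id, node) or ("finish", map_id), in time order.
    """
    free_groups = {}  # node -> number of released file groups ready for reuse
    held = {}         # map id -> node whose file group the map is writing to
    files = 0
    for event in events:
        if event[0] == "start":
            _, map_id, node = event
            if consolidate and free_groups.get(node, 0) > 0:
                free_groups[node] -= 1  # reuse: append segments to existing files
            else:
                files += num_reducers   # new group: one file per reducer
            held[map_id] = node
        else:
            _, map_id = event
            node = held.pop(map_id)
            free_groups[node] = free_groups.get(node, 0) + 1
    return files


# (1) three nodes, one idle core each
case1 = [("start", "m1", "A"), ("start", "m2", "B"), ("start", "m3", "C"),
         ("finish", "m1"), ("finish", "m2"), ("finish", "m3")]
# (2) two nodes, one idle core each; the third map runs after m1 finishes on A
case2 = [("start", "m1", "A"), ("start", "m2", "B"), ("finish", "m1"),
         ("start", "m3", "A"), ("finish", "m2"), ("finish", "m3")]
# (3) node A runs two maps at once, node B runs one
case3 = [("start", "m1", "A"), ("start", "m2", "A"), ("start", "m3", "B"),
         ("finish", "m1"), ("finish", "m2"), ("finish", "m3")]

for case in (case1, case2, case3):
    print(count_shuffle_files(case, num_reducers=2))
# prints 6, 4, 6
print(count_shuffle_files(case2, num_reducers=2, consolidate=False))  # 6, i.e. M*R
```

The saving therefore depends on how many maps run one after another on the same core, not on the total number of maps.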
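2. Shuffle Fetcher
A Reduce pulls the output data of the Maps. Spark provides two different frameworks for fetching the data: one fetches over socket connections, the other uses the netty framework.
The Executor on each node creates a BlockManager, which in turn creates a BlockManagerWorker to respond to requests. When a GET_BLOCK request arrives from a Reduce, it reads the local file and returns the data for the requested blockId to the Reduce. If the Netty framework is used, the BlockManager creates a ShuffleSender to send the shuffle data. Not all data is read over the network: for Map output on the same node, the Reduce reads it directly from disk without going through the network framework.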
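How is the data stored once the Reduce has pulled it over? The output of a Spark Map is not sorted, and the data fetched by the shuffle is not sorted either. Spark takes the view that sorting during the shuffle is not required: not every kind of Reduce needs sorted data, and forcing a sort only adds to the cost of the shuffle. The data pulled by a Reduce is placed in a HashMap that also stores <key, value> pairs. The key is the key output by the Map, and all the values that the Maps output for this key make up the HashMap value.
Spark inserts or updates each <key, value> pair fetched by the shuffle into the HashMap as it arrives, one at a time. The HashMap is held entirely in memory.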
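The insert-or-update step can be sketched in plain Python with an ordinary dictionary playing the role of the HashMap. The function `aggregate_fetched` and its `create` and `combine` parameters are hypothetical names chosen for this illustration. They are not taken from Spark's source.

```python
def aggregate_fetched(pairs, create, combine):
    """Fold fetched (key, value) pairs into an in-memory hash table, unsorted."""
    table = {}
    for key, value in pairs:
        if key in table:
            table[key] = combine(table[key], value)  # update the existing entry
        else:
            table[key] = create(value)               # first value seen for this key
    return table


# Hypothetical pairs in the order they happen to arrive from the map outputs.
fetched = [("b", 2), ("a", 1), ("b", 5), ("a", 3)]

# A reduce that sums values keeps one number per key.
sums = aggregate_fetched(fetched, create=lambda v: v, combine=lambda acc, v: acc + v)
# A group-by-key has to keep every value, so its entries grow with the data.
groups = aggregate_fetched(fetched, create=lambda v: [v], combine=lambda acc, v: acc + [v])

print(sums)    # {'b': 7, 'a': 4}
print(groups)  # {'b': [2, 5], 'a': [1, 3]}
```

Neither result required sorting the input. The contrast between `sums` and `groups` also shows why a group-by-key puts much more pressure on memory than an aggregation that combines values as they arrive.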
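All the data fetched by the shuffle is kept in memory. For shuffle data that is small or has already been combined on the Map side, it does not take up much memory. But for an operation such as group by key, the Reduce needs all the values for each key and holds them in memory as an array, so a large amount of data needs a lot of memory.
When memory runs out, the job either fails or falls back on the old approach of moving the in-memory data to disk. Recognizing the shortcomings of handling data far larger than memory, Spark introduced a scheme with external sorting. The shuffled data is first placed in memory. When the number of <key, value> pairs in memory exceeds 1000 and memory usage exceeds 70%, Spark checks the memory available on the node. If there is still enough, the in-memory buffer is doubled in size. If not, the <key, value> pairs in memory are sorted and written to a disk file. At the end, the data remaining in the memory buffer is sorted and combined with the disk files into a min-heap, and the smallest record is read from the heap each time. This resembles the merge phase in MapReduce.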
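The spill-and-merge idea can be sketched in plain Python. This is a simplified illustration, not Spark's implementation. The names `external_sorted` and `read_run` are hypothetical, and a fixed record limit `max_in_memory` stands in for the real thresholds (the pair count, the 70% memory check, and the buffer doubling are not modelled). Sorted runs are spilled to temporary files, and `heapq.merge` then plays the role of the min-heap over the runs and the final in-memory buffer.

```python
import heapq
import pickle
import tempfile


def read_run(file_obj):
    """Yield the records of one sorted run that was spilled to disk."""
    file_obj.seek(0)
    while True:
        try:
            yield pickle.load(file_obj)
        except EOFError:
            return


def external_sorted(pairs, max_in_memory):
    """Yield (key, value) pairs in sorted order, spilling full buffers to disk."""
    buffer, runs = [], []
    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= max_in_memory:
            spill = tempfile.TemporaryFile()
            for record in sorted(buffer):  # each spilled run is sorted
                pickle.dump(record, spill)
            runs.append(spill)
            buffer = []
    # heapq.merge keeps a min-heap holding the current head of every sorted run.
    streams = [read_run(f) for f in runs] + [iter(sorted(buffer))]
    yield from heapq.merge(*streams)
    for f in runs:
        f.close()


pairs = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]
result = list(external_sorted(pairs, max_in_memory=2))
print(result)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
```

With `max_in_memory=2` the five pairs produce two spilled runs plus one record left in memory, and the merge reads from all three at once without ever holding the whole dataset in a list.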
3. Comparison of the MapReduce and Spark shuffle processes

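Resource management and job scheduling in Spark
For resource management and job scheduling, Spark can use local mode, Standalone mode, Apache Mesos, or Hadoop YARN. Spark on YARN was introduced in Spark 0.6 but only became really usable in the current branch-0.8. Spark on YARN is implemented according to YARN's official specification. Because Spark was designed from the start to support multiple Schedulers and Executors, supporting YARN was easy. (Figure: rough architecture of Spark on YARN.)

Running Spark on YARN so that it shares cluster resources with Hadoop can improve resource utilization.
Programming interface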
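Spark exposes RDD operations through integration with a programming language, similar to DryadLINQ and FlumeJava. Each dataset is represented as an RDD object, and operations on the dataset are expressed as operations on that object. Spark's main programming language is Scala, chosen for its conciseness (Scala is convenient to use interactively) and its performance (it is a statically and strongly typed language on the JVM).
Like Hadoop MapReduce, Spark consists of a Master (similar to MapReduce's JobTracker) and Workers (Spark's slave nodes). A Spark program written by the user is called the Driver program. The Driver program connects to the master and defines the transformations and actions on each RDD. These transformations and actions are expressed as Scala closures (function literals). Scala represents closures as Java objects, all of which are serializable, and this is how the closures that operate on RDDs are sent to the Worker nodes.
Workers store the data blocks and share the cluster's memory. They are daemon processes running on the worker nodes. When a Worker receives an operation on an RDD, it performs the operation on local data according to the data split information, and then produces new data splits, returns a result, or writes the RDD to a storage system.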

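Scala: Spark is developed in Scala and uses Scala as its default programming language. Writing a Spark program is much simpler than writing a Hadoop MapReduce program. Spark provides Spark-Shell, in which programs can be tested. The usual steps for writing a Spark program are to create or obtain a SparkContext instance, use the SparkContext to create an RDD, and then operate on the RDD.
Java: Spark supports programming in Java, although Java has no tool as convenient as Spark-Shell. Otherwise it is the same as programming in Scala. Both are JVM languages, Scala and Java can interoperate, and the Java programming interface is really a wrapper around the Scala one.
Python: Spark now also provides a Python programming interface. Spark uses py4j for interoperation between Python and Java, which makes it possible to write Spark programs in Python. Spark likewise provides pyspark, a Python shell for Spark, so that Spark programs can be written in Python interactively.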
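The Spark ecosystem
Shark (Hive on Spark): Shark essentially provides, on top of the Spark framework, the same HiveQL command interface as Hive. To stay as compatible with Hive as possible, Shark uses Hive's API for query parsing and logical plan generation, and only in the final physical plan execution stage does Spark replace Hadoop MapReduce. Through configuration parameters, Shark can automatically cache specific RDDs in memory for reuse, which speeds up retrieval of specific datasets. Shark also implements specific data analysis and learning algorithms through user-defined functions (UDFs), so that SQL queries and analytical computation can be combined and RDD reuse is maximized.
Spark Streaming: a framework built on Spark for processing stream data. The basic principle is to split the stream into small time slices (a few seconds) and process each small portion in a batch-like manner. Spark Streaming is built on Spark for two reasons. First, Spark's low-latency execution engine (100ms+) can be used for real-time computation. Second, compared with record-based processing frameworks such as Storm, RDD datasets make efficient fault tolerance easier. In addition, the micro-batch approach makes it compatible with the logic and algorithms of both batch and real-time processing, which is convenient for applications that need to analyze historical and real-time data together.
Bagel: Pregel on Spark, which lets you do graph computation with Spark. It is a very useful small project. Bagel ships with an example that implements Google's PageRank algorithm.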
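The micro-batch principle described for Spark Streaming can be sketched in plain Python. This is a toy model, not the Spark Streaming API. The function `micro_batches`, the 2-second slice length, and the sample events are assumptions made for illustration. The stream is cut into fixed time slices, and each slice is then processed with ordinary batch logic, here a word count.

```python
from collections import defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp, value) events into consecutive fixed-length time slices."""
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_seconds)].append(value)
    return [batches[i] for i in sorted(batches)]  # slices with no events are skipped


# Hypothetical stream of (seconds since start, word) events.
events = [(0.5, "a"), (1.2, "b"), (2.1, "a"), (3.9, "c"), (4.0, "a")]

counts_per_batch = []
for batch in micro_batches(events, batch_seconds=2):
    counts = defaultdict(int)
    for word in batch:  # the same counting logic a batch job would use
        counts[word] += 1
    counts_per_batch.append(dict(counts))

print(counts_per_batch)  # [{'a': 1, 'b': 1}, {'a': 1, 'c': 1}, {'a': 1}]
```

Because every slice is just a small dataset, the per-slice code is identical to what a batch job over historical data would run, which is the compatibility the text refers to.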
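Where Spark is applicable
1. Spark is an in-memory iterative computing framework, suited to applications that operate on a particular dataset many times. The more often the data is reused and the more data has to be read, the greater the benefit. Where the data is small but the computation is intensive, the benefit is relatively small.
2. Because of the properties of RDDs, Spark is not suitable for applications that update state asynchronously and at fine granularity, such as the storage layer of a web service or an incremental web crawler and indexer. In other words, it does not fit application models based on incremental modification.
Overall, Spark is broadly applicable and fairly general-purpose.
Use in industry
The Spark project started in 2009 and was open-sourced in 2010. Current users include Berkeley, Princeton, Klout, Foursquare, Conviva, Quantifind, Yahoo! Research & others, and Taobao. Douban also uses Dpark, a Python clone of Spark.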