Spark: A Distributed Data Computing Project More Powerful than Hadoop
 
Source: 标点符 | Published: 2014-12-29
 

Spark is a fast distributed data analytics project developed at the University of California, Berkeley (UC Berkeley AMPLab). Its core technology is the Resilient Distributed Dataset (RDD), which provides a richer MapReduce-style model than Hadoop and can iterate over a dataset in memory many times quickly, supporting complex data-mining and graph-computation algorithms.

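Spark is written in Scala and uses Mesos as its underlying scheduling framework. It integrates tightly with Hadoop and EC2: it reads files directly from HDFS or S3, computes on them, and writes the results back to HDFS or S3, which makes it part of both the Hadoop and the Amazon cloud computing ecosystems. Spark is a small, compact project: the core of its code base consists of only 63 Scala files, a good illustration of the beauty of simplicity.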

What Spark Builds On

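1. The MapReduce model: As a distributed computing framework, Spark adopts the MapReduce model. Google's MapReduce and Hadoop have clearly left a heavy mark on it; it is an incremental rather than a radical innovation. Keeping the same basic ideas, it borrows from, imitates, and builds on its predecessors, adding a few improvements that greatly increase the efficiency of MapReduce.

2. Functional programming: Spark is written in Scala, and Scala is also the language it supports. One reason is that Scala supports functional programming. This keeps Spark's own code concise and also makes programs built on Spark especially concise. For a complete MapReduce job, Hadoop requires a Mapper class and a Reducer class, whereas Spark only needs the corresponding map and reduce functions, which greatly reduces the amount of code.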

3. Mesos: Spark hands everything related to distributed execution over to Mesos and does not deal with it itself, which is another reason its code can stay so small.

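4. HDFS and S3: Spark supports two distributed storage systems, HDFS and S3, arguably the two most mainstream ones today. Spark itself provides the functionality for reading from and writing to these file systems, and relies on Mesos to do so in a distributed way.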

Spark vs. Hadoop

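1. Spark keeps intermediate data in memory, which makes iterative computation more efficient. This is possible because Spark has the RDD abstraction, and it makes Spark better suited to machine learning (ML) and data mining (DM) workloads that involve many iterations.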

2. Spark is more general-purpose than Hadoop.

Spark provides many types of dataset operations, unlike Hadoop, which provides only Map and Reduce. Examples include map, filter, flatMap, sample, groupByKey, reduceByKey, union, join, cogroup, mapValues, sort, and partitionBy; Spark calls these operations transformations. It also provides actions such as count, collect, reduce, lookup, and save.

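This wide variety of dataset operations makes life easier for users building higher-level applications. Communication between processing nodes is no longer limited to Hadoop's single data-shuffle pattern: users can name and materialize intermediate results and control their storage and partitioning. In short, the programming model is more flexible than Hadoop's.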

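However, because of the nature of RDDs, Spark is not suitable for applications that make asynchronous, fine-grained updates to shared state, such as the storage layer of a web service or an incremental web crawler and indexer. In other words, it does not fit application models based on incremental modification.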

3. Fault tolerance. Fault tolerance in distributed dataset computation is achieved through checkpointing, which comes in two forms: checkpointing the data and logging the updates. Users can control which approach is used.

4. Usability. Spark improves usability by providing rich Scala, Java, and Python APIs as well as an interactive shell.

Combining Spark with Hadoop

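Spark can read and write data directly on HDFS, and it also supports running on YARN (Spark on YARN). Spark can run in the same cluster as MapReduce, sharing storage and compute resources. The data warehouse Shark borrows Hive's implementation and is almost fully compatible with Hive.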

Core Concepts of Spark

1. Resilient Distributed Dataset (RDD)

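The RDD is Spark's most fundamental abstraction. It abstracts distributed memory so that a distributed dataset can be manipulated the same way as a local collection. An RDD represents a partitioned, immutable collection of data that can be operated on in parallel; different dataset formats correspond to different RDD implementations. RDDs must be serializable. An RDD can be cached in memory: the result of each operation on an RDD can be kept in memory so that the next operation reads its input directly from memory, avoiding the heavy disk I/O of MapReduce. For machine learning algorithms, which are typically iterative, and for interactive data mining, this yields a large efficiency gain.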

Characteristics of RDDs:

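1. An RDD is an immutable, partitioned collection of objects spread across cluster nodes.

2. It is created through parallel transformations such as map, filter, and join.

3. It is rebuilt automatically on failure.

4. Its storage level (memory, disk, etc.) can be controlled so that it can be reused.

5. It must be serializable.

6. It is statically typed.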

Benefits of RDDs:

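1. RDDs can only be created from persistent storage or through transformations. Compared with distributed shared memory (DSM), this makes fault tolerance more efficient: a lost data partition can be recomputed from its lineage, without requiring dedicated checkpoints.

2. Because RDDs are immutable, Spark can perform speculative execution in the style of Hadoop MapReduce.

3. Because RDD data is partitioned, performance can be improved through data locality, just as in Hadoop MapReduce.

4. RDDs are serializable, so when memory runs short they can fall back to being stored on disk. Performance drops considerably in that case, but is no worse than today's MapReduce.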

Storage and partitioning of RDDs:

Users can choose different storage levels for RDDs so that they can be reused.

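By default, a cached RDD is stored in memory only (the MEMORY_ONLY level): partitions that do not fit are not cached and are recomputed when needed. With a level such as MEMORY_AND_DISK, partitions that do not fit in memory are spilled to disk instead.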

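When an RDD's data has to be partitioned across the cluster, it is partitioned by the key of each record (for example, with hash partitioning), so that a join between two datasets partitioned the same way can be performed efficiently.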
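To make the join argument concrete, here is a minimal pure-Python sketch (plain lists, not Spark's API) of hash partitioning. Both datasets are split with the same hypothetical `hash_partition` function, so equal keys always land at the same partition index, and the hypothetical `copartitioned_join` can join partition i of one side with partition i of the other without moving records between partitions.

```python
from collections import defaultdict

def hash_partition(records, num_partitions):
    """Split (key, value) pairs into num_partitions lists by hash(key)."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def copartitioned_join(left_parts, right_parts):
    """Inner-join two datasets that were split by the same partitioner.

    Partition i of the left side only has to be matched against partition i
    of the right side, because equal keys always hash to the same index.
    """
    result = []
    for left, right in zip(left_parts, right_parts):
        index = defaultdict(list)          # key -> values from the right side
        for key, value in right:
            index[key].append(value)
        for key, value in left:
            for other in index.get(key, []):
                result.append((key, (value, other)))
    return result

users = [("alice", 30), ("bob", 25), ("carol", 41)]
orders = [("alice", "book"), ("carol", "pen"), ("alice", "lamp")]

n = 4
joined = copartitioned_join(hash_partition(users, n), hash_partition(orders, n))
print(sorted(joined))
# [('alice', (30, 'book')), ('alice', (30, 'lamp')), ('carol', (41, 'pen'))]
```

Python randomizes string hashes per process, which is harmless here because both sides are partitioned in the same process; a real cluster needs a hash that is stable across machines.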

Internal representation of an RDD:

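A list of partitions (data blocks)

A function for computing each partition (deriving this RDD from its parent RDDs)

A list of dependencies on parent RDDs

A Partitioner for key-value RDDs (optional)

A list of preferred locations for each partition, such as the locations of HDFS blocks (optional)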
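The five components listed above can be modeled as a small Python class. This is an illustrative sketch under simplifying assumptions, not Spark's actual `RDD` class: `ToyRDD`, its attribute names, and the made-up host names are hypothetical, and only a one-to-one `map` is modeled.

```python
from typing import Callable, Iterable, List, Optional

class ToyRDD:
    """Illustrative model of the five pieces of information an RDD carries."""

    def __init__(self, num_partitions: int,
                 compute: Callable[[int], Iterable],
                 dependencies: List["ToyRDD"],
                 partitioner: Optional[Callable] = None,
                 preferred_locations: Optional[Callable[[int], List[str]]] = None):
        self.partitions = list(range(num_partitions))   # 1. list of partitions
        self.compute = compute                          # 2. computes one partition
        self.dependencies = dependencies                # 3. parent RDDs
        self.partitioner = partitioner                  # 4. optional, key-value RDDs
        # 5. optional preferred locations; default: no preference
        self.preferred_locations = preferred_locations or (lambda p: [])

    def map(self, f):
        # The child's partition p is derived from the parent's partition p.
        return ToyRDD(len(self.partitions),
                      lambda p: (f(x) for x in self.compute(p)),
                      [self])

    def collect(self):
        return [x for p in self.partitions for x in self.compute(p)]

# A source RDD whose partitions are in-memory blocks; host names are made up.
blocks = [[1, 2], [3, 4], [5]]
source = ToyRDD(len(blocks), lambda p: iter(blocks[p]), [],
                preferred_locations=lambda p: [f"host{p}"])
squared = source.map(lambda x: x * x)
print(squared.collect())                    # [1, 4, 9, 16, 25]
print(squared.dependencies[0] is source)    # True
```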

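Storage levels of an RDD: combinations of the four parameters useDisk, useMemory, deserialized, and replication yield 11 storage levels. RDD defines the various operations: different kinds of data are represented by different RDD classes, and different operations are likewise implemented as RDDs.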
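The sketch below lists the 11 predefined storage levels as combinations of the four parameters, with flag values as defined in Spark's `StorageLevel` object of this era (later releases added further options). The `StorageLevel` tuple here is a plain Python stand-in, not a Spark binding. The four flags allow more combinations than these 11; the rest are simply not predefined.

```python
from typing import NamedTuple

class StorageLevel(NamedTuple):
    use_disk: bool
    use_memory: bool
    deserialized: bool
    replication: int

# (use_disk, use_memory, deserialized, replication) for each predefined level
LEVELS = {
    "NONE":                  StorageLevel(False, False, False, 1),
    "DISK_ONLY":             StorageLevel(True,  False, False, 1),
    "DISK_ONLY_2":           StorageLevel(True,  False, False, 2),
    "MEMORY_ONLY":           StorageLevel(False, True,  True,  1),
    "MEMORY_ONLY_2":         StorageLevel(False, True,  True,  2),
    "MEMORY_ONLY_SER":       StorageLevel(False, True,  False, 1),
    "MEMORY_ONLY_SER_2":     StorageLevel(False, True,  False, 2),
    "MEMORY_AND_DISK":       StorageLevel(True,  True,  True,  1),
    "MEMORY_AND_DISK_2":     StorageLevel(True,  True,  True,  2),
    "MEMORY_AND_DISK_SER":   StorageLevel(True,  True,  False, 1),
    "MEMORY_AND_DISK_SER_2": StorageLevel(True,  True,  False, 2),
}

# Levels that keep data in memory but may also place it on disk
memory_and_disk = [name for name, lvl in LEVELS.items()
                   if lvl.use_disk and lvl.use_memory]
print(memory_and_disk)
# ['MEMORY_AND_DISK', 'MEMORY_AND_DISK_2', 'MEMORY_AND_DISK_SER', 'MEMORY_AND_DISK_SER_2']
```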

There are two ways to create an RDD:

From input in the Hadoop file system (or another Hadoop-compatible storage system), such as HDFS.

By transforming a parent RDD into a new RDD.

2. Spark on Mesos

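Spark supports two modes: local execution and a Mesos cluster. An algorithm developed on Spark can be debugged in local mode and then run directly on a Mesos cluster; apart from where files are saved, the algorithm in principle needs no changes. Spark's local mode is multithreaded and offers some single-machine concurrency, though not a great deal. In local mode, results can be saved either locally or to a distributed file system, whereas in Mesos mode they must be saved to a distributed or shared file system.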

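To run on the Mesos framework, Spark follows the Mesos specification and design and implements two classes: a SparkScheduler, whose class name in Spark is MesosScheduler, and a SparkExecutor, whose class name in Spark is Executor. With these two classes, Spark can compute in a distributed fashion through Mesos. Spark converts RDDs and MapReduce functions into a standard Job and a series of Tasks and submits them to the SparkScheduler. The SparkScheduler submits the Tasks to the Mesos Master, which assigns them to different Slaves. Finally, the Spark Executor on each Slave executes its assigned Tasks one by one and returns the results, which either form a new RDD or are written directly to the distributed file system.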

3. Transformations & Actions

Two kinds of computation can be applied to an RDD: transformations, whose return value is another RDD, and actions, whose return value is not an RDD.

Transformations (e.g. map, filter, groupBy, join) are lazy: an operation that derives one RDD from another is not executed immediately. When Spark encounters a transformation, it only records that the operation is needed; the computation actually starts only when an action is invoked.

Actions (e.g. count, collect, save) return a result or write RDD data to a storage system. Actions are what trigger Spark to start computing.

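The essential difference is that a transformation returns another RDD. Transformations follow a chained-call design: computing on one RDD yields another RDD, which can in turn be transformed again, and this process is distributed. An action does not return an RDD; it returns an ordinary Scala collection, a single value, or nothing, and the result is either returned to the driver program or written to a file system as RDD data. The Spark programming guide covers both kinds of operations in more detail; they are at the core of developing with Spark. A figure adapted from the official Spark slides illustrates the difference between the two.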
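The split between lazy transformations and eager actions can be illustrated with a toy Python class. `LazyDataset` is a hypothetical stand-in, not Spark's API: `map` and `filter` only append a step to a recorded pipeline and return a new object, while `collect` and `count` actually run it. The counter shows that no user function runs until an action is called.

```python
class LazyDataset:
    """Toy model: transformations are recorded, actions execute them."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)   # recorded (kind, function) pairs

    # Transformations: return a new object and compute nothing.
    def map(self, f):
        return LazyDataset(self._data, self._steps + (("map", f),))

    def filter(self, pred):
        return LazyDataset(self._data, self._steps + (("filter", pred),))

    # Actions: run the recorded pipeline element by element.
    def _run(self):
        for x in self._data:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):          # a "filter" step that rejects x
                    keep = False
                    break
            if keep:
                yield x

    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

calls = []

def square(x):
    calls.append(x)                      # record every real invocation
    return x * x

ds = LazyDataset(range(6)).map(square).filter(lambda y: y % 2 == 0)
print(len(calls))      # 0: the transformations did no work
print(ds.collect())    # [0, 4, 16]
print(len(calls))      # 6: the action triggered the computation
```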

4. Lineage

Using memory to speed up data loading is also done in many other in-memory databases and caching systems. What mainly sets Spark apart is how it handles fault tolerance (node failure or data loss) in a distributed computing environment. To make the data in an RDD robust, each RDD remembers how it was derived from other RDDs through its so-called lineage. Unlike other systems, which back up or log fine-grained updates to in-memory data, an RDD's lineage records coarse-grained transformations (filter, map, join, etc.). When some partitions of an RDD are lost, the lineage provides enough information to recompute and restore them. This coarse-grained data model limits where Spark can be applied, but compared with fine-grained models it also brings a performance gain.
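A minimal sketch of lineage-based recovery, assuming only element-wise `map` transformations and a source that stays available on stable storage. `LineageRDD` and its methods are hypothetical names. The RDD stores the list of transformations rather than a backup of the data, so a lost cached partition is rebuilt by replaying them on the corresponding source partition, and the other partitions are untouched.

```python
class LineageRDD:
    """Toy RDD that remembers how to rebuild its partitions."""

    def __init__(self, source_partitions, transforms=()):
        self.source = source_partitions      # stands in for data on stable storage
        self.transforms = tuple(transforms)  # lineage: coarse-grained map steps
        self.cache = {}                      # partition index -> computed data
        self.recomputations = 0

    def map(self, f):
        # A new RDD with one more step in its lineage; no data is copied.
        return LineageRDD(self.source, self.transforms + (f,))

    def get_partition(self, p):
        if p not in self.cache:              # missing or lost: replay the lineage
            data = self.source[p]
            for f in self.transforms:
                data = [f(x) for x in data]
            self.cache[p] = data
            self.recomputations += 1
        return self.cache[p]

    def lose_partition(self, p):
        self.cache.pop(p, None)              # simulate a node failure

rdd = LineageRDD([[1, 2], [3, 4], [5, 6]]).map(lambda x: x + 1).map(lambda x: x * 10)
before = [rdd.get_partition(p) for p in range(3)]
rdd.lose_partition(1)
print(rdd.get_partition(1) == before[1])   # True
print(rdd.recomputations)                  # 4: three initial builds plus one rebuild
```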

In terms of lineage, RDD dependencies come in two kinds, narrow dependencies and wide dependencies, a distinction that makes fault recovery efficient.

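A narrow dependency means that each partition of the parent RDD is used by at most one partition of the child RDD. This covers both one parent partition mapping to one child partition and several parent partitions mapping to one child partition; in other words, a single parent partition never feeds multiple child partitions.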

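A wide dependency means that a child partition depends on several or all partitions of the parent RDD; in other words, some parent partition feeds multiple child partitions. With a wide dependency, the computation's inputs and outputs are on different nodes. If the input nodes are intact and only an output node fails, lineage-based recomputation is effective. Otherwise a simple retry is not possible: Spark must trace back through the RDD's ancestors to find data from which it can recompute (this is what lineage means). Because rebuilding one lost partition under a wide dependency may require recomputing all of its parent partitions, recomputation under narrow dependencies costs far less than under wide dependencies.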
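The difference in recomputation cost can be seen by asking which parent partitions are needed to rebuild one lost child partition. In this sketch (hypothetical helper names, integer keys so that Python's `hash` is deterministic), a map-style narrow dependency needs exactly one parent partition, while a groupByKey-style wide dependency with hash partitioning needs every parent partition that holds a key routed to the lost partition, which here means all of them.

```python
def narrow_parents(child_partition):
    """map/filter-style dependency: child i is built from parent i alone."""
    return {child_partition}

def wide_parents(parent_partitions, child_partition, num_children):
    """groupByKey-style dependency with hash partitioning.

    Child partition i needs every parent partition holding at least one
    key that hashes to i.
    """
    return {
        i
        for i, part in enumerate(parent_partitions)
        if any(hash(key) % num_children == child_partition for key, _ in part)
    }

parents = [[(0, "a"), (1, "b")], [(2, "c"), (3, "d")], [(4, "e"), (5, "f")]]
print(narrow_parents(1))               # {1}
print(wide_parents(parents, 0, 2))     # {0, 1, 2}
```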

For RDD computation, fault tolerance is provided through checkpointing, which can be done in two ways: checkpointing the data or logging the updates. Users can control which approach is used. The default is logging the updates: Spark records all the transformations that produced each RDD, that is, each RDD's lineage, and uses it to recompute lost partitions.

The Spark Shuffle Process

1.Shuffle Writer

Spark offers a richer set of task types. Between some tasks, data can flow without a shuffle, but other tasks still need a shuffle to pass data, for example groupByKey, which has a wide dependency.

A map task in Spark that produces shuffle output creates a bucket for each reducer. Each record the map produces is assigned a bucketId by the configured partitioner and placed in the corresponding bucket. Since each map's output may contain data needed by any reducer, every map creates R buckets (where R is the number of reducers), so M maps create M*R buckets in total.
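The bucket assignment can be sketched in a few lines of Python, assuming a hash partitioner (`hash(key) % R`) and integer keys so that the output is deterministic. `map_task_buckets` is a hypothetical name; each map task fills R in-memory lists standing in for bucket files, and reducer r later reads bucket r from every map task.

```python
def map_task_buckets(records, num_reducers):
    """One map task's shuffle output: one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        bucket_id = hash(key) % num_reducers   # hash-partitioner-style choice
        buckets[bucket_id].append((key, value))
    return buckets

map_inputs = [
    [(1, "a"), (2, "b")],
    [(3, "c"), (4, "d")],
    [(5, "e")],
]
R = 2
M = len(map_inputs)
all_buckets = [map_task_buckets(part, R) for part in map_inputs]
print(sum(len(b) for b in all_buckets))   # 6, i.e. M * R

# Reducer r reads bucket r from every map task.
reducer_inputs = [[kv for buckets in all_buckets for kv in buckets[r]]
                  for r in range(R)]
print(reducer_inputs)
# [[(2, 'b'), (4, 'd')], [(1, 'a'), (3, 'c'), (5, 'e')]]
```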

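Each bucket a map creates actually corresponds to a file on disk; writing a map's results into a bucket means writing them into that file. This file, also called a blockFile, is created by the Disk Block Manager in a subdirectory of a local directory, chosen by hashing the file name. Each map thus creates R disk files on its node for its output. Map results are written directly to these files, with a 100 KB in-memory buffer used to create a fast buffered output stream. The drawback of this approach is that it produces too many shuffle files.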

To address the excessive number of files produced by this shuffle process, Spark offers an improved process, the consolidated shuffle, which aims to reduce the number of shuffle files significantly. In a consolidated shuffle, a bucket does not correspond to a whole file but to a segment within a file. The first time one of a job's map tasks runs on a node, it creates an output file for each reducer's bucket and organizes these files into a ShuffleFileGroup. When that map finishes, the ShuffleFileGroup is released for reuse. When another map later runs on the node, it does not create new bucket files; it takes the previously created files from the released ShuffleFileGroup and appends a new segment to each. If, however, the earlier map has not finished and its ShuffleFileGroup has not been released, a new map running on the node cannot reuse it and must create new bucket files that form a new ShuffleFileGroup for its output.

For example, suppose a job has 3 maps and 2 reducers:

(1) If 3 nodes in the cluster each have one idle core, the 3 maps are scheduled onto those 3 nodes. Each map creates 2 shuffle files, for a total of 6.

(2) If 2 nodes each have one idle core, 2 maps are scheduled onto them first, and each creates 2 shuffle files. When one node finishes its map, it is assigned the third map, which does not create new shuffle files but appends its output to the files created by the earlier map. In total, 4 shuffle files are created.

(3) If 2 nodes have free slots, one with 2 idle cores and the other with 1, one node runs 2 maps and the other runs 1. On the node running 2 maps, the second map still creates new shuffle files, because the first map is still writing and its ShuffleFileGroup has not been released. In total, 6 shuffle files are created.
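The three scenarios can be checked with a simplified model of consolidation. The function `consolidated_files` is hypothetical and assumes that the maps on a node run in waves of at most `cores` concurrent tasks, that every running map holds its own ShuffleFileGroup of R files, and that groups are released only when a wave finishes. Without consolidation, the count would always be M*R = 6.

```python
def consolidated_files(maps_per_node, cores_per_node, num_reducers):
    """Shuffle files created on all nodes under the simplified model."""
    total_groups = 0
    for maps, cores in zip(maps_per_node, cores_per_node):
        free_groups = 0          # released ShuffleFileGroups on this node
        remaining = maps
        while remaining > 0:
            wave = min(cores, remaining)       # maps running at the same time
            reused = min(free_groups, wave)    # take released groups first
            total_groups += wave - reused      # the rest need new groups
            # when the wave ends, every group it used is released
            free_groups = free_groups - reused + wave
            remaining -= wave
    return total_groups * num_reducers

R = 2
print(consolidated_files([1, 1, 1], [1, 1, 1], R))  # case (1): 6
print(consolidated_files([2, 1], [1, 1], R))        # case (2): 4
print(consolidated_files([2, 1], [2, 1], R))        # case (3): 6
```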

2.Shuffle Fetcher

Reducers pull the map output data. Spark provides two different frameworks for fetching it: fetching over plain socket connections, and fetching with the Netty framework.

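The Executor on each node creates a BlockManager, which in turn creates a BlockManagerWorker to respond to requests. When a reducer's GET_BLOCK request arrives, the worker reads the local file and returns the data for that blockId to the reducer. If the Netty framework is used, the BlockManager creates a ShuffleSender to send shuffle data. Not all data is read over the network: for map output on the same node, the reducer reads directly from disk without going through the network framework.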

How is the data stored once a reducer has pulled it? In Spark, map output is not sorted, and shuffled data is not sorted either. Spark considers sorting during the shuffle unnecessary, since not every kind of reduce needs sorted input, and forcing a sort would only add to the cost of the shuffle. The data a reducer pulls is placed in a HashMap that also stores <key, value> pairs: the key is the key output by the map, and the value is made up of all the values the maps output for that key.

Spark inserts or updates each <key, value> pair fetched by the shuffle into the HashMap, processing the pairs one at a time as they arrive. The entire HashMap is held in memory.

All of the fetched shuffle data is therefore stored in memory. For shuffle data that is small or has already been combined on the map side, this does not take much memory. But for an operation such as groupByKey, the reducer must collect all the values for each key into an array kept in memory, so large data volumes require a lot of memory.
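The memory difference between grouping and combining can be shown with plain dictionaries standing in for the reduce-side HashMap (not Spark's internal map classes). In the hypothetical `group_by_key`, every value is kept per key; in the hypothetical `reduce_by_key`, which mimics data that can be combined, a single running value per key is enough.

```python
def group_by_key(pairs):
    """groupByKey-style: keep every value for each key in memory."""
    table = {}
    for key, value in pairs:               # one fetched record at a time
        table.setdefault(key, []).append(value)
    return table

def reduce_by_key(pairs, func):
    """reduceByKey-style: keep a single running value per key."""
    table = {}
    for key, value in pairs:
        table[key] = func(table[key], value) if key in table else value
    return table

fetched = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
grouped = group_by_key(fetched)
reduced = reduce_by_key(fetched, lambda x, y: x + y)
print(grouped)   # {'a': [1, 3, 4], 'b': [2, 5]}
print(reduced)   # {'a': 8, 'b': 7}
```

The grouped table holds all five values, while the reduced table holds only one value per key; the gap grows with the number of values per key.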

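When memory runs out, the job either fails or falls back on the old approach of moving in-memory data to disk. Recognizing the limitations that arise when the data is far larger than the available memory, Spark introduced a scheme based on external sorting. Shuffled data is first kept in memory. When the in-memory map holds more than 1000 <key, value> pairs and memory usage exceeds 70%, Spark checks whether the node still has enough available memory: if so, it doubles the size of the in-memory buffer; if not, it sorts the in-memory <key, value> pairs and writes them to a disk file. Finally, the data remaining in the in-memory buffer is sorted and merged with those disk files through a min-heap, from which the smallest element is read each time, similar to the merge phase in MapReduce.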
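A simplified sketch of the spill-and-merge idea follows. The spill rule is an assumption: instead of the 1000-pair and 70% memory checks, the hypothetical `max_in_memory` parameter triggers a spill once the in-memory map holds that many keys, and buffer doubling is omitted. Each spilled run is an in-memory sorted list standing in for a disk file, and `heapq.merge` performs the min-heap k-way merge.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def external_group_by_key(pairs, max_in_memory):
    """Group values by key, spilling sorted runs when the map is 'full'."""
    spilled_runs = []          # each run stands in for a sorted file on disk
    current = {}
    for key, value in pairs:
        current.setdefault(key, []).append(value)
        if len(current) >= max_in_memory:          # simplified spill trigger
            spilled_runs.append(sorted(current.items()))
            current = {}
    runs = spilled_runs + [sorted(current.items())]
    # heapq.merge keeps a min-heap over the heads of all runs (k-way merge).
    merged = heapq.merge(*runs, key=itemgetter(0))
    result = []
    for key, group in groupby(merged, key=itemgetter(0)):
        values = []
        for _, vs in group:                        # combine partial lists per key
            values.extend(vs)
        result.append((key, values))
    return result, len(spilled_runs)

pairs = [("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5), ("d", 6)]
result, spills = external_group_by_key(pairs, max_in_memory=2)
print(spills)   # 3
print(result)   # [('a', [2, 4]), ('b', [1, 5]), ('c', [3]), ('d', [6])]
```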

3. Comparison of the MapReduce and Spark Shuffle Processes

Resource Management and Job Scheduling in Spark

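For resource management and job scheduling, Spark can use local mode, Standalone mode, Apache Mesos, or Hadoop YARN. Spark on YARN was introduced in Spark 0.6, but only became truly usable in the current branch-0.8. Spark on YARN is implemented according to YARN's official specification. Thanks to Spark's design, which natively supports multiple Schedulers and Executors, supporting YARN was straightforward.

(Figure: overall architecture of Spark on YARN)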

Running Spark on YARN, so that it shares cluster resources with Hadoop, can improve resource utilization.

Programming Interfaces

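Spark exposes RDD operations through integration with a programming language, similar to DryadLINQ and FlumeJava: each dataset is represented as an RDD object, and operations on datasets are expressed as operations on RDD objects. Spark's main programming language is Scala, chosen for its conciseness (Scala is convenient to use interactively) and its performance (it is a statically, strongly typed language on the JVM).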

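Like Hadoop MapReduce, Spark consists of a Master (similar to MapReduce's JobTracker) and Workers (Spark's slave nodes). A Spark program written by the user is called a driver program. The driver connects to the master and defines transformations and actions on RDDs. These are expressed as Scala closures (function literals); Scala represents closures as Java objects, which are serializable, so the closures operating on RDDs can be shipped to the worker nodes. Workers are daemon processes running on the worker nodes; they store data partitions and share the cluster's memory. When a worker receives an operation on an RDD, it performs the operation locally according to the partition information, producing new partitions, returning results, or writing the RDD to a storage system.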

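Scala: Spark is written in Scala and uses Scala as its default programming language. Writing a Spark program is much simpler than writing a Hadoop MapReduce program, and Spark provides Spark-Shell, in which programs can be tested. The general steps of a Spark program are to create or obtain a SparkContext instance, use the SparkContext to create RDDs, and then operate on those RDDs.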

Java: Spark supports Java programming, but Java lacks a convenient tool like Spark-Shell. Otherwise, programming in Java is the same as in Scala: both are JVM languages, Scala and Java interoperate, and the Java API is in fact a wrapper around the Scala API.

Python: Spark now also provides a Python programming interface. Spark uses Py4J for interoperation between Python and Java, which makes it possible to write Spark programs in Python. Spark also provides pyspark, a Python shell for Spark, for writing Spark programs interactively in Python.

The Spark Ecosystem

Shark (Hive on Spark): Shark essentially provides the same HiveQL command interface as Hive on top of the Spark framework. To stay as compatible with Hive as possible, Shark uses Hive's APIs for query parsing and logical plan generation, and replaces Hadoop MapReduce with Spark only in the final physical plan execution stage. By configuring Shark parameters, Shark can automatically cache specific RDDs in memory for reuse, speeding up retrieval of particular datasets. Shark also implements specific data analysis and learning algorithms through user-defined functions (UDFs), combining SQL queries with analytical computation and maximizing RDD reuse.

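Spark Streaming: a framework built on Spark for processing stream data. The basic idea is to split the stream into small time slices (a few seconds each) and process each small chunk in a batch-like way. Spark Streaming is built on Spark for two reasons: Spark's low-latency execution engine (100 ms and up) can be used for real-time computation, and, compared with record-based frameworks such as Storm, RDDs make efficient fault tolerance easier. In addition, the micro-batch approach lets batch and real-time processing share the same logic and algorithms, which is convenient for applications that need to analyze historical and real-time data together.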
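The micro-batching idea can be sketched by bucketing timestamped events into fixed intervals and running ordinary batch logic on each batch. This is a plain-Python illustration, not the Spark Streaming API; the one-second interval, the `(timestamp, item)` event format, and `micro_batches` are assumptions, and intervals with no events are simply skipped here.

```python
from collections import Counter

def micro_batches(events, batch_seconds):
    """Group (timestamp, item) events into consecutive fixed-length batches."""
    batches = {}
    for ts, item in events:
        start = int(ts // batch_seconds) * batch_seconds   # start of the interval
        batches.setdefault(start, []).append(item)
    return [(start, batches[start]) for start in sorted(batches)]

events = [(0.2, "x"), (0.9, "y"), (1.1, "x"), (2.5, "x"), (2.7, "z")]
batches = micro_batches(events, batch_seconds=1)
for start, items in batches:
    # each small batch is handled with ordinary batch logic, here a word count
    print(start, dict(Counter(items)))
# 0 {'x': 1, 'y': 1}
# 1 {'x': 1}
# 2 {'x': 1, 'z': 1}
```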

Bagel: Pregel on Spark, which lets you do graph computation with Spark. It is a very useful small project. Bagel ships with an example that implements Google's PageRank algorithm.
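For reference, the PageRank computation that the Bagel example performs can be sketched as a plain-Python iteration in the Pregel style, where every page sends rank/out-degree to its neighbours each round. This is not Bagel's API; the damping factor 0.85, the fixed 50 iterations, and the assumption that every page has at least one outgoing link are choices made for the sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank over an adjacency dict {page: [outgoing pages]}.

    Assumes every page has at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # "message passing": each page sends rank / out-degree to its targets
        contrib = {p: 0.0 for p in pages}
        for page, outs in links.items():
            share = ranks[page] / len(outs)
            for dest in outs:
                contrib[dest] += share
        ranks = {p: (1 - damping) / n + damping * contrib[p] for p in pages}
    return ranks

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))   # ['c', 'a', 'b']
```

Because every page passes on all of its rank, the ranks keep summing to 1 after each round.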

Use Cases for Spark

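1. Spark is an in-memory iterative computing framework, suited to applications that operate on a particular dataset many times. The more often the data is reused and the more data has to be read, the greater the benefit; for workloads with little data but high computational intensity, the benefit is relatively small.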

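2. Because of the nature of RDDs, Spark is not suitable for applications that make asynchronous, fine-grained updates to state, such as the storage layer of a web service or an incremental web crawler and indexer; that is, it does not fit application models based on incremental modification.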

×ܵÄÀ´ËµSparkµÄÊÊÓÃÃæ±È½Ï¹ã·ºÇұȽÏͨÓá£

Industry Adoption

The Spark project started in 2009 and was open-sourced in 2010. Current users include Berkeley, Princeton, Klout, Foursquare, Conviva, Quantifind, Yahoo! Research, and others, as well as Taobao; Douban uses Dpark, a Python clone of Spark.

   