»ù±¾¸ÅÄî
SparkÊÇÒ»¸ö·Ö²¼Ê½µÄÄÚ´æ¼ÆËã¿ò¼Ü£¬ÆäÌØµãÊÇÄÜ´¦Àí´ó¹æÄ£Êý¾Ý£¬¼ÆËãËٶȿ졣SparkÑÓÐøÁËHadoopµÄMapReduce¼ÆËãÄ£ÐÍ£¬Ïà±ÈÖ®ÏÂSparkµÄ¼ÆËã¹ý³Ì±£³ÖÔÚÄÚ´æÖУ¬¼õÉÙÁËÓ²Å̶Áд£¬Äܹ»½«¶à¸ö²Ù×÷½øÐкϲ¢ºó¼ÆË㣬Òò´ËÌáÉýÁ˼ÆËãËÙ¶È¡£Í¬Ê±SparkÒ²ÌṩÁ˸ü·á¸»µÄ¼ÆËãAPI¡£
MapReduceÊÇHadoopºÍSparkµÄ¼ÆËãÄ£ÐÍ£¬ÆäÌØµãÊÇMapºÍReduce¹ý³Ì¸ß¶È¿É²¢Ðл¯£»¹ý³Ì¼äñîºÏ¶ÈµÍ£¬µ¥¸ö¹ý³ÌµÄʧ°Üºó¿ÉÒÔÖØÐ¼ÆË㣬¶ø²»»áµ¼ÖÂÕûÌåʧ°Ü£»×îÖØÒªµÄÊÇÊý¾Ý´¦ÀíÖеļÆËãÂß¼¿ÉÒԺܺõÄת»»ÎªMapºÍReduce²Ù×÷¡£¶ÔÓÚÒ»¸öÊý¾Ý¼¯À´Ëµ£¬Map¶ÔÿÌõÊý¾Ý×öÏàͬµÄת»»²Ù×÷£¬Reduce¿ÉÒÔ°´Ìõ¼þ¶ÔÊý¾Ý·Ö×飬ȻºóÔÚ·Ö×éÉÏ×ö²Ù×÷¡£³ýÁËMapºÍReduce²Ù×÷Ö®Í⣬Spark»¹ÑÓÉì³öÁËÈçfilter£¬flatMap£¬count£¬distinctµÈ¸ü·á¸»µÄ²Ù×÷¡£
RDDµÄÊÇSparkÖÐ×îÖ÷ÒªµÄÊý¾Ý½á¹¹£¬¿ÉÒÔÖ±¹ÛµÄÈÏΪRDD¾ÍÊÇÒª´¦ÀíµÄÊý¾Ý¼¯¡£RDDÊÇ·Ö²¼Ê½µÄÊý¾Ý¼¯£¬Ã¿¸öRDD¶¼Ö§³ÖMapReduceÀà²Ù×÷£¬¾¹ýMapReduce²Ù×÷ºó»á²úÉúеÄRDD£¬¶ø²»»áÐÞ¸ÄÔÓÐRDD¡£RDDµÄÊý¾Ý¼¯ÊÇ·ÖÇøµÄ£¬Òò´Ë¿ÉÒÔ°Ñÿ¸öÊý¾Ý·ÖÇø·Åµ½²»Í¬µÄ·ÖÇøÉϽøÐмÆË㣬¶øÊµ¼ÊÉÏ´ó¶àÊýMapReduce²Ù×÷¶¼ÊÇÔÚ·ÖÇøÉϽøÐмÆËãµÄ¡£Spark²»»á°Ñÿһ¸öMapReduce²Ù×÷¶¼·¢ÆðÔËË㣬¶øÊǾ¡Á¿µÄ°Ñ²Ù×÷ÀÛ¼ÆÆðÀ´Ò»Æð¼ÆËã¡£Spark°Ñ²Ù×÷»®·ÖΪת»»£¨transformation£©ºÍ¶¯×÷£¨action£©£¬¶ÔRDD½øÐеÄת»»²Ù×÷»áµþ¼ÓÆðÀ´£¬Ö±µ½¶ÔRDD½øÐж¯×÷²Ù×÷ʱ²Å»á·¢Æð¼ÆËã¡£ÕâÖÖÌØÐÔҲʹSpark¿ÉÒÔ¼õÉÙÖмä½á¹ûµÄÍÌÍ£¬¿ÉÒÔ¿ìËٵĽøÐжà´Îµü´ú¼ÆËã¡£
ϵͳ½á¹¹
Spark×ÔÉíÖ»¶Ô¼ÆË㸺Ôð£¬Æä¼ÆËã×ÊÔ´µÄ¹ÜÀíºÍµ÷¶ÈÓɵÚÈý·½¿ò¼ÜÀ´ÊµÏÖ¡£³£ÓõĿò¼ÜÓÐYARNºÍMesos¡£±¾ÎÄÒÔYARNΪÀý½øÐнéÉÜ¡£ÏÈ¿´Ò»ÏÂSpark
on YARNµÄϵͳ½á¹¹Í¼£º

Spark on YARNϵͳ½á¹¹Í¼
ͼÖй²·ÖΪÈý´ó²¿·Ö£ºSpark Driver£¬ Worker£¬ Cluster manager¡£ÆäÖÐDriver
program¸ºÔð½«RDDת»»ÎªÈÎÎñ£¬²¢½øÐÐÈÎÎñµ÷¶È¡£Worker¸ºÔðÈÎÎñµÄÖ´ÐС£YARN¸ºÔð¼ÆËã×ÊÔ´µÄά»¤ºÍ·ÖÅä¡£
Driver¿ÉÒÔÔËÐÐÔÚÓû§³ÌÐòÖУ¬»òÕßÔËÐÐÔÚÆäÖÐÒ»¸öWorkerÉÏ¡£SparkÖеÄÿһ¸öÓ¦Óã¨Application£©¶ÔÓ¦×ÅÒ»¸öDriver¡£Õâ¸öDriver¿ÉÒÔ½ÓÊÕRDDÉϵļÆËãÇëÇó£¬Ã¿¸ö¶¯×÷£¨Action£©ÀàÐ͵IJÙ×÷½«±»×÷Ϊһ¸öJob½øÐмÆËã¡£Spark»á¸ù¾ÝRDDµÄÒÀÀµ¹ØÏµ¹¹½¨¼ÆËã½×¶Î£¨Stage£©µÄÓÐÏòÎÞ»·Í¼£¬Ã¿¸ö½×¶ÎÓÐÓë·ÖÇøÊýÏàͬµÄÈÎÎñ£¨Task£©¡£ÕâЩÈÎÎñ½«ÔÚÿ¸ö·ÖÇø£¨Partition£©ÉϽøÐмÆË㣬ÈÎÎñ»®·ÖÍê³ÉºóDriver½«ÈÎÎñÌá½»µ½ÔËÐÐÓÚWorkerÉϵÄExecutorÖнøÐмÆË㣬²¢¶ÔÈÎÎñµÄ³É¹¦¡¢Ê§°Ü½øÐмǼºÍÖØÆôµÈ´¦Àí¡£
WorkerÒ»°ã¶ÔӦһ̨ÎïÀí»ú£¬Ã¿¸öWorkerÉÏ¿ÉÒÔÔËÐжà¸öExecutor£¬Ã¿¸öExecutor¶¼ÊǶÀÁ¢µÄJVM½ø³Ì£¬DriverÌá½»µÄÈÎÎñ¾ÍÊÇÒÔÏ̵߳ÄÐÎʽÔËÐÐÔÚExecutorÖеġ£Èç¹ûʹÓÃYARN×÷Ϊ×ÊÔ´µ÷¶È¿ò¼ÜµÄ»°£¬ÆäÖÐÒ»¸öWorkerÉÏ»¹»áÓÐExecutor
launcher×÷ΪYARNµÄApplicationMaster£¬ÓÃÓÚÏòYARNÉêÇë¼ÆËã×ÊÔ´£¬²¢Æô¶¯¡¢¼à²â¡¢ÖØÆôExecutor¡£
¼ÆËã¹ý³Ì
ÕâÀïÎÒÃÇ´ÓRDDµ½Êä³ö½á¹ûµÄÕû¸ö¼ÆËã¹ý³ÌΪÖ÷Ïߣ¬Ì½¾¿SparkµÄ¼ÆËã¹ý³Ì¡£Õâ¸ö¼ÆËã¹ý³Ì¿ÉÒÔ·ÖΪ£º
RDD¹¹½¨£º¹¹½¨RDDÖ®¼äµÄÒÀÀµ¹ØÏµ£¬½«RDDת»»Îª½×¶ÎµÄÓÐÏòÎÞ»·Í¼¡£
ÈÎÎñµ÷¶È£º¸ù¾Ý¿ÕÏмÆËã×ÊÔ´Çé¿ö½øÐÐÈÎÎñÌá½»£¬²¢¶ÔÈÎÎñµÄÔËÐÐ״̬½øÐмà²âºÍ´¦Àí¡£
ÈÎÎñ¼ÆË㣺´î½¨ÈÎÎñÔËÐл·¾³£¬Ö´ÐÐÈÎÎñ²¢·µ»ØÈÎÎñ½á¹û¡£
Shuffle¹ý³Ì£ºÁ½¸ö½×¶ÎÖ®¼äÓпíÒÀÀµÊ±£¬ÐèÒª½øÐÐShuffle²Ù×÷¡£
¼ÆËã½á¹ûÊÕ¼¯£º´Óÿ¸öÈÎÎñÊÕ¼¯²¢»ã×ܽá¹û¡£
ÔÚÕâÀïÎÒÃÇÓÃÒ»¸ö¼ò½àµÄCharCount³ÌÐòΪÀý£¬Õâ¸ö³ÌÐò°Ñº¬ÓÐa-z×Ö·ûµÄÁбíת»¯ÎªRDD£¬¶Ô´ËRDD½øÐÐÁËMapºÍReduce²Ù×÷¼ÆËãÿ¸ö×ÖĸµÄƵÊý£¬×îºó½«½á¹ûÊÕ¼¯¡£Æä´úÂëÈçÏ£º

CharCountÀý×Ó³ÌÐò
RDD¹¹½¨ºÍת»»
RDD°´ÕÕÆä×÷ÓÿÉÒÔ·ÖΪÁ½ÖÖÀàÐÍ£¬Ò»ÖÖÊǶÔÊý¾ÝÔ´µÄ·â×°£¬¿ÉÒÔ°ÑÊý¾ÝԴת»»ÎªRDD£¬ÕâÖÖÀàÐ͵ÄRDD°üÀ¨NewHadoopRDD£¬ParallelCollectionRDD£¬JdbcRDDµÈ¡£ÁíÒ»ÖÖÊǶÔRDDµÄת»»£¬´Ó¶øÊµÏÖÒ»ÖÖ¼ÆËã·½·¨£¬ÕâÖÖÀàÐ͵ÄRDD°üÀ¨MappedRDD£¬ShuffledRDD£¬FilteredRDDµÈ¡£Êý¾ÝÔ´ÀàÐ͵ÄRDD²»ÒÀÀµÓÚÆäËûRDD£¬¼ÆËãÀàµÄRDDÓµÓÐ×Ô¼ºµÄRDDÒÀÀµ¡£
RDDÓÐÈý¸öÒªËØ£º·ÖÇø£¬ÒÀÀµ¹ØÏµ£¬¼ÆËãÂß¼¡£·ÖÇøÊDZ£Ö¤RDD·Ö²¼Ê½µÄÌØÐÔ£¬·ÖÇø¿ÉÒÔ¶ÔRDDµÄÊý¾Ý½øÐл®·Ö£¬»®·ÖºóµÄ·ÖÇø¿ÉÒÔ·Ö²¼µ½²»Í¬µÄExecutorÖУ¬´ó²¿·Ö¶ÔRDDµÄ¼ÆËã¶¼ÊÇÔÚ·ÖÇøÉϽøÐеġ£ÒÀÀµ¹ØÏµÎ¬»¤×ÅRDDµÄ¼ÆËã¹ý³Ì£¬Ã¿¸ö¼ÆËãÀàÐ͵ÄRDDÔÚ¼ÆËãʱ£¬»á½«ËùÒÀÀµµÄRDD×÷ΪÊý¾ÝÔ´½øÐмÆËã¡£¸ù¾ÝÒ»¸ö·ÖÇøµÄÊä³öÊÇ·ñ±»¶à·ÖÇøÊ¹Óã¬Spark»¹½«ÒÀÀµ·ÖΪÕÒÀÀµºÍ¿íÒÀÀµ¡£RDDµÄ¼ÆËãÂß¼ÊÇÆä¹¦ÄܵÄÌåÏÖ£¬Æä¼ÆËã¹ý³ÌÊÇÒÔËùÒÀÀµµÄRDDΪÊý¾ÝÔ´½øÐеġ£
Àý×ÓÖй²²úÉúÁËÈý¸öRDD£¬³ýÁ˵ÚÒ»¸öRDDÖ®Í⣬ÿ¸öRDDÓëÉϼ¶RDDÓÐÒÀÀµ¹ØÏµ¡£
1.spark.parallelize(data, partitionSize)·½·¨½«²úÉúÒ»¸öÊý¾ÝÔ´Ð͵ÄParallelCollectionRDD£¬Õâ¸öRDDµÄ·ÖÇøÊǶÔÁбíÊý¾ÝµÄÇз֣¬Ã»ÓÐÉϼ¶ÒÀÀµ£¬¼ÆËãÂß¼ÊÇÖ±½Ó·µ»Ø·ÖÇøÊý¾Ý¡£
2.mapº¯Êý½«»á´´½¨Ò»¸öMappedRDD£¬Æä·ÖÇøÓëÉϼ¶ÒÀÀµÏàͬ£¬»áÓÐÒ»¸öÒÀÀµÓÚParallelCollectionRDDµÄÕÒÀÀµ£¬¼ÆËãÂß¼ÊǶÔParallelCollectionRDDµÄÊý¾Ý×ömap²Ù×÷¡£
3.reduceByKeyº¯Êý½«»á²úÉúÒ»¸öShuffledRDD£¬·ÖÇøÊýÁ¿ÓëÉÏÃæµÄMappedRDDÏàͬ£¬»áÓÐÒ»¸öÒÀÀµÓÚMappedRDDµÄ¿íÒÀÀµ£¬¼ÆËãÂß¼ÊÇShuffleºóÔÚ·ÖÇøÉϵľۺϲÙ×÷¡£

RDDµÄÒÀÀµ¹ØÏµ
SparkÔÚÓöµ½¶¯×÷Àà²Ù×÷ʱ£¬¾Í»á·¢Æð¼ÆËãJob£¬°ÑRDDת»»ÎªÈÎÎñ£¬²¢·¢ËÍÈÎÎñµ½ExecutorÉÏÖ´ÐС£´ÓRDDµ½ÈÎÎñµÄת»»¹ý³ÌÊÇÔÚDAGSchedulerÖнøÐеġ£Æä×ÜÌå˼·ÊǸù¾ÝRDDµÄÒÀÀµ¹ØÏµ£¬°ÑÕÒÀÀµºÏ²¢µ½Ò»¸ö½×¶ÎÖУ¬Óöµ½¿íÒÀÀµÔò»®·Ö³öеĽ׶Σ¬×îÖÕÐγÉÒ»¸ö½×¶ÎµÄÓÐÏòÎÞ»·Í¼£¬²¢¸ù¾ÝͼµÄÒÀÀµ¹ØÏµÏȺóÌá½»½×¶Î¡£Ã¿¸ö½×¶Î°´ÕÕ·ÖÇøÊýÁ¿»®·ÖΪ¶à¸öÈÎÎñ£¬×îÖÕÈÎÎñ±»ÐòÁл¯²¢Ìá½»µ½ExecutorÉÏÖ´ÐС£

RDDµ½TaskµÄ¹¹½¨¹ý³Ì
µ±RDDµÄ¶¯×÷Àà²Ù×÷±»µ÷ÓÃʱ£¬RDD½«µ÷ÓÃSparkContext¿ªÊ¼Ìá½»Job£¬SparkContext½«µ÷ÓÃDAGScheduler°ÑRDDת»¯Îª½×¶ÎµÄÓÐÏòÎÞ»·Í¼£¬È»ºóÊ×ÏȽ«ÓÐÏòÎÞ»·Í¼ÖÐûÓÐδÍê³ÉµÄÒÀÀµµÄ½×¶Î½øÐÐÌá½»¡£Ôڽ׶α»Ìύʱ£¬Ã¿¸ö½×¶Î½«²úÉúÓë·ÖÇøÊýÁ¿ÏàͬµÄÈÎÎñ£¬ÕâЩÈÎÎñ³ÆÖ®ÎªÒ»¸öTaskSet¡£ÈÎÎñµÄÀàÐÍ·ÖΪ
ShuffleMapTaskºÍResultTask£¬Èç¹û½×¶ÎµÄÊä³ö½«ÓÃÓÚϸö½×¶ÎµÄÊäÈ룬Ҳ¾ÍÊÇÐèÒª½øÐÐShuffle²Ù×÷£¬ÔòÈÎÎñÀàÐÍΪShuffleMapTask¡£Èç¹û½×¶ÎµÄÊäÈ뼴ΪJob½á¹û£¬ÔòÈÎÎñÀàÐÍΪResultTask¡£ÈÎÎñ´´½¨Íê³Éºó»á½»¸øTaskSchedulerImpl½øÐÐTaskSet¼¶±ðµÄµ÷¶ÈÖ´ÐС£
ÈÎÎñµ÷¶È
ÔÚÈÎÎñµ÷¶ÈµÄ·Ö¹¤ÉÏ£¬DAGScheduler¸ºÔð×ÜÌåµÄÈÎÎñµ÷¶È£¬SchedulerBackend¸ºÔðÓëExecutorsͨÐÅ£¬Î¬»¤¼ÆËã×ÊÔ´ÐÅÏ¢£¬²¢¸ºÔð½«ÈÎÎñÐòÁл¯²¢Ìá½»µ½Executor¡£TaskSetManager¸ºÔð¶ÔÒ»¸ö½×¶ÎµÄÈÎÎñ½øÐйÜÀí£¬ÆäÖлá¸ù¾ÝÈÎÎñµÄÊý¾Ý±¾µØÐÔÑ¡ÔñÓÅÏÈÌá½»µÄÈÎÎñ¡£TaskSchedulerImpl¸ºÔð¶ÔTaskSet½øÐе÷¶È£¬Í¨¹ýµ÷¶È²ßÂÔÈ·¶¨TaskSetÓÅÏȼ¶¡£Í¬Ê±ÊÇÒ»¸öÖнéÕߣ¬Æä½«DAGScheduler£¬SchedulerBackendºÍTaskSetManagerÁª½áÆðÀ´£¬¶ÔExecutorºÍTaskµÄÏà¹ØÊ¼þ½øÐÐת·¢¡£
ÔÚÈÎÎñÌá½»Á÷³ÌÉÏ£¬DAGSchedulerÌá½»TaskSetµ½TaskSchedulerImpl£¬Ê¹TaskSetÔÚ´Ë×¢²á¡£TaskSchedulerImpl֪ͨSchedulerBackendÓÐеÄÈÎÎñ½øÈ룬SchedulerBackendµ÷ÓÃmakeOffers¸ù¾Ý×¢²áµ½×Ô¼ºµÄExecutorsÐÅÏ¢£¬È·¶¨ÊÇ·ñÓмÆËã×ÊÔ´Ö´ÐÐÈÎÎñ£¬ÈçÓÐ×ÊÔ´Ôò֪ͨTaskSchedulerImplÈ¥·ÖÅäÕâЩ×ÊÔ´¡£
TaskSchedulerImpl¸ù¾ÝTaskSetµ÷¶È²ßÂÔÓÅÏÈ·ÖÅäTaskSet½ÓÊÕ´Ë×ÊÔ´¡£TaskSetManagerÔÙ¸ù¾ÝÈÎÎñµÄÊý¾Ý±¾µØÐÔ£¬È·¶¨Ìá½»ÄÄЩÈÎÎñ¡£×îÖÕÈÎÎñµÄ±Õ°ü±»SchedulerBackendÐòÁл¯£¬²¢´«Ê䏸Executor½øÐÐÖ´ÐС£

SparkµÄÈÎÎñµ÷¶È
¸ù¾ÝÒÔÉϹý³Ì£¬SparkÖеÄÈÎÎñµ÷¶Èʵ¼ÊÉÏ·ÖÁËÈý¸ö²ã´Î¡£µÚÒ»²ã´ÎÊÇ»ùÓڽ׶εÄÓÐÏòÎÞ»·Í¼½øÐÐStageµÄµ÷¶È£¬µÚ¶þ²ã´ÎÊǸù¾Ýµ÷¶È²ßÂÔ£¨FIFO£¬FAIR£©½øÐÐTaskSetµ÷¶È£¬µÚÈý²ã´ÎÊǸù¾ÝÊý¾Ý±¾µØÐÔ£¨Process£¬Node£¬Rack£©ÔÚTaskSetÄÚ½øÐе÷¶È¡£
ÈÎÎñ¼ÆËã
ÈÎÎñµÄ¼ÆËã¹ý³ÌÊÇÔÚExecutorÉÏÍê³ÉµÄ£¬Executor¼àÌýÀ´×ÔSchedulerBackendµÄÖ¸Á½ÓÊÕµ½ÈÎÎñʱ»áÆô¶¯TaskRunnerÏ߳̽øÐÐÈÎÎñÖ´ÐС£ÔÚTaskRunnerÖÐÊ×ÏȽ«ÈÎÎñºÍÏà¹ØÐÅÏ¢·´ÐòÁл¯£¬È»ºó¸ù¾ÝÏà¹ØÐÅÏ¢»ñÈ¡ÈÎÎñËùÒÀÀµµÄJar°üºÍËùÐèÎļþ£¬Íê³É×¼±¸¹¤×÷ºóÖ´ÐÐÈÎÎñµÄrun·½·¨£¬Êµ¼ÊÉϾÍÊÇÖ´ÐÐShuffleMapTask»òResultTaskµÄrun·½·¨¡£ÈÎÎñÖ´ÐÐÍê±Ïºó½«½á¹û·¢Ë͸øDriver½øÐд¦Àí¡£
ÔÚTask.run·½·¨ÖпÉÒÔ¿´µ½ShuffleMapTaskºÍResultTaskÓÐ×Ų»Í¬µÄ¼ÆËãÂß¼¡£ShuffleMapTaskÊǽ«ËùÒÀÀµRDDµÄÊä³öдÈëµ½ShuffleWriterÖУ¬ÎªºóÃæµÄShuffle¹ý³Ì×ö×¼±¸¡£ResultTaskÊÇÔÚËùÒÀÀµRDDÉÏÓ¦ÓÃÒ»¸öº¯Êý£¬²¢·µ»Øº¯ÊýµÄ¼ÆËã½á¹û¡£ÔÚÕâÁ½¸öTaskÖÐÖ»ÄÜ¿´µ½Êý¾ÝµÄÊä³ö·½Ê½£¬¶ø¿´²»µ½Ó¦ÓеļÆËãÂß¼¡£Êµ¼ÊÉϼÆËã¹ý³ÌÊǰüº¬ÔÚRDDÖе쬵÷ÓÃRDD.
Iterator·½·¨»ñÈ¡RDDµÄÊý¾Ý½«´¥·¢Õâ¸öRDDµÄ¼ÆË㶯×÷£¨RDD. Iterator£©£¬ÓÉÓÚ´ËRDDµÄ¼ÆËã¹ý³ÌÖÐÒ²»áʹÓÃËùÒÀÀµRDDµÄÊý¾Ý¡£´Ó¶øRDDµÄ¼ÆËã¹ý³Ì½«µÝ¹éÏòÉÏÖ±µ½Ò»¸öÊý¾ÝÔ´ÀàÐ͵ÄRDD£¬ÔٵݹéÏòϼÆËãÿ¸öRDDµÄÖµ¡£ÐèҪעÒâµÄÊÇ£¬ÒÔÉϵļÆËã¹ý³Ì¶¼ÊÇÔÚ·ÖÇøÉϽøÐе쬶ø²»ÊÇÕû¸öÊý¾Ý¼¯£¬¼ÆËãÍê³ÉµÃµ½µÄÊÇ´Ë·ÖÇøÉϵĽá¹û£¬¶ø²»ÊÇ×îÖÕ½á¹û¡£
´ÓRDDµÄ¼ÆËã¹ý³Ì¿ÉÒÔ¿´³ö£¬RDDµÄ¼ÆËã¹ý³ÌÊǰüº¬ÔÚRDDµÄÒÀÀµ¹ØÏµÖеģ¬Ö»ÒªRDDÖ®¼äÊÇÁ¬ÐøÕÒÀÀµ£¬ÄÇô¶à¸ö¼ÆËã¹ý³Ì¾Í¿ÉÒÔÔÚͬһ¸öTaskÖнøÐмÆË㣬Öмä½á¹û¿ÉÒÔÁ¢¼´±»Ï¸ö²Ù×÷ʹÓ㬶øÎÞÐèÔÚ½ø³Ì¼ä¡¢½Úµã¼ä¡¢´ÅÅÌÉϽøÐн»»»¡£

RDD¼ÆËã¹ý³Ì
Shuffle¹ý³Ì
ShuffleÊÇÒ»¸ö¶ÔÊý¾Ý½øÐзÖ×é¾ÛºÏµÄ²Ù×÷¹ý³Ì£¬ÔÊý¾Ý½«°´ÕÕ¹æÔò½øÐзÖ×飬ȻºóʹÓÃÒ»¸ö¾ÛºÏº¯ÊýÓ¦ÓÃÓÚ·Ö×éÉÏ£¬´Ó¶ø²úÉúÐÂÊý¾Ý¡£Shuffle²Ù×÷µÄÄ¿µÄÊǰÑͬ×éÊý¾Ý·ÖÅäµ½Ïàͬ·ÖÇøÉÏ£¬´Ó¶øÄܹ»ÔÚ·ÖÇøÉϽøÐоۺϼÆË㡣ΪÁËÌá¸ßShuffleÐÔÄÜ£¬»¹¿ÉÒÔÏÈÔÚÔ·ÖÇø¶ÔÊý¾Ý½øÐоۺϣ¨mapSideCombine£©£¬È»ºóÔÙ·ÖÅ䲿·Ö¾ÛºÏµÄÊý¾Ýµ½Ð·ÖÇø£¬µÚÈý²½ÔÚзÖÇøÉÏÔٴνøÐоۺϡ£
ÔÚ»®·Ö½×¶Îʱ£¬Ö»ÓÐÓöµ½¿íÒÀÀµ²Å»á²úÉúн׶Σ¬²ÅÐèÒªShuffle²Ù×÷¡£¿íÒÀÀµÓëÕÒÀÀµÈ¡¾öÓÚÔ·ÖÇø±»Ð·ÖÇøµÄʹÓùØÏµ£¬Ö»ÒªÒ»¸öÔ·ÖÇø»á±»¶à¸öзÖÇøÊ¹Óã¬ÔòΪ¿íÒÀÀµ£¬ÐèÒªShuffle¡£·ñÔòΪÕÒÀÀµ£¬²»ÐèÒªShuffle¡£
ÒÔÉÏÒ²¾ÍÊÇ˵ֻÓн׶ÎÓë½×¶ÎÖ®¼äÐèÒªShuffle£¬×îºóÒ»¸ö½×¶Î»áÊä³ö½á¹û£¬Òò´Ë²»ÐèÒªShuffle¡£Àý×ÓÖеijÌÐò»á²úÉúÁ½¸ö½×¶Î£¬µÚÒ»¸öÎÒÃǼò³ÆMap½×¶Î£¬µÚ¶þ¸öÎÒÃǼò³ÆReduce½×¶Î¡£ShuffleÊÇͨ¹ýMap½×¶ÎµÄShuffleMapTaskÓëReduce½×¶ÎµÄShuffledRDDÅäºÏÍê³ÉµÄ¡£ÆäÖÐShuffleMapTask»á°ÑÈÎÎñµÄ¼ÆËã½á¹ûдÈëShuffleWriter£¬ShuffledRDD´ÓShuffleReaderÖжÁÈ¡Êý¾Ý£¬Shuffle¹ý³Ì»áÔÚдÈëºÍ¶ÁÈ¡¹ý³ÌÖÐÍê³É¡£ÒÔHashShuffleΪÀý£¬HashShuffleWriterÔÚдÈëÊý¾Ýʱ£¬»á¾ö¶¨ÊÇ·ñÔÚÔ·ÖÇø×ö¾ÛºÏ£¬È»ºó¸ù¾ÝÊý¾ÝµÄHashֵдÈëÏàӦзÖÇø¡£HashShuffleReaderÔÙ¸ù¾Ý·ÖÇøºÅÈ¡³öÏàÓ¦Êý¾Ý£¬È»ºó¶ÔÊý¾Ý½øÐоۺϡ£

SparkµÄShuffle¹ý³Ì
¼ÆËã½á¹ûÊÕ¼¯
ResultTaskÈÎÎñ¼ÆËãÍê³Éºó¿ÉÒԵõ½Ã¿¸ö·ÖÇøµÄ¼ÆËã½á¹û£¬´ËʱÐèÒªÔÚDriverÉ϶Խá¹û½øÐлã×Ü´Ó¶øµÃµ½×îÖÕ½á¹û¡£
RDDÔÚÖ´ÐÐcollect£¬countµÈ¶¯×÷ʱ£¬»á¸ø³öÁ½¸öº¯Êý£¬Ò»¸öº¯ÊýÔÚ·ÖÇøÉÏÖ´ÐУ¬Ò»¸öº¯ÊýÔÚ·ÖÇø½á¹û¼¯ÉÏÖ´ÐС£ÀýÈçcollect¶¯×÷ÔÚ·ÖÇøÉÏ£¨ExecutorÖУ©Ö´Ðн«Iteratorת»»ÎªArrayµÄº¯Êý£¬²¢½«´Ëº¯Êý½á¹û·µ»Øµ½Driver¡£Driver
´Ó¶à¸ö·ÖÇøÉϵõ½ArrayÀàÐ͵ķÖÇø½á¹û¼¯£¬È»ºóÔÚ½á¹û¼¯ÉÏ£¨DriverÖУ©Ö´Ðкϲ¢ArrayµÄ²Ù×÷£¬´Ó¶øµÃµ½×îÖÕ½á¹û¡£
×ܽá
Spark¶ÔÓÚRDDµÄÉè¼ÆÊÇÆä¾«ËèËùÔÚ¡£ÓÃRDD²Ù×÷Êý¾ÝµÄ¸Ð¾õ¾ÍÒ»¸ö×Ö£ºË¬£¡¡£Ïëµ½RDD±³ºóÊǼ¸¶ÖÖØµÄ´óÊý¾Ý¼¯£¬¶øÎÒÃÇËæÊÖµ÷ÓÃÏÂmap(),
reduce()¾Í¿ÉÒÔ°ÑËüת»»À´×ª»»È¥£¬Ò»ÖÖ°ëÁ½²¦Ç§½ïµÄ¸Ð¾õ¾Í»áÓÍÈ»¶øÉú¡£ÎÒÏëÊÇÒÔÏÂÌØÐÔ¸øÎÒÃÇ´øÀ´ÁËÕâЩ£º
RDD°Ñ²»Í¬À´Ô´£¬²»Í¬ÀàÐ͵ÄÊý¾Ý½øÐÐÁËͳһ£¬Ê¹ÎÒÃÇÃæ¶ÔRDDµÄʱºò¾Í»á²úÉúÒ»ÖÖÐÅÐÄ£¬¾Í»áÈÏΪÕâÊÇijÖÖÀàÐ͵ÄRDD£¬´Ó¶ø¿ÉÒÔ½øÐÐRDDµÄËùÓвÙ×÷¡£
¶ÔRDDµÄ²Ù×÷¿ÉÒÔµþ¼Óµ½Ò»Æð¼ÆË㣬ÎÒÃDz»±Øµ£ÐÄÖмä½á¹ûÍÌͶÔÐÔÄܵÄÓ°Ïì¡£
RDDÌṩÁ˸ü·á¸»µÄÊý¾Ý¼¯²Ù×÷º¯Êý£¬ÕâЩº¯Êý´ó¶¼ÊÇÔÚMapReduce»ù´¡ÉÏÀ©³äµÄ£¬Ê¹ÓÃÆðÀ´ºÜ·½±ã¡£
RDDΪÌṩÁËÒ»¸ö¼ò½àµÄ±à³Ì½çÃæ£¬±³ºó¸´Ôӵķֲ¼Ê½¼ÆËã¹ý³Ì¶Ô¿ª·¢ÕßÊÇ͸Ã÷µÄ¡£´Ó¶øÄܹ»ÈÃÎÒÃǰѹØ×¢µã¸ü¶àµÄ·ÅÔÚÒµÎñÉÏ¡£
|