MapReduceµÄShuffle¹ý³Ì½éÉÜ
ShuffleµÄ±¾ÒåÊÇÏ´ÅÆ¡¢»ìÏ´£¬°ÑÒ»×éÓÐÒ»¶¨¹æÔòµÄÊý¾Ý¾¡Á¿×ª»»³ÉÒ»×éÎÞ¹æÔòµÄÊý¾Ý£¬Ô½Ëæ»úÔ½ºÃ¡£MapReduceÖеÄShuffle¸üÏñÊÇÏ´ÅÆµÄÄæ¹ý³Ì£¬°ÑÒ»×éÎÞ¹æÔòµÄÊý¾Ý¾¡Á¿×ª»»³ÉÒ»×é¾ßÓÐÒ»¶¨¹æÔòµÄÊý¾Ý¡£
ΪʲôMapReduce¼ÆËãÄ£ÐÍÐèÒªShuffle¹ý³Ì£¿ÎÒÃǶ¼ÖªµÀMapReduce¼ÆËãÄ£ÐÍÒ»°ã°üÀ¨Á½¸öÖØÒªµÄ½×¶Î£ºMapÊÇÓ³É䣬¸ºÔðÊý¾ÝµÄ¹ýÂË·Ö·¢£»ReduceÊǹæÔ¼£¬¸ºÔðÊý¾ÝµÄ¼ÆËã¹é²¢¡£ReduceµÄÊý¾ÝÀ´Ô´ÓÚMap£¬MapµÄÊä³ö¼´ÊÇReduceµÄÊäÈ룬ReduceÐèҪͨ¹ýShuffleÀ´»ñÈ¡Êý¾Ý¡£
´ÓMapÊä³öµ½ReduceÊäÈëµÄÕû¸ö¹ý³Ì¿ÉÒÔ¹ãÒ嵨³ÆÎªShuffle¡£Shuffleºá¿çMap¶ËºÍReduce¶Ë£¬ÔÚMap¶Ë°üÀ¨Spill¹ý³Ì£¬ÔÚReduce¶Ë°üÀ¨copyºÍsort¹ý³Ì£¬ÈçͼËùʾ£º
Spill¹ý³Ì
Spill¹ý³Ì°üÀ¨Êä³ö¡¢ÅÅÐò¡¢Òçд¡¢ºÏ²¢µÈ²½Ö裬ÈçͼËùʾ£º

Collect
ÿ¸öMapÈÎÎñ²»¶ÏµØÒÔ<key, value>¶ÔµÄÐÎʽ°ÑÊý¾ÝÊä³öµ½ÔÚÄÚ´æÖй¹ÔìµÄÒ»¸ö»·ÐÎÊý¾Ý½á¹¹ÖС£Ê¹Óû·ÐÎÊý¾Ý½á¹¹ÊÇΪÁ˸üÓÐЧµØÊ¹ÓÃÄÚ´æ¿Õ¼ä£¬ÔÚÄÚ´æÖзÅÖþ¡¿ÉÄܶàµÄÊý¾Ý¡£
Õâ¸öÊý¾Ý½á¹¹Æäʵ¾ÍÊǸö×Ö½ÚÊý×飬½ÐKvbuffer£¬ÃûÈçÆäÒ壬µ«ÊÇÕâÀïÃæ²»¹â·ÅÖÃÁË<key,
value>Êý¾Ý£¬»¹·ÅÖÃÁËһЩË÷ÒýÊý¾Ý£¬¸ø·ÅÖÃË÷ÒýÊý¾ÝµÄÇøÓòÆðÁËÒ»¸öKvmetaµÄ±ðÃû£¬ÔÚKvbufferµÄÒ»¿éÇøÓòÉÏ´©ÁËÒ»¸öIntBuffer£¨×Ö½ÚÐò²ÉÓõÄÊÇÆ½Ì¨×ÔÉíµÄ×Ö½ÚÐò£©µÄÂí¼×¡£<key,
value>Êý¾ÝÇøÓòºÍË÷ÒýÊý¾ÝÇøÓòÔÚKvbufferÖÐÊÇÏàÁÚ²»ÖصþµÄÁ½¸öÇøÓò£¬ÓÃÒ»¸ö·Ö½çµãÀ´»®·ÖÁ½Õߣ¬·Ö½çµã²»ÊÇØ¨¹Å²»±äµÄ£¬¶øÊÇÿ´ÎSpillÖ®ºó¶¼»á¸üÐÂÒ»´Î¡£³õʼµÄ·Ö½çµãÊÇ0£¬<key,
value>Êý¾ÝµÄ´æ´¢·½ÏòÊÇÏòÉÏÔö³¤£¬Ë÷ÒýÊý¾ÝµÄ´æ´¢·½ÏòÊÇÏòÏÂÔö³¤£¬ÈçͼËùʾ£º

KvbufferµÄ´æ·ÅÖ¸ÕëbufindexÊÇÒ»Ö±ÃÆ×ÅÍ·µØÏòÉÏÔö³¤£¬±ÈÈçbufindex³õʼֵΪ0£¬Ò»¸öIntÐ͵ÄkeyдÍêÖ®ºó£¬bufindexÔö³¤Îª4£¬Ò»¸öIntÐ͵ÄvalueдÍêÖ®ºó£¬bufindexÔö³¤Îª8¡£
Ë÷ÒýÊǶÔ<key, value>ÔÚkvbufferÖеÄË÷Òý£¬ÊǸöËÄÔª×飬°üÀ¨£ºvalueµÄÆðʼλÖá¢keyµÄÆðʼλÖá¢partitionÖµ¡¢valueµÄ³¤¶È£¬Õ¼ÓÃËĸöInt³¤¶È£¬KvmetaµÄ´æ·ÅÖ¸ÕëKvindexÿ´Î¶¼ÊÇÏòÏÂÌøËĸö¡°¸ñ×Ó¡±£¬È»ºóÔÙÏòÉÏÒ»¸ö¸ñ×ÓÒ»¸ö¸ñ×ÓµØÌî³äËÄÔª×éµÄÊý¾Ý¡£±ÈÈçKvindex³õʼλÖÃÊÇ-4£¬µ±µÚÒ»¸ö<key,
value>дÍêÖ®ºó£¬(Kvindex+0)µÄλÖôæ·ÅvalueµÄÆðʼλÖá¢(Kvindex+1)µÄλÖôæ·ÅkeyµÄÆðʼλÖá¢(Kvindex+2)µÄλÖôæ·ÅpartitionµÄÖµ¡¢(Kvindex+3)µÄλÖôæ·ÅvalueµÄ³¤¶È£¬È»ºóKvindexÌøµ½-8λÖ㬵ȵڶþ¸ö<key,
value>ºÍË÷ÒýдÍêÖ®ºó£¬KvindexÌøµ½-32λÖá£
KvbufferµÄ´óСËäÈ»¿ÉÒÔͨ¹ý²ÎÊýÉèÖ㬵«ÊÇ×ܹ²¾ÍÄÇô´ó£¬<key, value>ºÍË÷Òý²»¶ÏµØÔö¼Ó£¬¼Ó׿Ó×Å£¬Kvbuffer×ÜÓв»¹»ÓõÄÄÇÌ죬ÄÇÔõô°ì£¿°ÑÊý¾Ý´ÓÄÚ´æË¢µ½´ÅÅÌÉÏÔÙ½Ó×ÅÍùÄÚ´æÐ´Êý¾Ý£¬°ÑKvbufferÖеÄÊý¾ÝË¢µ½´ÅÅÌÉϵĹý³Ì¾Í½ÐSpill£¬¶àôÃ÷Á˵Ľз¨£¬ÄÚ´æÖеÄÊý¾ÝÂúÁ˾Í×Ô¶¯µØspillµ½¾ßÓиü´ó¿Õ¼äµÄ´ÅÅÌ¡£
¹ØÓÚSpill´¥·¢µÄÌõ¼þ£¬Ò²¾ÍÊÇKvbufferÓõ½Ê²Ã´³Ì¶È¿ªÊ¼Spill£¬»¹ÊÇÒª½²¾¿Ò»Ïµġ£Èç¹û°ÑKvbufferÓõÃËÀËÀµÃ£¬Ò»µã·ì¶¼²»Ê£µÄʱºòÔÙ¿ªÊ¼Spill£¬ÄÇMapÈÎÎñ¾ÍÐèÒªµÈSpillÍê³ÉÌÚ³ö¿Õ¼äÖ®ºó²ÅÄܼÌÐøÐ´Êý¾Ý£»Èç¹ûKvbufferÖ»ÊÇÂúµ½Ò»¶¨³Ì¶È£¬±ÈÈç80%µÄʱºò¾Í¿ªÊ¼Spill£¬ÄÇÔÚSpillµÄͬʱ£¬MapÈÎÎñ»¹ÄܼÌÐøÐ´Êý¾Ý£¬Èç¹ûSpill¹»¿ì£¬Map¿ÉÄܶ¼²»ÐèҪΪ¿ÕÏÐ¿Õ¼ä¶ø·¢³î¡£Á½ÀûÏàºâÈ¡Æä´ó£¬Ò»°ãÑ¡ÔñºóÕß¡£
SpillÕâ¸öÖØÒªµÄ¹ý³ÌÊÇÓÉSpillÏ̳߳е££¬SpillÏ̴߳ÓMapÈÎÎñ½Óµ½¡°ÃüÁ֮ºó¾Í¿ªÊ¼Õýʽ¸É»î£¬¸ÉµÄ»î½ÐSortAndSpill£¬ÔÀ´²»½ö½öÊÇSpill£¬ÔÚSpill֮ǰ»¹ÓиöÆÄ¾ßÕùÒéÐÔµÄSort¡£
Sort
ÏȰÑKvbufferÖеÄÊý¾Ý°´ÕÕpartitionÖµºÍkeyÁ½¸ö¹Ø¼ü×ÖÉýÐòÅÅÐò£¬Òƶ¯µÄÖ»ÊÇË÷ÒýÊý¾Ý£¬ÅÅÐò½á¹ûÊÇKvmetaÖÐÊý¾Ý°´ÕÕpartitionΪµ¥Î»¾Û¼¯ÔÚÒ»Æð£¬Í¬Ò»partitionÄڵİ´ÕÕkeyÓÐÐò¡£
Spill
SpillÏß³ÌΪÕâ´ÎSpill¹ý³Ì´´½¨Ò»¸ö´ÅÅÌÎļþ£º´ÓËùÓеı¾µØÄ¿Â¼ÖÐÂÖѵ²éÕÒÄÜ´æ´¢Õâô´ó¿Õ¼äµÄĿ¼£¬ÕÒµ½Ö®ºóÔÚÆäÖд´½¨Ò»¸öÀàËÆÓÚ¡°spill12.out¡±µÄÎļþ¡£SpillÏ̸߳ù¾ÝÅŹýÐòµÄKvmeta°¤¸öpartitionµÄ°Ñ<key,
value>Êý¾Ý͵½Õâ¸öÎļþÖУ¬Ò»¸öpartition¶ÔÓ¦µÄÊý¾ÝÍÂÍêÖ®ºó˳ÐòµØÍÂϸöpartition£¬Ö±µ½°ÑËùÓеÄpartition±éÀúÍê¡£Ò»¸öpartitionÔÚÎļþÖжÔÓ¦µÄÊý¾ÝÒ²½Ð¶Î(segment)¡£
ËùÓеÄpartition¶ÔÓ¦µÄÊý¾Ý¶¼·ÅÔÚÕâ¸öÎļþÀËäÈ»ÊÇ˳Ðò´æ·ÅµÄ£¬µ«ÊÇÔõôֱ½ÓÖªµÀij¸öpartitionÔÚÕâ¸öÎļþÖдæ·ÅµÄÆðʼλÖÃÄØ£¿Ç¿´óµÄË÷ÒýÓÖ³ö³¡ÁË¡£ÓÐÒ»¸öÈýÔª×é¼Ç¼ij¸öpartition¶ÔÓ¦µÄÊý¾ÝÔÚÕâ¸öÎļþÖеÄË÷Òý£ºÆðʼλÖá¢ÔʼÊý¾Ý³¤¶È¡¢Ñ¹ËõÖ®ºóµÄÊý¾Ý³¤¶È£¬Ò»¸öpartition¶ÔÓ¦Ò»¸öÈýÔª×顣Ȼºó°ÑÕâЩË÷ÒýÐÅÏ¢´æ·ÅÔÚÄÚ´æÖУ¬Èç¹ûÄÚ´æÖзŲ»ÏÂÁË£¬ºóÐøµÄË÷ÒýÐÅÏ¢¾ÍÐèҪдµ½´ÅÅÌÎļþÖÐÁË£º´ÓËùÓеı¾µØÄ¿Â¼ÖÐÂÖѵ²éÕÒÄÜ´æ´¢Õâô´ó¿Õ¼äµÄĿ¼£¬ÕÒµ½Ö®ºóÔÚÆäÖд´½¨Ò»¸öÀàËÆÓÚ¡°spill12.out.index¡±µÄÎļþ£¬ÎļþÖв»¹â´æ´¢ÁËË÷ÒýÊý¾Ý£¬»¹´æ´¢ÁËcrc32µÄУÑéÊý¾Ý¡£(spill12.out.index²»Ò»¶¨ÔÚ´ÅÅÌÉÏ´´½¨£¬Èç¹ûÄڴ棨ĬÈÏ1M¿Õ¼ä£©ÖÐÄܷŵÃϾͷÅÔÚÄÚ´æÖУ¬¼´Ê¹ÔÚ´ÅÅÌÉÏ´´½¨ÁË£¬ºÍspill12.outÎļþÒ²²»Ò»¶¨ÔÚͬһ¸öĿ¼Ï¡£)
ÿһ´ÎSpill¹ý³Ì¾Í»á×îÉÙÉú³ÉÒ»¸öoutÎļþ£¬ÓÐʱ»¹»áÉú³ÉindexÎļþ£¬SpillµÄ´ÎÊýÒ²ÀÓÓ¡ÔÚÎļþÃûÖС£Ë÷ÒýÎļþºÍÊý¾ÝÎļþµÄ¶ÔÓ¦¹ØÏµÈçÏÂͼËùʾ£º

»°·ÖÁ½¶Ë£¬ÔÚSpillÏß³ÌÈç»ðÈçݱµÄ½øÐÐSortAndSpill¹¤×÷µÄͬʱ£¬MapÈÎÎñ²»»áÒò´Ë¶øÍ£Ðª£¬¶øÊÇÒ»ÎÞ¼ÈÍùµØ½øÐÐ×ÅÊý¾ÝÊä³ö¡£Map»¹ÊǰÑÊý¾Ýдµ½kvbufferÖУ¬ÄÇÎÊÌâ¾ÍÀ´ÁË£º<key,
value>Ö»¹Ë×ÅÃÆÍ·°´ÕÕbufindexÖ¸ÕëÏòÉÏÔö³¤£¬kvmetaÖ»¹Ë×Ű´ÕÕKvindexÏòÏÂÔö³¤£¬ÊDZ£³ÖÖ¸ÕëÆðʼλÖò»±ä¼ÌÐøÅÜÄØ£¬»¹ÊÇÁíıËü·£¿Èç¹û±£³ÖÖ¸ÕëÆðʼλÖò»±ä£¬ºÜ¿ìbufindexºÍKvindex¾ÍÅöÍ·ÁË£¬ÅöÍ·Ö®ºóÔÙÖØÐ¿ªÊ¼»òÕßÒÆ¶¯ÄÚ´æ¶¼±È½ÏÂé·³£¬²»¿ÉÈ¡¡£MapÈ¡kvbufferÖÐÊ£Óà¿Õ¼äµÄÖмäλÖã¬ÓÃÕâ¸öλÖÃÉèÖÃΪеķֽçµã£¬bufindexÖ¸ÕëÒÆ¶¯µ½Õâ¸ö·Ö½çµã£¬KvindexÒÆ¶¯µ½Õâ¸ö·Ö½çµãµÄ-16λÖã¬È»ºóÁ½Õ߾ͿÉÒÔºÍгµØ°´ÕÕ×Ô¼º¼È¶¨µÄ¹ì¼£·ÅÖÃÊý¾ÝÁË£¬µ±SpillÍê³É£¬¿Õ¼äÌÚ³öÖ®ºó£¬²»ÐèÒª×öÈκθ͝¼ÌÐøÇ°½ø¡£·Ö½çµãµÄת»»ÈçÏÂͼËùʾ£º

MapÈÎÎñ×ÜÒª°ÑÊä³öµÄÊý¾Ýдµ½´ÅÅÌÉÏ£¬¼´Ê¹Êä³öÊý¾ÝÁ¿ºÜСÔÚÄÚ´æÖÐÈ«²¿ÄÜ×°µÃÏ£¬ÔÚ×îºóÒ²»á°ÑÊý¾ÝË¢µ½´ÅÅÌÉÏ¡£
Merge

MapÈÎÎñÈç¹ûÊä³öÊý¾ÝÁ¿ºÜ´ó£¬¿ÉÄÜ»á½øÐкü¸´ÎSpill£¬outÎļþºÍIndexÎļþ»á²úÉúºÜ¶à£¬·Ö²¼ÔÚ²»Í¬µÄ´ÅÅÌÉÏ¡£×îºó°ÑÕâЩÎļþ½øÐкϲ¢µÄmerge¹ý³ÌÉÁÁÁµÇ³¡¡£
Merge¹ý³ÌÔõô֪µÀ²úÉúµÄSpillÎļþ¶¼ÔÚÄÄÁËÄØ£¿´ÓËùÓеı¾µØÄ¿Â¼ÉÏɨÃèµÃµ½²úÉúµÄSpillÎļþ£¬È»ºó°Ñ·¾¶´æ´¢ÔÚÒ»¸öÊý×éÀï¡£Merge¹ý³ÌÓÖÔõô֪µÀSpillµÄË÷ÒýÐÅÏ¢ÄØ£¿Ã»´í£¬Ò²ÊÇ´ÓËùÓеı¾µØÄ¿Â¼ÉÏɨÃèµÃµ½IndexÎļþ£¬È»ºó°ÑË÷ÒýÐÅÏ¢´æ´¢ÔÚÒ»¸öÁбíÀï¡£µ½ÕâÀÓÖÓöµ½ÁËÒ»¸öÖµµÃÄÉÃÆµÄµØ·½¡£ÔÚ֮ǰSpill¹ý³ÌÖеÄʱºòΪʲô²»Ö±½Ó°ÑÕâЩÐÅÏ¢´æ´¢ÔÚÄÚ´æÖÐÄØ£¬ºÎ±ØÓÖ¶àÁËÕⲽɨÃèµÄ²Ù×÷£¿ÌرðÊÇSpillµÄË÷ÒýÊý¾Ý£¬Ö®Ç°µ±Äڴ泬ÏÞÖ®ºó¾Í°ÑÊý¾Ýдµ½´ÅÅÌ£¬ÏÖÔÚÓÖÒª´Ó´ÅÅ̰ÑÕâЩÊý¾Ý¶Á³öÀ´£¬»¹ÊÇÐèҪװµ½¸ü¶àµÄÄÚ´æÖС£Ö®ËùÒÔ¶à´ËÒ»¾Ù£¬ÊÇÒòΪÕâʱkvbufferÕâ¸öÄÚ´æ´ó»§ÒѾ²»ÔÙʹÓÿÉÒÔ»ØÊÕ£¬ÓÐÄÚ´æ¿Õ¼äÀ´×°ÕâЩÊý¾ÝÁË¡££¨¶ÔÓÚÄÚ´æ¿Õ¼ä½Ï´óµÄÍÁºÀÀ´Ëµ£¬ÓÃÄÚ´æÀ´Ê¡È´ÕâÁ½¸öio²½Ö軹ÊÇÖµµÃ¿¼Âǵġ££©
È»ºóΪmerge¹ý³Ì´´½¨Ò»¸ö½Ðfile.outµÄÎļþºÍÒ»¸ö½Ðfile.out.IndexµÄÎļþÓÃÀ´´æ´¢×îÖÕµÄÊä³öºÍË÷Òý¡£
Ò»¸öpartitionÒ»¸öpartitionµÄ½øÐкϲ¢Êä³ö¡£¶ÔÓÚij¸öpartitionÀ´Ëµ£¬´ÓË÷ÒýÁбíÖвéѯÕâ¸öpartition¶ÔÓ¦µÄËùÓÐË÷ÒýÐÅÏ¢£¬Ã¿¸ö¶ÔÓ¦Ò»¸ö¶Î²åÈëµ½¶ÎÁбíÖС£Ò²¾ÍÊÇÕâ¸öpartition¶ÔÓ¦Ò»¸ö¶ÎÁÐ±í£¬¼Ç¼ËùÓеÄSpillÎļþÖжÔÓ¦µÄÕâ¸öpartitionÄǶÎÊý¾ÝµÄÎļþÃû¡¢ÆðʼλÖᢳ¤¶ÈµÈµÈ¡£
È»ºó¶ÔÕâ¸öpartition¶ÔÓ¦µÄËùÓеÄsegment½øÐкϲ¢£¬Ä¿±êÊǺϲ¢³ÉÒ»¸ösegment¡£µ±Õâ¸öpartition¶ÔÓ¦ºÜ¶à¸ösegmentʱ£¬»á·ÖÅúµØ½øÐкϲ¢£ºÏÈ´ÓsegmentÁбíÖаѵÚÒ»ÅúÈ¡³öÀ´£¬ÒÔkeyΪ¹Ø¼ü×Ö·ÅÖóÉ×îС¶Ñ£¬È»ºó´Ó×îС¶ÑÖÐÿ´ÎÈ¡³ö×îСµÄ<key,
value>Êä³öµ½Ò»¸öÁÙʱÎļþÖУ¬ÕâÑù¾Í°ÑÕâÒ»Åú¶ÎºÏ²¢³ÉÒ»¸öÁÙʱµÄ¶Î£¬°ÑËü¼Ó»Øµ½segmentÁбíÖУ»ÔÙ´ÓsegmentÁбíÖаѵڶþÅúÈ¡³öÀ´ºÏ²¢Êä³öµ½Ò»¸öÁÙʱsegment£¬°ÑÆä¼ÓÈëµ½ÁбíÖУ»ÕâÑùÍù¸´Ö´ÐУ¬Ö±µ½Ê£ÏµĶÎÊÇÒ»Åú£¬Êä³öµ½×îÖÕµÄÎļþÖС£
×îÖÕµÄË÷ÒýÊý¾ÝÈÔÈ»Êä³öµ½IndexÎļþÖС£
Map¶ËµÄShuffle¹ý³Ìµ½´Ë½áÊø¡£
Copy
ReduceÈÎÎñͨ¹ýHTTPÏò¸÷¸öMapÈÎÎñÍÏÈ¡ËüËùÐèÒªµÄÊý¾Ý¡£Ã¿¸ö½Úµã¶¼»áÆô¶¯Ò»¸ö³£×¤µÄHTTP server£¬ÆäÖÐÒ»Ïî·þÎñ¾ÍÊÇÏìÓ¦ReduceÍÏÈ¡MapÊý¾Ý¡£µ±ÓÐMapOutputµÄHTTPÇëÇó¹ýÀ´µÄʱºò£¬HTTP
server¾Í¶ÁÈ¡ÏàÓ¦µÄMapÊä³öÎļþÖжÔÓ¦Õâ¸öReduce²¿·ÖµÄÊý¾Ýͨ¹ýÍøÂçÁ÷Êä³ö¸øReduce¡£
ReduceÈÎÎñÍÏȡij¸öMap¶ÔÓ¦µÄÊý¾Ý£¬Èç¹ûÔÚÄÚ´æÖÐÄܷŵÃÏÂÕâ´ÎÊý¾ÝµÄ»°¾ÍÖ±½Ó°ÑÊý¾Ýдµ½ÄÚ´æÖС£ReduceÒªÏòÿ¸öMapÈ¥ÍÏÈ¡Êý¾Ý£¬ÔÚÄÚ´æÖÐÿ¸öMap¶ÔÓ¦Ò»¿éÊý¾Ý£¬µ±ÄÚ´æÖд洢µÄMapÊý¾ÝÕ¼Óÿռä´ïµ½Ò»¶¨³Ì¶ÈµÄʱºò£¬¿ªÊ¼Æô¶¯ÄÚ´æÖÐmerge£¬°ÑÄÚ´æÖеÄÊý¾ÝmergeÊä³öµ½´ÅÅÌÉÏÒ»¸öÎļþÖС£
Èç¹ûÔÚÄÚ´æÖв»ÄܷŵÃÏÂÕâ¸öMapµÄÊý¾ÝµÄ»°£¬Ö±½Ó°ÑMapÊý¾Ýдµ½´ÅÅÌÉÏ£¬ÔÚ±¾µØÄ¿Â¼´´½¨Ò»¸öÎļþ£¬´ÓHTTPÁ÷ÖжÁÈ¡Êý¾ÝÈ»ºóдµ½´ÅÅÌ£¬Ê¹ÓõĻº´æÇø´óСÊÇ64K¡£ÍÏÒ»¸öMapÊý¾Ý¹ýÀ´¾Í»á´´½¨Ò»¸öÎļþ£¬µ±ÎļþÊýÁ¿´ïµ½Ò»¶¨ãÐֵʱ£¬¿ªÊ¼Æô¶¯´ÅÅÌÎļþmerge£¬°ÑÕâЩÎļþºÏ²¢Êä³öµ½Ò»¸öÎļþ¡£
ÓÐЩMapµÄÊý¾Ý½ÏСÊÇ¿ÉÒÔ·ÅÔÚÄÚ´æÖеģ¬ÓÐЩMapµÄÊý¾Ý½Ï´óÐèÒª·ÅÔÚ´ÅÅÌÉÏ£¬ÕâÑù×îºóReduceÈÎÎñÍϹýÀ´µÄÊý¾ÝÓÐЩ·ÅÔÚÄÚ´æÖÐÁËÓÐЩ·ÅÔÚ´ÅÅÌÉÏ£¬×îºó»á¶ÔÕâЩÀ´Ò»¸öÈ«¾ÖºÏ²¢¡£
Merge Sort
ÕâÀïʹÓõÄMergeºÍMap¶ËʹÓõÄMerge¹ý³ÌÒ»Ñù¡£MapµÄÊä³öÊý¾ÝÒѾÊÇÓÐÐòµÄ£¬Merge½øÐÐÒ»´ÎºÏ²¢ÅÅÐò£¬ËùνReduce¶ËµÄsort¹ý³Ì¾ÍÊÇÕâ¸öºÏ²¢µÄ¹ý³Ì¡£Ò»°ãReduceÊÇÒ»±ßcopyÒ»±ßsort£¬¼´copyºÍsortÁ½¸ö½×¶ÎÊÇÖØµþ¶ø²»ÊÇÍêÈ«·Ö¿ªµÄ¡£
Reduce¶ËµÄShuffle¹ý³ÌÖÁ´Ë½áÊø¡£
SparkµÄShuffle¹ý³Ì½éÉÜ
Shuffle Writer
Spark·á¸»ÁËÈÎÎñÀàÐÍ£¬ÓÐЩÈÎÎñÖ®¼äÊý¾ÝÁ÷ת²»ÐèҪͨ¹ýShuffle£¬µ«ÊÇÓÐЩÈÎÎñÖ®¼ä»¹ÊÇÐèҪͨ¹ýShuffleÀ´´«µÝÊý¾Ý£¬±ÈÈçwide
dependencyµÄgroup by key¡£
SparkÖÐÐèÒªShuffleÊä³öµÄMapÈÎÎñ»áΪÿ¸öReduce´´½¨¶ÔÓ¦µÄbucket£¬Map²úÉúµÄ½á¹û»á¸ù¾ÝÉèÖõÄpartitionerµÃµ½¶ÔÓ¦µÄbucketId£¬È»ºóÌî³äµ½ÏàÓ¦µÄbucketÖÐÈ¥¡£Ã¿¸öMapµÄÊä³ö½á¹û¿ÉÄܰüº¬ËùÓеÄReduceËùÐèÒªµÄÊý¾Ý£¬ËùÒÔÿ¸öMap»á´´½¨R¸öbucket£¨RÊÇreduceµÄ¸öÊý£©£¬M¸öMap×ܹ²»á´´½¨M*R¸öbucket¡£
Map´´½¨µÄbucketÆäʵ¶ÔÓ¦´ÅÅÌÉϵÄÒ»¸öÎļþ£¬MapµÄ½á¹ûдµ½Ã¿¸öbucketÖÐÆäʵ¾ÍÊÇдµ½ÄǸö´ÅÅÌÎļþÖУ¬Õâ¸öÎļþÒ²±»³ÆÎªblockFile£¬ÊÇDisk
Block Manager¹ÜÀíÆ÷ͨ¹ýÎļþÃûµÄHashÖµ¶ÔÓ¦µ½±¾µØÄ¿Â¼µÄ×ÓĿ¼Öд´½¨µÄ¡£Ã¿¸öMapÒªÔÚ½ÚµãÉÏ´´½¨R¸ö´ÅÅÌÎļþÓÃÓÚ½á¹ûÊä³ö£¬MapµÄ½á¹ûÊÇÖ±½ÓÊä³öµ½´ÅÅÌÎļþÉϵģ¬100KBµÄÄڴ滺³åÊÇÓÃÀ´´´½¨Fast
Buffered OutputStreamÊä³öÁ÷¡£ÕâÖÖ·½Ê½Ò»¸öÎÊÌâ¾ÍÊÇShuffleÎļþ¹ý¶à¡£

Õë¶ÔÉÏÊöShuffle¹ý³Ì²úÉúµÄÎļþ¹ý¶àÎÊÌ⣬SparkÓÐÁíÍâÒ»ÖָĽøµÄShuffle¹ý³Ì£ºconsolidation
Shuffle£¬ÒÔÆÚÏÔÖø¼õÉÙShuffleÎļþµÄÊýÁ¿¡£ÔÚconsolidation ShuffleÖÐÿ¸öbucket²¢·Ç¶ÔÓ¦Ò»¸öÎļþ£¬¶øÊǶÔÓ¦ÎļþÖеÄÒ»¸ösegment²¿·Ö¡£JobµÄmapÔÚij¸ö½ÚµãÉϵÚÒ»´ÎÖ´ÐУ¬ÎªÃ¿¸öreduce´´½¨bucket¶ÔÓ¦µÄÊä³öÎļþ£¬°ÑÕâЩÎļþ×éÖ¯³ÉShuffleFileGroup£¬µ±Õâ´ÎmapÖ´ÐÐÍêÖ®ºó£¬Õâ¸öShuffleFileGroup¿ÉÒÔÊÍ·ÅΪÏ´ÎÑ»·ÀûÓ㻵±ÓÖÓÐmapÔÚÕâ¸ö½ÚµãÉÏÖ´ÐÐʱ£¬²»ÐèÒª´´½¨ÐµÄbucketÎļþ£¬¶øÊÇÔÚÉϴεÄShuffleFileGroupÖÐÈ¡µÃÒѾ´´½¨µÄÎļþ¼ÌÐø×·¼Óдһ¸ösegment£»µ±Ç°´Îmap»¹Ã»Ö´ÐÐÍ꣬ShuffleFileGroup»¹Ã»ÓÐÊÍ·Å£¬ÕâʱÈç¹ûÓÐеÄmapÔÚÕâ¸ö½ÚµãÉÏÖ´ÐУ¬ÎÞ·¨Ñ»·ÀûÓÃÕâ¸öShuffleFileGroup£¬¶øÊÇÖ»ÄÜ´´½¨ÐµÄbucketÎļþ×é³ÉеÄShuffleFileGroupÀ´Ð´Êä³ö¡£

±ÈÈçÒ»¸öJobÓÐ3¸öMapºÍ2¸öreduce£º(1) Èç¹û´Ëʱ¼¯ÈºÓÐ3¸ö½ÚµãÓпղۣ¬Ã¿¸ö½Úµã¿ÕÏÐÁËÒ»¸öcore£¬Ôò3¸öMap»áµ÷¶Èµ½Õâ3¸ö½ÚµãÉÏÖ´ÐУ¬Ã¿¸öMap¶¼»á´´½¨2¸öShuffleÎļþ£¬×ܹ²´´½¨6¸öShuffleÎļþ£»(2)
Èç¹û´Ëʱ¼¯ÈºÓÐ2¸ö½ÚµãÓпղۣ¬Ã¿¸ö½Úµã¿ÕÏÐÁËÒ»¸öcore£¬Ôò2¸öMapÏȵ÷¶Èµ½Õâ2¸ö½ÚµãÉÏÖ´ÐУ¬Ã¿¸öMap¶¼»á´´½¨2¸öShuffleÎļþ£¬È»ºóÆäÖÐÒ»¸ö½ÚµãÖ´ÐÐÍêMapÖ®ºóÓÖµ÷¶ÈÖ´ÐÐÁíÒ»¸öMap£¬ÔòÕâ¸öMap²»»á´´½¨ÐµÄShuffleÎļþ£¬¶øÊǰѽá¹ûÊä³ö×·¼Óµ½Ö®Ç°Map´´½¨µÄShuffleÎļþÖУ»×ܹ²´´½¨4¸öShuffleÎļþ£»(3)
Èç¹û´Ëʱ¼¯ÈºÓÐ2¸ö½ÚµãÓпղۣ¬Ò»¸ö½ÚµãÓÐ2¸ö¿ÕcoreÒ»¸ö½ÚµãÓÐ1¸ö¿Õcore£¬ÔòÒ»¸ö½Úµãµ÷¶È2¸öMapÒ»¸ö½Úµãµ÷¶È1¸öMap£¬µ÷¶È2¸öMapµÄ½ÚµãÉÏ£¬Ò»¸öMap´´½¨ÁËShuffleÎļþ£¬ºóÃæµÄMap»¹Êǻᴴ½¨ÐµÄShuffleÎļþ£¬ÒòΪÉÏÒ»¸öMap»¹ÕýÔÚд£¬Ëü´´½¨µÄShuffleFileGroup»¹Ã»ÓÐÊÍ·Å£»×ܹ²´´½¨6¸öShuffleÎļþ¡£
Shuffle Fetcher
ReduceÈ¥ÍÏMapµÄÊä³öÊý¾Ý£¬SparkÌṩÁËÁ½Ìײ»Í¬µÄÀÈ¡Êý¾Ý¿ò¼Ü£ºÍ¨¹ýsocketÁ¬½ÓȥȡÊý¾Ý£»Ê¹ÓÃnetty¿ò¼ÜȥȡÊý¾Ý¡£
ÿ¸ö½ÚµãµÄExecutor»á´´½¨Ò»¸öBlockManager£¬ÆäÖлᴴ½¨Ò»¸öBlockManagerWorkerÓÃÓÚÏìÓ¦ÇëÇó¡£µ±ReduceµÄGET_BLOCKµÄÇëÇó¹ýÀ´Ê±£¬¶ÁÈ¡±¾µØÎļþ½«Õâ¸öblockIdµÄÊý¾Ý·µ»Ø¸øReduce¡£Èç¹ûʹÓõÄÊÇNetty¿ò¼Ü£¬BlockManager»á´´½¨ShuffleSenderÓÃÓÚ·¢ËÍShuffleÊý¾Ý¡£
²¢²»ÊÇËùÓеÄÊý¾Ý¶¼ÊÇͨ¹ýÍøÂç¶ÁÈ¡£¬¶ÔÓÚÔÚ±¾½ÚµãµÄMapÊý¾Ý£¬ReduceÖ±½ÓÈ¥´ÅÅÌÉ϶ÁÈ¡¶ø²»ÔÙͨ¹ýÍøÂç¿ò¼Ü¡£
ReduceÍϹýÀ´Êý¾ÝÖ®ºóÒÔʲô·½Ê½´æ´¢ÄØ£¿Spark MapÊä³öµÄÊý¾ÝûÓо¹ýÅÅÐò£¬Spark Shuffle¹ýÀ´µÄÊý¾ÝÒ²²»»á½øÐÐÅÅÐò£¬SparkÈÏΪShuffle¹ý³ÌÖеÄÅÅÐò²»ÊDZØÐëµÄ£¬²¢²»ÊÇËùÓÐÀàÐ͵ÄReduceÐèÒªµÄÊý¾Ý¶¼ÐèÒªÅÅÐò£¬Ç¿ÖƵؽøÐÐÅÅÐòÖ»»áÔö¼ÓShuffleµÄ¸ºµ£¡£ReduceÍϹýÀ´µÄÊý¾Ý»á·ÅÔÚÒ»¸öHashMapÖУ¬HashMapÖд洢µÄÒ²ÊÇ<key,
value>¶Ô£¬keyÊÇMapÊä³öµÄkey£¬MapÊä³ö¶ÔÓ¦Õâ¸ökeyµÄËùÓÐvalue×é³ÉHashMapµÄvalue¡£Spark½«ShuffleÈ¡¹ýÀ´µÄÿһ¸ö<key,
value>¶Ô²åÈë»òÕ߸üе½HashMapÖУ¬À´Ò»¸ö´¦ÀíÒ»¸ö¡£HashMapÈ«²¿·ÅÔÚÄÚ´æÖС£
ShuffleÈ¡¹ýÀ´µÄÊý¾ÝÈ«²¿´æ·ÅÔÚÄÚ´æÖУ¬¶ÔÓÚÊý¾ÝÁ¿±È½ÏС»òÕßÒѾÔÚMap¶Ë×ö¹ýºÏ²¢´¦ÀíµÄShuffleÊý¾Ý£¬Õ¼ÓÃÄÚ´æ¿Õ¼ä²»»áÌ«´ó£¬µ«ÊǶÔÓÚ±ÈÈçgroup
by keyÕâÑùµÄ²Ù×÷£¬ReduceÐèÒªµÃµ½key¶ÔÓ¦µÄËùÓÐvalue£¬²¢½«ÕâЩvalue×éÒ»¸öÊý×é·ÅÔÚÄÚ´æÖУ¬ÕâÑùµ±Êý¾ÝÁ¿½Ï´óʱ£¬¾ÍÐèÒª½Ï¶àÄÚ´æ¡£
µ±ÄÚ´æ²»¹»Ê±£¬Òª²»¾Íʧ°Ü£¬Òª²»¾ÍÓÃÀϰ취°ÑÄÚ´æÖеÄÊý¾ÝÒÆµ½´ÅÅÌÉÏ·Å×Å¡£SparkÒâʶµ½ÔÚ´¦ÀíÊý¾Ý¹æÄ£Ô¶Ô¶´óÓÚÄÚ´æ¿Õ¼äʱËù´øÀ´µÄ²»×㣬ÒýÈëÁËÒ»¸ö¾ßÓÐÍⲿÅÅÐòµÄ·½°¸¡£Shuffle¹ýÀ´µÄÊý¾ÝÏÈ·ÅÔÚÄÚ´æÖУ¬µ±ÄÚ´æÖд洢µÄ<key,
value>¶Ô³¬¹ý1000²¢ÇÒÄÚ´æÊ¹Óó¬¹ý70%ʱ£¬ÅжϽڵãÉÏ¿ÉÓÃÄÚ´æÈç¹û»¹×ã¹»£¬Ôò°ÑÄڴ滺³åÇø´óС·±¶£¬Èç¹û¿ÉÓÃÄÚ´æ²»ÔÙ¹»ÁË£¬Ôò°ÑÄÚ´æÖеÄ<key,
value>¶ÔÅÅÐòÈ»ºóдµ½´ÅÅÌÎļþÖС£×îºó°ÑÄڴ滺³åÇøÖеÄÊý¾ÝÅÅÐòÖ®ºóºÍÄÇЩ´ÅÅÌÎļþ×é³ÉÒ»¸ö×îС¶Ñ£¬Ã¿´Î´Ó×îС¶ÑÖжÁÈ¡×îСµÄÊý¾Ý£¬Õâ¸öºÍMapReduceÖеÄmerge¹ý³ÌÀàËÆ¡£
MapReduceºÍSparkµÄShuffle¹ý³Ì¶Ô±È

MapReduce Spark
collect ÔÚÄÚ´æÖй¹ÔìÁËÒ»¿éÊý¾Ý½á¹¹ÓÃÓÚmapÊä³öµÄ»º³å ûÓÐÔÚÄÚ´æÖй¹ÔìÒ»¿éÊý¾Ý½á¹¹ÓÃÓÚmapÊä³öµÄ»º³å£¬¶øÊÇÖ±½Ó°ÑÊä³öдµ½´ÅÅÌÎļþ
sort mapÊä³öµÄÊý¾ÝÓÐÅÅÐò mapÊä³öµÄÊý¾ÝûÓÐÅÅÐò
merge ¶Ô´ÅÅÌÉϵĶà¸öspillÎļþ×îºó½øÐкϲ¢³ÉÒ»¸öÊä³öÎļþ ÔÚmap¶ËûÓÐmerge¹ý³Ì£¬ÔÚÊä³öʱֱ½ÓÊǶÔÓ¦Ò»¸öreduceµÄÊý¾Ýдµ½Ò»¸öÎļþÖУ¬ÕâЩÎļþͬʱ´æÔÚ²¢·¢Ð´£¬×îºó²»ÐèÒªºÏ²¢³ÉÒ»¸ö
copy¿ò¼Ü jetty netty»òÕßÖ±½ÓsocketÁ÷
¶ÔÓÚ±¾½ÚµãÉϵÄÎļþ ÈÔÈ»ÊÇͨ¹ýÍøÂç¿ò¼ÜÍÏÈ¡Êý¾Ý ²»Í¨¹ýÍøÂç¿ò¼Ü£¬¶ÔÓÚÔÚ±¾½ÚµãÉϵÄmapÊä³öÎļþ£¬²ÉÓñ¾µØ¶ÁÈ¡µÄ·½Ê½
copy¹ýÀ´µÄÊý¾Ý´æ·ÅλÖà ÏÈ·ÅÔÚÄڴ棬ÄÚ´æ·Å²»ÏÂʱдµ½´ÅÅÌ Ò»ÖÖ·½Ê½È«²¿·ÅÔÚÄڴ棻
ÁíÒ»ÖÖ·½Ê½ÏÈ·ÅÔÚÄÚ´æ
merge sort ×îºó»á¶Ô´ÅÅÌÎļþºÍÄÚ´æÖеÄÊý¾Ý½øÐкϲ¢ÅÅÐò ¶ÔÓÚ²ÉÓÃÁíÒ»ÖÖ·½Ê½Ê±Ò²»áÓкϲ¢ÅÅÐòµÄ¹ý³Ì
ShuffleºóÐøÓÅ»¯·½Ïò
ͨ¹ýÉÏÃæµÄ½éÉÜ£¬ÎÒÃÇÁ˽⵽£¬Shuffle¹ý³ÌµÄÖ÷Òª´æ´¢½éÖÊÊÇ´ÅÅÌ£¬¾¡Á¿µÄ¼õÉÙIOÊÇShuffleµÄÖ÷ÒªÓÅ»¯·½Ïò¡£ÎÒÃÇÄÔº£Öж¼ÓÐÄǸö¾µäµÄ´æ´¢½ð×ÖËþÌåϵ£¬Shuffle¹ý³ÌΪʲô°Ñ½á¹û¶¼·ÅÔÚ´ÅÅÌÉÏ£¬ÄÇÊÇÒòΪÏÖÔÚÄÚ´æÔÙ´óÒ²´ó²»¹ý´ÅÅÌ£¬ÄÚ´æ¾ÍÄÇô´ó£¬»¹Õâô¶àÕÅ×ì³Ô£¬µ±È»ÊÇ·ÖÅ䏸×îÐèÒªµÄÁË¡£Èç¹û¾ßÓС°ÍÁºÀ¡±ÄÚ´æ½Úµã£¬¼õÉÙShuffle
IOµÄ×îÓÐЧ·½Ê½ÎÞÒÉÊǾ¡Á¿°ÑÊý¾Ý·ÅÔÚÄÚ´æÖС£ÏÂÃæÁоÙһЩÏÖÔÚ¿´¿ÉÒÔÓÅ»¯µÄ·½Ã棬ÆÚ´ý¾¹ýÎÒÃDz»¶ÏµÄŬÁ¦£¬TDW¼ÆËãÒýÇæÔËÐеظüºÃ¡£
MapReduce ShuffleºóÐøÓÅ»¯·½Ïò
1.ѹËõ£º¶ÔÊý¾Ý½øÐÐѹËõ£¬¼õÉÙд¶ÁÊý¾ÝÁ¿£»
2.¼õÉÙ²»±ØÒªµÄÅÅÐò£º²¢²»ÊÇËùÓÐÀàÐ͵ÄReduceÐèÒªµÄÊý¾Ý¶¼ÊÇÐèÒªÅÅÐòµÄ£¬ÅÅÐòÕâ¸önbµÄ¹ý³ÌÈç¹û²»ÐèÒª×îºÃ»¹ÊDz»ÒªµÄºÃ£»
3.Äڴ滯£ºShuffleµÄÊý¾Ý²»·ÅÔÚ´ÅÅ̶øÊǾ¡Á¿·ÅÔÚÄÚ´æÖУ¬³ý·Ç±Æ²»µÃÒÑÍù´ÅÅÌÉÏ·Å£»µ±È»ÁËÈç¹ûÓÐÐÔÄܺÍÄÚ´æÏ൱µÄµÚÈý·½´æ´¢ÏµÍ³£¬ÄÇ·ÅÔÚµÚÈý·½´æ´¢ÏµÍ³ÉÏÒ²ÊǺܺõģ»Õâ¸öÊǸö´óÕУ»
4.ÍøÂç¿ò¼Ü£ºnettyµÄÐÔÄܾÝ˵ҪռÓÅÁË£»
5.±¾½ÚµãÉϵÄÊý¾Ý²»×ßÍøÂç¿ò¼Ü£º¶ÔÓÚ±¾½ÚµãÉϵÄMapÊä³ö£¬ReduceÖ±½ÓÈ¥¶Á°É£¬²»ÐèÒªÈÆµÀÍøÂç¿ò¼Ü¡£
Spark ShuffleºóÐøÓÅ»¯·½Ïò
Spark×÷ΪMapReduceµÄ½ø½×¼Ü¹¹£¬¶ÔÓÚShuffle¹ý³ÌÒѾÊÇÓÅ»¯Á˵ģ¬ÌرðÊǶÔÓÚÄÇЩ¾ßÓÐÕùÒéµÄ²½ÖèÒѾ×öÁËÓÅ»¯£¬µ«ÊÇSparkµÄShuffle¶ÔÓÚÎÒÃÇÀ´ËµÔÚһЩ·½Ã滹ÊÇÐèÒªÓÅ»¯µÄ¡£
1.ѹËõ£º¶ÔÊý¾Ý½øÐÐѹËõ£¬¼õÉÙд¶ÁÊý¾ÝÁ¿£»
2.Äڴ滯£ºSparkÀúÊ·°æ±¾ÖÐÊÇÓÐÕâÑùÉè¼ÆµÄ£ºMapдÊý¾ÝÏȰÑÊý¾ÝÈ«²¿Ð´µ½ÄÚ´æÖУ¬Ð´ÍêÖ®ºóÔÙ°ÑÊý¾ÝË¢µ½´ÅÅÌÉÏ£»¿¼ÂÇÄÚ´æÊǽôȱ×ÊÔ´£¬ºóÀ´Ð޸ijɰÑÊý¾ÝÖ±½Óдµ½´ÅÅÌÁË£»¶ÔÓÚ¾ßÓнϴóÄÚ´æµÄ¼¯ÈºÀ´½²£¬»¹ÊǾ¡Á¿µØÍùÄÚ´æÉÏд°É£¬ÄÚ´æ·Å²»ÏÂÁËÔÙ·Å´ÅÅÌ¡£
|