¼òµ¥½âÊÍ
MapReduce Ëã·¨
Ò»¸öÓÐȤµÄÀý×Ó
ÄãÏëÊý³öÒ»ÞûÅÆÖÐÓжàÉÙÕźÚÌÒ¡£Ö±¹Û·½Ê½ÊÇÒ»ÕÅÒ»Õżì²é²¢ÇÒÊý³öÓжàÉÙÕÅÊǺÚÌÒ£¿

MapReduce·½·¨ÔòÊÇ£º
¸øÔÚ×ùµÄËùÓÐÍæ¼ÒÖзÖÅäÕâÞûÅÆ
ÈÃÿ¸öÍæ¼ÒÊý×Ô¼ºÊÖÖеÄÅÆÓм¸ÕÅÊǺÚÌÒ£¬È»ºó°ÑÕâ¸öÊýÄ¿»ã±¨¸øÄã
Äã°ÑËùÓÐÍæ¼Ò¸æËßÄãµÄÊý×Ö¼ÓÆðÀ´£¬µÃµ½×îºóµÄ½áÂÛ
²ð·Ö
MapReduceºÏ²¢ÁËÁ½ÖÖ¾µäº¯Êý£º
Ó³É䣨Mapping£©¶Ô¼¯ºÏÀïµÄÿ¸öÄ¿±êÓ¦ÓÃͬһ¸ö²Ù×÷¡£¼´£¬Èç¹ûÄãÏë°Ñ±íµ¥Àïÿ¸öµ¥Ôª¸ñ³ËÒÔ¶þ£¬ÄÇô°ÑÕâ¸öº¯Êýµ¥¶ÀµØÓ¦ÓÃÔÚÿ¸öµ¥Ôª¸ñÉϵIJÙ×÷¾ÍÊôÓÚmapping¡£
»¯¼ò£¨Reducing £©±éÀú¼¯ºÏÖеÄÔªËØÀ´·µ»ØÒ»¸ö×ۺϵĽá¹û¡£¼´£¬Êä³ö±íµ¥ÀïÒ»ÁÐÊý×ֵĺÍÕâ¸öÈÎÎñÊôÓÚreducing¡£
ÖØÐÂÉóÊÓÉÏÃæµÄÀý×Ó
ÖØÐÂÉóÊÓÎÒÃÇÔÀ´ÄǸö·ÖɢֽůµÄÀý×Ó£¬ÎÒÃÇÓÐMapReduceÊý¾Ý·ÖÎöµÄ»ù±¾·½·¨¡£ÓÑÇéÌáʾ£ºÕâ²»ÊǸöÑϽ÷µÄÀý×Ó¡£ÔÚÕâ¸öÀý×ÓÀÈË´ú±í¼ÆËã»ú£¬ÒòΪËûÃÇͬʱ¹¤×÷£¬ËùÒÔËûÃÇÊǸö¼¯Èº¡£ÔÚ´ó¶àÊýʵ¼ÊÓ¦ÓÃÖУ¬ÎÒÃǼÙÉèÊý¾ÝÒѾÔÚÿ̨¼ÆËã»úÉÏÁË
¨C Ò²¾ÍÊÇ˵°ÑÅÆ·Ö·¢³öÈ¥²¢²»ÊÇMapReduceµÄÒ»²½¡££¨ÊÂʵÉÏ£¬ÔÚ¼ÆËã»ú¼¯ÈºÖÐÈçºÎ´æ´¢ÎļþÊÇHadoopµÄÕæÕýºËÐÄ¡££©
ͨ¹ý°ÑÅÆ·Ö¸ø¶à¸öÍæ¼Ò²¢ÇÒÈÃËûÃǸ÷×ÔÊýÊý£¬Äã¾ÍÔÚ²¢ÐÐÖ´ÐÐÔËË㣬ÒòΪÿ¸öÍæ¼Ò¶¼ÔÚͬʱ¼ÆÊý¡£Õâͬʱ°ÑÕâÏ×÷±ä³ÉÁË·Ö²¼Ê½µÄ£¬ÒòΪ¶à¸ö²»Í¬µÄÈËÔÚ½â¾öͬһ¸öÎÊÌâµÄ¹ý³ÌÖв¢²»ÐèÒªÖªµÀËûÃǵÄÁÚ¾ÓÔÚ¸Éʲô¡£
ͨ¹ý¸æËßÿ¸öÈËÈ¥ÊýÊý£¬Äã¶ÔÒ»Ïî¼ì²éÿÕÅÅÆµÄÈÎÎñ½øÐÐÁËÓ³Éä¡£ Äã²»»áÈÃËûÃǰѺÚÌÒÅÆµÝ¸øÄ㣬¶øÊÇÈÃËûÃǰÑÄãÏëÒªµÄ¶«Î÷»¯¼òΪһ¸öÊý×Ö¡£
ÁíÍâÒ»¸öÓÐÒâ˼µÄÇé¿öÊÇÅÆ·ÖÅäµÃÓжà¾ùÔÈ¡£MapReduce¼ÙÉèÊý¾ÝÊÇÏ´¹ýµÄ£¨shuffled£©- Èç¹ûËùÓкÚÌÒ¶¼·Öµ½ÁËÒ»¸öÈËÊÖÉÏ£¬ÄÇËûÊýÅÆµÄ¹ý³Ì¿ÉÄÜ±ÈÆäËûÈËÒªÂýºÜ¶à¡£
Èç¹ûÓÐ×ã¹»µÄÈ˵ϰ£¬ÎÊһЩ¸üÓÐȤµÄÎÊÌâ¾ÍÏ൱¼òµ¥ÁË - ±ÈÈç¡°Ò»ÞûÅÆµÄƽ¾ùÖµ£¨¶þʮһµãËã·¨£©ÊÇʲô¡±¡£Äã¿ÉÒÔͨ¹ýºÏ²¢¡°ËùÓÐÅÆµÄÖµµÄºÍÊÇʲô¡±¼°¡°ÎÒÃÇÓжàÉÙÕÅÅÆ¡±ÕâÁ½¸öÎÊÌâÀ´µÃµ½´ð°¸¡£ÓÃÕâ¸öºÍ³ýÒÔÅÆµÄÕÅÊý¾ÍµÃµ½ÁËÆ½¾ùÖµ¡£
MapReduceËã·¨µÄ»úÖÆÒªÔ¶±ÈÕ⸴Ôӵö࣬µ«ÊÇÖ÷Ìå˼ÏëÊÇÒ»Ö嵀 ¨C
ͨ¹ý·ÖÉ¢¼ÆËãÀ´·ÖÎö´óÁ¿Êý¾Ý¡£ÎÞÂÛÊÇFacebook¡¢NASA£¬»¹ÊÇС´´Òµ¹«Ë¾£¬MapReduce¶¼ÊÇĿǰ·ÖÎö»¥ÁªÍø¼¶±ðÊý¾ÝµÄÖ÷Á÷·½·¨¡£
HadoopÖеÄMapReduce
´ó¹æÄ£Êý¾Ý´¦Àíʱ£¬MapReduceÔÚÈý¸ö²ãÃæÉϵĻù±¾¹¹Ë¼
1¡¢ÈçºÎ¶Ô¸¶´óÊý¾Ý´¦Àí£º·Ö¶øÖÎÖ®
¶ÔÏ໥¼ä²»¾ßÓмÆËãÒÀÀµ¹ØÏµµÄ´óÊý¾Ý£¬ÊµÏÖ²¢ÐÐ×î×ÔÈ»µÄ°ì·¨¾ÍÊDzÉÈ¡·Ö¶øÖÎÖ®µÄ²ßÂÔ
2¡¢ÉÏÉýµ½³éÏóÄ£ÐÍ£ºMapperÓëReducer
MPIµÈ²¢ÐмÆËã·½·¨È±Éٸ߲㲢Ðбà³ÌÄ£ÐÍ£¬ÎªÁ˿˷þÕâһȱÏÝ£¬MapReduce½è¼øÁËLispº¯ÊýʽÓïÑÔÖеÄ˼Ï룬ÓÃMapºÍReduceÁ½¸öº¯ÊýÌṩÁ˸߲ãµÄ²¢Ðбà³Ì³éÏóÄ£ÐÍ
3¡¢ÉÏÉýµ½¹¹¼Ü£ºÍ³Ò»¹¹¼Ü£¬Îª³ÌÐòÔ±Òþ²ØÏµÍ³²ãϸ½Ú
MPIµÈ²¢ÐмÆËã·½·¨È±ÉÙͳһµÄ¼ÆËã¿ò¼ÜÖ§³Ö£¬³ÌÐòÔ±ÐèÒª¿¼ÂÇÊý¾Ý´æ´¢¡¢»®·Ö¡¢·Ö·¢¡¢½á¹ûÊÕ¼¯¡¢´íÎó»Ö¸´µÈÖî¶àϸ½Ú£»Îª´Ë£¬MapReduceÉè¼Æ²¢ÌṩÁËͳһµÄ¼ÆËã¿ò¼Ü£¬Îª³ÌÐòÔ±Òþ²ØÁ˾ø´ó¶àÊýϵͳ²ãÃæµÄ´¦Àíϸ½Ú
1.¶Ô¸¶´óÊý¾Ý´¦Àí-·Ö¶øÖÎÖ®
ʲôÑùµÄ¼ÆËãÈÎÎñ¿É½øÐв¢Ðл¯¼ÆË㣿
²¢ÐмÆËãµÄµÚÒ»¸öÖØÒªÎÊÌâÊÇÈçºÎ»®·Ö¼ÆËãÈÎÎñ»òÕß¼ÆËãÊý¾ÝÒÔ±ã¶Ô»®·ÖµÄ×ÓÈÎÎñ»òÊý¾Ý¿éͬʱ½øÐмÆËã¡£µ«Ò»Ð©¼ÆËãÎÊÌâǡǡÎÞ·¨½øÐÐÕâÑùµÄ»®·Ö£¡
Nine women cannot have a baby in one month!
ÀýÈ磺Fibonacciº¯Êý: Fk+2 = Fk + Fk+1
ǰºóÊý¾ÝÏîÖ®¼ä´æÔÚºÜÇ¿µÄÒÀÀµ¹ØÏµ£¡Ö»ÄÜ´®ÐмÆË㣡
½áÂÛ£º²»¿É·Ö²ðµÄ¼ÆËãÈÎÎñ»òÏ໥¼äÓÐÒÀÀµ¹ØÏµµÄÊý¾ÝÎÞ·¨½øÐв¢ÐмÆË㣡
´óÊý¾ÝµÄ²¢Ðл¯¼ÆËã
Ò»¸ö´óÊý¾ÝÈô¿ÉÒÔ·ÖΪ¾ßÓÐͬÑù¼ÆËã¹ý³ÌµÄÊý¾Ý¿é£¬²¢ÇÒÕâЩÊý¾Ý¿éÖ®¼ä²»´æÔÚÊý¾ÝÒÀÀµ¹ØÏµ£¬ÔòÌá¸ß´¦ÀíËٶȵÄ×îºÃ°ì·¨¾ÍÊDz¢ÐмÆËã
ÀýÈ磺¼ÙÉèÓÐÒ»¸ö¾Þ´óµÄ2άÊý¾ÝÐèÒª´¦Àí(±ÈÈçÇóÿ¸öÔªËØµÄ¿ªÁ¢·½)£¬ÆäÖжÔÿ¸öÔªËØµÄ´¦ÀíÊÇÏàͬµÄ,²¢ÇÒÊý¾ÝÔªËØ¼ä²»´æÔÚÊý¾ÝÒÀÀµ¹ØÏµ,¿ÉÒÔ¿¼ÂDz»Í¬µÄ»®·Ö·½·¨½«Æä»®·ÖΪ×ÓÊý×é,ÓÉÒ»×é´¦ÀíÆ÷²¢Ðд¦Àí



2.¹¹½¨³éÏóÄ£ÐÍ-MapºÍReduce
½è¼øº¯ÊýʽÉè¼ÆÓïÑÔLispµÄÉè¼ÆË¼Ïë
?º¯Êýʽ³ÌÐòÉè¼Æ(functional programming)ÓïÑÔLispÊÇÒ»ÖÖÁÐ±í´¦Àí ÓïÑÔ(List
processing)£¬ÊÇÒ»ÖÖÓ¦ÓÃÓÚÈ˹¤ÖÇÄÜ´¦ÀíµÄ·ûºÅʽÓïÑÔ£¬ÓÉMITµÄÈ˹¤ÖÇÄÜר¼Ò¡¢Í¼Áé½±»ñµÃÕßJohn
McCarthyÓÚ1958ÄêÉè¼Æ·¢Ã÷¡£
?Lisp¶¨ÒåÁ˿ɶÔÁбíÔªËØ½øÐÐÕûÌå´¦ÀíµÄ¸÷ÖÖ²Ù×÷£¬È磺
È磺(add #(1 2 3 4) #(4 3 2 1)) ½«²úÉú½á¹û£º #(5 5 5 5)
?LispÖÐÒ²ÌṩÁËÀàËÆÓÚMapºÍReduceµÄ²Ù×÷
Èç: (map ¡®vector #+ #(1 2 3 4 5) #(10 11 12 13 14))
ͨ¹ý¶¨Òå¼Ó·¨mapÔËË㽫2¸öÏòÁ¿Ïà¼Ó²úÉú½á¹û#(11 13 15 17 19)
(reduce #¡¯+ #(11 13 15 17 19)) ͨ¹ý¼Ó·¨¹é²¢²úÉúÀÛ¼Ó½á¹û75
Map: ¶ÔÒ»×éÊý¾ÝÔªËØ½øÐÐijÖÖÖØ¸´Ê½µÄ´¦Àí
Reduce: ¶ÔMapµÄÖмä½á¹û½øÐÐijÖÖ½øÒ»²½µÄ½á¹ûÕû

¹Ø¼ü˼Ï룺Ϊ´óÊý¾Ý´¦Àí¹ý³ÌÖеÄÁ½¸öÖ÷Òª´¦Àí²Ù×÷ÌṩһÖÖ³éÏó»úÖÆ
MapReduceÖеÄMapºÍReduce²Ù×÷µÄ³éÏóÃèÊö
MapReduce½è¼øÁ˺¯Êýʽ³ÌÐòÉè¼ÆÓïÑÔLispÖеÄ˼Ï룬¶¨ÒåÁËÈçϵÄMapºÍReduceÁ½¸ö³éÏóµÄ±à³Ì½Ó¿Ú£¬ÓÉÓû§È¥±à³ÌʵÏÖ:
map: (k1; v1) ¡ú [(k2; v2)]
ÊäÈ룺¼üÖµ¶Ô(k1; v1)±íʾµÄÊý¾Ý
´¦Àí£ºÎĵµÊý¾Ý¼Ç¼(ÈçÎı¾ÎļþÖеÄÐУ¬»òÊý¾Ý±í¸ñÖеÄÐÐ)½«ÒÔ¡°¼üÖµ¶Ô¡±ÐÎʽ´«Èëmapº¯Êý£»mapº¯Êý½«´¦ÀíÕâЩ¼üÖµ¶Ô£¬²¢ÒÔÁíÒ»ÖÖ¼üÖµ¶ÔÐÎʽÊä³ö´¦ÀíµÄÒ»×é¼üÖµ¶ÔÖмä½á¹û¡¡¡¡¡¡[(k2;
v2)]
Êä³ö£º¼üÖµ¶Ô[(k2; v2)]±íʾµÄÒ»×éÖмäÊý¾Ý
reduce: (k2; [v2]) ¡ú [(k3; v3)]
ÊäÈ룺 ÓÉmapÊä³öµÄÒ»×é¼üÖµ¶Ô[(k2; v2)] ½«±»½øÐкϲ¢´¦Àí½«Í¬ÑùÖ÷¼üϵIJ»Í¬ÊýÖµºÏ²¢µ½Ò»¸öÁбí[v2]ÖУ¬¹ÊreduceµÄÊäÈëΪ(k2;
[v2])
´¦Àí£º¶Ô´«ÈëµÄÖмä½á¹ûÁбíÊý¾Ý½øÐÐijÖÖÕûÀí»ò½øÒ»²½µÄ´¦Àí,²¢²úÉú×îÖÕµÄijÖÖÐÎʽµÄ½á¹ûÊä³ö[(k3; v3)]
¡£
Êä³ö£º×îÖÕÊä³ö½á¹û[(k3; v3)]
MapºÍReduceΪ³ÌÐòÔ±ÌṩÁËÒ»¸öÇåÎúµÄ²Ù×÷½Ó¿Ú³éÏóÃèÊö

¸÷¸ömapº¯Êý¶ÔËù»®·ÖµÄÊý¾Ý²¢Ðд¦Àí£¬´Ó²»Í¬µÄÊäÈëÊý¾Ý²úÉú²»Í¬µÄÖмä½á¹ûÊä³ö
¸÷¸öreduceÒ²¸÷×Ô²¢ÐмÆË㣬¸÷×Ô¸ºÔð´¦Àí²»Í¬µÄÖмä½á¹ûÊý¾Ý¼¯ºÏ?½øÐÐreduce´¦Àí֮ǰ,±ØÐëµÈµ½ËùÓеÄmapº¯Êý×öÍ꣬Òò´Ë,ÔÚ½øÈëreduceǰÐèÒªÓÐÒ»¸öͬ²½ÕÏ(barrier);Õâ¸ö½×¶ÎÒ²¸ºÔð¶ÔmapµÄÖмä½á¹ûÊý¾Ý½øÐÐÊÕ¼¯ÕûÀí(aggregation
& shuffle)´¦Àí,ÒÔ±ãreduce¸üÓÐЧµØ¼ÆËã×îÖÕ½á¹û?×îÖÕ»ã×ÜËùÓÐreduceµÄÊä³ö½á¹û¼´¿É»ñµÃ×îÖÕ½á¹û
»ùÓÚMapReduceµÄ´¦Àí¹ý³ÌʾÀý¨CÎĵµ´ÊƵͳ¼Æ£ºWordCount
ÉèÓÐ4×éÔʼÎı¾Êý¾Ý£º
Text 1: the weather is good Text 2: today is good
Text 3: good weather is good Text 4: today has good
weather
´«Í³µÄ´®Ðд¦Àí·½Ê½(Java)£º
String[] text = new String[] { ¡°hello world¡±, ¡°hello every one¡±, ¡°say hello to everyone in the world¡± £ý; HashTable ht = new HashTable(); for(i = 0; i < 3; ++i) { StringTokenizer st = new StringTokenizer(text[i]); while (st.hasMoreTokens()) { String word = st.nextToken(); if(!ht.containsKey(word)) { ht.put(word, new Integer(1)); } else { int wc = ((Integer)ht.get(word)).intValue() +1; // ¼ÆÊý¼Ó1 ht.put(word, new Integer(wc)); } } } for (Iterator itr=ht.KeySet().iterator(); itr.hasNext(); ) { String word = (String)itr.next(); System.out.print(word+ ¡°: ¡±+ (Integer)ht.get(word)+¡°; ¡±); |
Êä³ö£ºgood: 5; has: 1; is: 3; the: 1; today: 2; weather:
3
»ùÓÚMapReduceµÄ´¦Àí¹ý³ÌʾÀý¨CÎĵµ´ÊƵͳ¼Æ£ºWordCount
MapReduce´¦Àí·½Ê½
ʹÓÃ4¸ömap½Úµã£º
map½Úµã1:
ÊäÈ룺(text1, ¡°the weather is good¡±)
Êä³ö£º(the, 1), (weather, 1), (is, 1), (good, 1)
map½Úµã2:
ÊäÈ룺(text2, ¡°today is good¡±)
Êä³ö£º(today, 1), (is, 1), (good, 1)
map½Úµã3:
ÊäÈ룺(text3, ¡°good weather is good¡±)
Êä³ö£º(good, 1), (weather, 1), (is, 1), (good, 1)
map½Úµã4:
ÊäÈ룺(text3, ¡°today has good weather¡±)
Êä³ö£º(today, 1), (has, 1), (good, 1), (weather, 1)
ʹÓÃ3¸öreduce½Úµã£º

MapReduce´¦Àí·½Ê½
MapReduceα´úÂë(ʵÏÖMapºÍReduceÁ½¸öº¯Êý)£º

Class Mapper method map(String input_key, String input_value): // input_key: text document name // input_value: document contents for each word w in input_value: EmitIntermediate(w, "1"); Class Reducer method reduce(String output_key, Iterator intermediate_values): // output_key: a word // output_values: a list of counts int result = 0; for each v in intermediate_values: result += ParseInt(v); Emit(output_key£¬ result); |
3.ÉÏÉýµ½¹¹¼Ü-×Ô¶¯²¢Ðл¯²¢Òþ²ØµÍ²ãϸ½Ú
ÈçºÎÌṩͳһµÄ¼ÆËã¿ò¼Ü
MapReduceÌṩһ¸öͳһµÄ¼ÆËã¿ò¼Ü£¬¿ÉÍê³É£º
¼ÆËãÈÎÎñµÄ»®·ÖºÍµ÷¶È
Êý¾ÝµÄ·Ö²¼´æ´¢ºÍ»®·Ö
´¦ÀíÊý¾ÝÓë¼ÆËãÈÎÎñµÄͬ²½
½á¹ûÊý¾ÝµÄÊÕ¼¯ÕûÀí(sorting, combining, partitioning,¡)
ϵͳͨÐÅ¡¢¸ºÔØÆ½ºâ¡¢¼ÆËãÐÔÄÜÓÅ»¯´¦Àí
´¦Àíϵͳ½Úµã³ö´í¼ì²âºÍʧЧ»Ö¸´
MapReduce×î´óµÄÁÁµã
ͨ¹ý³éÏóÄ£ÐͺͼÆËã¿ò¼Ü°ÑÐèÒª×öʲô(what need to do)Óë¾ßÌåÔõô×ö(how
to do)·Ö¿ªÁË£¬Îª³ÌÐòÔ±Ìṩһ¸ö³éÏóºÍ¸ß²ãµÄ±à³Ì½Ó¿ÚºÍ¿ò¼Ü
³ÌÐòÔ±½öÐèÒª¹ØÐÄÆäÓ¦ÓòãµÄ¾ßÌ弯ËãÎÊÌ⣬½öÐè±àдÉÙÁ¿µÄ´¦ÀíÓ¦Óñ¾Éí¼ÆËãÎÊÌâµÄ³ÌÐò´úÂë
ÈçºÎ¾ßÌåÍê³ÉÕâ¸ö²¢ÐмÆËãÈÎÎñËùÏà¹ØµÄÖî¶àϵͳ²ãϸ½Ú±»Òþ²ØÆðÀ´,½»¸ø¼ÆËã¿ò¼ÜÈ¥´¦Àí£º´Ó·Ö²¼´úÂëµÄÖ´ÐУ¬µ½´óµ½ÊýǧСµ½µ¥¸ö½Úµã¼¯ÈºµÄ×Ô¶¯µ÷¶ÈʹÓÃ
MapReduceÌṩµÄÖ÷Òª¹¦ÄÜ
?ÈÎÎñµ÷¶È£ºÌá½»µÄÒ»¸ö¼ÆËã×÷Òµ(job)½«±»»®·ÖΪºÜ¶à¸ö¼ÆËãÈÎÎñ(tasks), ÈÎÎñµ÷¶È¹¦ÄÜÖ÷Òª¸ºÔðΪÕâЩ»®·ÖºóµÄ¼ÆËãÈÎÎñ·ÖÅäºÍµ÷¶È¼ÆËã½Úµã(map½Úµã»òreducer½Úµã);
ͬʱ¸ºÔð¼à¿ØÕâЩ½ÚµãµÄÖ´ÐÐ״̬, ²¢¸ºÔðmap½ÚµãÖ´ÐеÄͬ²½¿ØÖÆ(barrier); Ò²¸ºÔð½øÐÐһЩ¼ÆËãÐÔÄÜÓÅ»¯´¦Àí,
Èç¶Ô×îÂýµÄ¼ÆËãÈÎÎñ²ÉÓö౸·ÝÖ´ÐС¢Ñ¡×î¿ìÍê³ÉÕß×÷Ϊ½á¹û
?Êý¾Ý/´úÂ뻥¶¨Î»£ºÎªÁ˼õÉÙÊý¾ÝͨÐÅ£¬Ò»¸ö»ù±¾ÔÔòÊDZ¾µØ»¯Êý¾Ý´¦Àí(locality)£¬¼´Ò»¸ö¼ÆËã½Úµã¾¡¿ÉÄÜ´¦ÀíÆä±¾µØ´ÅÅÌÉÏËù·Ö²¼´æ´¢µÄÊý¾Ý£¬ÕâʵÏÖÁË´úÂëÏòÊý¾ÝµÄÇ¨ÒÆ£»µ±ÎÞ·¨½øÐÐÕâÖÖ±¾µØ»¯Êý¾Ý´¦Àíʱ£¬ÔÙѰÕÒÆäËü¿ÉÓýڵ㲢½«Êý¾Ý´ÓÍøÂçÉÏ´«Ë͸ø¸Ã½Úµã(Êý¾ÝÏò´úÂëÇ¨ÒÆ)£¬µ«½«¾¡¿ÉÄÜ´ÓÊý¾ÝËùÔڵı¾µØ»ú¼ÜÉÏѰÕÒ¿ÉÓýڵãÒÔ¼õÉÙͨÐÅÑÓ³Ù
?³ö´í´¦Àí£ºÒԵͶËÉÌÓ÷þÎñÆ÷¹¹³ÉµÄ´ó¹æÄ£MapReduce¼ÆË㼯ȺÖÐ,½ÚµãÓ²¼þ(Ö÷»ú¡¢´ÅÅÌ¡¢ÄÚ´æµÈ)³ö´íºÍÈí¼þÓÐbugÊdz£Ì¬£¬Òò´Ë,MapReducerÐèÒªÄܼì²â²¢¸ôÀë³ö´í½Úµã£¬²¢µ÷¶È·ÖÅäеĽڵã½Ó¹Ü³ö´í½ÚµãµÄ¼ÆËãÈÎÎñ
·Ö²¼Ê½Êý¾Ý´æ´¢ÓëÎļþ¹ÜÀí£ºº£Á¿Êý¾Ý´¦ÀíÐèÒªÒ»¸öÁ¼ºÃµÄ·Ö²¼Êý¾Ý´æ´¢ºÍÎļþ¹ÜÀíϵͳ֧³Å,¸ÃÎļþϵͳÄܹ»°Ñº£Á¿Êý¾Ý·Ö²¼´æ´¢ÔÚ¸÷¸ö½ÚµãµÄ±¾µØ´ÅÅÌÉÏ,µ«±£³ÖÕû¸öÊý¾ÝÔÚÂß¼ÉϳÉΪһ¸öÍêÕûµÄÊý¾ÝÎļþ£»ÎªÁËÌṩÊý¾Ý´æ´¢ÈÝ´í»úÖÆ,¸ÃÎļþϵͳ»¹ÒªÌṩÊý¾Ý¿éµÄ¶à±¸·Ý´æ´¢¹ÜÀíÄÜÁ¦
CombinerºÍPartitioner:ΪÁ˼õÉÙÊý¾ÝͨÐÅ¿ªÏú,Öмä½á¹ûÊý¾Ý½øÈëreduce½ÚµãǰÐèÒª½øÐкϲ¢(combine)´¦Àí,°Ñ¾ßÓÐͬÑùÖ÷¼üµÄÊý¾ÝºÏ²¢µ½Ò»Æð±ÜÃâÖØ¸´´«ËÍ;
Ò»¸öreducer½ÚµãËù´¦ÀíµÄÊý¾Ý¿ÉÄÜ»áÀ´×Ô¶à¸ömap½Úµã, Òò´Ë, map½ÚµãÊä³öµÄÖмä½á¹ûÐèʹÓÃÒ»¶¨µÄ²ßÂÔ½øÐÐÊʵ±µÄ»®·Ö(partitioner)´¦Àí£¬±£Ö¤Ïà¹ØÊý¾Ý·¢Ë͵½Í¬Ò»¸öreducer½Úµã
»ùÓÚMapºÍReduceµÄ²¢ÐмÆËãÄ£ÐÍ
4.MapReduceµÄÖ÷ÒªÉè¼ÆË¼ÏëºÍÌØÕ÷
1¡¢Ïò¡°Í⡱ºáÏòÀ©Õ¹£¬¶ø·ÇÏò¡°ÉÏ¡±×ÝÏòÀ©Õ¹£¨Scale ¡°out¡±, not ¡°up¡±£©
¼´MapReduce¼¯ÈºµÄ¹¹ÖþÑ¡Óü۸ñ±ãÒË¡¢Ò×ÓÚÀ©Õ¹µÄ´óÁ¿µÍ¶ËÉÌÓ÷þÎñÆ÷£¬¶ø·Ç¼Û¸ñ°º¹ó¡¢²»Ò×À©Õ¹µÄ¸ß¶Ë·þÎñÆ÷£¨SMP£©?µÍ¶Ë·þÎñÆ÷Êг¡Óë¸ßÈÝÁ¿Desktop
PCÓÐÖØµþµÄÊг¡£¬Òò´Ë£¬ÓÉÓÚÏ໥¼ä¼Û¸ñµÄ¾ºÕù¡¢¿É»¥»»µÄ²¿¼þ¡¢ºÍ¹æÄ£¾¼ÃЧӦ£¬Ê¹µÃµÍ¶Ë·þÎñÆ÷±£³Ö½ÏµÍµÄ¼Û¸ñ?»ùÓÚTPC-CÔÚ2007Äêµ×µÄÐÔÄÜÆÀ¹À½á¹û,Ò»¸öµÍ¶Ë·þÎñÆ÷ƽ̨Óë¸ß¶ËµÄ¹²Ïí´æ´¢Æ÷½á¹¹µÄ·þÎñÆ÷ƽ̨Ïà±È,ÆäÐԼ۱ȴóÔ¼Òª¸ß4±¶;Èç¹û°ÑÍâ´æ¼Û¸ñ³ýÍâ,µÍ¶Ë·þÎñÆ÷ÐԼ۱ȴóÔ¼Ìá¸ß12±¶?¶ÔÓÚ´ó¹æÄ£Êý¾Ý´¦Àí£¬ÓÉÓÚÓдóÁ¿Êý¾Ý´æ´¢ÐèÒª£¬ÏÔ¶øÒ×¼û£¬»ùÓڵͶ˷þÎñÆ÷µÄ¼¯ÈºÔ¶±È»ùÓڸ߶˷þÎñÆ÷µÄ¼¯ÈºÓÅÔ½£¬Õâ¾ÍÊÇΪʲôMapReduce²¢ÐмÆË㼯Ⱥ»á»ùÓڵͶ˷þÎñÆ÷ʵÏÖ
2¡¢Ê§Ð§±»ÈÏΪÊdz£Ì¬£¨Assume failures are common£©
MapReduce¼¯ÈºÖÐʹÓôóÁ¿µÄµÍ¶Ë·þÎñÆ÷(GoogleĿǰÔÚÈ«Çò¹²Ê¹ÓðÙÍǫ̀ÒÔÉϵķþÎñÆ÷½Úµã),Òò´Ë£¬½ÚµãÓ²¼þʧЧºÍÈí¼þ³ö´íÊdz£Ì¬£¬Òò¶ø£º
Ò»¸öÁ¼ºÃÉè¼Æ¡¢¾ßÓÐÈÝ´íÐԵIJ¢ÐмÆËãϵͳ²»ÄÜÒòΪ½ÚµãʧЧ¶øÓ°Ïì¼ÆËã·þÎñµÄÖÊÁ¿£¬ÈκνڵãʧЧ¶¼²»Ó¦µ±µ¼Ö½á¹ûµÄ²»Ò»Ö»ò²»È·¶¨ÐÔ£»ÈκÎÒ»¸ö½ÚµãʧЧʱ£¬ÆäËü½ÚµãÒªÄܹ»ÎÞ·ì½Ó¹ÜʧЧ½ÚµãµÄ¼ÆËãÈÎÎñ£»µ±Ê§Ð§½Úµã»Ö¸´ºóÓ¦ÄÜ×Ô¶¯ÎÞ·ì¼ÓÈ뼯Ⱥ£¬¶ø²»ÐèÒª¹ÜÀíÔ±È˹¤½øÐÐϵͳÅäÖÃ
MapReduce²¢ÐмÆËãÈí¼þ¿ò¼ÜʹÓÃÁ˶àÖÖÓÐЧµÄ»úÖÆ£¬Èç½Úµã×Ô¶¯ÖØÆô¼¼Êõ£¬Ê¹¼¯ÈººÍ¼ÆËã¿ò¼Ü¾ßÓжԸ¶½ÚµãʧЧµÄ½¡×³ÐÔ£¬ÄÜÓÐЧ´¦ÀíʧЧ½ÚµãµÄ¼ì²âºÍ»Ö¸´¡£
3¡¢°Ñ´¦ÀíÏòÊý¾ÝÇ¨ÒÆ£¨Moving processing to the data£©
´«Í³¸ßÐÔÄܼÆËãϵͳͨ³£ÓкܶദÀíÆ÷½ÚµãÓëһЩÍâ´æ´¢Æ÷½ÚµãÏàÁ¬£¬ÈçÓÃÇøÓò´æ´¢ÍøÂç(SAN,Storage
Area Network)Á¬½ÓµÄ´ÅÅÌÕóÁУ¬Òò´Ë£¬´ó¹æÄ£Êý¾Ý´¦ÀíʱÍâ´æÎļþÊý¾ÝI/O·ÃÎÊ»á³ÉΪһ¸öÖÆÔ¼ÏµÍ³ÐÔÄܵį¿¾±¡£?ΪÁ˼õÉÙ´ó¹æÄ£Êý¾Ý²¢ÐмÆËãϵͳÖеÄÊý¾ÝͨÐÅ¿ªÏú£¬´úÖ®ÒÔ°ÑÊý¾Ý´«Ë͵½´¦Àí½Úµã(Êý¾ÝÏò´¦ÀíÆ÷»ò´úÂëÇ¨ÒÆ)£¬Ó¦µ±¿¼Âǽ«´¦ÀíÏòÊý¾Ý¿¿Â£ºÍÇ¨ÒÆ¡£?MapReduce²ÉÓÃÁËÊý¾Ý/´úÂ뻥¶¨Î»µÄ¼¼Êõ·½·¨£¬¼ÆËã½Úµã½«Ê×ÏȽ«¾¡Á¿¸ºÔð¼ÆËãÆä±¾µØ´æ´¢µÄÊý¾Ý,ÒÔ·¢»ÓÊý¾Ý±¾µØ»¯Ìصã(locality),½öµ±½ÚµãÎÞ·¨´¦Àí±¾µØÊý¾Ýʱ£¬ÔÙ²ÉÓþͽüÔÔòѰÕÒÆäËü¿ÉÓüÆËã½Úµã£¬²¢°ÑÊý¾Ý´«Ë͵½¸Ã¿ÉÓüÆËã½Úµã¡£
4¡¢Ë³Ðò´¦ÀíÊý¾Ý¡¢±ÜÃâËæ»ú·ÃÎÊÊý¾Ý£¨Process data sequentially
and avoid random access£©
´ó¹æÄ£Êý¾Ý´¦ÀíµÄÌØµã¾ö¶¨ÁË´óÁ¿µÄÊý¾Ý¼Ç¼²»¿ÉÄÜ´æ·ÅÔÚÄÚ´æ¡¢¶øÖ»¿ÉÄÜ·ÅÔÚÍâ´æÖнøÐд¦Àí¡£?´ÅÅ̵Ä˳Ðò·ÃÎʺÍËæ¼´·ÃÎÊÔÚÐÔÄÜÉÏÓо޴óµÄ²îÒì
Àý£º100ÒÚ(1010)¸öÊý¾Ý¼Ç¼(ÿ¼Ç¼100B,¹²¼Æ1TB)µÄÊý¾Ý¿â
¸üÐÂ1%µÄ¼Ç¼(Ò»¶¨ÊÇËæ»ú·ÃÎÊ)ÐèÒª1¸öÔÂʱ¼ä£»¶øË³Ðò·ÃÎʲ¢ÖØÐ´ËùÓÐÊý¾Ý¼Ç¼½öÐè1Ììʱ¼ä£¡
MapReduceÉè¼ÆÎªÃæÏò´óÊý¾Ý¼¯Åú´¦ÀíµÄ²¢ÐмÆËãϵͳ£¬ËùÓмÆËã¶¼±»×éÖ¯³ÉºÜ³¤µÄÁ÷ʽ²Ù×÷£¬ÒÔ±ãÄÜÀûÓ÷ֲ¼ÔÚ¼¯ÈºÖдóÁ¿½ÚµãÉÏ´ÅÅ̼¯ºÏµÄ¸ß´«Êä´ø¿í¡£
5¡¢ÎªÓ¦Óÿª·¢ÕßÒþ²ØÏµÍ³²ãϸ½Ú£¨Hide system-level details
from the application developer£©
Èí¼þ¹¤³Ìʵ¼ùÖ¸ÄÏÖУ¬×¨Òµ³ÌÐòÔ±ÈÏΪ֮ËùÒÔд³ÌÐòÀ§ÄÑ£¬ÊÇÒòΪ³ÌÐòÔ±ÐèÒª¼Çס̫¶àµÄ±à³Ìϸ½Ú(´Ó±äÁ¿Ãûµ½¸´ÔÓËã·¨µÄ±ß½çÇé¿ö´¦Àí)£¬Õâ¶Ô´óÄÔ¼ÇÒäÊÇÒ»¸ö¾Þ´óµÄÈÏÖª¸ºµ£,ÐèÒª¸ß¶È¼¯ÖÐ×¢ÒâÁ¦?¶ø²¢ÐгÌÐò±àдÓиü¶àÀ§ÄÑ£¬ÈçÐèÒª¿¼ÂǶàÏß³ÌÖÐÖîÈçͬ²½µÈ¸´ÔÓ·±ËöµÄϸ½Ú£¬ÓÉÓÚ²¢·¢Ö´ÐÐÖеIJ»¿ÉÔ¤²âÐÔ£¬³ÌÐòµÄµ÷ÊÔ²é´íҲʮ·ÖÀ§ÄÑ£»´ó¹æÄ£Êý¾Ý´¦Àíʱ³ÌÐòÔ±ÐèÒª¿¼ÂÇÖîÈçÊý¾Ý·Ö²¼´æ´¢¹ÜÀí¡¢Êý¾Ý·Ö·¢¡¢Êý¾ÝͨÐźÍͬ²½¡¢¼ÆËã½á¹ûÊÕ¼¯µÈÖî¶àϸ½ÚÎÊÌâ?MapReduceÌṩÁËÒ»ÖÖ³éÏó»úÖÆ½«³ÌÐòÔ±Óëϵͳ²ãϸ½Ú¸ôÀ뿪À´£¬³ÌÐòÔ±½öÐèÃèÊöÐèÒª¼ÆËãʲô(what
to compute), ¶ø¾ßÌåÔõôȥ×ö(how to compute)¾Í½»ÓÉϵͳµÄÖ´Ðпò¼Ü´¦Àí£¬ÕâÑù³ÌÐòÔ±¿É´Óϵͳ²ãϸ½ÚÖнâ·Å³öÀ´£¬¶øÖÂÁ¦ÓÚÆäÓ¦Óñ¾Éí¼ÆËãÎÊÌâµÄËã·¨Éè¼Æ
6¡¢Æ½»¬ÎÞ·ìµÄ¿ÉÀ©Õ¹ÐÔ£¨Seamless scalability£©
Ö÷Òª°üÀ¨Á½²ãÒâÒåÉϵÄÀ©Õ¹ÐÔ£ºÊý¾ÝÀ©Õ¹ºÍϵͳ¹æÄ£À©Õ¹¡£?ÀíÏëµÄÈí¼þËã·¨Ó¦µ±ÄÜËæ×ÅÊý¾Ý¹æÄ£µÄÀ©´ó¶ø±íÏÖ³ö³ÖÐøµÄÓÐЧÐÔ£¬ÐÔÄÜÉϵÄϽµ³Ì¶ÈÓ¦ÓëÊý¾Ý¹æÄ£À©´óµÄ±¶ÊýÏ൱?ÔÚ¼¯Èº¹æÄ£ÉÏ£¬ÒªÇóËã·¨µÄ¼ÆËãÐÔÄÜÓ¦ÄÜËæ×ŽڵãÊýµÄÔö¼Ó±£³Ö½Ó½üÏßÐԳ̶ȵÄÔö³¤?¾ø´ó¶àÊýÏÖÓеĵ¥»úËã·¨¶¼´ï²»µ½ÒÔÉÏÀíÏëµÄÒªÇó£»°ÑÖмä½á¹ûÊý¾Ýά»¤ÔÚÄÚ´æÖеĵ¥»úËã·¨ÔÚ´ó¹æÄ£Êý¾Ý´¦ÀíʱºÜ¿ìʧЧ£»´Óµ¥»úµ½»ùÓÚ´ó¹æÄ£¼¯ÈºµÄ²¢ÐмÆËã´Ó¸ù±¾ÉÏÐèÒªÍêÈ«²»Í¬µÄËã·¨Éè¼Æ?ÆæÃîµÄÊÇ£¬MapReduce¼¸ºõÄÜʵÏÖÒÔÉÏÀíÏëµÄÀ©Õ¹ÐÔÌØÕ÷¡£
¶àÏîÑо¿·¢ÏÖ»ùÓÚMapReduceµÄ¼ÆËãÐÔÄÜ¿ÉËæ½ÚµãÊýÄ¿Ôö³¤±£³Ö½üËÆÓÚÏßÐÔµÄÔö³¤¡£
|