Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Modeler   Code  
»áÔ±   
 
   
 
 
     
   
 ¶©ÔÄ
  ¾èÖú
MapReduceÔ­ÀíÓëÉè¼ÆË¼Ïë
 
×÷Õߣºcodingwu µÄ²©¿Í À´Ô´£º²©¿ÍÔ° ·¢²¼ÓÚ£º2015-6-24
  2651  次浏览      27
 

¼òµ¥½âÊÍ MapReduce Ëã·¨

Ò»¸öÓÐȤµÄÀý×Ó

ÄãÏëÊý³öÒ»ÞûÅÆÖÐÓжàÉÙÕźÚÌÒ¡£Ö±¹Û·½Ê½ÊÇÒ»ÕÅÒ»Õżì²é²¢ÇÒÊý³öÓжàÉÙÕÅÊǺÚÌÒ£¿

MapReduce·½·¨ÔòÊÇ£º

¸øÔÚ×ùµÄËùÓÐÍæ¼ÒÖзÖÅäÕâÞûÅÆ

ÈÃÿ¸öÍæ¼ÒÊý×Ô¼ºÊÖÖеÄÅÆÓм¸ÕÅÊǺÚÌÒ£¬È»ºó°ÑÕâ¸öÊýÄ¿»ã±¨¸øÄã

Äã°ÑËùÓÐÍæ¼Ò¸æËßÄãµÄÊý×Ö¼ÓÆðÀ´£¬µÃµ½×îºóµÄ½áÂÛ

²ð·Ö

MapReduceºÏ²¢ÁËÁ½ÖÖ¾­µäº¯Êý£º

Ó³É䣨Mapping£©¶Ô¼¯ºÏÀïµÄÿ¸öÄ¿±êÓ¦ÓÃͬһ¸ö²Ù×÷¡£¼´£¬Èç¹ûÄãÏë°Ñ±íµ¥Àïÿ¸öµ¥Ôª¸ñ³ËÒÔ¶þ£¬ÄÇô°ÑÕâ¸öº¯Êýµ¥¶ÀµØÓ¦ÓÃÔÚÿ¸öµ¥Ôª¸ñÉϵIJÙ×÷¾ÍÊôÓÚmapping¡£

»¯¼ò£¨Reducing £©±éÀú¼¯ºÏÖеÄÔªËØÀ´·µ»ØÒ»¸ö×ۺϵĽá¹û¡£¼´£¬Êä³ö±íµ¥ÀïÒ»ÁÐÊý×ֵĺÍÕâ¸öÈÎÎñÊôÓÚreducing¡£

ÖØÐÂÉóÊÓÉÏÃæµÄÀý×Ó

ÖØÐÂÉóÊÓÎÒÃÇÔ­À´ÄǸö·ÖɢֽůµÄÀý×Ó£¬ÎÒÃÇÓÐMapReduceÊý¾Ý·ÖÎöµÄ»ù±¾·½·¨¡£ÓÑÇéÌáʾ£ºÕâ²»ÊǸöÑϽ÷µÄÀý×Ó¡£ÔÚÕâ¸öÀý×ÓÀÈË´ú±í¼ÆËã»ú£¬ÒòΪËûÃÇͬʱ¹¤×÷£¬ËùÒÔËûÃÇÊǸö¼¯Èº¡£ÔÚ´ó¶àÊýʵ¼ÊÓ¦ÓÃÖУ¬ÎÒÃǼÙÉèÊý¾ÝÒѾ­ÔÚÿ̨¼ÆËã»úÉÏÁË ¨C Ò²¾ÍÊÇ˵°ÑÅÆ·Ö·¢³öÈ¥²¢²»ÊÇMapReduceµÄÒ»²½¡££¨ÊÂʵÉÏ£¬ÔÚ¼ÆËã»ú¼¯ÈºÖÐÈçºÎ´æ´¢ÎļþÊÇHadoopµÄÕæÕýºËÐÄ¡££©

ͨ¹ý°ÑÅÆ·Ö¸ø¶à¸öÍæ¼Ò²¢ÇÒÈÃËûÃǸ÷×ÔÊýÊý£¬Äã¾ÍÔÚ²¢ÐÐÖ´ÐÐÔËË㣬ÒòΪÿ¸öÍæ¼Ò¶¼ÔÚͬʱ¼ÆÊý¡£Õâͬʱ°ÑÕâÏ×÷±ä³ÉÁË·Ö²¼Ê½µÄ£¬ÒòΪ¶à¸ö²»Í¬µÄÈËÔÚ½â¾öͬһ¸öÎÊÌâµÄ¹ý³ÌÖв¢²»ÐèÒªÖªµÀËûÃǵÄÁÚ¾ÓÔÚ¸Éʲô¡£

ͨ¹ý¸æËßÿ¸öÈËÈ¥ÊýÊý£¬Äã¶ÔÒ»Ïî¼ì²éÿÕÅÅÆµÄÈÎÎñ½øÐÐÁËÓ³Éä¡£ Äã²»»áÈÃËûÃǰѺÚÌÒÅÆµÝ¸øÄ㣬¶øÊÇÈÃËûÃǰÑÄãÏëÒªµÄ¶«Î÷»¯¼òΪһ¸öÊý×Ö¡£

ÁíÍâÒ»¸öÓÐÒâ˼µÄÇé¿öÊÇÅÆ·ÖÅäµÃÓжà¾ùÔÈ¡£MapReduce¼ÙÉèÊý¾ÝÊÇÏ´¹ýµÄ£¨shuffled£©- Èç¹ûËùÓкÚÌÒ¶¼·Öµ½ÁËÒ»¸öÈËÊÖÉÏ£¬ÄÇËûÊýÅÆµÄ¹ý³Ì¿ÉÄÜ±ÈÆäËûÈËÒªÂýºÜ¶à¡£

Èç¹ûÓÐ×ã¹»µÄÈ˵ϰ£¬ÎÊһЩ¸üÓÐȤµÄÎÊÌâ¾ÍÏ൱¼òµ¥ÁË - ±ÈÈç¡°Ò»ÞûÅÆµÄƽ¾ùÖµ£¨¶þʮһµãËã·¨£©ÊÇʲô¡±¡£Äã¿ÉÒÔͨ¹ýºÏ²¢¡°ËùÓÐÅÆµÄÖµµÄºÍÊÇʲô¡±¼°¡°ÎÒÃÇÓжàÉÙÕÅÅÆ¡±ÕâÁ½¸öÎÊÌâÀ´µÃµ½´ð°¸¡£ÓÃÕâ¸öºÍ³ýÒÔÅÆµÄÕÅÊý¾ÍµÃµ½ÁËÆ½¾ùÖµ¡£

MapReduceËã·¨µÄ»úÖÆÒªÔ¶±ÈÕ⸴Ôӵö࣬µ«ÊÇÖ÷Ìå˼ÏëÊÇÒ»Ö嵀 ¨C ͨ¹ý·ÖÉ¢¼ÆËãÀ´·ÖÎö´óÁ¿Êý¾Ý¡£ÎÞÂÛÊÇFacebook¡¢NASA£¬»¹ÊÇС´´Òµ¹«Ë¾£¬MapReduce¶¼ÊÇĿǰ·ÖÎö»¥ÁªÍø¼¶±ðÊý¾ÝµÄÖ÷Á÷·½·¨¡£

HadoopÖеÄMapReduce

´ó¹æÄ£Êý¾Ý´¦Àíʱ£¬MapReduceÔÚÈý¸ö²ãÃæÉϵĻù±¾¹¹Ë¼

1¡¢ÈçºÎ¶Ô¸¶´óÊý¾Ý´¦Àí£º·Ö¶øÖÎÖ®

¶ÔÏ໥¼ä²»¾ßÓмÆËãÒÀÀµ¹ØÏµµÄ´óÊý¾Ý£¬ÊµÏÖ²¢ÐÐ×î×ÔÈ»µÄ°ì·¨¾ÍÊDzÉÈ¡·Ö¶øÖÎÖ®µÄ²ßÂÔ

2¡¢ÉÏÉýµ½³éÏóÄ£ÐÍ£ºMapperÓëReducer

MPIµÈ²¢ÐмÆËã·½·¨È±Éٸ߲㲢Ðбà³ÌÄ£ÐÍ£¬ÎªÁ˿˷þÕâһȱÏÝ£¬MapReduce½è¼øÁËLispº¯ÊýʽÓïÑÔÖеÄ˼Ï룬ÓÃMapºÍReduceÁ½¸öº¯ÊýÌṩÁ˸߲ãµÄ²¢Ðбà³Ì³éÏóÄ£ÐÍ

3¡¢ÉÏÉýµ½¹¹¼Ü£ºÍ³Ò»¹¹¼Ü£¬Îª³ÌÐòÔ±Òþ²ØÏµÍ³²ãϸ½Ú

MPIµÈ²¢ÐмÆËã·½·¨È±ÉÙͳһµÄ¼ÆËã¿ò¼ÜÖ§³Ö£¬³ÌÐòÔ±ÐèÒª¿¼ÂÇÊý¾Ý´æ´¢¡¢»®·Ö¡¢·Ö·¢¡¢½á¹ûÊÕ¼¯¡¢´íÎó»Ö¸´µÈÖî¶àϸ½Ú£»Îª´Ë£¬MapReduceÉè¼Æ²¢ÌṩÁËͳһµÄ¼ÆËã¿ò¼Ü£¬Îª³ÌÐòÔ±Òþ²ØÁ˾ø´ó¶àÊýϵͳ²ãÃæµÄ´¦Àíϸ½Ú

1.¶Ô¸¶´óÊý¾Ý´¦Àí-·Ö¶øÖÎÖ®

ʲôÑùµÄ¼ÆËãÈÎÎñ¿É½øÐв¢Ðл¯¼ÆË㣿

²¢ÐмÆËãµÄµÚÒ»¸öÖØÒªÎÊÌâÊÇÈçºÎ»®·Ö¼ÆËãÈÎÎñ»òÕß¼ÆËãÊý¾ÝÒÔ±ã¶Ô»®·ÖµÄ×ÓÈÎÎñ»òÊý¾Ý¿éͬʱ½øÐмÆËã¡£µ«Ò»Ð©¼ÆËãÎÊÌâǡǡÎÞ·¨½øÐÐÕâÑùµÄ»®·Ö£¡

Nine women cannot have a baby in one month!

ÀýÈ磺Fibonacciº¯Êý: Fk+2 = Fk + Fk+1

ǰºóÊý¾ÝÏîÖ®¼ä´æÔÚºÜÇ¿µÄÒÀÀµ¹ØÏµ£¡Ö»ÄÜ´®ÐмÆË㣡

½áÂÛ£º²»¿É·Ö²ðµÄ¼ÆËãÈÎÎñ»òÏ໥¼äÓÐÒÀÀµ¹ØÏµµÄÊý¾ÝÎÞ·¨½øÐв¢ÐмÆË㣡

´óÊý¾ÝµÄ²¢Ðл¯¼ÆËã

Ò»¸ö´óÊý¾ÝÈô¿ÉÒÔ·ÖΪ¾ßÓÐͬÑù¼ÆËã¹ý³ÌµÄÊý¾Ý¿é£¬²¢ÇÒÕâЩÊý¾Ý¿éÖ®¼ä²»´æÔÚÊý¾ÝÒÀÀµ¹ØÏµ£¬ÔòÌá¸ß´¦ÀíËٶȵÄ×îºÃ°ì·¨¾ÍÊDz¢ÐмÆËã

ÀýÈ磺¼ÙÉèÓÐÒ»¸ö¾Þ´óµÄ2άÊý¾ÝÐèÒª´¦Àí(±ÈÈçÇóÿ¸öÔªËØµÄ¿ªÁ¢·½)£¬ÆäÖжÔÿ¸öÔªËØµÄ´¦ÀíÊÇÏàͬµÄ,²¢ÇÒÊý¾ÝÔªËØ¼ä²»´æÔÚÊý¾ÝÒÀÀµ¹ØÏµ,¿ÉÒÔ¿¼ÂDz»Í¬µÄ»®·Ö·½·¨½«Æä»®·ÖΪ×ÓÊý×é,ÓÉÒ»×é´¦ÀíÆ÷²¢Ðд¦Àí

2.¹¹½¨³éÏóÄ£ÐÍ-MapºÍReduce

½è¼øº¯ÊýʽÉè¼ÆÓïÑÔLispµÄÉè¼ÆË¼Ïë

?º¯Êýʽ³ÌÐòÉè¼Æ(functional programming)ÓïÑÔLispÊÇÒ»ÖÖÁÐ±í´¦Àí ÓïÑÔ(List processing)£¬ÊÇÒ»ÖÖÓ¦ÓÃÓÚÈ˹¤ÖÇÄÜ´¦ÀíµÄ·ûºÅʽÓïÑÔ£¬ÓÉMITµÄÈ˹¤ÖÇÄÜר¼Ò¡¢Í¼Áé½±»ñµÃÕßJohn McCarthyÓÚ1958ÄêÉè¼Æ·¢Ã÷¡£

?Lisp¶¨ÒåÁ˿ɶÔÁбíÔªËØ½øÐÐÕûÌå´¦ÀíµÄ¸÷ÖÖ²Ù×÷£¬È磺

È磺(add #(1 2 3 4) #(4 3 2 1)) ½«²úÉú½á¹û£º #(5 5 5 5)

?LispÖÐÒ²ÌṩÁËÀàËÆÓÚMapºÍReduceµÄ²Ù×÷

Èç: (map ¡®vector #+ #(1 2 3 4 5) #(10 11 12 13 14))

ͨ¹ý¶¨Òå¼Ó·¨mapÔËË㽫2¸öÏòÁ¿Ïà¼Ó²úÉú½á¹û#(11 13 15 17 19)

(reduce #¡¯+ #(11 13 15 17 19)) ͨ¹ý¼Ó·¨¹é²¢²úÉúÀÛ¼Ó½á¹û75

Map: ¶ÔÒ»×éÊý¾ÝÔªËØ½øÐÐijÖÖÖØ¸´Ê½µÄ´¦Àí

Reduce: ¶ÔMapµÄÖмä½á¹û½øÐÐijÖÖ½øÒ»²½µÄ½á¹ûÕû

¹Ø¼ü˼Ï룺Ϊ´óÊý¾Ý´¦Àí¹ý³ÌÖеÄÁ½¸öÖ÷Òª´¦Àí²Ù×÷ÌṩһÖÖ³éÏó»úÖÆ

MapReduceÖеÄMapºÍReduce²Ù×÷µÄ³éÏóÃèÊö

MapReduce½è¼øÁ˺¯Êýʽ³ÌÐòÉè¼ÆÓïÑÔLispÖеÄ˼Ï룬¶¨ÒåÁËÈçϵÄMapºÍReduceÁ½¸ö³éÏóµÄ±à³Ì½Ó¿Ú£¬ÓÉÓû§È¥±à³ÌʵÏÖ:

map: (k1; v1) ¡ú [(k2; v2)]

ÊäÈ룺¼üÖµ¶Ô(k1; v1)±íʾµÄÊý¾Ý

´¦Àí£ºÎĵµÊý¾Ý¼Ç¼(ÈçÎı¾ÎļþÖеÄÐУ¬»òÊý¾Ý±í¸ñÖеÄÐÐ)½«ÒÔ¡°¼üÖµ¶Ô¡±ÐÎʽ´«Èëmapº¯Êý£»mapº¯Êý½«´¦ÀíÕâЩ¼üÖµ¶Ô£¬²¢ÒÔÁíÒ»ÖÖ¼üÖµ¶ÔÐÎʽÊä³ö´¦ÀíµÄÒ»×é¼üÖµ¶ÔÖмä½á¹û¡¡¡¡¡¡[(k2; v2)]

Êä³ö£º¼üÖµ¶Ô[(k2; v2)]±íʾµÄÒ»×éÖмäÊý¾Ý

reduce: (k2; [v2]) ¡ú [(k3; v3)]

ÊäÈ룺 ÓÉmapÊä³öµÄÒ»×é¼üÖµ¶Ô[(k2; v2)] ½«±»½øÐкϲ¢´¦Àí½«Í¬ÑùÖ÷¼üϵIJ»Í¬ÊýÖµºÏ²¢µ½Ò»¸öÁбí[v2]ÖУ¬¹ÊreduceµÄÊäÈëΪ(k2; [v2])

´¦Àí£º¶Ô´«ÈëµÄÖмä½á¹ûÁбíÊý¾Ý½øÐÐijÖÖÕûÀí»ò½øÒ»²½µÄ´¦Àí,²¢²úÉú×îÖÕµÄijÖÖÐÎʽµÄ½á¹ûÊä³ö[(k3; v3)] ¡£

Êä³ö£º×îÖÕÊä³ö½á¹û[(k3; v3)]

MapºÍReduceΪ³ÌÐòÔ±ÌṩÁËÒ»¸öÇåÎúµÄ²Ù×÷½Ó¿Ú³éÏóÃèÊö

¸÷¸ömapº¯Êý¶ÔËù»®·ÖµÄÊý¾Ý²¢Ðд¦Àí£¬´Ó²»Í¬µÄÊäÈëÊý¾Ý²úÉú²»Í¬µÄÖмä½á¹ûÊä³ö

¸÷¸öreduceÒ²¸÷×Ô²¢ÐмÆË㣬¸÷×Ô¸ºÔð´¦Àí²»Í¬µÄÖмä½á¹ûÊý¾Ý¼¯ºÏ?½øÐÐreduce´¦Àí֮ǰ,±ØÐëµÈµ½ËùÓеÄmapº¯Êý×öÍ꣬Òò´Ë,ÔÚ½øÈëreduceǰÐèÒªÓÐÒ»¸öͬ²½ÕÏ(barrier);Õâ¸ö½×¶ÎÒ²¸ºÔð¶ÔmapµÄÖмä½á¹ûÊý¾Ý½øÐÐÊÕ¼¯ÕûÀí(aggregation & shuffle)´¦Àí,ÒÔ±ãreduce¸üÓÐЧµØ¼ÆËã×îÖÕ½á¹û?×îÖÕ»ã×ÜËùÓÐreduceµÄÊä³ö½á¹û¼´¿É»ñµÃ×îÖÕ½á¹û

»ùÓÚMapReduceµÄ´¦Àí¹ý³ÌʾÀý¨CÎĵµ´ÊƵͳ¼Æ£ºWordCount

ÉèÓÐ4×éԭʼÎı¾Êý¾Ý£º

Text 1: the weather is good Text 2: today is good

Text 3: good weather is good Text 4: today has good weather

´«Í³µÄ´®Ðд¦Àí·½Ê½(Java)£º

String[] text = new String[] { ¡°hello world¡±, ¡°hello every one¡±, ¡°say hello to everyone in the world¡± £ý;
HashTable ht = new HashTable();
for(i = 0; i < 3; ++i) {
StringTokenizer st = new StringTokenizer(text[i]);
while (st.hasMoreTokens()) {
String word = st.nextToken();
if(!ht.containsKey(word)) {
ht.put(word, new Integer(1));
} else {
int wc = ((Integer)ht.get(word)).intValue() +1;
// ¼ÆÊý¼Ó1
ht.put(word, new Integer(wc));
}
}
}
for (Iterator itr=ht.KeySet().iterator(); itr.hasNext(); ) {
String word = (String)itr.next();
System.out.print(word+ ¡°: ¡±+ (Integer)ht.get(word)+¡°; ¡±);

Êä³ö£ºgood: 5; has: 1; is: 3; the: 1; today: 2; weather: 3

»ùÓÚMapReduceµÄ´¦Àí¹ý³ÌʾÀý¨CÎĵµ´ÊƵͳ¼Æ£ºWordCount

MapReduce´¦Àí·½Ê½

ʹÓÃ4¸ömap½Úµã£º

map½Úµã1:

ÊäÈ룺(text1, ¡°the weather is good¡±)

Êä³ö£º(the, 1), (weather, 1), (is, 1), (good, 1)

map½Úµã2:

ÊäÈ룺(text2, ¡°today is good¡±)

Êä³ö£º(today, 1), (is, 1), (good, 1)

map½Úµã3:

ÊäÈ룺(text3, ¡°good weather is good¡±)

Êä³ö£º(good, 1), (weather, 1), (is, 1), (good, 1)

map½Úµã4:

ÊäÈ룺(text3, ¡°today has good weather¡±)

Êä³ö£º(today, 1), (has, 1), (good, 1), (weather, 1)

ʹÓÃ3¸öreduce½Úµã£º

MapReduce´¦Àí·½Ê½

MapReduceα´úÂë(ʵÏÖMapºÍReduceÁ½¸öº¯Êý)£º

Class Mapper method map(String input_key, String input_value):

// input_key: text document name

// input_value: document contents
for each word w in input_value:
EmitIntermediate(w, "1");

Class Reducer method reduce(String output_key, Iterator intermediate_values):

// output_key: a word

// output_values: a list of counts
int result = 0;
for each v in intermediate_values:
result += ParseInt(v);
Emit(output_key£¬ result);

3.ÉÏÉýµ½¹¹¼Ü-×Ô¶¯²¢Ðл¯²¢Òþ²ØµÍ²ãϸ½Ú

ÈçºÎÌṩͳһµÄ¼ÆËã¿ò¼Ü

MapReduceÌṩһ¸öͳһµÄ¼ÆËã¿ò¼Ü£¬¿ÉÍê³É£º

¼ÆËãÈÎÎñµÄ»®·ÖºÍµ÷¶È

Êý¾ÝµÄ·Ö²¼´æ´¢ºÍ»®·Ö

´¦ÀíÊý¾ÝÓë¼ÆËãÈÎÎñµÄͬ²½

½á¹ûÊý¾ÝµÄÊÕ¼¯ÕûÀí(sorting, combining, partitioning,¡­)

ϵͳͨÐÅ¡¢¸ºÔØÆ½ºâ¡¢¼ÆËãÐÔÄÜÓÅ»¯´¦Àí

´¦Àíϵͳ½Úµã³ö´í¼ì²âºÍʧЧ»Ö¸´

MapReduce×î´óµÄÁÁµã

ͨ¹ý³éÏóÄ£ÐͺͼÆËã¿ò¼Ü°ÑÐèÒª×öʲô(what need to do)Óë¾ßÌåÔõô×ö(how to do)·Ö¿ªÁË£¬Îª³ÌÐòÔ±Ìṩһ¸ö³éÏóºÍ¸ß²ãµÄ±à³Ì½Ó¿ÚºÍ¿ò¼Ü

³ÌÐòÔ±½öÐèÒª¹ØÐÄÆäÓ¦ÓòãµÄ¾ßÌ弯ËãÎÊÌ⣬½öÐè±àдÉÙÁ¿µÄ´¦ÀíÓ¦Óñ¾Éí¼ÆËãÎÊÌâµÄ³ÌÐò´úÂë

ÈçºÎ¾ßÌåÍê³ÉÕâ¸ö²¢ÐмÆËãÈÎÎñËùÏà¹ØµÄÖî¶àϵͳ²ãϸ½Ú±»Òþ²ØÆðÀ´,½»¸ø¼ÆËã¿ò¼ÜÈ¥´¦Àí£º´Ó·Ö²¼´úÂëµÄÖ´ÐУ¬µ½´óµ½ÊýǧСµ½µ¥¸ö½Úµã¼¯ÈºµÄ×Ô¶¯µ÷¶ÈʹÓÃ

MapReduceÌṩµÄÖ÷Òª¹¦ÄÜ

?ÈÎÎñµ÷¶È£ºÌá½»µÄÒ»¸ö¼ÆËã×÷Òµ(job)½«±»»®·ÖΪºÜ¶à¸ö¼ÆËãÈÎÎñ(tasks), ÈÎÎñµ÷¶È¹¦ÄÜÖ÷Òª¸ºÔðΪÕâЩ»®·ÖºóµÄ¼ÆËãÈÎÎñ·ÖÅäºÍµ÷¶È¼ÆËã½Úµã(map½Úµã»òreducer½Úµã); ͬʱ¸ºÔð¼à¿ØÕâЩ½ÚµãµÄÖ´ÐÐ״̬, ²¢¸ºÔðmap½ÚµãÖ´ÐеÄͬ²½¿ØÖÆ(barrier); Ò²¸ºÔð½øÐÐһЩ¼ÆËãÐÔÄÜÓÅ»¯´¦Àí, Èç¶Ô×îÂýµÄ¼ÆËãÈÎÎñ²ÉÓö౸·ÝÖ´ÐС¢Ñ¡×î¿ìÍê³ÉÕß×÷Ϊ½á¹û

?Êý¾Ý/´úÂ뻥¶¨Î»£ºÎªÁ˼õÉÙÊý¾ÝͨÐÅ£¬Ò»¸ö»ù±¾Ô­ÔòÊDZ¾µØ»¯Êý¾Ý´¦Àí(locality)£¬¼´Ò»¸ö¼ÆËã½Úµã¾¡¿ÉÄÜ´¦ÀíÆä±¾µØ´ÅÅÌÉÏËù·Ö²¼´æ´¢µÄÊý¾Ý£¬ÕâʵÏÖÁË´úÂëÏòÊý¾ÝµÄÇ¨ÒÆ£»µ±ÎÞ·¨½øÐÐÕâÖÖ±¾µØ»¯Êý¾Ý´¦Àíʱ£¬ÔÙѰÕÒÆäËü¿ÉÓýڵ㲢½«Êý¾Ý´ÓÍøÂçÉÏ´«Ë͸ø¸Ã½Úµã(Êý¾ÝÏò´úÂëÇ¨ÒÆ)£¬µ«½«¾¡¿ÉÄÜ´ÓÊý¾ÝËùÔڵı¾µØ»ú¼ÜÉÏѰÕÒ¿ÉÓýڵãÒÔ¼õÉÙͨÐÅÑÓ³Ù

?³ö´í´¦Àí£ºÒԵͶËÉÌÓ÷þÎñÆ÷¹¹³ÉµÄ´ó¹æÄ£MapReduce¼ÆË㼯ȺÖÐ,½ÚµãÓ²¼þ(Ö÷»ú¡¢´ÅÅÌ¡¢ÄÚ´æµÈ)³ö´íºÍÈí¼þÓÐbugÊdz£Ì¬£¬Òò´Ë,MapReducerÐèÒªÄܼì²â²¢¸ôÀë³ö´í½Úµã£¬²¢µ÷¶È·ÖÅäеĽڵã½Ó¹Ü³ö´í½ÚµãµÄ¼ÆËãÈÎÎñ

·Ö²¼Ê½Êý¾Ý´æ´¢ÓëÎļþ¹ÜÀí£ºº£Á¿Êý¾Ý´¦ÀíÐèÒªÒ»¸öÁ¼ºÃµÄ·Ö²¼Êý¾Ý´æ´¢ºÍÎļþ¹ÜÀíϵͳ֧³Å,¸ÃÎļþϵͳÄܹ»°Ñº£Á¿Êý¾Ý·Ö²¼´æ´¢ÔÚ¸÷¸ö½ÚµãµÄ±¾µØ´ÅÅÌÉÏ,µ«±£³ÖÕû¸öÊý¾ÝÔÚÂß¼­ÉϳÉΪһ¸öÍêÕûµÄÊý¾ÝÎļþ£»ÎªÁËÌṩÊý¾Ý´æ´¢ÈÝ´í»úÖÆ,¸ÃÎļþϵͳ»¹ÒªÌṩÊý¾Ý¿éµÄ¶à±¸·Ý´æ´¢¹ÜÀíÄÜÁ¦

CombinerºÍPartitioner:ΪÁ˼õÉÙÊý¾ÝͨÐÅ¿ªÏú,Öмä½á¹ûÊý¾Ý½øÈëreduce½ÚµãǰÐèÒª½øÐкϲ¢(combine)´¦Àí,°Ñ¾ßÓÐͬÑùÖ÷¼üµÄÊý¾ÝºÏ²¢µ½Ò»Æð±ÜÃâÖØ¸´´«ËÍ; Ò»¸öreducer½ÚµãËù´¦ÀíµÄÊý¾Ý¿ÉÄÜ»áÀ´×Ô¶à¸ömap½Úµã, Òò´Ë, map½ÚµãÊä³öµÄÖмä½á¹ûÐèʹÓÃÒ»¶¨µÄ²ßÂÔ½øÐÐÊʵ±µÄ»®·Ö(partitioner)´¦Àí£¬±£Ö¤Ïà¹ØÊý¾Ý·¢Ë͵½Í¬Ò»¸öreducer½Úµã

»ùÓÚMapºÍReduceµÄ²¢ÐмÆËãÄ£ÐÍ

4.MapReduceµÄÖ÷ÒªÉè¼ÆË¼ÏëºÍÌØÕ÷

1¡¢Ïò¡°Í⡱ºáÏòÀ©Õ¹£¬¶ø·ÇÏò¡°ÉÏ¡±×ÝÏòÀ©Õ¹£¨Scale ¡°out¡±, not ¡°up¡±£©

¼´MapReduce¼¯ÈºµÄ¹¹ÖþÑ¡Óü۸ñ±ãÒË¡¢Ò×ÓÚÀ©Õ¹µÄ´óÁ¿µÍ¶ËÉÌÓ÷þÎñÆ÷£¬¶ø·Ç¼Û¸ñ°º¹ó¡¢²»Ò×À©Õ¹µÄ¸ß¶Ë·þÎñÆ÷£¨SMP£©?µÍ¶Ë·þÎñÆ÷Êг¡Óë¸ßÈÝÁ¿Desktop PCÓÐÖØµþµÄÊг¡£¬Òò´Ë£¬ÓÉÓÚÏ໥¼ä¼Û¸ñµÄ¾ºÕù¡¢¿É»¥»»µÄ²¿¼þ¡¢ºÍ¹æÄ£¾­¼ÃЧӦ£¬Ê¹µÃµÍ¶Ë·þÎñÆ÷±£³Ö½ÏµÍµÄ¼Û¸ñ?»ùÓÚTPC-CÔÚ2007Äêµ×µÄÐÔÄÜÆÀ¹À½á¹û,Ò»¸öµÍ¶Ë·þÎñÆ÷ƽ̨Óë¸ß¶ËµÄ¹²Ïí´æ´¢Æ÷½á¹¹µÄ·þÎñÆ÷ƽ̨Ïà±È,ÆäÐԼ۱ȴóÔ¼Òª¸ß4±¶;Èç¹û°ÑÍâ´æ¼Û¸ñ³ýÍâ,µÍ¶Ë·þÎñÆ÷ÐԼ۱ȴóÔ¼Ìá¸ß12±¶?¶ÔÓÚ´ó¹æÄ£Êý¾Ý´¦Àí£¬ÓÉÓÚÓдóÁ¿Êý¾Ý´æ´¢ÐèÒª£¬ÏÔ¶øÒ×¼û£¬»ùÓڵͶ˷þÎñÆ÷µÄ¼¯ÈºÔ¶±È»ùÓڸ߶˷þÎñÆ÷µÄ¼¯ÈºÓÅÔ½£¬Õâ¾ÍÊÇΪʲôMapReduce²¢ÐмÆË㼯Ⱥ»á»ùÓڵͶ˷þÎñÆ÷ʵÏÖ

2¡¢Ê§Ð§±»ÈÏΪÊdz£Ì¬£¨Assume failures are common£©

MapReduce¼¯ÈºÖÐʹÓôóÁ¿µÄµÍ¶Ë·þÎñÆ÷(GoogleĿǰÔÚÈ«Çò¹²Ê¹ÓðÙÍǫ̀ÒÔÉϵķþÎñÆ÷½Úµã),Òò´Ë£¬½ÚµãÓ²¼þʧЧºÍÈí¼þ³ö´íÊdz£Ì¬£¬Òò¶ø£º

Ò»¸öÁ¼ºÃÉè¼Æ¡¢¾ßÓÐÈÝ´íÐԵIJ¢ÐмÆËãϵͳ²»ÄÜÒòΪ½ÚµãʧЧ¶øÓ°Ïì¼ÆËã·þÎñµÄÖÊÁ¿£¬ÈκνڵãʧЧ¶¼²»Ó¦µ±µ¼Ö½á¹ûµÄ²»Ò»Ö»ò²»È·¶¨ÐÔ£»ÈκÎÒ»¸ö½ÚµãʧЧʱ£¬ÆäËü½ÚµãÒªÄܹ»ÎÞ·ì½Ó¹ÜʧЧ½ÚµãµÄ¼ÆËãÈÎÎñ£»µ±Ê§Ð§½Úµã»Ö¸´ºóÓ¦ÄÜ×Ô¶¯ÎÞ·ì¼ÓÈ뼯Ⱥ£¬¶ø²»ÐèÒª¹ÜÀíÔ±È˹¤½øÐÐϵͳÅäÖÃ

MapReduce²¢ÐмÆËãÈí¼þ¿ò¼ÜʹÓÃÁ˶àÖÖÓÐЧµÄ»úÖÆ£¬Èç½Úµã×Ô¶¯ÖØÆô¼¼Êõ£¬Ê¹¼¯ÈººÍ¼ÆËã¿ò¼Ü¾ßÓжԸ¶½ÚµãʧЧµÄ½¡×³ÐÔ£¬ÄÜÓÐЧ´¦ÀíʧЧ½ÚµãµÄ¼ì²âºÍ»Ö¸´¡£

3¡¢°Ñ´¦ÀíÏòÊý¾ÝÇ¨ÒÆ£¨Moving processing to the data£©

´«Í³¸ßÐÔÄܼÆËãϵͳͨ³£ÓкܶദÀíÆ÷½ÚµãÓëһЩÍâ´æ´¢Æ÷½ÚµãÏàÁ¬£¬ÈçÓÃÇøÓò´æ´¢ÍøÂç(SAN,Storage Area Network)Á¬½ÓµÄ´ÅÅÌÕóÁУ¬Òò´Ë£¬´ó¹æÄ£Êý¾Ý´¦ÀíʱÍâ´æÎļþÊý¾ÝI/O·ÃÎÊ»á³ÉΪһ¸öÖÆÔ¼ÏµÍ³ÐÔÄܵį¿¾±¡£?ΪÁ˼õÉÙ´ó¹æÄ£Êý¾Ý²¢ÐмÆËãϵͳÖеÄÊý¾ÝͨÐÅ¿ªÏú£¬´úÖ®ÒÔ°ÑÊý¾Ý´«Ë͵½´¦Àí½Úµã(Êý¾ÝÏò´¦ÀíÆ÷»ò´úÂëÇ¨ÒÆ)£¬Ó¦µ±¿¼Âǽ«´¦ÀíÏòÊý¾Ý¿¿Â£ºÍÇ¨ÒÆ¡£?MapReduce²ÉÓÃÁËÊý¾Ý/´úÂ뻥¶¨Î»µÄ¼¼Êõ·½·¨£¬¼ÆËã½Úµã½«Ê×ÏȽ«¾¡Á¿¸ºÔð¼ÆËãÆä±¾µØ´æ´¢µÄÊý¾Ý,ÒÔ·¢»ÓÊý¾Ý±¾µØ»¯Ìصã(locality),½öµ±½ÚµãÎÞ·¨´¦Àí±¾µØÊý¾Ýʱ£¬ÔÙ²ÉÓþͽüÔ­ÔòѰÕÒÆäËü¿ÉÓüÆËã½Úµã£¬²¢°ÑÊý¾Ý´«Ë͵½¸Ã¿ÉÓüÆËã½Úµã¡£

4¡¢Ë³Ðò´¦ÀíÊý¾Ý¡¢±ÜÃâËæ»ú·ÃÎÊÊý¾Ý£¨Process data sequentially and avoid random access£©

´ó¹æÄ£Êý¾Ý´¦ÀíµÄÌØµã¾ö¶¨ÁË´óÁ¿µÄÊý¾Ý¼Ç¼²»¿ÉÄÜ´æ·ÅÔÚÄÚ´æ¡¢¶øÖ»¿ÉÄÜ·ÅÔÚÍâ´æÖнøÐд¦Àí¡£?´ÅÅ̵Ä˳Ðò·ÃÎʺÍËæ¼´·ÃÎÊÔÚÐÔÄÜÉÏÓо޴óµÄ²îÒì

Àý£º100ÒÚ(1010)¸öÊý¾Ý¼Ç¼(ÿ¼Ç¼100B,¹²¼Æ1TB)µÄÊý¾Ý¿â

¸üÐÂ1%µÄ¼Ç¼(Ò»¶¨ÊÇËæ»ú·ÃÎÊ)ÐèÒª1¸öÔÂʱ¼ä£»¶øË³Ðò·ÃÎʲ¢ÖØÐ´ËùÓÐÊý¾Ý¼Ç¼½öÐè1Ììʱ¼ä£¡

MapReduceÉè¼ÆÎªÃæÏò´óÊý¾Ý¼¯Åú´¦ÀíµÄ²¢ÐмÆËãϵͳ£¬ËùÓмÆËã¶¼±»×éÖ¯³ÉºÜ³¤µÄÁ÷ʽ²Ù×÷£¬ÒÔ±ãÄÜÀûÓ÷ֲ¼ÔÚ¼¯ÈºÖдóÁ¿½ÚµãÉÏ´ÅÅ̼¯ºÏµÄ¸ß´«Êä´ø¿í¡£

5¡¢ÎªÓ¦Óÿª·¢ÕßÒþ²ØÏµÍ³²ãϸ½Ú£¨Hide system-level details from the application developer£©

Èí¼þ¹¤³Ìʵ¼ùÖ¸ÄÏÖУ¬×¨Òµ³ÌÐòÔ±ÈÏΪ֮ËùÒÔд³ÌÐòÀ§ÄÑ£¬ÊÇÒòΪ³ÌÐòÔ±ÐèÒª¼Çס̫¶àµÄ±à³Ìϸ½Ú(´Ó±äÁ¿Ãûµ½¸´ÔÓËã·¨µÄ±ß½çÇé¿ö´¦Àí)£¬Õâ¶Ô´óÄÔ¼ÇÒäÊÇÒ»¸ö¾Þ´óµÄÈÏÖª¸ºµ£,ÐèÒª¸ß¶È¼¯ÖÐ×¢ÒâÁ¦?¶ø²¢ÐгÌÐò±àдÓиü¶àÀ§ÄÑ£¬ÈçÐèÒª¿¼ÂǶàÏß³ÌÖÐÖîÈçͬ²½µÈ¸´ÔÓ·±ËöµÄϸ½Ú£¬ÓÉÓÚ²¢·¢Ö´ÐÐÖеIJ»¿ÉÔ¤²âÐÔ£¬³ÌÐòµÄµ÷ÊÔ²é´íҲʮ·ÖÀ§ÄÑ£»´ó¹æÄ£Êý¾Ý´¦Àíʱ³ÌÐòÔ±ÐèÒª¿¼ÂÇÖîÈçÊý¾Ý·Ö²¼´æ´¢¹ÜÀí¡¢Êý¾Ý·Ö·¢¡¢Êý¾ÝͨÐźÍͬ²½¡¢¼ÆËã½á¹ûÊÕ¼¯µÈÖî¶àϸ½ÚÎÊÌâ?MapReduceÌṩÁËÒ»ÖÖ³éÏó»úÖÆ½«³ÌÐòÔ±Óëϵͳ²ãϸ½Ú¸ôÀ뿪À´£¬³ÌÐòÔ±½öÐèÃèÊöÐèÒª¼ÆËãʲô(what to compute), ¶ø¾ßÌåÔõôȥ×ö(how to compute)¾Í½»ÓÉϵͳµÄÖ´Ðпò¼Ü´¦Àí£¬ÕâÑù³ÌÐòÔ±¿É´Óϵͳ²ãϸ½ÚÖнâ·Å³öÀ´£¬¶øÖÂÁ¦ÓÚÆäÓ¦Óñ¾Éí¼ÆËãÎÊÌâµÄËã·¨Éè¼Æ

6¡¢Æ½»¬ÎÞ·ìµÄ¿ÉÀ©Õ¹ÐÔ£¨Seamless scalability£©

Ö÷Òª°üÀ¨Á½²ãÒâÒåÉϵÄÀ©Õ¹ÐÔ£ºÊý¾ÝÀ©Õ¹ºÍϵͳ¹æÄ£À©Õ¹¡£?ÀíÏëµÄÈí¼þËã·¨Ó¦µ±ÄÜËæ×ÅÊý¾Ý¹æÄ£µÄÀ©´ó¶ø±íÏÖ³ö³ÖÐøµÄÓÐЧÐÔ£¬ÐÔÄÜÉϵÄϽµ³Ì¶ÈÓ¦ÓëÊý¾Ý¹æÄ£À©´óµÄ±¶ÊýÏ൱?ÔÚ¼¯Èº¹æÄ£ÉÏ£¬ÒªÇóËã·¨µÄ¼ÆËãÐÔÄÜÓ¦ÄÜËæ×ŽڵãÊýµÄÔö¼Ó±£³Ö½Ó½üÏßÐԳ̶ȵÄÔö³¤?¾ø´ó¶àÊýÏÖÓеĵ¥»úËã·¨¶¼´ï²»µ½ÒÔÉÏÀíÏëµÄÒªÇó£»°ÑÖмä½á¹ûÊý¾Ýά»¤ÔÚÄÚ´æÖеĵ¥»úËã·¨ÔÚ´ó¹æÄ£Êý¾Ý´¦ÀíʱºÜ¿ìʧЧ£»´Óµ¥»úµ½»ùÓÚ´ó¹æÄ£¼¯ÈºµÄ²¢ÐмÆËã´Ó¸ù±¾ÉÏÐèÒªÍêÈ«²»Í¬µÄËã·¨Éè¼Æ?ÆæÃîµÄÊÇ£¬MapReduce¼¸ºõÄÜʵÏÖÒÔÉÏÀíÏëµÄÀ©Õ¹ÐÔÌØÕ÷¡£ ¶àÏîÑо¿·¢ÏÖ»ùÓÚMapReduceµÄ¼ÆËãÐÔÄÜ¿ÉËæ½ÚµãÊýÄ¿Ôö³¤±£³Ö½üËÆÓÚÏßÐÔµÄÔö³¤¡£

   
2651 ´Îä¯ÀÀ       27
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ
×îл¼Æ»®
DeepSeekÔÚÈí¼þ²âÊÔÓ¦ÓÃʵ¼ù 4-12[ÔÚÏß]
DeepSeek´óÄ£ÐÍÓ¦Óÿª·¢Êµ¼ù 4-19[ÔÚÏß]
UAF¼Ü¹¹ÌåϵÓëʵ¼ù 4-11[±±¾©]
AIÖÇÄÜ»¯Èí¼þ²âÊÔ·½·¨Óëʵ¼ù 5-23[ÉϺ£]
»ùÓÚ UML ºÍEA½øÐзÖÎöÉè¼Æ 4-26[±±¾©]
ÒµÎñ¼Ü¹¹Éè¼ÆÓ뽨ģ 4-18[±±¾©]

MySQLË÷Òý±³ºóµÄÊý¾Ý½á¹¹
MySQLÐÔÄܵ÷ÓÅÓë¼Ü¹¹Éè¼Æ
SQL ServerÊý¾Ý¿â±¸·ÝÓë»Ö¸´
ÈÃÊý¾Ý¿â·ÉÆðÀ´ 10´óDB2ÓÅ»¯
oracleµÄÁÙʱ±í¿Õ¼äдÂú´ÅÅÌ
Êý¾Ý¿âµÄ¿çƽ̨Éè¼Æ

²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿â
¸ß¼¶Êý¾Ý¿â¼Ü¹¹Éè¼ÆÊ¦
HadoopÔ­ÀíÓëʵ¼ù
Oracle Êý¾Ý²Ö¿â
Êý¾Ý²Ö¿âºÍÊý¾ÝÍÚ¾ò
OracleÊý¾Ý¿â¿ª·¢Óë¹ÜÀí

GE Çø¿éÁ´¼¼ÊõÓëʵÏÖÅàѵ
º½Ìì¿Æ¹¤Ä³×Ó¹«Ë¾ Nodejs¸ß¼¶Ó¦Óÿª·¢
ÖÐÊ¢Òæ»ª ׿Խ¹ÜÀíÕß±ØÐë¾ß±¸µÄÎåÏîÄÜÁ¦
ijÐÅÏ¢¼¼Êõ¹«Ë¾ PythonÅàѵ
ij²©²ÊITϵͳ³§ÉÌ Ò×ÓÃÐÔ²âÊÔÓëÆÀ¹À
ÖйúÓÊ´¢ÒøÐÐ ²âÊÔ³ÉÊì¶ÈÄ£Ðͼ¯³É(TMMI)
ÖÐÎïÔº ²úÆ·¾­ÀíÓë²úÆ·¹ÜÀí