Patterns and examples for different scenarios
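MapReduce processing has created an entire set of new paradigms and structures for processing and building different types of queries. However, making the best use of Hadoop means writing the appropriate MapReduce query to process the information. This article looks at a number of different scenarios, with cookbook-style examples of how to develop different types of queries.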
Advanced text processing
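Processing text is a common use of MapReduce, because text processing is relatively complex and processor-intensive. The basic word count is often used to demonstrate Hadoop's ability to process large volumes of text and produce a basic summary of the content.
To get the word count, split the text from an input file (using a basic string tokenizer) into individual words, each with a count of 1, and use a Reduce to total the count for each word. For example, from the phrase the quick brown fox jumps over the lazy dog, the Map phase generates the output in Listing 1.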
Listing 1. Output of the Map phase
the, 1
quick, 1
brown, 1
fox, 1
jumps, 1
over, 1
the, 1
lazy, 1
dog, 1
The Reduce phase then totals the number of occurrences of each unique word, giving the output shown in Listing 2.
Listing 2. Output of the Reduce phase
the, 2
quick, 1
brown, 1
fox, 1
jumps, 1
over, 1
lazy, 1
dog, 1
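The two phases above can be sketched in plain Java without any Hadoop classes. This is only an illustration under stated assumptions: the class name WordCountSketch and its map and reduce methods are hypothetical helpers, and an in-memory list stands in for the shuffle that Hadoop performs between the phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical, Hadoop-free sketch of the word count described above.
final class WordCountSketch {
    // Map phase: split the text into words and emit a (word, 1) pair for each.
    static List<Map.Entry<String, Integer>> map(String text) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce phase: add up the counts emitted for each unique word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String phrase = "the quick brown fox jumps over the lazy dog";
        System.out.println(reduce(map(phrase)));
    }
}
```

Because the totals are kept in insertion order, the printed map lists the words in the same order as Listing 2, with the as the only word counted twice.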
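Although this approach works for a basic word count, you often want to identify occurrences of important phrases or words. For example, take the reviews of different movies and videos on Amazon.
Using information from the Stanford University big data project, you can download movie review data (see Resources). The data includes the score and helpfulness of the original review (as reported on Amazon), as shown in Listing 3.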
Listing 3. Downloaded movie review data
product/productId: B003AI2VGA
review/userId: A3QYDL5CDNYN66
review/profileName: abra "a devoted reader"
review/helpfulness: 0/0
review/score: 2.0
review/time: 1229040000
review/summary: Pretty pointless fictionalization
review/text: The murders in Juarez are real. This movie is a badly acted fantasy of revenge and holy intercession. If there is a good movie about Juarez, I don't know what it is, but it is not this one.
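Notice that although the reviewer gave the movie a score of 2 (where 1 is worst and 5 is best), the review text describes it as a very bad movie. We need a confidence score so that we can tell whether the score given and the actual review match each other.
Many tools can perform advanced heuristic analysis, but basic processing can be achieved with a simple index or regular expression. We can then count the positive and negative regular expression matches to obtain a score for a movie.
Figure 1. Counting positive and negative regular expression matches to obtain a score for a movie
The figure shows how the movie score is obtained from the word scores of the raw data.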
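For the Map part, count the individual words or phrases within the movie review, producing a single count for the positive and negative terms. The Map operation computes the score for the movie from a product review, and the Reduce operation then sums those scores by product ID to give a positive or negative rating. The Map therefore looks like Listing 4.
Listing 4. A Map function that produces a single count for positive and negative comments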
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// List of positive words/phrases
static String[] pwords = {"good","excellent","brilliant movie"};
// List of negative words/phrases
static String[] nwords = {"poor","bad","unwatchable"};

int count = 0;
// Add one for every whole-word match of a positive word or phrase
for (String word : pwords) {
    String REGEX = "\\b" + word + "\\b";
    Pattern p = Pattern.compile(REGEX);
    Matcher m = p.matcher(INPUT);
    while (m.find()) {
        count++;
    }
}
// Subtract one for every whole-word match of a negative word or phrase
for (String word : nwords) {
    String REGEX = "\\b" + word + "\\b";
    Pattern p = Pattern.compile(REGEX);
    Matcher m = p.matcher(INPUT);
    while (m.find()) {
        count--;
    }
}
output.collect(productId, count);
The Reduce can then be computed like a traditional sum of the content.
Listing 5. A Reduce function that sums the positive and negative comments by product ID
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key,
                       Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
The result is a confidence score for the reviews. You can expand the word lists to include the phrases you want to match.
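To see why the word lists usually need expanding, the scoring logic can be run on the sample review from Listing 3. The following is a minimal, Hadoop-free sketch: the class ReviewScoreSketch and its method names are hypothetical, and Pattern.quote is added here as a precaution so that phrases are matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: scores one review with the word lists from Listing 4.
final class ReviewScoreSketch {
    static final String[] POSITIVE = {"good", "excellent", "brilliant movie"};
    static final String[] NEGATIVE = {"poor", "bad", "unwatchable"};

    // Review text copied from Listing 3.
    static final String SAMPLE_REVIEW =
        "The murders in Juarez are real. This movie is a badly acted fantasy of "
        + "revenge and holy intercession. If there is a good movie about Juarez, "
        + "I don't know what it is, but it is not this one.";

    // Count whole-word matches of every phrase in the text.
    static int countMatches(String[] phrases, String text) {
        int count = 0;
        for (String phrase : phrases) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(phrase) + "\\b");
            Matcher m = p.matcher(text);
            while (m.find()) {
                count++;
            }
        }
        return count;
    }

    // Positive matches add to the score, negative matches subtract from it.
    static int score(String text) {
        return countMatches(POSITIVE, text) - countMatches(NEGATIVE, text);
    }

    public static void main(String[] args) {
        System.out.println(score(SAMPLE_REVIEW));
    }
}
```

With these lists, the sample review scores +1: good matches once, while badly does not match bad because of the word boundary. A clearly negative review therefore looks mildly positive until phrases such as badly acted are added to the negative list.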
Reading and writing JSON data
JSON has become a practical data interchange format. Its practicality comes partly from its simple nature and structure, and partly from how easily it can be parsed in so many languages and environments.
When parsing incoming JSON data, the most common format is a single JSON record per notional input line.
Listing 6. A single JSON record per notional input line
{ "productId" : "B003AI2VGA", "score": 2.0, "text" : "..." }
{ "productId" : "B007BI4DAT", "score": 3.4, "text" : "..." }
{ "productId" : "B006AI2FDH", "score": 4.1, "text" : "..." }
These records can be parsed easily by converting the incoming string into a JSON object using a suitable class such as GSON. When you use this approach with GSON, you need to deserialize into a predetermined class.
Listing 7. Deserializing into a predetermined class
class amazonRank {
    private String productId;
    private float score;
    private String text;
    amazonRank() { }
}
Parse the incoming text as shown below.
Listing 8. Parsing the incoming text
public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    try {
        amazonRank rank = gson.fromJson(value.toString(), amazonRank.class);
        ...
To write JSON data, do the reverse. Create the output class that matches the JSON output you want within your MapReduce definition, then use the GSON class to convert it into a JSON representation of this structure.
Listing 9. Writing JSON data
class recipeRecord {
    private String recipe;
    private String recipetext;
    private int recipeid;
    private float calories;
    private float fat;
    private float weight;
    recipeRecord() { }
}
Now you can populate an instance of the object during output and convert it into a single JSON record.
Listing 10. Populating an instance of the object during output
recipeRecord recipe = new recipeRecord();
// recipeid is declared as an int, so parse the key text
recipe.recipeid = Integer.parseInt(key.toString());
recipe.calories = sum;
Gson json = new Gson();
output.collect(key, new Text(json.toJson(recipe)));
If you use a third-party library in your Hadoop processing job, make sure you include the library JAR file along with your MapReduce code:
$ jar -cvf recipenutrition.jar -C recipenutrition/* google-gson/gson.jar
Although it sits outside the Hadoop MapReduce processor, another alternative is Jaql, which parses and processes JSON data directly.
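Merging data sets
Three types of merge are typically performed in a MapReduce job:
Combining the content of multiple files that have the same structure.
Combining the content of multiple files that have a similar structure and that you want to bring together.
Joining data from multiple sources that relates to a specific ID or keyword.
The first option is best handled outside a typical MapReduce job, because it can be done with the Hadoop Distributed File System (HDFS) getmerge operation or something similar. This operation accepts a single directory as input and outputs to a specified file. For example, $ hadoop fs -getmerge srcfiles megafile merges all of the files in the srcfiles directory into a single file, megafile.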
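Merging similar files
To merge similar but not identical files, the main issue is how to identify the format used on input and how to specify the format of the output. For example, given one file with name, phone, count and a second file with name, email, phone, count, it is your responsibility to determine which file is which and to perform the Map to generate the structure you need. For more complex records, you may need to perform more complex merging of fields with and without null values during the Map phase to generate the information.
In fact, Hadoop is not an ideal choice for this process unless you also use it as an opportunity to simplify, count, or reduce the information. That is, you identify the number of incoming records and the possible formats, and perform a Reduce on the fields you want to select.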
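As a rough illustration of the Map-side normalization described above, the sketch below converts either input layout into one unified layout. It rests on assumptions that are not in the original: the format is recognized purely by field count, the unified layout is name,email,phone,count with an empty email where none exists, and the class name and sample values are hypothetical.

```java
// Hypothetical sketch: normalize two similar record layouts into one.
final class SimilarRecordSketch {
    // Accepts "name, phone, count" or "name, email, phone, count" and
    // returns "name,email,phone,count" (email left empty when absent).
    static String normalize(String line) {
        String[] f = line.trim().split("\\s*,\\s*", -1);
        if (f.length == 3) {
            // First layout: no email field, so emit an empty one.
            return String.join(",", f[0], "", f[1], f[2]);
        } else if (f.length == 4) {
            // Second layout: already has all four fields.
            return String.join(",", f[0], f[1], f[2], f[3]);
        }
        throw new IllegalArgumentException("Unrecognized record: " + line);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Ann, 555-0100, 3"));
        System.out.println(normalize("Bob, bob@example.com, 555-0101, 5"));
    }
}
```

In a real job this logic would sit inside the Map function, with the name (or another shared field) emitted as the key so that the Reduce can count or combine the normalized records.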
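Joins
Although there are potential solutions for performing joins, they often rely on processing the information in a structured way and then using that structure to decide what to do with the output.
For example, given two different threads of information (such as email address and number of emails sent, and email address and number of emails received), the aim is to merge the data into a single output format. These are the input files: email, sent-count and email, received-count. The output should be in this format: email, sent-count, received-count.
Process the incoming files and output the content in different ways so that the files and data can be accessed and generated differently. Then rely on the Reduce function to perform the reduction. In most cases, this is a multi-stage process:
One stage processes the "sent" emails, outputting the information as email, fake#sent.
Note: We use the fake prefix to adjust the ordering, so that the data can be collated on the fake prefix rather than on the received value. This allows the data to be joined in a fake, implied order.
One stage processes the "received" emails, outputting the information as email, received.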
As the Map function reads the files, it generates rows like these.
Listing 11. Generated rows
dev@null.org,0#sent
dev@null.org,received
The Map identifies the input record and outputs a unified version with a single key. The Reduce then works from the sent#received structure of each value to decide whether the value should be split into its two counts or simply added as a plain received value.
Listing 12. Outputting a unified version with a single key
int sent = 0;
int received = 0;
for (Text val : values) {
    String strVal = val.toString().trim();
    if (strVal.contains("#")) {
        // If the content contains a hash, assume it's sent and received.
        // In the fake#sent layout, token 0 is the placeholder and token 1 is the sent count.
        String[] tokens = strVal.split("#");
        int recvthis = Integer.parseInt(tokens[0]);
        int sentthis = Integer.parseInt(tokens[1]);
        received = received + recvthis;
        sent = sent + sentthis;
    } else {
        // Otherwise, it's just the received value
        received = received + Integer.parseInt(strVal);
    }
}
// Emit email -> "sent-count,received-count"
context.write(key, new Text(sent + "," + received));
In this case, we rely on the reduction within Hadoop itself to simplify the output data by key (in this case, the email address) into the information we need. Because the information is keyed on the email address, the records can easily be merged on that key.
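The whole join can be simulated in memory to check the logic before writing the Hadoop job. The sketch below is an assumption-laden illustration, not the article's implementation: EmailJoinSketch and its methods are hypothetical names, a TreeMap stands in for Hadoop's grouping by key, and the counts in main are made-up sample values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the sent/received email join.
final class EmailJoinSketch {
    // Map for the "sent" file: emit email -> 0#sent (fake prefix, then sent count).
    static String[] mapSent(String line) {
        String[] f = line.split(",");
        return new String[] {f[0].trim(), "0#" + f[1].trim()};
    }

    // Map for the "received" file: emit email -> received.
    static String[] mapReceived(String line) {
        String[] f = line.split(",");
        return new String[] {f[0].trim(), f[1].trim()};
    }

    // Reduce: values with '#' hold placeholder#sent, plain values hold received.
    static String reduce(String email, List<String> values) {
        int sent = 0;
        int received = 0;
        for (String value : values) {
            if (value.contains("#")) {
                String[] tokens = value.split("#");
                received += Integer.parseInt(tokens[0]);
                sent += Integer.parseInt(tokens[1]);
            } else {
                received += Integer.parseInt(value);
            }
        }
        return email + "," + sent + "," + received;
    }

    // Group the mapped values by email (the shuffle), then reduce each group.
    static List<String> join(List<String> sentLines, List<String> receivedLines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : sentLines) {
            String[] kv = mapSent(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        for (String line : receivedLines) {
            String[] kv = mapReceived(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            output.add(reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("dev@null.org,5"),
                                Arrays.asList("dev@null.org,7")));
    }
}
```

Each output row has the form email, sent-count, received-count, which is the target format described above.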
Tricks with keys
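Remember that some aspects of the MapReduce process can be used to our advantage. In essence, MapReduce is a two-stage process:
The Map stage accesses the data, picks out the information you need, and then outputs it using a key and the associated information.
The Reduce stage simplifies the data by using the common key to merge, summarize, or count the mapped data into a simpler form.
The key is an important concept because it can be used to format and summarize the data in different ways. For example, if you plan to reduce data about country and city populations, you can output just one key to reduce or summarize the data by country.
Listing 13. Outputting only one key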
To summarize by country and city, the key is a composite of the two.
Listing 14. The key as a composite of country and city
France#Paris
France#Lyon
France#Grenoble
United Kingdom#Birmingham
United Kingdom#London
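The effect of the key choice can be shown with a small sketch that sums values either by country alone or by the country#city composite. Everything specific here is an assumption for illustration: the class CompositeKeySketch, the country,city,population input layout, and the population figures, which are placeholder numbers rather than real data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the same records summarized under two different keys.
final class CompositeKeySketch {
    // Build either a country-only key or a country#city composite key.
    static String key(String country, String city, boolean includeCity) {
        return includeCity ? country + "#" + city : country;
    }

    // Sum the third field of each "country,city,population" record by key.
    static Map<String, Long> summarize(List<String> records, boolean includeCity) {
        Map<String, Long> totals = new TreeMap<>();
        for (String record : records) {
            String[] f = record.split(",");
            String k = key(f[0].trim(), f[1].trim(), includeCity);
            totals.merge(k, Long.parseLong(f[2].trim()), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Placeholder figures, one row per district of a city.
        List<String> records = Arrays.asList(
            "France,Paris,10", "France,Paris,5", "France,Lyon,3",
            "United Kingdom,London,7");
        System.out.println(summarize(records, false)); // keyed by country
        System.out.println(summarize(records, true));  // keyed by country#city
    }
}
```

The only difference between the two summaries is the key that the Map side emits; the summing logic on the Reduce side stays the same.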
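This is a basic trick that we can use to our advantage when processing certain types of data (for example, material that shares a common key), because we can use it to simulate pseudo-joins. The trick is also useful when combining blog posts (which have a blogpostid for identification) and blog comments (which have a blogpostid and a blogcommentid).
To reduce the output (for example, to count the words in the blog posts and comments), we first process both the blog posts and the blog comments through Map, but we output a common ID.
Listing 15. Reducing the output
blogpostid,the,quick,brown,fox
blogpostid#blogcommentid,jumps,over,the,lazy,dog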
This explicitly uses two keys and outputs the information as two different rows. We can also reverse the relationship. We can identify words from the comments against the blogpostid by adding the comment ID to each word.
Listing 16. Reversing the relationship
blogpostid,the,quick,brown,fox,jumps#blogcommentid,over#blogcommentid,the#blogcommentid,lazy#blogcommentid,dog#blogcommentid
During processing, we can tell by looking at the ID whether a word is attached to the blog post itself or, when it carries the blogcommentid suffix, to a comment on that post.
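A minimal sketch of that check, assuming the row layout of Listing 16: the first field is the blog post ID, and any later field containing a # belongs to a comment. The class BlogWordSketch and its method are hypothetical names used only for this illustration.

```java
// Hypothetical sketch: separate post words from comment words in one row.
final class BlogWordSketch {
    // Returns {words from the post, words from comments} for one row.
    static int[] countBySource(String row) {
        String[] tokens = row.split("\\s*,\\s*");
        int postWords = 0;
        int commentWords = 0;
        // tokens[0] is the blog post ID, so start at index 1.
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].contains("#")) {
                commentWords++; // word#blogcommentid came from a comment
            } else {
                postWords++;    // a bare word came from the post itself
            }
        }
        return new int[] {postWords, commentWords};
    }

    public static void main(String[] args) {
        String row = "blogpostid,the,quick,brown,fox,jumps#blogcommentid,"
            + "over#blogcommentid,the#blogcommentid,lazy#blogcommentid,dog#blogcommentid";
        int[] counts = countBySource(row);
        System.out.println("post=" + counts[0] + " comments=" + counts[1]);
    }
}
```

For the row in Listing 16 this yields four words from the post and five from comments.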
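Simulating traditional database operations
Hadoop is not a database in the true sense, partly because we cannot perform updates, deletes, or inserts on a row-by-row basis. Although this is not a problem in many cases (you can dump and load the active data to be processed), sometimes you do not want to export and reload the data.
One trick for avoiding the export and reload is to create a change file that contains a list of differences from the original dump file. For now, we ignore the process of producing this data from SQL or another database. As long as the data has a unique ID, we can use it as the key and take advantage of that key. Consider a source file like the one in Listing 17.
Listing 17. Source file
1,London
2,Paris
3,New York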
Assume a change file like the one in Listing 18.
Listing 18. Change file
1,DELETE
2,UPDATE,Munich
4,INSERT,Tokyo
×îÖյóöÁ½¸öÎļþ¾¹ý½âÎöµÄºÏ²¢½á¹û£¬ÈçÇåµ¥ 19 Ëùʾ¡£
Çåµ¥ 19. Ô´ÎļþºÍ±ä¸üÎļþµÄºÏ²¢
2,Munich 3,New York 4,Tokyo |
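How do we achieve such a merge through Hadoop?
One way to achieve this merge with Hadoop is to process the current data and convert it into inserts (because it is all new data to be inserted into the target file), and then convert each UPDATE operation into a DELETE and an INSERT of the new data. In fact, with the change file, this is easier to achieve by modifying it to the content in Listing 20.
Listing 20. Achieving the merge through Hadoop
1,DELETE
2,DELETE
2,INSERT,Munich
4,INSERT,Tokyo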
The problem is that we cannot physically merge the two files, but we can process them accordingly. If a row is an original INSERT or a DELETE, we output a key with a counter. If it is an insert of new data created by an UPDATE operation, we want a different key that will not be reduced, so we generate an interstitial file like Listing 21.
Listing 21. Generating the interstitial file
1,1,London
2,1,Paris
3,1,New York
1,-1,London
2,-1,Paris
2#NEW,1,Munich
4#NEW,1,Tokyo
During the Reduce, we sum the counters for each unique key, producing Listing 22.
Listing 22. Summing the counters for each unique key
1,0,London
2,0,Paris
3,1,New York
2#NEW,1,Munich
4#NEW,1,Tokyo
We can then run the content through a secondary MapReduce function, using the basic structure shown in Listing 23.
Listing 23. Running the content through a secondary MapReduce function
map:
    if (key contains #NEW):
        emit(row)
    else if (count > 0):
        emit(row)
The secondary MapReduce produces the expected output, as shown in Listing 24.
Listing 24. Expected output of the secondary MapReduce function
3,1,New York
2,1,Munich
4,1,Tokyo
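Figure 2 demonstrates this two-stage process of first formatting and reducing, and then simplifying the output.
Figure 2. The two-stage process of formatting, reducing, and simplifying the output
The raw data is reduced and mapped during the Map and Reduce phases.
This process requires more work than in a traditional database, but the solution it provides makes the exchange of constantly updated data much simpler.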
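The two stages can be checked end to end with an in-memory simulation. This is a sketch under stated assumptions rather than a Hadoop job: ChangeMergeSketch is a hypothetical class, the change rows are expected in the Listing 20 form (UPDATE already split into DELETE and INSERT), TreeMaps stand in for the Reduce grouping, and stripping the #NEW suffix and the counter in the final step is an assumption made here to arrive at the layout of Listing 19.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the source file + change file merge.
final class ChangeMergeSketch {
    static Map<String, String> merge(List<String> source, List<String> changes) {
        Map<String, Integer> counters = new TreeMap<>(); // key -> summed counter
        Map<String, String> values = new TreeMap<>();    // key -> row value

        // Stage 1: existing rows count +1, DELETE counts -1, and INSERT is
        // written under a separate ID#NEW key so it is never cancelled out.
        for (String row : source) {
            String[] f = row.split(",", 2);
            counters.merge(f[0], 1, Integer::sum);
            values.put(f[0], f[1]);
        }
        for (String row : changes) {
            String[] f = row.split(",");
            if (f[1].equals("DELETE")) {
                counters.merge(f[0], -1, Integer::sum);
            } else if (f[1].equals("INSERT")) {
                String newKey = f[0] + "#NEW";
                counters.merge(newKey, 1, Integer::sum);
                values.put(newKey, f[2]);
            }
        }

        // Stage 2: keep #NEW rows and rows whose counter is still positive.
        Map<String, String> merged = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counters.entrySet()) {
            String k = e.getKey();
            if (k.contains("#NEW") || e.getValue() > 0) {
                merged.put(k.replace("#NEW", ""), values.get(k));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("1,London", "2,Paris", "3,New York");
        List<String> changes = Arrays.asList(
            "1,DELETE", "2,DELETE", "2,INSERT,Munich", "4,INSERT,Tokyo");
        System.out.println(merge(source, changes));
    }
}
```

Run against the data in Listings 17 and 20, the merged map contains 2 for Munich, 3 for New York, and 4 for Tokyo, which matches Listing 19.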
Conclusion
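This article looked at a number of different scenarios for using MapReduce queries. You have seen how powerful these queries are for processing a variety of data, and you should now be able to use these examples in your own MapReduce solutions.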