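Hadoop is a general-purpose platform for processing massive data sets, and every job it runs has to handle a large volume of data. Compressing data in a Hadoop system reduces disk usage and speeds up data transfer over disk and network, which in turn makes data processing more efficient. When choosing a compression method, the main considerations are compression speed and whether the compressed file can be split. In short, compression has two benefits: it saves the disk space that data occupies, and it speeds up data transfer over disk and network, which improves the system's overall processing speed.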
Introduction
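With the arrival of the cloud era, big data has attracted more and more attention. The analyst team at Zhuyuntai describes big data as the large volumes of unstructured and semi-structured data that a company creates. Loading this data into a relational database for analysis would cost too much time and money. Big data analytics is often linked with cloud computing, because real-time analysis of large data sets needs a framework like MapReduce to distribute work across tens, hundreds, or even thousands of computers.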
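In the Internet industry, "big data" refers to the user behavior data that Internet companies generate and accumulate in their daily operations. This data is so large that it can no longer be measured in gigabytes or terabytes, which raises the question of how to process and analyze it efficiently. There are many ways to optimize big data processing. This article focuses on compressing data on the Hadoop platform to make processing more efficient.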
Compression overview
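Hadoop is a general-purpose platform for processing massive data sets, and every job it runs has to handle a large volume of data. Compressing data in a Hadoop system reduces disk usage and speeds up data transfer over disk and network, which in turn makes data processing more efficient. When choosing a compression method, the main considerations are compression speed and whether the compressed file can be split. In short, compression has these benefits:
1. It saves the disk space that data occupies.
2. It speeds up data transfer over disk and network, which improves the system's overall processing speed.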
Compression formats
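Hadoop recognizes compression formats automatically. If a compressed file carries the extension of its format (for example lzo, gz, or bzip2), Hadoop picks the matching decoder based on that extension and decompresses the data. Hadoop handles this entirely on its own. We only need to make sure the compressed input files have extensions.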
The table below details Hadoop's support for each compression format:
Table 1. Compression formats

If a compressed file has no extension, you must specify the input format when you submit the MapReduce job. For example:
hadoop jar /usr/home/hadoop/hadoop-0.20.2/contrib/streaming/hadoop-streaming-0.20.2-CDH3B4.jar -file /usr/home/hadoop/hello/mapper.py -mapper /usr/home/hadoop/hello/mapper.py -file /usr/home/hadoop/hello/reducer.py -reducer /usr/home/hadoop/hello/reducer.py -input lzotest -output result4 -jobconf mapred.reduce.tasks=1 -inputformat org.apache.hadoop.mapred.LzoTextInputFormat
Performance comparison
The table below lists the compression ratio, compression time, and decompression time of the compression algorithms available in Hadoop:
Table 2. Performance comparison

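From this we can conclude:
1) Bzip2 clearly achieves the best compression, but it compresses slowly. It is splittable.
2) Gzip compresses less effectively than Bzip2, but it compresses and decompresses quickly. It is not splittable.
3) LZO compresses less effectively than both Bzip2 and Gzip, but it is the fastest at compression and decompression. It is also splittable, once the .lzo files have been indexed.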
Note that splittability matters a great deal in Hadoop. It determines how many map tasks a job launches, and therefore how efficiently the job runs.
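The following minimal Java sketch shows concretely how splittability changes the number of map tasks a file produces. It is modeled on the split loop in Hadoop's FileInputFormat, including the 10% slack allowed for the last split (the SPLIT_SLOP constant, 1.1). The class SplitEstimator is hypothetical. The sketch also assumes a single fixed split size, whereas Hadoop derives the split size from the HDFS block size and the min/max split settings.

```java
// Hypothetical helper (not a Hadoop class): a simplified model of how
// FileInputFormat cuts one input file into splits, i.e. map tasks.
class SplitEstimator {
    // FileInputFormat lets the last split be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /**
     * Returns the number of splits (map tasks) for one file.
     * Assumes fileSize > 0 and splitSize > 0.
     */
    static int countSplits(long fileSize, long splitSize, boolean splittable) {
        if (!splittable) {
            return 1; // the whole file must be read by a single mapper
        }
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail split, possibly slightly larger than splitSize
        }
        return splits;
    }
}
```

Take a 1 GB file with 64 MB splits. countSplits returns 16 when the format is splittable and 1 when it is not, so an unsplittable gzip file of that size would be processed by a single mapper.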
Every compression algorithm trades time against space: faster compression and decompression usually cost more space. Choose a compression format based on your own business requirements.
The figure below compares the time taken to compress locally with the time taken to stream the compressed result to BI.

Figure 1. Time comparison
Usage
MapReduce can use compression at three stages.
1. Compressed input files. If the input files are compressed, MapReduce decompresses them automatically when it reads them.
2. Compressing the intermediate map output of a MapReduce job. There are two ways to do this:
1) Configure it in core-site.xml, as shown below:

Figure 2. core-site.xml example
2) Specify it in Java code:
conf.setCompressMapOutput(true);                    // conf is a JobConf; compress map output
conf.setMapOutputCompressorClass(GzipCodec.class);  // codec used for the map output
×îºóÒ»ÐдúÂëÖ¸¶¨ Map Êä³ö½á¹ûµÄ±àÂëÆ÷¡£
3. Compressing the final reduce output of a MapReduce job. There are two ways to do this:
1) Configure it in core-site.xml, as shown below:

Figure 3. core-site.xml example
2) Specify it in Java code:
conf.setBoolean("mapred.output.compress", true);  // compress the final job output
conf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);
The last line likewise specifies the codec used for the reduce output.
Compression framework
The first usage described above is to pass compressed files directly to MapReduce as input. MapReduce then chooses a suitable decompressor based on each file's extension. How does this work? See the figure below:

Figure 4. How compression is handled
ÎÒÃÇÔÚÅäÖà Job ×÷ÒµµÄʱºò£¬»áÉèÖÃÊý¾ÝÊäÈëµÄ¸ñʽ»¯·½Ê½£¬Ê¹Óà conf.setInputFormat()
·½·¨£¬ÕâÀïµÄÈë¿Ú²ÎÊýÊÇ TextInputFormat.class¡£
TextInputFormat extends FileInputFormat, which is an implementation of InputFormat. It preprocesses the data in two ways:
- It divides the input into a set of splits, and each split is handed to one mapper.
- For each split, it creates a RecordReader that reads the split's data and packages it into <key, value> records, which are passed to the map function.

Before splitting the data, the class initializes the codec factory, CompressionCodecFactory. It then obtains an instantiated CompressionCodec from the factory and uses that codec to process the data.
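The following sketch models how the codec the factory returns decides whether a file can be split. In Hadoop 0.20.x, TextInputFormat treats any file for which the factory finds a codec as unsplittable. Later releases also allow splitting when the codec implements SplittableCompressionCodec, as BZip2Codec does. The class SplitDecision and its extension-to-flag map are hypothetical stand-ins for the factory and codec objects. The plain endsWith() check also simplifies the factory's real lookup, which is described below.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the "is this input file splittable?" decision.
class SplitDecision {
    /** Example registry; the flags follow newer Hadoop releases (bzip2 splittable). */
    static Map<String, Boolean> exampleRegistry() {
        Map<String, Boolean> registry = new LinkedHashMap<>();
        registry.put(".gz", false);
        registry.put(".bz2", true);
        return registry;
    }

    /** Uncompressed files can be split; compressed ones only if their codec allows it. */
    static boolean isSplitable(String fileName, Map<String, Boolean> splittableByExtension) {
        for (Map.Entry<String, Boolean> entry : splittableByExtension.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue(); // a codec matched: ask the codec
            }
        }
        return true; // no codec matched: a plain file, split freely
    }
}
```

To model Hadoop 0.20.x instead, set every flag in the registry to false.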
Let's look in detail at how a codec is obtained from the factory.
The codec factory class: CompressionCodecFactory
The main job of CompressionCodecFactory is to find the CompressionCodec that matches a given file extension. It is the central controller of the whole compression framework. Here are its key methods:
1. Initialization (the constructor)

Figure 5. Code example
① getCodecClasses(conf) reads the configuration that lists the CompressionCodec classes. It is explained in detail below.
② Two default codecs. When getCodecClasses(conf) finds no codec configuration, the factory adds two CompressionCodec implementations by default: GzipCodec and DefaultCodec.
③ addCodec(codec) adds a CompressionCodec to the factory's internal cache. It is explained in detail below.
2. getCodecClasses(conf)

Figure 6. Code example
① The codec configuration is read from the io.compression.codecs property in core-site.xml. That part of the configuration file looks like this:

Figure 7. Code example
The value element holds the fully qualified class name of each CompressionCodec, separated by commas. To use a codec, add it to this property, and the factory loads it into its cache automatically.
Besides configuration files, Hadoop also lets us register codecs in code. The setting still ends up in the io.compression.codecs property:
conf.set("io.compression.codecs", "org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzopCodec");
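The factory's start-up logic described above can be sketched in plain Java. The sketch makes these simplifications:
- A Map<String, String> stands in for Hadoop's Configuration.
- Codec classes are kept as names rather than instantiated by reflection.
- Trimming whitespace around each entry is an assumption of the sketch.

The class name CodecClassLoaderSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of how CompressionCodecFactory decides which codecs to load.
class CodecClassLoaderSketch {
    static final String CODECS_KEY = "io.compression.codecs";

    /** Parses the comma-separated class list; returns null if the property is unset. */
    static List<String> getCodecClasses(Map<String, String> conf) {
        String value = conf.get(CODECS_KEY);
        if (value == null) {
            return null;
        }
        List<String> result = new ArrayList<>();
        for (String token : value.split(",")) {
            String className = token.trim(); // assumption: tolerate spaces after commas
            if (!className.isEmpty()) {
                result.add(className);
            }
        }
        return result;
    }

    /** Mirrors the constructor: fall back to GzipCodec and DefaultCodec when unset. */
    static List<String> codecsToLoad(Map<String, String> conf) {
        List<String> classes = getCodecClasses(conf);
        if (classes == null) {
            return Arrays.asList(
                    "org.apache.hadoop.io.compress.GzipCodec",
                    "org.apache.hadoop.io.compress.DefaultCodec");
        }
        return classes; // each name would then be instantiated and passed to addCodec()
    }
}
```

When the property is absent, the fallback applies. Once it is set, only the listed classes are loaded.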
3. Adding codecs with addCodec(codec)

Figure 8. Code example
addCodec(codec) takes a CompressionCodec as its argument. Here we meet the first of the codec's methods.
① codec.getDefaultExtension(): as the name suggests, this method returns the file extension associated with the codec. For a file named xxxx.bz2, for example, the bzip2 codec returns ".bz2". Here is how org.apache.hadoop.io.compress.BZip2Codec implements this method:

Figure 9. Code example
② codecs is a SortedMap instance. One interesting detail: the key, which is the extension returned by codec.getDefaultExtension(), is reversed before it is stored. For example, the extension ".bz2" becomes "2zb.".
After all codecs have been loaded, we get a sorted map like the following:

Figure 10. Code example
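The reversal is easy to reproduce with java.util.TreeMap. This sketch assumes the default extensions .deflate, .gz, .bz2, and .lzo for DefaultCodec, GzipCodec, BZip2Codec, and LzopCodec respectively. ReversedKeyDemo is a hypothetical name, and codecs are represented by their names only.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the key transformation addCodec() applies before storing a codec.
class ReversedKeyDemo {
    /** Reverses a string, e.g. ".bz2" becomes "2zb.". */
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Maps reversed default extension -> codec name, kept in sorted key order. */
    static SortedMap<String, String> build(String[][] extensionAndCodec) {
        SortedMap<String, String> codecs = new TreeMap<>();
        for (String[] pair : extensionAndCodec) {
            codecs.put(reverse(pair[0]), pair[1]);
        }
        return codecs;
    }
}
```

Building the map from those four pairs gives {2zb.=BZip2Codec, etalfed.=DefaultCodec, ozl.=LzopCodec, zg.=GzipCodec}. The keys are ordered by the last characters of each extension, which is what makes the suffix lookup in getCodec() possible.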
Now that all codecs are registered, how do we find the right one for a file? Let's look at the next method.
4. The getCodec() method
This method returns the CompressionCodec that corresponds to a file.

Figure 11. Code example
getCodec(Path) takes a Path object that holds the file path.
① Reverse the file name. For example, xxxx.bz2 becomes 2zb.xxxx.
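② Find the entry in codecs closest to 2zb.xxxx. This step calls headMap(), which also returns a SortedMap, holding the keys that sort before 2zb.xxxx.
③ Filter the returned SortedMap a second time: take its last key, and accept that codec only if the reversed file name starts with that key.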
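The lookup can be sketched with the same TreeMap API:
- headMap() supplies the candidate keys that sort before the reversed file name.
- lastKey() picks the closest of them.
- The startsWith() check is the second filter.

The class CodecLookupSketch is hypothetical and stores codec names instead of codec objects.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the reversed-suffix lookup; codec values are plain names here.
class CodecLookupSketch {
    private final SortedMap<String, String> codecs = new TreeMap<>();

    /** Registers a codec under its reversed default extension, as addCodec() does. */
    void addCodec(String defaultExtension, String codecName) {
        codecs.put(reverse(defaultExtension), codecName);
    }

    /** Returns the codec matching the file name's extension, or null if none does. */
    String getCodec(String fileName) {
        String reversed = reverse(fileName);                          // "xxxx.bz2" -> "2zb.xxxx"
        SortedMap<String, String> smaller = codecs.headMap(reversed); // keys sorting before it
        if (smaller.isEmpty()) {
            return null;
        }
        String candidate = smaller.lastKey(); // closest key below the reversed name
        // Second filter: the candidate must be a prefix of the reversed name,
        // i.e. the extension must really be a suffix of the file name.
        return reversed.startsWith(candidate) ? codecs.get(candidate) : null;
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}
```

Suppose .gz, .bz2, and .deflate are registered. getCodec("xxxx.bz2") then returns "BZip2Codec". getCodec("notes.txt") returns null, because the closest key, etalfed., is not a prefix of txt.seton.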
The codec interface: CompressionCodec
While introducing CompressionCodecFactory, we referred to CompressionCodec many times. We have also already met one of its methods, getDefaultExtension(), which returns the file extension.
CompressionCodec follows the abstract factory design pattern. It is an interface that declares a set of methods for creating the objects needed by one specific compression algorithm. The most important ones are:
1. createOutputStream() compresses a data stream.

Figure 12. Code example
This method is overloaded:
① compression based on the stream alone;
② compression using a given Compressor.
2. createInputStream() decompresses a data stream.

Figure 13. Code example
The decompression method is overloaded in the same way:
① decompression based on the stream alone;
② decompression using a given Decompressor.
Compression/decompression streams and compressors/decompressors are covered in detail later in this article. A brief understanding is enough for now.
3. getCompressorType() returns the type of compressor the codec needs.
4. getDefaultExtension() returns the corresponding file extension. It was covered above, so we won't repeat it here.
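The following example illustrates the abstract factory idea without Hadoop on the classpath. SimpleCodec is a hypothetical, cut-down analogue of CompressionCodec. It omits the Compressor/Decompressor overloads and getCompressorType(). JdkGzipCodec implements it with java.util.zip rather than Hadoop's GzipCodec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical, cut-down analogue of CompressionCodec: an abstract factory
// whose implementations create the streams for one compression algorithm.
interface SimpleCodec {
    OutputStream createOutputStream(OutputStream out) throws IOException;
    InputStream createInputStream(InputStream in) throws IOException;
    String getDefaultExtension();
}

// One concrete factory: gzip, backed by java.util.zip (not Hadoop's GzipCodec).
class JdkGzipCodec implements SimpleCodec {
    @Override
    public OutputStream createOutputStream(OutputStream out) throws IOException {
        return new GZIPOutputStream(out);
    }

    @Override
    public InputStream createInputStream(InputStream in) throws IOException {
        return new GZIPInputStream(in);
    }

    @Override
    public String getDefaultExtension() {
        return ".gz";
    }
}

class CodecRoundTrip {
    /** Writes data through the codec's output stream and reads it back. */
    static byte[] roundTrip(SimpleCodec codec, byte[] data) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (OutputStream out = codec.createOutputStream(compressed)) {
                out.write(data); // close() also writes the gzip trailer
            }
            ByteArrayOutputStream restored = new ByteArrayOutputStream();
            try (InputStream in = codec.createInputStream(
                    new ByteArrayInputStream(compressed.toByteArray()))) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    restored.write(buf, 0, n);
                }
            }
            return restored.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Code that depends only on SimpleCodec can switch algorithms by swapping the factory object. That is the point of the pattern.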
Compressor and Decompressor
In the codec section, we mentioned Compressor and Decompressor objects in connection with createOutputStream() and createInputStream(). In Hadoop's implementation, data compressors and decompressors are abstracted as two interfaces:
1. org.apache.hadoop.io.compress.Compressor;
2. org.apache.hadoop.io.compress.Decompressor;
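These interfaces define a set of methods, so every compression or decompression algorithm implemented inside Hadoop must implement the corresponding interface. For the actual compression and decompression of data, Hadoop offers users a uniform, stream-based I/O model.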
Here is the Compressor interface:

Figure 14. Code example
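① setInput() receives data into the internal buffer and can be called multiple times;
② needsInput() checks whether the internal buffer can accept more input. If it returns false, the buffer is full (it still holds unconsumed data);
③ getBytesRead() returns the total number of uncompressed bytes read as input;
④ getBytesWritten() returns the total number of compressed bytes written as output;
⑤ finish() ends the data input phase;
⑥ finished() checks whether all compressed data waiting to be read has been read. If it returns false, the compressor still holds compressed data, which can be retrieved with compress();
⑦ compress() retrieves compressed data and frees buffer space;
⑧ reset() resets the compressor so it can process a new set of input data;
⑨ end() closes the compressor and discards any unprocessed input;
⑩ reinit() goes a step further: it uses Hadoop's configuration system to reset and reconfigure the compressor.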
For efficiency, the compressor does not start working immediately every time setInput() is called. To tell the compressor that all data has been written, you must call finish(). After finish() returns, the compressed data still held in the compressor's buffer can be retrieved with compress(). To check whether any compressed data remains to be read, use finished().
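You can try the same setInput / finish / finished / compress protocol with java.util.zip.Deflater, whose methods closely parallel Hadoop's Compressor interface. Here deflate() plays the role of compress(). This sketch uses the JDK class as a stand-in for a Hadoop compressor, and CompressorProtocolDemo is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// java.util.zip.Deflater follows the call protocol described above for Compressor:
// setInput / finish / finished, with deflate() in the role of compress().
class CompressorProtocolDemo {
    static byte[] compress(byte[] data) {
        Deflater compressor = new Deflater();
        try {
            compressor.setInput(data); // buffer the input; nothing is emitted yet
            compressor.finish();       // announce that all input has been supplied
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[64];
            while (!compressor.finished()) {     // compressed bytes still pending
                int n = compressor.deflate(buf); // fetch them, like compress()
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            compressor.end(); // discard state and release native resources
        }
    }

    static byte[] decompress(byte[] data) {
        Inflater decompressor = new Inflater();
        try {
            decompressor.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[64];
            while (!decompressor.finished()) {
                int n = decompressor.inflate(buf);
                if (n == 0 && !decompressor.finished()
                        && (decompressor.needsInput() || decompressor.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-based input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            decompressor.end();
        }
    }
}
```

finish() is what unlocks the final output. Without it, finished() would never become true, because the compressor would keep waiting for more input.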
Compression stream CompressionOutputStream and decompression stream CompressionInputStream
As mentioned in the codec section, createOutputStream() returns a CompressionOutputStream and createInputStream() returns a CompressionInputStream. These two classes extend java.io.OutputStream and java.io.InputStream respectively, so their roles are easy to understand.
Let's look at the code of CompressionOutputStream:

Figure 15. Code example
CompressionOutputStream implements OutputStream's close() and flush() methods. Three methods remain abstract and must be implemented by its subclasses:
- write(), which outputs data;
- finish(), which ends compression and writes the remaining compressed data to the underlying stream;
- resetState(), which resets the compression state.
The Hadoop compression framework provides a general-purpose subclass of CompressionOutputStream: CompressorStream.

Figure 16. Code example
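CompressorStream provides three constructors. The underlying output stream out and the compressor to use are passed in as constructor arguments. The remaining argument is the size of the buffer CompressorStream works with, and the constructor uses it to allocate that buffer. The first constructor lets you set the buffer size manually, and the second uses a default of 512 bytes. The third takes only the output stream: it allocates no buffer and sets no compressor.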

Figure 17. Code example
The Compressor appears in write(), compress(), finish(), and resetState(). As described earlier, setInput() fills the compressor's internal buffer with data to compress. needsInput() then checks whether the buffer is full, and if it is, compress() is called to compress the data. The flow is shown in the figure below:

Figure 18. Call flow
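The following end-to-end sketch of this call flow uses three hypothetical classes and omits other Hadoop details:
- SketchCompressionOutputStream mirrors the split described above: close() and flush() are concrete, while finish() and resetState() are abstract.
- SketchCompressorStream follows the write/compress/finish logic, with java.util.zip.Deflater standing in for Hadoop's Compressor.
- SketchCompressorStreamDemo runs a round trip through both.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;

// Simplified analogue of CompressionOutputStream.
abstract class SketchCompressionOutputStream extends OutputStream {
    protected final OutputStream out;

    protected SketchCompressionOutputStream(OutputStream out) { this.out = out; }

    @Override public void close() throws IOException { finish(); out.close(); }
    @Override public void flush() throws IOException { out.flush(); }

    public abstract void finish() throws IOException;
    public abstract void resetState() throws IOException;
}

// Simplified analogue of CompressorStream, driven by a Deflater.
class SketchCompressorStream extends SketchCompressionOutputStream {
    private final Deflater compressor;
    private final byte[] buffer;

    SketchCompressorStream(OutputStream out, Deflater compressor, int bufferSize) {
        super(out);
        this.compressor = compressor;
        this.buffer = new byte[bufferSize];
    }

    SketchCompressorStream(OutputStream out, Deflater compressor) {
        this(out, compressor, 512); // 512-byte default buffer
    }

    @Override public void write(int b) throws IOException { write(new byte[] {(byte) b}, 0, 1); }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        compressor.setInput(b, off, len);  // hand the data to the compressor
        while (!compressor.needsInput()) { // input not fully consumed yet
            compress();
        }
    }

    /** Moves whatever compressed bytes are ready into the underlying stream. */
    protected void compress() throws IOException {
        int n = compressor.deflate(buffer, 0, buffer.length);
        if (n > 0) out.write(buffer, 0, n);
    }

    @Override
    public void finish() throws IOException {
        if (!compressor.finished()) {
            compressor.finish();             // no more input will arrive
            while (!compressor.finished()) { // drain the remaining output
                compress();
            }
        }
    }

    @Override public void resetState() { compressor.reset(); }
}

class SketchCompressorStreamDemo {
    /** Compresses data through SketchCompressorStream, then inflates it back. */
    static byte[] roundTrip(byte[] data, int bufferSize) {
        Deflater deflater = new Deflater();
        try {
            ByteArrayOutputStream packed = new ByteArrayOutputStream();
            try (SketchCompressorStream s = new SketchCompressorStream(packed, deflater, bufferSize)) {
                s.write(data, 0, data.length);
            }
            ByteArrayOutputStream restored = new ByteArrayOutputStream();
            try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(packed.toByteArray()))) {
                byte[] buf = new byte[256];
                int n;
                while ((n = in.read(buf)) != -1) restored.write(buf, 0, n);
            }
            return restored.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // the caller owns the Deflater and releases it here
        }
    }
}
```

A small buffer only means compress() runs more often. The output stays the same stream of compressed bytes.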
Conclusion
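This article looked inside the compression framework of the Hadoop platform. It analyzed the framework's core code and compared the efficiency of the different compression formats, so that readers can use compression to make data processing on Hadoop more efficient. The next time you face massive data processing, Hadoop's compression mechanism can help you get twice the result with half the effort.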