Learning Big Data Step by Step: The Hadoop Ecosystem and Its Use Cases
 
Source: 51CTO  Published: 2017-8-22
 

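Hadoop Overview

Whether business drives the development of technology or technology drives the development of business is a question that stirs up debate whenever it is raised.

With the rapid growth of the Internet and the Internet of Things, we have entered the era of big data. IDC predicted that by 2020 the world would hold 44 ZB of data, more than traditional storage and technical architectures can handle. The book "The Era of Big Data", published in 2013, defines the 5V characteristics of big data: Volume (large quantity), Velocity (high speed), Variety (diversity), Value (low value density), and Veracity (truthfulness).

Looking back ten years from there brings us to 2003, the year Google published "The Google File System". The paper describes a GFS cluster made up of many nodes of two main kinds: a single master node and many chunkservers. In 2004 Google followed with a paper introducing MapReduce. In February 2006, Doug Cutting and others, who had applied the GFS and MapReduce ideas in the Nutch project, evolved that work into the Hadoop project.

Doug Cutting once said that he loves the feeling of having his programs used by millions of people, and he has clearly achieved that. (The original article includes his photo here.)

In January 2008, Hadoop became a top-level Apache open source project.

Hadoop addressed the storage and processing of massive data in the Internet era. It is a framework that supports distributed computation and storage. If a Hadoop cluster is abstracted as a single machine, then in theory its hardware resources (CPU, memory, and so on) can be scaled out without limit.

Hadoop extends its range of use cases through its components, for example offline analysis and real-time processing.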

Introduction to Hadoop Components

This article is based on Hadoop 2.7. Unless stated otherwise, everything below refers to that version.

HDFS

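HDFS, the Hadoop Distributed File System, is a distributed file system designed to run on commodity hardware. It has much in common with existing distributed file systems, such as the typical Master/Slave architecture (not covered in detail here). What sets HDFS apart is that it is highly fault tolerant and is suited to deployment on inexpensive machines.

Two points about HDFS are worth highlighting.

1. The default replication factor in HDFS is 3, which raises the question of why it is 3 rather than 2 or 4.

2. Rack Awareness.

Only with a solid understanding of these two points can you see why Hadoop is so fault tolerant, and high fault tolerance is what lets Hadoop run on commodity hardware.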
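To make the two points concrete, here is a minimal sketch of rack-aware placement for a replication factor of 3: the first replica stays on the writer's node, the second goes to a node on a different rack, and the third goes to another node on that same remote rack. With this layout a block survives the loss of a whole rack while only one copy has to cross racks on write, which is one common way to explain why 3 is a reasonable default. The function name, node names, and the topology dictionary are made up for illustration and are not an HDFS API; the sketch assumes the writer is itself a DataNode and that some remote rack has at least two nodes.

```python
import random

def place_replicas(writer, topology, rng=random):
    """Pick three nodes for one block, following the rack-aware idea.

    topology maps a rack name to the list of nodes on that rack
    (a hypothetical structure used only for this sketch).
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer  # replica 1: the writer's own node
    # Candidate remote racks: not the writer's rack, and with room for two replicas.
    remote_racks = [rack for rack, nodes in topology.items()
                    if rack != rack_of[writer] and len(nodes) >= 2]
    remote = rng.choice(remote_racks)
    # Replicas 2 and 3: two distinct nodes on the same remote rack.
    second, third = rng.sample(topology[remote], 2)
    return [first, second, third]

topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
replicas = place_replicas("n1", topology)
print(replicas)
```

With only two racks of two nodes each, the result always starts with n1 and contains n3 and n4 in some order.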

Yarn

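Yarn, Yet Another Resource Negotiator, is a Hadoop subproject that followed Common, HDFS, and MapReduce. Yarn was introduced because Hadoop 1.x had the following problems:

1. Poor scalability. The JobTracker handled both resource management and job control.

2. Poor reliability. In the Master/Slave architecture, the master was a single point of failure.

3. Low resource utilization. Map slots (the unit of resource allocation in 1.x) and Reduce slots were separate and could not be shared between the two.

4. No support for multiple computing frameworks. MapReduce is a disk-based offline computing model, whereas newer applications need in-memory, streaming, iterative, and other styles of computation.

Yarn splits the original JobTracker into:

1. A global ResourceManager (RM).

2. One ApplicationMaster (AM) per application.

Yarn takes over resource management, the role previously played by the JobTracker's TaskScheduler, while job control, which the JobTracker also used to handle, moves to the per-application ApplicationMaster. This loosely coupled architecture is what gives the overall Hadoop framework its flexibility.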
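The resource utilization problem in point 3 can be illustrated with a toy comparison. The sketch below counts how many tasks can run at once when map and reduce slots are fixed (the 1.x model) versus when the same capacity is offered as interchangeable containers (the Yarn model). It deliberately assumes every task needs the same amount of resources, and the function names and numbers are invented for illustration.

```python
def running_with_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Hadoop 1.x style: a task may only occupy a slot of its own kind."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)

def running_with_containers(map_tasks, reduce_tasks, containers):
    """Yarn style: any free container can host any kind of task."""
    return min(map_tasks + reduce_tasks, containers)

# A map-heavy phase: 10 map tasks waiting, no reduce tasks yet.
with_slots = running_with_slots(10, 0, map_slots=4, reduce_slots=4)
with_containers = running_with_containers(10, 0, containers=8)
print(with_slots, with_containers)
```

In this example the four reduce slots sit idle during the map phase, so only 4 tasks run under the slot model while 8 run when the capacity is shared.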

Hive

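Hive is a data warehouse infrastructure built on Hadoop. It uses simple SQL-like statements (HQL for short) to query and analyze data stored in HDFS, and translates those statements into MapReduce programs that process the data.

Hive differs from a traditional relational database mainly in the following ways:

1. Storage location: Hive data is stored in HDFS or HBase, whereas a relational database usually stores data on raw devices or a local file system.

2. Updates: Hive generally does not support updates; data is written once and read many times.

3. Query latency: Hive latency is relatively high, because every HQL statement has to be compiled into MapReduce jobs.

4. Data scale: Hive typically handles data at the TB level, whereas a relational database handles comparatively less.

5. Extensibility: Hive supports UDF/UDAF/UDTF, whereas a relational database is comparatively weaker in this respect.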
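As a rough picture of what "translating HQL into MapReduce" means, the sketch below imitates a simple aggregate query with a map step and a reduce step in plain Python. The table, its columns, and the query are hypothetical, and Hive's real query planner is far more involved; this only shows the shape of the translation.

```python
from collections import defaultdict

# Hypothetical table contents. The HQL being imitated would be:
#   SELECT dept, COUNT(*) FROM employees GROUP BY dept;
employees = [
    {"name": "ann", "dept": "dev"},
    {"name": "bob", "dept": "ops"},
    {"name": "cat", "dept": "dev"},
]

def map_fn(row):
    # The GROUP BY column becomes the key; each row contributes a count of 1.
    yield row["dept"], 1

def reduce_fn(dept, ones):
    # COUNT(*) is the sum of the 1s collected for each key.
    return dept, sum(ones)

# Stand-in for the shuffle: collect all values that share a key.
grouped = defaultdict(list)
for row in employees:
    for key, value in map_fn(row):
        grouped[key].append(value)

counts = dict(reduce_fn(dept, ones) for dept, ones in sorted(grouped.items()))
print(counts)
```

Even this tiny query needs a full map, shuffle, and reduce cycle, which gives some intuition for the latency noted in point 3.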

HBase

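HBase, the Hadoop Database, is a highly reliable, high-performance, column-oriented, scalable distributed storage system. It uses HDFS as its underlying file system and uses Zookeeper to coordinate communication between the cluster's HMaster and the Region servers, monitor the state of each Region server, store the entry address of each Region, and so on.

HBase is a key-value database (comparable to a Map in Java). Being a database, it has tables, and HBase tables have roughly the following characteristics:

1. Large: a table can have hundreds of millions of rows and millions of columns (inserts slow down when there are many columns). Column-oriented: storage and access control are organized by column (family), and column families are retrieved independently.

2. Sparse: columns that are empty (null) take up no storage space, so tables can be designed to be very sparse.

3. Each cell can hold multiple versions of its data. By default the version number is assigned automatically and is the timestamp at which the cell was inserted.

4. All data in HBase is stored as bytes, without types (the system has to accommodate many data formats and data sources, so a strict schema cannot be defined in advance).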
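The table characteristics above (sparse, versioned, untyped bytes) can be pictured with a small in-memory model. This is only a sketch of the data model, not the HBase client API: the class name, method names, and the "family:qualifier" column strings are assumptions made for the example.

```python
import time

class SparseTable:
    """Toy model of an HBase-style table: (row, column) -> {timestamp: bytes}."""

    def __init__(self):
        # Only cells that were actually written exist, so empty columns cost nothing.
        self._cells = {}

    def put(self, row, column, value, timestamp=None):
        if timestamp is None:
            # Default version: the insertion time in milliseconds.
            timestamp = int(time.time() * 1000)
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # newest version by default
        return versions.get(timestamp)

table = SparseTable()
table.put(b"row1", b"cf:name", b"v1", timestamp=1)
table.put(b"row1", b"cf:name", b"v2", timestamp=2)

latest = table.get(b"row1", b"cf:name")
older = table.get(b"row1", b"cf:name", timestamp=1)
missing = table.get(b"row1", b"cf:email")
print(latest, older, missing)
```

Keys and values are plain bytes throughout, and a column that was never written simply has no entry.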

Spark

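Spark is a distributed computing engine developed at UC Berkeley that addresses the problem of streaming analysis over massive data. Spark first loads data into the Spark cluster and then scans it quickly using in-memory management, minimizing overall I/O in iterative algorithms and thereby improving overall processing performance. This is similar in spirit to Hadoop's approach of bringing the "computation" to the "data".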

Other Tools

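Phoenix

A SQL interface on top of HBase. Once Phoenix is installed, you can use SQL statements to work with an HBase database.

Sqoop

Sqoop's main purpose is to make it easy to migrate data from various relational databases into Hadoop. It supports many databases, such as Postgres and MySQL.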

Hadoop Cluster Hardware and Topology Planning

There is no single optimal plan, only a balance among budget, data volume, and use case.

Hardware Configuration

RAID

First, is RAID needed? Before answering, we need to understand what RAID 0 and RAID 1 are.

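RAID 0 improves storage performance by striping consecutive data across multiple disks, so that a data request can be served by several disks in parallel, each handling its own part of the request. This parallelism makes full use of the bus bandwidth and significantly improves overall disk access performance. (Source: Baidu Baike)

What happens when RAID 0 is combined with Hadoop?

Advantages:

1. Higher I/O throughput.

2. Faster reads and writes.

3. No single disk becomes a read/write hot spot.

However, in a Hadoop system, when one disk in a RAID 0 array fails (or becomes very slow), you have to reformat the entire array and restore the data to the DataNode. The time this takes grows as the data grows.

In addition, a RAID 0 array is only as fast as its slowest disk, and replacing that slowest disk again means reformatting the whole array and restoring the data.

RAID 1 achieves redundancy through disk mirroring, keeping mutually backed-up copies of the data on pairs of independent disks. When the original is busy, data can be read directly from the mirror, so RAID 1 can improve read performance. RAID 1 has the highest cost per unit of capacity of any RAID level, but it provides high data safety and availability. When one disk fails, the system can switch automatically to the mirror disk for reads and writes without having to rebuild the failed data. (Source: Baidu Baike)

So RAID 1 is essentially about adding redundancy, yet Hadoop already keeps 3 replicas by default. With RAID 1 underneath, the number of physical copies becomes 6, which raises the hardware resources the system requires.

RAID is therefore not recommended in a Hadoop system; JBOD is the better choice. When a disk fails, simply unmount it and replace it (in many cases the whole machine is swapped out).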
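Two of the arguments above are simple arithmetic, sketched here under a deliberately simplified model: a striped read finishes only when the slowest disk has delivered its equal share, and mirroring doubles every HDFS replica. The function name, file size, and disk speeds are invented for illustration.

```python
def raid0_read_seconds(total_mb, disk_speeds_mb_s):
    """Time to read a file striped evenly across all disks.

    Every disk holds an equal share, and the read completes only when
    the slowest disk has delivered its share.
    """
    share_mb = total_mb / len(disk_speeds_mb_s)
    return max(share_mb / speed for speed in disk_speeds_mb_s)

healthy = raid0_read_seconds(1200, [100, 100, 100])
one_slow_disk = raid0_read_seconds(1200, [100, 100, 25])

# RAID 1 mirrors every disk, so each of the 3 HDFS replicas is stored twice.
hdfs_replicas = 3
raid1_mirrors = 2
physical_copies = hdfs_replicas * raid1_mirrors

print(healthy, one_slow_disk, physical_copies)
```

In this model a single disk running at a quarter of its normal speed stretches the read from 4 seconds to 16 seconds, and RAID 1 under HDFS yields 6 physical copies of every block.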

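Cluster Size and Resources

Here the cluster size is estimated mainly from the total data volume; CPU and memory configuration are not considered.

In general, the number of machines is calculated from the disk requirements.

First, survey the system's existing data volume and its incremental growth.

For example, suppose the system currently holds 8 TB of data and the default replication factor is 3. The storage required is then 8 TB * 3 / 80% = about 30 TB, where dividing by 80% leaves roughly 20% of the disk space free.

With 6 TB of storage per machine, 5 data nodes are needed.

Adding the Master node, and leaving HA aside, that comes to about 6 machines.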
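The estimate above can be written as a small helper so that the inputs are easy to vary. The function and parameter names are assumptions made for this sketch; the default values are the ones used in the worked example.

```python
import math

def estimate_cluster(data_tb, replication=3, usable_percent=80,
                     disk_per_node_tb=6, master_nodes=1):
    """Estimate raw storage and machine count from the data volume alone."""
    # Raw storage = data * replicas, scaled up so disks stay at usable_percent full.
    raw_tb = data_tb * replication * 100 / usable_percent
    # Round up: a partly used node is still a whole machine.
    data_nodes = math.ceil(raw_tb / disk_per_node_tb)
    return raw_tb, data_nodes, data_nodes + master_nodes

raw_tb, data_nodes, total_machines = estimate_cluster(8)
print(raw_tb, data_nodes, total_machines)
```

For 8 TB of data this reproduces the figures in the text: 30 TB of raw storage, 5 data nodes, and 6 machines in total. Expected growth can be folded in by passing a larger data_tb.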

Software Configuration

The layouts below are divided by whether the business requires HA. Real deployments are complex and varied, so they are for reference only.

1. Non-HA layout

Typically all management roles are placed on one machine, and an odd number of Zookeeper services are started on the data nodes.

1. Management node: NameNode + ResourceManager + HMaster

2. Data node: SecondaryNameNode

3. Data node: DataNode + RegionServer + Zookeeper

2. HA layout

In an HA layout, the Primary node and the Standby node must be placed on different machines. In practice, to save machines, the master roles of different components are often cross-paired: for example, machine A hosts the Primary NameNode and the Standby HMaster, while machine B hosts the Standby NameNode and the Primary HMaster.

1. Management node: NameNode (Primary) + HMaster (Standby)

2. Management node: NameNode (Standby) + HMaster (Primary)

3. Management node: ResourceManager

4. Data node: DataNode + RegionServer + Zookeeper
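The HA layout can be written down as data and checked against the two rules stated above: a component's Primary and Standby must not share a machine, and the number of Zookeeper instances should be odd. The host names and the check_layout helper are hypothetical; this is a sanity-check sketch, not a deployment tool.

```python
def check_layout(layout):
    """Return a list of problems found in a host -> roles mapping."""
    hosts_by_role = {}
    for host, roles in layout.items():
        for role in roles:
            hosts_by_role.setdefault(role, set()).add(host)

    problems = []
    for component in ("NameNode", "HMaster"):
        primary = hosts_by_role.get(component + "(Primary)", set())
        standby = hosts_by_role.get(component + "(Standby)", set())
        if primary & standby:
            problems.append(component + " primary and standby share a host")
    zookeeper_count = len(hosts_by_role.get("Zookeeper", set()))
    if zookeeper_count % 2 == 0:
        problems.append("Zookeeper count should be odd")
    return problems

# Hypothetical host names, following the cross-paired HA layout in the text.
ha_layout = {
    "mgmt-a": ["NameNode(Primary)", "HMaster(Standby)"],
    "mgmt-b": ["NameNode(Standby)", "HMaster(Primary)"],
    "mgmt-c": ["ResourceManager"],
    "data-1": ["DataNode", "RegionServer", "Zookeeper"],
    "data-2": ["DataNode", "RegionServer", "Zookeeper"],
    "data-3": ["DataNode", "RegionServer", "Zookeeper"],
}
problems = check_layout(ha_layout)
print(problems)
```

The cross-paired layout passes both checks, so the list of problems is empty.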

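Hadoop's Design Goals and Suitable Scenarios

The overview above already hints at what Hadoop was originally designed for. In many contexts Hadoop is practically a synonym for big data. It is mainly used to process semi-structured and unstructured data (for example, with MapReduce).

In essence, MapReduce programs turn semi-structured or unstructured data into structured data for subsequent processing.

Also, because Hadoop is a distributed architecture aimed at large-scale data processing, relatively small data volumes do not show off its strengths. For data on the order of gigabytes, for example, a traditional relational database may well be faster.

Based on the above, Hadoop is suited to the following scenarios:

1. Offline log processing (including ETL, which in essence is a Hadoop-based data warehouse).

2. Large-scale parallel computing.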

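Hadoop Architecture

Hadoop consists mainly of two parts:

1. The distributed file system (HDFS), used mainly for large-scale data storage.

2. The distributed computing framework MapReduce, used mainly to process the data stored in HDFS.

HDFS is made up mainly of the NameNode (Master) and DataNodes (Slaves). The former manages the namespace: it handles file-system-like basic operations such as creating, modifying, deleting, and listing the directories, files, and blocks in HDFS. The latter store the actual data blocks and send regular heartbeats to the NameNode.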
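The heartbeat relationship can be sketched as a small bookkeeping class on the NameNode side: record when each DataNode was last heard from, and treat nodes that have been silent for too long as dead. The class name, node names, and the 30 second timeout are illustrative assumptions, not HDFS defaults, and time is passed in explicitly to keep the example deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time of each DataNode."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def heartbeat(self, datanode, now):
        # Called whenever a DataNode reports in.
        self.last_seen[datanode] = now

    def dead_nodes(self, now):
        # A node is considered dead once it has been silent longer than the timeout.
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)

monitor = HeartbeatMonitor(timeout_seconds=30)  # illustrative value
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.heartbeat("dn1", now=25)
dead = monitor.dead_nodes(now=40)
print(dead)
```

Here dn2 has been silent for 40 seconds and is reported dead, while dn1 reported 15 seconds ago and is still considered alive. Blocks held by a dead node are the ones that would need to be re-replicated elsewhere.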

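In MapReduce 2.0 the computing framework essentially runs on Yarn, which follows the principle of separation of concerns. Yarn takes over resource management, the role previously played by the JobTracker's TaskScheduler, while job control moves to the per-application ApplicationMaster. This loosely coupled architecture is what gives the overall Hadoop framework its flexibility.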

How MapReduce Works, with an Example

MapReduce is arguably the essence of Hadoop: a programming model for data processing. As the name suggests, it has two parts, Map and Reduce. The idea is to split first and combine afterwards: Map extracts and transforms the data, and Reduce aggregates it. Note that Map tasks store their output on local disk, not in HDFS.

When a MapReduce job runs, the relationship between a Map task and the data it reads falls broadly into three categories:

1. Data-local

2. Rack-local

3. Off-rack

As these cases show, a MapReduce job that has to move large amounts of data suffers disastrously in execution efficiency.
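The three categories can be expressed as a small classification function: given the node a Map task runs on and the nodes holding replicas of its input block, decide how far the data has to travel. The function name, node names, and the rack_of mapping are hypothetical.

```python
def locality(task_node, replica_nodes, rack_of):
    """Classify how close a Map task is to a replica of its input block."""
    if task_node in replica_nodes:
        return "data-local"   # a replica is on the same node
    replica_racks = {rack_of[node] for node in replica_nodes}
    if rack_of[task_node] in replica_racks:
        return "rack-local"   # a replica is on the same rack
    return "off-rack"         # the data must cross racks

rack_of = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack3"}
replica_nodes = ["n1", "n3"]
levels = [locality(node, replica_nodes, rack_of) for node in ("n1", "n2", "n4")]
print(levels)
```

A scheduler that prefers the first category, then the second, and only then the third keeps data movement to a minimum.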

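MapReduce Data Flow

From a data flow perspective, MapReduce jobs fall broadly into the following categories:

1. Single Reduce

2. Multiple Reduces

3. No Reduce

Whichever form is used, MapReduce follows the same execution flow (illustrated by a figure in the original article).

When a Map task runs, whatever logic the map method implements, its output ultimately has to be written to disk. If there is no Reduce phase, the output goes straight to HDFS. If there is a Reduce phase, each map method's output is first buffered in memory before being written to disk. Every Map task has a circular in-memory buffer that holds the map output, 100 MB by default. Each time the buffer is nearly full, a separate thread writes its contents to disk as a spill file. When the Map task finishes, all the spill files it produced are merged into a single partitioned and sorted output file, which then waits for the Reduce tasks to pull their data.

This process is the famous Shuffle of MapReduce.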
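The map-side half of this process can be sketched in a few lines: tag each output record with its reducer partition, spill a sorted run whenever the buffer fills, and merge the runs at the end. A record count stands in for the 100 MB buffer, a CRC32 hash stands in for the partitioner, and the function names and sample records are assumptions made for the example.

```python
import heapq
import zlib

def partition_for(key, num_reducers):
    """Deterministic stand-in for a hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def map_side_shuffle(records, num_reducers, buffer_limit):
    """Buffer map output, spill sorted runs, then merge them."""
    spills, buffer = [], []
    for key, value in records:
        buffer.append((partition_for(key, num_reducers), key, value))
        if len(buffer) >= buffer_limit:
            # Buffer is full: sort by (partition, key) and write a spill "file".
            spills.append(sorted(buffer))
            buffer = []
    if buffer:
        spills.append(sorted(buffer))  # final partial spill
    # Merge the sorted runs into one partitioned, sorted output.
    merged = list(heapq.merge(*spills))
    return spills, merged

records = [("1950", 0), ("1949", 111), ("1950", 22), ("1949", 78), ("1950", -11)]
spills, merged = map_side_shuffle(records, num_reducers=2, buffer_limit=2)
print(len(spills), merged)
```

Five records with a buffer of two produce three spills. In the merged output all records for one partition are contiguous, so each Reduce task can pull exactly its own slice; this also covers the multiple-Reduce case, and with a single reducer every record lands in partition 0.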

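A Practical MapReduce Example

Raw Data: The raw data file is an ordinary text file. Each line is a record containing a year and a temperature for a day in that year.

Map: In the Map phase, a key is generated for every line. The key is normally the line's offset within the file; for example, the keys 0 and 106 in the original figure are the offsets of two lines from the start of the file. The bold portions in the figure mark the year and the temperature.

Shuffle: This phase extracts the required fields and forms key-value pairs {year, temperature}.

Sort: The values that share a key in the previous step are collected into a list, giving {year, List<temperature>}, which is passed to the Reduce side.

Reduce: The Reduce side processes the list, takes the maximum value, and writes the result to HDFS.

The steps above are summarized as a flow diagram in the original article.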
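The same flow can be run end to end in plain Python as a minimal sketch. The line format ("year temperature"), the sample values, and all function names are assumptions made for illustration; real input files and the Hadoop API look different.

```python
from collections import defaultdict

def read_records(lines):
    """Yield (offset, line) pairs, like the keys handed to the Map phase."""
    offset = 0
    for line in lines:
        yield offset, line
        offset += len(line) + 1  # +1 for the newline character

def map_fn(offset, line):
    # The offset key is ignored; the year and temperature are extracted.
    year, temperature = line.split()
    yield year, int(temperature)

def shuffle_and_sort(pairs):
    # Group values by key and order the keys: {year, List<temperature>}.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_fn(year, temperatures):
    return year, max(temperatures)

lines = ["1950 0", "1950 22", "1950 -11", "1949 111", "1949 78"]
pairs = [pair for offset, line in read_records(lines)
         for pair in map_fn(offset, line)]
result = [reduce_fn(year, temps) for year, temps in shuffle_and_sort(pairs)]
print(result)
```

For this sample input the maximum is 111 for 1949 and 22 for 1950.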

 

 

   