Editor's note:
This article comes from CSDN. It mainly covers installing and configuring a big data environment, HDFS, and YARN (the Hadoop resource manager), among other topics.
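1. What Is Apache Hadoop?
1.1 Definition and features
Open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows large data sets to be processed in a distributed fashion across clusters of computers using simple programming models.
It can scale from a single server to thousands of machines, each of which offers local computation and storage.
Because any individual machine is prone to failure, the library itself is designed to detect and handle failures at the application layer. It therefore delivers a highly available service on top of a cluster of computers rather than relying on hardware for high availability.
1.2 Main modules:
Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data.
Hadoop YARN: a framework for job scheduling and cluster resource management.
Hadoop MapReduce: a YARN-based system for parallel processing of large data sets (in Hadoop 2.x and later; the hadoop-1.2.1 release installed below predates YARN and uses the JobTracker/TaskTracker model described in section 4).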
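2. Installing Hadoop (using hadoop-1.2.1 as an example)
2.1 Prerequisites
A Linux operating system
A JDK installed, with the related environment variables configured
The Hadoop installation package, for example hadoop-1.2.1.tar.gz (official download page: http://hadoop.apache.org/releases.html)
2.2 Installation
Extract hadoop-1.2.1.tar.gz to a directory of your choice, for example /opt/hadoop-1.2.1/
2.3 Configure the Hadoop environment variables
Add the following to /etc/profile: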
export JAVA_HOME=/opt/jdk1.8.0_131
export JRE_HOME=/opt/jdk1.8.0_131/jre
export HADOOP_HOME=/opt/hadoop-1.2.1
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$PATH
2.4 Edit four configuration files
All four configuration files are in the /opt/hadoop-1.2.1/conf/ directory.
(a) Edit hadoop-env.sh and set JAVA_HOME:
# The java implementation to use. Required.
export JAVA_HOME=/opt/jdk1.8.0_131
(b) Edit core-site.xml and set hadoop.tmp.dir, dfs.name.dir, and fs.default.name:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- Hadoop temporary working directory -->
    <value>/home/jochen/hadoop</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <!-- NameNode metadata directory -->
    <value>/home/jochen/hadoop/name</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <!-- File system NameNode, as address:port -->
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
(c) Edit mapred-site.xml and set mapred.job.tracker:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
(d) Edit hdfs-site.xml and set dfs.data.dir:
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <!-- Directory where DFS file blocks are stored -->
    <value>/home/jochen/hadoop/data</value>
  </property>
</configuration>
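To double-check the values set in the configuration files above, the property files can be read back programmatically. The following is a minimal sketch, assuming Python 3 and its standard xml.etree.ElementTree module. The helper name parse_hadoop_properties and the inline sample are illustrative and not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def parse_hadoop_properties(xml_text):
    """Return a dict mapping each <name> to its <value> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Sample content mirroring two of the core-site.xml properties shown above.
SAMPLE_CORE_SITE = """
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/jochen/hadoop</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for key, value in parse_hadoop_properties(SAMPLE_CORE_SITE).items():
        print(f"{key} = {value}")
```

On a real installation the same helper could be fed the text of /opt/hadoop-1.2.1/conf/core-site.xml instead of the inline sample.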
2.5 Format the NameNode
Run the command:
$ hadoop namenode -format
A successful run produces output like the following:
Warning: $HADOOP_HOME is deprecated.
17/05/19 23:46:05 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_131
************************************************************/
17/05/19 23:46:05 INFO util.GSet: Computing capacity for map BlocksMap
17/05/19 23:46:05 INFO util.GSet: VM type = 64-bit
17/05/19 23:46:05 INFO util.GSet: 2.0% max memory = 932184064
17/05/19 23:46:05 INFO util.GSet: capacity = 2^21 = 2097152 entries
17/05/19 23:46:05 INFO util.GSet: recommended=2097152, actual=2097152
17/05/19 23:46:05 INFO namenode.FSNamesystem: fsOwner=jochen
17/05/19 23:46:05 INFO namenode.FSNamesystem: supergroup=supergroup
17/05/19 23:46:05 INFO namenode.FSNamesystem: isPermissionEnabled=true
17/05/19 23:46:05 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
17/05/19 23:46:05 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
17/05/19 23:46:05 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
17/05/19 23:46:05 INFO namenode.NameNode: Caching file names occuring more than 10 times
17/05/19 23:46:05 INFO common.Storage: Image file /home/jochen/hadoop/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
17/05/19 23:46:06 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/jochen/hadoop/dfs/name/current/edits
17/05/19 23:46:06 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/jochen/hadoop/dfs/name/current/edits
17/05/19 23:46:06 INFO common.Storage: Storage directory /home/jochen/hadoop/dfs/name has been successfully formatted.
17/05/19 23:46:06 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.0.1
************************************************************/
2.6 Start Hadoop
$ cd /opt/hadoop-1.2.1/bin
$ ./start-all.sh
2.7 Check the running Java processes
Enter the following command in a terminal. Output like the following indicates that Hadoop was installed successfully:
$ jps
12785 JobTracker
1161 Jps
23626 TaskTracker
23275 DataNode
21659 NameNode
23436 SecondaryNameNode
3. Introduction to HDFS
3.1 Basic HDFS concepts
HDFS architecture

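Block:
  Files in HDFS are split into blocks for storage
  The default HDFS block size is 64 MB (in Hadoop 1.x)
  A block is the logical unit of file storage and processing
NameNode (management node), which stores file metadata:
  The mapping table from files to data blocks
  The mapping table from data blocks to DataNodes
DataNode:
  DataNodes are the worker nodes of HDFS
  They store the data blocks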
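The concepts above can be made concrete with a small model. The sketch below shows how a file is cut into 64 MB blocks and how the two NameNode mapping tables could be represented. It is only an illustration: the function names, file path, block IDs, and node names are made up, and the real NameNode data structures are considerably more involved.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the Hadoop 1.x default mentioned above


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes in bytes of the blocks that a file of file_size bytes occupies."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds the leftover bytes
    return sizes


# Toy versions of the two NameNode tables (all names are hypothetical).
file_to_blocks = {"/logs/app.log": ["blk_1", "blk_2", "blk_3"]}
block_to_datanodes = {
    "blk_1": ["datanode-a", "datanode-b", "datanode-c"],
    "blk_2": ["datanode-b", "datanode-c", "datanode-d"],
    "blk_3": ["datanode-a", "datanode-c", "datanode-d"],
}


def locate(path):
    """List (block, datanodes) pairs for a file, as a client would ask the NameNode."""
    return [(block, block_to_datanodes[block]) for block in file_to_blocks[path]]


if __name__ == "__main__":
    megabyte = 1024 * 1024
    print([size // megabyte for size in split_into_blocks(150 * megabyte)])
    for block, nodes in locate("/logs/app.log"):
        print(block, nodes)
```

A 150 MB file therefore occupies two full 64 MB blocks plus one 22 MB block, and a client resolves each block to the DataNodes that hold it.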
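3.2 Data management strategy and fault tolerance
Block replicas: each data block is stored as 3 replicas by default, distributed across multiple nodes in two racks
Heartbeat detection: each DataNode periodically sends heartbeat messages to the NameNode
Secondary NameNode: the Secondary NameNode periodically synchronizes the metadata image file and the edit log. If the NameNode fails, this checkpointed metadata can be used to bring up a replacement NameNode (in Hadoop 1.x this is a manual recovery step, not an automatic failover)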
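Heartbeat detection can be sketched as a table of last-seen timestamps that is checked against a timeout. The class below is a simplified illustration, not the NameNode's actual implementation. The class name, the 30-second timeout, and the node names are assumptions chosen for the example, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time per DataNode and reports nodes that went silent."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node id -> time of the most recent heartbeat

    def record_heartbeat(self, node_id, now):
        """Remember that node_id reported in at time now (in seconds)."""
        self.last_seen[node_id] = now

    def dead_nodes(self, now):
        """Return the sorted ids of nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node_id
            for node_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=30)  # illustrative value only
    monitor.record_heartbeat("datanode-a", now=0)
    monitor.record_heartbeat("datanode-b", now=0)
    monitor.record_heartbeat("datanode-a", now=25)
    print(monitor.dead_nodes(now=40))
```

In this run datanode-b has been silent for 40 seconds and is reported, while datanode-a checked in 15 seconds ago and is still considered alive. Blocks held by a node reported this way would then need to be re-replicated from their surviving copies.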

3.3 Reading and writing files in HDFS
How HDFS reads a file

How HDFS writes a file

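3.4 Characteristics of HDFS
Data redundancy and tolerance of hardware failures
Streaming data access (write once, read many times)
Well suited to storing large files
Applicability and limitations:
  Suited to batch reads and writes of data, with high throughput
  Not suited to interactive applications; low-latency access is hard to achieve
  Suited to write-once, read-many workloads with sequential reads and writes
  Does not support multiple users writing to the same file concurrently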
3.5 Using HDFS
HDFS command-line operations:
hadoop fs -ls dirpath              # list the files and directories under a directory
hadoop fs -mkdir dirname           # create a new directory in HDFS
hadoop fs -put filepath dirpath    # upload a local file to HDFS
hadoop fs -get filepath dirpath    # download a file from HDFS to the local file system
hadoop fs -cat filepath            # view the contents of a file
hadoop dfsadmin -report            # view HDFS status information
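When these commands need to be driven from a script, one option is to build the argument list in code and hand it to a process launcher. The sketch below only assembles the command line for the hadoop fs operations listed above. The helper name hadoop_fs_command is hypothetical, and nothing is executed, so it runs even on a machine without Hadoop.

```python
import shlex

# Only the "hadoop fs" operations that appear in the list above.
ALLOWED_OPERATIONS = {"ls", "mkdir", "put", "get", "cat"}


def hadoop_fs_command(operation, *paths):
    """Build the argument list for a 'hadoop fs -<operation> <paths...>' invocation."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return ["hadoop", "fs", f"-{operation}", *paths]


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(hadoop_fs_command("put", "data.txt", "/input")))
```

On a machine where hadoop is on the PATH, the returned list could be passed to subprocess.run to actually execute the command.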
4. Introduction to MapReduce
4.1 How MapReduce works
Divide and conquer: one large task is split into many small subtasks (map), which run in parallel, and their results are then merged (reduce)
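The classic illustration of this divide-and-conquer idea is counting words. The sketch below imitates the map, shuffle, and reduce steps in a single Python process. It is a conceptual model only and does not use the Hadoop API. The function names are chosen for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key so each key reaches a single reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: merge the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map over every line, then shuffle, then reduce each key."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In a real cluster each call to map_phase would run as a separate map task close to its input block, and the grouped keys would be partitioned across several reduce tasks.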

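4.2 How a MapReduce job runs
Basic concepts
Job & Task
  A Job can be divided into multiple Tasks (MapTask & ReduceTask)
JobTracker (job management node)
  A client submits a Job, and the JobTracker places it in a candidate queue. When the Job is scheduled, the JobTracker splits it into multiple MapTasks and ReduceTasks and distributes them to TaskTrackers for execution. The JobTracker's roles:
  Scheduling jobs
  Assigning tasks and monitoring their progress
  Monitoring the status of the TaskTrackers
TaskTracker (task management node)
  TaskTrackers and HDFS DataNodes usually run on the same set of physical nodes. This moves the computation to the data instead of moving the data, which keeps the cost of reading data to a minimum. The TaskTracker's roles:
  Executing tasks
  Reporting task status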
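The idea of moving computation to the data can be sketched as a scheduling preference: give each map task to a free TaskTracker that already stores the task's input block, and fall back to any free node otherwise. The greedy function below is a simplified illustration and not the JobTracker's actual algorithm. The function name, the one-slot-per-node assumption, and the node and block names are all made up for the example.

```python
def assign_map_tasks(block_locations, free_trackers):
    """Assign one map task per block, preferring a tracker that holds the block locally.

    block_locations maps a block id to the nodes holding a replica of it.
    free_trackers lists nodes with one free task slot each.
    Returns a dict mapping block id to (node, is_local).
    """
    available = list(free_trackers)
    assignment = {}

    # First pass: place tasks on nodes that already store their input block.
    for block, nodes in block_locations.items():
        local = next((node for node in available if node in nodes), None)
        if local is not None:
            assignment[block] = (local, True)
            available.remove(local)

    # Second pass: remaining tasks take any free node and read their block remotely.
    for block in block_locations:
        if block not in assignment and available:
            assignment[block] = (available.pop(0), False)

    return assignment


if __name__ == "__main__":
    blocks = {"blk_1": ["node-a", "node-b"], "blk_2": ["node-a"], "blk_3": ["node-c"]}
    for block, (node, is_local) in assign_map_tasks(blocks, ["node-a", "node-b", "node-d"]).items():
        print(block, node, "local" if is_local else "remote")
```

Because the sketch is greedy, it does not always find the assignment with the most local reads. It is only meant to show the preference for data-local placement.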
MapReduce architecture

MapReduce job execution process

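MapReduce fault-tolerance mechanisms
Re-execution
  A failed task is run again. By default it is abandoned after at most 4 attempts.
Speculative execution
  Reason: the reduce side starts only after all map-side computation has finished, so a single slow map task holds up the whole job. When a task runs unusually slowly, a backup copy of it is started on another node, and the result of whichever copy finishes first is used.
  Purpose: it ensures that a fault on one or two TaskTrackers does not drag down the execution efficiency of the entire job.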
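The re-execution rule can be sketched as a bounded retry loop. The code below is a minimal illustration under the default limit of 4 attempts stated above. The helper names run_with_retries and make_flaky_task are hypothetical, and the failure is simulated rather than produced by a real TaskTracker.

```python
MAX_ATTEMPTS = 4  # the default limit stated above


def run_with_retries(task, max_attempts=MAX_ATTEMPTS):
    """Call task() until it succeeds or max_attempts is reached.

    Returns (result, attempts_used); raises RuntimeError once every attempt has failed.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except Exception as error:  # any failure counts as a failed attempt
            last_error = error
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def make_flaky_task(failures):
    """Build a task that fails the given number of times and then succeeds."""
    state = {"remaining": failures}

    def task():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise IOError("simulated TaskTracker failure")
        return "done"

    return task


if __name__ == "__main__":
    print(run_with_retries(make_flaky_task(2)))
```

A task that fails twice succeeds on its third attempt, while a task that fails four times in a row exhausts the limit and is reported as failed.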
5. YARN - Hadoop Resource Manager
The fundamental idea of YARN is to split the functions of resource management and job scheduling/monitoring into separate daemons. This calls for a global ResourceManager (RM) and a per-application ApplicationMaster (AM).
The ResourceManager (RM) and the NodeManager form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The NodeManager is the per-machine framework agent. It is responsible for containers, monitors their resource usage (CPU, memory, disk, network), and reports that usage to the ResourceManager/Scheduler.
The per-application ApplicationMaster (AM) is, in effect, a framework-specific library. Its task is to negotiate resources from the ResourceManager and to work with the NodeManager(s) to execute and monitor the tasks.
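The ResourceManager (RM) has two main components: the Scheduler and the ApplicationsManager.
The Scheduler is responsible for allocating resources to the various running applications. It is a pure scheduler in the sense that it performs no monitoring or tracking of application status. It also offers no guarantee of restarting failed tasks, whether they fail because of an application error or a hardware fault.
The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application's ApplicationMaster (AM), and providing the service that restarts the ApplicationMaster container on failure. Each per-application ApplicationMaster is responsible for negotiating appropriate resource containers from the Scheduler, tracking their status, and monitoring progress.
YARN also supports the notion of resource reservation, which sets resources aside to ensure the predictable execution of important jobs. The reservation system tracks resources over time, performs admission control for reservations, and dynamically instructs the underlying scheduler to ensure that each reservation is fulfilled.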
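The division of labor between these roles can be sketched with a toy model in which an ApplicationMaster asks the ResourceManager for containers and the ResourceManager grants them out of the memory offered by the NodeManagers. This is a first-fit simplification for illustration only. The class names mirror the YARN roles but are not the real YARN API, and the node names and memory sizes are made up.

```python
class NodeManager:
    """Per-machine agent; here it only tracks how much memory is still free."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_memory_mb = memory_mb


class ResourceManager:
    """Global arbiter: grants containers out of the memory offered by the NodeManagers."""

    def __init__(self, node_managers):
        self.node_managers = list(node_managers)

    def allocate(self, app_id, memory_mb):
        """Grant one container of memory_mb to app_id, or return None if nothing fits."""
        for node in self.node_managers:  # first fit: take the first node with room
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb
                return {"app": app_id, "node": node.name, "memory_mb": memory_mb}
        return None


class ApplicationMaster:
    """Per-application component that negotiates containers and keeps track of them."""

    def __init__(self, app_id, resource_manager):
        self.app_id = app_id
        self.resource_manager = resource_manager
        self.containers = []

    def request_containers(self, count, memory_mb):
        """Ask for up to count containers; stop at the first request that cannot be met."""
        granted = []
        for _ in range(count):
            container = self.resource_manager.allocate(self.app_id, memory_mb)
            if container is None:
                break
            granted.append(container)
        self.containers.extend(granted)
        return granted


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("node-1", 4096), NodeManager("node-2", 2048)])
    am = ApplicationMaster("app-1", rm)
    for container in am.request_containers(count=4, memory_mb=2048):
        print(container)
```

With 6144 MB in total and 2048 MB per container, only three of the four requested containers can be granted. Real YARN schedulers also weigh queues, priorities, and data locality, which this sketch leaves out.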
