Editor's note:
This article comes from csdn. It explains many HDFS concepts in detail, along with how a cluster stores its data, and is a useful aid to understanding the Hadoop distributed file system.
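1. Introduction
In a modern enterprise environment, a single machine often cannot hold the volume of data that has to be stored, so the data must be spread across machines. A file system that manages storage distributed over a cluster in a unified way is called a distributed file system. Once a network is involved, all the complexity of network programming comes with it; one of the challenges, for example, is how to guarantee that no data is lost when a node becomes unavailable.
The traditional Network File System (NFS) is also called a distributed file system, but it has some limitations. In NFS the files live on a single machine, so it cannot offer reliability guarantees, and when many clients access the NFS server at the same time the server easily comes under load and becomes a performance bottleneck. In addition, to work on a file in NFS a client first has to sync it to the local machine, and its changes are invisible to other clients until they are synced back to the server. To some extent NFS is not a typical distributed system, even though its files really are stored on a remote (single) server.
[Figure: NFS protocol stack]
As the NFS protocol stack shows, NFS is in fact an implementation of VFS (the operating system's abstraction over files).
HDFS, short for Hadoop Distributed File System, is one implementation of Hadoop's abstract file system. The Hadoop abstract file system can also integrate with the local file system, Amazon S3 and others, and can even be operated over a web protocol (webhdfs). HDFS files are distributed across the machines of the cluster, and replicas provide fault tolerance and reliability. Client reads and writes, for example, go directly to the individual machines of the cluster, so there is no single point of performance pressure.
If you are building a complete cluster from scratch, see [Hadoop cluster setup: detailed steps (2.6.0)].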
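2. HDFS design principles
From the start HDFS was designed with a clear idea of its application scenarios: which kinds of application it suits and which it does not. The guidelines are fairly explicit.
2.1 Design goals
Storing very large files: "very large" here means files of hundreds of megabytes, gigabytes, or terabytes. Many clusters in production already store petabytes of data. According to the Hadoop website, Yahoo!'s Hadoop clusters have about 100,000 CPUs running on 40,000 machine nodes. For more on how Hadoop clusters are used around the world, see the Hadoop website.
Streaming data access: HDFS is built on the assumption that the most efficient data processing pattern is write once, read many times. A dataset is typically generated or copied from its source once, and many analyses are then run over it.
Each analysis usually reads a large part of the data, if not all of it, so the time to read the whole dataset matters more than the latency of reading the first record.
Running on commodity hardware: Hadoop does not need particularly expensive, highly reliable machines; it runs on ordinary commodity machines (which can be bought from multiple vendors).
Commodity does not mean low-end. In a cluster, especially a large one, the node failure rate is fairly high, and the goal of HDFS is that users see no noticeable interruption when nodes fail.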
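2.2 Applications HDFS is not suited for
Some scenarios are a poor fit for storing data in HDFS. A few examples:
1) Low-latency data access
Applications that need latency in the millisecond range are not a good fit for HDFS. HDFS is designed for high-throughput data transfer, possibly at the expense of latency. HBase is a better choice for low-latency access.
2) Large numbers of small files
File metadata (such as the directory structure, the list of nodes holding each file's blocks, and the block-node mapping) is kept in the NameNode's memory, so the number of files in the file system is limited by the size of that memory.
As a rule of thumb, each file, directory, and block takes about 150 bytes of metadata memory. One million files that each occupy one block amount to two million objects and therefore need roughly 300 MB of memory. Billions of files are therefore hard to support on current commodity machines.
3) Multiple writers and arbitrary file modification
HDFS writes data in append-only fashion. It does not support modification at arbitrary file offsets, and it does not support multiple writers.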
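The small-files estimate in item 2 is simple arithmetic and can be checked with a short sketch. It assumes the 150-byte rule of thumb quoted in the text and counts one object per file plus one per block; directories are ignored, and the class and method names are illustrative only, not part of Hadoop.

```java
class NameNodeMemoryEstimate {
    // Rule of thumb from the text: roughly 150 bytes of NameNode heap per namespace object.
    static final long BYTES_PER_OBJECT = 150L;

    /** Each file costs one file object plus one object per block (directories ignored). */
    static long estimateBytes(long files, long blocksPerFile) {
        long objects = files * (1 + blocksPerFile);
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        System.out.println(estimateBytes(1_000_000L, 1) / 1_000_000L + " MB for one million files");
        System.out.println(estimateBytes(1_000_000_000L, 1) / 1_000_000_000L + " GB for one billion files");
    }
}
```

Under these assumptions a billion single-block files would need on the order of 300 GB of heap, which is why many small files are a problem long before the disks fill up.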
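3. HDFS core concepts
3.1 Blocks
Physical disks have the notion of a block: the physical block is the smallest unit of disk operation, all reads and writes happen in whole blocks, and a block is typically 512 bytes. File systems add another layer of abstraction on top of physical blocks: a file system block is an integer multiple of the physical disk block, usually a few kilobytes. Maintenance tools such as df and fsck operate at the file system block level.
HDFS blocks are much larger than those of an ordinary single-machine file system: 128 MB by default. An HDFS file is split into block-sized chunks, and each chunk is stored as an independent unit. A file smaller than a block does not occupy a whole block, only its actual size. For example, a 1 MB file takes up 1 MB of space in HDFS, not 128 MB.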
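Why are HDFS blocks so large?
To minimize seek time, that is, to control the ratio between the time spent locating a block and the time spent transferring it. Suppose locating a block takes 10 ms and the disk transfer rate is 100 MB/s. To keep the seek time at 1% of the transfer time, the block needs to be about 100 MB.
However, blocks should not be made too large either. In a MapReduce job, if the number of map or reduce tasks is smaller than the number of machines in the cluster, the job runs inefficiently.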
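The block-size reasoning above can be written out as a small calculation. This is only a sketch of the arithmetic in the text (10 ms seek, 100 MB/s transfer, 1% target); the class and method names are made up for illustration.

```java
class BlockSizeEstimate {
    /** Block size in MB at which seek time is the given fraction of transfer time. */
    static double blockSizeMb(double seekMs, double transferMbPerSec, double seekFraction) {
        double transferSeconds = (seekMs / 1000.0) / seekFraction; // how long one block must take to transfer
        return transferSeconds * transferMbPerSec;
    }

    /** Number of blocks needed for a file; the last block may be only partly filled. */
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(blockSizeMb(10, 100, 0.01) + " MB block size");
        System.out.println(blockCount(300 * mb, 128 * mb) + " blocks for a 300 MB file");
        System.out.println(blockCount(1 * mb, 128 * mb) + " block for a 1 MB file");
    }
}
```

With faster disks the same 1% target calls for a larger block, which is one reason the default has grown over time.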
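Benefits of the block abstraction
Splitting files into blocks lets a single file be larger than any one disk. The blocks that make up a file can be spread over the whole cluster, so in theory a single file could fill the disks of every machine in the cluster.
The block abstraction also simplifies the storage subsystem: a block carries no permissions, owner, or similar information (these are handled at the file level).
Blocks are the unit of replication in the fault tolerance and high availability mechanisms; data is replicated block by block.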
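3.2 Namenode & Datanode
An HDFS cluster consists of a Namenode and Datanodes in a master-worker pattern. The Namenode is responsible for building the namespace and managing file metadata, while the Datanodes store the actual data and serve reads and writes.
Namenode
The Namenode holds the file system tree and the metadata of all files and directories. The metadata is persisted in two forms:
namespace image
edit log
The persisted data does not include the list of nodes that hold each block, that is, which nodes of the cluster the blocks of a file are stored on. This information is rebuilt when the system restarts, from the block information reported by the Datanodes.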
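To make the last point concrete, here is a minimal sketch of how a block-to-node map could be rebuilt purely from Datanode reports. It is not the Namenode's actual data structure; the node names (dn1, dn2, dn3) and block ids are invented for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class BlockMapRebuild {
    /** Turns per-Datanode reports (node -> block ids it stores) into block id -> nodes. */
    static Map<Long, Set<String>> rebuild(Map<String, List<Long>> reports) {
        Map<Long, Set<String>> locations = new TreeMap<>();
        for (Map.Entry<String, List<Long>> report : reports.entrySet()) {
            for (Long blockId : report.getValue()) {
                locations.computeIfAbsent(blockId, id -> new TreeSet<>()).add(report.getKey());
            }
        }
        return locations;
    }

    /** Hypothetical reports from three Datanodes. */
    static Map<String, List<Long>> sampleReports() {
        Map<String, List<Long>> reports = new LinkedHashMap<>();
        reports.put("dn1", Arrays.asList(1L, 2L));
        reports.put("dn2", Arrays.asList(1L, 3L));
        reports.put("dn3", Arrays.asList(2L, 3L));
        return reports;
    }

    public static void main(String[] args) {
        System.out.println(rebuild(sampleReports()));
    }
}
```

Because the map depends only on the reports, nothing about block locations has to be written to the namespace image or the edit log.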
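In HDFS the Namenode can become a single point of failure for the cluster: when it is unavailable, the whole file system is unavailable. HDFS offers two mechanisms against this:
1) Backing up the persisted metadata
The file system metadata is written to several file systems at once, for example to the local file system and to NFS. These backup writes are synchronous and atomic.
2) Secondary Namenode
The Secondary Namenode periodically merges the primary Namenode's namespace image with the edit log by creating a checkpoint, which keeps the edit log from growing too large. It keeps a copy of the merged namespace image that can be used to recover data if the Namenode fails completely.
[Figure: Secondary Namenode administration page]
The Secondary Namenode usually runs on a separate machine because merging needs a lot of CPU and memory. Its state lags behind the Namenode, so some data is lost if the Namenode fails completely.
The usual approach is to copy the metadata backup from NFS to the Secondary Namenode and run it as the new primary Namenode.
With HA, a hot standby can be run; after the active Namenode fails, the standby takes its place as the active Namenode.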
Datanode
Datanodes store and retrieve blocks. Read and write requests may come from the Namenode or directly from clients. Each Datanode periodically reports to the Namenode the blocks it stores.
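3.3 Block Caching
A DataNode normally reads data straight from disk, but frequently used blocks can be cached in memory. By default a block is cached on only one Datanode, but this can be configured per file.
Job schedulers can use the cache to improve performance; MapReduce, for example, can run a task on a node that has the block cached.
Users or applications can send cache directives to the NameNode (which file to cache, and for how long). Cache pools are used to manage permissions and resources for a group of cache directives.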
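3.4 HDFS Federation
NameNode memory limits the number of files, and HDFS Federation offers a way to scale the NameNode horizontally. In Federation mode each NameNode manages part of the namespace; for example, one NameNode manages the files under /user and another manages the files under /share.
Each NameNode manages one namespace volume, and together the volumes make up the file system metadata. Each NameNode also maintains a block pool holding information such as the block-to-node mapping. The NameNodes are independent of each other: the failure of one does not make the files managed by the others unavailable.
Clients use a mount table to map file paths to NameNodes. The mount table is a layer on top of the group of Namenodes; this layer is itself a Hadoop file system implementation and is accessed through the viewfs: scheme.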
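The idea of a client-side mount table can be sketched as a longest-prefix lookup. This is an illustration of the concept only, not the viewfs implementation; the NameNode names nn1 and nn2 are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class MountTableSketch {
    private final Map<String, String> mounts = new HashMap<>();

    void mount(String pathPrefix, String nameNode) {
        mounts.put(pathPrefix, nameNode);
    }

    /** Returns the NameNode whose mount point is the longest matching prefix of path, or null. */
    String resolve(String path) {
        String best = null;
        for (String prefix : mounts.keySet()) {
            // Match whole path components only, so /user does not capture /userdata.
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? null : mounts.get(best);
    }

    /** The example from the text: /user on one NameNode, /share on another. */
    static MountTableSketch sample() {
        MountTableSketch table = new MountTableSketch();
        table.mount("/user", "nn1");
        table.mount("/share", "nn2");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(sample().resolve("/user/alice/data.txt"));
        System.out.println(sample().resolve("/share/lib"));
    }
}
```

A path that matches no mount point resolves to null here; how a real client reports that case is outside the scope of this sketch.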
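3.5 HDFS HA
In an HDFS cluster the NameNode is still a single point of failure (SPOF). Writing the metadata to several file systems and having the Secondary NameNode checkpoint periodically protect against data loss, but they do not improve availability.
The reason is that the NameNode is the only place that knows the file metadata and the file-to-block mapping. When it goes down, no job, MapReduce included, can read or write.
When the NameNode fails, the conventional remedy is to start a new NameNode from a metadata backup. The backup can come from:
the copies written to multiple file systems
the Secondary NameNode's checkpoint files
After the new Namenode starts, clients and DataNodes have to be reconfigured with its address. The restart itself also tends to take a long time; for a cluster of any size it often takes tens of minutes or even hours. The main reasons are:
1) Loading the metadata image file into memory takes a long time.
2) The edit log has to be replayed.
3) The Namenode must receive enough status reports from the DataNodes before it can leave safe mode and serve writes.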
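Hadoop's HA solution
An HDFS cluster with HA configures two NameNodes, one in the Active state and one in Standby. When the active NameNode fails, the standby takes over and keeps serving, with no noticeable interruption for users. The switch typically takes from tens of seconds to a few minutes.
The main pieces of the HA implementation are:
1) Active and standby must share edit log storage.
The active and standby NameNodes share one edit log. On failover the standby replays the edit log to bring its state up to date.
There are usually two choices for the shared storage:
NFS: the traditional network file system
QJM: quorum journal manager
QJM was designed specifically for HDFS HA, to provide a highly available edit log. It runs a group of journal nodes, and each edit must be written to a majority of them. Three journal nodes are typical, which tolerates the failure of one, much like ZooKeeper. Note that QJM does not use ZK, although HDFS HA does use ZK to elect the active Namenode. QJM is generally the recommended choice.
2) DataNodes must send block reports to both the active and the standby
The block mapping is kept in memory, not on disk. So that the new NameNode can take over quickly after the active NameNode dies, without waiting for block reports from the Datanodes, the DataNodes send block reports to both NameNodes.
3) Clients must be configured for failover (transparent to users)
Namenode failover is handled by the client library and is not noticed by client code. The HDFS URI in the client configuration is a logical name that maps to a pair of Namenode addresses, and the client keeps trying each address until one succeeds.
4) The standby replaces the Secondary NameNode
Without HA, HDFS runs a separate daemon as the Secondary Namenode, which checkpoints periodically by merging the image file and the edit log. With HA, the standby takes over this role.
If the standby Namenode happens to be shut down when the active Namenode fails, operators can still start the standby from cold. This is still an improvement over the non-HA case, because the whole procedure is standardized inside Hadoop and needs no complicated manual switchover.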
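The "majority of journal nodes" rule from item 1 is easy to state in code. The sketch below only shows the quorum arithmetic; it does not model the QJM protocol itself, and the names are illustrative.

```java
class JournalQuorum {
    /** Smallest number of journal nodes that forms a majority. */
    static int majority(int journalNodes) {
        return journalNodes / 2 + 1;
    }

    /** How many journal nodes may fail while a majority can still be reached. */
    static int toleratedFailures(int journalNodes) {
        return journalNodes - majority(journalNodes);
    }

    /** An edit counts as durable once a majority has acknowledged it. */
    static boolean isCommitted(int acks, int journalNodes) {
        return acks >= majority(journalNodes);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + " journal nodes: majority " + majority(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

Note that four journal nodes tolerate no more failures than three, which is why odd group sizes are the usual choice.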
NameNode failover is carried out by a failover controller. There are several implementations; the default one uses ZooKeeper to ensure that only one Namenode is active.
Each Namenode runs a lightweight failover controller process, which monitors the Namenode's liveness with a simple heartbeat and triggers a failover when the Namenode fails. A failover can also be triggered manually by an operator, for example to switch the active Namenode during routine maintenance; this is called a graceful failover. A failover that is not triggered manually is called an ungraceful failover.
In an ungraceful failover there is no way to be sure that the node judged to have failed has really stopped, so the previous active Namenode may still be running after the failover. QJM allows only one Namenode to write to the edit log at a time, but the previous active Namenode can still serve read requests. Hadoop uses fencing to stop the previous Namenode: fencing revokes its access to the shared edit log and disables its network port so that it can no longer accept requests. STONITH can also be used to power the previous active Namenode off.
Finally, as described above, Namenode failover in the HA setup is invisible to clients and is handled mainly by the client library.
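The client-side behaviour (try each configured Namenode address until one answers) can be sketched as a simple retry loop. This is not Hadoop's client code: the addresses, the port, the attempt limit, and the fakeRpc stand-in are all assumptions made so that the example runs on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class FailoverClientSketch {
    /** Tries the addresses in turn, cycling through them, until a call succeeds. */
    static <T> T callWithFailover(List<String> addresses, Function<String, T> call, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String address = addresses.get(attempt % addresses.size());
            try {
                return call.apply(address);
            } catch (RuntimeException e) {
                last = e; // this Namenode is down or in standby; move on to the next one
            }
        }
        throw new IllegalStateException("no Namenode answered after " + maxAttempts + " attempts", last);
    }

    /** Stand-in for an RPC: only the hypothetical host nn2 is currently active. */
    static String fakeRpc(String address) {
        if (!address.startsWith("nn2")) {
            throw new RuntimeException(address + " is not active");
        }
        return "served by " + address;
    }

    public static void main(String[] args) {
        List<String> pair = Arrays.asList("nn1:8020", "nn2:8020");
        System.out.println(callWithFailover(pair, FailoverClientSketch::fakeRpc, 4));
    }
}
```

The calling code only sees the final result, which is what "transparent to the client" means in practice.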
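4. Command-line interface
HDFS offers several ways to interact with it, such as the Java API, HTTP, and the shell command line. Command-line interaction goes mainly through hadoop fs. For example:
hadoop fs -copyFromLocal   # copy a file from the local file system to HDFS
hadoop fs -mkdir   # create a directory
hadoop fs -ls   # list files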
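In Hadoop, file and directory permissions follow a POSIX-like model with three permissions, read, write, and execute:
Read (r): needed to read a file or list the contents of a directory.
Write (w): for a file, permission to write to it; for a directory, permission to create or delete files (and directories) in it.
Execute (x): ignored for files, since HDFS has no notion of executing a file; for a directory, it is needed to access its children.
Every file and directory has three attributes: owner, group, and mode. The owner is the user who owns the file and the group is its permission group. The mode consists of the permissions for the owner, for members of the file's group, and for everyone else. The original figure here showed a file whose owner root has read and write permission, while members of supergroup and all other users have read permission only.

Permission checking is switched on and off with the dfs.permissions.enabled property, which defaults to true in hdfs-default.xml for Hadoop 2.x. What is off by default is security: the client's identity is not authenticated, so the permission check guards against accidents rather than deliberate misuse. With checking enabled, every user operating on a file is checked against its permissions. The special superuser, which is the identity of the Namenode process, is never subject to permission checks.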
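Finally, a look at the output of the ls command:
[Figure: output of hadoop fs -ls]
The output resembles that of ls on Unix. The first column is the file mode: d marks a directory, followed by nine characters for the three permission sets.
The second column is the file's replication factor, which is configured with dfs.replication; directories show - because replication does not apply to them. The remaining columns (owner, group, modification time, file size) have the same meaning as in the Unix ls command.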
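As an aside, the first column of the listing can be derived mechanically from a numeric mode. The sketch below renders an octal mode the way ls prints it; it is plain Java written for this illustration and does not use Hadoop's own permission classes.

```java
class ModeString {
    /** Renders a mode such as 0644 in ls style, for example "-rw-r--r--". */
    static String render(boolean isDirectory, int mode) {
        StringBuilder sb = new StringBuilder(isDirectory ? "d" : "-");
        String symbols = "rwx";
        // Walk the nine permission bits from owner-read (bit 8) down to other-execute (bit 0).
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            sb.append(set ? symbols.charAt((8 - bit) % 3) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(false, 0644)); // owner rw, group r, others r
        System.out.println(render(true, 0755));  // a directory that everyone can list and enter
    }
}
```

Mode 0644 corresponds to the case described earlier: the owner can read and write, the group and everyone else can only read.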
If you need to check the cluster status or browse the directory tree, use the HTTP server exposed by the Namenode, normally on port 50070 of the Namenode host.



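5. Hadoop file systems
As mentioned, Hadoop's notion of a file system is abstract and HDFS is only one implementation.
[Figure: file system implementations provided by Hadoop]
Briefly: Local is the abstraction over the local file system; hdfs is the one we see most often; the two web variants (webhdfs and swebhdfs) expose file operations over HTTP. har is Hadoop's archive format: when there are many files they can be packed into one large file, which effectively reduces the amount of metadata. viewfs, mentioned earlier in the HDFS Federation section, hides the details of multiple Namenodes from the client. ftp, as the name suggests, is implemented on top of the FTP protocol, translating file operations into FTP commands. s3a is the implementation for the storage service of Amazon's cloud, and azure is the implementation for Microsoft's cloud platform.
We described interacting with HDFS from the command line, but there are many other ways to operate on the file system. Java applications, for example, can use org.apache.hadoop.fs.FileSystem, and the other forms of access are all wrappers around FileSystem. Here we mainly look at HTTP access.
The WebHDFS and SWebHDFS protocols expose the file system as HTTP operations. This is slower than the native Java client and is not suitable for very large files. There are two ways to access HDFS over HTTP: directly, or through a proxy.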
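Direct access
[Figure: direct access over HTTP]
The Namenode and Datanodes run embedded web servers by default, that is, dfs.webhdfs.enabled defaults to true, and webhdfs talks to these servers. Metadata operations are handled by the Namenode. File reads and writes are first sent to the Namenode, which redirects the client to a Datanode to read (or write) the actual data stream.
Through an HDFS proxy
[Figure: access through a proxy]
The advantage of a proxy is that it can provide load balancing, bandwidth limits, or firewall-friendly access. The proxy exposes the WebHDFS interface over HTTP or HTTPS, corresponding to the webhdfs and swebhdfs URL schemes.
The proxy runs as a standalone daemon, independent of the Namenode and Datanodes. It is started with the httpfs.sh script and listens on port 14000 by default.
Besides direct FileSystem access, the command line, and HTTP, there are also a C API, NFS, and FUSE; they are not covered here.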
6. Java interface
In real applications most operations on HDFS still go through FileSystem. This part covers the relevant interfaces, focusing on the HDFS implementation class DistributedFileSystem and related classes.
6.1 Reading
Data can be read through a URL or directly through FileSystem.
Reading data from a Hadoop URL
The java.net.URL class provides a uniform abstraction for locating resources. Anyone can define a URL scheme and supply a handler class that does the actual work; the hdfs scheme is one such implementation.
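InputStream in = null;
try {
    in = new URL("hdfs://master/user/hadoop").openStream();
    // process the stream here
} finally {
    IOUtils.closeStream(in);
}
To use a custom scheme, a URLStreamHandlerFactory has to be installed. This can be done only once per JVM (a second call fails), so it is usually done in a static block. The original article showed a usage example as a screenshot here.
[Figure: usage example for URLStreamHandlerFactory]
Reading data with the FileSystem API
1) First obtain a FileSystem instance, normally with one of the static get factory methods: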
public static FileSystem get(Configuration conf) throws IOException
public static FileSystem get(URI uri, Configuration conf) throws IOException
public static FileSystem get(URI uri, Configuration conf, String user) throws IOException, InterruptedException
For local files, getLocal returns a local file system object:
public static LocalFileSystem getLocal(Configuration conf) throws IOException
2) Call FileSystem's open method to get an input stream:
public FSDataInputStream open(Path f) throws IOException
public abstract FSDataInputStream open(Path f, int bufferSize) throws IOException
By default open uses a 4 KB buffer; a different size can be set as needed.
3) Work with the data through FSDataInputStream
FSDataInputStream is a specialization of java.io.DataInputStream that adds random access and positioned reads:
public class FSDataInputStream extends DataInputStream
    implements Seekable, PositionedReadable, ByteBufferReadable, HasFileDescriptor,
               CanSetDropBehind, CanSetReadahead, HasEnhancedByteBufferAccess
Random access is defined by the Seekable interface:
public interface Seekable {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
}
seek is an expensive operation and should be used sparingly.
Positioned reads are defined by the PositionedReadable interface:
public interface PositionedReadable {
    public int read(long position, byte[] buffer, int offset, int length) throws IOException;
    public void readFully(long position, byte[] buffer, int offset, int length) throws IOException;
    public void readFully(long position, byte[] buffer) throws IOException;
}
6.2 Writing data
In HDFS, files are created with FileSystem's create method and its overloads. create returns an FSDataOutputStream; its getPos method reports the current position in the file, but seeking is not possible because HDFS supports only appending.
A Progressable callback can be passed at creation time to receive progress information.
The append(Path f) method appends to an existing file, but not every implementation provides it; the Amazon file system implementations, for example, do not support append.
Here is an example:
String localSrc = args[0];
String dst = args[1];
InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(dst), conf);
OutputStream out = fs.create(new Path(dst), new Progressable() {
    public void progress() {
        System.out.print(".");  // print a dot each time progress is reported
    }
});
IOUtils.copyBytes(in, out, 4096, true);  // true: close both streams when the copy finishes
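6.3 Directory operations
The mkdirs() method creates a directory and automatically creates any missing parent directories.
In HDFS, metadata is wrapped in the FileStatus class, which includes the length, block size, replication, modification time, owner, and permissions. FileSystem's getFileStatus method returns a FileStatus, and exists() tests whether a file or directory exists.
To list files, use the listStatus method, which returns information about a file or about the contents of a directory:
public abstract FileStatus[] listStatus(Path f) throws FileNotFoundException, IOException;
When the Path is a file, an array of length 1 is returned. FileUtil's stat2Paths method converts FileStatus objects into Path objects.
globStatus matches file paths against a wildcard pattern:
public FileStatus[] globStatus(Path pathPattern) throws IOException
PathFilter allows custom filtering by file name; it cannot filter on file attributes and is similar to java.io.FileFilter. A typical implementation excludes paths that match a given regular expression. The interface itself is: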
public interface PathFilter {
    boolean accept(Path path);
}
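A regex-excluding filter of the kind mentioned above might look roughly like the following. To keep the example runnable without Hadoop on the classpath, it uses a local stand-in interface (SimplePathFilter, taking a String) instead of the real PathFilter and Path; the sample paths are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RegexExcludeSketch {
    /** Local stand-in for Hadoop's PathFilter, using String instead of Path. */
    interface SimplePathFilter {
        boolean accept(String path);
    }

    /** Accepts every path that does NOT match the given regular expression. */
    static SimplePathFilter excluding(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return path -> !pattern.matcher(path).matches();
    }

    /** Applies a filter to a list of paths, the way a listing call would. */
    static List<String> filter(List<String> paths, SimplePathFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String path : paths) {
            if (filter.accept(path)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/2007/12/30", "/2007/12/31", "/2008/01/01");
        System.out.println(filter(paths, excluding("^.*/2007/12/31$")));
    }
}
```

With the real API the same logic would live in a class implementing PathFilter, applied to the path's string form.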
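6.4 Deleting data
Use FileSystem's delete() method:
public boolean delete(Path f, boolean recursive) throws IOException;
The recursive parameter is ignored when f is a file or an empty directory. If f is a non-empty directory, it is deleted together with its contents only when recursive is true; otherwise an exception is thrown.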
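7. Data flow (read and write paths)
This section describes in detail how HDFS reads and writes data, along with some concepts of its consistency model.
7.1 Reading a file
The read path is roughly as follows:
[Figure: HDFS read path]
1) The client passes a file Path to FileSystem's open method.
2) DistributedFileSystem (DFS) makes an RPC to the Namenode to get the Datanode addresses for the first few blocks of the file. The Namenode decides which nodes to return based on the network topology (only nodes that hold a replica of the block qualify). If the client is itself a Datanode that holds a replica, it reads from the local node.
3) The client reads data from the FSDataInputStream returned by open by calling read.
4) DFSInputStream (which FSDataInputStream wraps) connects to the closest node holding the first block and calls read repeatedly to stream the data.
5) When the first block has been read, it finds the best Datanode for the next block and continues reading. When necessary, DFSInputStream asks the Namenode for the locations of the next batch of blocks (kept in memory, not persisted). All of this is invisible to the client.
6) When all data has been read, the client calls close to close the stream.
If communication with a Datanode fails during a read, DFSInputStream tries the next-best node for that block and remembers the failed node so that later blocks are not read from it.
After reading a block, DFSInputStream verifies its checksum. If the block is corrupt, it tries to read the data from another node and reports the corrupt block to the Namenode.
Which Datanode a client connects to is directed by the Namenode. This design supports a large number of concurrent clients, and the Namenode spreads the traffic as evenly as possible over the cluster.
Block locations are held in the Namenode's memory, so location requests are served very efficiently and do not become a bottleneck.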
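7.2 Writing a file
[Figure: HDFS write path]
Step by step:
1) The client calls DistributedFileSystem's create method.
2) DistributedFileSystem makes an RPC to the Namenode to create a new file in the file system namespace; at this point the file has no blocks associated with it.
During this step the Namenode performs various checks, such as whether a file with the same name already exists and whether the client has permission. If the checks pass, an FSDataOutputStream is returned to the client.
If they fail, an exception is thrown to the client.
3) As the client writes, DFSOutputStream splits the data into packets and puts them on a data queue, which is consumed by the DataStreamer.
4) The DataStreamer asks the Namenode to allocate Datanodes for a new block. These nodes store the replicas of the same block and form a pipeline.
The DataStreamer writes each packet to the first node in the pipeline; the first node stores it and forwards it to the next node, which stores it and passes it on in turn.
5) DFSOutputStream also maintains an ack queue of packets waiting for acknowledgement from the Datanodes. A packet is removed from the ack queue once every Datanode in the pipeline has acknowledged it.
6) When the client has finished writing, it closes the output stream. This flushes all remaining packets into the pipeline and waits for the Datanodes' acknowledgements; once they have all arrived, the client tells the Namenode that the file is complete.
The Namenode already knows all the blocks of the file (because the DataStreamer asked it to allocate them), so it only has to wait until the minimum replication requirement is met before returning success to the client.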
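The interplay of the data queue and the ack queue in steps 3 to 6 can be mimicked with two in-memory queues. This is a single-threaded toy model, assuming that every Datanode stores and acknowledges each packet immediately; the real DFSOutputStream and DataStreamer are asynchronous and far more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class WritePipelineSketch {
    final Deque<Integer> dataQueue = new ArrayDeque<>();     // packets waiting to be sent
    final Deque<Integer> ackQueue = new ArrayDeque<>();      // packets sent but not yet acknowledged
    final List<List<Integer>> datanodes = new ArrayList<>(); // what each pipeline node has stored

    WritePipelineSketch(int replicas) {
        for (int i = 0; i < replicas; i++) {
            datanodes.add(new ArrayList<>());
        }
    }

    /** Client side: the output stream enqueues a packet. */
    void write(int packet) {
        dataQueue.addLast(packet);
    }

    /** Streamer: sends the next packet down the pipeline and parks it in the ack queue. */
    void streamOne() {
        Integer packet = dataQueue.pollFirst();
        if (packet == null) {
            return;
        }
        ackQueue.addLast(packet);
        for (List<Integer> node : datanodes) {
            node.add(packet); // each node stores the packet, then forwards it to the next
        }
    }

    /** Called once every node in the pipeline has acknowledged the oldest packet. */
    void ackOldest() {
        ackQueue.pollFirst();
    }

    /** close(): flush all remaining packets, then wait for all acknowledgements. */
    void close() {
        while (!dataQueue.isEmpty()) {
            streamOne();
        }
        while (!ackQueue.isEmpty()) {
            ackOldest();
        }
    }

    /** Writes three packets through a three-node pipeline and closes the stream. */
    static WritePipelineSketch runDemo() {
        WritePipelineSketch pipeline = new WritePipelineSketch(3);
        pipeline.write(1);
        pipeline.write(2);
        pipeline.write(3);
        pipeline.streamOne();
        pipeline.close();
        return pipeline;
    }

    public static void main(String[] args) {
        System.out.println(runDemo().datanodes);
    }
}
```

Keeping sent packets in the ack queue until they are confirmed is what lets a writer resend them if a node in the pipeline fails.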
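How does the Namenode decide which Datanodes store the replicas?
Replica placement in HDFS is a trade-off between reliability, write bandwidth, and read bandwidth. The default policy is:
[Figure: default replica placement]
The first replica is placed on the same machine as the client. If the client is outside the cluster, a node is chosen at random (although nodes that are too full or too busy are avoided where possible).
The second replica is placed on a randomly chosen rack different from the first replica's.
The third replica is placed on the same rack as the second but on a different node, chosen at random from the eligible nodes.
Further replicas are placed on random nodes across the cluster, while trying to avoid putting too many replicas on the same rack.
Once the replica locations are chosen, the write pipeline is built taking the network topology into account. This placement balances reliability with read and write performance well:
Reliability: the block is spread over two racks.
Write bandwidth: writes through the pipeline have to cross only one switch.
Read bandwidth: reads can be served from either of two racks.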
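The three-replica rule described above can be sketched as follows. The sketch assumes the writer runs on a cluster node, that there are at least two racks, and that every other rack has at least two nodes; it ignores load and free space, and the rack and node names are made up. It is not the Namenode's actual placement code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class ReplicaPlacementSketch {
    /** Picks three "rack/node" targets following the default policy in the text. */
    static List<String> place(String clientNode, Map<String, List<String>> racks, Random random) {
        String firstRack = rackOf(clientNode, racks);
        List<String> chosen = new ArrayList<>();
        chosen.add(firstRack + "/" + clientNode); // replica 1: the writer's own node

        List<String> otherRacks = new ArrayList<>(racks.keySet());
        otherRacks.remove(firstRack);
        String secondRack = otherRacks.get(random.nextInt(otherRacks.size()));
        List<String> candidates = new ArrayList<>(racks.get(secondRack));
        String second = candidates.remove(random.nextInt(candidates.size()));
        chosen.add(secondRack + "/" + second);    // replica 2: random node on a different rack

        String third = candidates.get(random.nextInt(candidates.size()));
        chosen.add(secondRack + "/" + third);     // replica 3: same rack as replica 2, different node
        return chosen;
    }

    static String rackOf(String node, Map<String, List<String>> racks) {
        for (Map.Entry<String, List<String>> entry : racks.entrySet()) {
            if (entry.getValue().contains(node)) {
                return entry.getKey();
            }
        }
        throw new IllegalArgumentException("unknown node " + node);
    }

    /** Hypothetical cluster: three racks with three nodes each. */
    static Map<String, List<String>> sampleCluster() {
        Map<String, List<String>> racks = new LinkedHashMap<>();
        racks.put("r1", Arrays.asList("n1", "n2", "n3"));
        racks.put("r2", Arrays.asList("n4", "n5", "n6"));
        racks.put("r3", Arrays.asList("n7", "n8", "n9"));
        return racks;
    }

    public static void main(String[] args) {
        System.out.println(place("n1", sampleCluster(), new Random()));
    }
}
```

Whatever the random choices, the result always spans exactly two racks, which is the property the reliability and bandwidth arguments rely on.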
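7.3 Consistency model
A consistency model describes the visibility of reads and writes in a file system. In HDFS, once a file has been created it is visible in the file system namespace:
Path p = new Path("p");
fs.create(p);
assertThat(fs.exists(p), is(true));
However, content written to the file is not guaranteed to be visible, even if the stream has been flushed: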
```java
Path p = new Path("p");
OutputStream out = fs.create(p);
out.write("content".getBytes("UTF-8"));
out.flush();
assertThat(fs.getFileStatus(p).getLen(), is(0L)); // still 0, even though flush was called
```
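To force data out to the Datanodes, use FSDataOutputStream's hflush method, which pushes the buffers to the Datanodes.
After hflush returns, HDFS guarantees that all data written to the file up to that point has reached every Datanode in the write pipeline and is visible to new readers.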
```java
Path p = new Path("p");
FSDataOutputStream out = fs.create(p);
out.write("content".getBytes("UTF-8"));
out.hflush(); // hflush, not flush: pushes the data to the Datanodes
assertThat(fs.getFileStatus(p).getLen(), is((long) "content".length()));
```
Closing the stream calls hflush internally. But hflush does not guarantee that the Datanodes have written the data to disk, only that it is in their memory, so data can be lost if the machines lose power. To guarantee that data is on disk, use hsync, which is similar to the fsync() system call: fsync commits the buffered data for a file descriptor.
FileOutputStream out = new FileOutputStream(localFile);
out.write("content".getBytes("UTF-8"));
out.flush();          // flush to the operating system
out.getFD().sync();   // sync to disk
assertThat(localFile.length(), is((long) "content".length()));
Using hflush or hsync reduces throughput, so application design has to trade throughput against data robustness.
Also, while a file is being written, the block currently being written is not visible to other readers.
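7.4 Node distance in Hadoop
During reads and writes the Namenode takes the distance between nodes into account when choosing Datanodes. HDFS does not measure distance by bandwidth, because in practice it is hard to measure the bandwidth between two machines accurately.
Hadoop arranges the machines in a tree topology and defines the distance between two nodes as the sum of their hop counts to their closest common ancestor. This is in effect an example of a distance metric. The original figure here illustrated the calculation.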


The topology of a Hadoop cluster has to be configured manually. If it is not configured, Hadoop assumes that all nodes are on the same rack in the same data center.
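The hop-count distance described in this section can be computed from node locations written as paths. The sketch assumes locations of the form /datacenter/rack/node; the names d1, r1, n1 and so on are placeholders, and this is not Hadoop's own topology class.

```java
class NetworkDistanceSketch {
    /** Sum of the hops from each node up to their closest common ancestor. */
    static int distance(String a, String b) {
        String[] pathA = a.substring(1).split("/");
        String[] pathB = b.substring(1).split("/");
        int common = 0;
        while (common < pathA.length && common < pathB.length
                && pathA[common].equals(pathB[common])) {
            common++;
        }
        return (pathA.length - common) + (pathB.length - common);
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // same node
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // same rack
        System.out.println(distance("/d1/r1/n1", "/d1/r2/n3")); // same data center, different rack
        System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // different data centers
    }
}
```

The four cases give 0, 2, 4, and 6, so a replica on the same rack is always preferred over one in another rack or another data center.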
8 Operational tools
8.1 Parallel copying with distcp
So far the focus has been on single-threaded access; to process files in parallel you would have to write your own program. Hadoop's distcp tool copies data into and out of Hadoop in parallel. Some examples:
hadoop distcp file1 file2   # an efficient replacement for fs -cp
hadoop distcp dir1 dir2
hadoop distcp -update dir1 dir2   # -update copies only files that have changed and leaves the rest untouched
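distcp is implemented as a MapReduce job with mappers only and no reducers; the files are copied in parallel by the maps.
distcp tries to give each map roughly the same amount of data. The number of maps can be set with the -m option. A typical command for synchronizing a directory between two clusters is:
hadoop distcp -update -delete -p hdfs://master1:9000/foo hdfs://master2/foo
This kind of command is commonly used to copy data between two clusters. -update copies only data that has changed, -delete removes files that exist in the destination but not in the source, and -p preserves file attributes such as permissions, block size, and replication.
If the two clusters run incompatible Hadoop versions, the webhdfs protocol can be used:
hadoop distcp webhdfs://namenode1:50070/foo webhdfs://namenode2:50070/foo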
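8.2 Balancing an HDFS cluster
With distcp, specifying a single map is not only slow: the first replica of every block lands on the node running that one map, until its disk fills up. It is therefore best to run distcp with the default number of maps, which is 20.
HDFS works best when blocks are spread evenly over the nodes. If a job cannot keep the cluster balanced, for example because the number of maps has to be limited (so that other nodes remain available to other jobs), the balancer tool can be used to even out the block distribution afterwards.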