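Pinterest recently described the thinking behind its big data platform: while relying heavily on Hadoop and AWS, the company set out to build a self-service platform. Along the way it also had to weigh issues such as the scalability of several MapReduce solutions.
Big data has given Pinterest the richest collection of interests online and plays an important role in how the site is configured and operated. To build big data applications quickly, Pinterest upgraded its single-cluster Hadoop infrastructure into a general-purpose self-service platform, and recently published an account of how the platform was built on the company blog.
The translation follows: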
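Big data plays an important role at Pinterest. With more than 30 billion Pins in the system, we have built the richest collection of interests online. One challenge in building a personalized discovery engine is scaling the data infrastructure so that it can traverse the interest graph and extract the content and intent behind every Pin.
We currently add 20 TB of new data each day, and roughly 10 TB is updated in S3 daily. We process this data with Hadoop, which lets us surface the most relevant and most recent content to Pinners through features such as Related Pins, Guided Search, and image processing. Hadoop also lets us compute thousands of metrics every day and measure and analyze changes in user behavior under rigorous experimental conditions.
To build big data applications quickly, we upgraded our single-cluster Hadoop infrastructure into a general-purpose self-service platform.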

Building a self-service platform for Hadoop
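Although Hadoop is a powerful processing and storage system, it is not a plug-and-play technology. Because Hadoop was not built for cloud or elastic computing and does not cater to non-technical users, its original design cannot serve as a self-service system. Fortunately, many Hadoop libraries, applications, and service providers offer solutions to these limitations. Before choosing among them, we first worked out our requirements for a Hadoop setup.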
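1. Multi-tenant isolation: MapReduce hosts many applications with different requirements and configurations. Developers should be able to optimize their own jobs without affecting anyone else's work.
2. Elasticity: Batch processing often needs burst capacity to support experimental development. In an ideal setup, we could scale a cluster up to thousands of nodes and then scale it back down without causing any interruption or data loss.
3. Multi-cluster support: Although we can scale a single Hadoop cluster horizontally, we found that it is hard to get ideal isolation and elasticity that way, and that business requirements such as privacy, security, and cost allocation make multi-cluster support the more practical option.
4. Support for ephemeral clusters: Users should be able to obtain a cluster when they need one and release it at any time. A cluster should come up within a reasonable amount of time and fully support all Hadoop jobs without manual configuration.
5. Easy software package deployment: We need to give users an interface for customization at every level, from the OS and Hadoop layers down to job-specific scripts.
6. Shared data storage: A Hadoop cluster should also be able to access data produced by other clusters.
7. Access control layer: As with any other service-oriented system, we need to be able to add and modify access quickly (without relying on SSH keys, for example). Ideally, we can integrate with existing authentication, such as OAUTH.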
Trade-offs and implementation
Once we had the requirements, we looked across a range of home-grown, open source, and proprietary solutions for one that met them.
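Decoupling compute and storage: Traditional MapReduce relies on data locality to speed up processing. In practice, we found that network I/O (we use S3) is not much slower than disk I/O. By paying the marginal cost of network I/O and separating compute from storage, we could easily meet many of the requirements for our self-service Hadoop platform. For example, multi-cluster support became easy because we no longer had to think about loading or synchronizing data: any existing or future cluster can use the data through a single shared file system. Not having to worry about data also means simpler operations, because we can hard-reset or discard a cluster and move to another one without losing any work. It also means we can use dynamically provisioned nodes and pay much less for compute without worrying about losing any persistent data.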

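A centralized Hive metastore as the source of truth: We chose Hive for most of our Hadoop jobs because its SQL interface is simple and familiar across the industry. Over time, we found that Hive brings an additional benefit when its metastore is used as the data catalog for Hadoop jobs. Much like other SQL tools, Hive provides commands such as "show tables", "describe table", and "show partitions". This interface is much cleaner than listing the files in a directory to determine what output exists, and it is also much faster and more consistent, because it is backed by a MySQL database. S3 is slow at listing files, does not support moves, and has consistency issues, so these properties of Hive matter all the more given that we rely on S3.
We orchestrate all our jobs (whether Hive, Cascading, Hadoop Streaming, or anything else) so that they keep the Hive metadata consistent with the data that exists on disk. As a result, we can update data on disk across multiple clusters and workflows without worrying that a consumer might pick up partial data.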
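
The ordering that keeps consumers from seeing partial data can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the article's actual code: `ToyMetastore`, `publish_partition`, the `storage` dict standing in for S3, and the placeholder table and partition names are all hypothetical. The idea shown is simply that output is written in full first and registered in the catalog last, so a reader that asks the catalog what exists never sees a half-written partition.

```python
class ToyMetastore:
    """Hypothetical in-memory stand-in for a Hive metastore (assumption)."""

    def __init__(self):
        self._partitions = {}  # table name -> {partition -> storage location}

    def add_partition(self, table, partition, location):
        self._partitions.setdefault(table, {})[partition] = location

    def show_partitions(self, table):
        # Consumers ask the catalog what exists instead of listing files.
        return sorted(self._partitions.get(table, {}))


def publish_partition(storage, metastore, table, partition, rows):
    """Write all output first, then register it in the catalog."""
    location = f"{table}/{partition}"
    storage[location] = []
    for row in rows:
        storage[location].append(row)  # written, but not yet visible
    # Registration is the last step, so readers never see a partial partition.
    metastore.add_partition(table, partition, location)
    return location


storage = {}  # dict standing in for S3 (assumption)
metastore = ToyMetastore()
publish_partition(storage, metastore, "example_table", "dt=2000-01-01", ["a", "b"])
visible = metastore.show_partitions("example_table")
```

A consumer in this sketch would call `show_partitions` and read only the locations it returns, which mirrors using the metastore rather than a directory listing to decide what output exists.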

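Multi-layered packages and configuration: Hadoop applications differ widely, and each one may have its own requirements and dependencies. We needed a flexible approach that balances customizability against fast setup.
We adopted a three-layer approach to managing dependencies, which cut the time needed to create and bring up a thousand-node cluster from 45 minutes to 5 minutes.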

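1. Baking AMI
For dependencies that are large and take a while to install, we preinstall them on the image. These include the Hadoop libraries and the NLP library packages we use for internationalization. We call this process "baking an AMI". Unfortunately, many Hadoop service providers do not yet support this approach.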
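2. Automated configuration (masterless Puppet)
Most of our customization is managed with Puppet. During the bootstrap stage, our cluster installs and configures Puppet on every node, and within a few minutes Puppet brings each node in line with the dependencies specified in our Puppet configuration.
Puppet has one major limitation for us: when we add new nodes to a production system, they all contact the Puppet master automatically to pull down new configuration, which often overwhelms the master node and leads to failures. To avoid this, we let Puppet clients fetch their configuration from S3 and set up a service responsible for keeping the S3 configuration in sync with the Puppet master, which makes the Puppet clients "masterless".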
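
The masterless arrangement can be pictured with a short Python sketch. It makes loud assumptions: a plain dict stands in for the S3 bucket, and `sync_master_to_store`, `client_pull`, the role name, and the example manifest are hypothetical names chosen for illustration. It only shows the shape of the idea, namely that one sync service talks to the master while any number of new nodes read from shared storage.

```python
def sync_master_to_store(master_configs, store):
    """Sync service: copy every config from the master into shared storage."""
    for role, manifest in master_configs.items():
        store[role] = manifest


def client_pull(store, role):
    """A new node reads its config from shared storage, never from the master."""
    return store[role]


# Example manifest text; the package name is a placeholder.
master_configs = {
    "hadoop-worker": "package { 'example-lib': ensure => installed }",
}
s3_store = {}  # dict standing in for an S3 bucket (assumption)

sync_master_to_store(master_configs, s3_store)

# A thousand nodes joining at once only generate reads against the store.
pulled = [client_pull(s3_store, "hadoop-worker") for _ in range(1000)]
```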

3.ÔËÐн׶Σ¨ÔÚS3ÉÏ£©
MapReduce¹¤×÷¼ä·¢ÉúµÄ´ó²¿·Ö¶¨ÖÆ»¯·þÎñ¶¼Éæ¼°jars£¬¹¤×÷ÅäÖúÍ×Ô¶¨Òå´úÂë¡£¿ª·¢×éÐèÒª¿ÉÒÔÔÚ¿ª·¢»·¾³ÖÐÐÞ¸ÄÕâЩÒÀÀµÏ²¢ÇÒÔÚ²»Ó°ÏìÆäËû¹¤×÷µÄǰÌáÏÂʹÕâЩÒÀÀµÏîÔÚÎÒÃǵÄÈÎÒâÒ»¸öHadoop¼¯ÈºÖпÉÓá£ÎªÁËȨºâÁé»îÐÔ¡¢ËٶȺ͸ôÀëÐÔ£¬ÎÒÃÇΪS3ÉϵÄÿ¸ö¿ª·¢Õß´´½¨ÁËÒ»¸ö¸ôÀëµÄ¹¤×÷Ŀ¼¡£ÏÖÔÚ£¬µ±Ò»¸ö¹¤×÷Ö´ÐÐʱ£¬Ò»¸ö¹¤×÷Ä¿Â¼ÃæÏòÒ»¸ö¿ª·¢Õߣ¬¹¤×÷·¾¶µÄÒÀÀµÏîÖ±½Ó´ÓS3»ñµÃ¡£
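
As a rough illustration of per-developer isolation, the following Python sketch builds a separate S3 prefix for each developer and job. The bucket name, the `workdirs` prefix, and the function names are assumptions made for the example; the article does not describe the actual directory layout.

```python
import posixpath


def job_working_dir(bucket, developer, job_name):
    """Build the isolated S3 prefix for one developer's job dependencies."""
    for part in (developer, job_name):
        # Reject empty names and slashes so one developer cannot reach
        # into another developer's prefix.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return "s3://" + posixpath.join(bucket, "workdirs", developer, job_name)


def dependency_uri(working_dir, filename):
    """Location a cluster would fetch a jar or config file from at run time."""
    return working_dir.rstrip("/") + "/" + filename


alice_dir = job_working_dir("example-bucket", "alice", "related_pins")
bob_dir = job_working_dir("example-bucket", "bob", "related_pins")
jar_uri = dependency_uri(alice_dir, "job.jar")
```

Because the prefix depends only on the developer and the job, the same job name resolves to different locations for different developers, and any cluster can compute the location without prior setup.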

Ö´ÐгéÏó²ã
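Previously, we ran our Hadoop jobs on Amazon's Elastic MapReduce (EMR). EMR works well with S3 and Spot instances and is generally stable. But once we scaled to a few hundred nodes, EMR became less stable and we started running into its limits. We had built so many applications on EMR that it was hard to migrate to a new system. We also did not know what to switch to, because some of EMR's nuances had crept into the actual job logic. To experiment with other flavors of Hadoop, we implemented an executor interface and moved all EMR-specific logic into the EMR executor. The interface exposes a handful of methods such as "run_raw_hive_query(query_str)" and "run_java_job(class_path)". This gave us the flexibility to experiment with several flavors of Hadoop and Hadoop service providers, and to migrate gradually with minimal downtime.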
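
An executor interface of this kind might look like the Python sketch below. Only the two method names, `run_raw_hive_query(query_str)` and `run_java_job(class_path)`, come from the text; the class names, return values, the `nightly_workflow` function, and the example query and class path are assumptions for illustration. The backend here only records submissions and does not talk to a real cluster.

```python
from abc import ABC, abstractmethod


class Executor(ABC):
    """Provider-neutral interface that job code is written against."""

    @abstractmethod
    def run_raw_hive_query(self, query_str):
        """Submit a Hive query string and return a provider-specific handle."""

    @abstractmethod
    def run_java_job(self, class_path):
        """Submit a Java MapReduce job identified by its class path."""


class RecordingExecutor(Executor):
    """Hypothetical backend that only records submissions (no real cluster)."""

    def __init__(self, provider):
        self.provider = provider
        self.submitted = []

    def run_raw_hive_query(self, query_str):
        self.submitted.append(("hive", query_str))
        return f"{self.provider}:hive:{len(self.submitted)}"

    def run_java_job(self, class_path):
        self.submitted.append(("java", class_path))
        return f"{self.provider}:java:{len(self.submitted)}"


def nightly_workflow(executor):
    """Job logic sees only the interface, so the backend can be swapped."""
    return [
        executor.run_raw_hive_query("SHOW TABLES"),
        executor.run_java_job("com.example.ExampleJob"),
    ]


emr_handles = nightly_workflow(RecordingExecutor("emr"))
other_handles = nightly_workflow(RecordingExecutor("other"))
```

In a real setup, each provider would get its own subclass holding the provider-specific logic, and workflows such as `nightly_workflow` would stay unchanged when the provider is switched.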

Settling on Qubole
×îÖÕÎÒÃǾö¶¨½«ÎÒÃǵÄHadoop¹¤×÷Ç¨ÒÆµ½Qubole£¬QubleÊÇHadoop·þÎñÊг¡µÄÐÂÐã¡£¿¼Âǵ½Ä¿Ç°ÎÒÃǵĹæÄ£ÏÂEMR²»ÔÙÎȶ¨£¬ÎÒÃDZØÐë¿ìËÙÇ¨ÒÆµ½Ò»¸öÁ¼ºÃÖ§³ÖAWS£¨ÌرðÊÇspotʵÀý£©ºÍS3µÄ¹©Ó¦ÉÌ¡£QuboleÖ§³ÖAWS/S3£¬²¢ÇÒÆð²½¼òµ¥¡£ÔÚÉóºËQubole£¬²¢½«ÆäÐÔÄܺͼ¸¸öºòÑ¡Õߣ¨°üÀ¨¹ÜÀí¼¯Èº£©±È½Ïºó£¬ÎÒÃÇÑ¡ÔñÁËQubole£¬ÔÒòÈçÏ£º
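- A single cluster that scales horizontally to 1,000 nodes
- 24/7 data infrastructure engineering support
- Tight integration with Hive
- Google OAUTH ACL and a Hive Web UI for non-technical users
- An API for a simplified executor abstraction layer, plus multi-cluster support
- Baked AMI customization (available with premium support)
- Advanced support for spot instances, including clusters made up of 100% spot instances
- S3 eventual consistency protection
- Graceful cluster scaling and autoscaling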
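Overall, choosing Qubole was the right decision for us, and we have been deeply impressed by the Qubole team's technical and implementation work. Over the past year, Qubole has proven stable at petabyte scale and has given us 30% to 60% higher throughput than EMR. Non-technical users also find Qubole easy to pick up.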
Where we are today
ÔÚÎÒÃǵ±ÏµÄÅäÖÃÏ£¬HadoopÊÇÒ»ÏîÓ¦ÓÃÔÚ¶à×éÖ¯¡¢²Ù×÷·ÑÓõ͵ÄÁé»î·þÎñ¡£ÎÒÃÇÓÐ100¶à¸ö³£¹æMapreduceÓû§£¬ËûÃÇÿÌìͨ¹ýQuboleÍøÂç½Ó¿Ú¡¢ad-hoc¹¤×÷ºÍ¼Æ»®¹¤×÷Á÷ÔËÐÐ×Å2000¶à¸ö¹¤×÷¡£
ÎÒÃÇÓÐ6¸öHadoop¼¯Èº£¬ËûÃÇÓÉ3000¶à¸ö½Úµã×é³É£¬¿ª·¢Õß¿ÉÒÔÔÚ¼¸·ÖÖÓÄÚÑ¡Ôñ´´½¨×Ô¼ºµÄHadoop¼¯Èº¡£ÎÒÃÇÿÌìÉú³É200ÒÚÈÕÖ¾ÐÅÏ¢£¬´¦Àí´ó¸Å1ÅÄ×Ö½ÚµÄÊý¾Ý¡£
ÎÒÃÇÒ²ÔÚÊÔÑéÕß¹ÜÀíHadoop¼¯Èº£¬ÆäÖаüÀ¨Hadoop2£¬²»¹ýĿǰ£¬Ê¹ÓÃÖîÈçS3ºÍQuboleµÄÔÆ·þÎñ¶ÔÎÒÃǶøÑÔÊÇÕýÈ·µÄÑ¡Ôñ£¬ÒòΪËüÃǽ«ÎÒÃÇ´ÓHadoopµÄ²Ù×÷¿ªÏúÖнâ·Å³öÀ´£¬Ê¹ÎÒÃÇ¿ÉÒÔרעÓÚ´óÊý¾ÝÓ¦ÓõŤ³Ì¹¤×÷¡£ |