From Hadoop to Spark: An Architecture in Practice
 
Author: Yan Zhitao (阎志涛)  Source: CSDN  Published: 2015-7-21
 

This article describes how TalkingData gradually introduced Spark while building its big data platform, and how it went on to build a mobile big data platform on top of Hadoop YARN and Spark.

Spark is now widely recognized and supported in China. In 2014, Spark Summit China was held in Beijing to a packed house. In the same year, Spark Meetups took place in four cities (Beijing, Shanghai, Shenzhen, and Hangzhou), with five held in Beijing alone, covering Spark Core, Spark Streaming, Spark MLlib, Spark SQL, and many other topics. As a mobile internet big data services company that followed and adopted Spark early, TalkingData has taken an active part in the Chinese Spark community and has shared its Spark experience at several Meetups.

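First Encounter with Spark

For a startup working in mobile internet big data, keeping up with developments in big data technology is required homework for the engineering team. While going through the public material from Strata 2013, we came across a tutorial titled "An Introduction on the Berkeley Data Analytics Stack_BDAS_Featuring Spark,Spark Streaming,and Shark", which caught the attention of the whole team and started a lot of discussion. Spark's in-memory RDD model, its support for machine learning algorithms, the unified model for real-time and offline processing across the stack, and Shark all stood out. We were also watching Impala at the time. Compared with Spark, however, Impala can be understood as an upgrade to Hive, whereas Spark sets out to build an entire big data processing ecosystem around the RDD. For a startup whose data volume was growing fast and whose business was centered on big data processing and constantly changing, Spark was clearly the one that deserved further attention and study.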

First Steps with Spark

By mid-2013 the business was growing quickly, and our various business platforms were collecting more and more data from mobile devices. Beyond supplying the metrics each business needed, did this data hold more value? To mine that potential value, we decided to build our own data center: bring the data from all business platforms together, then process, analyze, and mine the data on the devices we cover. The initial data center had two main functions:

1. Android app rankings aggregated across app markets;

2. App recommendations based on user interests.

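Given the technology we had mastered at the time and the functional requirements, the data center used the architecture shown in Figure 1.

Figure 1: Data center architecture based on Hadoop 2.0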

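The whole system was built on Hadoop 2.0 (Cloudera CDH4.3) and used the most basic big data computing architecture. A log collection program gathered logs from the different business platforms into the data center, an ETL step normalized the data format, and the result was stored in HDFS. The ranking and recommendation algorithms were both implemented in MapReduce. The system did offline batch computation only, and offline jobs were scheduled by a scheduling system based on Azkaban.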

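The first version of the data center architecture was designed for one purpose: "the most basic use of the data". As our exploration of the data's value deepened, however, more and more requests for real-time analysis came in. At the same time, we urgently needed to add more machine learning algorithms to support different data mining needs. Real-time analysis obviously cannot be delivered by "developing a separate MapReduce job for every analysis request", so introducing Hive was a simple and direct choice. And because the traditional MapReduce model does not support iterative computation well, we needed a better parallel computing framework for machine learning algorithms. These were exactly the strengths of Spark, which we had been following closely: with its good support for iterative computation, Spark was the natural choice. At the end of September 2013, with the release of Spark 0.8.0, we decided to evolve the original architecture. We introduced Hive as the basis for ad hoc queries, introduced Spark to support machine learning workloads, and set out to verify whether Spark, as a new computing framework, could fully replace the traditional MapReduce-based one. Figure 2 shows how the system architecture evolved.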

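Figure 2: Testing Spark within the original architecture

In this architecture we deployed Spark 0.8.1 on YARN and used separate queues to isolate three workloads: Spark-based machine learning jobs, the daily MapReduce ranking jobs, and Hive-based ad hoc analysis jobs.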

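The first step in adopting Spark is to obtain a Spark package that supports your Hadoop environment. Ours was CDH 4.3 from Cloudera, and the default Spark distributions did not include a build for CDH 4.3, so we had to compile it ourselves. The official Spark documentation recommends building with Maven, but the build did not go as smoothly as we had hoped. For well-known reasons, many dependencies could not be downloaded reliably from some of the central repositories. We took the simplest workaround and compiled on an AWS cloud host. Note that before compiling you must follow the documentation's advice and set: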

Otherwise the build will run out of memory. For CDH 4.3, the mvn build parameters are:

Once the required Spark package has been built, deploying and running Spark in the Hadoop environment is very simple. Archive the compiled Spark directory, unpack it on any machine that can run the Hadoop client, and Spark is ready to run. To verify that Spark runs correctly on the target Hadoop environment, follow the official Spark documentation and run SparkPi from the examples:

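With Spark deployed, what remains is writing Spark programs. Spark supports Java and Python, but the language best suited to Spark development is still Scala. After some trial and practice, once we had a grasp of Scala's functional programming features, we came to appreciate how much there is to gain from writing Spark applications in Scala. A computation that takes several hundred lines in MapReduce can be done in a few dozen lines of Scala on Spark. At run time, the same computation executes tens of times faster on Spark than on MapReduce. For machine learning algorithms that need iteration, the advantage of Spark's RDD model over MapReduce is even more pronounced, and there is basic MLlib support on top of that. After a few months of practice, all data mining work had been migrated to Spark, and we had implemented algorithms on Spark, LR among them, that are more efficient for our datasets.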
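To make the point about iteration concrete, here is a minimal single-machine sketch in plain Python, assuming LR refers to logistic regression. It is not the team's implementation and not Spark code. Every iteration makes a full pass over the same dataset; under MapReduce each pass would be a separate job rereading its input from HDFS, whereas a cached RDD stays in memory between passes. The toy data, learning rate, and function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_lr(points, iterations=200, lr=0.5):
    """Batch gradient descent for logistic regression.

    points: list of (features, label) pairs with label in {0, 1}.
    Every iteration makes one full pass over the same in-memory data.
    """
    dim = len(points[0][0])
    w = [0.0] * dim
    for _ in range(iterations):
        grad = [0.0] * dim
        for x, y in points:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(dim):
                grad[j] += err * x[j]
        # Step against the average gradient of the log loss.
        w = [wi - lr * g / len(points) for wi, g in zip(w, grad)]
    return w

def predict(w, x):
    """Classify x as 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0

# Hypothetical toy data: the first feature is a constant bias term.
data = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 3.0], 1), ([1.0, 4.0], 1)]
weights = train_lr(data)
```

The inner loop over `points` is the part that Spark would distribute; the outer loop over iterations is what makes caching the dataset worthwhile.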

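Fully Embracing Spark

Entering 2014, the company's business grew substantially, and the daily data volume had multiplied several times since the data center platform was first built. The daily ranking computation was taking longer and longer. Hive-based ad hoc computation could only handle queries at the scale of a day; at the scale of a week the run time was already hard to tolerate, and at the scale of a month the computation essentially could not finish. With the understanding and experience we had built up on Spark, it was time to migrate the entire data center to it.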

In April 2014, Spark Summit China was held in Beijing, and our engineering team attended to learn. At the summit we found that many peers in China had already started building their big data platforms on Spark, and that Spark had become one of the most active projects in the ASF. In addition, more and more big data products were integrating with Spark or migrating to it. Spark was clearly on its way to becoming a better ecosystem than Hadoop MapReduce. The summit strengthened our resolve to embrace Spark fully.

We began to re-architect the big data platform underlying the data center on top of YARN and Spark. The new data platform had to support:

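1. Near-real-time data collection and ETL;

2. Streaming data processing;

3. More efficient offline computation;

4. Fast multidimensional analysis;

5. More efficient ad hoc analysis;

6. Efficient machine learning;

7. A unified data access interface;

8. A unified view of the data;

9. Flexible task scheduling.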

The new architecture makes full use of YARN and Spark and incorporates some of the company's own accumulated technology. It is shown in Figure 3.

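The new architecture introduces Kafka as the channel for log collection. Logs from mobile devices, collected by several business systems, are written to Kafka in real time so that downstream consumers can read them easily.

Spark Streaming makes it easy to consume and process the data in Kafka. In the overall architecture, Spark Streaming does the following work.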

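1. Storing raw logs. The raw logs in Kafka are saved to HDFS in JSON format without loss.

2. Cleaning and transforming data. After cleaning and normalization, the data is converted to Parquet format and stored in HDFS for use by downstream computation jobs.

3. Running predefined streaming jobs, such as tagging based on frequency rules. The results are written directly to MongoDB.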
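The cleaning step and the frequency-rule tagging step can be sketched on a single machine in plain Python. This is only an illustration: the article does not describe the real log schema or rules, so the field names (`device_id`, `category`), the thresholds, and the tag names below are all hypothetical, and the production jobs run on Spark Streaming and write to MongoDB rather than returning a dictionary.

```python
import json
from collections import Counter, defaultdict

# Hypothetical rules: tag a device once it has produced at least
# `min_count` events of a category within the current batch.
RULES = {"game": ("gamer", 3), "finance": ("investor", 2)}

def clean(raw_line):
    """Parse one raw JSON log line; return None for malformed records."""
    try:
        event = json.loads(raw_line)
    except ValueError:
        return None
    if not isinstance(event, dict):
        return None
    if not event.get("device_id") or not event.get("category"):
        return None
    # Normalize the category so the rule lookup ignores case and padding.
    event["category"] = str(event["category"]).strip().lower()
    return event

def tag_batch(raw_lines):
    """Apply the frequency rules to one micro-batch of raw log lines."""
    counts = defaultdict(Counter)
    for line in raw_lines:
        event = clean(line)
        if event is not None:
            counts[event["device_id"]][event["category"]] += 1
    tags = {}
    for device, per_category in counts.items():
        matched = sorted(
            tag for cat, (tag, min_count) in RULES.items()
            if per_category[cat] >= min_count
        )
        if matched:
            tags[device] = matched
    return tags

batch = [
    '{"device_id": "d1", "category": "Game"}',
    '{"device_id": "d1", "category": "game"}',
    '{"device_id": "d1", "category": "game "}',
    '{"device_id": "d2", "category": "finance"}',
    'not json',
]
result = tag_batch(batch)
```

Here device `d1` reaches the threshold for the hypothetical "gamer" tag, `d2` stays below the "investor" threshold, and the malformed line is dropped during cleaning.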

Figure 3: The latest data center architecture, integrating YARN and Spark

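The ranking jobs were reimplemented on Spark, taking advantage of Spark's performance gains and the efficient data access provided by Parquet columnar storage. For the same computation, with the data volume grown to 3 times the original, the run time is only 1/6 of what it was.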
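Taken together, the two figures imply roughly an 18-fold gain in data processed per unit of time, assuming both numbers describe the same job on comparable hardware (the article does not say). The arithmetic, as a small Python sketch:

```python
from fractions import Fraction

data_growth = Fraction(3)      # data volume grew to 3x the original
time_ratio = Fraction(1, 6)    # run time fell to 1/6 of the original

# Throughput is data processed per unit of time, relative to the old job.
throughput_gain = data_growth / time_ratio
```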

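Beyond that, the performance gains from Spark and Parquet columnar storage finally made possible the ad hoc multidimensional analysis that had long been unable to meet business needs. A day-scale multidimensional ad hoc analysis that used to take hours on Hive now completes in just 2 minutes on the new architecture. At week scale, results are ready in no more than ten minutes. Month-scale multidimensional analysis, which could not be completed on Hive at all, now finishes within two hours. The steady maturing of Spark SQL has also made development easier.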

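Using the resource management that YARN provides, we also migrated our in-house Bitmap engine, which is used for multidimensional analysis, onto YARN. Bitmap indexes can be created in advance for dimensions that are already fixed. When a multidimensional analysis needs only dimensions that already have Bitmap indexes, the Bitmap engine answers it with Bitmap computation, which provides real-time multidimensional analysis.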
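The article gives no detail on the in-house Bitmap engine, so the following is only a sketch of the general bitmap-index technique in plain Python, with hypothetical class and dimension names. Each (dimension, value) pair gets one bitset in which bit i is set when device number i has that value; a query across several dimensions is then a bitwise AND of the corresponding bitsets.

```python
class BitmapIndex:
    """Toy bitmap index: one integer bitset per (dimension, value)."""

    def __init__(self):
        self.bitmaps = {}

    def add(self, device_no, **dimensions):
        """Record that device number `device_no` has the given attributes."""
        for dim, value in dimensions.items():
            key = (dim, value)
            self.bitmaps[key] = self.bitmaps.get(key, 0) | (1 << device_no)

    def query(self, **conditions):
        """Return device numbers matching all conditions (bitwise AND)."""
        if not conditions:
            return []
        result = None
        for dim, value in conditions.items():
            bits = self.bitmaps.get((dim, value), 0)
            result = bits if result is None else result & bits
        return [i for i in range(result.bit_length()) if (result >> i) & 1]

index = BitmapIndex()
index.add(0, os="android", city="beijing")
index.add(1, os="android", city="shanghai")
index.add(2, os="ios", city="beijing")
index.add(3, os="android", city="beijing")

matches = index.query(os="android", city="beijing")
```

Because the indexes are built ahead of time, answering a query costs only a few bitwise operations, which is why pre-indexed dimensions can be served in real time. A production engine would normally use compressed bitmaps rather than Python integers.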

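To make data easier to manage, the new architecture introduces a metadata management system based on HCatalog. Data definition, storage, and access all go through this system, which gives us a unified view of the data and simplifies the management of data assets.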

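YARN provides only resource scheduling, but a big data platform also needs a distributed task scheduling system. For the new architecture we developed our own distributed task scheduler with DAG support. Combined with YARN's resource scheduling, it supports scheduled jobs, on-demand jobs, and pipelines composed of different jobs.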
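The core of any DAG-aware scheduler is working out which tasks may run once their dependencies have finished. The sketch below shows that step only, in plain Python; it is not the team's scheduler, and the task names are hypothetical. It groups tasks into "waves": every task in a wave depends only on tasks in earlier waves, so the tasks within a wave could be submitted to YARN in parallel.

```python
def schedule(dependencies):
    """Group tasks into waves that can run in parallel.

    dependencies maps each task to the set of tasks it depends on.
    Raises ValueError if the dependency graph contains a cycle.
    """
    tasks = set(dependencies)
    for deps in dependencies.values():
        tasks |= set(deps)
    remaining = {t: set(dependencies.get(t, ())) for t in tasks}
    waves = []
    while remaining:
        # A task is ready when all of its dependencies have completed.
        ready = sorted(t for t, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle detected among: %s" % sorted(remaining))
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves

# Hypothetical pipeline: two jobs depend on ETL, a report depends on both.
pipeline = {
    "etl": set(),
    "ranking": {"etl"},
    "bitmap_index": {"etl"},
    "report": {"ranking", "bitmap_index"},
}
waves = schedule(pipeline)
```

A real scheduler would add timing triggers, retries, and resource requests on top of this ordering step.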

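On top of the new architecture built around YARN and Spark, we were able to deliver a self-service big data platform for the data business teams. They can use it to run multidimensional analyses on the data, extract data, and define their own tagging jobs. The self-service system has increased our capacity to make use of data and made doing so far more efficient.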

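Some Pitfalls We Hit with Spark

Adopting any new technology means going from unfamiliar to familiar: from the initial delight at what it brings, to frustration and helplessness when problems appear, to the satisfaction of solving them. Spark, the rising star of big data, is no exception. Here are some of the pitfalls we ran into.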

[Pitfall 1: On very large datasets you may hit org.apache.spark.SparkException: Error communicating with MapOutputTracker]

The error message is obscure. The error log suggests that the Spark cluster has partitioned, but if you look at the physical machines you will see very high disk I/O. Further analysis shows the cause: during the shuffle on a large dataset, Spark creates too many temporary files, which overloads the operating system's disk I/O. Once the cause is known the fix is simple: set spark.shuffle.consolidateFiles to true. This parameter defaults to false. On Linux with an ext4 file system we suggest setting it to true as a matter of course; the official Spark documentation also recommends true on ext4 to improve performance.

[Pitfall 2: Fetch failure errors at run time]

Spark programs running on large datasets often hit Fetch failure errors. Because Spark is designed to be fault tolerant, most Fetch failures succeed on retry and the job as a whole finishes normally, but the retries make it take noticeably longer. The root causes vary. Judging from the error itself, a task was unable to read shuffle data from a remote node; the specific cause has to be tracked down using:

to inspect Spark's run logs and find what actually caused the Fetch failure. Most of these problems can be solved with sensible parameter settings and by optimizing the program. Chen Chao's talk at Spark Summit China 2014 has very good advice on tuning Spark performance.

We ran into other problems while using Spark as well. Because Spark is open source, though, most of them could be solved by reading the source code and with help from the open source community.

Next Steps

Spark made great strides in 2014, and the big data ecosystem around it has steadily matured. Spark 1.3 introduced a new DataFrame API that makes data processing in Spark friendlier. Tachyon, a distributed caching system that also comes from AMPLab, has been drawing attention for its good integration with Spark. In our business scenarios a lot of base data is reused by many different Spark jobs, so our next step is to introduce Tachyon into the architecture as a caching layer. In addition, as SSDs become more common, we plan to add SSD storage to every machine in the cluster and direct Spark's shuffle output to SSD, using its fast random reads and writes to further improve processing efficiency.

On the machine learning side, the H2O machine learning engine now integrates well with Spark, which has produced Sparkling-water. We believe that with Sparkling-water even a startup like ours can draw on deep learning to mine more value from its data.

Conclusion

In 2004, Google's MapReduce paper opened the era of big data processing, and for nearly 10 years Hadoop MapReduce was synonymous with it. Matei Zaharia's 2012 paper on RDDs, "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing", signaled the arrival of a new era. With advances in hardware, broad demand for low-latency big data processing, and the spread of data mining across the big data field, Spark, as a brand-new big data ecosystem, is gradually replacing traditional MapReduce as the leading technology of the next generation. Our own move from a MapReduce architecture to a Spark architecture over the past two years is broadly representative of the path many practitioners in the field have taken. We believe that as the Spark ecosystem matures, more and more companies will migrate their data processing to Spark. As more big data engineers become familiar with Spark, the Spark community in China will grow more active. Since Spark is an open source platform, we also expect more and more Chinese developers to become contributors to Spark-related projects, and Spark itself to become more mature and powerful.

   