ʵʱÁ÷´¦Àíϵͳ±ØÐëÒªÄÜÔÚ24/7ʱ¼äÄÚ¹¤×÷£¬Òò´ËËüÐèÒª¾ß±¸´Ó¸÷ÖÖϵͳ¹ÊÕÏÖлָ´¹ýÀ´µÄÄÜÁ¦¡£×ʼ£¬Spark Streaming¾ÍÖ§³Ö´ÓdriverºÍworker¹ÊÕϻָ´µÄÄÜÁ¦¡£±¾ÎÄ̸¼°Spark StreamingÈÝ´íµÄ¸Ä½øºÍÁãÊý¾Ý¶ªÊ§µÄʵÏÖ¡£
±¾ÎÄÀ´×ÔSpark StreamingÏîÄ¿´øÍ·ÈË Tathagata DasµÄ²©¿ÍÎÄÕ£¬ËûÏÖÔÚ¾ÍÖ°ÓÚDatabricks¹«Ë¾¡£¹ýÈ¥ÔøÔÚUC BerkeleyµÄAMPLabʵÑéÊÒ½øÐдóÊý¾ÝºÍSpark StreamingµÄÑо¿¹¤×÷¡£±¾ÎÄÖ÷Ҫ̸¼°ÁËSpark StreamingÈÝ´íµÄ¸Ä½øºÍÁãÊý¾Ý¶ªÊ§µÄʵÏÖ¡£
ÒÔÏÂΪÔÎÄ£º
ʵʱÁ÷´¦Àíϵͳ±ØÐëÒªÄÜÔÚ24/7ʱ¼äÄÚ¹¤×÷£¬Òò´ËËüÐèÒª¾ß±¸´Ó¸÷ÖÖϵͳ¹ÊÕÏÖлָ´¹ýÀ´µÄÄÜÁ¦¡£×ʼ£¬Spark Streaming¾ÍÖ§³Ö´ÓdriverºÍworker¹ÊÕϻָ´µÄÄÜÁ¦¡£È»¶øÓÐЩÊý¾ÝÔ´µÄÊäÈë¿ÉÄÜÔÚ¹ÊÕϻָ´ÒÔºó¶ªÊ§Êý¾Ý¡£ÔÚSpark 1.2°æ±¾ÖУ¬ÎÒÃÇÒѾÔÚSpark StreamingÖжÔԤдÈÕÖ¾£¨Ò²±»³ÆÎªjournaling£©×÷Á˳õ²½Ö§³Ö£¬¸Ä½øÁ˻ָ´»úÖÆ£¬²¢Ê¹¸ü¶àÊý¾ÝÔ´µÄÁãÊý¾Ý¶ªÊ§ÓÐÁ˿ɿ¿±£Ö¤¡£±¾ÎĽ«ÏêϸµØÃèÊöÕâ¸öÌØÐԵŤ×÷»úÖÆ£¬ÒÔ¼°¿ª·¢ÕßÈçºÎÔÚSpark StreamingÓ¦ÓÃÖÐʹÓÃÕâ¸ö»úÖÆ¡£
±³¾°
SparkºÍËüµÄRDD³éÏóÉè¼ÆÔÊÐíÎÞ·ìµØ´¦Àí¼¯ÈºÖÐÈκÎworker½ÚµãµÄ¹ÊÕÏ¡£¼øÓÚSpark Streaming½¨Á¢ÓÚSparkÖ®ÉÏ£¬Òò´ËÆäworker½ÚµãÒ²¾ß±¸ÁËͬÑùµÄÈÝ´íÄÜÁ¦¡£È»¶ø£¬Spark StreamingÓг¤Õý³£ÔËÐÐʱ¼äÐèÇóÆäÓ¦ÓóÌÐò±ØÐëÒ²¾ß±¸´Ódriver½ø³Ì£¨Ðµ÷¸÷¸öworkerµÄÖ÷ÒªÓ¦Óýø³Ì£©¹ÊÕϻָ´µÄÄÜÁ¦¡£Ê¹Spark driverÄܹ»ÈÝ´íÊǼþºÜ¼¬ÊÖµÄÊÂÇ飬ÒòΪËü¿ÉÒÔÊÇÈÎÒâ¼ÆËãģʽʵÏÖµÄÈÎÒâÓû§³ÌÐò¡£²»¹ýSpark StreamingÓ¦ÓóÌÐòÔÚ¼ÆËãÉÏÓÐÒ»¸öÄÚÔڵĽṹ——ÔÚÿ¶Îmicro-batchÊý¾ÝÖÜÆÚÐÔµØÖ´ÐÐͬÑùµÄSpark¼ÆËã¡£ÕâÖֽṹÔÊÐí°ÑÓ¦ÓõÄ״̬£¨Òà³Æcheckpoint£©ÖÜÆÚÐԵر£´æµ½¿É¿¿µÄ´æ´¢¿Õ¼äÖУ¬²¢ÔÚdriverÖØÐÂÆô¶¯Ê±»Ö¸´¸Ã״̬¡£
¶ÔÓÚÎļþÕâÑùµÄÔ´Êý¾Ý£¬Õâ¸ödriver»Ö¸´»úÖÆ×ãÒÔ×öµ½ÁãÊý¾Ý¶ªÊ§£¬ÒòΪËùÓеÄÊý¾Ý¶¼±£´æÔÚÁËÏñHDFS»òS3ÕâÑùµÄÈÝ´íÎļþϵͳÖÐÁË¡£µ«¶ÔÓÚÏñKafkaºÍFlumeµÈÆäËüÊý¾ÝÔ´£¬ÓÐЩ½ÓÊÕµ½µÄÊý¾Ý»¹Ö»»º´æÔÚÄÚ´æÖУ¬ÉÐδ±»´¦Àí£¬ËüÃǾÍÓпÉÄܻᶪʧ¡£ÕâÊÇÓÉÓÚSparkÓ¦Óõķֲ¼²Ù×÷·½Ê½ÒýÆðµÄ¡£µ±driver½ø³Ìʧ°Üʱ£¬ËùÓÐÔÚstandalone/yarn/mesos¼¯ÈºÔËÐеÄexecutor£¬Á¬Í¬ËüÃÇÔÚÄÚ´æÖеÄËùÓÐÊý¾Ý£¬Ò²Í¬Ê±±»ÖÕÖ¹¡£¶ÔÓÚSpark StreamingÀ´Ëµ£¬´ÓÖîÈçKafkaºÍFlumeµÄÊý¾ÝÔ´½ÓÊÕµ½µÄËùÓÐÊý¾Ý£¬ÔÚËüÃÇ´¦ÀíÍê³É֮ǰ£¬Ò»Ö±¶¼»º´æÔÚexecutorµÄÄÚ´æÖС£×ÝÈ»driverÖØÐÂÆô¶¯£¬ÕâЩ»º´æµÄÊý¾ÝÒ²²»Äܱ»»Ö¸´¡£ÎªÁ˱ÜÃâÕâÖÖÊý¾ÝËðʧ£¬ÎÒÃÇÔÚSpark 1.2·¢²¼°æ±¾ÖÐÒý½øÁËԤдÈÕÖ¾£¨Write Ahead Logs£©¹¦ÄÜ¡£
ԤдÈÕÖ¾
ԤдÈÕÖ¾£¨Ò²³Æ×÷journal£©Í¨³£±»ÓÃÓÚÊý¾Ý¿âºÍÎļþϵͳÖУ¬ÓÃÀ´±£Ö¤ÈκÎÊý¾Ý²Ù×÷µÄ³Ö¾ÃÐÔ¡£Õâ¸ö²Ù×÷µÄ˼ÏëÊÇÊ×ÏȽ«²Ù×÷¼ÇÈëÒ»¸ö³Ö¾ÃµÄÈÕÖ¾£¬È»ºó²Å¶ÔÊý¾ÝÊ©¼ÓÕâ¸ö²Ù×÷¡£¼ÙÈçÔÚÊ©¼Ó²Ù×÷µÄÖмäϵͳʧ°ÜÁË£¬Í¨¹ý¶ÁÈ¡ÈÕÖ¾²¢ÖØÐÂÊ©¼ÓÇ°ÃæÔ¤¶¨µÄ²Ù×÷£¬ÏµÍ³¾ÍµÃµ½Á˻ָ´¡£ÏÂÃæÈÃÎÒÃÇ¿´¿´ÈçºÎÀûÓÃÕâÑùµÄ¸ÅÄî±£Ö¤½ÓÊÕµ½µÄÊý¾ÝµÄ³Ö¾ÃÐÔ¡£
ÏñKafkaºÍFlumeÕâÑùµÄÊý¾ÝԴʹÓýÓÊÕÆ÷£¨Receiver£©À´½ÓÊÕÊý¾Ý¡£ËüÃÇ×÷Ϊ³¤×¤ÔËÐÐÈÎÎñÔÚexecutorÖÐÔËÐУ¬¸ºÔð´ÓÊý¾ÝÔ´½ÓÊÕÊý¾Ý£¬²¢ÇÒÔÚÊý¾ÝÔ´Ö§³Öʱ£¬»¹¸ºÔðÈ·ÈÏÊÕµ½µÄÊý¾Ý¡£ÊÕµ½µÄÊý¾Ý±»±£´æÔÚexecutorµÄÄÚ´æÖУ¬È»ºódriverÔÚexecutorÖÐÔËÐÐÀ´´¦ÀíÈÎÎñ¡£
µ±ÆôÓÃÁËԤдÈÕÖ¾ÒÔºó£¬ËùÓÐÊÕµ½µÄÊý¾Ýͬʱ»¹±£´æµ½ÁËÔÚϵͳÈÝ´íÎļþϵͳµÄÈÕÖ¾ÎļþÖС£Òò´Ë¼´Ê¹Spark Streamingʧ°Ü£¬ÕâЩ½ÓÊÕµ½µÄÊý¾ÝÒ²²»»á¶ªÊ§¡£ÁíÍ⣬ÊÕµ½Êý¾ÝÕýÈ·ÐÔÖ»ÔÚÊý¾Ý±»Ô¤Ð´µ½ÈÕÖ¾ÒÔºó½ÓÊÕÆ÷²Å»áÈ·ÈÏ£¬ÒѾ»º´æµ«»¹Ã»Óб£´æµÄÊý¾Ý¿ÉÒÔÔÚdriverÖØÐÂÆô¶¯Ö®ºóÓÉÊý¾ÝÔ´ÔÙ·¢ËÍÒ»´Î¡£ÕâÁ½¸ö»úÖÆÈ·±£ÁËÁãÊý¾Ý¶ªÊ§£¬¼´ËùÓеÄÊý¾Ý»òÕß´ÓÈÕÖ¾Öлָ´£¬»òÕßÓÉÊý¾ÝÔ´ÖØ·¢¡£
ÅäÖÃ
Èç¹ûÐèÒªÆôÓÃԤдÈÕÖ¾¹¦ÄÜ£¬¿ÉÒÔͨ¹ýÈç϶¯×÷ʵÏÖ¡£
- ͨ¹ýstreamingContext.checkpoint(path-to-directory)ÉèÖüì²éµãµÄĿ¼¡£Õâ¸öĿ¼¿ÉÒÔÔÚÈκÎÓëHadoopAPI¿Ú¼æÈݵÄÎļþϵͳÖÐÉèÖã¬Ëü¼ÈÓÃ×÷±£´æÁ÷¼ì²éµã£¬ÓÖÓÃ×÷±£´æÔ¤Ð´ÈÕÖ¾¡£
- ÉèÖÃSparkConfµÄÊôÐÔ spark.streaming.receiver.writeAheadLog.enableÎªÕæ£¨Ä¬ÈÏÖµÊǼ٣©¡£
ÔÚÈÕÖ¾±»ÆôÓÃÒÔºó£¬ËùÓнÓÊÕÆ÷¶¼»ñµÃÁËÄܹ»´Ó¿É¿¿ÊÕµ½µÄÊý¾ÝÖлָ´µÄÓÅÊÆ¡£ÎÒÃǽ¨Òé½ûÖ¹ÄÚ´æÖеĸ´ÖÆ»úÖÆ£¨in-memory replication£©£¨Í¨¹ýÔÚÊäÈëÁ÷ÖÐÉèÖÃÊʵ±µÄ³Ö¾ÃµÈ¼¶(persistence level)£©£¬ÒòΪÓÃÓÚԤдÈÕÖ¾µÄÈÝ´íÎļþϵͳºÜ¿ÉÄÜÒ²¸´ÖÆÁËÊý¾Ý¡£
´ËÍ⣬Èç¹ûÏ£ÍûÉõÖÁÄܹ»»Ö¸´»º´æµÄÊý¾Ý£¬¾ÍÐèҪʹÓÃÖ§³ÖackingµÄÊý¾ÝÔ´£¨¾ÍÏñKafka£¬FlumeºÍKinesisÒ»Ñù£©£¬²¢ÇÒʵÏÖÁËÒ»¸ö¿É¿¿µÄ½ÓÊÕÆ÷£¬ËüÔÚÊý¾Ý¿É¿¿µØ±£´æµ½ÈÕÖ¾ÒԺ󣬲ÅÏòÊý¾ÝÔ´È·ÈÏÕýÈ·¡£ÄÚÖõÄKafkaºÍFlumeÂÖѯ½ÓÊÕÆ÷ÒѾÊǿɿ¿µÄÁË¡£
×îºó£¬Çë×¢ÒâÔÚÆôÓÃÁËԤдÈÕÖ¾ÒÔºó£¬Êý¾Ý½ÓÊÕÍÌÍÂÂÊ»áÓÐÇá΢µÄ½µµÍ¡£ÓÉÓÚËùÓÐÊý¾Ý¶¼±»Ð´ÈëÈÝ´íÎļþϵͳ£¬ÎļþϵͳµÄдÈëÍÌÍÂÂʺÍÓÃÓÚÊý¾Ý¸´ÖƵÄÍøÂç´ø¿í£¬¿ÉÄܾÍÊÇDZÔ򵀮¿¾±ÁË¡£ÔÚ´ËÇé¿öÏ£¬×îºÃ´´½¨¸ü¶àµÄ½ÓÊÕÆ÷Ôö¼Ó½ÓÊյIJ¢Ðжȣ¬ºÍ/»òʹÓøüºÃµÄÓ²¼þÒÔÔö¼ÓÈÝ´íÎļþϵͳµÄÍÌÍÂÂÊ¡£
ʵÏÖϸ½Ú
ÈÃÎÒÃǸüÉîÈëµØÌ½ÌÖÒ»ÏÂÕâ¸öÎÊÌ⣬ŪÇåԤдÈÕÖ¾µ½µ×ÊÇÈçºÎ¹¤×÷µÄ¡£ÈÃÎÒÃÇÖØÎÂÒ»ÏÂÔÚÕâÑùµÄÉÏÏÂÎÄÖÐͨ³£µÄSpark StreamingµÄ¼Ü¹¹¡£
ÔÚÒ»¸öSpark StreamingÓ¦ÓÿªÊ¼Ê±£¨Ò²¾ÍÊÇdriver¿ªÊ¼Ê±£©£¬Ïà¹ØµÄStreamingContext£¨ËùÓÐÁ÷¹¦ÄܵĻù´¡£©Ê¹ÓÃSparkContextÆô¶¯½ÓÊÕÆ÷³ÉΪ³¤×¤ÔËÐÐÈÎÎñ¡£ÕâЩ½ÓÊÕÆ÷½ÓÊÕ²¢±£´æÁ÷Êý¾Ýµ½SparkÄÚ´æÖÐÒÔ¹©´¦Àí¡£Óû§´«ËÍÊý¾ÝµÄÉúÃüÖÜÆÚÈçÏÂͼËùʾ£¨Çë²Î¿¼ÏÂÁÐͼʾ£©¡£
- ½ÓÊÕÊý¾Ý£¨À¶É«¼ýÍ·£©——½ÓÊÕÆ÷½«Êý¾ÝÁ÷·Ö³ÉһϵÁÐС¿é£¬´æ´¢µ½executorÄÚ´æÖС£ÁíÍ⣬ÔÚÆôÓÃÒÔºó£¬Êý¾Ýͬʱ»¹Ð´Èëµ½ÈÝ´íÎļþϵͳµÄԤдÈÕÖ¾¡£
- ֪ͨdriver£¨ÂÌÉ«¼ýÍ·£©——½ÓÊÕ¿éÖеÄÔªÊý¾Ý£¨metadata£©±»·¢Ë͵½driverµÄStreamingContext¡£Õâ¸öÔªÊý¾Ý°üÀ¨£º£¨i£©¶¨Î»ÆäÔÚexecutorÄÚ´æÖÐÊý¾ÝλÖõĿéreference id£¬£¨ii£©¿éÊý¾ÝÔÚÈÕÖ¾ÖÐµÄÆ«ÒÆÐÅÏ¢£¨Èç¹ûÆôÓÃÁË£©¡£
- ´¦ÀíÊý¾Ý£¨ºìÉ«¼ýÍ·£©——ÿÅúÊý¾ÝµÄ¼ä¸ô£¬Á÷ÉÏÏÂÎÄʹÓÿéÐÅÏ¢²úÉúµ¯ÐÔ·Ö²¼Êý¾Ý¼¯RDDºÍËüÃǵÄ×÷Òµ£¨job£©¡£StreamingContextͨ¹ýÔËÐÐÈÎÎñ´¦ÀíexecutorÄÚ´æÖеĿéÀ´Ö´ÐÐ×÷Òµ¡£
- ¼ì²éµã¼ÆË㣨³ÈÉ«¼ýÍ·£©——ΪÁ˻ָ´µÄÐèÒª£¬Á÷¼ÆË㣨»»¾ä»°Ëµ£¬¼´ StreamingContextÌṩµÄDStreams £©ÖÜÆÚÐÔµØÉèÖüì²éµã£¬²¢±£´æµ½Í¬Ò»¸öÈÝ´íÎļþϵͳÖÐÁíÍâµÄÒ»×éÎļþÖС£

µ±Ò»¸öʧ°ÜµÄdriverÖØÆôʱ£¬ÏÂÁÐÊÂÇé³öÏÖ£¨²Î¿¼ÏÂÒ»¸öͼʾ£©¡£
- »Ö¸´¼ÆË㣨³ÈÉ«¼ýÍ·£©——ʹÓüì²éµãÐÅÏ¢ÖØÆôdriver£¬ÖØÐ¹¹ÔìÉÏÏÂÎIJ¢ÖØÆô½ÓÊÕÆ÷¡£
- »Ö¸´ÔªÊý¾Ý¿é£¨ÂÌÉ«¼ýÍ·£©——ΪÁ˱£Ö¤Äܹ»¼ÌÐøÏÂÈ¥Ëù±Ø±¸µÄÈ«²¿ÔªÊý¾Ý¿é¶¼±»»Ö¸´¡£
- δÍê³É×÷ÒµµÄÖØÐÂÐγɣ¨ºìÉ«¼ýÍ·£©——ÓÉÓÚʧ°Ü¶øÃ»Óд¦ÀíÍê³ÉµÄÅú´¦Àí£¬½«Ê¹Óûָ´µÄÔªÊý¾ÝÔٴβúÉúRDDºÍ¶ÔÓ¦µÄ×÷Òµ¡£
- ¶ÁÈ¡±£´æÔÚÈÕÖ¾ÖеĿéÊý¾Ý£¨À¶É«¼ýÍ·£©——ÔÚÕâЩ×÷ÒµÖ´ÐÐʱ£¬¿éÊý¾ÝÖ±½Ó´ÓԤдÈÕÖ¾ÖжÁ³ö¡£Õ⽫»Ö¸´ÔÚÈÕÖ¾Öпɿ¿µØ±£´æµÄËùÓбØÒªÊý¾Ý¡£
- ÖØ·¢ÉÐδȷÈϵÄÊý¾Ý£¨×ÏÉ«¼ýÍ·£©——ʧ°ÜʱûÓб£´æµ½ÈÕÖ¾ÖеĻº´æÊý¾Ý½«ÓÉÊý¾ÝÔ´Ôٴη¢ËÍ¡£ÒòΪ½ÓÊÕÆ÷ÉÐδ¶ÔÆäÈ·ÈÏ¡£

Òò´Ëͨ¹ýԤдÈÕÖ¾ºÍ¿É¿¿µÄ½ÓÊÕÆ÷£¬Spark Streaming¾Í¿ÉÒÔ±£Ö¤Ã»ÓÐÊäÈëÊý¾Ý»áÓÉÓÚdriverµÄʧ°Ü£¨»ò»»ÑÔÖ®£¬ÈκÎʧ°Ü£©¶ø¶ªÊ§¡£
δÀ´µÄ·¢Õ¹·½Ïò
ÓйØÔ¤Ð´ÈÕÖ¾µÄijЩδÀ´·¢Õ¹·½Ïò°üÀ¨£º
- ÀàËÆKafkaÕâÑùµÄϵͳ¿ÉÒÔͨ¹ý¸´ÖÆÊý¾Ý±£³Ö¿É¿¿ÐÔ¡£ÔÊÐíԤдÈÕÖ¾Á½´Î¸ßЧµØ¸´ÖÆÍ¬ÑùµÄÊý¾Ý£ºÒ»´ÎÓÉKafka£¬¶øÁíÒ»´ÎÓÉSpark Streaming¡£SparkδÀ´°æ±¾½«°üº¬KafkaÈÝ´í»úÖÆµÄÔÉúÖ§³Ö£¬´Ó¶ø±ÜÃâµÚ¶þ¸öÈÕÖ¾¡£
- ԤдÈÕ־дÈëÐÔÄܵĸĽø£¨ÓÈÆäÊÇÍÌÍÂÂÊ£©¡£
ÎÄÕ²ÎÓëÈËÔ±
¸ÃÌØÐÔ£¨Ô¤Ð´ÈÕÖ¾£©µÄÖ÷ҪʵÏÖÕßÈçÏ£º
- Tathagata Das (Databricks)——ÕûÌåÉè¼ÆÒÔ¼°´ó²¿·ÖʵÏÖ¡£
- Hari Shreedharan (Cloudera)——ԤдÈÕÖ¾µÄдÈëºÍ¶Á³ö¡£
- ÉÛÈüÈü (Intel)——ÄÚÖÃKafkaÖ§³ÖµÄ¸Ä½ø¡£
½øÒ»²½Ñо¿µÄ²Î¿¼
- ¹ØÓÚ¼ì²éµãºÍԤдÈÕÖ¾¸ü¶àµÄÐÅÏ¢£¬Çë²Î¿¼Spark Streaming Programming Guide
- SparkµÄMeetup talkÖÐÓйصÄÖ÷Ìâ
- JIRA – SPARK-3129
|