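Abstract: Stream processing is one of the hottest terms in big data right now, and the most prominent project in that space is undoubtedly Spark Streaming from the Spark community. The defining requirement of stream processing is low latency, but that requirement also raises data reliability problems.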
Driver HA
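A stream processing system runs for a long time with data flowing in continuously, so the reliability of its Spark daemon process (the Driver) is critical: it determines whether a Streaming program can keep running correctly.
The Driver achieves HA by persisting its metadata so that state can be restored after a restart. As Figure 1 shows, the metadata the Driver persists includes: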
Block metadata (green arrows in Figure 1): the metadata generated when data that a Receiver takes in from the network is assembled into Blocks;
Checkpoint data (orange arrows in Figure 1): configuration, DStream operations, the state of unfinished Batches, generated RDD data, and so on.
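After the Driver fails and restarts:
Recover computation (orange arrows in Figure 2): restart the driver from the Checkpoint data, rebuild the context, and restart the receivers.
Recover block metadata (green arrows in Figure 2): restore the Block metadata.
Recover unfinished jobs (red arrows in Figure 2): use the recovered metadata to regenerate the RDDs and their corresponding jobs, then submit them to the Spark cluster for execution.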
ͨ¹ýÈçÉϵÄÊý¾Ý±¸·ÝºÍ»Ö¸´»úÖÆ£¬DriverʵÏÖÁ˹ÊÕϺóÖØÆô¡¢ÒÀÈ»Äָܻ´StreamingÈÎÎñ¶ø²»¶ªÊ§Êý¾Ý£¬Òò´ËÌṩÁËϵͳ¼¶µÄÊý¾Ý¸ß¿É¿¿¡£
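Reliable upstream and downstream IO systems
Stream processing exchanges data with external IO systems mainly over network sockets. Because network communication is unreliable, the sender and receiver need a protocol that provides acknowledgment of received packets and retransmission on failure.
Not every IO system supports retransmission. At a minimum it requires persisting the data stream, and it has to be done while still delivering high throughput and low latency. Among the data sources that Spark Streaming officially supports, only Kafka meets all of these requirements, which is why recent Spark Streaming releases treat Kafka as the recommended external data system.
Besides serving as the inbound data source, Kafka is commonly used as the outbound data source as well. All real-time systems subscribe to and distribute data through Kafka as the message queue, which decouples the producers of streaming data from its consumers.
Figure 3 shows the data flow in a typical enterprise big data center: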

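Beyond guaranteeing that data can be replayed from the source, Kafka is also an important underpinning of exactly-once semantics for streaming data. Kafka provides a low-level API that lets a client access a topic's metadata as well as its data stream. Each Spark Streaming receiving task can fetch data from a specified Kafka topic, partition, and offset, so the data boundaries between tasks are clear. When a task fails, it can fetch the same portion of data again without producing "overlapping" data, which ensures that streaming data is processed once and only once.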
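Why a fixed topic, partition, and offset range makes a retry safe can be sketched with an in-memory log. `ToyLog`, `OffsetRange`, and `run_task` are hypothetical names for illustration only; this is not a Kafka client, and the end offset is assumed to be exclusive.

```python
from collections import namedtuple

# Hypothetical description of the slice of data one task is responsible for.
OffsetRange = namedtuple(
    "OffsetRange", ["topic", "partition", "from_offset", "until_offset"]
)


class ToyLog:
    """In-memory stand-in for a partitioned, replayable log."""

    def __init__(self):
        self._partitions = {}

    def append(self, topic, partition, message):
        self._partitions.setdefault((topic, partition), []).append(message)

    def fetch(self, offset_range):
        # until_offset is exclusive, so adjacent ranges never overlap.
        messages = self._partitions[(offset_range.topic, offset_range.partition)]
        return messages[offset_range.from_offset:offset_range.until_offset]


def run_task(log, offset_range, fail_first_attempt=False):
    """Fetch one range, retrying once if the first attempt 'fails'."""
    attempts = 0
    while True:
        attempts += 1
        records = log.fetch(offset_range)
        if fail_first_attempt and attempts == 1:
            continue  # simulated task failure: discard the work and retry
        return records, attempts


def demo_offset_replay():
    log = ToyLog()
    for i in range(5):
        log.append("events", 0, "m%d" % i)
    task_range = OffsetRange("events", 0, 1, 4)
    clean_run, _ = run_task(log, task_range)
    retried_run, attempts = run_task(log, task_range, fail_first_attempt=True)
    return clean_run, retried_run, attempts
```

Because the range is fixed before the task starts, the retried attempt reads exactly the same records as a clean run would.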
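Reliable receivers
Before Spark 1.3, Spark Streaming pulled data from a Kafka cluster by launching dedicated Receiver tasks.
Once started, a Receiver task uses Kafka's high-level API to create a topicMessageStreams object and reads the stream into a buffer one record at a time. At each batchInterval tick, the JobGenerator generates and submits a Spark job.
Because a Receiver task can crash, Spark provides a high-level reliable receiver, ReliableKafkaReceiver, for reliable data ingestion. It builds on the WAL (Write Ahead Log) feature introduced in Spark 1.2: after each received batch of data is persisted to disk, it updates the topic-partition offset information and only then fetches the next batch from Kafka. If the Receiver fails, the data it had already received can be recovered from the WAL after a restart, which avoids the data loss that a crashed Receiver node would otherwise cause (the original code listing, trimmed of incidental logic, is not reproduced here).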
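The ordering described above (persist the batch first, update the offset second) can be sketched as a toy model. This is not the ReliableKafkaReceiver source: `ToyReliableReceiver`, `recover_messages`, and the list-based log are hypothetical, and deduplicating replayed batches by their start offset is an assumption made for this sketch.

```python
class ToyReliableReceiver:
    """Hypothetical receiver that persists a batch before committing its offset."""

    def __init__(self, source, wal, committed_offset=0):
        self.source = source  # list standing in for one topic-partition
        self.wal = wal        # list standing in for durable log storage
        self.committed_offset = committed_offset

    def receive_batch(self, batch_size, crash_before_commit=False):
        start = self.committed_offset
        batch = self.source[start:start + batch_size]
        if not batch:
            return False
        # Step 1: persist the batch to the write ahead log.
        self.wal.append((start, list(batch)))
        if crash_before_commit:
            raise RuntimeError("receiver crashed before committing the offset")
        # Step 2: only now advance the committed offset.
        self.committed_offset = start + len(batch)
        return True


def recover_messages(wal):
    """Rebuild the received stream; a replayed batch replaces its earlier copy."""
    by_start = {}
    for start, batch in wal:
        by_start[start] = batch
    messages = []
    for start in sorted(by_start):
        messages.extend(by_start[start])
    return messages


def demo_reliable_receiver():
    source = ["m0", "m1", "m2", "m3", "m4", "m5"]
    wal = []
    receiver = ToyReliableReceiver(source, wal)
    receiver.receive_batch(2)
    try:
        receiver.receive_batch(2, crash_before_commit=True)
    except RuntimeError:
        pass
    # The restarted receiver resumes from the last committed offset.
    restarted = ToyReliableReceiver(
        source, wal, committed_offset=receiver.committed_offset
    )
    while restarted.receive_batch(2):
        pass
    return recover_messages(wal), len(wal)
```

In this model nothing is lost across the crash, but the batch that was logged without a committed offset is received and logged a second time, so the log holds four entries for three distinct batches.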

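With the WAL enabled, the Receiver's reliability risk goes down, but the cost of persisting to disk noticeably lowers overall system throughput. For that reason, the newly released Spark 1.3 adds a Direct API as another way for Spark Streaming to access Kafka data sources.
With the Direct API, Spark Streaming no longer launches long-running Receiver tasks. Instead, it directly assigns the latest topic partition offsets to each Batch and its RDD. Once the job is running, the Executors use Kafka's simple consumer API to fetch the data in that offset range.
This approach removes the reliability risk of a Receiver crash, and because it no longer relies on ZooKeeper for offset tracking, it also achieves exactly-once delivery of the data (the original code listing, trimmed of incidental logic, is not reproduced here).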
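How offsets might be assigned to each batch on the driver side can be sketched as follows. This is a hedged illustration, not the Direct API itself: `plan_batch` and its `max_per_partition` cap are hypothetical, and offsets are modeled as plain integers keyed by (topic, partition).

```python
def plan_batch(last_offsets, latest_offsets, max_per_partition=None):
    """Assign each partition a contiguous (from, until) range for one batch.

    last_offsets: where the previous batch ended, per (topic, partition).
    latest_offsets: the newest offsets currently available in the log.
    Returns the ranges for this batch and the offsets the next batch starts from.
    """
    ranges = {}
    next_offsets = {}
    for topic_partition, latest in latest_offsets.items():
        start = last_offsets.get(topic_partition, 0)
        until = latest
        if max_per_partition is not None:
            # Hypothetical cap so one batch cannot grow without bound.
            until = min(latest, start + max_per_partition)
        ranges[topic_partition] = (start, until)
        next_offsets[topic_partition] = until
    return ranges, next_offsets


def demo_plan_batches():
    last = {("events", 0): 10, ("events", 1): 5}
    first, last = plan_batch(last, {("events", 0): 25, ("events", 1): 5})
    second, last = plan_batch(
        last, {("events", 0): 40, ("events", 1): 9}, max_per_partition=10
    )
    return first, second, last
```

Each batch starts exactly where the previous one ended, so the ranges neither overlap nor leave gaps, and an executor that is handed a range can fetch it again after a failure without consulting any external offset store.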

ԤдÈÕÖ¾ Write Ahead Log
Spark 1.2¿ªÊ¼ÌṩԤдÈÕÖ¾ÄÜÁ¦£¬ÓÃÓÚReceiverÊý¾Ý¼°DriverÔªÊý¾ÝµÄ³Ö¾Ã»¯ºÍ¹ÊÕϻָ´¡£WALÖ®ËùÒÔÄÜÌṩ³Ö¾Ã»¯ÄÜÁ¦£¬ÊÇÒòΪËüÀûÓÃÁ˿ɿ¿µÄHDFS×öÊý¾Ý´æ´¢¡£
SparkStreamingԤдÈÕÖ¾»úÖÆµÄºËÐÄAPI°üÀ¨£º
¹ÜÀíWALÎļþµÄWriteAheadLogManager
¶Á/дWALµÄWriteAheadLogWriterºÍWriteAheadLogReader
»ùÓÚWALµÄRDD£ºWriteAheadLogBackedBlockRDD
»ùÓÚWALµÄPartition£ºWriteAheadLogBackedBlockRDDPartition
ÒÔÉϺËÐÄAPIÔÚÊý¾Ý½ÓÊպͻָ´½×¶ÎµÄ½»»¥Ê¾ÒâͼÈçͼ4Ëùʾ¡£

The source code of WriteAheadLogWriter shows clearly that after each data buffer is written to HDFS, the writer calls flush to force it to disk before it takes the next block of data. Data accepted by the receiver is therefore guaranteed to be persisted to disk, which gives good data reliability.
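The write-then-flush discipline can be sketched on a local file. This is a minimal illustration under stated assumptions, not the WriteAheadLogWriter source: `ToyWalWriter` and `read_wal` are hypothetical, a local file with `os.fsync` stands in for an HDFS output stream, and the length-prefixed record layout is chosen for this sketch only.

```python
import os
import struct
import tempfile


class ToyWalWriter:
    """Hypothetical writer: every record is forced to disk before write() returns."""

    def __init__(self, path):
        self._file = open(path, "ab")

    def write(self, payload):
        # Length-prefixed record: 4-byte big-endian size, then the bytes.
        self._file.write(struct.pack(">I", len(payload)))
        self._file.write(payload)
        self._file.flush()             # push Python's buffer to the OS
        os.fsync(self._file.fileno())  # ask the OS to push it to the device

    def close(self):
        self._file.close()


def read_wal(path):
    """Return every complete record; stop at a torn record left by a crash."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break  # clean end of log, or a torn header
            (length,) = struct.unpack(">I", header)
            payload = f.read(length)
            if len(payload) < length:
                break  # torn final record: it was never fully persisted
            records.append(payload)
    return records


def demo_wal():
    path = os.path.join(tempfile.mkdtemp(), "receiver.wal")
    writer = ToyWalWriter(path)
    for payload in (b"block-0", b"block-1", b"block-2"):
        writer.write(payload)
    writer.close()
    # Simulate a crash in the middle of a fourth write: the header promises
    # 10 bytes but only 3 reached the file.
    with open(path, "ab") as f:
        f.write(struct.pack(">I", 10) + b"blo")
    return read_wal(path)
```

Because each `write` returns only after the record has been flushed, everything the caller was told is written can be read back after a restart, and only the record that was in flight at the moment of the crash is dropped.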

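Conclusion
Thanks to reliable data sources such as Kafka and to its own checkpoint and WAL mechanisms, Spark Streaming offers strong data reliability: data is guaranteed to be processed "at least once". However, consistency on the outbound side is not yet complete, so exactly-once semantics still cannot be guaranteed end to end. The Spark Streaming community is already working on this feature (SPARK-4122), and it is expected to be merged into trunk and released soon.
About the author: Ye Qi, a senior architect in the Universe product department at Huawei Software, focuses on the distributed storage and computing infrastructure underlying big data, is the principal architect of Huawei Software's Hadoop distribution, and is currently interested in stream processing and Spark.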