±¾ÎÄÊÇϵÁÐÎÄÕµĵÚ4ƪ£¬
µÚһƪ
"KafkaÉè¼Æ½âÎö£¨Ò»£©- Kafka±³¾°¼°¼Ü¹¹½éÉÜ"
µÚ¶þƪ
"KafkaÉè¼Æ½âÎö£¨¶þ£©- Kafka High Availability £¨ÉÏ£©
µÚÈýƪ
KafkaÉè¼Æ½âÎö£¨Èý£©- Kafka High Availability £¨ÖУ©
µÚËÄÆª
KafkaÉè¼Æ½âÎö£¨ËÄ£©- Kafka High Availability £¨Ï£©
µÚÎåÆª
KafkaÉè¼Æ½âÎö£¨Î壩- Kafka ConsumerÉè¼Æ½âÎö
µÚÁùƪ KafkaÉè¼Æ½âÎö£¨Áù£©-
KafkaÐÔÄܲâÊÔ·½·¨¼°Benchmark±¨¸æ
¡¶KafkaÉè¼Æ½âÎö¡·ÏµÁÐÉÏһƪ¡¶Kafka¸ßÐÔÄܼܹ¹Ö®µÀ¡ª¡ªKafkaÉè¼Æ½âÎö£¨Áù£©¡·´Óºê¹Û¼Ü¹¹µ½¾ßÌåʵÏÖ·ÖÎöÁËKafkaʵÏÖ¸ßÐÔÄܵÄÔÀí¡£±¾ÎĽéÉÜÁËKafka
StreamµÄ¼Ü¹¹ºÍ²¢·¢Ä£ÐÍ£¬Í¬Ê±·ÖÎöÁËKafka StreamÈçºÎ½â¾öÁ÷ʽ¼ÆËãµÄ¹Ø¼üÎÊÌâ¡£
ʲôÊÇÁ÷ʽ¼ÆËã
Ò»°ãÁ÷ʽ¼ÆËã»áÓëÅúÁ¿¼ÆËãÏà±È½Ï
ÔÚÁ÷ʽ¼ÆËãÄ£ÐÍÖУ¬ÊäÈëÊdzÖÐøµÄ£¬ÔÚʱ¼äÉÏÊÇÎÞ½çµÄ¡£ÕâÒ²¾ÍÒâζ×Å£¬ÓÀÔ¶Äò»µ½È«Á¿Êý¾Ý¼¯½øÐмÆË㡣ͬʱ£¬¼ÆËã½á¹û»á³ÖÐøÊä³ö£¬Ò²¼´¼ÆËã½á¹ûÔÚʱ¼äÉÏÒ²ÊÇÎÞ½çµÄ¡£
Á÷ʽ¼ÆËãÒ»°ã¶ÔʵʱÐÔÒªÇó±È½Ï¸ß£¬Í¬Ê±Ò»°ãÊÇÏȶ¨ÒåÄ¿±ê¼ÆË㣬ȻºóÊý¾Ýµ½´ïºó½«¼ÆËãÂß¼Ó¦ÓÃÓÚÊý¾ÝÖ®ÉÏ¡£Í¬Ê±ÎªÁËÌá¸ß¼ÆËãЧÂÊ£¬Ò»°ã¾¡¿ÉÄÜ£¨¶ÔÓڿɺϲ¢µÄ¼ÆË㣩²ÉÓÃÔöÁ¿¼ÆËã´úÌæÈ«Á¿¼ÆËã¡£

ÅúÁ¿´¦ÀíÄ£ÐÍÖУ¬Ò»°ãÏÈÓÐÈ«Á¿Êý¾Ý¼¯£¬È»ºó½«¼ÆËãÂß¼Ó¦ÓÃÓÚ¸ÃÈ«Á¿Êý¾Ý¼¯¡£ÌصãÊÇÈ«Á¿¼ÆË㣬²¢ÇÒ¼ÆËã½á¹ûÒ»´ÎÐÔÈ«Á¿Êä³ö£¬ÔÚʱ¼äÉÏÊÇÓнçµÄ¡£

Kafka StreamÊÇʲô
Kafka StreamÊÇKafka´Ó0.10.*ÒýÈëµÄÒ»¸öеÄÌØÐÔ¡£ËüÌṩÁ˶ԴæÓÚKafkaÄÚµÄÊý¾Ý½øÐзֲ¼Ê½Á÷ʽ´¦ÀíÒԺͷÖÎöµÄÄÜÁ¦¡£
Kafka StreamµÄÌØµãÈçÏ£º
³ýÁËKafkaÍ⣬²»ÒÀÀµÓÚÈκÎÍⲿϵͳ
Kafka StreamÊÇÒ»¸ö·Ç³£¼òµ¥²¢ÇÒÇáÁ¿¼¶µÄÀà¿â£¬¿ÉÒԷdz£·½±ãµØ½«ËüǶÈëÈÎÒâJava³ÌÐòÖУ¬Ò²¿ÉÒÔÈÎÒⷽʽ½øÐдò°üÒÔ¼°²¿Êð
ͬʱÌṩµ×²ãµÄ´¦Àíµ¥ÔªProcessor£¨ÀàËÆÓÚStormÌṩµÄboltºÍspout£©£¬ÒÔ¼°¸ß²ã³éÏóµÄDSL£¨ÀàËÆÓÚSparkµÄgroup/reduce/map£©
ͨ¹ý¾ßÓÐÈÝ´íÐÔµÄstate storeʵÏÖ¿É¿¿µÄ״̬²Ù×÷£¨Èçwindowed joinºÍaggregation£©
Ö§³ÖExactly Once£¨ÕýºÃÒ»´Î£©´¦ÀíÓïÒå
¾ß±¸¼Ç¼¼¶£¨Ò²¼´Ðм¶£©µÄÊý¾Ý´¦ÀíÄÜÁ¦£¬´Ó¶ø½«ÑÓ³Ù½µµÍµ½ºÁÃë¼¶±ð
³ä·ÖÀûÓÃKafka·ÖÇø»úÖÆÒÔʵÏÖScale Out£¨Ë®Æ½À©Õ¹£©²¢Ìṩ˳ÐòÐÔ±£Ö¤
Ö§³Ö»ùÓÚʼþʱ¼äµÄ´°¿Ú²Ù×÷£¨Spark StreamingÔݲ»Ö§³Öʼþʱ¼ä£©£¬²¢ÇÒ¿É´¦ÀíÍíµ½µÄÊý¾Ý£¨late
arrival of records£©
Kafka Stream¶¨Î»¼°ÓÅÊÆ
µ±Ç°ÒѾÓжàÖÖ·Ö²¼Ê½Á÷ʽ´¦Àíϵͳ£¬×îÖªÃûÇÒÓ¦ÓÃ×î¶àµÄ¿ªÔ´Á÷ʽ´¦Àíϵͳµ±ÊôTwitter¿ªÔ´µÄApache
StormºÍUC berkeleyµÄSpark Streaming¡£¡£¡£¡£
Apache Storm¾¹ý¶àÄê·¢Õ¹£¬Ó¦Óù㷺£¬²¢ÇÒͬÑùÌṩ¼Ç¼¼¶£¨Ðм¶£©µÄ´¦ÀíÄÜÁ¦£¬ÑÓ³ÙÒ²ÔÚºÁÃë¼¶¡£Ä¿Ç°ÒÑÖ§³ÖSQL
on Stream¡£
Spark Streaming»ùÓÚApache Spark£¬Çҷdz£±ãÓÚÓëSQL´¦ÀíºÍͼ¼ÆËãµÈ¼¯³É£¬¹¦ÄÜÇ¿´ó£¬¶ÔÓÚÊìϤÆäËüSparkÓ¦Óÿª·¢µÄÓû§¶øÑÔʹÓÃÃż÷·Ç³£µÍ
ÁíÍ⣬ĿǰÖ÷Á÷µÄHadoop·¢Ðа棬ÈçCloudera£¬HortonworksºÍMapR£¬¶¼¼¯³ÉÁËSparkºÍStorm£¬Ê¹µÃ²¿ÊðÓëÔËάÕâЩϵͳ·Ç³£·½±ã¡£¡£¡£¡£¡£¡£¡£
¼ÈÈ»Apache StormÓëApache SparkÓµÓÃÈç´Ë¶àÓŵ㣬ÄÇΪºÎ»¹ÐèÒªKafka StreamÄØ£¿±ÊÕßÈÏΪÖ÷ÒªÓÐÈçÏÂÔÒò¡£¡£¡£¡£¡£¡£¡£
µÚÒ»£¬SparkºÍStorm¶¼ÊÇÁ÷ʽ´¦Àí¿ò¼Ü£¬¶øKafka StreamÌṩµÄÊÇÒ»¸ö»ùÓÚKafkaµÄÁ÷ʽ´¦ÀíÀà¿â¡£¿ò¼ÜÒªÇ󿪷¢Õß°´ÕÕÖ¸¶¨µÄ·½Ê½È¥¿ª·¢Âß¼²¿·Ö£¬²¢°´ÕÕÖ¸¶¨µÄ·½·¨²¿Ê𡣿ª·¢ÕߺÜÄÑÁ˽â¿ò¼ÜµÄÄÚ²¿´¦Àí·½Ê½£¬´Ó¶øÊ¹µÃµ÷ÊÔºÍÔËά³É±¾½Ï¸ß£¬ÇÒʹÓÃÊÜÏÞ¡£¶øKafka
Stream×÷ΪÁ÷ʽ´¦ÀíÀà¿â£¬Ö±½ÓÌṩ¾ßÌåµÄÀà¸ø¿ª·¢Õßµ÷Óã¬Õû¸öÓ¦ÓõÄÔËÐкͲ¿Êð·½Ê½Íê³ÉÓÉ¿ª·¢Õß¾ö¶¨£¬¼«´óµØ·½±ãÁËʹÓú͵÷ÊÔ¡£Ó¦ÓóÌÐòÓëÀà¿â¼°¿ò¼ÜµÄ¹ØÏµÈçÏÂͼËùʾ¡£

µÚ¶þ£¬Ö÷Á÷µÄ·Ö²¼Ê½Á÷ʽ´¦Àíϵͳ£¬»ù±¾¶¼Ö§³ÖÒÔKafka×÷ΪÆäÊý¾ÝÔ´¡£ÀýÈçSparkÌṩרÃŵÄspark-streaming-kafkaÄ£¿é£¬¶øStormÒ²¾ßÓÐרÃŵÄkafka-spout¡£ÊÂʵÉÏ£¬Kafka¿ÉÒÔ˵Êǵ±Ç°Òµ½çÖ÷Á÷µÄ·Ö²¼Ê½Á÷ʽ´¦ÀíϵͳµÄ±ê×¼Êý¾ÝÔ´£¬´ó²¿·ÖµäÐ͵ÄÁ÷ʽϵͳÖж¼ÒѲ¿ÊðÁËKafka£¬´ËʱʹÓÃKafka
StreamµÄʹÓúÍά»¤³É±¾·Ç³£µÍ¡£¡£¡£¡£¡£¡£¡£
µÚÈý£¬ËäÈ»HortonworksÓëCloudera·½±ãÁËSparkºÍStormµÄ²¿Ê𣬵«ÕâЩ¿ò¼ÜµÄ²¿ÊðºÍÔËάÈÔÏà¶Ô¸´ÔÓ¡£Ïà·´£¬Kafka
Stream×÷ΪÀà¿â£¬¿ÉÒԷdz£·½±ãµØ±»Ç¶Èëµ½ÒÑÓеÄÓ¦ÓóÌÐòÖУ¬Ëü¶ÔÓ¦ÓõĴò°ü·½Ê½¼°²¿Êð·½Ê½»ù±¾ÉÏûÓÐÈκÎÒªÇ󡣡£¡£¡£¡£¡£
µÚËÄ£¬ÓÉÓÚKafka±¾ÉíÌṩÊý¾Ý³Ö¾Ã»¯£¬Òò´ËKafka Stream¾ßÓÐÔÚÏß¹ö¶¯Éý¼¶ºÍ¹ö¶¯²¿Êð¼°ÖØÐ¼ÆËãµÄÄÜÁ¦¡£¡£
µÚÎ壬Kafka Stream³ä·ÖÀûÓÃÁËConsumerµÄRebalance»úÖÆºÍKafkaµÄ·ÖÇø»úÖÆ£¬Ê¹µÃKafka
Stream¿ÉÒԷdz£·½±ãµØ½øÐÐˮƽÀ©Õ¹¡£¾ßÌåÀ´Ëµ£¬Ã¿¸öÔËÐÐKafka StreamµÄÓ¦ÓÃʵÀý¶¼°üº¬ÁËÒ»¸öKafka
ConsumerʵÀý£¬¶à¸öͬһӦÓõIJ»Í¬ÊµÀý¼ä²¢Ðд¦ÀíÄ¿±êÊý¾Ý¼¯¡£¶ø²»Í¬ÊµÀýÖ®¼äµÄ²¿Êð·½Ê½²¢²»±ØÍêȫһÖ£¬±ÈÈ粿·ÖʵÀýÔËÐÐÔÚWebÈÝÆ÷ÖУ¬²¿·ÖʵÀý¿ÉÒÔÔËÐÐÔÚDocker»òKubernetesµÈÐéÄ⻯ÈÝÆ÷ÖС£
µÚÁù£¬Kafka¾ßÓÐConsumer Rebalance»úÖÆ£¬Òò´Ë¿ÉÔÚÏß¶¯Ì¬µ÷Õû²¢Ðжȶø²»ÐèÒªÖØÆô
µÚÁù£¬Ê¹ÓÃSpark Streaming»òStormʱ£¬ÐèҪΪ¿ò¼Ü±¾ÉíµÄ½ø³ÌÔ¤Áô×ÊÔ´£¬ÈçSpark
on YARNµÄnode managerºÍStormµÄsupervisor¡£¶ÔÓ¦ÓóÌÐò£¬¿ò¼Ü±¾ÉíÒ²»áÕ¼Óò¿·Ö×ÊÔ´£¬ÈçSpark
StreamingÐèҪΪshuffleºÍstorageÔ¤ÁôÄÚ´æ¡£¡£¡£¡£¡£¡£¡£
Kafka StreamÕûÌå¼Ü¹¹
Kafka StreamµÄÕûÌå¼Ü¹¹Í¼ÈçÏ¡£

Ŀǰ£¨Kafka 0.11.0.0£©Kafka StreamµÄÊý¾ÝÔ´Ö»ÄÜÊÇKafka£¨ÈçÉÏͼËùʾ£©¡£µ«ÊÇ´¦Àí½á¹û²¢²»Ò»¶¨ÒªÈçÉÏͼËùʾÊä³öµ½Kafka¡£Êµ¼ÊÉÏGlobalKTableºÍKTable¼°KStreamµÄʵÀý»¯¶¼ÐëÖ¸¶¨Topic£¨ÈçÏÂËùʾ£©¡£

ÁíÍ⣬ÉÏͼÖеÄProducerºÍConsumer²»ÐèÓÉ¿ª·¢ÕßÔÚKafka StreamÓ¦ÓÃÖÐÏÔʾµØÊµÀý»¯£¬¶øÊÇÓÉKafka
Stream¸ù¾Ý²ÎÊýÒþʽʵÀý»¯£¬´Ó¶ø½µµÍÁËʹÓÃKafkaµÄÃż÷¡£¿ª·¢ÕßÖ»ÐèרעÓÚ¿ª·¢ºËÐÄÒµÎñÂß¼£¬Ò²¼´ÉÏͼÖÐTaskÄڵIJ¿·Ö¡£
Processor Topology
»ùÓÚKafka StreamµÄÁ÷ʽӦÓõÄÒµÎñÂ߼ȫ²¿ÓÉÒ»¸ö±»³ÆÎªProcessor TopologyµÄ×é¼þÖ´ÐС£ËüÓëSpark
StreamingµÄDAGºÍStormµÄTopologyÀàËÆ£¬¶¼¶¨ÒåÁËÊý¾ÝÔÚ¸÷¸ö´¦Àíµ¥Ôª£¨ÔÚKafka
StreamÖб»³Æ×÷Processor£©¼äµÄÁ÷¶¯·½Ê½£¬Ò²¼´¶¨ÒåÁËÊý¾ÝµÄ´¦ÀíÂß¼
ÏÂÃæÊÇÒ»¸öProcessorµÄʾÀý£¬¸ÃProcessorʵÏÖÁËWord Count¹¦ÄÜ£¬ÇÒÿÃëÊä³öÒ»´Î¼ÆËã½á¹û¡£

ÓÉÉÏÊö´úÂë¿É¼û
context.getStateStoreÌṩµÄ״̬´æ´¢ÎªÓÐ״̬¼ÆË㣨Èç¾ÛºÏ²Ù×÷£¬´°¿Ú²Ù×÷£©ÌṩÁË¿ÉÄÜÐÔ
process¶¨ÒåÁ˶ÔÿÌõ¼Ç¼µÄ´¦ÀíÂß¼£¬Ò²Ó¡Ö¤ÁËKafka¾ßÓмǼ¼¶µÄÊý¾Ý´¦ÀíÄÜÁ¦
context.scheduler¶¨ÒåÁËpunctuate±»Ö´ÐеÄÖÜÆÚ£¬´Ó¶øÌṩÁËʵÏÖ´°¿Ú²Ù×÷µÄÄÜÁ¦
Kafka Stream²¢ÐÐÄ£ÐÍ
Kafka StreamµÄ²¢ÐÐÄ£ÐÍÖУ¬×îСÁ£¶ÈΪTask£¬Ã¿¸öTask°üº¬Ò»¸öÌØ¶¨×ÓTopologyµÄËùÓÐProcessor¡£Òò´Ëÿ¸öTaskËùÖ´ÐеĴúÂëÍêȫһÑù£¬Î¨Ò»µÄ²»Í¬ÔÚÓÚËù´¦ÀíµÄÊý¾Ý¼¯»¥²¹¡£
ÕâÒ»µã¸úStormµÄTopologyÍêÈ«²»Ò»Ñù¡£StormµÄTopologyµÄÿһ¸öTaskÖ»°üº¬Ò»¸öSpout»òBoltµÄʵÀý¡£Òò´ËStormµÄÒ»¸öTopologyÄڵIJ»Í¬TaskÖ®¼äÐèҪͨ¹ýÍøÂçͨÐÅ´«µÝÊý¾Ý£¬¶øKafka
StreamµÄTask°üº¬ÁËÍêÕûµÄ×ÓTopology£¬ËùÒÔTaskÖ®¼ä²»ÐèÒª´«µÝÊý¾Ý£¬Ò²¾Í²»ÐèÒªÍøÂçͨÐÅ¡£ÕâÒ»µã½µµÍÁËϵͳ¸´ÔÓ¶È£¬Ò²Ìá¸ßÁË´¦ÀíЧÂÊ¡£
Èç¹ûij¸öStreamµÄÊäÈëTopicÓжà¸ö(±ÈÈç2¸öTopic£¬1¸öPartitionÊýΪ5£¬ÁíÒ»¸öPartitionÊýΪ6)£¬Ôò×ܵÄTaskÊýµÈÓÚPartitionÊý×î¶àµÄÄǸöTopicµÄPartitionÊý£¨max(5,6)=6£©¡£ÕâÊÇÒòΪKafka
StreamʹÓÃÁËConsumerµÄRebalance»úÖÆ£¬Ã¿¸öPartition¶ÔÓ¦Ò»¸öTask¡£
ÏÂͼչʾÁËÔÚÒ»¸ö½ø³Ì£¨Instance£©ÖÐÒÔ2¸öTopic£¨PartitionÊý¾ùΪ4£©ÎªÊý¾ÝÔ´µÄKafka
StreamÓ¦ÓõIJ¢ÐÐÄ£ÐÍ¡£´ÓͼÖпÉÒÔ¿´µ½£¬ÓÉÓÚKafka StreamÓ¦ÓõÄĬÈÏÏß³ÌÊýΪ1£¬ËùÒÔ4¸öTaskÈ«²¿ÔÚÒ»¸öÏß³ÌÖÐÔËÐС£

ΪÁ˳ä·ÖÀûÓöàÏ̵߳IJ¢Ðд¦ÀíÓÅÊÆ£¬Kafka StreamÓ¦ÓóÌÐò¿ÉÉèÖÃÏß³ÌÊý£¨Ä¬ÈÏΪ1£©¡£ÏÂͼչʾÁËÏß³ÌÊýΪ2ʱµÄ²¢ÐÐÄ£ÐÍ¡£

´ÓÉÏͼ¿É¼û£¬Ã¿¸öÏ̷ֱ߳ð¸ºÔðÖ´ÐÐÁ½¸öTask¡£
ǰÎÄÓÐÌáµ½£¬Kafka Stream¿É±»Ç¶Èëµ½ÈÎÒâJavaÓ¦Óã¨ÀíÂÛÉÏ»ùÓÚJVMµÄÓ¦Óö¼¿ÉÒÔ£©ÖС£ÏÂͼչʾÁËÔÚͬһ̨»úÆ÷µÄ²»Í¬½ø³ÌÖÐͬʱÆô¶¯Í¬Ò»¸öKafka
StreamÓ¦ÓÃʱµÄ²¢ÐÐÄ£ÐÍ¡£¡£¡£

×¢Ò⣬ÕâÀïÒª±£Ö¤Á½¸ö½ø³ÌµÄStreamsConfig.APPLICATION_ID_CONFIGÍêȫһÑù¡£ÒòΪKafka
Stream½«APPLICATION_ID_CONFIG×÷ΪÒþʽÆô¶¯µÄConsumerµÄGroup
ID¡£Ö»Óб£Ö¤APPLICATION_ID_CONFIGÏàͬ£¬²ÅÄܱ£Ö¤ÕâÁ½¸ö½ø³ÌµÄConsumerÊôÓÚͬһ¸öGroup£¬´Ó¶ø¿ÉÒÔͨ¹ýConsumer
Rebalance»úÖÆÄõ½»¥²¹µÄÊý¾Ý¼¯¡£
¼ÈȻʵÏÖÁË¶à½ø³Ì²¿Ê𣬿ÉÒÔÒÔͬÑùµÄ·½Ê½ÊµÏÖ¶à»úÆ÷²¿Ê𡣸ò¿Êð·½Ê½Ò²ÒªÇóËùÓнø³ÌµÄAPPLICATION_ID_CONFIGÍêȫһÑù¡£´ÓͼÉÏÒ²¿ÉÒÔ¿´µ½£¬Ã¿¸öʵÀýÖеÄÏß³ÌÊý²¢²»ÒªÇóÒ»Ñù¡£µ«ÊÇÎÞÂÛÈçºÎ²¿Êð£¬Task×ÜÊý×ܻᱣ֤һÖ¡£

×¢Ò⣺Kafka StreamµÄ²¢ÐÐÄ£ÐÍ£¬·Ç³£ÒÀÀµÓÚ¡¶KafkaÉè¼Æ½âÎö£¨ËÄ£©- Kafka ConsumerÉè¼Æ½âÎö¡·Ò»ÎÄÖнéÉܵÄConsumerµÄRebalance»úÖÆºÍ¡¶KafkaÉè¼Æ½âÎö£¨Ò»£©-
Kafka±³¾°¼°¼Ü¹¹½éÉÜ¡·Ò»ÎÄÖнéÉܵÄKafka·ÖÇø»úÖÆ¡£Ç¿ÁÒ½¨Ò鲻̫ÊìϤÕâÁ½ÖÖ»úÖÆµÄÅóÓÑ£¬ÏÈÐÐÔĶÁÕâÁ½ÆªÎÄÕ¡£
ÕâÀï¶Ô±ÈÒ»ÏÂStormµÄTopologyºÍKafka StreamµÄProcessor Topology¡£
StormµÄTopologyÓÉSpoutºÍBolt×é³É£¬SpoutÌṩÊý¾ÝÔ´£¬¶øBoltÌṩ¼ÆËãºÍÊý¾Ýµ¼³ö¡£Kafka
StreamµÄProcessor TopologyÍêÈ«ÓÉProcessor×é³É£¬ÒòΪËüµÄÊý¾Ý¹Ì¶¨ÓÉKafkaµÄConsumer´ÓKafkaµÄÒ»¸ö»ò¶à¸öTopicÖлñÈ¡
StormµÄTopology¿ÉÒÔͬʱ°üº¬Shuffle²¿·ÖºÍ·ÇShuffle²¿·Ö£¬²¢ÇÒÍùÍùÒ»¸öTopology¾ÍÊÇÒ»¸öÍêÕûµÄÓ¦Ó᣶øKafka
StreamµÄÒ»¸öÎïÀíTopologyÖ»°üº¬·ÇShuffle²¿·Ö£¬¶øShuffle²¿·ÖÐèҪͨ¹ýthrough²Ù×÷ÏÔʾÍê³É£¬¸Ã²Ù×÷½«Ò»¸ö´óµÄTopology·Ö³ÉÁË2¸ö×ÓTopology
StormµÄ²»Í¬BoltÔËÐÐÔÚ²»Í¬µÄExecutorÖУ¬ºÜ¿ÉÄÜλÓÚ²»Í¬µÄ»úÆ÷£¬ÐèҪͨ¹ýÍøÂçͨÐÅ´«ÊäÊý¾Ý¡£¶øKafka
StreamµÄProcessor TopologyµÄ²»Í¬ProcessorÍêÈ«ÔËÐÐÓÚͬһ¸öTaskÖУ¬Ò²¾ÍÍêÈ«´¦ÓÚͬһ¸öỊ̈߳¬ÎÞÐèÍøÂçͨÐÅ
StormµÄÒ»¸öTaskÖ»°üº¬Ò»¸öSpout»òÕßBoltµÄʵÀý£¬¶øKafka StreamµÄÒ»¸öTask°üº¬ÁËÒ»¸ö×ÓTopologyµÄËùÓÐProcessor
StormµÄTopologyÄÚ£¬²»Í¬Bolt/SpoutµÄ²¢ÐжȿÉÒÔ²»Ò»Ñù£¬¶øKafka StreamµÄ×ÓTopologyÄÚ£¬ËùÓÐProcessorµÄ²¢ÐжÈÍêȫһÑù
StormÈç¹ûÒªÐÞ¸Äij¸öSpout»òBoltµÄ²¢Ðжȣ¬ÐèÒªÖØÆôTopology¡£¶øKafka Stream¿ÉÀûÓÃConsumer
Rebalance»úÖÆ·Ç³£·½±ãµØÔÚÏß¶¯Ì¬µ÷Õû²¢ÐжÈ
State store
Á÷ʽ´¦ÀíÖУ¬²¿·Ö²Ù×÷ÊÇÎÞ״̬µÄ£¬ÀýÈç¹ýÂ˲Ù×÷£¨Kafka Stream DSLÖÐÓÃfiler·½·¨ÊµÏÖ£©¡£¶ø²¿·Ö²Ù×÷ÊÇÓÐ״̬µÄ£¬ÐèÒª¼Ç¼Öмä״̬£¬ÈçWindow²Ù×÷ºÍ¾ÛºÏ¼ÆËã¡£State
store±»ÓÃÀ´´æ´¢Öмä״̬¡£Ëü¿ÉÒÔÊÇÒ»¸ö³Ö¾Ã»¯µÄKey-Value´æ´¢£¬Ò²¿ÉÒÔÊÇÄÚ´æÖеÄHashMap£¬»òÕßÊÇÊý¾Ý¿â¡£KafkaÌṩÁË»ùÓÚTopicµÄ״̬´æ´¢¡£
TopicÖд洢µÄÊý¾Ý¼Ç¼±¾ÉíÊÇKey-ValueÐÎʽµÄ£¬Í¬Ê±KafkaµÄlog compaction»úÖÆ¿É¶ÔÀúÊ·Êý¾Ý×öcompact²Ù×÷£¬±£Áôÿ¸öKey¶ÔÓ¦µÄ×îºóÒ»¸öValue£¬´Ó¶øÔÚ±£Ö¤Key²»¶ªÊ§µÄǰÌáÏ£¬¼õÉÙ×ÜÊý¾ÝÁ¿£¬´Ó¶øÌá¸ß²éѯЧÂÊ¡£
¹¹ÔìKTableʱ£¬ÐèÒªÖ¸¶¨Æästate store name¡£Ä¬ÈÏÇé¿öÏ£¬¸ÃÃû×ÖÒ²¼´ÓÃÓÚ´æ´¢¸ÃKTableµÄ״̬µÄTopicµÄÃû×Ö£¬±éÀúKTableµÄ¹ý³Ì£¬Êµ¼Ê¾ÍÊDZéÀúËü¶ÔÓ¦µÄstate
store£¬»òÕß˵±éÀúTopicµÄËùÓÐkey£¬²¢È¡Ã¿¸öKey×îÐÂÖµµÄ¹ý³Ì¡£ÎªÁËʹµÃ¸Ã¹ý³Ì¸ü¼Ó¸ßЧ£¬Ä¬ÈÏÇé¿öÏ»á¶Ô¸ÃTopic½øÐÐcompact²Ù×÷¡£
ÁíÍ⣬³ýÁËKTable£¬ËùÓÐ״̬¼ÆË㣬¶¼ÐèÒªÖ¸¶¨state store name£¬´Ó¶ø¼Ç¼Öмä״̬¡£
KStream vs. KTable
KTableºÍKStreamÊÇKafka StreamÖзdz£ÖØÒªµÄÁ½¸ö¸ÅÄËüÃÇÊÇKafkaʵÏÖ¸÷ÖÖÓïÒåµÄ»ù´¡¡£Òò´ËÕâÀïÓбØÒª·ÖÎö϶þÕßµÄÇø±ð¡£
KStreamÊÇÒ»¸öÊý¾ÝÁ÷£¬¿ÉÒÔÈÏΪËùÓмǼ¶¼Í¨¹ýInsert onlyµÄ·½Ê½²åÈë½øÕâ¸öÊý¾ÝÁ÷Àï¡£¶øKTable´ú±íÒ»¸öÍêÕûµÄÊý¾Ý¼¯£¬¿ÉÒÔÀí½âΪÊý¾Ý¿âÖÐµÄ±í¡£ÓÉÓÚÿÌõ¼Ç¼¶¼ÊÇKey-Value¶Ô£¬ÕâÀï¿ÉÒÔ½«KeyÀí½âΪÊý¾Ý¿âÖеÄPrimary
Key£¬¶øValue¿ÉÒÔÀí½âΪһÐмǼ¡£¿ÉÒÔÈÏΪKTableÖеÄÊý¾Ý¶¼ÊÇͨ¹ýUpdate onlyµÄ·½Ê½½øÈëµÄ¡£Ò²¾ÍÒâζ×Å£¬Èç¹ûKTable¶ÔÓ¦µÄTopicÖÐнøÈëµÄÊý¾ÝµÄKeyÒѾ´æÔÚ£¬ÄÇô´ÓKTableÖ»»áÈ¡³öͬһKey¶ÔÓ¦µÄ×îºóÒ»ÌõÊý¾Ý£¬Ï൱ÓÚеÄÊý¾Ý¸üÐÂÁ˾ɵÄÊý¾Ý¡£
ÒÔÏÂͼΪÀý£¬¼ÙÉèÓÐÒ»¸öKStreamºÍKTable£¬»ùÓÚͬһ¸öTopic´´½¨£¬²¢ÇÒ¸ÃTopicÖаüº¬ÈçÏÂͼËùʾ5ÌõÊý¾Ý¡£´Ëʱ±éÀúKStream½«µÃµ½ÓëTopicÄÚÊý¾ÝÍêȫһÑùµÄËùÓÐ5ÌõÊý¾Ý£¬ÇÒ˳Ðò²»±ä¡£¶ø´Ëʱ±éÀúKTableʱ£¬ÒòΪÕâ5Ìõ¼Ç¼ÖÐÓÐ3¸ö²»Í¬µÄKey£¬ËùÒÔ½«µÃµ½3Ìõ¼Ç¼£¬Ã¿¸öKey¶ÔÓ¦×îеÄÖµ£¬²¢ÇÒÕâÈýÌõÊý¾ÝÖ®¼äµÄ˳ÐòÓëÔÀ´ÔÚTopicÖеÄ˳Ðò±£³ÖÒ»Ö¡£ÕâÒ»µãÓëKafkaµÄÈÕÖ¾compactÏàͬ¡£

´ËʱÈç¹û¶Ô¸ÃKStreamºÍKTable·Ö±ð»ùÓÚkey×öGroup£¬¶ÔValue½øÐÐSum£¬µÃµ½µÄ½á¹û½«»á²»Í¬¡£¶ÔKtableµÄ¼ÆËã½á¹ûÊÇ<Mike£¬4>£¬<Jack£¬3>£¬<Lily£¬5>¡£¶ø¶ÔKStreamµÄ¼ÆËã½á¹û½«ÊÇ<Jack£¬4>£¬<Lily£¬7>£¬<Mike£¬4>¡£
ʱ¼ä
ÔÚÁ÷ʽÊý¾Ý´¦ÀíÖУ¬Ê±¼äÊÇÊý¾ÝµÄÒ»¸ö·Ç³£ÖØÒªµÄÊôÐÔ¡£´ÓKafka 0.10¿ªÊ¼£¬Ã¿Ìõ¼Ç¼³ýÁËKeyºÍValueÍ⣬»¹Ôö¼ÓÁËtimestampÊôÐÔ¡£Ä¿Ç°Kafka
StreamÖ§³ÖÈýÖÖʱ¼ä
ʼþ·¢Éúʱ¼ä¡£Ê¼þ·¢ÉúµÄʱ¼ä£¬°üº¬ÔÚÊý¾Ý¼Ç¼ÖС£·¢Éúʱ¼äÓÉProducerÔÚ¹¹ÔìProducerRecordʱָ¶¨¡£²¢ÇÒÐèÒªBroker»òÕßTopic½«message.timestamp.typeÉèÖÃΪCreateTime£¨Ä¬ÈÏÖµ£©²ÅÄÜÉúЧ¡£
ÏûÏ¢½ÓÊÕʱ¼ä£¬Ò²¼´ÏûÏ¢´æÈëBrokerµÄʱ¼ä¡£µ±Broker»òTopic½«message.timestamp.typeÉèÖÃΪLogAppendTimeʱÉúЧ¡£´ËʱBroker»áÔÚ½ÓÊÕµ½ÏûÏ¢ºó£¬´æÈë´ÅÅÌǰ£¬½«ÆätimestampÊôÐÔÖµÉèÖÃΪµ±Ç°»úÆ÷ʱ¼ä¡£Ò»°ãÏûÏ¢½ÓÊÕʱ¼ä±È½Ï½Ó½üÓÚʼþ·¢Éúʱ¼ä£¬²¿·Ö³¡¾°Ï¿ɴúÌæÊ¼þ·¢Éúʱ¼ä¡£
ÏûÏ¢´¦Àíʱ¼ä£¬Ò²¼´Kafka Stream´¦ÀíÏûϢʱµÄʱ¼ä¡£
×¢£ºKafka StreamÔÊÐíͨ¹ýʵÏÖorg.apache.kafka.streams.processor.TimestampExtractor½Ó¿Ú×Ô¶¨Òå¼Ç¼ʱ¼ä¡£
´°¿Ú
ǰÎÄÌáµ½£¬Á÷ʽÊý¾ÝÊÇÔÚʱ¼äÉÏÎÞ½çµÄÊý¾Ý¡£¶ø¾ÛºÏ²Ù×÷Ö»ÄÜ×÷ÓÃÔÚÌØ¶¨µÄÊý¾Ý¼¯£¬Ò²¼´ÓнçµÄÊý¾Ý¼¯ÉÏ¡£Òò´ËÐèҪͨ¹ýijÖÖ·½Ê½´ÓÎÞ½çµÄÊý¾Ý¼¯Éϰ´Ìض¨µÄÓïÒåѡȡ³öÓнçµÄÊý¾Ý¡£´°¿ÚÊÇÒ»Öַdz£³£ÓõÄÉ趨¼ÆËã±ß½çµÄ·½Ê½¡£²»Í¬µÄÁ÷ʽ´¦Àíϵͳ֧³ÖµÄ´°¿ÚÀàËÆ£¬µ«²»¾¡Ïàͬ¡£
Kafka StreamÖ§³ÖµÄ´°¿ÚÈçÏ¡£
Hopping Time Window ¸Ã´°¿Ú¶¨ÒåÈçÏÂͼËùʾ¡£ËüÓÐÁ½¸öÊôÐÔ£¬Ò»¸öÊÇAdvance interval£¬Ò»¸öÊÇWindow
size¡£Advance interval¶¨ÒåÊä³öµÄʱ¼ä¼ä¸ô£¬¶øWindow sizeÖ¸¶¨ÁË´°¿ÚµÄ´óС£¬Ò²¼´Ã¿´Î¼ÆËãµÄÊý¾Ý¼¯µÄ´óС¡£¶ø¡£Ò»¸öµäÐ͵ÄÓ¦Óó¡¾°ÊÇ£¬Ã¿¸ô5ÃëÖÓÊä³öÒ»´Î¹ýÈ¥1¸öСʱÄÚÍøÕ¾µÄPV»òÕßUV

Tumbling Time Window ¸Ã´°¿Ú¶¨ÒåÈçÏÂͼËùʾ¡£¿ÉÒÔÈÏΪËüÊÇHopping Time
WindowµÄÒ»ÖÖÌØÀý£¬Ò²¼´Window sizeºÍAdvance intervalÏàµÈ¡£ËüµÄÌØµãÊǸ÷¸öWindowÖ®¼äÍêÈ«²»Ïཻ¡£

Session Window ¸Ã´°¿ÚÓÃÓÚ¶ÔKey×öGroupºóµÄ¾ÛºÏ²Ù×÷ÖС£ËüÐèÒª¶ÔKey×ö·Ö×飬Ȼºó¶Ô×éÄÚµÄÊý¾Ý¸ù¾ÝÒµÎñÐèÇó¶¨ÒåÒ»¸ö´°¿ÚµÄÆðʼµãºÍ½áÊøµã¡£Ò»¸öµäÐ͵ݸÀýÊÇ£¬Ï£Íûͨ¹ýSession
Window¼ÆËãij¸öÓû§·ÃÎÊÍøÕ¾µÄʱ¼ä¡£¶ÔÓÚÒ»¸öÌØ¶¨µÄÓû§£¨ÓÃKey±íʾ£©¶øÑÔ£¬µ±·¢ÉúµÇ¼²Ù×÷ʱ£¬¸ÃÓû§£¨Key£©µÄ´°¿Ú¼´¿ªÊ¼£¬µ±·¢ÉúÍ˳ö²Ù×÷»òÕß³¬Ê±Ê±£¬¸ÃÓû§£¨Key£©µÄ´°¿Ú¼´½áÊø¡£´°¿Ú½áÊøÊ±£¬¿É¼ÆËã¸ÃÓû§µÄ·ÃÎÊʱ¼ä»òÕßµã»÷´ÎÊýµÈ¡£
Sliding Window ¸Ã´°¿ÚÖ»ÓÃÓÚ2¸öKStream½øÐÐJoin¼ÆËãʱ¡£¸Ã´°¿ÚµÄ´óС¶¨ÒåÁËJoinÁ½²àKStreamµÄÊý¾Ý¼Ç¼±»ÈÏΪÔÚͬһ¸ö´°¿ÚµÄ×î´óʱ¼ä²î¡£¼ÙÉè¸Ã´°¿ÚµÄ´óСΪ5Ã룬Ôò²ÎÓëJoinµÄ2¸öKStreamÖУ¬¼Ç¼ʱ¼ä²îСÓÚ5µÄ¼Ç¼±»ÈÏΪÔÚͬһ¸ö´°¿ÚÖУ¬¿ÉÒÔ½øÐÐJoin¼ÆËã¡£
Join
Kafka StreamÓÉÓÚ°üº¬KtableºÍKStreamÁ½ÖÖÊý¾Ý¼¯£¬Òò´ËÌṩÈçÏÂJoin¼ÆËã
KTable Join KTable ½á¹ûÈÔΪKTable¡£ÈÎÒâÒ»±ßÓиüУ¬½á¹ûKTable¶¼»á¸üС£
KStream Join KStream ½á¹ûΪKStream¡£±ØÐë´ø´°¿Ú²Ù×÷£¬·ñÔò»áÔì³ÉJoin²Ù×÷Ò»Ö±²»½áÊø¡£
KStream Join KTable / GlobalKTable ½á¹ûΪKStream¡£Ö»Óе±KStreamÖÐÓÐÐÂÊý¾Ýʱ£¬²Å»á´¥·¢Join¼ÆËã²¢Êä³ö½á¹û¡£KStreamÎÞÐÂÊý¾Ýʱ£¬KTableµÄ¸üв¢²»»á´¥·¢Join¼ÆË㣬Ҳ²»»áÊä³öÊý¾Ý¡£²¢ÇҸøüÐÂÖ»¶ÔÏ´ÎJoinÉúЧ¡£Ò»¸öµäÐ͵ÄʹÓó¡¾°ÊÇ£¬KStreamÖеĶ©µ¥ÐÅÏ¢ÓëKTableÖеÄÓû§ÐÅÏ¢×ö¹ØÁª¼ÆËã¡£
¶ÔÓÚJoin²Ù×÷£¬Èç¹ûÒªµÃµ½ÕýÈ·µÄ¼ÆËã½á¹û£¬ÐèÒª±£Ö¤²ÎÓëJoinµÄKTable»òKStreamÖÐKeyÏàͬµÄÊý¾Ý±»·ÖÅ䵽ͬһ¸öTask¡£¾ßÌå·½·¨ÊÇ
²ÎÓëJoinµÄKTable»òKStream¶ÔÓ¦µÄTopicµÄPartitionÊýÏàͬ
²ÎÓëJoinµÄKTable»òKStreamµÄKeyÀàÐÍÏàͬ£¨Êµ¼ÊÉÏ£¬ÒµÎñº¬ÒâÒ²Ó¦¸ÃÏàͬ£©
Partitioner²ßÂÔµÄ×îÖÕ½á¹ûµÈЧ£¨ÊµÏÖ²»ÐèÒªÍêȫһÑù£¬µ«Ð§¹û±ØÐëÒ»Ö£©£¬Ò²¼´KeyÏàͬµÄÇé¿öÏ£¬±»·ÖÅäµ½IDÏàͬµÄPartitionÄÚ
Èç¹ûÉÏÊöÌõ¼þ²»Âú×㣬¿Éͨ¹ýµ÷ÓÃÈçÏ·½·¨Ê¹µÃËüÂú×ãÉÏÊöÌõ¼þ¡£
KStream<K, V> through(Serde<K> keySerde,
Serde<V> valSerde, StreamPartitioner<K, V>
partitioner, String topic)
ͨ¹ýthrough·½·¨£¬½øÐÐJoin²Ù×÷µÄ¹ý³ÌÈçÏÂͼËùʾ

´ÓÉÏͼ¿ÉÒÔ¿´³ö£¬ÎªÁËÂú×ãJoinÌõ¼þ£¬ÐèҪͨ¹ýthroughµÈ·½·¨¶Ô²ÎÓëJoinµÄijһ·½½øÐÐÖØÐ·ÖÇø£¬Ï൱ÓÚStormµÄField
GroupingºÍSparkµÄShuffle¡£
ΪÁËÌá¸ßJoinµÄЧÂÊ£¬0.10.2.0ÖÐÒýÈëÁËGlobalKTable¡£µ±KStreamÓëÒ»¸öGlobalKTable
Joinʱ£¬GlobalKTableµÄËùÓÐÊý¾Ý»á±»¸´ÖƵ½ËùÓÐKafka StreamÓ¦ÓÃʵÀý£¬Òò´ËKStream¿ÉÔÚTaskÄÚÖ±½ÓÓëÆäËùÔÚʵÀýÖеÄGlobalKTable¸±±¾½øÐÐJoin£¬²»ÐèҪͨ¹ýthroughµÈ·½·¨½øÐÐÖØÐ·ÖÇø£¬¼«´óÌá¸ßÁËJoinʱµÄЧÂÊ¡£ÓëGlobalKTableµÄJoin¹ý³ÌÈçÏÂͼËùʾ¡£

Ò»¸öµäÐ͵ÄÊÊÓó¡¾°ÊÇ£¬ÔÚÀàÊý¾Ý²Ö¿âµÄÓ¦ÓÃÖУ¬½«°üº¬´óÁ¿ÔöÁ¿Êý¾ÝµÄTopicͨ¹ýKStreamÒýÓ㬶ø½«°üº¬ÉÙÁ¿£¬¿ÉÄܸüеÄÊý¾Ý£¬ÖÃÓÚGlobalKTableÖС£³ä·ÖÀûÓÃGlobalKTableµÄÊý¾Ý¸´ÖÆÌØÐÔ£¬½µµÍJoin¿ªÏú£¬Ìá¸ßÐÔÄÜ¡£
µ«ÐèҪעÒâµÄÊÇ£¬GlobalKTableÐèÒª½«ËùÓÐÊý¾Ý¸´ÖƵ½Ã¿Ò»¸öʵÀý£¬Òò´Ë±ØÐ뱣֤ʵÀýÄÚ´æÖÁÉÙ×ã¹»±£´æ¸ÃGlobalKTableÄÚÈ«²¿Êý¾Ý¡£
¾ÛºÏÓëÂÒÐò´¦Àí
¾ÛºÏ²Ù×÷¿ÉÓ¦ÓÃÓÚKStreamºÍKTable¡£µ±¾ÛºÏ·¢ÉúÔÚKStreamÉÏʱ±ØÐëÖ¸¶¨´°¿Ú£¬´Ó¶øÏÞ¶¨¼ÆËãµÄÄ¿±êÊý¾Ý¼¯¡£
ÐèҪ˵Ã÷µÄÊÇ£¬¾ÛºÏ²Ù×÷µÄ½á¹û¿Ï¶¨ÊÇKTable¡£ÒòΪKTableÊǿɸüÐµģ¬¿ÉÒÔÔÚÍíµ½µÄÊý¾Ýµ½À´Ê±£¨Ò²¼´·¢ÉúÊý¾ÝÂÒÐòʱ£©¸üнá¹ûKTable¡£
ÕâÀï¾ÙÀý˵Ã÷¡£¼ÙÉè¶ÔKStreamÒÔ5ÃëΪ´°¿Ú´óС£¬½øÐÐTumbling Time WindowÉϵÄCount²Ù×÷¡£²¢ÇÒKStreamÏȺó³öÏÖʱ¼äΪ1Ãë,
3Ãë, 5ÃëµÄÊý¾Ý£¬´Ëʱ5ÃëµÄ´°¿ÚÒÑ´ïÉÏÏÞ£¬Kafka Stream¹Ø±Õ¸Ã´°¿Ú£¬´¥·¢Count²Ù×÷²¢½«½á¹û3Êä³öµ½KTableÖУ¨¼ÙÉè¸Ã½á¹û±íʾΪ<1-5,3>£©¡£Èô1Ãëºó£¬ÓÖÊÕµ½ÁËʱ¼äΪ2ÃëµÄ¼Ç¼£¬ÓÉÓÚ1-5ÃëµÄ´°¿ÚÒѹرգ¬ÈôÖ±½ÓÅׯú¸ÃÊý¾Ý£¬Ôò¿ÉÈÏΪ֮ǰµÄ½á¹û<1-5,3>²»×¼È·¡£¶øÈç¹ûÖ±½Ó½«ÍêÕûµÄ½á¹û<1-5,4>Êä³öµ½KStreamÖУ¬ÔòKStreamÖн«»á°üº¬¸Ã´°¿ÚµÄ2Ìõ¼Ç¼£¬<1-5,3>,
<1-5,4>£¬Ò²»á´æÔÚ°¹Êý¾Ý¡£Òò´ËKafka StreamÑ¡Ôñ½«¾ÛºÏ½á¹û´æÓÚKTableÖУ¬´ËʱеĽá¹û<1-5,4>»áÌæ´ú¾ÉµÄ½á¹û<1-5,3>¡£Óû§¿ÉµÃµ½ÍêÕûµÄÕýÈ·µÄ½á¹û¡£
ÕâÖÖ·½Ê½±£Ö¤ÁËÊý¾Ý׼ȷÐÔ£¬Í¬Ê±Ò²Ìá¸ßÁËÈÝ´íÐÔ¡£
µ«ÐèҪ˵Ã÷µÄÊÇ£¬Kafka Stream²¢²»»á¶ÔËùÓÐÍíµ½µÄÊý¾Ý¶¼ÖØÐ¼ÆËã²¢¸üнá¹û¼¯£¬¶øÊÇÈÃÓû§ÉèÖÃÒ»¸öretention
period£¬½«Ã¿¸ö´°¿ÚµÄ½á¹û¼¯ÔÚÄÚ´æÖб£ÁôÒ»¶¨Ê±¼ä£¬¸Ã´°¿ÚÄÚµÄÊý¾ÝÍíµ½Ê±£¬Ö±½ÓºÏ²¢¼ÆË㣬²¢¸üнá¹ûKTable¡£³¬¹ýretention
periodºó£¬¸Ã´°¿Ú½á¹û½«´ÓÄÚ´æÖÐɾ³ý£¬²¢ÇÒÍíµ½µÄÊý¾Ý¼´Ê¹ÂäÈë´°¿Ú£¬Ò²»á±»Ö±½Ó¶ªÆú¡£
ÈÝ´í
Kafka Stream´ÓÈçϼ¸¸ö·½Ãæ½øÐÐÈÝ´í
¸ß¿ÉÓõÄPartition±£Ö¤ÎÞÊý¾Ý¶ªÊ§¡£Ã¿¸öTask¼ÆËãÒ»¸öPartition£¬¶øKafkaÊý¾Ý¸´ÖÆ»úÖÆ±£Ö¤ÁËPartitionÄÚÊý¾ÝµÄ¸ß¿ÉÓÃÐÔ£¬¹ÊÎÞÊý¾Ý¶ªÊ§·çÏÕ¡£Í¬Ê±ÓÉÓÚÊý¾ÝÊdz־û¯µÄ£¬¼´Ê¹ÈÎÎñʧ°Ü£¬ÒÀÈ»¿ÉÒÔÖØÐ¼ÆËã¡£
״̬´æ´¢ÊµÏÖ¿ìËÙ¹ÊÕϻָ´ºÍ´Ó¹ÊÕϵã¼ÌÐø´¦Àí¡£¶ÔÓÚJoinºÍ¾ÛºÏ¼°´°¿ÚµÈÓÐ״̬¼ÆË㣬״̬´æ´¢¿É±£´æÖмä״̬¡£¼´Ê¹·¢ÉúFailover»òConsumer
Rebalance£¬ÈÔÈ»¿ÉÒÔͨ¹ý״̬´æ´¢»Ö¸´Öмä״̬£¬´Ó¶ø¿ÉÒÔ¼ÌÐø´ÓFailover»òConsumer
RebalanceǰµÄµã¼ÌÐø¼ÆËã¡£
KTableÓëretention periodÌṩÁ˶ÔÂÒÐòÊý¾ÝµÄ´¦ÀíÄÜÁ¦¡£
Kafka StreamÓ¦ÓÃʾÀý
ÏÂÃæ½áºÏÒ»¸ö°¸ÀýÀ´½²½âÈçºÎ¿ª·¢Kafka StreamÓ¦Óᣱ¾ÀýÍêÕû´úÂë¿É´Ó×÷ÕßGithub»ñÈ¡¡£https://github.com/habren/KafkaExample
¶©µ¥KStream£¨ÃûΪorderStream£©£¬µ×²ãTopicµÄPartitionÊýΪ3£¬KeyΪÓû§Ãû£¬Value°üº¬Óû§Ãû£¬ÉÌÆ·Ãû£¬¶©µ¥Ê±¼ä£¬ÊýÁ¿¡£Óû§KTable£¨ÃûΪuserTable£©£¬µ×²ãTopicµÄPartitionÊýΪ3£¬KeyΪÓû§Ãû£¬Value°üº¬ÐԱ𣬵ØÖ·ºÍÄêÁä¡£ÉÌÆ·KTable£¨ÃûΪitemTable£©£¬µ×²ãTopicµÄPartitionÊýΪ6£¬KeyΪÉÌÆ·Ãû£¬¼Û¸ñ£¬ÖÖÀàºÍ²úµØ¡£ÏÖÔÚÏ£Íû¼ÆËãÿСʱ¹ºÂò²úµØÓë×Ô¼ºËùÔÚµØÏàͬµÄÓû§×ÜÊý¡£
Ê×ÏÈÓÉÓÚÏ£ÍûʹÓö©µ¥Ê±¼ä£¬¶øËü°üº¬ÔÚorderStreamµÄValueÖУ¬ÐèҪͨ¹ýÌṩһ¸öʵÏÖTimestampExtractor½Ó¿ÚµÄÀà´ÓorderStream¶ÔÓ¦µÄTopicÖгéÈ¡³ö¶©µ¥Ê±¼ä¡£

½Ó×Åͨ¹ý½«orderStreamÓëuserTable½øÐÐJoin£¬À´»ñÈ¡¶©µ¥Óû§ËùÔڵء£ÓÉÓÚ¶þÕß¶ÔÓ¦µÄTopicµÄPartitionÊýÏàͬ£¬ÇÒKey¶¼ÎªÓû§Ãû£¬ÔÙ¼ÙÉèProducerÍùÕâÁ½¸öTopicдÊý¾ÝʱËùÓõÄPartitionerʵÏÖÏàͬ£¬Ôò´ËʱÉÏÎÄËùÊöJoinÌõ¼þÂú×㣬¿ÉÖ±½Ó½øÐÐJoin¡£

´ÓÉÏÊö´úÂëÖУ¬¿ÉÒÔ¿´µ½£¬JoinʱÐèÒªÖ¸¶¨ÈçºÎ´Ó²ÎÓëJoinË«·½µÄ¼Ç¼Éú³É½á¹û¼Ç¼µÄValue¡£Key²»ÐèÒªÖ¸¶¨£¬ÒòΪ½á¹û¼Ç¼µÄKeyÓëJoin
KeyÏàͬ£¬¹ÊÎÞÐëÖ¸¶¨¡£Join½á¹û´æÓÚÃûΪorderUserStreamµÄKStreamÖС£
½ÓÏÂÀ´ÐèÒª½«orderUserStreamÓëitemTable½øÐÐJoin£¬´Ó¶ø»ñÈ¡ÉÌÆ·²úµØ¡£´ËʱorderUserStreamµÄKeyÈÔΪÓû§Ãû£¬¶øitemTable¶ÔÓ¦µÄTopicµÄKeyΪ²úÆ·Ãû£¬²¢ÇÒ¶þÕßµÄPartitionÊý²»Ò»Ñù£¬Òò´ËÎÞ·¨Ö±½ÓJoin¡£´ËʱÐèҪͨ¹ýthrough·½·¨£¬¶ÔÆäÖÐÒ»·½»òË«·½½øÐÐÖØÐ·ÖÇø£¬Ê¹µÃ¶þÕßÂú×ãJoinÌõ¼þ¡£ÕâÒ»¹ý³ÌÏ൱ÓÚSparkµÄShuffle¹ý³ÌºÍStormµÄFieldGrouping¡£

´ÓÉÏÊö´úÂë¿É¼û£¬throughʱÐèÒªÖ¸¶¨KeyµÄÐòÁл¯Æ÷£¬ValueµÄÐòÁл¯Æ÷£¬ÒÔ¼°·ÖÇø·½Ê½ºÍ½á¹û¼¯ËùÔÚµÄTopic¡£ÕâÀïҪעÒ⣬¸ÃTopic£¨orderuser-repartition-by-item£©µÄPartitionÊý±ØÐëÓëitemTable¶ÔÓ¦TopicµÄPartitionÊýÏàͬ£¬²¢ÇÒthroughʹÓõķÖÇø·½·¨±ØÐëÓëiteamTable¶ÔÓ¦TopicµÄ·ÖÇø·½Ê½Ò»Ñù¡£¾¹ýÕâÖÖthrough²Ù×÷£¬orderUserStreamÓëitemTableÂú×ãÁËJoinÌõ¼þ£¬¿ÉÖ±½Ó½øÐÐJoin¡£
×ܽá
Kafka StreamµÄ²¢ÐÐÄ£ÐÍÍêÈ«»ùÓÚKafkaµÄ·ÖÇø»úÖÆºÍRebalance»úÖÆ£¬ÊµÏÖÁËÔÚÏß¶¯Ì¬µ÷Õû²¢ÐжÈ
ͬһTask°üº¬ÁËÒ»¸ö×ÓTopologyµÄËùÓÐProcessor£¬Ê¹µÃËùÓд¦ÀíÂß¼¶¼ÔÚͬһÏß³ÌÄÚÍê³É£¬±ÜÃâÁ˲»±ØµÄÍøÂçͨÐÅ¿ªÏú£¬´Ó¶øÌá¸ßÁËЧÂÊ¡£
through·½·¨ÌṩÁËÀàËÆSparkµÄShuffle»úÖÆ£¬ÎªÊ¹Óò»Í¬·ÖÇø²ßÂÔµÄÊý¾ÝÌṩÁËJoinµÄ¿ÉÄÜ
log compactÌá¸ßÁË»ùÓÚKafkaµÄstate storeµÄ¼ÓÔØÐ§ÂÊ
state storeΪ״̬¼ÆËãÌṩÁË¿ÉÄÜ
»ùÓÚoffsetµÄ¼ÆËã½ø¶È¹ÜÀíÒÔ¼°»ùÓÚstate storeµÄÖмä״̬¹ÜÀíΪ·¢ÉúConsumer rebalance»òFailoverʱ´Ó¶Ïµã´¦¼ÌÐø´¦ÀíÌṩÁË¿ÉÄÜ£¬²¢ÎªÏµÍ³ÈÝ´íÐÔÌṩÁ˱£ÕÏ
KTableµÄÒýÈ룬ʹµÃ¾ÛºÏ¼ÆËãÓµÓÃÁË´¦ÀíÂÒÐòÎÊÌâµÄÄÜÁ¦ |