±à¼ÍƼö: |
±¾ÎÄÀ´×ÔÓÚcsdn£¬ÎÄÖнéÉÜÁËConcepts£¬Kafka
ºËÐÄÔÀí£¬Use Cases £¬Development with Kafka £¬Administration
£¬Cluster Planning £¬Compare µÈ¡£ |
|
Ò». Concepts
Kafka is used for building real-time data pipelines
and streaming apps
·Ö²¼Ê½ÏûÏ¢´«µÝ
ÍøÕ¾»îÔ¾Êý¾Ý¸ú×Ù
ÈÕÖ¾¾ÛºÏ
Á÷ʽÊý¾Ý´¦Àí
Êý¾Ý´æ´¢
ʼþÔ´
¡¡

Kafka terminology ÊõÓï
1.Topics
Kafka maintains feeds of messages in categories called
topics.
ÏûÏ¢¶¼¹éÊôÓÚÒ»¸öÀà±ð³ÉΪtopic,ÔÚÎïÀíÉϲ»Í¬TopicµÄÏûÏ¢·Ö¿ª´æ´¢,Âß¼ÉÏÒ»¸öTopicµÄÏûÏ¢¶ÔʹÓÃÕß͸Ã÷

2.Partitions
Topics are broken up into ordered commit logs called
partitions
ÿ¸öTopics»®·ÖΪһ¸ö»òÕß¶à¸öPartition,²¢ÇÒPartitionÖеÄÿÌõÏûÏ¢¶¼±»±ê¼ÇÁËÒ»¸ösequential
id ,Ò²¾ÍÊÇoffset,²¢ÇÒ´æ´¢µÄÊý¾ÝÊÇ¿ÉÅäÖô洢ʱ¼äµÄ

3.Message Ordering
ÏûÏ¢Ö»±£Ö¤ÔÚͬһ¸öPartitionÖÐÓÐÐò,ËùÒÔ,Èç¹ûÒª±£Ö¤´ÓTopicÖÐÄõ½µÄÊý¾ÝÓÐÐò,ÔòÐèÒª×öµ½:
Group messages in a partition by key(producer)
Configure exactly one consumer instance per partition
within a consumer group
kafkaÄܱ£Ö¤µÄÊÇ:
Message sent by a producer to a particular topic
partition will be appended in the order they are sent
A consumer instance sees messages in the order they
are stored in the log
For a topic with replication factor N, kafka can tolerate
up to N-1 server failures without ¡°losing¡± any messages
committed to the log
4.Log
Partition¶ÔÓ¦Âß¼ÉϵÄLog
5.Replication ¸±±¾
Topics can (and should) be replicated
The unit of replication is the partition
Each partition in a topic has 1 leader and 0 or more
replicas
A replica is deemed to be ¡°in-sync¡± if
The replica can communicate with Zookeeper
The replica is not ¡°too far¡± behind the leader(configurable)
The group of in-sync replicas for a partition is
called the ISR(In-Sync-Replicas)
The Replication factor cannot be lowered
6.kafka durability ¿É¿¿ÐÔ
Durability can be configured with the producer configuration
request.required.acks
0 : The producer never waits for an ack
1 : The producer gets an ack after the leader replica
has received the data
-1 : The producer gets an ack after all ISRs receive
the data
Minimum available ISR can also be configured such
that an error is returned if enough replicas are not
available to replicate data
ËùÒÔ,kafka¿ÉÒÔÑ¡Ôñ²»Í¬µÄdurabilityÀ´»»È¡²»Í¬µÄÍÌÍÂÁ¿

ͨÓÃ,kafka¿ÉÒÔͨ¹ýÔö¼Ó¸ü¶àµÄBrokerÀ´ÌáÉýÍÌÍÂÁ¿
Ò»¸öÍÆ¼öµÄÅäÖÃ:

7.Broker
Kafka is run as a cluster comparised of one or more
servers each of which is called broker

8.Producer
Processes that publish messages to a kafka topic
are called producers
Producers publish to a topic of their choosing(push)
Êý¾ÝÔØÈëkafka¿ÉÒÔÊÇ·Ö²¼Ê½µÄ,ͨ³£ÊÇͨ¹ý¡±Round-Robin¡±Ëã·¨²ßÂÔ,Ò²¿ÉÒÔ¸ù¾ÝmessageÖеÄkeyÀ´½øÐÐÓïÒå·Ö¸î¡±semantic
partitioning¡±À´·Ö²¼Ê½ÔØÈë,Brokers ͨ¹ý·ÖÇøÀ´¾ùºâÔØÈë
kafkaÖ§³ÖÒì²½·¢ËÍasync,Òì²½·¢ËÍÏûÏ¢ÊÇless durableµÄ,µ«ÊÇÊǸßÍÌ͵Ä
ProducerµÄÔØÈëÆ½ºâºÍISRs

9.Consumer
Processes that subscribe(¶©ÔÄ) to tpics and process
the feed of published messages are called consumers
Multiple Consumer can read from the same topic
Each Consumer is responsible for managing it¡¯s own
offset
Message stay on kafka¡ they are not removed after
they consumed

Consumer¿ÉÒÔ´ÓÈÎÒ»µØ·½¿ªÊ¼Ïû·Ñ,È»ºóÓֻص½×î´óÆ«ÒÆÁ¿´¦,ConsumersÓÖ¿ÉÒÔ±»»®·ÖΪConsumer
Group
10.Consumer Group
Consumer GroupÊÇÏÔʽ·Ö²¼Ê½,¶à¸öConsumer¹¹³É×é½á¹¹,MessageÖ»ÄÜ´«Ê䏸ij¸öGroupÖеÄijһ¸öConsumer
³£ÓõÄConsumer Groupģʽ:
All consumer instances in one group
Acts like a traditional queue with load balancing
All consumer instances in different groups
All messages are broadcast to all consumer instances
¡°Logical Subscriber¡± - Many consumer instances in
a group
Consumers are added for scalability and fault tolerance,Each
consumer instance reads from one or more partitions
for a topic ,There cannot be more consumer instances
than partitions

Consumer Groups ÌṩÁËtopicsºÍpartitionsµÄ¸ôÀë

²¢ÇÒµ±Ä³¸öConsumer¹ÒµôºóÄܹ»ÖØÐÂÆ½ºâ
Consumer GroupµÄÓ¦Óó¡¾°
µã¶Ôµã
½«ËùÓÐÏû·ÑÕ߷ŵ½Ò»¸öConsumer Group
¹ã²¥
½«Ã¿¸öÏû·ÑÕßµ¥¶À·Åµ½Ò»¸öConsumer Group
ˮƽÀ©Õ¹
ÏòConsumer GroupÖÐÌí¼ÓÏû·ÑÕß²¢½øÐÐRebalance
¹ÊÕÏ×ªÒÆ
µ±Ä³¸öConsumer·¢Éú¹ÊÕÏʱ,Consumer GroupÖØÐ·ÖÅä·ÖÇø
¶þ. Kafka ºËÐÄÔÀí
KafkaÉè¼ÆË¼Ïë
¿É³Ö¾Ã»¯Message
³Ö¾Ã»¯±¾µØÎļþϵͳ,ÉèÖÃÓÐЧÆÚ
Ö§³Ö¸ßÁ÷Á¿´¦Àí
ÃæÏòÌØ¶¨µÄʹÓó¡¾°¶ø²»ÊÇͨÓù¦ÄÜ
Ïû·Ñ״̬±£´æÔÚÏû·Ñ¶Ë¶ø²»ÊÇ·þÎñ¶Ë
¼õÇá·þÎñÆ÷¸ºµ£ºÍ½»»¥
Ö§³Ö·Ö²¼Ê½
Éú²úÕß/Ïû·ÑÕß͸Ã÷Òì²½
ÒÀÀµ´ÅÅÌÎļþϵͳ×öÏûÏ¢»º´æ
²»ÏûºÄÄÚ´æ
¸ßЧµÄ´ÅÅÌ´æÈ¡
¸´ÔÓ¶ÈΪO(1)
Ç¿µ÷¼õÉÙÊý¾ÝµÄÐòÁл¯ºÍ¿½±´¿ªÏú
ÅúÁ¿´æ´¢ºÍ·¢ËÍ¡¢zero-copy
Ö§³ÖÊý¾Ý²¢ÐмÓÔØµ½Hadoop
¼¯³ÉHadoop
KafkaÔÀí·ÖÎö
1.´æ´¢
Partition
topicÎïÀíÉϵķÖ×飬һ¸ötopic¿ÉÒÔ·ÖΪ¶à¸öpartition£¬Ã¿¸öpartitionÊÇÒ»¸öÓÐÐòµÄ¶ÓÁС£
ÔÚKafkaÎļþ´æ´¢ÖУ¬Í¬Ò»¸ötopicÏÂÓжà¸ö²»Í¬partition£¬Ã¿¸öpartitionΪһ¸öĿ¼£¬partitonÃüÃû¹æÔòΪtopicÃû³Æ+ÓÐÐòÐòºÅ£¬µÚÒ»¸öpartitonÐòºÅ´Ó0¿ªÊ¼£¬ÐòºÅ×î´óֵΪpartitionsÊýÁ¿¼õ1

ÿ¸öpartion(Ŀ¼)Ï൱ÓÚÒ»¸ö¾ÞÐÍÎļþ±»Æ½¾ù·ÖÅäµ½¶à¸ö´óСÏàµÈsegment(¶Î)Êý¾ÝÎļþÖС£µ«Ã¿¸ö¶Îsegment
fileÏûÏ¢ÊýÁ¿²»Ò»¶¨ÏàµÈ£¬ÕâÖÖÌØÐÔ·½±ãold segment file¿ìËÙ±»É¾³ý¡£
ÿ¸öpartitonÖ»ÐèÒªÖ§³Ö˳Ðò¶Áд¾ÍÐÐÁË£¬segmentÎļþÉúÃüÖÜÆÚÓÉ·þÎñ¶ËÅäÖòÎÊý¾ö¶¨¡£
ÕâÑù×öµÄºÃ´¦¾ÍÊÇÄÜ¿ìËÙɾ³ýÎÞÓÃÎļþ£¬ÓÐЧÌá¸ß´ÅÅÌÀûÓÃÂÊ¡£
segment file
segment file×é³É£ºÓÉ2´ó²¿·Ö×é³É£¬·Ö±ðΪindex fileºÍdata file£¬´Ë2¸öÎļþÒ»Ò»¶ÔÓ¦£¬³É¶Ô³öÏÖ£¬ºó׺¡±.index¡±ºÍ¡°.log¡±·Ö±ð±íʾΪsegmentË÷ÒýÎļþ¡¢Êý¾ÝÎļþ.
segmentÎļþÃüÃû¹æÔò£ºpartionÈ«¾ÖµÄµÚÒ»¸ösegment´Ó0¿ªÊ¼£¬ºóÐøÃ¿¸ösegmentÎļþÃûΪÉÏÒ»¸ösegmentÎļþ×îºóÒ»ÌõÏûÏ¢µÄoffsetÖµ¡£ÊýÖµ×î´óΪ64λlong´óС£¬19λÊý×Ö×Ö·û³¤¶È£¬Ã»ÓÐÊý×ÖÓÃ0Ìî³ä¡£

ÆäÖÐ.indexË÷ÒýÎļþ´æ´¢´óÁ¿ÔªÊý¾Ý£¬.logÊý¾ÝÎļþ´æ´¢´óÁ¿ÏûÏ¢£¬Ë÷ÒýÎļþÖÐÔªÊý¾ÝÖ¸Ïò¶ÔÓ¦Êý¾ÝÎļþÖÐmessageµÄÎïÀíÆ«ÒÆµØÖ·¡£ËûÃÇÁ½¸öÊÇÒ»Ò»¶ÔÓ¦µÄ,¶ÔÓ¦¹ØÏµÈçÏÂ

Message
segment data fileÓÉÐí¶àmessage×é³É£¬messageÎïÀí½á¹¹ÈçÏÂ

²ÎÊý˵Ã÷:

2. Consumer
High Level Consumer
Ïû·ÑÕß±£´æÏû·Ñ״̬£º½«´Óij¸öPartition¶ÁÈ¡µÄ×îºóÒ»ÌõÏûÏ¢µÄoffset´æÓÚZooKeeperÖÐ
Low Level Consumer£º¸üºÃµÄ¿ØÖÆÊý¾ÝµÄÏû·Ñ
ͬһÌõÏûÏ¢¶Á¶à´Î
Ö»¶Áȡij¸öTopicµÄ²¿·ÖPartition
¹ÜÀíÊÂÎñ£¬´Ó¶øÈ·±£Ã¿ÌõÏûÏ¢±»´¦ÀíÒ»´Î£¬ÇÒ½ö±»´¦ÀíÒ»´Î
´óÁ¿¶îÍ⹤×÷
±ØÐëÔÚÓ¦ÓóÌÐòÖиú×Ùoffset£¬´Ó¶øÈ·¶¨ÏÂÒ»ÌõÓ¦¸ÃÏû·ÑÄÄÌõÏûÏ¢
Ó¦ÓóÌÐòÐèҪͨ¹ý³ÌÐò»ñ֪ÿ¸öPartitionµÄLeaderÊÇË
±ØÐë´¦ÀíLeaderµÄ±ä»¯
3.ÏûÏ¢´«µÝÓïÒåDelivery Semantics
At least once
kafkaµÄĬÈÏÉèÖÃ
Messages are never lost but maybe redelivered
At most once
Messages are lost but never redelivered
Exactly once
±È½ÏÄÑʵÏÖ
Messages are delivered once and only once
ʵÏÖExactly OnceÐèÒª¿¼ÂÇ:
Must consider two components
Durability guarantees when publishing a message
Durability guarantees when consuming a message
Producer
What happens when a produce request was sent but a
network error returned before an ack ?
RE:Use a single writer per partition and check the
latest committed value after network errors
Consumer
include a unique ID(e.g.UUID) and de-duplicate
Consider storing offsets with data
½âÊÍ:
ÏûÏ¢´«µÝÓïÒå: Producer ½Ç¶È
µ±ProducerÏòbroker·¢ËÍÏûϢʱ£¬Ò»µ©ÕâÌõÏûÏ¢±»commit£¬ÒòΪreplicationµÄ´æÔÚ£¬Ëü¾Í²»»á¶ª,µ«ÊÇÈç¹ûProducer·¢ËÍÊý¾Ý¸øbrokerºó£¬Óöµ½ÍøÂçÎÊÌâ¶øÔì³ÉͨÐÅÖжϣ¬ÄÇProducer¾ÍÎÞ·¨ÅжϸÃÌõÏûÏ¢ÊÇ·ñÒѾcommit
ÀíÏëµÄ½â¾ö·½°¸£ºProducer¿ÉÒÔÉú³ÉÒ»ÖÖÀàËÆÓÚÖ÷¼üµÄ¶«Î÷£¬·¢Éú¹ÊÕÏʱÃݵÈÐÔµÄÖØÊÔ¶à´Î£¬ÕâÑù¾Í×öµ½ÁËExactly
once,ĿǰĬÈÏÇé¿öÏÂÒ»ÌõÏûÏ¢´ÓProducerµ½brokerÊÇÈ·±£ÁËAt least once
ÏûÏ¢´«µÝÓïÒå: Consumer : High Level API
ConsumerÔÚ´Óbroker¶ÁÈ¡ÏûÏ¢ºó£¬¿ÉÒÔÑ¡Ôñcommit£¬¸Ã²Ù×÷»áÔÚZookeeperÖб£´æ¸ÃConsumerÔÚ¸ÃPartitionÖжÁÈ¡µÄÏûÏ¢µÄoffset
¸ÃConsumerÏÂÒ»´ÎÔÙ¶Á¸ÃPartitionʱ»á´ÓÏÂÒ»Ìõ¿ªÊ¼¶ÁÈ¡£»Èçδcommit£¬ÏÂÒ»´Î¶ÁÈ¡µÄ¿ªÊ¼Î»Öûá¸úÉÏÒ»´ÎcommitÖ®ºóµÄ¿ªÊ¼Î»ÖÃÏàͬ
ÏÖʵµÄÎÊÌ⣺µ½µ×ÊÇÏÈ´¦ÀíÏûÏ¢ÔÙcommit£¬»¹ÊÇÏÈcommitÔÙ´¦ÀíÏûÏ¢£¿
ÏÈ´¦ÀíÏûÏ¢ÔÙcommit
Èç¹ûÔÚ´¦ÀíÍêÏûÏ¢ÔÙ½øÐÐcommit֮ǰConsumer·¢Éúå´»ú£¬Ï´ÎÖØÐ¿ªÊ¼¹¤×÷ʱ»¹»á´¦Àí¸Õ¸ÕδcommitµÄÏûÏ¢£¬Êµ¼ÊÉϸÃÏûÏ¢ÒѾ±»´¦Àí¹ýÁË¡£Õâ¾Í¶ÔÓ¦ÓÚAt
least once
ÒµÎñ³¡¾°Ê¹ÓÃÃݵÈÐÔ£ºÏûÏ¢¶¼ÓÐÒ»¸öÖ÷¼ü£¬ËùÒÔÏûÏ¢µÄ´¦ÀíÍùÍù¾ßÓÐÃݵÈÐÔ£¬¼´¶à´Î´¦ÀíÕâÒ»ÌõÏûÏ¢¸úÖ»´¦ÀíÒ»´ÎÊǵÈЧµÄ£¬ÄǾͿÉÒÔÈÏΪÊÇExactly
once¡£
ÏÈcommitÔÙ´¦ÀíÏûÏ¢
Èç¹ûConsumerÔÚcommitºó»¹Ã»À´µÃ¼°´¦ÀíÏûÏ¢¾Íå´»úÁË£¬Ï´ÎÖØÐ¿ªÊ¼¹¤×÷ºó¾ÍÎÞ·¨¶Áµ½¸Õ¸ÕÒÑÌá½»¶øÎ´´¦ÀíµÄÏûÏ¢£¬Õâ¾Í¶ÔÓ¦ÓÚAt
most once
ÏûÏ¢´«µÝÓïÒå: Consumer : Lower Level API
Exactly onceµÄʵÏÖ˼Ï룺е÷offsetºÍÏûÏ¢Êý¾Ý
¾µä×ö·¨ÊÇÒýÈëÁ½½×¶ÎÌá½»
offsetºÍÏûÏ¢Êý¾Ý·ÅÔÚͬһ¸öµØ·½£ºConsumerÄõ½Êý¾Ýºó¿ÉÄܰÑÊý¾Ý·Åµ½¹²Ïí¿Õ¼äÖУ¬Èç¹û°Ñ×îеÄoffsetºÍÊý¾Ý±¾ÉíÒ»Æðдµ½¹²Ïí¿Õ¼ä£¬ÄǾͿÉÒÔ±£Ö¤Êý¾ÝµÄÊä³öºÍoffsetµÄ¸üÐÂҪô¶¼Íê³É£¬ÒªÃ´¶¼²»Íê³É£¬¼ä½ÓʵÏÖExactly
once
High level API¶øÑÔ£¬offsetÊÇ´æÓÚZookeeperÖеģ¬ÎÞ·¨»ñÈ¡£¬¶øLow level
APIµÄoffsetÊÇÓÉ×Ô¼ºÈ¥Î¬»¤µÄ£¬¿ÉÒÔʵÏÖÒÔÉÏ·½°¸
4.¸ß¿ÉÓÃÐÔ
ͬһ¸öPartition¿ÉÄÜ»áÓжà¸öReplica£¬ÐèÒª±£Ö¤Í¬Ò»¸öPartitionµÄ¶à¸öReplicaÖ®¼äµÄÊý¾ÝÒ»ÖÂÐÔ
¶øÕâʱÐèÒªÔÚÕâЩReplicationÖ®¼äÑ¡³öÒ»¸öLeader£¬ProducerºÍConsumerÖ»ÓëÕâ¸öLeader½»»¥£¬ÆäËüReplica×÷ΪFollower´ÓLeaderÖи´ÖÆÊý¾Ý
¸±±¾Óë¸ß¿ÉÓÃÐÔ£º¸±±¾Leader ElectionËã·¨
ZookeeperÖеÄÑ¡¾ÙËã·¨»Ø¹Ë
ÉÙÊý·þ´Ó¶àÊý£ºÈ·±£¼¯ÈºÖÐÒ»°ëÒÔÉϵĻúÆ÷µÃµ½Í¬²½
ÊʺϹ²Ïí¼¯ÈºÅäÖõÄϵͳÖУ¬¶ø²¢²»ÊʺÏÐèÒª´æ´¢´óÁ¿Êý¾ÝµÄϵͳ£¬ÒòΪÐèÒª´óÁ¿¸±±¾¼¯¡£f¸öReplicaʧ°ÜÇé¿öÏÂÐèÒª2f+1¸ö¸±±¾
KafkaµÄ×ö·¨
ISR(in-sync replicas)£¬Õâ¸öISRÀïµÄËùÓи±±¾¶¼¸úÉÏÁËLeader£¬Ö»ÓÐISRÀïµÄ³ÉÔ±²ÅÓб»Ñ¡ÎªLeaderµÄ¿ÉÄÜ
ÔÚÕâÖÖģʽÏ£¬¶ÔÓÚf+1¸ö¸±±¾£¬Ò»¸öPartitionÄÜÔÚ±£Ö¤²»¶ªÊ§ÒѾcommitµÄÏûÏ¢µÄǰÌáÏÂÈÝÈÌf¸ö¸±±¾µÄʧ°Ü
ISRÐèÒªµÄ×ܵĸ±±¾µÄ¸öÊý¼¸ºõÊÇ¡°ÉÙÊý·þ´Ó¶àÊý¡±µÄÒ»°ë
¸±±¾Óë¸ß¿ÉÓÃÐÔ£ºReplica·ÖÅäËã·¨
½«ËùÓÐReplica¾ùÔÈ·Ö²¼µ½Õû¸ö¼¯Èº
½«ËùÓÐn¸öBrokerºÍ´ý·ÖÅäµÄPartitionÅÅÐò
½«µÚi¸öPartition·ÖÅäµ½µÚ(i mod n)¸öBrokerÉÏ
½«µÚi¸öPartitionµÄµÚj¸öReplica·ÖÅäµ½µÚ((i + j) mode n)¸öBrokerÉÏ

5. Á㿽±´
Kafkaͨ¹ýMessage·Ö×éºÍSendfileϵͳµ÷ÓÃʵÏÖÁËzero-copy
´«Í³µÄsocket·¢ËÍÎļþ¿½±´

1.ÄÚºË̬
2.Óû§Ì¬
3.ÄÚºË̬
4.Íø¿¨»º´æ
¾ÀúÁËÄÚºË̬ºÍÓû§Ì¬µÄ¿½±´
sendfileϵͳµ÷ÓÃ

±ÜÃâÁËÄÚºË̬ÓëÓû§Ì¬µÄÉÏÏÂÎÄÇл»¶¯×÷

Èý. Use Cases
Real-Time Stream Processing(combined with Spark
Streaming)
General purpose Message Bus
Collecting User Activity Data
Collecting Operational Metrics from applications,servers
or devices
Log Aggregation
Change Data Capture
Commit Log for distributed systems

ËÄ. Development with Kafka

Îå. Administration
list && describe
echo "´ËKafka¼¯ÈºËùÓеÄTopic
: "
kafka-topics -- list -- zookeeper dc226. dooioo.cn:2181,
dc227.dooioo.cn:2181, dc229. dooioo.cn:2181 /kafka
echo " ÄúÒª²é¿´µÄTopicÏêϸ : "
kafka- topics - -describe -- zookeeper dc226.
dooioo.cn: 2181, dc227.dooioo.cn: 2181,dc229.
dooioo.cn:2181 /kafka --topic $ topicName |
create topic
kafka-topics
--create -- zookeeper dc226.dooioo.cn:2181,dc227.
dooioo.cn:2181,dc229. dooioo.cn:2181 /kafka --
replication- factor 1 --pa
rtitions 1 --topic $topicName |
open producer
kafka-console-
producer -- broker- list 10.22. 253.227:9092 --topic
$topicName |
open consumer
kafka-console-consumer
--zookeeper 10.22.253.226: 2181, 10.22. 253 .227:
2181,10.22. 253.229:2181 /kafka -- topic $topicName
-- from- beginning |
Áù. Cluster Planning
Æß. Compare



|