±à¼ÍƼö: |
±¾ÎÄÖ÷Òª½éÉÜÁËraftËã·¨¸ÅÀÀ¡¢leader
election¡¢log replication¡¢safety¡¢corner caseµÄµÈÏà¹ØÄÚÈÝ¡£
±¾ÎÄÀ´×ÔÓÚ²©¿ÍÔ°£¬ÓÉ»ðÁú¹ûÈí¼þAnna±à¼ÍƼö¡£ |
|
ÕýÎÄ
raftÊǹ¤³ÌÉÏʹÓýÏΪ¹ã·ºµÄǿһÖÂÐÔ¡¢È¥ÖÐÐÄ»¯¡¢¸ß¿ÉÓõķֲ¼Ê½ÐÒé¡£ÔÚÕâÀïÇ¿µ÷ÁËÊÇÔÚ¹¤³ÌÉÏ£¬ÒòΪÔÚѧÊõÀíÂ۽磬×îÒ«Ñ۵ϹÊÇ´óÃû¶¦¶¦µÄPaxos¡£µ«PaxosÊÇ£ºÉÙÊýÕæÕýÀí½âµÄÈ˾õµÃ¼òµ¥£¬ÉÐδÀí½âµÄÈ˾õµÃºÜÄÑ£¬´ó¶àÊýÈ˶¼ÊÇÒ»Öª°ë½â¡£±¾ÈËÒ²»¨Á˺ܶàʱ¼ä¡¢¿´Á˺ܶà²ÄÁÏҲûÓÐÕæÕýÀí½â¡£Ö±µ½¿´µ½raftµÄÂÛÎÄ£¬Á½Î»Ñо¿ÕßÒ²Ìáµ½£¬ËûÃÇÒ²»¨Á˺ܳ¤µÄʱ¼äÀ´Àí½âPaxos£¬ËûÃÇÒ²¾õµÃºÜÄÑÀí½â£¬ÓÚÊÇÑо¿³öÁËraftËã·¨¡£
raftÊÇÒ»¸ö¹²Ê¶Ëã·¨£¨consensus algorithm£©£¬Ëùν¹²Ê¶£¬¾ÍÊǶà¸ö½Úµã¶Ôij¸öÊÂÇé´ï³ÉÒ»ÖµĿ´·¨£¬¼´Ê¹ÊÇÔÚ²¿·Ö½Úµã¹ÊÕÏ¡¢ÍøÂçÑÓʱ¡¢ÍøÂç·Ö¸îµÄÇé¿öÏ¡£ÕâЩÄê×îΪ»ðÈȵļÓÃÜ»õ±Ò£¨±ÈÌØ±Ò¡¢Çø¿éÁ´£©¾ÍÐèÒª¹²Ê¶Ëã·¨£¬¶øÔÚ·Ö²¼Ê½ÏµÍ³ÖУ¬¹²Ê¶Ëã·¨¸ü¶àÓÃÓÚÌá¸ßϵͳµÄÈÝ´íÐÔ£¬±ÈÈç·Ö²¼Ê½´æ´¢Öеĸ´ÖƼ¯£¨replication£©£¬ÔÚ´ø×ÅÎÊÌâѧϰ·Ö²¼Ê½ÏµÍ³Ö®ÖÐÐÄ»¯¸´ÖƼ¯Ò»ÎÄÖнéÉÜÁËÖÐÐÄ»¯¸´ÖƼ¯µÄÏà¹ØÖªÊ¶¡£raftÐÒé¾ÍÊÇÒ»ÖÖleader-basedµÄ¹²Ê¶Ëã·¨£¬ÓëÖ®ÏàÓ¦µÄÊÇleaderlessµÄ¹²Ê¶Ëã·¨¡£
±¾ÎÄ»ùÓÚÂÛÎÄAIn
Search of an Understandable Consensus Algorithm¶ÔraftÐÒé½øÐзÖÎö£¬µ±È»£¬»¹Êǽ¨Òé¶ÁÕßÖ±½Ó¿´ÂÛÎÄ¡£
raftËã·¨¸ÅÀÀ RaftËã·¨µÄÍ·ºÅÄ¿±ê¾ÍÊÇÈÝÒ×Àí½â£¨UnderStandable£©£¬Õâ´ÓÂÛÎĵıêÌâ¾Í¿ÉÒÔ¿´³öÀ´¡£µ±È»£¬RaftÔöÇ¿ÁË¿ÉÀí½âÐÔ£¬ÔÚÐÔÄÜ¡¢¿É¿¿ÐÔ¡¢¿ÉÓÃÐÔ·½ÃæÊDz»ÊäÓÚPaxosµÄ¡£
Raft more understandable than Paxos and also provides
a better foundation for building practical systems
ΪÁË´ïµ½Ò×ÓÚÀí½âµÄÄ¿±ê£¬raft×öÁ˺ܶàŬÁ¦£¬ÆäÖÐ×îÖ÷ÒªÊÇÁ½¼þÊÂÇ飺
ÎÊÌâ·Ö½â
״̬¼ò»¯ ÎÊÌâ·Ö½âÊǽ«"¸´ÖƼ¯ÖнڵãÒ»ÖÂÐÔ"Õâ¸ö¸´ÔÓµÄÎÊÌâ»®·ÖΪÊý¸ö¿ÉÒÔ±»¶ÀÁ¢½âÊÍ¡¢Àí½â¡¢½â¾öµÄ×ÓÎÊÌâ¡£ÔÚraft£¬×ÓÎÊÌâ°üÀ¨£¬leader
election£¬ log replication£¬safety£¬membership changes¡£¶ø×´Ì¬¼ò»¯¸üºÃÀí½â£¬¾ÍÊǶÔËã·¨×ö³öһЩÏÞÖÆ£¬¼õÉÙÐèÒª¿¼ÂǵÄ״̬Êý£¬Ê¹µÃËã·¨¸ü¼ÓÇåÎú£¬¸üÉٵIJ»È·¶¨ÐÔ£¨±ÈÈ磬±£Ö¤ÐÂÑ¡¾Ù³öÀ´µÄleader»á°üº¬ËùÓÐcommited
log entry£©
Raft implements consensus by first electing a distinguished
leader, then giving the leader complete responsibility
for managing the replicated log. The leader accepts
log entries from clients, replicates them on other
servers, and tells servers when it is safe to apply
log entries to their state machines. A leader can
fail or become disconnected from the other servers,
in which case a new leader is elected.
ÉÏÃæµÄÒýÎĶÔraftÐÒéµÄ¹¤×÷ÔÀí½øÐÐÁ˸߶ȵĸÅÀ¨£ºraft»áÏÈÑ¡¾Ù³öleader£¬leaderÍêÈ«¸ºÔðreplicated
logµÄ¹ÜÀí¡£leader¸ºÔð½ÓÊÜËùÓпͻ§¶Ë¸üÐÂÇëÇó£¬È»ºó¸´ÖƵ½follower½Úµã£¬²¢ÔÚ¡°°²È«¡±µÄʱºòÖ´ÐÐÕâЩÇëÇó¡£Èç¹ûleader¹ÊÕÏ£¬followes»áÖØÐÂÑ¡¾Ù³öеÄleader¡£
Õâ¾ÍÉæ¼°µ½raft×îеÄÁ½¸ö×ÓÎÊÌ⣺ leader electionºÍlog replication
leader election raftÐÒéÖУ¬Ò»¸ö½ÚµãÈÎһʱ¿Ì´¦ÓÚÒÔÏÂÈý¸ö״̬֮һ£º
leader
follower
candidate ¸ø³ö×´Ì¬×ªÒÆÍ¼ÄܺÜÖ±¹ÛµÄÖ±µ½ÕâÈý¸ö״̬µÄÇø±ð

¿ÉÒÔ¿´³öËùÓнڵãÆô¶¯Ê±¶¼ÊÇfollower״̬£»ÔÚÒ»¶Îʱ¼äÄÚÈç¹ûûÓÐÊÕµ½À´×ÔleaderµÄÐÄÌø£¬´ÓfollowerÇл»µ½candidate£¬·¢ÆðÑ¡¾Ù£»Èç¹ûÊÕµ½majorityµÄÔì³ÉƱ£¨º¬×Ô¼ºµÄһƱ£©ÔòÇл»µ½leader״̬£»Èç¹û·¢ÏÖÆäËû½Úµã±È×Ô¼º¸üУ¬ÔòÖ÷¶¯Çл»µ½follower¡£
×ÜÖ®£¬ÏµÍ³ÖÐ×î¶àÖ»ÓÐÒ»¸öleader£¬Èç¹ûÔÚÒ»¶Îʱ¼äÀï·¢ÏÖûÓÐleader£¬Ôò´ó¼Òͨ¹ýÑ¡¾Ù-ͶƱѡ³öleader¡£leader»á²»Í£µÄ¸øfollower·¢ÐÄÌøÏûÏ¢£¬±íÃ÷×Ô¼ºµÄ´æ»î״̬¡£Èç¹ûleader¹ÊÕÏ£¬ÄÇôfollower»áת»»³Écandidate£¬ÖØÐÂÑ¡³öleader¡£
term ´ÓÉÏÃæ¿ÉÒÔ¿´³ö£¬Äĸö½Úµã×öleaderÊÇ´ó¼ÒͶƱѡ¾Ù³öÀ´µÄ£¬Ã¿¸öleader¹¤×÷Ò»¶Îʱ¼ä£¬È»ºóÑ¡³öеÄleader¼ÌÐø¸ºÔð¡£Õâ¸ùÃñÖ÷Éç»áµÄÑ¡¾ÙºÜÏñ£¬Ã¿Ò»½ìеÄÂÄÖ°ÆÚ³ÆÖ®ÎªÒ»½ìÈÎÆÚ£¬ÔÚraftÐÒéÖУ¬Ò²ÊÇÕâÑùµÄ£¬¶ÔÓ¦µÄÊõÓï½Ðterm¡£

term£¨ÈÎÆÚ£©ÒÔÑ¡¾Ù£¨election£©¿ªÊ¼£¬È»ºó¾ÍÊÇÒ»¶Î»ò³¤»ò¶ÌµÄÎȶ¨¹¤×÷ÆÚ£¨normal
Operation£©¡£´ÓÉÏͼ¿ÉÒÔ¿´µ½£¬ÈÎÆÚÊǵÝÔöµÄ£¬Õâ¾Í³äµ±ÁËÂ߼ʱÖÓµÄ×÷Óã»ÁíÍ⣬term 3չʾÁËÒ»ÖÖÇé¿ö£¬¾ÍÊÇ˵ûÓÐÑ¡¾Ù³öleader¾Í½áÊøÁË£¬È»ºó»á·¢ÆðеÄÑ¡¾Ù£¬ºóÃæ»á½âÊÍÕâÖÖsplit
voteµÄÇé¿ö¡£
Ñ¡¾Ù¹ý³ÌÏê½â ÉÏÃæÒѾ˵¹ý£¬Èç¹ûfollowerÔÚelection timeoutÄÚûÓÐÊÕµ½À´×ÔleaderµÄÐÄÌø£¬£¨Ò²Ðí´Ëʱ»¹Ã»ÓÐÑ¡³öleader£¬´ó¼Ò¶¼Ôڵȣ»Ò²Ðíleader¹ÒÁË£»Ò²ÐíÖ»ÊÇleaderÓë¸ÃfollowerÖ®¼äÍøÂç¹ÊÕÏ£©£¬Ôò»áÖ÷¶¯·¢ÆðÑ¡¾Ù¡£²½ÖèÈçÏ£º
Ôö¼Ó½Úµã±¾µØµÄ current term £¬Çл»µ½candidate״̬
Ͷ×Ô¼ºÒ»Æ±
²¢ÐиøÆäËû½Úµã·¢ËÍ RequestVote RPCs
µÈ´ýÆäËû½ÚµãµÄ»Ø¸´ ÔÚÕâ¸ö¹ý³ÌÖУ¬¸ù¾ÝÀ´×ÔÆäËû½ÚµãµÄÏûÏ¢£¬¿ÉÄܳöÏÖÈýÖÖ½á¹û
ÊÕµ½majorityµÄͶƱ£¨º¬×Ô¼ºµÄһƱ£©£¬ÔòÓ®µÃÑ¡¾Ù£¬³ÉΪleader
±»¸æÖª±ðÈËÒѵ±Ñ¡£¬ÄÇô×ÔÐÐÇл»µ½follower
Ò»¶Îʱ¼äÄÚûÓÐÊÕµ½majorityͶƱ£¬Ôò±£³Öcandidate״̬£¬ÖØÐ·¢³öÑ¡¾Ù µÚÒ»ÖÖÇé¿ö£¬Ó®µÃÁËÑ¡¾ÙÖ®ºó£¬ÐµÄleader»áÁ¢¿Ì¸øËùÓнڵ㷢ÏûÏ¢£¬¹ã¶ø¸æÖ®£¬±ÜÃâÆäÓà½Úµã´¥·¢ÐµÄÑ¡¾Ù¡£ÔÚÕâÀÏȻص½Í¶Æ±ÕßµÄÊӽǣ¬Í¶Æ±ÕßÈçºÎ¾ö¶¨ÊÇ·ñ¸øÒ»¸öÑ¡¾ÙÇëÇóÍ¶Æ±ÄØ£¬ÓÐÒÔÏÂÔ¼Êø£º
ÔÚÈÎÒ»ÈÎÆÚÄÚ£¬µ¥¸ö½Úµã×î¶àÖ»ÄÜͶһƱ
ºòÑ¡ÈËÖªµÀµÄÐÅÏ¢²»ÄܱÈ×Ô¼ºµÄÉÙ£¨ÕâÒ»²¿·Ö£¬ºóÃæ½éÉÜlog replicationºÍsafetyµÄʱºò»áÏêϸ½éÉÜ£©
first-come-first-served ÏÈÀ´Ïȵà µÚ¶þÖÖÇé¿ö£¬±ÈÈçÓÐÈý¸ö½ÚµãA B C¡£A Bͬʱ·¢ÆðÑ¡¾Ù£¬¶øAµÄÑ¡¾ÙÏûÏ¢Ïȵ½´ïC£¬C¸øAͶÁËһƱ£¬µ±BµÄÏûÏ¢µ½´ïCʱ£¬ÒѾ²»ÄÜÂú×ãÉÏÃæÌáµ½µÄµÚÒ»¸öÔ¼Êø£¬¼´C²»»á¸øBͶƱ£¬¶øAºÍBÏÔÈ»¶¼²»»á¸ø¶Ô·½Í¶Æ±¡£Aʤ³öÖ®ºó£¬»á¸øB,C·¢ÐÄÌøÏûÏ¢£¬½ÚµãB·¢ÏÖ½ÚµãAµÄterm²»µÍÓÚ×Ô¼ºµÄterm£¬ÖªµÀÓÐÒѾÓÐLeaderÁË£¬ÓÚÊÇת»»³Éfollower¡£
µÚÈýÖÖÇé¿ö£¬Ã»ÓÐÈκνڵã»ñµÃmajorityͶƱ£¬±ÈÈçÏÂͼÕâÖÖÇé¿ö£º

×ܹ²ÓÐËĸö½Úµã£¬Node C¡¢Node Dͬʱ³ÉΪÁËcandidate£¬½øÈëÁËterm 4£¬µ«Node
AͶÁËNodeDһƱ£¬NodeBͶÁËNode CһƱ£¬Õâ¾Í³öÏÖÁËÆ½Æ± split voteµÄÇé¿ö¡£Õâ¸öʱºò´ó¼Ò¶¼ÔڵȰ¡µÈ£¬Ö±µ½³¬Ê±ºóÖØÐ·¢ÆðÑ¡¾Ù¡£Èç¹û³öÏÖÆ½Æ±µÄÇé¿ö£¬ÄÇô¾ÍÑÓ³¤ÁËϵͳ²»¿ÉÓõÄʱ¼ä£¨Ã»ÓÐleaderÊDz»ÄÜ´¦Àí¿Í»§¶ËдÇëÇóµÄ£©£¬Òò´ËraftÒýÈëÁËrandomized
election timeoutsÀ´¾¡Á¿±ÜÃâÆ½Æ±Çé¿ö¡£Í¬Ê±£¬leader-based ¹²Ê¶Ëã·¨ÖУ¬½ÚµãµÄÊýÄ¿¶¼ÊÇÆæÊý¸ö£¬¾¡Á¿±£Ö¤majorityµÄ³öÏÖ¡£
log replication µ±ÓÐÁËleader£¬ÏµÍ³Ó¦¸Ã½øÈë¶ÔÍ⹤×÷ÆÚÁË¡£¿Í»§¶ËµÄÒ»ÇÐÇëÇóÀ´·¢Ë͵½leader£¬leaderÀ´µ÷¶ÈÕâЩ²¢·¢ÇëÇóµÄ˳Ðò£¬²¢ÇÒ±£Ö¤leaderÓëfollowers״̬µÄÒ»ÖÂÐÔ¡£raftÖеÄ×ö·¨ÊÇ£¬½«ÕâЩÇëÇóÒÔ¼°Ö´ÐÐ˳Ðò¸æÖªfollowers¡£leaderºÍfollowersÒÔÏàͬµÄ˳ÐòÀ´Ö´ÐÐÕâЩÇëÇ󣬱£Ö¤×´Ì¬Ò»Ö¡£
Replicated state machines ¹²Ê¶Ëã·¨µÄʵÏÖÒ»°ãÊÇ»ùÓÚ¸´ÖÆ×´Ì¬»ú£¨Replicated state machines£©£¬ºÎΪ¸´ÖÆ×´Ì¬»ú£º
If two identical, deterministic processes begin in
the same state and get the same inputs in the same
order, they will produce the same output and end in
the same state.
¼òµ¥À´Ëµ£ºÏàͬµÄ³õʶ״̬ + ÏàͬµÄÊäÈë = ÏàͬµÄ½áÊø×´Ì¬¡£ÒýÎÄÖÐÓÐÒ»¸öºÜÖØÒªµÄ´Êdeterministic£¬¾ÍÊÇ˵²»Í¬½ÚµãÒªÒÔÏàͬÇÒÈ·¶¨ÐԵĺ¯ÊýÀ´´¦ÀíÊäÈ룬¶ø²»ÒªÒýÈëһϲ»È·¶¨µÄÖµ£¬±ÈÈç±¾µØÊ±¼äµÈ¡£ÈçºÎ±£Ö¤ËùÓнڵã
get the same inputs in the same order£¬Ê¹ÓÃreplicated
logÊÇÒ»¸öºÜ²»´íµÄ×¢Ò⣬log¾ßÓг־û¯¡¢±£ÐòµÄÌØµã£¬ÊÇ´ó¶àÊý·Ö²¼Ê½ÏµÍ³µÄ»ùʯ¡£
Òò´Ë£¬¿ÉÒÔÕâô˵£¬ÔÚraftÖУ¬leader½«¿Í»§¶ËÇëÇó£¨command£©·â×°µ½Ò»¸ö¸ölog
entry£¬½«ÕâЩlog entries¸´ÖÆ£¨replicate£©µ½ËùÓÐfollower½Úµã£¬È»ºó´ó¼Ò°´Ïàͬ˳ÐòÓ¦Óã¨apply£©log
entryÖеÄcommand£¬Ôò״̬¿Ï¶¨ÊÇÒ»Öµġ£
ÏÂͼÐÎÏóչʾÁËÕâÖÖlog-based replicated state machine

ÇëÇóÍêÕûÁ÷³Ì
µ±ÏµÍ³£¨leader£©ÊÕµ½Ò»¸öÀ´×Ô¿Í»§¶ËµÄдÇëÇ󣬵½·µ»Ø¸ø¿Í»§¶Ë£¬Õû¸ö¹ý³Ì´ÓleaderµÄÊÓ½ÇÀ´¿´»á¾ÀúÒÔϲ½Ö裺
leader append log entry
leader issue AppendEntries RPC in parallel
leader wait for majority response
leader apply entry to state machine
leader reply to client
leader notify follower apply log ¿ÉÒÔ¿´µ½ÈÕÖ¾µÄÌá½»¹ý³ÌÓеãÀàËÆÁ½½×¶ÎÌá½»(2PC)£¬²»¹ýÓë2PCµÄÇø±ðÔÚÓÚ£¬leaderÖ»ÐèÒª´ó¶àÊý£¨majority£©½ÚµãµÄ»Ø¸´¼´¿É£¬ÕâÑùÖ»Òª³¬¹ýÒ»°ë½Úµã´¦ÓÚ¹¤×÷״̬Ôòϵͳ¾ÍÊÇ¿ÉÓõġ£
ÄÇôÈÕÖ¾ÔÚÿ¸ö½ÚµãÉÏÊÇʲôÑù×ÓµÄÄØ

²»ÄÑ¿´µ½£¬logsÓÉ˳Ðò±àºÅµÄlog entry×é³É £¬Ã¿¸ölog entry³ýÁ˰üº¬command£¬»¹°üº¬²úÉú¸Ãlog
entryʱµÄleader term¡£´ÓÉÏͼ¿ÉÒÔ¿´µ½£¬Îå¸ö½ÚµãµÄÈÕÖ¾²¢²»ÍêȫһÖ£¬raftË㷨ΪÁ˱£Ö¤¸ß¿ÉÓ㬲¢²»ÊÇǿһÖÂÐÔ£¬¶øÊÇ×îÖÕÒ»ÖÂÐÔ£¬leader»á²»¶Ï³¢ÊÔ¸øfollower·¢log
entries£¬Ö±µ½ËùÓнڵãµÄlog entries¶¼Ïàͬ¡£
ÔÚÉÏÃæµÄÁ÷³ÌÖУ¬leaderÖ»ÐèÒªÈÕÖ¾±»¸´ÖƵ½´ó¶àÊý½Úµã¼´¿ÉÏò¿Í»§¶Ë·µ»Ø£¬Ò»µ©Ïò¿Í»§¶Ë·µ»Ø³É¹¦ÏûÏ¢£¬ÄÇôϵͳ¾Í±ØÐë±£Ö¤log£¨ÆäʵÊÇlogËù°üº¬µÄcommand£©ÔÚÈκÎÒì³£µÄÇé¿ö϶¼²»»á·¢Éú»Ø¹ö¡£ÕâÀïÓÐÁ½¸ö´Ê£ºcommit£¨committed£©£¬apply(applied)£¬Ç°ÕßÊÇÖ¸ÈÕÖ¾±»¸´ÖƵ½ÁË´ó¶àÊý½ÚµãºóÈÕÖ¾µÄ״̬£»¶øºóÕßÔòÊǽڵ㽫ÈÕÖ¾Ó¦Óõ½×´Ì¬»ú£¬ÕæÕýÓ°Ïìµ½½Úµã״̬¡£
The leader decides when it is safe to apply a log
entry to the state machines; such an entry is called
committed. Raft guarantees that committed entries
are durable and will eventually be executed by all
of the available state machines. A log entry is committed
once the leader that created the entry has replicated
it on a majority of the servers
safety ÔÚÉÏÃæÌáµ½Ö»ÒªÈÕÖ¾±»¸´ÖƵ½majority½Úµã£¬¾ÍÄܱ£Ö¤²»»á±»»Ø¹ö£¬¼´Ê¹ÔÚ¸÷ÖÖÒì³£Çé¿öÏ£¬Õâ¸ùleader
electionÌáµ½µÄÑ¡¾ÙÔ¼ÊøÓйء£ÔÚÕâÒ»²¿·Ö£¬Ö÷ÒªÌÖÂÛraftÐÒéÔÚ¸÷ÖÖ¸÷ÑùµÄÒì³£Çé¿öÏÂÈçºÎ¹¤×÷µÄ¡£
ºâÁ¿Ò»¸ö·Ö²¼Ê½Ëã·¨£¬ÓÐÐí¶àÊôÐÔ£¬Èç
safety£ºnothing bad happens,
liveness£º something good eventually happens. ÔÚÈκÎϵͳģÐÍÏ£¬¶¼ÐèÒªÂú×ãsafetyÊôÐÔ£¬¼´ÔÚÈκÎÇé¿öÏ£¬ÏµÍ³¶¼²»ÄܳöÏÖ²»¿ÉÄæµÄ´íÎó£¬Ò²²»ÄÜÏò¿Í»§¶Ë·µ»Ø´íÎóµÄÄÚÈÝ¡£±ÈÈ磬raft±£Ö¤±»¸´ÖƵ½´ó¶àÊý½ÚµãµÄÈÕÖ¾²»»á±»»Ø¹ö£¬ÄÇô¾ÍÊÇsafetyÊôÐÔ¡£¶øraft×îÖÕ»áÈÃËùÓнڵã״̬һÖ£¬ÕâÊôÓÚlivenessÊôÐÔ¡£
raftÐÒé»á±£Ö¤ÒÔÏÂÊôÐÔ

Election safety Ñ¡¾Ù°²È«ÐÔ£¬¼´ÈÎÒ»ÈÎÆÚÄÚ×î¶àÒ»¸öleader±»Ñ¡³ö¡£ÕâÒ»µã·Ç³£ÖØÒª£¬ÔÚÒ»¸ö¸´ÖƼ¯ÖÐÈκÎʱ¿ÌÖ»ÄÜÓÐÒ»¸öleader¡£ÏµÍ³ÖÐͬʱÓжàÓàÒ»¸öleader£¬±»³ÆÖ®ÎªÄÔÁÑ£¨brain
split£©£¬ÕâÊǷdz£ÑÏÖØµÄÎÊÌ⣬»áµ¼ÖÂÊý¾ÝµÄ¸²¸Ç¶ªÊ§¡£ÔÚraftÖУ¬Á½µã±£Ö¤ÁËÕâ¸öÊôÐÔ£º
Ò»¸ö½ÚµãijһÈÎÆÚÄÚ×î¶àÖ»ÄÜͶһƱ£»
Ö»ÓлñµÃmajorityͶƱµÄ½Úµã²Å»á³ÉΪleader¡£ Òò´Ë£¬Ä³Ò»ÈÎÆÚÄÚÒ»¶¨Ö»ÓÐÒ»¸öleader¡£
log matching ºÜÓÐÒâ˼£¬logÆ¥ÅäÌØÐÔ£¬ ¾ÍÊÇ˵Èç¹ûÁ½¸ö½ÚµãÉϵÄij¸ölog entryµÄlog indexÏàͬÇÒtermÏàͬ£¬ÄÇôÔÚ¸Ãindex֮ǰµÄËùÓÐlog
entryÓ¦¸Ã¶¼ÊÇÏàͬµÄ¡£ÈçºÎ×öµ½µÄ£¿ÒÀÀµÓÚÒÔÏÂÁ½µã
If two entries in different logs have the same index
and term, then they store the same command.
If two entries in different logs have the same index
and term, then the logs are identical in all preceding
entries. Ê×ÏÈ£¬leaderÔÚijһtermµÄÈÎһλÖÃÖ»»á´´½¨Ò»¸ölog entry£¬ÇÒlog entryÊÇappend-only¡£Æä´Î£¬consistency
check¡£leaderÔÚAppendEntriesÖаüº¬×îÐÂlog entry֮ǰµÄÒ»¸ölog µÄtermºÍindex£¬Èç¹ûfollowerÔÚ¶ÔÓ¦µÄterm
indexÕÒ²»µ½ÈÕÖ¾£¬ÄÇô¾Í»á¸æÖªleader²»Ò»Ö¡£
ÔÚûÓÐÒì³£µÄÇé¿öÏ£¬log matchingÊǺÜÈÝÒ×Âú×ãµÄ£¬µ«Èç¹û³öÏÖÁËnode crash£¬Çé¿ö¾Í»á±äµÃ¸ºÔð¡£±ÈÈçÏÂͼ

×¢Ò⣺ÉÏͼµÄa-f²»ÊÇ6¸öfollower£¬¶øÊÇij¸öfollower¿ÉÄÜ´æÔÚµÄÁù¸ö״̬
leader¡¢follower¶¼¿ÉÄÜcrash£¬ÄÇôfollowerά»¤µÄÈÕÖ¾ÓëleaderÏà±È¿ÉÄܳöÏÖÒÔÏÂÇé¿ö
±ÈleaderÈÕÖ¾ÉÙ£¬ÈçÉÏͼÖеÄab
±ÈleaderÈÕÖ¾¶à£¬ÈçÉÏͼÖеÄcd
ijЩλÖñÈleader¶à£¬Ä³Ð©ÈÕÖ¾±ÈleaderÉÙ£¬Èçef£¨¶àÉÙÊÇÕë¶ÔijһÈÎÆÚ¶øÑÔ£© µ±³öÏÖÁËleaderÓëfollower²»Ò»ÖµÄÇé¿ö£¬leaderÇ¿ÖÆfollower¸´ÖÆ×Ô¼ºµÄlog
To bring a follower¡¯s log into consistency with its
own, the leader must find the latest log entry where
the two logs agree, delete any entries in the follower¡¯s
log after that point, and send the follower all of
the leader¡¯s entries after that point.
leader»áά»¤Ò»¸önextIndex[]Êý×飬¼Ç¼ÁËleader¿ÉÒÔ·¢ËÍÿһ¸öfollowerµÄlog
index£¬³õʼ»¯Îªeader×îºóÒ»¸ölog index¼Ó1£¬ Ç°ÃæÒ²Ìáµ½£¬leaderÑ¡¾Ù³É¹¦Ö®ºó»áÁ¢¼´¸øËùÓÐfollower·¢ËÍAppendEntries
RPC£¨²»°üº¬ÈκÎlog entry£¬ Ò²³äµ±ÐÄÌøÏûÏ¢£©,ÄÇôÁ÷³Ì×ܽáΪ£º
s1 leader ³õʼ»¯nextIndex[x]Ϊ leader×îºóÒ»¸ölog index +
1
s2 AppendEntriesÀïprevLogTerm prevLogIndexÀ´×Ô logs[nextIndex[x]
- 1]
s3 Èç¹ûfollowerÅжÏprevLogIndexλÖõÄlog term²»µÈÓÚprevLogTerm£¬ÄÇô·µ»Ø
False£¬·ñÔò·µ»ØTrue
s4 leaderÊÕµ½followerµÄ»Ø¸´£¬Èç¹û·µ»ØÖµÊÇFalse£¬ÔònextIndex[x] -=
1, Ìø×ªµ½s2. ·ñÔò
s5 ͬ²½nextIndex[x]ºóµÄËùÓÐlog entries
leader completeness vs elcetion restriction leaderÍêÕûÐÔ£ºÈç¹ûÒ»¸ölog entryÔÚij¸öÈÎÆÚ±»Ìá½»£¨committed£©£¬ÄÇôÕâÌõÈÕÖ¾Ò»¶¨»á³öÏÖÔÚËùÓиü¸ßtermµÄleaderµÄÈÕÖ¾ÀïÃæ¡£Õâ¸ö¸úleader
election¡¢log replication¶¼Óйء£
Ò»¸öÈÕÖ¾±»¸´ÖƵ½majority½Úµã²ÅËãcommitted
Ò»¸ö½ÚµãµÃµ½majorityµÄͶƱ²ÅÄܳÉΪleader£¬¶ø½ÚµãA¸ø½ÚµãBͶƱµÄÆäÖÐÒ»¸öǰÌáÊÇ£¬BµÄÈÕÖ¾²»ÄܱÈAµÄÈÕÖ¾¾É¡£ÏÂÃæµÄÒýÎÄÖ¸´¦ÁËÈçºÎÅжÏÈÕÖ¾µÄоÉ
voter denies its vote if its own log is more up-to-date
than that of the candidate.
If the logs have last entries with different terms,
then the log with the later term is more up-to-date.
If the logs end with the same term, then whichever
log is longer is more up-to-date.
ÉÏÃæÁ½µã¶¼Ìáµ½ÁËmajority£ºcommit majority and vote majority£¬¸ù¾ÝQuorum£¬ÕâÁ½¸ömajorityÒ»¶¨ÊÇÓÐÖØºÏµÄ£¬Òò´Ë±»Ñ¡¾Ù³öµÄleaderÒ»¶¨°üº¬ÁË×îеÄcommittedµÄÈÕÖ¾¡£
raftÓëÆäËûÐÒ飨Viewstamped Replication¡¢mongodb£©²»Í¬£¬raftʼÖÕ±£Ö¤leade°üº¬×îеÄÒÑÌá½»µÄÈÕÖ¾£¬Òò´Ëleader²»»á´Ófollower
catchupÈÕÖ¾£¬ÕâÒ²´ó´ó¼ò»¯ÁËϵͳµÄ¸´ÔÓ¶È¡£
corner case
stale leader raft±£Ö¤Election safety£¬¼´Ò»¸öÈÎÆÚÄÚ×î¶àÖ»ÓÐÒ»¸öleader£¬µ«ÔÚÍøÂç·Ö¸î£¨network
partition£©µÄÇé¿öÏ£¬¿ÉÄÜ»á³öÏÖÁ½¸öleader£¬µ«Á½¸öleaderËù´¦µÄÈÎÆÚÊDz»Í¬µÄ¡£ÈçÏÂͼËùʾ

ϵͳÓÐ5¸ö½ÚµãABCDE×é³É£¬ÔÚterm1£¬Node BÊÇleader£¬µ«Node A¡¢BºÍNode
C¡¢D¡¢EÖ®¼ä³öÏÖÁËÍøÂç·Ö¸î£¬Òò´ËNode C¡¢D¡¢EÎÞ·¨ÊÕµ½À´×Ôleader£¨Node B£©µÄÏûÏ¢£¬ÔÚelection
timeÖ®ºó£¬Node C¡¢D¡¢E»á·ÖÆÚÑ¡¾Ù£¬ÓÉÓÚÂú×ãmajorityÌõ¼þ£¬Node E³ÉΪÁËterm
2µÄleader¡£Òò´Ë£¬ÔÚϵͳÖÐÃ²ËÆ³öÏÖÁËÁ½¸öleader£ºterm 1µÄNode B£¬ term
2µÄNode E, Node BµÄterm¸ü¾É£¬µ«ÓÉÓÚÎÞ·¨ÓëMajority½ÚµãͨÐÅ£¬NodeBÈÔÈ»»áÈÏΪ×Ô¼ºÊÇleader¡£
ÔÚÕâÑùµÄÇé¿öÏ£¬ÎÒÃÇÀ´¿¼ÂǶÁд¡£
Ê×ÏÈ£¬Èç¹û¿Í»§¶Ë½«ÇëÇó·¢Ë͵½ÁËNodeB£¬NodeBÎÞ·¨½«log entry ¸´ÖƵ½majority½Úµã£¬Òò´Ë²»»á¸æË߿ͻ§¶ËдÈë³É¹¦£¬Õâ¾Í²»»áÓÐÎÊÌâ¡£
¶ÔÓÚ¶ÁÇëÇó£¬stale leader¿ÉÄÜ·µ»Østale data£¬±ÈÈçÔÚread-after-writeµÄÒ»ÖÂÐÔÒªÇóÏ£¬¿Í»§¶ËдÈëµ½ÁËterm2ÈÎÆÚµÄleader
Node E£¬µ«¶ÁÇëÇó·¢Ë͵½ÁËNode B¡£Èç¹ûÒª±£Ö¤²»·µ»Østale data£¬leaderÐèÒªcheck×Ô¼ºÊÇ·ñ¹ýʱÁË£¬°ì·¨¾ÍÊÇÓë´ó¶àÊý½ÚµãͨÐÅÒ»´Î£¬Õâ¸ö¿ÉÄÜ»á³öÏÖЧÂÊÎÊÌâ¡£ÁíÒ»ÖÖ·½Ê½ÊÇʹÓÃlease£¬µ«Õâ¾Í»áÒÀÀµÎïÀíʱÖÓ¡£
´ÓraftµÄÂÛÎÄÖпÉÒÔ¿´µ½£¬leaderת»»³ÉfollowerµÄÌõ¼þÊÇÊÕµ½À´×Ô¸ü¸ßtermµÄÏûÏ¢£¬Èç¹ûÍøÂç·Ö¸îÒ»Ö±³ÖÐø£¬ÄÇôstale
leader¾Í»áÒ»Ö±´æÔÚ¡£¶øÔÚraftµÄһЩʵÏÖ»òÕßraft-likeÐÒéÖУ¬leaderÈç¹ûÊÕ²»µ½majority½ÚµãµÄÏûÏ¢£¬ÄÇô¿ÉÒÔ×Ô¼ºstep
down£¬×ÔÐÐת»»µ½follower״̬¡£
State Machine Safety Ç°ÃæÔÚ½éÉÜsafetyµÄʱºòÓÐÒ»ÌõÊôÐÔûÓÐÏêϸ½éÉÜ£¬ÄǾÍÊÇState Machine Safety£º
State Machine Safety: if a server has applied a log
entry at a given index to its state machine, no other
server will ever apply a different log entry for the
same index.
Èç¹û½Úµã½«Ä³Ò»Î»ÖõÄlog entryÓ¦Óõ½ÁË״̬»ú£¬ÄÇôÆäËû½ÚµãÔÚͬһλÖò»ÄÜÓ¦Óò»Í¬µÄÈÕÖ¾¡£¼òµ¥µãÀ´Ëµ£¬ËùÓнڵãÔÚͬһλÖã¨index
in log entries£©Ó¦¸ÃÓ¦ÓÃͬÑùµÄÈÕÖ¾¡£µ«ÊÇËÆºõÓÐijЩÇé¿ö»áÎ¥±³Õâ¸öÔÔò£º
ÉÏͼÊÇÒ»¸ö½ÏΪ¸´ÔÓµÄÇé¿ö¡£ÔÚʱ¿Ì(a), s1ÊÇleader£¬ÔÚterm2Ìá½»µÄÈÕÖ¾Ö»¸³Öµµ½ÁËs1
s2Á½¸ö½Úµã¾ÍcrashÁË¡£ÔÚʱ¿Ì£¨b), s5³ÉΪÁËterm 3µÄleader£¬ÈÕÖ¾Ö»¸³Öµµ½ÁËs5£¬È»ºócrash¡£È»ºóÔÚ(c)ʱ¿Ì£¬s1ÓÖ³ÉΪÁËterm
4µÄleader£¬¿ªÊ¼¸³ÖµÈÕÖ¾£¬ÓÚÊǰÑterm2µÄÈÕÖ¾¸´ÖƵ½ÁËs3£¬´Ë¿Ì£¬¿ÉÒÔ¿´³öterm2¶ÔÓ¦µÄÈÕÖ¾ÒѾ±»¸´ÖƵ½ÁËmajority£¬Òò´ËÊÇcommitted£¬¿ÉÒÔ±»×´Ì¬»úÓ¦Óᣲ»ÐÒµÄÊÇ£¬½ÓÏÂÀ´£¨d£©Ê±¿Ì£¬s1ÓÖcrashÁË£¬s5ÖØÐµ±Ñ¡£¬È»ºó½«term3µÄÈÕÖ¾¸´ÖƵ½ËùÓнڵ㣬Õâ¾Í³öÏÖÁËÒ»ÖÖÆæ¹ÖµÄÏÖÏ󣺱»¸´ÖƵ½´ó¶àÊý½Úµã£¨»òÕß˵¿ÉÄÜÒѾӦÓ㩵ÄÈÕÖ¾±»»Ø¹ö¡£
¾¿Æä¸ù±¾£¬ÊÇÒòΪterm4ʱµÄleader s1ÔÚ£¨C£©Ê±¿ÌÌá½»ÁË֮ǰterm2ÈÎÆÚµÄÈÕÖ¾¡£ÎªÁ˶žøÕâÖÖÇé¿öµÄ·¢Éú£º
Raft never commits log entries from previous terms
by counting replicas.
Only log entries from the leader¡¯s current term are
committed by counting replicas; once an entry from
the current term has been committed in this way, then
all prior entries are committed indirectly because
of the Log Matching Property.
Ò²¾ÍÊÇ˵£¬Ä³¸öleaderÑ¡¾Ù³É¹¦Ö®ºó£¬²»»áÖ±½ÓÌύǰÈÎleaderʱÆÚµÄÈÕÖ¾£¬¶øÊÇͨ¹ýÌá½»µ±Ç°ÈÎÆÚµÄÈÕÖ¾µÄʱºò¡°Ë³ÊÖ¡±°Ñ֮ǰµÄÈÕÖ¾Ò²Ìá½»ÁË£¬¾ßÌåÔõôʵÏÖÁË£¬ÔÚlog
matching²¿·ÖÓÐÏêϸ½éÉÜ¡£ÄÇôÎÊÌâÀ´ÁË£¬Èç¹ûleader±»Ñ¡¾ÙºóûÓÐÊÕµ½¿Í»§¶ËµÄÇëÇóÄØ£¬ÂÛÎÄÖÐÓÐÌáµ½£¬ÔÚÈÎÆÚ¿ªÊ¼µÄʱºò·¢Á¢¼´³¢ÊÔ¸´ÖÆ¡¢Ìá½»Ò»Ìõ¿ÕµÄlog
Raft handles this by having each leader commit a
blank no-op entry into the log at the start of its
term.
Òò´Ë£¬ÔÚÉÏͼÖУ¬²»»á³öÏÖ£¨C£©Ê±¿ÌµÄÇé¿ö£¬¼´term4ÈÎÆÚµÄleader s1²»»á¸´ÖÆterm2µÄÈÕÖ¾µ½s3¡£¶øÊÇÈçͬ(e)ÃèÊöµÄÇé¿ö£¬Í¨¹ý¸´ÖÆ-Ìá½»
term4µÄÈÕ־˳±ãÌá½»term2µÄÈÕÖ¾¡£Èç¹ûterm4µÄÈÕÖ¾Ìá½»³É¹¦£¬ÄÇôterm2µÄÈÕÖ¾Ò²Ò»¶¨Ìá½»³É¹¦£¬´Ëʱ¼´Ê¹s1crash£¬s5Ò²²»»áÖØÐµ±Ñ¡¡£
leader crash followerµÄcrash´¦Àí·½Ê½Ïà¶Ô¼òµ¥£¬leaderÖ»Òª²»Í£µÄ¸øfollower·¢ÏûÏ¢¼´¿É¡£µ±leader
crashµÄʱºò£¬ÊÂÇé¾Í»á±äµÃ¸´ÔÓ¡£ÔÚÕâÆªÎÄÕÂÖУ¬×÷Õ߾͸ø³öÁËÒ»¸ö¸üÐÂÇëÇóµÄÁ÷³Ìͼ¡£
Àý×Ó ÎÒÃÇ¿ÉÒÔ·ÖÎöleaderÔÚÈÎÒâʱ¿ÌcrashµÄÇé¿ö£¬ÓÐÖúÓÚÀí½âraftËã·¨µÄÈÝ´íÐÔ¡£
×ܽá raft½«¹²Ê¶ÎÊÌâ·Ö½â³ÉÁ½¸öÏà¶Ô¶ÀÁ¢µÄÎÊÌ⣬leader election£¬log replication¡£Á÷³ÌÊÇÏÈÑ¡¾Ù³öleader£¬È»ºóleader¸ºÔð¸´ÖÆ¡¢Ìá½»log£¨logÖаüº¬command£©
ΪÁËÔÚÈκÎÒì³£Çé¿öÏÂϵͳ²»³ö´í£¬¼´Âú×ãsafetyÊôÐÔ£¬¶Ôleader election£¬log
replicationÁ½¸ö×ÓÎÊÌâÓÐÖî¶àÔ¼Êø
leader electionÔ¼Êø£º
ͬһÈÎÆÚÄÚ×î¶àÖ»ÄÜͶһƱ£¬ÏÈÀ´ÏȵÃ
Ñ¡¾ÙÈ˱ØÐë±È×Ô¼ºÖªµÀµÄ¸ü¶à£¨±È½Ïterm£¬log index£©
log replicationÔ¼Êø£º
Ò»¸ölog±»¸´ÖƵ½´ó¶àÊý½Úµã£¬¾ÍÊÇcommitted£¬±£Ö¤²»»á»Ø¹ö
leaderÒ»¶¨°üº¬×îеÄcommitted log£¬Òò´ËleaderÖ»»á×·¼ÓÈÕÖ¾£¬²»»áɾ³ý¸²¸ÇÈÕÖ¾
²»Í¬½Úµã£¬Ä³¸öλÖÃÉÏÈÕÖ¾Ïàͬ£¬ÄÇôÕâ¸öλÖÃ֮ǰµÄËùÓÐÈÕÖ¾Ò»¶¨ÊÇÏàͬµÄ
Raft never commits log entries from previous terms
by counting replicas. ±¾ÎÄÊÇÔÚ¿´ÍêraftÂÛÎĺó×Ô¼ºµÄ×ܽᣬ²»Ò»¶¨È«Ãæ¡£¸öÈ˾õµÃ£¬Èç¹ûÖ»ÊÇÏà¶ÔraftÐÒéÓÐÒ»¸ö¼òµ¥Á˽⣬¿´Õâ¸ö¶¯»ÑÝʾ¾Í×ã¹»ÁË£¬Èç¹ûÏëÉîÈëÁ˽⣬»¹ÊÇÒª¿´ÂÛÎÄ£¬ÂÛÎÄÖÐFigure
2¶ÔraftËã·¨½øÐÐÁ˸ÅÀ¨¡£×îºó£¬»¹ÊÇÕÒÒ»¸öʵÏÖÁËraftËã·¨µÄϵͳÀ´¿´¿´¸üºÃ¡£ |