Introduction
OpenStack is an open source cloud computing project jointly initiated by NASA and Rackspace, released under the Apache 2.0 license; by now some 180 companies have joined its development.
OpenStack Object Storage (Swift) is one of OpenStack's sub-projects. Swift's aim is to build a redundant, scalable distributed object storage cluster out of commodity hardware, with storage capacity reaching the petabyte scale. Swift was originally developed in Python by Rackspace and was contributed to OpenStack in July 2010 as part of that open source project. It was built to host Rackspace's Cloud Files service; "swift" was the original project code name, and the name has been kept ever since.
A key question in distributed object storage is where data should be placed. The Ring is the most important component in Swift: it records the mapping between stored objects and their physical locations. Whenever account, container, or object information is looked up, the cluster's ring information must be consulted.
First, here is how the Swift documentation describes the Ring:
The rings determine where data resides in the cluster. There are separate rings for account databases, container databases, and individual objects.
Each partition in the ring is replicated three times (by default) across the cluster, and the locations of a partition are maintained by the ring and stored in its mapping.
The ring uses the concept of zones to guarantee data isolation: the replicas of each partition are guaranteed to be placed in different zones. A zone can be a disk, a server, a rack, a switch, or even an entire datacenter.
The description above mentions that the Ring uses zones, devices, partitions, and replicas to maintain the mapping between data and disks. So what algorithm lies behind the Ring, and what mechanisms does it use to keep data safe, efficient, and scalable? What do these concepts buy us for data storage? This article works step by step through how Swift achieves redundancy and scalability via the Ring component.
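For orientation, this is roughly what consulting a ring looks like in a deployed Swift cluster. This is a sketch only: the ring file path and the account/container/object names are assumptions for illustration, while get_nodes is the actual method of swift.common.ring.Ring.

```python
# Sketch: querying the object ring the way Swift's proxy server does.
from swift.common.ring import Ring

object_ring = Ring('/etc/swift/object.ring.gz')  # assumed path
partition, nodes = object_ring.get_nodes('AUTH_demo', 'photos', 'cat.jpg')
for node in nodes:
    # Each node dict describes where one replica of the object lives.
    print node['ip'], node['port'], node['device']
```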
1. Plain hashing and a scenario analysis
Let's start with a simple example. Suppose we have N storage servers (nodes, for short) that we plan to use for storing image files. To balance the load across servers, objects need to be mapped uniformly onto the nodes, which is usually done with a hash. The computation goes:
a) compute the object's hash value, Key;
b) compute Key mod N.
With N storage nodes, the remainder of Key mod N is the node on which that Key's value is stored. For example, with N = 2, Keys 0, 1, 2, 3, 4 land on nodes 0, 1, 0, 1, 0 respectively. If the hash is uniform, the data is split evenly between the two nodes; if each item is accessed about equally often, the load is split evenly as well.
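The rule is small enough to verify directly; a tiny sketch reproducing the example above:

```python
# With N = 2, keys 0..4 land on nodes 0, 1, 0, 1, 0.
N = 2
for key in xrange(5):
    print 'key %d -> node %d' % (key, key % N)
```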
But when data volume and traffic grow beyond what two nodes can handle, a node has to be added to serve client requests. N becomes 3 and the mapping becomes Key mod (N+1), so the Keys that hashed to 2, 3, and 4 must all be reassigned (2 -> server 2, 3 -> server 0, 4 -> server 1). With a large data set, this migration is enormous: when N is already large, going from N to N+1 nodes reshuffles the whole hash space, and nearly all of the data has to move. In practice that is close to intolerable.
To make this concrete, take a cluster of 100 nodes and distribute 10^7 data items across them with an MD5-based hash. The Python code is as follows:
```python
from hashlib import md5
from struct import unpack_from

NODE_COUNT = 100
DATA_ID_COUNT = 10000000

node_counts = [0] * NODE_COUNT
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    # This just pulls part of the hash out as an integer
    hsh = unpack_from('>I', md5(data_id).digest())[0]
    node_id = hsh % NODE_COUNT
    node_counts[node_id] += 1
desired_count = DATA_ID_COUNT / NODE_COUNT
print '%d: Desired data ids per node' % desired_count
max_count = max(node_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d: Most data ids on one node, %.02f%% over' % \
    (max_count, over)
min_count = min(node_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d: Least data ids on one node, %.02f%% under' % \
    (min_count, under)
```
The resulting distribution looks like this:

```
100000: Desired data ids per node
100695: Most data ids on one node, 0.69% over
99073: Least data ids on one node, 0.93% under
```
The nodes with the most and the fewest data items deviate from the average by less than 1%. Now suppose we add one node to gain capacity; the data items have to be redistributed onto the new node set. The code:
```python
from hashlib import md5
from struct import unpack_from

NODE_COUNT = 100
NEW_NODE_COUNT = 101
DATA_ID_COUNT = 10000000

moved_ids = 0
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    hsh = unpack_from('>I', md5(data_id).digest())[0]
    node_id = hsh % NODE_COUNT
    new_node_id = hsh % NEW_NODE_COUNT
    if node_id != new_node_id:
        moved_ids += 1
percent_moved = 100.0 * moved_ids / DATA_ID_COUNT
print '%d ids moved, %.02f%%' % (moved_ids, percent_moved)
```

Output:

```
9900989 ids moved, 99.01%
```
The computation shows that to add 1% of storage capacity to the cluster, we would have to move 9,900,989 data items, i.e. 99.01% of them. Clearly this approach cripples the system's performance and scalability.
Add 1% storage capacity = move 99% of the data?
Obviously nobody can afford that trade. So what do we do? Consistent hashing was invented to solve exactly this problem.
2. Consistent hashing
Consistent hashing was first proposed by D. Karger, E. Lehman, T. Leighton and others in the 1997 paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web", mainly to address hot spots in distributed networks. The paper introduces the consistent hashing algorithm and gives four criteria for judging a hash algorithm:
Balance
Balance means the hash results are spread as evenly as possible, so that all of the cache space is put to use.
Monotonicity
Monotonicity means that if some content has already been hashed to its buffers and new buffers are then added to the system, the hash results must guarantee that previously assigned content is only ever remapped to the new buffers, never to other buffers in the old set.
Spread
Spread concerns the distributed setting in which different clients, each seeing a different subset of the caches, hash the same content. Because their views differ, their hash results differ, and the same content ends up mapped to different buffers. A good algorithm keeps this spread small.
Load
Load is the other side of spread. Since different clients may map the same content to different buffers, a particular buffer may also be assigned different content by different clients. Load measures how much lands on one buffer across all views, and should likewise be kept low.
SwiftʹÓøÃËã·¨µÄÖ÷ҪĿµÄÊÇÔڸı伯ȺµÄnodeÊýÁ¿Ê±£¨Ôö¼Ó/ɾ³ý·þÎñÆ÷£©£¬Äܹ»¾¡¿ÉÄÜÉٵظıäÒÑ´æÔÚkeyºÍnodeµÄÓ³Éä¹ØÏµ£¬ÒÔÂú×ãµ¥µ÷ÐÔ¡£Ò»ÖÂÐÔ¹þÏ£Ò»°ãÁ½ÖÖ˼·£º
1.Ç¨ÒÆÎªÖ÷ÒªÌØµã(swift³õÆÚ²ÉÓÃ)
2.ÒýÈëÐé½áµã£¬¼õÉÙÒÆ¶¯ÎªÌصã(swiftÏÖ²ÉÓÃ)
The basic procedure is as follows (a code sketch is given after Figure 2 below):
1. Compute the hash of each node (from its host name or IP address) and place it on a circular space (here, the interval 0 to 2^32).
2. Compute the hash of the object to be stored and place it on the same circle.
3. Starting from the object's position, search clockwise and store the object on the first node found.
This circle, mapping hashes to positions, is also why the term "Ring" is used. The layout of the hash ring space is shown in Figure 1:
[Figure 1: The hash ring space]
Suppose a new Cache5 is mapped between Cache3 and Cache4 on this circular hash space. The only objects affected are those lying on the arc from Cache5 counter-clockwise to the next cache (Cache3), i.e. the objects that previously mapped to Cache4.
[Figure 2: Data movement under consistent hashing]
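The clockwise lookup in steps 1 to 3 above can be sketched in a few lines of Python. This is a minimal illustration, not Swift code: the node names are made up, and the 32-bit MD5 prefix stands in for any uniform hash.

```python
from bisect import bisect_left
from hashlib import md5
from struct import unpack_from

def hash_to_ring(name):
    # Place a name on the 0..2**32-1 ring via its 32-bit MD5 prefix.
    return unpack_from('>I', md5(name).digest())[0]

nodes = ['node1', 'node2', 'node3', 'node4']
ring = sorted((hash_to_ring(name), name) for name in nodes)
points = [point for point, name in ring]

def find_node(obj):
    # Clockwise search: the first node at or past the object's
    # position, wrapping around 2**32 back to the start of the ring.
    return ring[bisect_left(points, hash_to_ring(obj)) % len(ring)][1]

print find_node('my-photo.jpg')
```

Adding a fifth node to this list only re-routes the keys on the arc between the new node and its counter-clockwise neighbour, which is exactly the Cache5 situation shown in Figure 2.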
ÏÖÔÚ£¬Ê¹ÓøÃËã·¨ÔÚ¼¯ÈºÖÐÔö¼ÓÒ»¸önode£¬Í¬Ê±Òª±£Ö¤Ã¿¸ö½ÚµãµÄÊý¾ÝÏîÊýÁ¿¾ùºâ£¬´úÂëÈçÏÂËùʾ£¬ÆäÖÐnode_range_starts±íʾÿ¸önodeµÄÊý¾ÝÏîµÄ¿ªÊ¼Î»Öá£
```python
from bisect import bisect_left
from hashlib import md5
from struct import unpack_from

NODE_COUNT = 100
NEW_NODE_COUNT = 101
DATA_ID_COUNT = 10000000

node_range_starts = []
for node_id in xrange(NODE_COUNT):
    node_range_starts.append(DATA_ID_COUNT /
                             NODE_COUNT * node_id)
new_node_range_starts = []
for new_node_id in xrange(NEW_NODE_COUNT):
    new_node_range_starts.append(DATA_ID_COUNT /
                                 NEW_NODE_COUNT * new_node_id)
moved_ids = 0
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    hsh = unpack_from('>I', md5(data_id).digest())[0]
    node_id = bisect_left(node_range_starts,
                          hsh % DATA_ID_COUNT) % NODE_COUNT
    new_node_id = bisect_left(new_node_range_starts,
                              hsh % DATA_ID_COUNT) % NEW_NODE_COUNT
    if node_id != new_node_id:
        moved_ids += 1
percent_moved = 100.0 * moved_ids / DATA_ID_COUNT
print '%d ids moved, %.02f%%' % (moved_ids, percent_moved)
```

Output:

```
4901707 ids moved, 49.02%
```
That is better than before, but moving 50% of the data to gain 1% of capacity is still not acceptable.
Add 1% capacity = move 50% of the data?
Introducing virtual nodes (partitions)
When there are few nodes, changing the node count still causes large data migrations, as seen above. To cope with this, consistent hashing introduces the notion of "virtual nodes".
A virtual node is a replica of a real node on the circular space: each real node corresponds to several virtual nodes, and the virtual nodes are arranged on the hash space by their hash values.
[Figure 3: Virtual nodes]
With virtual nodes, the mapping changes from [object ---> node] to [object ---> virtual node ---> node]. The lookup of an object's node is shown in the figure below.
[Figure 4: Mapping between objects, virtual nodes, and nodes]
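A minimal sketch of the two-step lookup in Figure 4. For simplicity it maps objects to vnodes with a plain modulo; the article's listing below first uses range boundaries and only arrives at the modulo form a little later.

```python
from hashlib import md5
from struct import unpack_from

NODE_COUNT = 100
VNODE_COUNT = 1000

# vnode -> node assignment table; only this table changes when
# nodes join or leave, the object -> vnode mapping never does.
vnode2node = [vnode_id % NODE_COUNT
              for vnode_id in xrange(VNODE_COUNT)]

def lookup(data_id):
    hsh = unpack_from('>I', md5(str(data_id)).digest())[0]
    vnode_id = hsh % VNODE_COUNT  # object ---> virtual node
    return vnode2node[vnode_id]   # virtual node ---> node

print 'data id 1234 lives on node %d' % lookup(1234)
```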
Let's subdivide the 100 nodes into 1000 vnodes, with vnode_range_starts giving each vnode's starting range and vnode2node giving the vnode-to-node assignment. Then we add one node, reassign the vnodes, and count the data items moved:
```python
from bisect import bisect_left
from hashlib import md5
from struct import unpack_from

NODE_COUNT = 100
DATA_ID_COUNT = 10000000
VNODE_COUNT = 1000

vnode_range_starts = []
vnode2node = []
for vnode_id in xrange(VNODE_COUNT):
    vnode_range_starts.append(DATA_ID_COUNT /
                              VNODE_COUNT * vnode_id)
    vnode2node.append(vnode_id % NODE_COUNT)
new_vnode2node = list(vnode2node)
new_node_id = NODE_COUNT
NEW_NODE_COUNT = NODE_COUNT + 1
vnodes_to_reassign = VNODE_COUNT / NEW_NODE_COUNT
while vnodes_to_reassign > 0:
    for node_to_take_from in xrange(NODE_COUNT):
        for vnode_id, node_id in enumerate(new_vnode2node):
            if node_id == node_to_take_from:
                new_vnode2node[vnode_id] = new_node_id
                vnodes_to_reassign -= 1
                if vnodes_to_reassign <= 0:
                    break
        if vnodes_to_reassign <= 0:
            break
moved_ids = 0
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    hsh = unpack_from('>I', md5(data_id).digest())[0]
    vnode_id = bisect_left(vnode_range_starts,
                           hsh % DATA_ID_COUNT) % VNODE_COUNT
    node_id = vnode2node[vnode_id]
    new_node_id = new_vnode2node[vnode_id]
    if node_id != new_node_id:
        moved_ids += 1
percent_moved = 100.0 * moved_ids / DATA_ID_COUNT
print '%d ids moved, %.02f%%' % (moved_ids, percent_moved)
```

Output:

```
90108 ids moved, 0.90%
```
The result: only 0.9% of the data moved. Compared with the earlier schemes, the cost of expanding the cluster has dropped dramatically.
Add 1% capacity = move 0.9% of the data
Fixing the vnode-to-data mapping
Since the number of virtual nodes never changes over the cluster's lifetime, the mapping between vnodes and data items never changes either; only the vnode-to-node assignment does. The code can therefore be simplified: the object's hash can be taken modulo VNODE_COUNT directly, with no range table at all.
```python
from struct import unpack_from
from hashlib import md5
from time import time

NODE_COUNT = 100
DATA_ID_COUNT = 10000000
VNODE_COUNT = 1000

begin = time()
vnode2node = []
for vnode_id in xrange(VNODE_COUNT):
    vnode2node.append(vnode_id % NODE_COUNT)
new_vnode2node = list(vnode2node)
new_node_id = NODE_COUNT
vnodes_to_assign = VNODE_COUNT / (NODE_COUNT + 1)
while vnodes_to_assign > 0:
    for node_to_take_from in xrange(NODE_COUNT):
        for vnode_id, node_id in enumerate(new_vnode2node):
            if node_id == node_to_take_from:
                new_vnode2node[vnode_id] = new_node_id
                vnodes_to_assign -= 1
                if vnodes_to_assign <= 0:
                    break
        if vnodes_to_assign <= 0:
            break
moved_id = 0
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    hsh = unpack_from('>I', md5(data_id).digest())[0]
    # (1) The range table is gone: a plain modulo fixes the
    # data-to-vnode mapping once and for all.
    vnode_id = hsh % VNODE_COUNT
    node_id = vnode2node[vnode_id]
    new_node_id = new_vnode2node[vnode_id]
    if node_id != new_node_id:
        moved_id += 1
percent_moved = 100.0 * moved_id / DATA_ID_COUNT
print '%d ids moved, %.02f%%' % (moved_id, percent_moved)
print '%d seconds pass ...' % (time() - begin)
```

Output:

```
90108 ids moved, 0.90%
```
Choosing a sensible number of virtual nodes
We now have a prototype of a consistent hashing ring. One problem remains: in the example above, 1000 virtual nodes are spread over 100 nodes, and vnodes are reassigned to nodes whenever nodes change. If the cluster grew from 100 nodes to 1001, at least one node would receive no vnode at all, and the vnode count would have to be increased. But the vnode count is baked into the data-to-vnode hash: changing it means reassigning every data item, moving huge amounts of data.
So the vnode count must be chosen with the system's expected scale in mind. If the cluster will never exceed 60,000 nodes, the vnode count can be set to 100 times the node count; then reassigning the load of any single node affects only 1% of the data items. That gives 6×10^6 vnodes, each storing a node id in 2 bytes (0~65535), for a baseline memory footprint of 6×10^6 × 2 bytes = 12 MB, entirely affordable on a server.
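That footprint is easy to check with Python's array module; a quick sketch:

```python
from array import array

VNODE_COUNT = 6000000  # 60,000 nodes x 100 vnodes per node

# Type code 'H': unsigned 16-bit integers, enough for node ids 0~65535.
vnode2node = array('H', [0]) * VNODE_COUNT
print '%.1f MB' % (vnode2node.itemsize * len(vnode2node)
                   / (1024.0 * 1024))
# Prints roughly 11.4 MB, matching the ~12 MB estimate above.
```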
Two new pieces of terminology come in at this point.
In Swift, to distinguish vnodes from nodes, a vnode is called a partition.
Bit operations instead of modulo
Also, locating a partition with a bit operation is faster on a computer than taking a modulo. This is where the notion of the partition power comes in.
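A quick illustration of the shift, under the same assumption as the listing below that we work with the first 32 bits of the MD5 digest:

```python
from hashlib import md5
from struct import unpack_from

PARTITION_POWER = 23
PARTITION_SHIFT = 32 - PARTITION_POWER

hsh = unpack_from('>I', md5('my-photo.jpg').digest())[0]
# Keeping the top PARTITION_POWER bits of a uniform 32-bit hash
# gives a uniform value in [0, 2**PARTITION_POWER): the effect of
# a modulo by a power of two, at the cost of a single shift.
part = hsh >> PARTITION_SHIFT
assert 0 <= part < 2 ** PARTITION_POWER
print 'partition %d of %d' % (part, 2 ** PARTITION_POWER)
```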
Continuing to improve the ring code, assume 65,536 nodes (2^16) and 128 (2^7) times as many partitions, i.e. 2^23. Since the unpacked MD5 prefix is a 32-bit value, PARTITION_SHIFT (equal to 32 - PARTITION_POWER) maps a data item's MD5 hash into the 2^23 partition space:
```python
from array import array
from hashlib import md5
from struct import unpack_from

PARTITION_POWER = 23
PARTITION_SHIFT = 32 - PARTITION_POWER
NODE_COUNT = 65536
DATA_ID_COUNT = 100000000

part2node = array('H')
for part in xrange(2 ** PARTITION_POWER):
    part2node.append(part % NODE_COUNT)
node_counts = [0] * NODE_COUNT
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    part = unpack_from('>I',
        md5(data_id).digest())[0] >> PARTITION_SHIFT
    node_id = part2node[part]
    node_counts[node_id] += 1
desired_count = DATA_ID_COUNT / NODE_COUNT
print '%d: Desired data ids per node' % desired_count
max_count = max(node_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d: Most data ids on one node, %.02f%% over' % \
    (max_count, over)
min_count = min(node_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d: Least data ids on one node, %.02f%% under' % \
    (min_count, under)
```

Output:

```
1525: Desired data ids per node
1683: Most data ids on one node, 10.36% over
1360: Least data ids on one node, 10.82% under
```
The imbalance comes from the number of data items being small relative to the number of partitions (10^8 items against 2^23 partitions, i.e. only about 12 items per partition), so random fluctuations loom large; the more data items there are, the more even the distribution becomes.
Keeping data safe: introducing replicas
So far each piece of data exists exactly once in the cluster, on its local node; a single node failure can lose data permanently. Swift therefore introduces replicas, using redundant copies to keep data safe. The default replica count is 3, a choice grounded mainly in the NWR strategy.
NWR is a strategy used in distributed storage systems to control the consistency level; Amazon's Dynamo storage system, for instance, uses NWR to control consistency. The letters mean the following:
N: the number of replicas kept of the same piece of data
W: the number of replicas that must be successfully written for an update of a data object to succeed
R: the number of replicas that must be read when reading a piece of data
In a distributed system, a single point of data must not exist: running with only one live replica is extremely dangerous, because one more failure of that replica can corrupt the data permanently. If we set N to 2, the failure of a single storage node already leaves a single point, so N must be greater than 2. On the other hand, the higher N is, the higher the system's maintenance and overall cost. Industry practice sets N to 3.
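The text above does not spell out the rule that ties these three numbers together; in Dynamo-style systems the standard condition is that read and write quorums must overlap, which is easy to state in code (a sketch, not part of the ring listings):

```python
def quorums_overlap(n, w, r):
    # R + W > N guarantees every read quorum intersects the latest
    # write quorum, so a read sees at least one up-to-date replica.
    return r + w > n

print quorums_overlap(3, 2, 2)  # True: the common N=3, W=2, R=2 setup
print quorums_overlap(3, 1, 1)  # False: stale reads are possible
```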
Accordingly, the ring code now takes replicas into account, with the count set to 3. Here node_ids records the node ids where the 3 replicas are placed, and part2node[part] finds the node id for a given partition id:
```python
from array import array
from hashlib import md5
from struct import unpack_from

REPLICAS = 3
PARTITION_POWER = 16
PARTITION_SHIFT = 32 - PARTITION_POWER
PARTITION_MAX = 2 ** PARTITION_POWER - 1
NODE_COUNT = 256
DATA_ID_COUNT = 10000000

part2node = array('H')
for part in xrange(2 ** PARTITION_POWER):
    part2node.append(part % NODE_COUNT)
node_counts = [0] * NODE_COUNT
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    part = unpack_from('>I',
        md5(data_id).digest())[0] >> PARTITION_SHIFT
    node_ids = [part2node[part]]
    node_counts[node_ids[0]] += 1
    for replica in xrange(1, REPLICAS):
        while part2node[part] in node_ids:
            part += 1
            if part > PARTITION_MAX:
                part = 0
        node_ids.append(part2node[part])
        node_counts[node_ids[-1]] += 1
desired_count = DATA_ID_COUNT / NODE_COUNT * REPLICAS
print '%d: Desired data ids per node' % desired_count
max_count = max(node_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d: Most data ids on one node, %.02f%% over' % \
    (max_count, over)
min_count = min(node_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d: Least data ids on one node, %.02f%% under' % \
    (min_count, under)
```

Output:

```
117186: Desired data ids per node
118133: Most data ids on one node, 0.81% over
116093: Least data ids on one node, 0.93% under
```
With 256 nodes, the distribution fluctuates by only about 1%, which is reasonably even.
Two problems remain, however:
1. Randomizing the mapping
First, part2node is assigned sequentially, so for any given node, all copies of its partitions sit on the same two other nodes. If one node fails frequently, data keeps being re-replicated from those same two neighbours. The fix is to assign the partition-to-node mapping randomly.
2. Partition tolerance and zones
Second, the current cluster does not satisfy the partition tolerance of the CAP theorem. Gilbert and Lynch define partition tolerance as follows:
No set of failures less than total network failure is allowed to cause the system to respond incorrectly.
In other words, short of the entire network failing, no failure of any subset of nodes may cause the system as a whole to respond incorrectly.
So far all of our nodes could sit in a single rack; one power cut or network failure would forfeit this property. A mechanism is needed to isolate machines by physical location, and this is why zones are introduced.
The ring code now adds zone_count and splits the nodes across 16 zones, with the constraint that replicas of a partition are not placed on the same node or in the same zone:
```python
from array import array
from hashlib import md5
from random import shuffle
from struct import unpack_from

REPLICAS = 3
PARTITION_POWER = 16
PARTITION_SHIFT = 32 - PARTITION_POWER
PARTITION_MAX = 2 ** PARTITION_POWER - 1
NODE_COUNT = 256
ZONE_COUNT = 16
DATA_ID_COUNT = 10000000

node2zone = []
while len(node2zone) < NODE_COUNT:
    zone = 0
    while zone < ZONE_COUNT and len(node2zone) < NODE_COUNT:
        node2zone.append(zone)
        zone += 1
part2node = array('H')
for part in xrange(2 ** PARTITION_POWER):
    part2node.append(part % NODE_COUNT)
shuffle(part2node)
node_counts = [0] * NODE_COUNT
zone_counts = [0] * ZONE_COUNT
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    part = unpack_from('>I',
        md5(data_id).digest())[0] >> PARTITION_SHIFT
    node_ids = [part2node[part]]
    zones = [node2zone[node_ids[0]]]
    node_counts[node_ids[0]] += 1
    zone_counts[zones[0]] += 1
    for replica in xrange(1, REPLICAS):
        while part2node[part] in node_ids and \
                node2zone[part2node[part]] in zones:
            part += 1
            if part > PARTITION_MAX:
                part = 0
        node_ids.append(part2node[part])
        zones.append(node2zone[node_ids[-1]])
        node_counts[node_ids[-1]] += 1
        zone_counts[zones[-1]] += 1
desired_count = DATA_ID_COUNT / NODE_COUNT * REPLICAS
print '%d: Desired data ids per node' % desired_count
max_count = max(node_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d: Most data ids on one node, %.02f%% over' % \
    (max_count, over)
min_count = min(node_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d: Least data ids on one node, %.02f%% under' % \
    (min_count, under)
desired_count = DATA_ID_COUNT / ZONE_COUNT * REPLICAS
print '%d: Desired data ids per zone' % desired_count
max_count = max(zone_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d: Most data ids in one zone, %.02f%% over' % \
    (max_count, over)
min_count = min(zone_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d: Least data ids in one zone, %.02f%% under' % \
    (min_count, under)
```

Output:

```
117186: Desired data ids per node
118782: Most data ids on one node, 1.36% over
115632: Least data ids on one node, 1.33% under
1875000: Desired data ids per zone
1878533: Most data ids in one zone, 0.19% over
1869070: Least data ids in one zone, 0.32% under
```
At this point the ring has all of its basic features: the consistent hashing ring itself, partitions, the partition power, replicas, and zones. What is still missing is weight, plus rewriting the code above as a class packaged into a module.
weight
The idea behind weight is "let the capable do more": when nodes with larger storage capacity are added in the future, they should receive proportionally more partitions. For example, a 2 TB node should hold twice as many partitions as a 1 TB node.
The ring construction below adds a weight attribute. In this example weight simply takes the values 1 and 2, and each node is assigned its share of partitions in proportion to its weight:
```python
from array import array
from hashlib import md5
from random import shuffle
from struct import unpack_from
from time import time


class Ring(object):

    def __init__(self, nodes, part2node, replicas):
        self.nodes = nodes
        self.part2node = part2node
        self.replicas = replicas
        partition_power = 1
        while 2 ** partition_power < len(part2node):
            partition_power += 1
        if len(part2node) != 2 ** partition_power:
            raise Exception("part2node's length is not an "
                            "exact power of 2")
        self.partition_shift = 32 - partition_power

    def get_nodes(self, data_id):
        data_id = str(data_id)
        part = unpack_from('>I',
            md5(data_id).digest())[0] >> self.partition_shift
        node_ids = [self.part2node[part]]
        zones = [self.nodes[node_ids[0]]]
        for replica in xrange(1, self.replicas):
            while self.part2node[part] in node_ids and \
                    self.nodes[self.part2node[part]] in zones:
                part += 1
                if part >= len(self.part2node):
                    part = 0
            node_ids.append(self.part2node[part])
            zones.append(self.nodes[node_ids[-1]])
        return [self.nodes[n] for n in node_ids]


def build_ring(nodes, partition_power, replicas):
    begin = time()
    parts = 2 ** partition_power
    total_weight = \
        float(sum(n['weight'] for n in nodes.itervalues()))
    for node in nodes.itervalues():
        node['desired_parts'] = \
            parts / total_weight * node['weight']
    part2node = array('H')
    for part in xrange(2 ** partition_power):
        for node in nodes.itervalues():
            if node['desired_parts'] >= 1:
                node['desired_parts'] -= 1
                part2node.append(node['id'])
                break
        else:
            for node in nodes.itervalues():
                if node['desired_parts'] >= 0:
                    node['desired_parts'] -= 1
                    part2node.append(node['id'])
                    break
    shuffle(part2node)
    ring = Ring(nodes, part2node, replicas)
    print '%.02fs to build ring' % (time() - begin)
    return ring


def test_ring(ring):
    begin = time()
    DATA_ID_COUNT = 10000000
    node_counts = {}
    zone_counts = {}
    for data_id in xrange(DATA_ID_COUNT):
        for node in ring.get_nodes(data_id):
            node_counts[node['id']] = \
                node_counts.get(node['id'], 0) + 1
            zone_counts[node['zone']] = \
                zone_counts.get(node['zone'], 0) + 1
    print '%ds to test ring' % (time() - begin)
    total_weight = float(sum(n['weight'] for n in
                             ring.nodes.itervalues()))
    max_over = 0
    max_under = 0
    for node in ring.nodes.itervalues():
        desired = DATA_ID_COUNT * REPLICAS * \
            node['weight'] / total_weight
        diff = node_counts[node['id']] - desired
        if diff > 0:
            over = 100.0 * diff / desired
            if over > max_over:
                max_over = over
        else:
            under = 100.0 * (-diff) / desired
            if under > max_under:
                max_under = under
    print '%.02f%% max node over' % max_over
    print '%.02f%% max node under' % max_under
    max_over = 0
    max_under = 0
    for zone in set(n['zone'] for n in
                    ring.nodes.itervalues()):
        zone_weight = sum(n['weight'] for n in
                          ring.nodes.itervalues()
                          if n['zone'] == zone)
        desired = DATA_ID_COUNT * REPLICAS * \
            zone_weight / total_weight
        diff = zone_counts[zone] - desired
        if diff > 0:
            over = 100.0 * diff / desired
            if over > max_over:
                max_over = over
        else:
            under = 100.0 * (-diff) / desired
            if under > max_under:
                max_under = under
    print '%.02f%% max zone over' % max_over
    print '%.02f%% max zone under' % max_under


if __name__ == '__main__':
    PARTITION_POWER = 16
    REPLICAS = 3
    NODE_COUNT = 256
    ZONE_COUNT = 16
    nodes = {}
    while len(nodes) < NODE_COUNT:
        zone = 0
        while zone < ZONE_COUNT and len(nodes) < NODE_COUNT:
            node_id = len(nodes)
            nodes[node_id] = {'id': node_id, 'zone': zone,
                              'weight': 1.0 + (node_id % 2)}
            zone += 1
    ring = build_ring(nodes, PARTITION_POWER, REPLICAS)
    test_ring(ring)
```

The recorded run:

```
0.10s to build ring
162s to test ring
118581: Most data ids on one node, 1.19% over
115537: Least data ids on one node, 1.41% under
1878343: Most data ids in one zone, 0.18% over
1870880: Least data ids in one zone, 0.22% under
```
The largest per-node deviations are 1.19% over and 1.41% under, while the per-zone deviation stays within 0.22%.
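With the module in place, placing or locating an object is a single call; a usage sketch relying only on the definitions in the listing above (the data id is arbitrary):

```python
# Assumes Ring, build_ring() and the `nodes` dict from the listing
# above are in scope.
ring = build_ring(nodes, 16, 3)
for node in ring.get_nodes('my-photo.jpg'):
    print node['id'], node['zone'], node['weight']
```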
Summary
That completes the analysis of how the ring is built. Consistent hashing was introduced to improve monotonicity, reducing the number of data items that move when nodes are added; partitions were introduced to avoid moving too much data when the node count is small; replicas were introduced to eliminate single points of data and add redundancy; zones were introduced to provide partition tolerance; and weight was introduced to keep partition assignment balanced across nodes of different capacity.
Is the Ring's design final? The Swift developers have said in their blog posts that whenever a better data-mapping mechanism is found, the Ring will be torn down and rebuilt. So how the Ring evolves from here remains open, and we can all take part in pushing it forward.