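Editor's recommendation:
This article covers the shortcomings of current computing frameworks, the principles behind Ray, Ray's system architecture, its performance, and related topics.
This article comes from InfoQ and was edited and recommended by Anna of Huolongguo Software (火龙果软件).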
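Introduction: The next generation of AI applications will need to interact with their environment continuously and learn from those interactions. This places new demands on system performance and flexibility, and most existing machine learning frameworks cannot meet them. To address this, a UC Berkeley team developed a new distributed framework, Ray, and recently published the accompanying paper on arXiv: "Ray: A Distributed Framework for Emerging AI Applications".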
ÂÛÎĵÚÒ»×÷ÕßΪ Philipp Moritz ¼° Robert Nishihara£¬ÊÇ UC Berkeley
AMP Lab µÄ²©Ê¿Éú£¬¶ø Michael I. Jordan ºÍ Ion Stoica µÄÃû×ÖÒ²ºÕÈ»ÁÐÓÚÆäÖС£
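Michael I. Jordan: Distinguished Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at UC Berkeley. He is a member of the US National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences, and is described as the only machine learning researcher to hold all three memberships. In 2016, Semantic Scholar named him the "most influential computer scientist".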
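Ion Stoica: Professor of computer science at UC Berkeley, co-founder of AMPLab, and a core author of the scalable P2P protocol Chord, the in-memory cluster computing framework Spark, and the cluster resource management platform Mesos.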
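Shortcomings of current computing frameworks
Most AI applications today are built on the fairly restrictive supervised learning paradigm: a model is trained offline and then deployed to servers to make predictions online. As the field matures, machine learning applications increasingly need to run in dynamic environments, respond to changes in those environments, and take sequences of actions to reach a given goal. These requirements fit naturally into the reinforcement learning (RL) paradigm, that is, learning continuously in an uncertain environment.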
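RL applications differ from traditional supervised learning applications in three ways:
1) RL applications rely heavily on simulation to explore states and the consequences of actions. This takes a great deal of computation; in practice, a single application may need hundreds of millions of simulations.
2) The computation graph of an RL application is heterogeneous and changes dynamically. A single simulation can take anywhere from a few milliseconds to a few minutes, and the result of one simulation determines the parameters of future simulations.
3) Many RL applications, such as robot control or autonomous driving, must act quickly in response to a constantly changing environment.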
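We therefore need a computing framework that supports heterogeneous, dynamic computation graphs while handling millions of tasks per second at millisecond-level latency. Current frameworks either fail to meet the latency requirements of typical RL applications (MapReduce, Apache Spark, CIEL) or rely on static computation graphs (TensorFlow, Naiad, MPI, Canary).
RL applications demand flexibility, performance, and ease of development from the underlying system, and Ray was designed to meet these demands.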
Example

Pseudocode for a classic RL training application

Sample Python code implemented with Ray
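In Ray, remote functions and actors are declared with @ray.remote. Calling a remote function or an actor method immediately returns a future (an object ID). ray.get() retrieves the object behind that ID synchronously, and the ID can also be passed to later remote functions and actor methods to encode task dependencies. Each actor holds an environment object, self.env, whose state is shared across that actor's tasks.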
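The pattern in the paragraph above can be sketched without Ray installed. What follows is a rough standard-library analogy, not Ray's API: pool.submit stands in for calling a remote function, Future.result() for ray.get(), and the Simulator class for an actor. The names simulate, Simulator, and rollout are hypothetical. One difference to keep in mind is that Ray lets a future be passed directly to the next task, whereas this sketch resolves each future first.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a pool of workers; Ray would schedule across processes and nodes.
pool = ThreadPoolExecutor(max_workers=4)


def simulate(step):
    """Hypothetical stateless task: one simulation step that returns a reward."""
    return step * 2


class Simulator:
    """Hypothetical actor-like object: env persists between method calls."""

    def __init__(self):
        self.env = 0  # plays the role of self.env in the article

    def rollout(self, reward):
        self.env += reward  # each call sees the state left by the previous call
        return self.env


# Submitting work returns a future immediately (analogous to f.remote()).
futures = [pool.submit(simulate, s) for s in range(4)]

# Blocking on a future is the analogue of ray.get(); results feed later calls.
sim = Simulator()
totals = [sim.rollout(f.result()) for f in futures]
print(totals)
pool.shutdown()
```

The stateless simulate calls can run in any order, while the rollout calls must run one after another because they share self.env; that is the distinction between tasks and actor methods described above.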

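The figure above shows the task graph for a call to train_policy.remote(). Each remote function call and actor method call corresponds to a task in the graph. The figure contains two actors; the stateful edges between an actor's method calls indicate that those calls share the actor's mutable state. Control edges run from train_policy to the tasks it invokes. To train several policies in parallel, call train_policy.remote() multiple times.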
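Principles
To support the heterogeneous, dynamic workloads that RL applications bring, Ray adopts a dynamic task graph computation model similar to CIEL's. In addition to CIEL's task-parallel abstraction, Ray provides an actor abstraction on top of this execution model, which makes it possible to support stateful components such as third-party simulators.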
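Ray system architecture
To meet strict performance requirements while supporting dynamic computation graphs, Ray adopts a new, horizontally scalable distributed architecture. The architecture has two parts: an application layer and a system layer. The application layer implements the API and the computation model and executes the distributed computation tasks. The system layer handles task scheduling and data management to meet the performance and fault tolerance requirements.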

Ray system architecture
The architecture rests on two key ideas:
1) The Global Control Store (GCS). All of the system's control state is kept in the GCS, so every other component can be stateless. This simplifies fault tolerance (on failure, a component simply restarts and reads its latest state from the GCS) and also lets the other components scale horizontally (replicas or shards of a component share state through the GCS).
2) A bottom-up distributed scheduler. Drivers and workers submit tasks bottom-up to a local scheduler. The local scheduler can either schedule a task locally or pass it up to a global scheduler. Allowing local decisions lowers task latency, and reducing the load on the global scheduler raises system throughput.
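The bottom-up scheduling idea can be illustrated with a small sketch. This is a simplified model under stated assumptions, not Ray's implementation: the capacity threshold, the least-loaded placement policy, and the names LocalScheduler, GlobalScheduler, and submit are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LocalScheduler:
    node: str
    capacity: int  # hypothetical limit on locally queued tasks
    queue: list = field(default_factory=list)

    def has_room(self):
        return len(self.queue) < self.capacity


class GlobalScheduler:
    """Sees only the tasks that a local scheduler declines."""

    def __init__(self, local_schedulers):
        self.local_schedulers = local_schedulers
        self.forwarded = 0

    def place(self, task):
        self.forwarded += 1
        # Simplified policy (an assumption): pick the least-loaded node.
        target = min(self.local_schedulers, key=lambda s: len(s.queue))
        target.queue.append(task)
        return target.node


def submit(task, local, global_scheduler):
    """Bottom-up submission: try the local scheduler first, then escalate."""
    if local.has_room():
        local.queue.append(task)
        return local.node
    return global_scheduler.place(task)


nodes = [LocalScheduler("node-0", capacity=2), LocalScheduler("node-1", capacity=2)]
gs = GlobalScheduler(nodes)
placements = [submit(f"task-{i}", nodes[0], gs) for i in range(4)]
print(placements, gs.forwarded)
```

Only two of the four tasks ever reach the global scheduler here, which is the point of the design: the common case is decided locally, and the global scheduler handles only the overflow.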

Bottom-up distributed scheduler
Performance
1) Scalability and performance
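End-to-end scalability. The main benefit of the GCS is that it improves the system's horizontal scalability. Task throughput grows almost linearly with the number of nodes: Ray exceeds 1 million tasks per second at 60 nodes and continues to scale linearly to more than 1.8 million tasks per second at 100 nodes. The rightmost data point shows that Ray can process 100 million tasks in under a minute (54 s).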
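As a quick consistency check on these figures, the 54-second run can be converted into a throughput. The only assumption here is that the rightmost data point corresponds to the 100-node configuration, which the text implies but does not state outright.

```python
tasks = 100_000_000  # 100 million tasks, from the text
seconds = 54         # reported completion time
nodes = 100          # assumption: the rightmost data point is the 100-node run

throughput = tasks / seconds
per_node = throughput / nodes
print(f"{throughput:,.0f} tasks/s overall, {per_node:,.0f} tasks/s per node")
```

The result, roughly 1.85 million tasks per second, agrees with the "more than 1.8 million tasks per second" quoted for 100 nodes.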

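The global scheduler's main job is to keep load balanced across the whole system. The driver submitted 100K tasks on the first node, and the global scheduler distributed them evenly across the 21 available nodes.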

Object store performance. For large objects, single-client throughput exceeds 15 GB/s (red). For small objects, the object store reaches 18K IOPS (cyan), or about 56 microseconds per operation.
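The two small-object numbers are the same measurement seen from two sides. Assuming a single client issues operations one at a time, per-operation latency is simply the reciprocal of the operation rate:

```python
iops = 18_000  # small-object operations per second, from the text

# Assumption: operations are issued serially, so latency = 1 / rate.
latency_us = 1 / iops * 1_000_000
print(f"{latency_us:.1f} microseconds per operation")
```

This comes to about 55.6 microseconds, which rounds to the 56 microseconds quoted above.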

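2) Fault tolerance
Recovering from object failures. As worker nodes are killed, the surviving local schedulers automatically trigger reconstruction of the lost objects. During reconstruction, tasks originally submitted by the driver stall because their dependencies cannot be satisfied. Overall task throughput nevertheless stays stable, fully using the available resources until the lost dependencies have been rebuilt.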
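Reconstruction of lost objects can be sketched as lineage replay: if the system remembers which function and which inputs produced each object, a missing object can be recomputed on demand. The sketch below is a single-process illustration under that assumption; ObjectStore, run_task, and the object IDs are hypothetical names, and real lineage would live in a durable store rather than in a Python dict.

```python
class ObjectStore:
    def __init__(self):
        self.objects = {}    # object id -> value; entries vanish when a node dies
        self.lineage = {}    # object id -> (function, input ids); assumed durable
        self.executions = 0  # counts task executions, including re-executions

    def run_task(self, obj_id, fn, *dep_ids):
        """Record how obj_id is produced, then produce it."""
        self.lineage[obj_id] = (fn, dep_ids)
        return self._execute(obj_id)

    def _execute(self, obj_id):
        fn, dep_ids = self.lineage[obj_id]
        args = [self.get(d) for d in dep_ids]  # may recursively rebuild inputs
        self.executions += 1
        self.objects[obj_id] = fn(*args)
        return self.objects[obj_id]

    def get(self, obj_id):
        if obj_id not in self.objects:
            return self._execute(obj_id)  # lost object: rebuild from lineage
        return self.objects[obj_id]


store = ObjectStore()
store.run_task("a", lambda: 3)
store.run_task("b", lambda a: a + 1, "a")
store.run_task("c", lambda b: b * 10, "b")

# Simulate a node failure that loses two objects.
del store.objects["b"], store.objects["c"]
value = store.get("c")
print(value, store.executions)
```

Fetching "c" re-executes only the two lost tasks; "a" survived and is reused, so the total rises from three executions to five.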

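Fully transparent fault tolerance for distributed tasks. The dashed line shows the number of nodes in the cluster. The curves show the throughput of new tasks (cyan) and re-executed tasks (red). By 210 s, as more and more nodes are added back to the system, Ray fully recovers its initial task throughput.
Recovering from actor failures. By encoding each actor's method calls in the dependency graph, the same object reconstruction mechanism can be reused.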

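At t = 200 s, we stop 2 of the 10 nodes, so 400 of the cluster's 2000 actors must be recovered on the remaining nodes. (a) shows the extreme case in which no intermediate actor state is saved. The method calls of the lost actors must be re-executed serially (t = 210 to 330 s). The lost actors are automatically redistributed across the available nodes, and throughput fully recovers once reconstruction finishes. (b) shows the same workload with each actor automatically checkpointing once every 10 method calls. After the node failure, most of the reconstruction is done by executing checkpoint tasks that rebuild the actors' state (t = 210 to 270 s).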
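The difference between cases (a) and (b) can be illustrated with a toy actor that logs its method calls and optionally checkpoints. This is only a sketch of the general technique under simple assumptions: the CheckpointedActor class, its add method, and the in-memory log are hypothetical, and the 25-call workload is chosen purely for illustration.

```python
class CheckpointedActor:
    def __init__(self, checkpoint_every=None):
        self.total = 0
        self.log = []             # method-call log; stands in for the dependency graph
        self.checkpoint = (0, 0)  # (number of calls covered, saved total)
        self.checkpoint_every = checkpoint_every

    def add(self, x):
        self.total += x
        self.log.append(x)
        if self.checkpoint_every and len(self.log) % self.checkpoint_every == 0:
            self.checkpoint = (len(self.log), self.total)

    def recover(self):
        """Rebuild state after a crash; return how many calls were replayed."""
        covered, self.total = self.checkpoint
        replay = self.log[covered:]
        for x in replay:  # replay is serial: each call depends on the previous one
            self.total += x
        return len(replay)


def crash_and_recover(checkpoint_every):
    actor = CheckpointedActor(checkpoint_every)
    for x in range(1, 26):  # 25 method calls before the failure
        actor.add(x)
    actor.total = None      # in-memory state is lost with the node
    replayed = actor.recover()
    return actor.total, replayed


print(crash_and_recover(None))  # case (a): no checkpoints
print(crash_and_recover(10))    # case (b): checkpoint every 10 calls
```

Both runs rebuild the same final state, but without checkpoints all 25 calls are replayed, while with a checkpoint every 10 calls only the 5 calls after the last checkpoint are. That bounded replay is why recovery in (b) finishes sooner than in (a).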
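GCS replication overhead. To make the GCS fault tolerant, we replicate every database shard. When a client writes to a GCS shard, it duplicates the write to all replicas. Even when we artificially make the GCS the bottleneck of the workload by reducing the number of GCS shards, the overhead of two-way replication is less than 10%.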
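The sharding and replication scheme described here can be sketched as follows. It is a minimal in-memory model, not the GCS itself: ShardedStore, the CRC32-based shard choice, and the failed_replica parameter are assumptions made for illustration.

```python
import zlib


class ShardedStore:
    def __init__(self, num_shards, replicas=2):
        # Each shard is a list of replica dicts.
        self.shards = [[{} for _ in range(replicas)] for _ in range(num_shards)]

    def _shard(self, key):
        # Stable hash so a given key always maps to the same shard.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def put(self, key, value):
        for replica in self._shard(key):  # the client writes to every replica
            replica[key] = value

    def get(self, key, failed_replica=None):
        # Read from the first replica that is not marked as failed.
        for i, replica in enumerate(self._shard(key)):
            if i != failed_replica:
                return replica[key]
        raise KeyError(key)


gcs = ShardedStore(num_shards=4, replicas=2)
gcs.put("task:42", "scheduled on node-3")
print(gcs.get("task:42", failed_replica=0))
```

Each write costs one extra copy per additional replica, which is the overhead the experiment measures; in exchange, a read still succeeds when one replica of the shard is unavailable.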
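3) RL applications
We implemented two RL algorithms on Ray and compared them with systems built specifically for those algorithms. Ray matches or even outperforms the special-purpose systems. In addition, distributing these algorithms across a cluster with Ray requires changing only a few lines of code in the algorithm implementation.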
ES (Evolution Strategies)

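Time for the Ray and reference-system ES implementations to reach a score of 6000 on the Humanoid v1 task.
The ES implementation on Ray scales well to 8192 cores, whereas the special-purpose system fails to run beyond 1024 cores. On 8192 cores, we achieve a median time of 3.7 minutes, twice as fast as the best previously reported result.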
PPO (Proximal Policy Optimization)
To evaluate Ray's performance on a single node and on smaller RL workloads, we implemented PPO on Ray and compared it with an OpenMPI implementation.

Time for the MPI and Ray PPO implementations to reach a score of 6000 on the Humanoid v1 task.
The Ray implementation of PPO outperforms the specialized MPI implementation while using fewer GPUs.
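Controlling a simulated robot
Experiments show that Ray can meet the soft real-time requirements of controlling a simulated robot in real time. A Ray driver runs the simulated robot and takes actions at fixed time intervals, ranging from 1 ms to 30 ms, to simulate different real-time requirements.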
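Future work
Because the workloads are so general, specialized optimizations are hard. For example, scheduling decisions must be made without full knowledge of the computation graph, so scheduling in Ray may require more sophisticated mechanisms. In addition, storing the lineage of every task requires a garbage collection policy to bound storage costs in the GCS; this feature is currently under development.
When GCS overhead becomes a bottleneck, the system can be scaled by adding more GCS shards. At present the number of GCS shards and global schedulers must be set manually; adaptive algorithms that adjust these numbers automatically are planned for the future. Given the advantages the GCS design brings to the system, the authors believe that centralizing control state is a key design element for future distributed systems.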