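What Is PaddlePaddle
PaddlePaddle is an easy-to-use, efficient, flexible, and scalable deep learning platform. It was originally developed at Baidu to apply deep learning to Baidu products, where it has been in use since 2014.
More than 50 innovations have been built with PaddlePaddle across the 15 Baidu products it supports, ranging from the search engine and online advertising to Q&A and system security.
In September 2016, Baidu open sourced PaddlePaddle, and it quickly attracted many contributors from outside Baidu.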
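Why Run PaddlePaddle on Kubernetes
PaddlePaddle is designed to be slim and independent of the computing infrastructure. Users can run it on top of Hadoop, Spark, Mesos, Kubernetes, and other frameworks. Our strong interest in Kubernetes comes from its flexibility, efficiency, and rich feature set.
While applying PaddlePaddle in various Baidu products, we found that it is used in two main ways: research and production. Research data does not change often, and the focus is on fast experiments that reach the expected scientific measurement. Production data usually comes from log messages generated by web services, and it changes often.
A successful deep learning project includes both the research and the data processing pipeline, with many parameters to tune. Many engineers work on different parts of the project at the same time.
To keep the project easy to manage and to use hardware resources efficiently, we want to run all parts of the project on the same infrastructure platform.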
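The platform should provide:
Fault tolerance. It should abstract each stage of the pipeline as a service made up of many processes, which provide high throughput and robustness through redundancy.
Auto-scaling. In the daytime there are usually many active users, so the platform should scale out the online services. At night, the platform should free some resources for deep learning experiments.
Job packing and isolation. It should be able to place a PaddlePaddle trainer process that needs GPUs, a backend service that needs a lot of memory, and a CephFS process that needs disk I/O on the same node, so that the node's hardware is fully used.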
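What we want is a platform that runs the deep learning system, the web server (for example Nginx), the log collector (for example fluentd), the distributed queue service (for example Kafka), the log joiner, and other data processors written with Storm, Spark, and Hadoop MapReduce, all on the same cluster. We want to run every job on that cluster, online and offline, production and experimental, so that the cluster is fully used, because different kinds of jobs need different hardware resources.
We chose a container-based solution because the overhead introduced by virtual machines conflicts with our goals of efficiency and utilization.
Based on our study of different container-based solutions, Kubernetes fits our requirements best.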
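Distributed Training on Kubernetes
PaddlePaddle supports distributed training natively. A PaddlePaddle cluster has two roles: parameter server and trainer. Each parameter server process maintains one shard of the global model. Each trainer keeps its own local copy of the model and updates it with its own local data. During training, trainers send model updates to the parameter servers, and the parameter servers aggregate these updates so that trainers can synchronize their local copies with the global model.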

Figure 1: The model is partitioned into two shards, each managed by one of two parameter servers.
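The interaction between trainers and sharded parameter servers can be sketched in a few lines of Python. This is a toy illustration, not PaddlePaddle code: the names `split_into_shards`, `ParameterServer`, and `train_step` are hypothetical, plain lists stand in for model parameters, and averaging the gradients is only one assumed way of aggregating trainer updates.

```python
# Toy illustration only: plain Python lists stand in for model parameters.

def split_into_shards(params, num_servers):
    """Split a non-empty flat parameter list into contiguous shards."""
    size = (len(params) + num_servers - 1) // num_servers
    return [params[i:i + size] for i in range(0, len(params), size)]


class ParameterServer:
    """Holds one shard of the global model and aggregates trainer updates."""

    def __init__(self, shard):
        self.shard = list(shard)

    def apply_updates(self, gradients, learning_rate):
        # gradients: one list per trainer, each aligned with this shard.
        # Assumption: updates are aggregated by simple averaging.
        for i in range(len(self.shard)):
            mean_grad = sum(g[i] for g in gradients) / len(gradients)
            self.shard[i] -= learning_rate * mean_grad


def train_step(servers, trainer_gradients, learning_rate=0.1):
    """Every trainer sends the matching gradient slice to every server."""
    offset = 0
    for server in servers:
        n = len(server.shard)
        slices = [g[offset:offset + n] for g in trainer_gradients]
        server.apply_updates(slices, learning_rate)
        offset += n
    # Trainers would then pull all shards to refresh their local copies.
    return [w for server in servers for w in server.shard]


# Two servers share a four-parameter model; two trainers report gradients.
servers = [ParameterServer(s) for s in split_into_shards([1.0, 2.0, 3.0, 4.0], 2)]
new_model = train_step(servers, [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]])
```

Because each server only handles the slice of the gradient that belongs to its shard, the aggregation work is spread over all servers instead of landing on a single process.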
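Some other approaches use a set of parameter servers to collectively hold one very large model in the CPU memory of multiple hosts. In practice, we rarely have models that large, because GPU memory limits would make handling them very inefficient. In our configuration, multiple parameter servers exist mostly for fast communication. If a single parameter server worked with all trainers, it would have to aggregate the gradients from every trainer and would become a bottleneck. In our experience, using the same number of trainers and parameter servers is an empirically efficient configuration, and we usually run one trainer and one parameter server as a pair on the same node. In the following Kubernetes job configuration, we start a job that runs N pods, each containing one parameter server and one trainer process.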
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: paddle-cluster-job
spec:
  parallelism: 3   # number of pods running at the same time
  completions: 3   # number of pods that must finish successfully
  template:
    metadata:
      name: paddle-cluster-job
    spec:
      volumes:
      - name: jobpath
        hostPath:
          path: /home/admin/efs
      containers:
      - name: trainer
        image: your_repo/paddle:mypaddle
        # start.sh is the entrypoint of every pod
        command: ["/bin/bash", "-c", "/root/start.sh"]
        env:
        - name: JOB_NAME
          value: paddle-cluster-job
        - name: JOB_PATH
          value: /home/jobpath
        - name: JOB_NAMESPACE
          value: default
        volumeMounts:
        - name: jobpath
          mountPath: /home/jobpath
      restartPolicy: Never
```
In this configuration, parallelism and completions are both set to 3, so the job starts 3 PaddlePaddle pods at the same time and is complete once all three pods have finished.

Figure 2: Job A with three pods and Job B with one pod, running on two nodes.
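The entrypoint of each pod is start.sh. It downloads data from a storage service so that the trainer can read it quickly from the pod-local disk. Once the download completes, it runs a Python script, start_paddle.py, which starts a parameter server, waits until the parameter servers of all pods are ready, and then starts the trainer process in the pod.
This wait is necessary because every trainer needs to talk to all parameter servers, as shown in Figure 1. The Kubernetes API lets trainers check the status of pods, so the Python script waits until the status of all parameter servers has changed to "running" before it starts the trainer process.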
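The wait described above amounts to a polling loop. The sketch below is not the actual start_paddle.py: `wait_for_pods_running` and its `list_pod_phases` argument are hypothetical names, and the callable is assumed to wrap whatever Kubernetes API request returns the phase of each pod in the job. Injecting it keeps the example runnable without a cluster.

```python
import time


def wait_for_pods_running(list_pod_phases, expected_pods,
                          poll_seconds=5.0, timeout_seconds=600.0,
                          sleep=time.sleep):
    """Poll until at least `expected_pods` pods report the phase "Running".

    list_pod_phases is a hypothetical callable that returns the current
    phase of every pod in the job, for example ["Pending", "Running"].
    """
    waited = 0.0
    while True:
        running = sum(1 for phase in list_pod_phases() if phase == "Running")
        if running >= expected_pods:
            return running
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"only {running} of {expected_pods} pods are running"
            )
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with canned responses instead of a real cluster; sleeping is stubbed.
_responses = iter([
    ["Pending", "Pending", "Running"],
    ["Running", "Running", "Pending"],
    ["Running", "Running", "Running"],
])
demo_running = wait_for_pods_running(lambda: next(_responses), 3,
                                     sleep=lambda seconds: None)
```

A pod in the Running phase has started its containers, which does not by itself prove that the parameter server inside is already accepting connections, so a real script may also want to retry the first connection from the trainer.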
In general, the mapping from data shards to pods/trainers is static. To run N trainers, we partition the data into N shards and statically assign one shard to each trainer. Here again we rely on the Kubernetes API to list the pods in the job and number the pods/trainers from 1 to N, so that the i-th trainer reads the i-th data shard.
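A minimal sketch of this static assignment follows. The function name `shard_for_pod`, the pod names, and the shard file names are all hypothetical, and sorting the pod names is an assumed way to make every pod derive the same numbering from the same pod list.

```python
def shard_for_pod(my_pod_name, job_pod_names, shards):
    """Return (trainer_number, shard) for this pod under a static mapping.

    Every pod sorts the same list of pod names, so all pods agree on the
    numbering without talking to each other. Trainers are numbered 1..N.
    """
    ordered = sorted(job_pod_names)
    if len(ordered) != len(shards):
        raise ValueError("static mapping needs exactly one shard per trainer")
    index = ordered.index(my_pod_name)  # raises ValueError if not in the job
    return index + 1, shards[index]


# Hypothetical pod names, as they might be returned by listing the job's pods.
pods = ["paddle-cluster-job-x7k2p", "paddle-cluster-job-b3m9q",
        "paddle-cluster-job-m1c4z"]
shards = ["data-00001", "data-00002", "data-00003"]
trainer_number, my_shard = shard_for_pod("paddle-cluster-job-m1c4z", pods, shards)
```

The weakness of this scheme is visible in the code: the number of shards must match the number of pods, and a pod that disappears leaves its shard unread.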
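Training data is usually served from a distributed file system. In practice, we use CephFS on our on-premise clusters and Amazon Elastic File System on AWS. If you are interested in building a Kubernetes cluster that runs distributed PaddlePaddle training jobs, please read this tutorial.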
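What's Next
We are working on making PaddlePaddle run more smoothly on Kubernetes.
As you may have noticed, trainer scheduling currently relies entirely on Kubernetes and a static partition map. This approach is easy to start with, but it can cause some efficiency problems. First, slow or dead trainers block the entire job, and there is no controlled preemption or rescheduling after the initial deployment. Second, the resource allocation is static, so if Kubernetes has more resources available than we anticipated, we have to change the resource requirements by hand. This is tedious work and does not fit our goals of efficiency and utilization.
To solve these problems, we will add a PaddlePaddle master that understands the Kubernetes API, can add and remove resource capacity dynamically, and dispatches shards to trainers in a more dynamic way. The PaddlePaddle master uses etcd as fault-tolerant storage for the dynamic mapping from shards to trainers. Even if the master crashes, the mapping is not lost: Kubernetes can restart the master and the job keeps running.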
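The idea of keeping the shard-to-trainer mapping outside the master can be illustrated with a small sketch. This is not the PaddlePaddle master design: `ShardMaster`, its methods, and the `todo`/`pending`/`done` states are hypothetical, and an ordinary dict stands in for etcd so the example runs anywhere.

```python
class ShardMaster:
    """Hands out shards on request and keeps the mapping in an external store.

    `store` is any dict-like object; here a plain dict stands in for etcd.
    The master itself holds no state, so a restarted master can continue
    from whatever the store contains.
    """

    def __init__(self, store, shards=()):
        self.store = store
        for shard in shards:
            # Do not overwrite entries that a previous master already wrote.
            self.store.setdefault(shard, {"state": "todo", "trainer": None})

    def next_shard(self, trainer):
        """Assign the first unassigned shard to `trainer`, or return None."""
        for shard in sorted(self.store):
            if self.store[shard]["state"] == "todo":
                self.store[shard] = {"state": "pending", "trainer": trainer}
                return shard
        return None

    def mark_done(self, shard):
        trainer = self.store[shard]["trainer"]
        self.store[shard] = {"state": "done", "trainer": trainer}

    def release_trainer(self, trainer):
        """Return the unfinished shards of a slow or dead trainer to the pool."""
        for shard, entry in list(self.store.items()):
            if entry["state"] == "pending" and entry["trainer"] == trainer:
                self.store[shard] = {"state": "todo", "trainer": None}


store = {}
master = ShardMaster(store, ["shard-0", "shard-1", "shard-2"])
first = master.next_shard("trainer-a")    # trainer-a takes shard-0
second = master.next_shard("trainer-b")   # trainer-b takes shard-1
master.mark_done(first)
master.release_trainer("trainer-b")       # trainer-b died; shard-1 is free again

restarted = ShardMaster(store)            # a new master reads the same store
third = restarted.next_shard("trainer-a") # shard-1 goes to trainer-a instead
```

With this kind of dispatch, a trainer that finishes early simply asks for another shard, and adding or removing trainers does not require repartitioning the data.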
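Another possible improvement is better PaddlePaddle job configuration. Our recommendation to use the same number of trainers and parameter servers comes mainly from experience with special-purpose clusters. The strategy works well on clusters that run only PaddlePaddle jobs, but it may not be the best choice on general-purpose clusters that run many kinds of jobs.
PaddlePaddle trainers can use multiple GPUs to speed up computation, but GPUs are not yet a first-class resource in Kubernetes, so we have to manage them semi-manually. We would love to work with the Kubernetes community to improve GPU support and make sure PaddlePaddle runs at its best on Kubernetes.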
You can:
Download Kubernetes
Get involved with the Kubernetes project on GitHub
Post or answer questions on Stack Overflow
Connect with the community on Slack
Follow @Kubernetesio on Twitter for the latest updates
Editor's note: This article was published jointly by the Baidu deep learning team and the CoreOS etcd team. With the authors' permission, InfoQ translated it into Chinese and published it.