Building a Cloud-Native AI Platform on Public Cloud: Exploration and Practice
 2021-7-29
 
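Editor's recommendation:
This article describes the current state of AI workloads on public cloud, the corresponding technology choices, and the problems they face. It closes with an outlook on fully elastic AI infrastructure, based on an analysis of trends in the open-source community and the industry.
This article comes from CSDN and was edited and recommended by Linda of Huolongguo Software (火龙果软件).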

Background and Current State

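Deep learning keeps producing new model architectures. Since GPT-1 and BERT appeared in 2018, model parameter counts have grown exponentially. Transformer-style architectures now shine not only in natural language processing but are also spreading like wildfire through computer vision and other fields. Demand for compute and GPU memory will therefore only intensify, yet the hardware performance offered by vendors such as Nvidia is not improving at the same pace. A figure in the original article shows the gap between the two: the red line tracks model parameter scale, which is currently growing about 120x per year, while the green line tracks GPU memory capacity, which grows only about 2x per year.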
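To see how fast the two curves diverge, the snippet below simply compounds the two yearly growth rates quoted above. Holding both rates constant year after year is our simplifying assumption, not a claim made in the article.

```python
# Growth rates quoted in the text (multiplier per year).
PARAM_GROWTH_PER_YEAR = 120   # model parameter scale
MEMORY_GROWTH_PER_YEAR = 2    # GPU memory capacity


def growth_gap(years: int) -> float:
    """Factor by which parameter scale outgrows GPU memory after `years` years,
    assuming (hypothetically) that both rates stay constant."""
    return (PARAM_GROWTH_PER_YEAR / MEMORY_GROWTH_PER_YEAR) ** years


for y in (1, 2, 3):
    print(f"after {y} year(s): parameters outgrow memory by {growth_gap(y):,.0f}x")
```

Under that assumption, a model whose parameters fit on one GPU today would need about 60 GPUs' worth of memory for its parameters alone a year later. That is the pressure behind the shift to distributed training described next.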

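As a result, distributed training has become the mainstream way to train models. This holds in computer vision and natural language processing, and also in search, advertising, and recommendation, which are widely deployed across the internet industry.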

Deep learning frameworks have flourished in the same way. Established frameworks such as TensorFlow, PyTorch, and Keras remain very popular, and newer ones keep emerging, such as Microsoft's DeepSpeed and Baidu's Paddle.

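In short, AI is now widely deployed across industry. Search, advertising, and recommendation go without saying. In vision and natural language processing, deep-learning methods have become the state of the art. In games and robotics, reinforcement learning is gradually moving into production. To meet demand for ever more complex models, new hardware and frameworks keep appearing. One more trend is very clear: many AI workloads are moving to the public cloud, hoping to use its elastic computing capacity to lower compute costs and improve efficiency.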

AI Adoption on Public Cloud

Next, we share some observations about cloud-native AI gathered while serving customers on public cloud.

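Cloud-native AI on public cloud is gradually being adopted. Adoption covers both sparse workloads such as search, advertising, and recommendation and dense workloads such as computer vision, with recommendation scenarios in the internet industry accounting for a relatively large share. Search/advertising/recommendation scenarios are complex and have strict end-to-end latency requirements, which makes migrating them relatively expensive. For that reason, most of these workloads, especially offline training, still cannot make good use of the cloud's elasticity.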

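Meanwhile, from the framework perspective, the vast majority of workloads still use TensorFlow. This correlates with the previous observation: TensorFlow still holds a dominant share in search, advertising, and recommendation. PyTorch, however, is being used more and more, especially in computer vision and natural language processing.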

Tencent Cloud Native AI Service

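Building on these observations and practices, the Tencent Cloud Native team has built a productized cloud-native AI solution around Kubeflow for Tencent Cloud Container Service. It is now in free beta. You are welcome to contact us to try it, and any suggestions you have will be valuable motivation for us.

Tencent Cloud Native AI Service gives users fast delivery and management of AI environments, elastic Jupyter services, and distributed model serving. Product features such as model management are being built out step by step.

In addition, we are working on the bandwidth bottleneck from both ends. On the storage side, we are partnering with the Tencent Cloud COS team on optimizations built on the GooseFS caching engine. On the compute side, we are partnering with Tencent Youtu Lab, drawing on its years of experience in training and inference, to release a highly optimized deep learning framework. We will make full use of cloud-native AI's advantage as a unified entry point: we will build the platform together with multiple Tencent Cloud teams, provide out-of-the-box product capabilities, and give back to customers and the community.

More best practices for cloud-native AI will be published in our upcoming series "Cloud-Native AI Standard Guide" and "Cloud-Native AI Frontier Observations".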

Adoption in Practice

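Having covered how cloud-native AI is being adopted on public cloud, we now share the typical technology choices for running AI workloads there, starting with the training stack. At the bottom are the cloud servers, usually virtual machines or bare-metal machines provided by the cloud vendor. Most workloads now run on Kubernetes container services, so the compute servers are typically organized into a Kubernetes cluster for resource management and scheduling. On top of that, training samples and models are usually kept in object storage, file storage, or block storage. Where read/write pressure is moderate, object storage is the most common choice. Compared with the alternatives it supports tiering, compression, and archiving, which makes it cost-effective. Where read/write pressure is heavy, file storage and block storage are used more widely.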

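To push data throughput as high as possible, a compute-side cache is sometimes added for acceleration. Options include Alluxio and GooseFS, Tencent Cloud's cache acceleration product for object storage. Caching remote data inside the compute cluster avoids the cost of pulling it from remote storage, and in some scenarios this significantly speeds up training.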
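The core idea is to serve repeat reads from a local copy and go to object storage only on a miss. That idea can be sketched as a read-through LRU cache. This toy class illustrates the caching pattern only; it is not the Alluxio or GooseFS API, and `fake_object_storage` is a hypothetical stand-in for a real object-storage client.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy LRU read-through cache in front of a slow remote store."""

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity_bytes: int):
        self._fetch_remote = fetch_remote
        self._capacity = capacity_bytes
        self._used = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._entries:               # fast path: served from the cluster
            self._entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        data = self._fetch_remote(key)         # slow path: pull from object storage
        self._entries[key] = data
        self._used += len(data)
        # Evict least recently used entries (never the one just added) until it fits.
        while self._used > self._capacity and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self._used -= len(evicted)
        return data


remote_calls = []


def fake_object_storage(key: str) -> bytes:
    remote_calls.append(key)
    return b"x" * 1024  # pretend every sample is 1 KiB


cache = ReadThroughCache(fake_object_storage, capacity_bytes=10 * 1024)
for epoch in range(3):
    for i in range(8):
        cache.read(f"train/sample-{i}")
print(cache.hits, cache.misses, len(remote_calls))  # 16 8 8
```

Training reads the same samples every epoch. As long as the working set fits in the cache, only the first epoch pays the remote-fetch cost. When the working set does not fit, LRU eviction under a cyclic scan can drop the hit rate to zero, which is one reason the cache tier needs to be sized against the dataset.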

The distributed-training infrastructure sits on top of the servers and storage, and here Kubeflow is currently the most widely used option. With Kubeflow, users can easily create distributed training jobs for frameworks such as TensorFlow, PyTorch, and Horovod. Kubeflow also works well with Kubernetes' various features and supports schedulers such as Volcano.
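As an illustration of such a job, the sketch below builds a Kubeflow `TFJob` manifest for the PS/Worker layout as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The job name, image, replica counts, and the choice to give GPUs only to Workers are placeholder assumptions. Setting `schedulerName: volcano` on the pod templates is one common way to hand pods to Volcano; whether gang scheduling is also in effect depends on how the training operator is deployed.

```python
import json


def tfjob_manifest(name: str, image: str, ps: int, workers: int,
                   gpus_per_worker: int = 1, scheduler: str = "volcano") -> dict:
    """Build a Kubeflow TFJob (kubeflow.org/v1) with PS and Worker replicas."""
    def replica(count: int, gpus: int) -> dict:
        # The TFJob operator expects the training container to be named "tensorflow".
        container = {"name": "tensorflow", "image": image}
        if gpus:
            container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {"spec": {"schedulerName": scheduler, "containers": [container]}},
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {"tfReplicaSpecs": {
            "PS": replica(ps, gpus=0),                       # CPU-only PS (an assumption)
            "Worker": replica(workers, gpus=gpus_per_worker),
        }},
    }


# Placeholder job name and image.
manifest = tfjob_manifest("demo-ps-job", "registry.example.com/train:latest", ps=2, workers=4)
print(json.dumps(manifest, indent=2))  # e.g. pipe into `kubectl apply -f -`
```

PyTorch and Horovod jobs follow the same pattern through Kubeflow's `PyTorchJob` and `MPIJob` resources, each with its own replica types.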

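Kubeflow already lets users train and evaluate models, but using it directly still causes some problems. Data dependencies may live in different data systems, so data-processing logic can become very complex. To simplify the workflow for algorithm engineers and improve their experience, a pipeline system is usually built on top to chain together the stages of the machine learning workflow. It comes with a convenient programmable environment that helps algorithm engineers implement business logic faster. Typical choices at this layer include Jupyter, Argo Workflows, Airflow, and Kubeflow. From the user's perspective, algorithm engineers only need to care about the top layer, the experiment environment and the pipeline system. The infrastructure layers below are provided by the infrastructure team and the public cloud. This layering reduces the cognitive load on engineers in each role and improves efficiency.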
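Workflow engines such as Argo Workflows, Airflow, and Kubeflow Pipelines all model a pipeline as a DAG of steps. The toy runner below shows only that core idea, using Python's standard `graphlib` (Python 3.9+). The step names and the shared `context` dict are hypothetical and do not correspond to any of those systems' APIs.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List, Tuple

Step = Callable[[dict], None]


def run_pipeline(steps: Dict[str, Step],
                 deps: Dict[str, Iterable[str]]) -> Tuple[List[str], dict]:
    """Run each step after its dependencies, sharing one context dict."""
    order = list(TopologicalSorter(deps).static_order())
    context: dict = {}
    for name in order:
        steps[name](context)
    return order, context


# Hypothetical steps; each reads what earlier steps put into the context.
steps = {
    "extract": lambda ctx: ctx.update(raw=[3, 1, 2]),
    "preprocess": lambda ctx: ctx.update(features=sorted(ctx["raw"])),
    "train": lambda ctx: ctx.update(model=sum(ctx["features"]) / len(ctx["features"])),
    "evaluate": lambda ctx: ctx.update(error=abs(ctx["model"] - 2.0)),
}
deps = {"preprocess": {"extract"}, "train": {"preprocess"}, "evaluate": {"train"}}
order, ctx = run_pipeline(steps, deps)
print(order)  # ['extract', 'preprocess', 'train', 'evaluate']
```

Real engines add what this sketch leaves out: each step runs in its own container, independent branches run in parallel, and intermediate artifacts go to shared storage rather than an in-memory dict.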

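Next, we use distributed training as an example to describe problems that can come up during technology selection and how to solve them. Distributed training falls into two modes, depending on how parameters are updated: the Parameter Server (PS)/Worker mode and the AllReduce mode. In PS mode, two roles take part in training: PS and Worker. Workers do most of the computation and send the computed gradients to the corresponding PS. The PS updates its parameters and sends them back to the Workers. In AllReduce mode, each Worker holds a full copy of the model and receives different data. Workers exchange gradients with one another to update and synchronize them.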
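The two update schemes can be compared in a single-process simulation. `ps_step` mirrors the PS flow: push gradients, average, update, send back. `ring_allreduce` implements the classic ring algorithm (reduce-scatter followed by all-gather), which collective libraries such as NCCL use in some configurations. Both functions are illustrative sketches over plain Python lists, not how any framework actually lays out tensors.

```python
from typing import List

Vector = List[float]


def ps_step(params: Vector, worker_grads: List[Vector], lr: float) -> Vector:
    """PS mode: workers push gradients; the PS averages them, applies an SGD
    update and returns the new parameters to every worker."""
    n = len(worker_grads)
    avg = [sum(g[i] for g in worker_grads) / n for i in range(len(params))]
    return [p - lr * g for p, g in zip(params, avg)]


def ring_allreduce(grads: List[Vector]) -> List[Vector]:
    """AllReduce mode: n workers in a ring each end up with the element-wise sum.
    Phase 1 (reduce-scatter) and phase 2 (all-gather) each take n - 1 steps."""
    n, size = len(grads), len(grads[0])
    bounds = [(c * size // n, (c + 1) * size // n) for c in range(n)]  # chunk c's slice
    buf = [list(g) for g in grads]  # every worker's local copy

    def ring_pass(chunk_of, combine) -> None:
        # Snapshot all outgoing chunks first: in a real ring the sends happen in parallel.
        sends = [(r, chunk_of(r), buf[r][slice(*bounds[chunk_of(r)])]) for r in range(n)]
        for r, c, data in sends:
            lo, hi = bounds[c]
            dst = buf[(r + 1) % n]  # always send to the right-hand neighbour
            dst[lo:hi] = combine(dst[lo:hi], data)

    for s in range(n - 1):  # reduce-scatter: worker r ends up owning chunk (r + 1) % n
        ring_pass(lambda r: (r - s) % n, lambda old, new: [a + b for a, b in zip(old, new)])
    for s in range(n - 1):  # all-gather: pass the finished chunks around the ring
        ring_pass(lambda r: (r + 1 - s) % n, lambda old, new: new)
    return buf


grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
summed = ring_allreduce(grads)
print(summed[0])  # [111.0, 222.0, 333.0, 444.0], identical on every worker
averaged = [v / len(grads) for v in summed[0]]   # what each worker applies
new_params = ps_step([0.0] * 4, grads, lr=0.1)   # PS mode reaches the same average
```

Dividing the all-reduced sum by the number of workers gives the same averaged gradient that the PS computes. The difference is where the traffic goes. In PS mode it converges on the PS nodes. In the ring, each worker sends and receives roughly 2(N-1)/N times the model size regardless of N.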

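Both training modes share some problems. First, when a model has many parameters, communicating gradients or parameters demands a lot of network bandwidth, and the network becomes the bottleneck of training. This is especially pronounced for dense models. Second, a cluster that runs deep learning jobs usually runs several of them at once. Every job needs to access storage, so storage bandwidth can become a bottleneck as well. In short, both the network and storage can run short of bandwidth.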
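A back-of-the-envelope estimate shows why bandwidth becomes the bottleneck for models with many parameters. With ring AllReduce, each worker sends (and receives) about 2(N-1)/N times the gradient size per synchronization. The function below turns that into seconds and ignores latency and any overlap with computation. The 1-billion-parameter FP32 model and the 8 workers are hypothetical. The two bandwidths bracket the 20-50 Gbps public-cloud intranet range the article cites.

```python
def ring_allreduce_seconds(num_params: int, workers: int, gbps: float,
                           bytes_per_param: int = 4) -> float:
    """Bandwidth-only time for one ring AllReduce; latency and overlap are ignored."""
    model_bytes = num_params * bytes_per_param            # FP32 gradients by default
    per_worker_bytes = 2 * (workers - 1) / workers * model_bytes
    return per_worker_bytes / (gbps * 1e9 / 8)            # Gbit/s -> bytes/s


# Hypothetical model: 1 billion FP32 parameters synchronized across 8 workers.
for gbps in (20, 50):
    t = ring_allreduce_seconds(1_000_000_000, workers=8, gbps=gbps)
    print(f"{gbps} Gbps: {t:.2f} s per gradient sync")
```

Under these assumptions each synchronization takes about 2.8 s at 20 Gbps and 1.1 s at 50 Gbps. If the forward and backward pass takes less time than that, the network, not the GPU, sets the training speed.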

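On public cloud, cloud servers usually do not come with RDMA NICs, and intranet bandwidth is typically around 20-50 Gbps. In such an environment, optimizations such as gradient compression are usually needed to reduce the bandwidth pressure of gradient synchronization. Gradient compression shrinks the volume of each synchronization. In addition, the AllReduce implementation can be swapped for one that is friendlier to low-bandwidth environments, such as 2DReduce. Tencent Cloud's Ti-Horovod implements both, and in low-bandwidth settings it performs better than stock Horovod.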
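The article does not say which compression scheme Ti-Horovod uses. As a generic illustration, here is top-k sparsification with error feedback, a well-known form of gradient compression. Each worker sends only the k largest-magnitude gradient entries and keeps the remainder as a local residual that is added back at the next step, so small updates are delayed rather than lost. The function names are our own.

```python
import heapq
from typing import Dict, List, Tuple


def topk_compress(grad: List[float], residual: List[float],
                  k: int) -> Tuple[Dict[int, float], List[float]]:
    """Return the k largest-magnitude entries of grad + residual as {index: value},
    plus the new residual (everything that was not sent this step)."""
    acc = [g + r for g, r in zip(grad, residual)]          # error feedback
    keep = heapq.nlargest(k, range(len(acc)), key=lambda i: abs(acc[i]))
    sparse = {i: acc[i] for i in keep}
    new_residual = [0.0 if i in sparse else v for i, v in enumerate(acc)]
    return sparse, new_residual


def decompress(sparse: Dict[int, float], size: int) -> List[float]:
    """Rebuild a dense gradient with zeros for the entries that were not sent."""
    dense = [0.0] * size
    for i, v in sparse.items():
        dense[i] = v
    return dense


grad = [0.01, -0.9, 0.05, 0.4, -0.02, 0.003]
residual = [0.0] * len(grad)
sparse, residual = topk_compress(grad, residual, k=2)
print(sparse)    # {1: -0.9, 3: 0.4}
print(residual)  # [0.01, 0.0, 0.05, 0.0, -0.02, 0.003]
```

Sending k (index, value) pairs instead of the full vector cuts traffic by roughly size/(2k) when indices and values have the same width. Different workers pick different indices, so sparse gradients are usually exchanged with an all-gather rather than a plain AllReduce.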

If training runs on bare-metal servers instead, RDMA NICs can be used to accelerate gradient exchange. Such a training environment has a VPC NIC for talking to cloud products such as object storage, plus a RoCE NIC and a GPU. Some adaptation is therefore needed so that training samples are pulled through the VPC NIC while gradient synchronization goes over the RDMA NIC.
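On NCCL-based stacks (for example Horovod or PyTorch DDP on GPUs), the split between the two NICs is usually expressed through NCCL environment variables. The sketch below sets `NCCL_SOCKET_IFNAME`, the interface NCCL uses for its socket traffic such as bootstrap, to the VPC NIC. It sets `NCCL_IB_HCA` to the RoCE device. The interface names `eth0`/`mlx5_0` and GID index 3 are placeholder assumptions that differ between machines. Sample loading needs no NCCL setting: it follows the host's normal route to object storage, assumed here to go through the VPC NIC.

```python
import os


def configure_nccl_dual_nic(vpc_iface: str, rdma_hca: str, gid_index: int = 3) -> dict:
    """Keep NCCL's socket traffic on the VPC NIC and collective traffic on RoCE.

    vpc_iface, rdma_hca and gid_index are placeholders: check `ip addr` and
    `ibv_devices` on the host; the right RoCE GID index is site-specific.
    """
    env = {
        "NCCL_SOCKET_IFNAME": vpc_iface,       # interface for NCCL socket/bootstrap traffic
        "NCCL_IB_HCA": rdma_hca,               # RDMA device for gradient all-reduce
        "NCCL_IB_GID_INDEX": str(gid_index),   # which RoCE GID entry to use
        "NCCL_IB_DISABLE": "0",                # keep the IB/RoCE transport enabled
    }
    # Must happen before the framework creates its NCCL communicator.
    os.environ.update(env)
    return env


env = configure_nccl_dual_nic("eth0", "mlx5_0")
print(env)
```

In a multi-node job the same variables must reach every worker process, for example through the container environment in the pod spec, rather than being set in one process only.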

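This setup, however, is quite likely to run into the storage-bandwidth problem described earlier. Once high-bandwidth RDMA speeds up gradient synchronization, storage becomes the more likely bottleneck. To solve this on public cloud, compute-side caching products such as Tencent Cloud's GooseFS or the open-source Alluxio can cache data inside the cluster. Training then no longer pulls data from object storage online, which removes the storage bottleneck.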
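The bottleneck shift can be illustrated with a simple model in which data loading, computation, and gradient sync fully overlap, so the slowest stage sets the step time. Full overlap and every number below are hypothetical assumptions chosen only to show the pattern: 0.3 s of compute, 256 MB read per step, 2 Gbps of effective storage bandwidth per job, and the sync times.

```python
def step_bottleneck(compute_s: float, bytes_per_step: float,
                    storage_gbps: float, sync_s: float) -> str:
    """Return the slowest stage, assuming loading, compute and sync fully overlap."""
    load_s = bytes_per_step / (storage_gbps * 1e9 / 8)   # Gbit/s -> bytes/s
    stages = {"compute": compute_s, "data loading": load_s, "gradient sync": sync_s}
    return max(stages, key=stages.get)


MB = 1e6
print(step_bottleneck(0.3, 256 * MB, storage_gbps=2, sync_s=1.1))   # TCP sync: gradient sync
print(step_bottleneck(0.3, 256 * MB, storage_gbps=2, sync_s=0.1))   # RDMA sync: data loading
print(step_bottleneck(0.3, 256 * MB, storage_gbps=20, sync_s=0.1))  # plus local cache: compute
```

Speeding up gradient sync alone just moves the bottleneck to storage. Serving reads from an in-cluster cache is what lets the GPUs become the limiting factor again.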

Inference is architecturally simpler. At the bottom is still a Kubernetes cluster of cloud servers, and models are usually stored in object storage. Model services are exposed through TFServing, Triton Inference Server, or an in-house serving framework.
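From a client's point of view, both open-source servers expose HTTP endpoints. The sketch below only constructs the requests with the standard library; nothing is sent. It targets TF Serving's REST predict API on its default REST port 8501 and Triton's KServe v2 `infer` endpoint on its default HTTP port 8000. The host, the model name, and the Triton input name `INPUT0` are hypothetical placeholders.

```python
import json
import urllib.request


def tf_serving_request(host: str, model: str, instances: list) -> urllib.request.Request:
    """Build a TF Serving REST predict call (row format, default REST port 8501)."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:8501/v1/models/{model}:predict",
        data=body, headers={"Content-Type": "application/json"}, method="POST")


def triton_request(host: str, model: str, input_name: str,
                   rows: list) -> urllib.request.Request:
    """Build a Triton HTTP infer call using the KServe v2 protocol (default port 8000)."""
    flat = [v for row in rows for v in row]   # row-major flattening of the tensor
    body = json.dumps({"inputs": [{
        "name": input_name,
        "shape": [len(rows), len(rows[0])],
        "datatype": "FP32",
        "data": flat,
    }]}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:8000/v2/models/{model}/infer",
        data=body, headers={"Content-Type": "application/json"}, method="POST")


# Hypothetical host and model names; urllib.request.urlopen(req) would send them.
tf_req = tf_serving_request("model-server", "ctr_model", [[0.1, 0.2, 0.3]])
tr_req = triton_request("model-server", "ctr_model", "INPUT0", [[0.1, 0.2, 0.3]])
print(tf_req.full_url)  # http://model-server:8501/v1/models/ctr_model:predict
print(tr_req.full_url)  # http://model-server:8000/v2/models/ctr_model/infer
```

TF Serving answers with a JSON object containing `predictions`, and Triton with one containing `outputs`. Any feature preparation before the call and interpretation of the response after it still has to live somewhere else.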

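Some workloads have fairly complex end-to-end flows with elaborate pre-processing and post-processing stages, and implementing them on TFServing or Triton Inference Server makes the logic especially complicated. Model services are also coupled with internal infrastructure and must integrate with services such as internal gateways. As a result, demand for in-house serving frameworks remains strong. TFServing and Triton Inference Server receive wide attention in the open-source world, but a considerable share of workloads still runs on in-house serving frameworks.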
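The shape of such an in-house framework can be sketched as a handler that runs infrastructure checks, then pre-processing, the model call, and post-processing in one place. Everything below is hypothetical: the class, the token check standing in for an internal gateway, and the toy linear "model" only illustrate where each concern lives.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]


@dataclass
class InferenceHandler:
    """Hypothetical in-house serving handler: checks, pre-processing, model, post-processing."""
    preprocess: Callable[[Request], List[float]]
    model: Callable[[List[float]], float]
    postprocess: Callable[[float], Dict[str, Any]]
    checks: List[Callable[[Request], None]] = field(default_factory=list)

    def handle(self, request: Request) -> Dict[str, Any]:
        for check in self.checks:          # e.g. internal gateway auth, quotas
            check(request)
        features = self.preprocess(request)
        score = self.model(features)       # a real server would call the model runtime here
        return self.postprocess(score)


def require_internal_token(request: Request) -> None:
    if request.get("token") != "internal":
        raise PermissionError("rejected by gateway check")


handler = InferenceHandler(
    preprocess=lambda r: [float(len(r["query"])), float(r["user_age"]) / 100],
    model=lambda x: 0.5 * x[0] + 2.0 * x[1],          # toy linear "model"
    postprocess=lambda s: {"score": round(s, 3), "show_ad": s > 3.0},
    checks=[require_internal_token],
)
result = handler.handle({"token": "internal", "query": "shoes", "user_age": 30})
print(result)  # {'score': 3.1, 'show_ad': True}
```

Keeping the checks and the pre-/post-processing as plain functions is what an in-house framework buys. They can call internal services directly instead of being squeezed into a generic model server's extension points.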

Outlook

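Moving AI workloads to public cloud raises many kinds of problems. The bandwidth bottlenecks in communication and storage go without saying. Beyond that, deep learning often depends on many low-level Nvidia libraries as well as a wide range of Python packages. And in integrated development environments, Jupyter ties up GPU memory while its compute utilization stays low.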

Infrastructure will inevitably evolve toward solving these problems, and we believe future AI infrastructure will be fully elastic. In training, the traditional approach fixes the configuration of every role in the job. A distributed training job with 5 Workers, for example, must have exactly 5 Workers throughout training. Resources can therefore only be specified statically, and the number of participating Workers cannot change when the cluster's available resources change.

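More and more deep learning frameworks now support elastic training. Horovod, for example, introduces a Driver that manages the lifecycle of Workers. When any Worker fails, the Driver catches the exception and rebuilds the ring according to the configuration, so training continues without interruption. A training job can therefore scale out when cluster load is low and GPUs are idle, and scale in when load is high. Combined with public-cloud capabilities such as elastic instances, this architecture improves fault tolerance while lowering training costs.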
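Horovod's elastic mode wraps the training function with `hvd.elastic.run` and keeps progress in a state object that is periodically committed and restored after a failure. The pure-Python simulation below mimics only that control flow. A hypothetical `ElasticDriver` rebuilds the ring whenever membership changes; the code does not use Horovod itself, and all names in it are our own.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class TrainState:
    """Stand-in for an elastic state object: commit() snapshots, restore() rolls back."""
    step: int = 0
    committed_step: int = 0

    def commit(self) -> None:
        self.committed_step = self.step

    def restore(self) -> None:
        self.step = self.committed_step


class ElasticDriver:
    """Hypothetical driver that owns worker membership and rebuilds the ring."""

    def __init__(self, workers: Set[str], min_workers: int):
        self.workers = set(workers)
        self.min_workers = min_workers
        self.ring: List[str] = []
        self.rebuilds = 0
        self._rebuild()

    def _rebuild(self) -> None:
        if len(self.workers) < self.min_workers:
            raise RuntimeError("too few workers left to continue training")
        self.ring = sorted(self.workers)  # deterministic ring order
        self.rebuilds += 1

    def remove(self, worker: str) -> None:  # worker failed or instance reclaimed
        self.workers.discard(worker)
        self._rebuild()

    def add(self, worker: str) -> None:     # an idle GPU node became available
        self.workers.add(worker)
        self._rebuild()


def train(driver: ElasticDriver, state: TrainState, total_steps: int,
          events: Dict[int, Tuple[str, str]], commit_every: int = 2) -> List[int]:
    """Run to total_steps; `events` (consumed as it goes) maps a step to
    ("remove" | "add", worker). Returns the ring size of every executed step."""
    ring_sizes: List[int] = []
    while state.step < total_steps:
        if state.step in events:
            action, worker = events.pop(state.step)
            if action == "remove":
                state.restore()      # a failure may leave a partial step: roll back
                driver.remove(worker)
            else:
                driver.add(worker)   # scale out without restarting the job
            continue
        ring_sizes.append(len(driver.ring))  # one training step on the current ring
        state.step += 1
        if state.step % commit_every == 0:
            state.commit()
    return ring_sizes


driver = ElasticDriver({"w0", "w1", "w2", "w3"}, min_workers=2)
state = TrainState()
sizes = train(driver, state, total_steps=6, events={3: ("remove", "w3"), 5: ("add", "w4")})
print(sizes)  # [4, 4, 4, 3, 3, 3, 4]
```

The step interrupted by the failure is rolled back to the last commit and redone, which is why seven steps run for a six-step job. Committing more often trades some overhead for less lost work.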

Elastic Jupyter follows the same idea. In Jupyter's original design, each kernel runs together with its notebook. A kernel therefore holds an entire GPU for a long time, and GPU utilization stays low. If Jupyter could request GPUs on demand, cluster resource utilization would rise further, cutting costs and improving efficiency.
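On-demand GPU use for notebooks comes down to a lease model. A kernel borrows a GPU while it runs GPU work and returns it afterwards, instead of pinning a card for its whole lifetime. The toy allocator below illustrates that idea only, and its class and method names are hypothetical. A real system would also have to run kernels separately from the notebook server and deal with GPU state that is lost when a lease ends.

```python
from typing import Dict, Optional, Set


class GpuLeasePool:
    """Toy allocator: notebook kernels lease a GPU only while running GPU work."""

    def __init__(self, gpu_ids: Set[int]):
        self.free: Set[int] = set(gpu_ids)
        self.leases: Dict[str, int] = {}

    def acquire(self, kernel_id: str) -> Optional[int]:
        """Return a GPU id for the kernel, or None if every GPU is leased."""
        if kernel_id in self.leases:          # already holding one
            return self.leases[kernel_id]
        if not self.free:
            return None                       # caller may queue or fall back to CPU
        gpu = min(self.free)                  # deterministic choice for the example
        self.free.remove(gpu)
        self.leases[kernel_id] = gpu
        return gpu

    def release(self, kernel_id: str) -> None:
        """Return the kernel's GPU to the pool, e.g. when a cell finishes."""
        gpu = self.leases.pop(kernel_id, None)
        if gpu is not None:
            self.free.add(gpu)


pool = GpuLeasePool({0, 1})
print(pool.acquire("kernel-a"), pool.acquire("kernel-b"), pool.acquire("kernel-c"))  # 0 1 None
pool.release("kernel-a")         # kernel-a's cell finished
print(pool.acquire("kernel-c"))  # 0
```

With three notebooks sharing two GPUs, the third one waits only until another cell finishes, not until another notebook shuts down.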

Summary

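To close, we summarize the main points of this talk. First, intranet bandwidth on public cloud remains a major constraint on moving AI workloads to the cloud. Different scenarios have different ways to mitigate it, and RDMA solutions, including on bare metal, are also available. We expect this to stop being a problem as public-cloud network bandwidth gradually improves.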

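Second, the industry still lacks a de facto standard for AI infrastructure. There are a great many open-source AI infrastructure projects. Kubeflow is the most widely adopted, and its deep integration with Kubernetes lets it work well with companies' existing infrastructure, which gives it an edge. Overall, though, the field still lacks a de facto standard, and the systems differ enormously. This is one of the field's biggest problems: every company's AI infrastructure has its own quirks. As a result, the community cannot pool its efforts to move the industry forward the way it has around Kubernetes for cluster scheduling.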

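Finally, we see a fully elastic architecture as the next step in this evolution. AI workloads cannot yet make good use of elasticity, which is the biggest dividend cloud computing offers. Only on a truly elastic architecture can applications be born in the cloud, grow in the cloud, and serve the ultimate goal of helping enterprises cut costs and improve efficiency.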

   