Ten Years of GPU Architecture Evolution: From Fermi to Ampere
 
 2021-11-3
 
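Editor's recommendation:
This article attempts to trace the evolution of NVIDIA GPU architectures over the ten years from 2010 to 2020. We hope it helps with your study.
Source: OneFlow deep learning framework; edited and recommended by Alice of 火龙果软件 (Huolongguo Software).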

1. CPU and GPU

ÎÒÃÇÏȶÔGPUÓÐÒ»¸öÖ±¹ÛµÄÈÏʶ£¬ÈçÏÂͼ£º

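As is well known, memory has improved more slowly than processors, so CPUs evolved a multi-level cache hierarchy, as shown on the left of the figure above. GPUs have a similar multi-level cache hierarchy. Compared with a CPU, however, a GPU spends far more of its transistors on numerical computation and far fewer on caching and flow control. This follows from their different design goals: a CPU is designed to run tens of threads in parallel, while a GPU is designed to run thousands of threads in parallel.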

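On the right of the figure above, the GPU has far more cores than the CPU, but nothing comes for free: it also has far less cache and control logic. As a result, a single GPU core has much less freedom than a CPU core and is subject to many restrictions, and the cost of those restrictions ultimately falls on the programmer. These restrictions are also why GPU programming differs fundamentally from multithreaded CPU programming.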

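The most fundamental difference is visible on the right of the figure above: each row has many cores but only one control unit, which means that the cores in a row can only execute the same instruction at any given moment. This model is called SIMT (Single Instruction, Multiple Threads). It resembles the SIMD of modern CPUs in some ways but differs from it fundamentally, as later sections explore in more depth.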

Starting from this architecture, we find that because cache and control logic are scarce, only programs that are both compute-intensive and data-parallel are a good fit for the GPU.

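Compute-intensive: the share of numerical computation is much larger than the share of memory operations, so memory access latency can be hidden behind computation, and the need for cache is lower than on a CPU.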

Data-parallel: a large task can be broken into small tasks that execute the same instructions, so the need for complex flow control is low.
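One rough way to make "compute-intensive" concrete is arithmetic intensity, the number of floating-point operations per byte moved. The sketch below is only illustrative: the function names and the 4-byte (FP32) element size are assumptions, and it counts only the minimum memory traffic (each operand read once, each result written once).

```python
def vector_add_intensity(n, bytes_per_elem=4):
    """FLOPs per byte for c[i] = a[i] + b[i] over n elements."""
    flops = n                                  # one add per element
    bytes_moved = 3 * n * bytes_per_elem       # read a, read b, write c
    return flops / bytes_moved


def matmul_intensity(n, bytes_per_elem=4):
    """FLOPs per byte for C = A @ B with n x n matrices."""
    flops = 2 * n ** 3                         # n multiply-adds per output element
    bytes_moved = 3 * n * n * bytes_per_elem   # read A, read B, write C
    return flops / bytes_moved
```

Under these assumptions the intensity of a vector add is constant, while that of a matrix multiplication grows linearly with n, which is one way to see why matrix-heavy workloads suit the GPU.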

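Deep learning happens to satisfy both conditions. In my view, even if there were a model with lower computational cost and greater expressive power than deep learning, it would be unable to beat GPU-accelerated deep learning unless it also satisfied both.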

2. Fermi

Fermi is the architecture NVIDIA released in 2010. It introduced many concepts that are still current today, and little material survives on the architectures that came before it, so this article starts with Fermi. First, an overview.

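The GPU reads CPU commands through the Host Interface, and the GigaThread Engine copies the specified data from host memory into the on-board framebuffer. The GigaThread Engine then creates thread blocks and distributes them across the SMs. The SMs are independent of one another, and each independently schedules its own thread warps onto the CUDA cores and other execution units inside the SM.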

A few of the concepts in the paragraph above need explaining:

SM: the SM hardware unit in the figure above, which contains many CUDA cores.

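Thread Block: a thread block contains multiple threads (a few hundred, for example). Blocks execute completely independently of one another, and the hardware may schedule them in any order. How the threads inside a block execute is determined by the programmer, who also decides how many blocks there are in total.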

Thread Warp: 32 threads form one thread warp. Warp scheduling follows special rules, which are covered later in this article.

Since this article is not a CUDA tutorial, if the explanation of SMs and blocks is still unclear, see this section: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#scalable-programming-model

The figure above shows 16 SMs with 32 CUDA cores each, for 512 CUDA cores in total. These counts are not fixed; they depend on the specific architecture and model.
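The counts above are simple arithmetic, sketched below. The helper names are hypothetical; the warp size of 32 and the 16 x 32 configuration come from the text.

```python
WARP_SIZE = 32  # threads per warp, as stated in the text


def warps_per_block(threads_per_block):
    """Number of warps needed to cover one thread block (rounded up)."""
    return -(-threads_per_block // WARP_SIZE)


def total_cuda_cores(num_sms, cores_per_sm):
    """Total CUDA cores on a chip with identical SMs."""
    return num_sms * cores_per_sm
```

For instance, a block of 256 threads maps to 8 warps, and a block of 100 threads still occupies 4 warps because the last warp is only partly filled.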

Next we look inside the SM. Here is an overview of one SM.

As the figure above shows, an SM contains 32 CUDA cores. Each CUDA core has an integer arithmetic logic unit (ALU) and a floating-point unit (FPU), and supports FMA instructions for both single-precision and double-precision floating point.

The SM also has 16 LD/ST (load/store) units, which let 16 threads at a time load data from or store data to cache/DRAM.

There are 4 SFUs (Special Function Units), which compute special instructions such as sin and cos. Each SFU executes one instruction for one thread per clock cycle, so a warp of 32 threads takes 8 clock cycles. The SFU pipeline is decoupled from the dispatch unit, so while the SFUs are busy the dispatch unit can issue to other execution units.

Warps have come up several times, but so far we have only said that a warp is 32 threads. Here we finally go into detail, starting with an overview of the Dual Warp Scheduler.

In the SM overview and in the figure above, note that each SM has two warp schedulers and two dispatch units. This means that two warps run concurrently at any moment. Each warp is dispatched to a group of 16 CUDA cores, to the 16 load/store units, or to the 4 SFUs for actual execution, and each dispatch issues exactly one instruction. The warp schedulers, meanwhile, keep track of the state of many warps (dozens, for example).
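The 8-cycle SFU figure can be reproduced with a deliberately simplified model: a 32-thread warp is fed through a unit group in passes of as many threads as there are units. This is only a sketch; the function name is hypothetical, and details such as differing clock domains are ignored.

```python
WARP_SIZE = 32

# Execution resources one warp instruction can be dispatched to on a Fermi SM,
# using the counts given in the text.
UNITS = {"cuda_core_group": 16, "load_store": 16, "sfu": 4}


def passes_per_warp_instruction(unit_kind):
    """Passes needed to push all 32 threads of a warp through one unit group."""
    units = UNITS[unit_kind]
    return -(-WARP_SIZE // units)  # ceiling division
```

With these numbers an SFU instruction needs 8 passes, matching the text, while a 16-wide core group or the load/store units need 2.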

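This introduces a central constraint: at any moment, all threads in a warp are executing the same instruction. From the programmer's point of view, different threads within one warp cannot be observed in different execution states.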

But different threads may of course take different branches. How can they then execute the same instruction?

As the figure above shows, when a branch diverges, only the threads that took a given branch execute while that branch runs. If few threads take a branch, execution resources are wasted.
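The waste caused by divergence can be illustrated with a small simulation of one warp. This is a toy model, not a description of the hardware: the function name and the per-pass utilization metric are assumptions made for illustration.

```python
WARP_SIZE = 32


def run_divergent_branch(values, predicate, then_fn, else_fn):
    """Run `then_fn if predicate else else_fn` for one warp, SIMT style.

    The two sides are issued one after the other. Lanes outside the active
    mask sit idle during that pass. Returns (results, utilization_per_pass).
    """
    assert len(values) == WARP_SIZE
    mask = [bool(predicate(v)) for v in values]
    results = list(values)
    utilization = []
    for side_mask, fn in ((mask, then_fn), ([not m for m in mask], else_fn)):
        active = sum(side_mask)
        if active == 0:
            continue  # no lane takes this side, so it is not issued at all
        for lane in range(WARP_SIZE):
            if side_mask[lane]:
                results[lane] = fn(values[lane])
        utilization.append(active / WARP_SIZE)
    return results, utilization
```

If only 4 of the 32 lanes take the first side, that pass keeps 12.5% of the lanes busy; if every lane takes the same side, there is a single pass at full utilization.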

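In the SM overview we can also see 64 KB of on-chip memory per SM, which can be configured, for example, as 48 KB of shared memory and 16 KB of L1 cache. The L1 cache and the off-SM L2 cache play much the same role as the L1/L2 caches in a CPU's cache hierarchy. Shared memory, by contrast, is a major difference from the CPU. On both CPUs and GPUs the L1/L2 caches are generally not something the programmer can manage directly, whereas shared memory is on-chip fast memory designed to be handed over to the programmer to manage.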
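Why a programmer-managed on-chip buffer matters can be sketched by counting "global memory" reads in a matrix multiplication, first reading every operand directly and then staging small tiles into a scratch buffer that stands in for shared memory. This is plain Python used as a counting model, with hypothetical function names; it is not CUDA code and says nothing about real timings.

```python
def matmul_naive(a, b, n):
    """Every multiply reads both of its operands from 'global memory'."""
    loads = 0
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
                loads += 2
            c[i][j] = acc
    return c, loads


def matmul_tiled(a, b, n, tile):
    """Stage tile x tile sub-matrices into a scratch buffer, then reuse them."""
    assert n % tile == 0
    loads = 0
    c = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, tile):
        for bj in range(0, n, tile):
            for bk in range(0, n, tile):
                # Explicit staging step: the part the programmer writes by hand.
                sa = [[a[bi + i][bk + k] for k in range(tile)] for i in range(tile)]
                sb = [[b[bk + k][bj + j] for j in range(tile)] for k in range(tile)]
                loads += 2 * tile * tile
                for i in range(tile):
                    for j in range(tile):
                        for k in range(tile):
                            c[bi + i][bj + j] += sa[i][k] * sb[k][j]
    return c, loads
```

In this model the tiled version performs 1/tile as many global reads as the naive version while producing the same result.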

3. Kepler

NVIDIA released the Kepler architecture in 2012. We go straight to the overview diagram of the GTX 680, which uses it.

The first thing to notice is that the SM has been renamed SMX, although the concept it stands for has not changed much. Let us look inside the SMX.

These are the same names we know from Fermi; there are simply many more of each.

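In my view, the most noteworthy feature of Kepler is GPUDirect, which can bypass the CPU and system memory to exchange data directly with other GPUs in the same machine or with GPUs in other machines. After all, here in 2021, bypassing the CPU/OS is already one of the most important acceleration techniques.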

4. Maxwell

NVIDIA released the Maxwell architecture in 2014. We go straight to the architecture diagram.

This time the SM is called SMM, and there are more cores, each more powerful. We will not go into further detail here.

5. Pascal

NVIDIA released the Pascal architecture in 2016. It was the first architecture designed with deep learning in mind and deserves a detailed discussion. First, look at the P100 in the figure below.

As usual, many more cores have been added. Let us look closely inside the SM.

A single SM has only 64 FP32 CUDA cores, far fewer than Maxwell's 128 or Kepler's 192, and these 64 cores are split into two partitions. Note that the register file has not shrunk, which means that each thread can use more registers and that a single SM can keep more threads, warps, and blocks in flight. Shared memory has not shrunk either, so the shared memory capacity and bandwidth available to each thread also increase.
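The effect of keeping the register file constant while reducing the core count can be sketched with a quick ratio. The core counts come from the text; the 256 KB per-SM register file is an assumed value, used only to show the trend.

```python
REGISTER_FILE_KB = 256  # assumed per-SM register file size, held constant
FP32_CORES_PER_SM = {"Kepler": 192, "Maxwell": 128, "Pascal": 64}


def register_kb_per_core(arch):
    """Register file capacity available per FP32 core, in KB."""
    return REGISTER_FILE_KB / FP32_CORES_PER_SM[arch]
```

Under this assumption a Pascal core has twice the register capacity of a Maxwell core and three times that of a Kepler core.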

Each SM also gains 32 FP64 CUDA cores, shown as DP Units in the figure above. In addition, the FP32 CUDA cores can process FP16 at twice the FP32 throughput, a capability added specifically for deep learning.

This generation also introduced something very important: NVLink.

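As the compute power of a single GPU became less and less able to keep up with the demands of deep learning, people naturally began using multiple GPUs. From multiple GPUs in one machine to multiple GPUs across many machines, the demand for GPU interconnect bandwidth kept growing. Between machines, InfiniBand and 100 Gb Ethernet are used for communication. Within a machine, especially once a single machine holds as many as 8 GPUs, PCIe bandwidth often becomes the bottleneck. To address this, NVIDIA introduced NVLink for point-to-point communication between GPUs within one machine, with a bandwidth of 160 GB/s, roughly 5 times that of PCIe 3.0 x16. The figure below shows a typical single-machine topology with 8 P100s.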

Some special CPUs, such as IBM's POWER8, can also connect to GPUs over NVLink.

6. Volta

NVIDIA released the Volta architecture in 2017. This architecture is built squarely around deep learning and is a major step up from Pascal.

As usual, it adds more SMs and cores, so we go straight to the inside of a single SM.

Continuing the change made in Pascal, Volta splits the SM into 4 partitions, each with its own L0 instruction cache. Neither shared memory nor the register file has shrunk, so, as with Pascal, each thread has more resources available. Each partition also gains two units called Tensor Cores, which are the heart of this generation. One small complaint: this generation merges L1 and shared memory again.

First, the CUDA cores. The original CUDA core has been split into an FP32 CUDA core and an INT32 CUDA core, which means FP32 and INT32 operations can execute at the same time.

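As is well known, the computational bottleneck of deep learning is matrix multiplication, called GEMM in BLAS. A Tensor Core is a unit that does nothing but GEMM. From this point on, NVIDIA moved from pure SIMT to a hybrid of SIMT and DSA (domain-specific architecture).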

Each Tensor Core performs only the following operation:

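D = A * B + C

that is, D[i][j] = C[i][j] + (sum over k = 1..4 of A[i][k] * B[k][j]),

where A, B, C, and D are all 4x4 matrices. A and B are FP16 matrices, while C and D may be FP16 or FP32. Larger matrix computations are usually decomposed into 4x4 matrix multiplications of this kind.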
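The operation can be emulated in a few lines to show where the precision is lost. This is a sketch under stated assumptions: it uses the standard library's `struct` half-precision format to round A and B to FP16 and keeps the accumulator in FP32; it only approximates Tensor Core numerics, and the function names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mma_4x4(a, b, c):
    """D = A * B + C on 4x4 lists: FP16 inputs A and B, FP32 accumulation."""
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = to_fp32(c[i][j])
            for k in range(4):
                acc = to_fp32(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
            d[i][j] = acc
    return d
```

Small integers survive the FP16 rounding exactly, while a value such as 0.1 does not, which is why the accumulator is usually kept in the wider format.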

This matrix multiplication has been exposed to programmers as a warp-level operation since CUDA 9. Beyond that, cuBLAS and cuDNN will of course also use Tensor Cores where appropriate.

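Another important update in this generation is NVLink, which in short became both more numerous and faster. Each link provides 25 GB/s of bandwidth in each direction, and one GPU can attach 6 NVLink links instead of the 4 of the Pascal era. A typical topology is shown in the figure below.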
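The aggregate figures follow from the link counts. The sketch below assumes that the 160 GB/s quoted earlier for Pascal is the bidirectional total over its 4 links, which implies 20 GB/s per direction per link; that per-link value is derived here, not stated in the text, and the function name is hypothetical.

```python
def nvlink_aggregate_gb_s(links, gb_s_per_direction):
    """Total bidirectional NVLink bandwidth of one GPU, in GB/s."""
    return links * gb_s_per_direction * 2


pascal_total = nvlink_aggregate_gb_s(links=4, gb_s_per_direction=20)  # assumed per-link rate
volta_total = nvlink_aggregate_gb_s(links=6, gb_s_per_direction=25)   # figures from the text
```

Under these assumptions the total rises from 160 GB/s on Pascal to 300 GB/s on Volta.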

Starting with Volta, thread scheduling changed. On Pascal and earlier GPUs, the 32 threads of a warp share one program counter (PC), and an active mask records which threads are runnable at any moment. A classic execution looks like this.

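The second branch does not start until the first has run to completion. This means that different branches within one warp lose concurrency: threads on different branches cannot signal each other or exchange data. Threads in different warps, however, do remain concurrent. Thread concurrency is therefore inconsistent, and a programmer who overlooks this can easily end up with a deadlock.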

Volta solves this problem: threads within the same warp have independent PCs and stacks, as shown below.

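Since execution must still follow the SIMT model, a schedule optimizer is responsible for grouping runnable threads together and executing them in SIMT fashion. A classic execution looks like this.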

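In the figure above, note that the execution of Z is not merged. This is because Z might produce data needed by another branch, so the schedule optimizer merges Z only when it is certain that doing so is safe. The unmerged Z in the figure is just one possible outcome; in general the schedule optimizer is smart enough to find safe merges. The programmer can also force reconvergence through an API, namely the `__syncwarp()` intrinsic introduced with CUDA 9.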
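The idea of per-thread program counters combined with SIMT-style grouping can be mimicked by a toy scheduler. This is purely illustrative: the selection rule (serve the lane that has waited longest, then group every lane about to run the same instruction) is an assumption of this sketch, not the hardware's actual policy, and it always merges matching instructions, which corresponds to the case the text calls safe.

```python
def simt_schedule(programs):
    """programs[lane] is that lane's list of instruction labels.

    Returns the issue trace as a list of (label, active_lanes) pairs.
    """
    lanes = range(len(programs))
    pcs = [0] * len(programs)            # one program counter per thread
    last_issued = [-1] * len(programs)
    trace = []
    step = 0
    while True:
        ready = [l for l in lanes if pcs[l] < len(programs[l])]
        if not ready:
            return trace
        # Serve the lane that has waited longest, then build the active mask
        # from every lane whose next instruction is the same one.
        leader = min(ready, key=lambda l: (last_issued[l], l))
        label = programs[leader][pcs[leader]]
        active = [l for l in ready if programs[l][pcs[l]] == label]
        for l in active:
            pcs[l] += 1
            last_issued[l] = step
        trace.append((label, active))
        step += 1
```

For two lanes running A, B, Z and two lanes running X, Y, Z, this toy scheduler interleaves the two paths (A, X, B, Y) instead of finishing one branch first, and then issues Z once for all four lanes.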

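Volta also improved support for multiple processes using the GPU concurrently. On Pascal and earlier, multiple processes shared a single GPU by classic time slicing. Starting with Volta, several processes that each cannot saturate the GPU can run on it in parallel, as shown in the figure below.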

7. Turing

NVIDIA released the Turing architecture in 2018. I see it as an extension of Volta. Naturally all kinds of specifications were boosted, but we will not cover those here.

The more significant change is the addition of the RT Core, short for Ray Tracing Core. As the name suggests, it is meant for games and simulation. Since I have never worked in that area, I will not cover it.

In addition, the Tensor Cores in Turing add support for INT8, INT4, and binary, to accelerate deep learning inference. By this time, quantized deployment of deep learning models was gradually maturing.

8. Ampere

NVIDIA released the Ampere architecture in 2020, and this one is a major generation. It is further divided into GA100, GA102, and GA104; here we look only at GA100.

First, the GA100 SM.

The central upgrade here is the Tensor Core.

Besides the FP16 support from Volta and the INT8/INT4/binary support from Turing, this generation adds support for TF32, BF16, and FP64. TF32 and BF16 deserve particular attention; see the figure below.

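The problem with FP16 is that its representable range is too small. Gradient computation easily underflows, and the forward and backward passes overflow relatively easily as well. In deep learning, range matters far more than precision, which led to BF16: it gives up precision to keep roughly the same range as FP32. Before this, the best-known hardware supporting BF16 was the TPU. TF32 is designed both to take the benefits of BF16 and to stay reasonably compatible with mainstream FP32: truncating an FP32 value yields a TF32 value. Values are truncated to TF32 for the computation and the result is converted back to FP32, so existing work is almost unaffected, as shown in the figure below.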
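The relationship between the formats can be sketched by manipulating the bits of an FP32 value. The bit widths used below (10 mantissa bits for TF32, 7 for BF16, both with FP32's 8 exponent bits) are the commonly documented layouts. The helpers are hypothetical and use plain truncation, as the text describes; real hardware may round instead.

```python
import struct


def _truncate_fp32(x, dropped_bits):
    """Zero the lowest `dropped_bits` mantissa bits of x's FP32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << dropped_bits) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_tf32(x):
    return _truncate_fp32(x, 13)  # 23-bit mantissa -> 10 bits, exponent kept


def to_bf16(x):
    return _truncate_fp32(x, 16)  # 23-bit mantissa -> 7 bits, exponent kept


def to_fp16(x):
    """Round to IEEE half precision (5 exponent bits, 10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]
```

A tiny gradient such as 1e-8 flushes to zero in FP16 but stays non-zero in BF16 and TF32, because both keep FP32's exponent; the price is that BF16 can no longer tell 1 + 2^-8 apart from 1.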

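Another change is fine-grained structured sparsity. In deep learning model compression, sparsity is a major direction alongside quantization, but sparse models have been hard to accelerate in hardware. This generation of GPUs adds some support for sparsity, currently aimed mainly at inference.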

First, the sparse matrix format that NVIDIA defines, called 2:4 structured sparsity: in every group of 4 elements, 2 values are non-zero, as shown in the figure below.

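The workflow is as follows. First train with ordinary dense weights until convergence, then prune them to a 2:4 structured sparse tensor, then fine-tune by continuing to train only the non-zero weights. Ideally, the resulting 2:4 structured sparse weights reach the same accuracy as the dense weights, and they are then used for inference. The Tensor Cores in this generation support directly multiplying a 2:4 structured sparse matrix by a dense matrix.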
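The pruning step can be sketched as follows. The text does not say how the two surviving values in each group are chosen; keeping the two with the largest magnitude is an assumption of this sketch, and the function names and the compressed layout (kept values plus their positions within each group) are likewise illustrative.

```python
def prune_2_4(row):
    """Keep 2 values in every group of 4 and zero the other 2.

    Returns (pruned_row, values, indices), where `values` holds the kept
    entries and `indices` their positions (0..3) inside each group.
    """
    assert len(row) % 4 == 0
    pruned, values, indices = [], [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        by_magnitude = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        keep = sorted(by_magnitude[:2])
        pruned.extend(group[i] if i in keep else 0.0 for i in range(4))
        values.extend(group[i] for i in keep)
        indices.extend(keep)
    return pruned, values, indices


def sparse_dot(values, indices, dense):
    """Dot product using only the kept values: half the multiplies."""
    total = 0.0
    for n, (v, i) in enumerate(zip(values, indices)):
        group_start = (n // 2) * 4       # two kept values per group of four
        total += v * dense[group_start + i]
    return total
```

The compressed form stores half the values plus a small index per value, and the dot product touches only the dense elements those indices select.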

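The last important feature is MIG (Multi-Instance GPU). Although industry workloads keep growing in scale, many tasks by their nature cannot saturate a GPU, which wastes resources. There is therefore demand for running several tasks on one GPU, and some cloud providers previously offered virtualization solutions for this. Ampere supports this need directly through MIG.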

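Some may ask whether the multi-process support introduced in Volta had not already solved this. Consider an example. On Volta, multiple processes can run in parallel, but because every process can access all memory resources, one process can saturate the entire DRAM bandwidth and disturb the others, and the affected processes may well have throughput or latency requirements. Stricter isolation is therefore needed.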

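With MIG on Ampere, each A100 can be split into as many as 7 GPU instances used by different tasks. The SMs of each instance have their own dedicated memory resources, which guarantees each task stable, predictable throughput and latency. Users can treat these virtual GPU instances as if they were real GPUs.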

9. Conclusion

There are in fact many more details to each architecture; for reasons of space, only a brief overview is possible here. Later articles will share more concrete material on CUDA programming.

Reference

https://images.nvidia.com/aem-dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf
https://images.nvidia.com/aem-dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
https://www.microway.com/download/whitepaper/NVIDIA_Maxwell_GM204_Architecture_Whitepaper.pdf
https://www.nvidia.com/content/PDF/product-specifications/GeForce_GTX_680_Whitepaper_FINAL.pdf
https://www.hardwarebg.com/b4k/files/nvidia_gf100_whitepaper.pdf
https://developer.nvidia.com/content/life-triangle-nvidias-logical-pipeline
https://blog.nowcoder.net/n/4dcb2f6a55a34de9ae6c9067ba3d3bfb
https://jcf94.com/2020/05/24/2020-05-24-nvidia-arch/
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
https://en.wikipedia.org/wiki/Bfloat16_floating-point_format

----------------

Copyright notice: This is an original article by the CSDN blogger "OneFlow深度学习框架" (OneFlow deep learning framework), released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Original link: https://blog.csdn.net/OneFlow_Official/article/details/120681936

   