GPU architecture
1. How the OpenCL spec maps onto multi-core hardware
   1. AMD GPU architecture
   2. NVIDIA GPU architecture
   3. Cell Broadband Engine
2. Some special topics in OpenCL
   1. The OpenCL compilation system
   2. Installable client driver
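A natural first question: if OpenCL is platform independent, why study the specific hardware of different vendors at all?
1. Knowing how the loops and data in a program map onto OpenCL kernels helps us write better code and get higher performance.
2. It lets us understand the differences between AMD and NVIDIA GPUs.
3. Knowing how the hardware differs helps us use the OpenCL extensions that are specific to that hardware. These extensions are covered in later lectures.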
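3. Traditional CPU architecture

1. A CPU is optimized for minimum latency on a single thread. It also suits control-flow-heavy work, such as tasks with many if/else branches or jump instructions.
2. Control logic takes up more chip area than the ALUs do.
3. A multi-level cache hierarchy hides memory latency (it exploits spatial and temporal locality well).
4. The limited number of registers keeps the number of simultaneously active threads small.
5. The control logic tracks program execution, provides instruction-level parallelism (ILP), and minimizes pipeline stalls (clock cycles in which the ALUs do no useful work).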
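4. Modern GPGPU architecture

1. Compared with a CPU, a modern GPU usually has simpler control logic and smaller caches.
2. Threads are lightweight, so switching between them is cheap.
3. Each GPU "core" has a large number of ALUs and a small user-managed cache. [Here "core" presumably refers to the whole GPU.]
4. The memory bus is optimized for bandwidth. A bandwidth of 150 GB/s lets a large number of ALUs perform memory operations at the same time.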
5. AMD GPU hardware architecture
Let us take a brief look at the architecture of the AMD 5870 graphics card (Cypress).

1. 20 SIMD engines, each 16 lanes wide.
2. Each SIMD engine contains 16 stream cores.
3. Each stream core is a 5-way multiply-add unit (VLIW processing).
4. Single-precision throughput is in the teraflops range.
5. Double-precision throughput reaches 544 GFLOPS.
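The figures in this list can be cross-checked with a little arithmetic. The sketch below is only an illustration: the 850 MHz engine clock and the convention of counting one multiply-add as two floating-point operations are assumptions that do not appear in the text above, and the function names are made up for this example.

```python
# Figures taken from the list above (AMD 5870 / Cypress).
SIMD_ENGINES = 20
STREAM_CORES_PER_ENGINE = 16
VLIW_WIDTH = 5  # scalar ALUs per stream core

# Assumptions, not stated in the text: engine clock and flops per multiply-add.
ASSUMED_CLOCK_MHZ = 850
FLOPS_PER_MAD = 2


def total_alus():
    """Number of scalar ALUs on the whole chip."""
    return SIMD_ENGINES * STREAM_CORES_PER_ENGINE * VLIW_WIDTH


def peak_single_precision_gflops(clock_mhz=ASSUMED_CLOCK_MHZ):
    """Theoretical peak if every ALU retires one multiply-add per cycle."""
    return total_alus() * FLOPS_PER_MAD * clock_mhz / 1000.0


print("ALUs:", total_alus())
print("Peak single precision (GFLOPS):", peak_single_precision_gflops())
```

Under the assumed clock, the 544 figure quoted for double precision is exactly one fifth of this single-precision estimate, which is consistent with a 5-way VLIW unit producing one double-precision result per instruction slot.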

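The figure above shows one SIMD engine. Each SIMD engine is made up of a series of stream cores.
1. Each stream core is a 5-way VLIW processor, so a single VLIW instruction can issue up to 5 scalar operations. Each scalar operation runs on a processing element (PE).
2. All stream cores in a compute unit (CU; in the 8xx series a CU corresponds to a hardware SIMD engine) execute the same VLIW instruction.
3. The work-items that execute together on a CU (that is, on a SIMD engine) are called a wavefront ("wave"); its size is the number of threads the CU executes together. On the 5870 the wavefront size is 64, so at most 64 work-items execute together on one CU.
Note: the 5 ways correspond to (x, y, z, w) plus T (transcendental functions). Cayman dropped the T unit and became 4-way.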
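To make the wavefront numbers concrete, here is a minimal sketch that relates the wavefront size of 64 to the 16 stream cores of a SIMD engine. It assumes that one wavefront instruction is spread over consecutive cycles, 16 work-items at a time; the helper names are hypothetical.

```python
import math

WAVEFRONT_SIZE = 64       # work-items per wavefront on the 5870
STREAM_CORES_PER_CU = 16  # work-items that can start in one cycle


def cycles_per_wavefront_instruction():
    """Cycles needed to push all work-items of a wavefront through the lanes."""
    return WAVEFRONT_SIZE // STREAM_CORES_PER_CU


def wavefronts_for(work_group_size):
    """Wavefronts needed for a work-group; the last one may be partly empty."""
    return math.ceil(work_group_size / WAVEFRONT_SIZE)


def idle_lanes(work_group_size):
    """Work-item slots left unused in the last wavefront."""
    return wavefronts_for(work_group_size) * WAVEFRONT_SIZE - work_group_size


print(cycles_per_wavefront_instruction())
print(wavefronts_for(256), idle_lanes(256))
print(wavefronts_for(100), idle_lanes(100))
```

The second call shows why work-group sizes that are multiples of 64 are usually preferred on this hardware: a work-group of 100 work-items still occupies two full wavefronts.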

Now let us see how AMD GPU hardware maps onto OpenCL:
1. A work-item maps to a PE; a PE is a single VLIW core.
2. A CU maps to multiple PEs; a CU is a SIMD engine.

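The figure above shows the memory architecture of an AMD GPU (the figure in the original slides contains a small error: global memory is labeled LDS).
1. The memory used by each CU includes the on-chip LDS and the associated registers. On the 5870 each LDS is 32 KB, organized as 32 banks of 1 KB each, with a read/write granularity of 4 bytes.
2. Each CU has an 8 KB L1 cache (on the 5870).
3. The L2 cache shared by all CUs is 512 KB on the 5870.
4. The fast path can only perform memory operations that are 32 bits wide or a multiple of 32 bits.
5. The complete path can perform atomic operations and memory operations narrower than 32 bits.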
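The LDS figures (32 banks, 4-byte granularity) determine which accesses collide in the same bank. The sketch below is a simplified model, assuming that consecutive 4-byte words are interleaved across consecutive banks and ignoring any broadcast of identical addresses; the function names are illustrative only.

```python
from collections import Counter

NUM_BANKS = 32  # banks per LDS on the 5870
WORD_BYTES = 4  # read/write granularity


def bank_of(byte_address):
    """Bank index under the assumed word-interleaved layout."""
    return (byte_address // WORD_BYTES) % NUM_BANKS


def worst_conflict(byte_addresses):
    """Largest number of accesses that land in the same bank."""
    counts = Counter(bank_of(a) for a in byte_addresses)
    return max(counts.values())


def strided_addresses(stride_words, n=32):
    """Byte addresses touched by n work-items reading with a fixed word stride."""
    return [i * stride_words * WORD_BYTES for i in range(n)]


for stride in (1, 2, 32):
    print(stride, worst_conflict(strided_addresses(stride)))
```

In this model a stride of 1 word touches every bank once, a stride of 2 sends two accesses to each of 16 banks, and a stride of 32 words sends all 32 accesses to bank 0, which would serialize them.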

How the AMD GPU memory architecture maps onto the OpenCL memory model:
1. LDS corresponds to local memory and is used mainly to share data among the work-items of a work-group. Stream cores access LDS about an order of magnitude faster than global memory.
2. Private memory corresponds to the registers of each PE.
3. Constant memory mainly relies on the L1 cache.
Note: on AMD GPUs, constant memory is accessed in one of three ways:
Direct-addressing patterns: no array indexing is involved, and the value is fixed when the kernel is set up, for example a fixed argument passed to the kernel.
Same-index patterns: all work-items access the same index.
Globally scoped constant arrays: the array is initialized, and if it is smaller than 16 KB it uses the L1 cache, which speeds up access.
When the work-items access different indices, the access cannot be cached and has to be served from global memory.
6. NVIDIA GPU Fermi architecture


GTX 480 (compute capability 2.0):
1. 15 cores, or SMs (streaming multiprocessors).
2. Each SM typically has 32 CUDA cores.
3. 480 CUDA cores in total.
4. Global memory with ECC.
5. Threads within an SM are scheduled and executed in groups of 32, called warps. Each SM has 2 warp issue units.
6. A CUDA core consists of an ALU and an FPU (floating-point unit).
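SIMT and SIMD
SIMT stands for single instruction, multiple threads.
1. The hardware makes multiple ALUs share one instruction.
2. Divergence among threads (threads of the same warp taking different execution paths) is handled with predication.
3. NVIDIA treats the instructions executed by a warp as SIMT. A SIMT instruction specifies the execution and branching behavior of a single thread.
SIMD instructions expose the vector width, much like the x86 SSE vector instructions.
SIMD execution and pipelining
All ALUs execute the same instruction.
Instructions are pipelined into stages. When the first instruction completes (4 cycles), the next instruction starts executing.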
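The idea of handling divergence with predication can be emulated in a few lines. The sketch below is a toy model, not a description of how any particular GPU implements it: one warp of 32 threads evaluates a branch, then the "then" side and the "else" side run one after the other, and a per-thread mask decides which lanes keep the result. The branch body and function name are invented for the illustration.

```python
WARP_SIZE = 32


def run_warp_predicated(values):
    """Emulate `if v % 2 == 0: v = v // 2 else: v = 3 * v + 1` for one warp.

    Returns the new values and the number of passes the warp had to make.
    """
    assert len(values) == WARP_SIZE
    mask = [v % 2 == 0 for v in values]  # per-thread predicate
    out = list(values)
    passes = 0

    # "then" side: issued once for the whole warp, committed only where mask is set.
    if any(mask):
        passes += 1
        out = [v // 2 if m else o for v, m, o in zip(values, mask, out)]

    # "else" side: issued only if at least one thread took the other path.
    if not all(mask):
        passes += 1
        out = [3 * v + 1 if not m else o for v, m, o in zip(values, mask, out)]

    return out, passes


uniform, uniform_passes = run_warp_predicated([2] * WARP_SIZE)
mixed, mixed_passes = run_warp_predicated(list(range(WARP_SIZE)))
print(uniform_passes, mixed_passes)
```

A warp whose threads all take the same path needs one pass, while a divergent warp needs two, which is why divergence within a warp costs throughput.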

NVIDIA GPU memory hierarchy:

1. Each SM has an L1 cache that can be configured to serve as shared memory or as a cache for global memory.
2. The two configurations are 48 KB shared memory / 16 KB L1 cache and 16 KB shared memory / 48 KB L1 cache.
3. Work-items share data through shared memory.
4. Each SM has a 32K register bank.
5. The L2 cache (768 KB) services all operations, such as loads and stores.
6. There is a unified path to global memory for loads and stores.
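These per-SM limits bound how many work-groups can be resident on one SM at a time. The following is a rough sketch of that calculation under stated assumptions: the "32K register bank" is taken to mean 32768 32-bit registers, the cap of 1536 resident threads per SM is an assumption that does not come from the text above, and other hardware limits are ignored. The function name and parameters are hypothetical.

```python
REGISTERS_PER_SM = 32 * 1024        # assumed: 32K entries of 32 bits each
ASSUMED_MAX_THREADS_PER_SM = 1536   # assumption, not stated in the text


def resident_work_groups(group_size, regs_per_item, local_bytes_per_group,
                         shared_kb=48):
    """Work-groups that fit on one SM under this simplified model.

    shared_kb is the shared-memory side of the L1 split (48 or 16).
    """
    limits = [
        REGISTERS_PER_SM // (regs_per_item * group_size),  # register limit
        ASSUMED_MAX_THREADS_PER_SM // group_size,          # thread limit
    ]
    if local_bytes_per_group > 0:
        limits.append(shared_kb * 1024 // local_bytes_per_group)
    return min(limits)


print(resident_work_groups(256, 20, 8192, shared_kb=48))
print(resident_work_groups(256, 20, 8192, shared_kb=16))
print(resident_work_groups(256, 32, 8192, shared_kb=48))
```

In this model, switching the same kernel from the 48 KB to the 16 KB shared-memory configuration makes local memory the limiting resource, and raising register use per work-item makes the register file the limit instead.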

As with AMD GPUs, the NVIDIA GPU memory model maps onto the OpenCL memory model as follows:
shared memory corresponds to local memory
registers correspond to private memory
7. Cell Broadband Engine

Cell was developed jointly by Sony, Toshiba, and IBM. It can be used in embedded platforms as well as in high-performance computing (the PS3 game console uses a Cell processor).
1. BladeCenter servers provide OpenCL driver support.
2. As the figure shows, a Cell processor consists of one Power Processing Element (PPE) and several Synergistic Processing Elements (SPEs).
3. It uses the IBM XL C for OpenCL compiler.
4. The Cell Power/VMX CPU has device type CL_DEVICE_TYPE_CPU, and the Cell SPU has device type CL_DEVICE_TYPE_ACCELERATOR.
5. The OpenCL accelerator device shares the memory bus with the CPU.
6. It provides extensions such as Device Fission and Migrate Objects, which let you specify where an OpenCL object resides.
7. It does not support OpenCL image objects, atomic operations, sampler objects, or byte-addressable memory.
8. OpenCL compilation system
1. LLVM: Low Level Virtual Machine.
2. The front end first compiles a kernel to LLVM IR.
3. LLVM is an open-source, platform-independent compiler that supports the back ends of different vendors. Website: http://llvm.org
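9. Installable Client Driver
1. The ICD lets OpenCL implementations from different vendors coexist on one system.
2. Application code links only against libOpenCL.so.
3. An application can choose among the OpenCL implementations at run time (that is, choose a platform).
4. Current GPU drivers do not yet support GPUs from different vendors working together at the same time.
5. Use clGetPlatformIDs() and clGetPlatformInfo() to detect the OpenCL platforms of the different vendors.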
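Platform detection through the ICD can be sketched from Python, assuming the third-party pyopencl package is installed. Its get_platforms() call wraps clGetPlatformIDs(), and the name and vendor attributes wrap clGetPlatformInfo() queries. The helper name list_opencl_platforms is made up for this example, and the function simply returns an empty list when pyopencl or an OpenCL implementation is not available.

```python
def list_opencl_platforms():
    """Return (name, vendor) for every OpenCL platform visible to the loader."""
    try:
        import pyopencl as cl  # third-party binding, assumed to be installed
    except ImportError:
        return []
    try:
        platforms = cl.get_platforms()  # wraps clGetPlatformIDs()
    except Exception:
        # Typically raised when no vendor implementation is registered.
        return []
    # name and vendor are fetched through clGetPlatformInfo().
    return [(p.name, p.vendor) for p in platforms]


for name, vendor in list_opencl_platforms():
    print(name, "|", vendor)
```

On a machine with drivers from several vendors, each installed implementation should show up as its own entry, and the application then picks the platform it wants to run on.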
