±¾½ÚÖ÷Òª½²ÊöGPUµÄmemory¼Ü¹¹¡£ÓÅ»¯»ùÓÚGPU deviceµÄkernel³ÌÐòʱ£¬ÎÒÃÇÐèÒªÁ˽âºÜ¶àGPUµÄmemory֪ʶ£¬±ÈÈçÄÚ´æºÏ²¢£¬bank
conflit£¨³åÍ»£©µÈµÈ£¬ÕâÑù²ÅÄÜÕë¶Ô¾ßÌåËã·¨×öһЩÓÅ»¯¹¤×÷¡£
1¡¢GPU×ÜÏßѰַ½éÉÜ

¼Ù¶¨XÊÇÒ»¸öÖ¸ÏòÕûÊý£¨32λÕûÊý£©Êý×éµÄÖ¸Õ룬Êý×éµÄÊ×µØÖ·Îª0x00001232¡£Ò»¸öÏß³ÌÒª·ÃÎÊÔªËØX[0],int
tmp = X[0];

¼Ù¶¨memory×ÜÏß¿í¶ÈΪ256λ(HD5870¾ÍÊÇÈç´Ë,¼´Îª32×Ö½Ú£©£¬ÒòΪ»ùÓÚ×Ö½ÚµØÖ·µÄ×ÜÏßÒª·ÃÎÊmemeory£¬±ØÐëºÍ×ÜÏß¿í¶È¶ÔÆë£¬Ò²¾ÍÊÇ˵°´±ØÐë32×Ö½Ú¶ÔÆëÀ´·ÃÎÊmemory£¬±ÈÈç·ÃÎÊ0x00000000,0x00000020,0x00000040,¡µÈ£¬ËùÒÔÎÒÃÇÒªµÃµ½µØÖ·0x00001232ÖеÄÊý¾Ý£¬±ÈÈç·ÃÎʵØÖ·0x00001220,Õâʱ£¬Ëü»áͬʱµÃµ½0x00001220µ½
0x0000123F µÄËùÓÐÊý¾Ý¡£ÒòΪÎÒÃÇÖ»ÊÇÈ¡µÄÒ»¸ö32λÕûÊý£¬ËùÒÔÓÐÓõÄÊý¾ÝÊÇ4¸ö×Ö½Ú£¬ÆäËü28µÄ×Ö½ÚµÄÊý¾Ý¶¼±»ÀË·ÑÁË£¬°×°×ÏûºÄÁË´ø¿í¡£

2¡¢ºÏ²¢ÄÚ´æ·ÃÎÊ
ΪÁËÀûÓÃ×ÜÏß´ø¿í£¬GPUͨ³£°Ñ¶à¸öÏ̵߳ÄÄÚ´æ·ÃÎʾ¡Á¿ºÏ²¢µ½½ÏÉÙµÄÄÚ´æÇëÇóÃüÁîÖÐÈ¥¡£
¼Ù¶¨ÏÂÃæµÄOpenCL kernel´úÂ룺int tmp = X[get_global_id(0)];
Êý×éXµÄÊ×µØÖ·ºÍÇ°ÃæÀý×ÓÒ»Ñù£¬Ò²ÊÇ0x00001232£¬Ôòǰ16¸öÏ߳̽«·ÃÎʵØÖ·£º0x00001232
µ½ 0x00001272¡£¼ÙÉèÿ¸ömemory·ÃÎÊÇëÇó¶¼µ¥¶À·¢Ë͵ϰ£¬ÔòÓÐ16¸örequest£¬ÓÐÓõÄÊý¾ÝÖ»ÓÐ64×Ö½Ú£¬À˷ѵôÁË448×Ö½Ú£¨16*28£©¡£
¼Ù¶¨¶à¸öÏ̷߳ÃÎÊ32¸ö×Ö½ÚÒÔÄڵĵØÖ·£¬ËüÃǵķÃÎÊ¿ÉÒÔͨ¹ýÒ»¸ömemory requestÍê³É£¬ÕâÑù¿ÉÒÔ´ó´óÌá¸ß´ø¿íÀûÓÃÂÊ£¬ÔÚרҵÊõÓïÃèÊöÖÐÕâÑùµÄºÏ²¢·ÃÎʳÆ×÷coalescing¡£

ÀýÈçÉÏÃæ16¸öÏ̷߳ÃÎʵØÖ·0x00001232 µ½ 0x00001272£¬ÎÒÃÇÖ»ÐèÒª3´Îmemory
requst¡£
ÔÚHD5870ÏÔ¿¨ÖУ¬Ò»¸öwaveÖÐ16¸öÁ¬ÐøÏ̵߳ÄÄÚ´æ·ÃÎʻᱻºÏ²¢£¬³Æ×÷quarter-wavefront£¬ÊÇÖØÒªµÄÓ²¼þµ÷¶Èµ¥Î»¡£
ÏÂÃæµÄͼÊÇHD5870ÖУ¬Ê¹ÓÃmemory·ÃÎʺϲ¢ÒÔ¼°Ã»ÓÐʹÓúϲ¢µÄbandwidth±È½Ï£º

ÏÂͼÊÇGTX285ÖеıȽϣº

3¡¢Global memoryµÄbankÒÔ¼°channel·ÃÎʳåÍ»
ÎÒÃÇÖªµÀÄÚ´æÓÉbank£¬channel×é³É£¬bankÊÇʵ¼Ê´æ´¢Êý¾ÝµÄµ¥Ôª£¬Ò»¸ömc¿ÉÒÔÁ¬½Ó¶à¸öchannel£¬Ðγɵ¥mc£¬¶àchannelµÄÁ¬½Ó·½Ê½¡£ÔÚÎïÀíÉÏ£¬²»Í¬bankµÄÊý¾Ý¿ÉÒÔͬʱ·ÃÎÊ£¬ÏàͬµÄbankµÄÊý¾ÝÔò±ØÐë´®ÐзÃÎÊ£¬channelÒ²ÊÇͬÑùµÄµÀÀí¡£µ«ÓÉÓںϲ¢·ÃÎʵÄÔµ¹Ê£¬¶ÔÓÚglobal
memoryÀ´Ëµ£¬bank conflitÓ°ÏìҪСºÜ¶à£¬³ý·ÇÊǷǺϲ¢ÎÊ£¬²»Í¬Ï̷߳ÃÎÊͬһ¸öbank¡£ÀíÏëÇé¿öÏ£¬ÎÒÃÇÓ¦¸Ã×öµ½²»Í¬µÄworkgroup·ÃÎʵIJ»Í¬µÄbank£¬Í¬Ò»¸ögroupÄÚ£¬×îºÃÓúϲ¢²Ù×÷¡£
ÏÂÃæÎÒ¼òµ¥µÄ»Ò»¸öͼ£¬²»ÖªµÀÊÇ·ñ׼ȷ£¬½ö¹©²Î¿¼£º


ÔÚHD5870ÖУ¬memoryµØÖ·µÄµÍ8λ±íʾһ¸öbankÖеÄÊý¾Ý£¬½ÓÏÂÀ´µÄ3λ±íʾchannel£¨¹²8¸öchannel£©£¬bankλµÄ¶àÉÙÒÀÀµÓÚÏÔ´æÖÐbankµÄ¶àÉÙ¡£
4¡¢local memoryµÄbank conflit
bank·ÃÎʳåÍ»¶Ôlocal memory²Ù×÷Óиü´óµÄÓ°Ï죨Ïà±ÈÓÚglobal memory£©£¬Á¬ÐøµÄlocal
memory·ÃÎʵØÖ·£¬Ó¦¸ÃÓ³Éäµ½²»Í¬µÄbankÉÏ£¬

ÔÚAMDÏÔ¿¨ÖУ¬Ò»¸ö²úÉúbank·ÃÎʳåÍ»wave½«»áµÈ´ýËùÓеÄlocal memory·ÃÎÊÍê³É£¬Ó²¼þ²»ÄÜͨ¹ýÇл»µ½ÁíÒ»¸öwaveÀ´Òþ²Ølocal
memory·ÃÎÊʱÑÓ¡£ËùÒÔ¶Ôlocal memory·ÃÎʵÄÓÅ»¯¾ÍºÜÖØÒª¡£HD5870ÏÔ¿¨ÖУ¬Ã¿¸öcu£¨simd£©ÓÐ32bank£¬Ã¿¸öbank
1k£¬°´4×Ö½Ú¶ÔÆë·ÃÎÊ¡£Èç¹ûûÓÐbank conflit£¬Ã¿¸öbankÄܹ»Ã»ÓÐÑÓʱµÄ·µ»ØÒ»¸öÊý¾Ý£¬ÏÂÃæµÄͼ¾ÍÊÇÕâÖÖÇé¿ö¡£

Èç¹û¶à¸ömemory·ÃÎʶÔÓ¦µ½Ò»¸öbankÉÏ£¬ÔòconflitsµÄÊýÁ¿¾ö¶¨Ê±ÑӵĴóС¡£ÏÂÃæµÄ·ÃÎÊ·½Ê½½«»áÓÐ3±¶µÄʱÑÓ¡£

µ«ÊÇ£¬Èç¹ûËùÓзÃÎʶ¼Ó³Éäµ½Ò»¸öbankÉÏ£¬Ôòϵͳ»á¹ã²¥Êý¾Ý·ÃÎÊ£¬²»»á²úÉú¶îÍâʱÑÓ¡£

GPUÏ̼߳°µ÷¶È
±¾½ÚÖ÷Òª½²ÊöOpenCLÖеÄWorkgroupÈçºÎÔÚÓ²¼þÉ豸Öб»µ÷¶ÈÖ´ÐС£Í¬Ê±Ò²»á½²Ò»ÏÂͬһ¸öworkgroupÖеÄworkitem£¬Èç¹ûËüÃÇÖ´ÐеÄÖ¸Áî·¢Éúdiverage£¨¾ÍÊÇÖ´ÐÐÖ¸Áî²»Ò»Ö£©¶ÔÐÔÄܵÄÓ°Ï졣ѧϰOpenCL²¢Ðбà³Ì£¬²»½ö½öÊǶÔOpenCL
Spec±¾ÉíÁ˽⣬¸üÖØÒªµÄÊÇÁ˽âOpenCLÓ²¼þÉ豸µÄÌØÐÔ£¬ÏÖ½×¶ÎÀ´Ëµ£¬Ö÷ÒªÊÇÁ˽âGPUµÄµÄ¼Ü¹¹ÌØÐÔ£¬ÕâÑù²ÅÄÜÕë¶ÔÓ²¼þÌØÐÔÓÅ»¯Ëã·¨¡£
ÏÖÔÚOpenCLµÄSpecÊÇ1.1£¬Ëæ×ÅÓ²¼þµÄ·¢Õ¹£¬ÏàÐÅOpenCL»áÖ§³Ö¸ü¶àµÄ²¢ÐмÆËãÌØÐÔ¡£»ùÓÚOpenCLµÄ²¢ÐмÆËã²Å¸Õ¸ÕÆð²½£¬¡
1¡¢workgroupµ½Ó²¼þÏß³Ì

ÔÚOpenCLÖУ¬Kernelº¯Êý±»workgroupÖеÄworkitem£¨Ị̈߳¬ÎÒ¿ÉÄÜ»ìÓÃÕâÁ½¸ö¸ÅÄִÐС£ÔÚÓ²¼þ²ã´Î£¬workgroup±»Ó³Éäµ½Ó²¼þµÄcu£¨compute
unit£©µ¥ÔªÀ´Ö´ÐоßÌ弯Ë㣬¶øcuÒ»°ãÓɸü¶àµÄSIMT£¨µ¥Ö¸ÁỊ̈߳©pe£¨processing elements£©×é³É¡£ÕâЩpeÖ´ÐоßÌåµÄworkitem¼ÆË㣬ËüÃÇÖ´ÐÐͬÑùµÄÖ¸Áµ«²Ù×÷µÄÊý¾Ý²»Ò»Ñù£¬ÓÃsimdµÄ·½Ê½Íê³É×îÖյļÆËã¡£
ÓÉÓÚÓ²¼þµÄÏÞÖÆ£¬±ÈÈçcuÖÐpeÊýÁ¿µÄÏÞÖÆ£¬Êµ¼ÊÉÏworkgroupÖÐÏ̲߳¢²»ÊÇͬʱִÐе쬶øÊÇÓÐÒ»¸öµ÷¶Èµ¥Î»£¬Í¬Ò»¸öworkgroupÖеÄỊ̈߳¬°´ÕÕµ÷¶Èµ¥Î»·Ö×飬ȻºóÒ»×éÒ»×éµ÷¶ÈÓ²¼þÉÏÈ¥Ö´ÐС£Õâ¸öµ÷¶Èµ¥Î»ÔÚnvµÄÓ²¼þÉϳÆ×÷warp,ÔÚAMDµÄÓ²¼þÉϳÆ×÷wavefront£¬»òÕß¼ò³ÆÎªwave¡£

ÉÏͼÏÔʾÁËworkgroupÖУ¬Ï̱߳»»®·ÖΪ²»Í¬waveµÄ·Ö×éÇé¿ö¡£waveÖеÄÏß³Ìͬ²½Ö´ÐÐÏàͬµÄÖ¸Áµ«Ã¿¸öÏ̶߳¼ÓÐ×Ô¼ºµÄregister״̬£¬¿ÉÒÔÖ´Ðв»Í¬µÄ¿ØÖÆ·ÖÖ§¡£±ÈÈçÒ»¸ö¿ØÖÆÓï¾ä
if(A)
{
¡ //·ÖÖ§A
}
else
{
¡ //·ÖÖ§B
}
|
¼ÙÉèwaveÖеÄ64¸öÏß³ÌÖУ¬ÆæÊýÏß³ÌÖ´ÐзÖÖ§A£¬Å¼ÊýÏß³ÌÖ´ÐзÖÖ§B£¬ÓÉÓÚwaveÖеÄÏ̱߳ØÐëÖ´ÐÐÏàͬµÄÖ¸ÁËùÒÔÕâÌõ¿ØÖÆÓï¾ä±»²ð·ÖΪÁ½´ÎÖ´ÐÐ[±àÒë½×¶Î½øÐÐÁË·ÖÖ§Ô¤²â]£¬µÚÒ»´Î·ÖÖ§AµÄÆæÊýÏß³ÌÖ´ÐУ¬Å¼ÊýÏ߳̽øÐпղÙ×÷£¬µÚ¶þ´ÎżÊýÏß³ÌÖ´ÐУ¬ÆæÊýÏ߳̿ղÙ×÷¡£Ó²¼þϵͳÓÐÒ»¸ö64λmask¼Ä´æÆ÷£¬µÚÒ»´ÎÊÇËüΪ01¡0101,µÚ¶þ´Î»á½øÐз´×ª²Ù×÷10¡1010£¬¸ù¾Ýmask¼Ä´æÆ÷µÄÖÃλÇé¿ö£¬À´Ñ¡ÔñÖ´Ðв»Í¬µÄÏ̡߳£¿É¼û¶ÔÓÚ·ÖÖ§¶àµÄkernelº¯Êý£¬Èç¹û²»Í¬Ï̵߳ÄÖ´Ðз¢ÉúdiverageµÄÇé¿öÌ«¶à£¬»áÓ°Ïì³ÌÐòµÄÐÔÄÜ¡£
2¡¢AMD waveµ÷¶È

AMD GPUµÄÏ̵߳÷¶Èµ¥Î»ÊÇwave£¬Ã¿¸öwaveµÄ´óСÊÇ64¡£Ö¸Áî·¢Éäµ¥Ôª·¢Éä5·µÄVLIWÖ¸Áÿ¸östream
core(SC)Ö´ÐÐÒ»ÌõVLIWÖ¸Á16¸östream coreÔÚÒ»¸öʱÖÓÖÜÆÚÖ´ÐÐ16ÌõVLIWÖ¸Áÿ¸öʱÖÓÖÜÆÚ£¬1/4wave±»Íê³É£¬Õû¸öwaveÍê³ÉÐèÒªËĸöÁ¬ÐøµÄʱÖÓÖÜÆÚ¡£
ÁíÍ⻹ÓÐÒÔϼ¸µãÖµµÃÎÒÃÇÁ˽⣺
?·¢ÉúRAW hazardÇé¿öÏ£¬Õû¸öwave±ØÐëstall 4¸öʱÖÓÖÜÆÚ£¬Õâʱ£¬Èç¹ûÆäËüµÄwave¿ÉÒÔÀûÓã¬ALU»áÖ´ÐÐÆäËüµÄwaveÒÔ±ãÒþ²ØÊ±ÑÓ£¬8¸öʱÖÓÖÜÆÚºó£¬Èç¹ûÏÈǰµÈ´ýwaveÒѾ׼±¸ºÃÁË£¬ALU»á¼ÌÐøÖ´ÐÐÕâ¸öwave¡£
?Á½¸öwaveÄܹ»ÍêÈ«Òþ²ØRAWʱÑÓ¡£µÚÒ»¸öwaveÖ´ÐÐʱºò£¬µÚ¶þ¸öwaveÔÚµ÷¶ÈµÈ´ýÊý¾Ý£¬µÚÒ»¸öwaveÖ´ÐÐÍêʱ£¬µÚ¶þ¸öwave¿ÉÒÔÁ¢¼´¿ªÊ¼Ö´ÐС£
3¡¢nv warpµ÷¶È

work groupÒÔ32¸öÏß³ÌΪµ¥Î»£¬·Ö³É²»Í¬warp£¬ÕâЩwarp±»SMµ÷¶ÈÖ´ÐС£Ã¿´ÎwarpÖÐÒ»°ëµÄÏ̱߳»·¢ÉäÖ´ÐУ¬¶øÇÒÕâЩÏß³ÌÄܹ»½»´íÖ´ÐС£¿ÉÒÔÓõÄwarpÊýÁ¿ÒÀÀµÓÚÿ¸öblockµÄ×ÊÔ´Çé¿ö¡£³ýÁË´óС²»Ò»ÑùÍ⣬waveºÍwarpÔÚÓ²¼þÌØÐÔÉϺÜÏàËÆ¡£
4¡¢Occupancy¿ªÏú
ÔÚÿ¸öcuÖУ¬Í¬Ê±¼¤»îµÄwaveÊýÁ¿ÊÇÊÜÏÞÖÆµÄ£¬ÕâºÍÿ¸öÏß³ÌʹÓÃregisterºÍlocal memory´óСÓйأ¬ÒòΪ¶ÔÓÚÿ¸öcu£¬registerºÍlocal
memory×ÜÁ¿ÊÇÒ»¶¨µÄ¡£
ÎÒÃÇÓÃÊõÓïOccupancyÀ´ºâÁ¿Ò»¸öcuÖÐactive waveµÄÊýÁ¿¡£Èç¹ûͬʱ¼¤»îµÄwaveÔ½¶à£¬ÄܸüºÃµÄÒþ²ØÊ±ÑÓ£¬ÔÚºóÃæÐÔÄÜÓÅ»¯µÄÕ½ÚÖУ¬ÎÒÃÇ»¹»á¸ü¾ßÌåÌÖÂÛOccupancy¡£
5¡¢¿ØÖÆÁ÷ºÍ·ÖÖ§Ô¤²â(prediction)
Ç°ÃæÎÒ˵ÁËif elseµÄ·ÖÖ§Ö´ÐÐÇé¿ö£¬µ±Ò»¸öwaveÖв»Í¬Ï̳߳öÏÖdiverageµÄʱºò£¬»áͨ¹ýmaskÀ´¿ØÖÆÏ̵߳ÄÖ´Ðз¾¶¡£ÕâÖÖÔ¤²â
prediction£©µÄ·½Ê½»ùÓÚÏÂÃæµÄ¿¼ÂÇ£º
1.·ÖÖ§µÄ´úÂë¶¼±È½Ï¶Ì
2.ÕâÖÖpredictionµÄ·½Ê½±ÈÌõ¼þÖ¸Áî¸ü¸ßЧ¡£
3.ÔÚ±àÒë½×¶Î£¬±àÒëÆ÷Äܹ»ÓÃpreditionÌæ»»switch»òÕßif
else¡£
prediction ¿ÉÒÔ¶¨ÒåΪ£º¸ù¾ÝÅжÏÌõ¼þ£¬Ìõ¼þÂë±»ÉèÖÃΪtrue»òÕßfalse¡£
__kernel
void test() {
int tid= get_local_id(0) ;
if( tid %2 == 0)
Do_Some_Work() ;
else
Do_Other_Work() ;
} |
ÀýÈçÉÏÃæµÄ´úÂë¾ÍÊÇ¿ÉÔ¤²âµÄ£¬
Predicate = True for threads 0,2,4¡.
Predicate = False for threads 1,3,5¡.
|
ÏÂÃæÔÚ¿´Ò»¸ö¿ØÖÆÁ÷diverageµÄÀý×Ó

1.ÔÚcase1ÖУ¬ËùÓÐÆæÊýÏß³ÌÖ´ÐÐDoSomeWork2(),ËùÓÐżÊýÏß³ÌÖ´ÐÐDoSomeWorks,µ«ÊÇÔÚÿ¸öwaveÖУ¬ifºÍelse´úÂëÖ¸Áî¶¼Òª±»·¢Éä¡£
2.ÔÚcase2ÖУ¬µÚÒ»¸öwaveÖ´ÐÐif£¬ÆäËüµÄwaveÖ´ÐÐelse£¬ÕâÖÖÇé¿öÏ£¬Ã¿¸öwaveÖУ¬ifºÍelse´úÂëÖ»±»·¢ÉäÒ»¸ö¡£

ÔÚpredictionÏ£¬Ö¸ÁîÖ´ÐÐʱ¼äÊÇif£¬elseÁ½¸ö´úÂë¿ìÖ´ÐÐʱ¼äÖ®ºÍ¡£
6¡¢Warp voting
warp votingÊÇÒ»¸öwarpÄÚµÄÏß³ÌÖ®¼äÒþʽͬ²½µÄ»úÖÆ¡£

±ÈÈçÒ»¸öwarpÄÚÏß³ÌͬʱдLocal meoryij¸öµØÖ·£¬ÔÚÏ̲߳¢·¢Ö´ÐÐʱºò£¬warp
voting»úÖÆ¿ÉÒÔ±£Ö¤ËüÃǵÄǰºó˳ÐòÕýÈ·¡£¸üÏêϸµÄwarp voting´ó¼Ò¿ÉÒԲο¼cudaµÄ×ÊÁÏ¡£
ÔÚOpenCL±à³ÌÖУ¬ÓÉÓÚ¸÷ÖÖÓ²¼þÉ豸²»Í¬£¬µ¼ÖÂÎÒÃDZØÐëÕë¶Ô²»Í¬µÄÓ²¼þ½øÐÐÓÅ»¯£¬ÕâÒ²ÊÇOpenCL±à³ÌµÄÒ»¸öÌôÕ½£¬±ÈÈçwarpºÍwaveÊýÁ¿µÄ²»Í¬£¬Ê¹µÃÎÒÃÇÔÚÉè¼Æworkgroup´óСʱºò£¬±ØÐëÕë¶Ô×Ô¼ºµÄƽ̨½øÐÐÓÅ»¯£¬Èç¹ûÑ¡Ôñ32£¬¶ÔÓÚAMD
GPU£¬¿ÉÄÜÒ»¸öwaveÖÐ32Ïß³ÌÊǿղÙ×÷£¬¶øÈç¹ûÑ¡Ôñ64£¬¶Ônv GPUÀ´Ëµ£¬¿ÉÄÜ»á³öÏÖ×ÊÔ´¾ºÕùµÄÇé¿ö¼Ó¾ç£¬±ÈÈçregisterÒÔ¼°local
meomoryµÄ·ÖÅäµÈµÈ¡£Õâ¶ù»¹²»Ëµ»ìºÏCPU deviceµÄÇé¿ö,OpenCL²¢Ðбà³ÌµÄµÀ·»¹ºÜÂþ³¤£¬ÆÚ´ýеÄOpenCL¼Ü¹¹µÄ³öÏÖ¡£
|