Performance Optimization
1. Thread Mapping
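Thread mapping determines which part of the data each thread accesses. In other words, it is the correspondence between thread IDs and the data they access.
A suitable thread mapping makes full use of hardware features and so improves program performance; an unsuitable one degrades performance.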
See the paper "Static Memory Access Pattern Analysis on a Massively Parallel GPU", which explains how algorithms can make good use of thread mapping. Here is the download link I found via Google: http://www.ece.neu.edu/~bjang/patternAnalysis.pdf
ʹÓò»Í¬µÄÏß³ÌÓ³É䣬ͬһ¸öÏ߳̿ÉÄÜ·ÃÎʲ»Í¬Î»ÖõÄÊý¾Ý¡£ÏÂÃæÊǼ¸¸öÏß³ÌÓ³ÉäµÄÀý×Ó£º


Consider a simple serial matrix multiplication. This algorithm lends itself to output data decomposition: by creating N*M threads we remove the two outer loops, so that each thread performs P multiply-add operations. The question is whether the thread index space should be M*N or N*M.
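A minimal C sketch of this decomposition, assuming row-major storage with A of size N x P, B of size P x M and C of size N x M (all function names are hypothetical). The serial version has three nested loops; once the two outer loops are removed, each work-item (i1, i2) runs only the innermost loop of P multiply-adds. The host loop merely emulates an NDRange sequentially:

```c
#include <assert.h>
#include <stddef.h>

/* Serial version: C = A * B (A is N x P, B is P x M, C is N x M, row-major). */
void matmul_serial(const float *A, const float *B, float *C,
                   size_t N, size_t M, size_t P) {
    for (size_t i1 = 0; i1 < N; ++i1)
        for (size_t i2 = 0; i2 < M; ++i2) {
            float sum = 0.0f;
            for (size_t i3 = 0; i3 < P; ++i3)
                sum += A[i1 * P + i3] * B[i3 * M + i2];
            C[i1 * M + i2] = sum;
        }
}

/* Body of one work-item after the two outer loops are removed:
 * work-item (i1, i2) computes one element of C with P multiply-adds. */
void matmul_item(const float *A, const float *B, float *C,
                 size_t M, size_t P, size_t i1, size_t i2) {
    assert(i2 < M);
    float sum = 0.0f;
    for (size_t i3 = 0; i3 < P; ++i3)
        sum += A[i1 * P + i3] * B[i3 * M + i2];
    C[i1 * M + i2] = sum;
}

/* Sequential emulation of an NDRange of N*M work-items. */
void matmul_ndrange(const float *A, const float *B, float *C,
                    size_t N, size_t M, size_t P) {
    for (size_t i2 = 0; i2 < M; ++i2)        /* dimension 0: columns of C */
        for (size_t i1 = 0; i1 < N; ++i1)    /* dimension 1: rows of C */
            matmul_item(A, B, C, M, P, i1, i2);
}
```

Each work-item writes a distinct element of C, so the order in which work-items run does not change the result; only the memory access pattern differs between mappings.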

With an M*N thread index space, the kernel is as shown in the figure below:

With an N*M thread index space, the kernel is as shown in the figure below:

Both mappings produce the same result. Below are the results on NVIDIA GeForce 285 and 8800 GPUs: mapping 2 (that is, the N*M thread index space) performs better.

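The performance difference arises mainly because the two mappings access global memory in different patterns. In a row-major buffer, data is stored row by row, one element after another. To get coalesced accesses, consecutive threads in a wave should be mapped to the columns of the matrix (the second dimension). For A*B=C, this makes both the reads of B and the writes of C coalesced. Neither mapping affects A, whose access order is determined by i3.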
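To make the coalescing argument concrete, the sketch below counts how many memory segments one wave touches when each of its consecutive work-items writes one float of C. The 128-byte segment size and the 64-wide wave used in the example are assumptions for illustration; real transaction sizes depend on the hardware.

```c
#include <assert.h>
#include <stddef.h>

#define SEGMENT_BYTES 128  /* assumed memory transaction size (hardware dependent) */

/* Counts distinct SEGMENT_BYTES-sized segments touched when `wave` consecutive
 * work-items (dimension 0 = 0..wave-1, dimension 1 fixed at `fixed`) each
 * access one float of a row-major matrix with M columns.
 *   dim0_is_column != 0: mapping 2, dimension 0 indexes columns (i2)
 *   dim0_is_column == 0: mapping 1, dimension 0 indexes rows (i1)         */
size_t segments_touched(size_t wave, size_t fixed, size_t M, int dim0_is_column) {
    assert(wave > 0 && M > 0);
    size_t count = 0, last = (size_t)-1;
    for (size_t lane = 0; lane < wave; ++lane) {
        size_t row = dim0_is_column ? fixed : lane;
        size_t col = dim0_is_column ? lane : fixed;
        size_t seg = (row * M + col) * sizeof(float) / SEGMENT_BYTES;
        /* addresses grow monotonically with lane, so counting changes suffices */
        if (seg != last) { ++count; last = seg; }
    }
    return count;
}
```

With M = 1024, a 64-wide wave touches 2 segments when consecutive work-items map to columns (mapping 2), but 64 segments when they map to rows (mapping 1).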
The complete source code can be downloaded from http://code.google.com/p/imagefilter-opencl/downloads/detail?name=amduniCourseCode4.zip&can=2&q=#makechanges. The program implements and compares both mappings, and the second one is indeed faster.
Next, consider a matrix transpose, where changing the mapping improves the efficiency of global memory access.

The transpose is defined as: Out(x,y) = In(y,x)
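As the figure above shows, whichever mapping is chosen, one of the two buffers is always accessed in a non-coalesced way. (Note: during a transpose, each element of the input matrix must first be copied to a temporary location, such as a register, before it can be copied to the output matrix.) We can change the thread mapping and use local memory as the intermediate storage, so that both the input and the output matrices get coalesced global memory accesses.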
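A sequential C emulation of the local-memory transpose, assuming a hypothetical 4 x 4 tile and matrix dimensions that are multiples of the tile size. Within each tile, the read phase walks the input row by row and the write phase walks the output row by row, so both global accesses are contiguous; the transposition itself happens when the local tile is read in swapped order.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* hypothetical work-group tile: TILE x TILE work-items */

/* Out(x,y) = In(y,x): `in` is rows x cols, `out` is cols x rows, both row-major.
 * Each (by, bx) block plays the role of one work-group; (lx, ly) is the local ID. */
void transpose_tiled(const float *in, float *out, size_t rows, size_t cols) {
    assert(rows % TILE == 0 && cols % TILE == 0);
    float local[TILE][TILE];  /* stands in for __local memory */
    for (size_t by = 0; by < rows; by += TILE)
        for (size_t bx = 0; bx < cols; bx += TILE) {
            /* Phase 1: work-item (lx, ly) reads in[by+ly][bx+lx]; consecutive lx
             * values read consecutive addresses (coalesced). */
            for (size_t ly = 0; ly < TILE; ++ly)
                for (size_t lx = 0; lx < TILE; ++lx)
                    local[ly][lx] = in[(by + ly) * cols + (bx + lx)];
            /* A real kernel needs barrier(CLK_LOCAL_MEM_FENCE) here. */
            /* Phase 2: work-item (lx, ly) writes out[bx+ly][by+lx]; consecutive lx
             * values write consecutive addresses (coalesced), while the local tile
             * is read in transposed order. */
            for (size_t ly = 0; ly < TILE; ++ly)
                for (size_t lx = 0; lx < TILE; ++lx)
                    out[(bx + ly) * rows + (by + lx)] = local[lx][ly];
        }
}
```

Only the local-memory reads are strided, and local memory tolerates that far better than global memory does.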

Below is a performance comparison of the matrix transpose implemented with the two thread mappings on an AMD 5870 GPU:

Complete code: http://code.google.com/p/imagefilter-opencl/downloads/detail?name=amduniCourseCode5.zip&can=2&q=#makechanges
2. Occupancy
In earlier tutorials we mentioned the concept of occupancy, which describes how well the resources of a CU are utilized.
OpenCLÖÐworkgroup±»Ó³Éäµ½Ó²¼þµÄCUÖÐÖ´ÐУ¬ÔÚÒ»¸öworkgroupÖеÄËùÓÐÏß³ÌÖ´ÐÐÍêÖ®ºó£¬Õâ¸öworkgroup²ÅËãÖ´ÐнáÊø¡£¶ÔÒ»¸öÌØ¶¨µÄcuÀ´Ëµ£¬ËüµÄ×ÊÔ´£¨±ÈÈç¼Ä´æÆ÷ÊýÁ¿£¬local
memory´óС£¬×î´óÏß³ÌÊýÁ¿µÈ£©Êǹ̶¨µÄ£¬ÕâЩ×ÊÔ´¶¼»áÏÞÖÆcuÖÐͬʱ´¦ÓÚµ÷¶È״̬µÄworkgroupÊýÁ¿¡£Èç¹ûcuÖеÄ×ÊÔ´ÊýÁ¿×ã¹»µÄµÄ»°£¬Ó³É䵽ͬһ¸öcuµÄ¶à¸öworkgroupÄÜͬʱ´¦ÓÚµ÷¶È״̬£¬ÆäÖÐÒ»¸öworkgroupµÄwave´¦ÓÚÖ´ÐÐ״̬£¬µ±´¦ÓÚÖ´ÐÐ״̬µÄworkgroupËùÓÐwaveÒòΪµÈ´ý×ÊÔ´¶øÇл»µ½µÈ´ý״̬µÄ»°£¬²»Í¬workgroupÄܹ»´Ó¾ÍÐ÷״̬Çл»µ½ALUÖ´ÐУ¬ÕâÑùÒþ²Ømemory·ÃÎÊʱÑÓ¡£ÕâÓеãÀàËÆ²Ù×÷ϵͳÖнø³ÌÖ®¼äµÄµ÷¶È״̬¡£ÎÒ¼òµ¥»¸öͼ£¬ÒÔ¹©²Î¿¼£º

1. For a long kernel, registers are the main resource bottleneck. Suppose the kernel needs at most 35 registers; then every thread in the work-group uses 35 registers. A CU (take the 5870 as an example) has at most 16384 registers, so it can hold at most 16384/35 = 468 threads, and a work-group therefore cannot contain more than 468 threads (work-items).
2. Conversely, if a CU has 16384 registers in total and the work-group size is fixed at 256 threads, each thread can use up to 16384/256 = 64 registers.
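The two calculations above as a small C sketch. It uses the 16384-register figure quoted for the 5870 and ignores register allocation granularity, so the results should be read as upper bounds:

```c
#include <assert.h>

/* Register budget of one CU, as quoted above for the HD 5870. */
#define REGS_PER_CU 16384u

/* Most work-items one CU can hold if each needs regs_per_item registers. */
unsigned max_items_by_registers(unsigned regs_per_item) {
    assert(regs_per_item > 0);
    return REGS_PER_CU / regs_per_item;   /* integer division rounds down */
}

/* Most registers each work-item may use if group_size work-items must fit. */
unsigned max_regs_per_item(unsigned group_size) {
    assert(group_size > 0);
    return REGS_PER_CU / group_size;
}
```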
The local memory of each CU is also limited: 32 KB on AMD HD 5XXX GPUs, and 32-48 KB on NVIDIA GPUs (depending on the model). As with registers, if a kernel uses too much local memory, the number of threads in a work-group is limited as well.
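The same kind of estimate applies to local memory, where the limit is usually felt as the number of work-groups that can be resident at once. This sketch assumes the 32 KB per CU quoted above for AMD HD 5XXX and ignores every other limit:

```c
#include <assert.h>
#include <stddef.h>

/* Local memory per CU, as quoted above for AMD HD 5XXX. */
#define LOCAL_MEM_PER_CU (32u * 1024u)

/* Work-groups that can be resident on one CU at the same time when each
 * group uses local_bytes_per_group bytes of local memory (other limits ignored). */
unsigned groups_by_local_mem(size_t local_bytes_per_group) {
    assert(local_bytes_per_group > 0);
    return (unsigned)(LOCAL_MEM_PER_CU / local_bytes_per_group);
}
```

For example, a 256-work-item group that keeps one float4 (16 bytes) per work-item in local memory needs 4 KB, so 8 such groups fit; a group that needs 20 KB leaves room for only one.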
GPU hardware also caps the number of threads in a work-group running on a CU: 256 on AMD GPUs and 512 on NVIDIA GPUs.
NVIDIA GPUs limit the number of active threads per CU: 8 or 16 warps, or 768 or 1024 threads, per CU.
AMD GPUs limit the number of waves per CU; on the 5870 the maximum is 496 waves.
All of these limits come from competition for finite resources. With NVIDIA CUDA, the resource limits can be inspected visually (for example, with the CUDA Occupancy Calculator).
3. Vectorization
Vectorization lets one thread perform several operations at once. In kernel code we can use vector data types such as float4 to gain speed. The effect is more pronounced on AMD GPUs, because their stream cores are (x,y,z,w) vector processing units.
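A sketch of how vectorization changes the work per work-item, written in plain C so it runs on the host: `float4_t` is a hypothetical stand-in for OpenCL C's `float4`, and each call of an `_item` function plays the role of one work-item. It only shows that the float4 version needs a quarter as many work-items for the same data; the actual speedup depends on the GPU.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } float4_t;  /* hypothetical stand-in for float4 */

/* Scalar kernel body: work-item `gid` copies one float. */
void copy_scalar_item(const float *in, float *out, size_t gid) {
    out[gid] = in[gid];
}

/* Vector kernel body: work-item `gid` copies one float4, i.e. four floats. */
void copy_vec4_item(const float4_t *in, float4_t *out, size_t gid) {
    out[gid] = in[gid];
}

/* Emulated NDRange over n floats; returns the number of work-items used. */
size_t run_scalar(const float *in, float *out, size_t n) {
    for (size_t gid = 0; gid < n; ++gid)
        copy_scalar_item(in, out, gid);
    return n;
}

/* Same data viewed as float4; assumes n is a multiple of 4. */
size_t run_vec4(const float4_t *in, float4_t *out, size_t n) {
    assert(n % 4 == 0);
    for (size_t gid = 0; gid < n / 4; ++gid)
        copy_vec4_item(in, out, gid);
    return n / 4;
}
```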
The figure below compares the performance of float and float4 in a simple vector assignment.

The kernel code is:
This section covers OpenCL performance optimization of the NBody algorithm.
1. NBody
An NBody system simulates a galaxy through the physical forces between particles. Each particle represents a star, and the interactions among many particles reproduce the behavior of a galaxy.

The figure above shows a particle simulation of a galaxy. Source: THE GALAXY-CLUSTER-SUPERCLUSTER CONNECTION, http://www.casca.ca/ecass/issues/1997-DS/West/west-bil.html
Because every pair of particles attracts each other gravitationally, the complexity of this algorithm is O(N^2). Below we discuss how to optimize the algorithm itself and how to optimize it with OpenCL.
2. The NBody Algorithm
¼ÙÉèÁ½¸öÁ£×ÓÖ®¼äͨ¹ýÍòÓÐÒýÁ¦Ï໥×÷Óã¬ÔòÈÎÒâÁ½¸öÁ£×ÓÖ®¼äµÄÏ໥×÷ÓÃÁ¦F¹«Ê½ÈçÏ£º

The simplest method is to sum, for each particle, the forces exerted on it by all other particles; this is usually called N-Pair NBody simulation.
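The gravitational force between two particles is inversely proportional to the square of the distance between them, so for a given particle (assuming all particles have the same mass), the force from distant particles is sometimes very small and can even be neglected. The Barnes-Hut algorithm partitions 3D space with an octree: forces are computed directly only between particles in neighboring cells, while the particles in a distant cell are treated as a single body when computing gravity.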

3. Optimizing NBody with OpenCL
In this section we do not optimize the algorithm itself; we only optimize the N-Pair NBody simulation using OpenCL mechanisms.
The simplest implementation sums the forces acting on each particle, as in the following code:
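for (i = 0; i < n; i++) {
    ax = ay = az = 0;
    // Loop over all particles "j"
    for (j = 0; j < n; j++) {
        // Calculate displacement
        dx = x[j] - x[i];
        dy = y[j] - y[i];
        dz = z[j] - z[i];
        // small eps is delta added for dx,dy,dz = 0
        invr = 1.0 / sqrt(dx*dx + dy*dy + dz*dz + eps);
        f = m[j] * invr * invr * invr;  // completed here; m[j] = mass of particle j (assumed)
        ax += f * dx; ay += f * dy; az += f * dz;  // accumulate acceleration of particle i
    }
}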
For each particle we compute the net force acting on it, then compute its new position after a time step delta under that force, and use the new position as the input for the next step.
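The text does not say which integration rule is used. A minimal sketch, assuming a constant-acceleration update over one time step (names are hypothetical). Results go to separate outputs, so the old state stays readable while other particles are still being updated, which is the role of separate new-position and new-velocity buffers:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3_t;

/* One time step for one particle, given the acceleration from the force sum.
 * Assumed update rule (constant acceleration over dt):
 *   new_pos = pos + vel*dt + 0.5*acc*dt^2,   new_vel = vel + acc*dt        */
void step_particle(vec3_t pos, vec3_t vel, vec3_t acc, float dt,
                   vec3_t *new_pos, vec3_t *new_vel) {
    assert(new_pos != NULL && new_vel != NULL);
    new_pos->x = pos.x + vel.x * dt + 0.5f * acc.x * dt * dt;
    new_pos->y = pos.y + vel.y * dt + 0.5f * acc.y * dt * dt;
    new_pos->z = pos.z + vel.z * dt + 0.5f * acc.z * dt * dt;
    new_vel->x = vel.x + acc.x * dt;
    new_vel->y = vel.y + acc.y * dt;
    new_vel->z = vel.z + acc.z * dt;
}
```

After all particles have been stepped, the new buffers become the inputs of the next step.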

The unoptimized OpenCL kernel code is as follows:
__global float4* pos , __global float4* vel, int numBodies, float deltaTime, float epsSqr, __local float4* localPos, __global float4* newPosition, __global float4* newVelocity)
{
unsigned int tid = get_local_id(0);
unsigned int gid = get_global_id(0);
unsigned int localSize = get_local_size(0);
In this implementation, the positions and velocities of the other particles are read from global memory every time, so the number of memory accesses = N reads * N threads = N^2.
We can optimize this with local memory: once a particle's data has been read in, it can be shared by p*p threads, where p*p is the work-group size. For each particle, we iterate over tiles of p*p particles and accumulate the final result.
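A sequential C emulation of one work-group using this scheme, assuming a hypothetical group (and tile) size of 4 and n divisible by it. To keep the sketch free of libm, `pair_term` is a placeholder interaction (mass-weighted displacement), not the softened 1/r^3 gravity of the serial code; the tiling and the read counting do not depend on that choice. Each tile is loaded from global memory once, one read per work-item, and then reused by every work-item in the group.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP 4   /* hypothetical work-group size = tile size */

typedef struct { float x, y, z, w; } float4_t;   /* w holds the mass */

/* Placeholder interaction (NOT gravity): mass-weighted displacement of b from me.
 * A real kernel would accumulate the softened 1/r^3 force here instead. */
static void pair_term(float4_t me, float4_t b, float4_t *acc) {
    acc->x += b.w * (b.x - me.x);
    acc->y += b.w * (b.y - me.y);
    acc->z += b.w * (b.z - me.z);
}

/* Emulates the work-group that handles particles g0 .. g0+GROUP-1 and writes
 * their results to acc. Returns the number of global position reads spent on
 * loading tiles. Assumes n % GROUP == 0 and g0 % GROUP == 0. */
size_t nbody_group_tiled(const float4_t *pos, size_t n, size_t g0, float4_t *acc) {
    assert(n % GROUP == 0 && g0 % GROUP == 0 && g0 < n);
    float4_t tile[GROUP];                           /* stands in for __local memory */
    float4_t a[GROUP] = {{0.0f, 0.0f, 0.0f, 0.0f}};
    size_t global_reads = 0;
    for (size_t t = 0; t < n; t += GROUP) {
        for (size_t lid = 0; lid < GROUP; ++lid) {  /* cooperative load, one read each */
            tile[lid] = pos[t + lid];
            ++global_reads;
        }
        /* barrier(CLK_LOCAL_MEM_FENCE): the whole tile must be loaded first */
        for (size_t lid = 0; lid < GROUP; ++lid)    /* every work-item reuses the tile; */
            for (size_t j = 0; j < GROUP; ++j)      /* its own position would sit in a register */
                pair_term(pos[g0 + lid], tile[j], &a[lid]);
        /* second barrier: the tile must not be overwritten while still in use */
    }
    for (size_t lid = 0; lid < GROUP; ++lid)
        acc[g0 + lid] = a[lid];
    return global_reads;
}
```

With n = 8, each group of 4 performs 8 global position reads for the tiles instead of the 4 * 8 = 32 an untiled group would make, so the total drops from N^2 to N^2 divided by the group size.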

The optimized kernel code is as follows:
int numBodies,
float deltaTime,
float epsSqr,
__local float4* localPos,
__global float4* newPosition,
__global float4* newVelocity)
{
unsigned int tid = get_local_id(0);
Below are the performance results on AMD and NVIDIA platforms:
AMD GPU = 5870, Stream SDK 2.2
NVIDIA GPU = GTX 480, CUDA 3.1

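The program also tries loop unrolling: unrolling the inner loop reduces the number of branch instructions the GPU executes. In my tests (on an AMD 5670 GPU), unrolling four times gave about 30% higher FPS than no unrolling. See the __kernel void nbody_sim_unroll function in the kernel code for the implementation. On the AMD platform, vectorization also improves performance by about 10%.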
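As a rough sketch of the unrolling transformation only (not the `nbody_sim_unroll` kernel itself), in C with hypothetical names and a libm-free placeholder interaction (`pair_term`, not the real gravity term): both versions sum the same terms in the same order, and the unrolled one evaluates the loop condition a quarter as often.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } float4_t;   /* w holds the mass */

/* Placeholder interaction (NOT gravity); a real kernel would accumulate
 * the softened 1/r^3 force here. */
static void pair_term(float4_t me, float4_t b, float4_t *acc) {
    acc->x += b.w * (b.x - me.x);
    acc->y += b.w * (b.y - me.y);
    acc->z += b.w * (b.z - me.z);
}

/* Rolled inner loop: one compare-and-branch per body. */
float4_t accel_rolled(float4_t me, const float4_t *bodies, size_t n) {
    float4_t a = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t j = 0; j < n; ++j)
        pair_term(me, bodies[j], &a);
    return a;
}

/* Unrolled 4 times: one compare-and-branch per 4 bodies.
 * Assumes n % 4 == 0; otherwise a remainder loop would be needed. */
float4_t accel_unroll4(float4_t me, const float4_t *bodies, size_t n) {
    assert(n % 4 == 0);
    float4_t a = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t j = 0; j < n; j += 4) {
        pair_term(me, bodies[j], &a);
        pair_term(me, bodies[j + 1], &a);
        pair_term(me, bodies[j + 2], &a);
        pair_term(me, bodies[j + 3], &a);
    }
    return a;
}
```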
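1. OpenCL Extensions
An OpenCL extension is a feature that a device supports but that is not part of the OpenCL standard. Through extensions, vendors can add new functionality to their devices without having to worry about compatibility. Today, every vendor's OpenCL implementation uses some extensions of its own to a greater or lesser degree.
There are three types of extensions:
1. Extensions approved by the Khronos OpenCL working group. These must pass conformance tests and may be added to a future version of the OpenCL specification. Their names begin with cl_khr.
2. External extensions, whose names begin with cl_ext. These are proposed by two or more vendors and do not require conformance testing; cl_ext_device_fission is one example.
3. Vendor-specific extensions, such as AMD's printf extension.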
2. Using Extensions
In OpenCL, an extension must be enabled before it can be used; by default, all extensions are disabled.
#pragma OPENCL EXTENSION extension_name : enable
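In OpenCL, whether a function is available is known only at run time, so it is the programmer's responsibility to determine whether a given extension is available, and its status must be queried before use. Below is code that queries whether an extension is available: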
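One common approach: on the host, fetch the device's extension list as a space-separated string with `clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, size, buffer, &size_ret)`, typically calling it once with a NULL buffer to learn the size. A minimal sketch of the check that follows, with a hypothetical helper name, matching whole names rather than substrings:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` occurs as a whole space-separated
 * token in `ext_list` (e.g. the CL_DEVICE_EXTENSIONS string), else 0.
 * strstr() alone would also accept a prefix of a longer extension name. */
int has_extension(const char *ext_list, const char *name) {
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return 0;
    for (const char *p = strstr(ext_list, name); p != NULL; p = strstr(p + 1, name)) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
    }
    return 0;
}
```

Only after this check succeeds should a kernel that relies on the extension be built with the corresponding `#pragma OPENCL EXTENSION ... : enable`.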

3. Some Khronos-Approved Extensions
Atomic operations, which guarantee that operations are performed atomically within a single device, for example:
- cl_khr_{global | local}_int32_base_atomics
- cl_khr_{global | local}_int32_extended_atomics
- cl_khr_int64_base_atomics
- cl_khr_int64_extended_atomics
Note: atomic operations guarantee that the result is correct, but not the order in which the operations are performed.


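Double and half precision: the cl_khr_fp64 extension provides double precision, which some physics simulations and scientific computations require. AMD's 64-bit extension is cl_amd_fp64, which supports only part of cl_khr_fp64; NVIDIA supports cl_khr_fp64. Neither vendor currently supports the half-precision extension cl_khr_fp16.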
Byte-addressable store is also an extension in OpenCL (cl_khr_byte_addressable_store): writes narrower than 32 bits, such as char, require it. For example, in AMD's histogram sample each bin is stored in a single byte.
3D image write extension: standard OpenCL supports reading and writing 2D images, but writing 3D images requires an extension (cl_khr_3d_image_writes).
The cl_khr_gl_sharing extension allows applications to use OpenGL buffers, textures, and so on.
4. AMD Extensions
The cl_ext_device_fission extension partitions a device into several sub-devices, each with its own queue. It is intended mainly for multi-core CPUs and the Cell Broadband Engine, and was proposed jointly by AMD, Apple, Intel, and IBM.
Possible uses of device fission include:
1. Reserving part of the device for high-priority, low-latency tasks.
2. Controlling the assignment of work to individual compute units
3. Subdividing compute devices along some shared hardware feature, such as a cache
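Each sub-device has its own queue; in the figure below, for example, different tasks are sent to two sub-devices. Note that before splitting a device into sub-devices, you first need to understand the device's architecture, and then partition it according to the tasks and that architecture.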

The GPU printf extension is mainly used for debugging kernel code. The cl_amd_media_ops extension is mainly used for certain multimedia operations. The AMD device query extension is mainly used for queries and event handling.
5. NVIDIA Extensions
1.Compiler Options
2.Interoperability Extensions
3.Device Query Extension
6. Cell Broadband Engine Extensions
Cell processors are not widely used, so I will not go into detail here; users can consult the relevant manuals.