From the K-Nearest Neighbor Algorithm and Distance Metrics to KD-Trees and the SIFT+BBF Algorithm
 

Author: Yarkin, 火龙果软件. Published: 2014-09-02

 

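Preface

Two days ago I wrote on Weibo: "As of today I owe at least three articles: 1. KD-trees; 2. neural networks; 3. Chapter 28 of The Art of Programming. As you can see, the articles on this blog are unlike anything you will find elsewhere. So I waited and waited for a computer, and all I could do was wait..." Thanks to a friend who lent me one (when I kept thanking him, he said, "I found my job entirely thanks to your blog; thanking me for such a small favor is just not right". Sometimes being trusted feels like a kind of pressure, and I hope I never let that trust down), today I begin writing the third article in the Top 10 Algorithms in Data Mining series, namely this one: "From the K-Nearest Neighbor Algorithm to KD-Trees and the SIFT+BBF Algorithm".

Sticking to one's own interests is hard, because so many people are too easily swayed by the outside world, especially when the interest brings little tangible reward. Fortunately, I have been able to keep going. The Pythagoreans had a famous saying, "All is number", and I felt the truth of it again after recently finishing The History of the Calculus and Its Conceptual Development. From algorithms to data mining and machine learning, and on to mathematics, every detail in each of these fields is worth a lifetime of exploration. Perhaps that is what "lifelong learning" means.

You will also see that the K-nearest neighbor algorithm, like the decision tree and Bayesian classifiers and the support vector machine (SVM) covered in the first two articles of this series, is an algorithm for solving classification problems. This series on the top ten data mining algorithms will proceed in order through classification, clustering, association analysis, and prediction/regression.

OK. This article was written in haste; if you find any gaps, problems, or errors, please do not hesitate to point them out. Your criticism is part of what keeps me writing. Thank you.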

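Part 1. The K-Nearest Neighbor Algorithm

1.1 What Is the K-Nearest Neighbor Algorithm?

The K-nearest neighbor algorithm (K-Nearest Neighbor algorithm, KNN for short) can, judging from the name alone, be understood quite bluntly as "the K nearest neighbors". When K=1 it becomes the nearest neighbor algorithm, which simply looks for the single closest neighbor. Why look for neighbors at all? As an analogy, suppose you arrive in an unfamiliar village and want to find the group of people who share your traits so that you can fit in with them.

More formally, given a training data set and a new input instance, the K-nearest neighbor algorithm finds the K instances in the training set that are closest to the input (the K neighbors mentioned above) and assigns the input to the class to which the majority of those K instances belong. With this definition in mind, consider the following figure from Wikipedia:

[Figure: KNN classification example from Wikipedia, showing blue squares, red triangles, and a green circle at the center]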

As the figure shows, there are two classes of sample data, drawn as small blue squares and small red triangles, while the green circle in the middle marks the data point to be classified. In other words, we do not yet know which class the green point belongs to (blue square or red triangle), and the task is to classify it.

As the saying goes, birds of a feather flock together: you can often judge a person's character by looking at his or her friends. To decide which class the green circle belongs to, we therefore start from its neighbors. But how many neighbors should we look at? From the figure:

If K=3, the three nearest neighbors of the green point are two red triangles and one blue square. By majority vote, the green point is assigned to the red triangle class.

If K=5, the five nearest neighbors of the green point are two red triangles and three blue squares. Again by majority vote, the green point is assigned to the blue square class.

So when we cannot tell directly which known class a point belongs to, we can look at its position in feature space, weigh the neighbors around it, and assign it to the class that carries more weight. This is the core idea of the K-nearest neighbor algorithm.
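The majority-vote idea can be sketched in a few lines of C++. This is only a brute-force illustration, not the article's own implementation: the names `Sample`, `squaredDistance`, and `knnClassify` are hypothetical, classes are assumed to be encoded as integers, the Euclidean distance is assumed as the metric, and ties are broken in favor of the smaller label.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// A labeled training sample: a feature vector plus an integer class label.
struct Sample {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance; the square root is not needed for ranking.
double squaredDistance(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i != a.size(); ++i)
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
}

// Classify `query` by majority vote among its k nearest training samples.
int knnClassify(const std::vector<Sample>& train,
                const std::vector<double>& query, std::size_t k)
{
    assert(!train.empty() && k >= 1);
    k = std::min(k, train.size());

    // Pair each training sample's distance to the query with its label.
    std::vector<std::pair<double, int>> dist;
    for (const Sample& s : train)
        dist.emplace_back(squaredDistance(s.features, query), s.label);

    // Move the k smallest distances to the front of the vector.
    std::partial_sort(dist.begin(),
                      dist.begin() + static_cast<std::ptrdiff_t>(k),
                      dist.end());

    // Count the votes for each label among the k neighbors.
    std::map<int, int> votes;
    for (std::size_t i = 0; i != k; ++i)
        ++votes[dist[i].second];

    // Pick the label with the most votes (ties go to the smaller label).
    int best = dist[0].second;
    int bestCount = 0;
    for (const auto& v : votes)
        if (v.second > bestCount) { best = v.first; bestCount = v.second; }
    return best;
}
```

With two nearby samples of one class and three slightly farther samples of another, calling `knnClassify` with k=3 and k=5 reproduces the flip described above: the answer depends on how many neighbors are consulted.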

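1.2 Distance Metrics for Nearest Neighbors

As Section 1.1 showed, the core of the K-nearest neighbor algorithm is finding the neighbors of an instance point. This raises the next questions: how do we find neighbors, what is the criterion for being a neighbor, and how is it measured? These questions are answered by the distance metrics described below. Some readers may wonder: I am looking for neighbors, for similarity, so why is distance involved?

The reason is that the distance between two instance points in feature space reflects how similar they are. The feature space of the K-nearest neighbor model is generally the n-dimensional real vector space, and the distance used can be the Euclidean distance or some other distance. Since distance has come up, the rest of this section surveys the common distance metrics, by way of extension.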

1. Å·ÊϾàÀ룬×î³£¼ûµÄÁ½µãÖ®¼ä»ò¶àµãÖ®¼äµÄ¾àÀë±íʾ·¨£¬ÓÖ³ÆÖ®ÎªÅ·¼¸ÀïµÃ¶ÈÁ¿£¬Ëü¶¨ÒåÓÚÅ·¼¸ÀïµÃ¿Õ¼äÖУ¬Èçµã x = (x1,...,xn) ºÍ y = (y1,...,yn) Ö®¼äµÄ¾àÀëΪ£º

(1)¶þÎ¬Æ½ÃæÉÏÁ½µãa(x1,y1)Óëb(x2,y2)¼äµÄÅ·ÊϾàÀ룺

(2)Èýά¿Õ¼äÁ½µãa(x1,y1,z1)Óëb(x2,y2,z2)¼äµÄÅ·ÊϾàÀ룺

(3)Á½¸önάÏòÁ¿a(x11,x12,¡­,x1n)Óë b(x21,x22,¡­,x2n)¼äµÄÅ·ÊϾàÀ룺

Ò²¿ÉÒÔÓñíʾ³ÉÏòÁ¿ÔËËãµÄÐÎʽ£º

ÆäÉÏ£¬¶þÎ¬Æ½ÃæÉÏÁ½µãŷʽ¾àÀ룬´úÂë¿ÉÒÔÈçϱàд£º

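// unixfy: compute the Euclidean distance
#include <cassert>
#include <cmath>
#include <vector>
using std::vector;

double euclideanDistance(const vector<double>& v1, const vector<double>& v2)
{
    // Both vectors must have the same dimension.
    assert(v1.size() == v2.size());
    double ret = 0.0;
    for (vector<double>::size_type i = 0; i != v1.size(); ++i)
    {
        // Accumulate the squared difference of each coordinate.
        ret += (v1[i] - v2[i]) * (v1[i] - v2[i]);
    }
    return std::sqrt(ret);
}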

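2. Manhattan distance. Formally, the Manhattan distance is the L1 distance, or city block distance: the sum of the lengths of the projections onto the coordinate axes of the line segment between two points in Euclidean space with a fixed Cartesian coordinate system. For example, in the plane, the Manhattan distance between point P1 at (x1, y1) and point P2 at (x2, y2) is $|x_1 - x_2| + |y_1 - y_2|$. Note that the Manhattan distance depends on the rotation of the coordinate system, but not on its translation or its reflection about a coordinate axis.

In plain terms, imagine driving in Manhattan from one intersection to another. Is the driving distance the straight-line distance between the two points? Obviously not, unless you can drive through buildings. The actual driving distance is the "Manhattan distance", which is where the name comes from. The Manhattan distance is also called the city block distance (City Block distance).

(1) Manhattan distance between two points a(x1,y1) and b(x2,y2) in the 2D plane:

$$d_{12} = |x_1 - x_2| + |y_1 - y_2|$$

(2) Manhattan distance between two n-dimensional vectors a(x11,x12,…,x1n) and b(x21,x22,…,x2n):

$$d_{12} = \sum_{k=1}^{n} |x_{1k} - x_{2k}|$$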
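In the same style as the Euclidean distance function, the Manhattan distance could be sketched as follows. The function name `manhattanDistance` is an illustrative choice, not something taken from the article.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Manhattan (L1) distance: the sum of absolute coordinate differences.
double manhattanDistance(const std::vector<double>& v1, const std::vector<double>& v2)
{
    // Both vectors must have the same dimension.
    assert(v1.size() == v2.size());
    double ret = 0.0;
    for (std::size_t i = 0; i != v1.size(); ++i)
        ret += std::fabs(v1[i] - v2[i]);
    return ret;
}
```

For the points (1, 2) and (4, 6) this gives |1-4| + |2-6| = 7, whereas the Euclidean distance between the same points is 5.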

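3. Chebyshev distance. If two vectors or points p and q have coordinates $p_i$ and $q_i$ respectively, the Chebyshev distance between them is defined as:

$$D_{\mathrm{Chebyshev}}(p, q) = \max_i |p_i - q_i|$$

This equals the limit of the Lp metrics, $\lim_{k \to \infty} \left( \sum_{i=1}^{n} |p_i - q_i|^k \right)^{1/k}$, which is why the Chebyshev distance is also called the L∞ metric.

Mathematically, the Chebyshev distance is the metric induced by the uniform norm (also called the supremum norm), and it is an example of an injective metric space.

In plane geometry, if points p and q have Cartesian coordinates $(x_1, y_1)$ and $(x_2, y_2)$, the Chebyshev distance is $\max(|x_2 - x_1|, |y_2 - y_1|)$.

Chess players may know that a king can move in one step to any of the 8 adjacent squares. How many steps does the king need, at minimum, to go from square (x1,y1) to square (x2,y2)? You will find that the minimum is always max( | x2-x1 | , | y2-y1 | ) steps. This way of measuring distance is the Chebyshev distance.

(1) Chebyshev distance between two points a(x1,y1) and b(x2,y2) in the 2D plane:

$$d_{12} = \max(|x_1 - x_2|, |y_1 - y_2|)$$

(2) Chebyshev distance between two n-dimensional vectors a(x11,x12,…,x1n) and b(x21,x22,…,x2n):

$$d_{12} = \max_i |x_{1i} - x_{2i}|$$

An equivalent form of this formula is:

$$d_{12} = \lim_{k \to \infty} \left( \sum_{i=1}^{n} |x_{1i} - x_{2i}|^k \right)^{1/k}$$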
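A minimal sketch of the Chebyshev distance, using the max form of the definition rather than the limit. The function name `chebyshevDistance` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Chebyshev (L-infinity) distance: the largest absolute coordinate difference.
double chebyshevDistance(const std::vector<double>& v1, const std::vector<double>& v2)
{
    // Both vectors must have the same dimension.
    assert(v1.size() == v2.size());
    double ret = 0.0;
    for (std::size_t i = 0; i != v1.size(); ++i)
        ret = std::max(ret, std::fabs(v1[i] - v2[i]));
    return ret;
}
```

In the chess example, a king moving from square (1, 1) to square (4, 6) needs max(3, 5) = 5 steps, which is what the function returns for those two points.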

4. ãɿɷò˹»ù¾àÀë(Minkowski Distance)£¬ãÉÊϾàÀë²»ÊÇÒ»ÖÖ¾àÀ룬¶øÊÇÒ»×é¾àÀëµÄ¶¨Òå¡£

(1) ãÉÊϾàÀëµÄ¶¨Òå

Á½¸önά±äÁ¿a(x11,x12,¡­,x1n)Óë b(x21,x22,¡­,x2n)¼äµÄãɿɷò˹»ù¾àÀ붨ÒåΪ£º

ÆäÖÐpÊÇÒ»¸ö±ä²ÎÊý¡£

µ±p=1ʱ£¬¾ÍÊÇÂü¹þ¶Ù¾àÀë

µ±p=2ʱ£¬¾ÍÊÇÅ·ÊϾàÀë

µ±p¡ú¡Þʱ£¬¾ÍÊÇÇбÈÑ©·ò¾àÀë

¸ù¾Ý±ä²ÎÊýµÄ²»Í¬£¬ãÉÊϾàÀë¿ÉÒÔ±íʾһÀàµÄ¾àÀë¡£
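The way a single parameter selects the metric can be illustrated with a short sketch. The function name `minkowskiDistance` is hypothetical, and the code assumes a finite p >= 1; the p→∞ case is only approximated by passing a large p.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minkowski distance with parameter p (assumed finite and >= 1).
double minkowskiDistance(const std::vector<double>& v1,
                         const std::vector<double>& v2, double p)
{
    assert(v1.size() == v2.size());
    assert(p >= 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i != v1.size(); ++i)
        sum += std::pow(std::fabs(v1[i] - v2[i]), p);  // |x1k - x2k|^p
    return std::pow(sum, 1.0 / p);                     // p-th root of the sum
}
```

For the points (1, 2) and (4, 6), p=1 gives the Manhattan distance 7, p=2 gives the Euclidean distance 5, and a large p such as 50 gives a value very close to the Chebyshev distance 4.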

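5. Standardized Euclidean distance (Standardized Euclidean distance). The standardized Euclidean distance is an improvement that addresses a shortcoming of the plain Euclidean distance. The idea: since the components of the data are distributed differently across dimensions, first "standardize" every component so that all of them have the same mean and variance. As for which mean and variance to standardize to, a little statistics review helps.

Suppose the sample set X has mathematical expectation or mean m and standard deviation (the square root of the variance) s. Then the "standardized variable" X* of X is (X-m)/s, and the standardized variable has expectation 0 and variance 1.

That is, the standardization of a sample set is described by the formula:

standardized value = ( original value - component mean ) / component standard deviation

A simple derivation (the component means cancel when two standardized values are subtracted) gives the standardized Euclidean distance between two n-dimensional vectors a(x11,x12,…,x1n) and b(x21,x22,…,x2n):

$$d_{12} = \sqrt{\sum_{k=1}^{n} \left( \frac{x_{1k} - x_{2k}}{s_k} \right)^2}$$

where $s_k$ is the standard deviation of the k-th component. If the reciprocal of the variance is regarded as a weight, this formula can be seen as a weighted Euclidean distance (Weighted Euclidean distance).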
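A small sketch of the standardized Euclidean distance follows. It assumes the per-component standard deviations have already been estimated from the sample set and are passed in as a vector; the function name `standardizedEuclidean` and the parameter name `stddev` are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standardized Euclidean distance: each coordinate difference is divided
// by that component's standard deviation before squaring.
double standardizedEuclidean(const std::vector<double>& v1,
                             const std::vector<double>& v2,
                             const std::vector<double>& stddev)
{
    assert(v1.size() == v2.size() && v1.size() == stddev.size());
    double sum = 0.0;
    for (std::size_t i = 0; i != v1.size(); ++i)
    {
        assert(stddev[i] > 0.0);  // a constant component cannot be standardized
        const double z = (v1[i] - v2[i]) / stddev[i];
        sum += z * z;
    }
    return std::sqrt(sum);
}
```

With points (1, 100) and (2, 300) and standard deviations (1, 100), the plain Euclidean distance is dominated by the second component, while the standardized distance is sqrt(1 + 4) = sqrt(5), so both components contribute on a comparable scale.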

6. ÂíÊϾàÀë(Mahalanobis Distance)

£¨1£©ÂíÊϾàÀ붨Òå

ÓÐM¸öÑù±¾ÏòÁ¿X1~Xm£¬Ð­·½²î¾ØÕó¼ÇΪS£¬¾ùÖµ¼ÇΪÏòÁ¿¦Ì£¬ÔòÆäÖÐÑù±¾ÏòÁ¿Xµ½uµÄÂíÊϾàÀë±íʾΪ£º

£¨Ð­·½²î¾ØÕóÖÐÿ¸öÔªËØÊǸ÷¸öʸÁ¿ÔªËØÖ®¼äµÄЭ·½²îCov(X,Y)£¬Cov(X,Y) = E{ [X-E(X)] [Y-E(Y)]}£¬ÆäÖÐEΪÊýѧÆÚÍû£©
¶øÆäÖÐÏòÁ¿XiÓëXjÖ®¼äµÄÂíÊϾàÀ붨ÒåΪ£º

ÈôЭ·½²î¾ØÕóÊǵ¥Î»¾ØÕ󣨸÷¸öÑù±¾ÏòÁ¿Ö®¼ä¶ÀÁ¢Í¬·Ö²¼£©,Ôò¹«Ê½¾Í³ÉÁË£º

Ò²¾ÍÊÇÅ·ÊϾàÀëÁË¡£¡¡¡¡

ÈôЭ·½²î¾ØÕóÊǶԽǾØÕ󣬹«Ê½±ä³ÉÁ˱ê×¼»¯Å·ÊϾàÀë¡£

(2)ÂíÊϾàÀëµÄÓÅȱµã£ºÁ¿¸ÙÎ޹أ¬Åųý±äÁ¿Ö®¼äµÄÏà¹ØÐԵĸÉÈÅ¡£

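「A comment from seafood高清版 on Weibo: "So the Mahalanobis distance is derived from the covariance matrix; my teacher had been misleading me all along. No wonder the covariance matrix and positive semidefiniteness kept showing up when I read the LMNN paper that Kilian published at NIPS in 2005. So that is what it was about."」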

7¡¢°ÍÊϾàÀ루Bhattacharyya Distance£©£¬ÔÚͳ¼ÆÖУ¬Bhattacharyya¾àÀë²âÁ¿Á½¸öÀëÉ¢»òÁ¬Ðø¸ÅÂÊ·Ö²¼µÄÏàËÆÐÔ¡£ËüÓëºâÁ¿Á½¸öͳ¼ÆÑùÆ·»òÖÖȺ֮¼äµÄÖØµþÁ¿µÄBhattacharyyaϵÊýÃÜÇÐÏà¹Ø¡£Bhattacharyya¾àÀëºÍBhattacharyyaϵÊýÒÔ20ÊÀ¼Í30Äê´úÔøÔÚÓ¡¶Èͳ¼ÆÑо¿Ëù¹¤×÷µÄÒ»¸öͳ¼ÆÑ§¼ÒA. BhattacharyaÃüÃû¡£Í¬Ê±£¬BhattacharyyaϵÊý¿ÉÒÔ±»ÓÃÀ´È·¶¨Á½¸öÑù±¾±»ÈÏΪÏà¶Ô½Ó½üµÄ£¬ËüÊÇÓÃÀ´²âÁ¿ÖеÄÀà·ÖÀàµÄ¿É·ÖÀëÐÔ¡£

£¨1£©°ÍÊϾàÀëµÄ¶¨Òå

¶ÔÓÚÀëÉ¢¸ÅÂÊ·Ö²¼ pºÍqÔÚͬһÓò X£¬Ëü±»¶¨ÒåΪ£º

ÆäÖУº

ÊÇBhattacharyyaϵÊý¡£

¶ÔÓÚÁ¬Ðø¸ÅÂÊ·Ö²¼£¬BhattacharyyaϵÊý±»¶¨ÒåΪ£º

ÔÚÕâÁ½ÖÖÇé¿öÏ£¬°ÍÊϾàÀ벢ûÓзþ´ÓÈý½Ç²»µÈʽ.£¨ÖµµÃÒ»ÌáµÄÊÇ£¬Hellinger¾àÀë²»·þ´ÓÈý½Ç²»µÈʽ£©¡£

¶ÔÓÚ¶à±äÁ¿µÄ¸ß˹·Ö²¼ £¬ ºÍÊÇÊֶκÍЭ·½²îµÄ·Ö²¼¡£

ÐèҪעÒâµÄÊÇ£¬ÔÚÕâÖÖÇé¿öÏ£¬µÚÒ»ÏîÖеÄBhattacharyya¾àÀëÓëÂíÊϾàÀëÓйØÁª¡£

£¨2£©BhattacharyyaϵÊý

BhattacharyyaϵÊýÊÇÁ½¸öͳ¼ÆÑù±¾Ö®¼äµÄÖØµþÁ¿µÄ½üËÆ²âÁ¿£¬¿ÉÒÔ±»ÓÃÓÚÈ·¶¨±»¿¼ÂǵÄÁ½¸öÑù±¾µÄÏà¶Ô½Ó½ü¡£

¼ÆËãBhattacharyyaϵÊýÉæ¼°¼¯³ÉµÄ»ù±¾ÐÎʽµÄÁ½¸öÑù±¾µÄÖØµþµÄʱ¼ä¼ä¸ôµÄÖµµÄÁ½¸öÑù±¾±»·ÖÁѳÉÒ»¸öÑ¡¶¨µÄ·ÖÇøÊý£¬²¢ÇÒÔÚÿ¸ö·ÖÇøÖеÄÿ¸öÑùÆ·µÄ³ÉÔ±µÄÊýÁ¿£¬ÔÚÏÂÃæµÄ¹«Ê½ÖÐʹÓÃ

¿¼ÂÇÑùÆ·a ºÍ b £¬nÊǵķÖÇøÊý£¬²¢ÇÒ£¬±»Ò»¸öºÍ b iµÄÈÕ·ÖÇøÖеÄÑù±¾ÊýÁ¿µÄ³ÉÔ±¡£
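For the discrete case, the coefficient and the distance can be sketched as below. The sketch assumes that p and q are already normalized probability vectors over the same partitions (for example, histogram counts divided by the sample size); the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Bhattacharyya coefficient of two discrete distributions over the same domain.
double bhattacharyyaCoefficient(const std::vector<double>& p, const std::vector<double>& q)
{
    assert(p.size() == q.size());
    double bc = 0.0;
    for (std::size_t i = 0; i != p.size(); ++i)
    {
        assert(p[i] >= 0.0 && q[i] >= 0.0);  // probabilities are non-negative
        bc += std::sqrt(p[i] * q[i]);
    }
    return bc;
}

// Bhattacharyya distance: the negative natural logarithm of the coefficient.
// The result is infinite when the two distributions do not overlap at all.
double bhattacharyyaDistance(const std::vector<double>& p, const std::vector<double>& q)
{
    return -std::log(bhattacharyyaCoefficient(p, q));
}
```

Two identical distributions have coefficient 1 and distance 0; the less the distributions overlap, the smaller the coefficient and the larger the distance.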

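8. Hamming distance (Hamming distance). The Hamming distance between two equal-length strings s1 and s2 is the minimum number of substitutions needed to change one into the other. For example, the Hamming distance between the strings "1111" and "1001" is 2. Application: information coding (to improve error tolerance, the minimum Hamming distance between codewords should be as large as possible).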
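Because the two strings have equal length, the minimum number of substitutions is simply the number of positions at which they differ. A minimal sketch (the function name `hammingDistance` is an illustrative choice):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hamming distance: the number of positions where two equal-length strings differ.
std::size_t hammingDistance(const std::string& s1, const std::string& s2)
{
    // The Hamming distance is only defined for strings of equal length.
    assert(s1.size() == s2.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i != s1.size(); ++i)
        if (s1[i] != s2[i])
            ++count;
    return count;
}
```

For "1111" and "1001" the strings differ at the two middle positions, so the function returns 2.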

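If it is still unclear what I mean, no rush: take a look at an interview question collected under sub-question 3 of question 78 in the previous blog post, and it will become clear at a glance. It is shown in the figure below:

[Figure: the interview question from the previous blog post]

// Dynamic programming:

// f[i,j] is the minimum edit distance between s[0...i] and t[0...j].
f[i,j] = min { f[i-1,j]+1, f[i,j-1]+1, f[i-1,j-1]+(s[i]==t[j]?0:1) }

// The three terms correspond to: adding 1 character, deleting 1 character, and substituting 1 character (no substitution is needed when the characters are equal).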
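The recurrence can be turned into a runnable function along the following lines. This is one possible sketch, not the solution from the referenced blog post: the function name `editDistance` is hypothetical, and the table is indexed by prefix lengths, so `f[i][j]` here covers the first i characters of s and the first j characters of t, which makes the empty-prefix base cases explicit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimum number of insertions, deletions, and substitutions turning s into t.
std::size_t editDistance(const std::string& s, const std::string& t)
{
    const std::size_t m = s.size(), n = t.size();
    // f[i][j]: edit distance between the first i chars of s and the first j chars of t.
    std::vector<std::vector<std::size_t>> f(m + 1, std::vector<std::size_t>(n + 1, 0));

    // Base cases: transforming to or from an empty prefix.
    for (std::size_t i = 0; i <= m; ++i) f[i][0] = i;
    for (std::size_t j = 0; j <= n; ++j) f[0][j] = j;

    for (std::size_t i = 1; i <= m; ++i)
    {
        for (std::size_t j = 1; j <= n; ++j)
        {
            // Substitution is free when the current characters already match.
            const std::size_t cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            f[i][j] = std::min({f[i - 1][j] + 1,          // delete from s
                                f[i][j - 1] + 1,          // insert into s
                                f[i - 1][j - 1] + cost}); // substitute or keep
        }
    }
    return f[m][n];
}
```

For equal-length strings the edit distance is never larger than the Hamming distance, since substituting every differing position is always one valid sequence of edits.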

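The interviewer could then go on to ask: how would you design an algorithm that compares the similarity of two articles? (For a discussion of this question see http://t.cn/zl82CAH, and for an introduction to the simhash algorithm see http://www.cnblogs.com/linecong/archive/2010/08/28/simhash.html.) This leads to the discussion of the cosine measure below.

(Sub-question 3 of question 78 in the previous blog post gives several methods, which readers may consult. Chapter 28 of the Art of Programming series for programmers will also cover this problem in detail.)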

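9. Cosine of the included angle (Cosine). In geometry, the cosine of the angle between two vectors measures the difference in their directions; machine learning borrows this concept to measure the difference between sample vectors.

(1) The cosine of the angle between vector A(x1,y1) and vector B(x2,y2) in 2D space:

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\, \sqrt{x_2^2 + y_2^2}}$$

(2) The cosine of the angle between two n-dimensional sample points a(x11,x12,…,x1n) and b(x21,x22,…,x2n)

Similarly, for two n-dimensional sample points a(x11,x12,…,x1n) and b(x21,x22,…,x2n), a concept analogous to the cosine of the included angle can be used to measure how similar they are:

$$\cos\theta = \frac{a \cdot b}{|a|\,|b|} = \frac{\sum_{k=1}^{n} x_{1k} x_{2k}}{\sqrt{\sum_{k=1}^{n} x_{1k}^2}\, \sqrt{\sum_{k=1}^{n} x_{2k}^2}}$$

The cosine ranges over [-1,1]. The larger the cosine, the smaller the angle between the two vectors; the smaller the cosine, the larger the angle. When the two vectors point in the same direction the cosine takes its maximum value 1, and when they point in exactly opposite directions it takes its minimum value -1.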
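The n-dimensional formula could be sketched as follows. The function name `cosineSimilarity` is illustrative, and the sketch assumes neither vector is the zero vector, since the angle is undefined in that case.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine of the angle between two n-dimensional vectors.
double cosineSimilarity(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i != a.size(); ++i)
    {
        dot += a[i] * b[i];    // numerator: the dot product
        normA += a[i] * a[i];  // squared length of a
        normB += b[i] * b[i];  // squared length of b
    }
    assert(normA > 0.0 && normB > 0.0);  // the zero vector has no direction
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

The vectors (1, 2) and (2, 4) point in the same direction and give a value of 1 even though their lengths differ, which shows that the measure compares direction only and ignores magnitude.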

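10. Jaccard similarity coefficient (Jaccard similarity coefficient)

(1) Jaccard similarity coefficient

The proportion of the elements in the intersection of two sets A and B among the elements of their union is called the Jaccard similarity coefficient of the two sets, denoted J(A,B):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The Jaccard similarity coefficient is a measure of the similarity of two sets.

(2) Jaccard distance

The concept opposite to the Jaccard similarity coefficient is the Jaccard distance (Jaccard distance).

The Jaccard distance can be expressed by the following formula:

$$J_\delta(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$

The Jaccard distance measures how distinct two sets are by the proportion of differing elements among all elements of the two sets.

(3) Applications of the Jaccard similarity coefficient and the Jaccard distance

The Jaccard similarity coefficient can be used to measure the similarity of samples.

Example: samples A and B are two n-dimensional vectors whose components all take the value 0 or 1, for example A(0111) and B(1011). We treat each sample as a set, where 1 means the set contains the element and 0 means it does not.

M11: the number of dimensions in which both A and B are 1

M01: the number of dimensions in which A is 0 and B is 1

M10: the number of dimensions in which A is 1 and B is 0

M00: the number of dimensions in which both A and B are 0

Based on the definitions of the Jaccard similarity coefficient and the Jaccard distance given above, the Jaccard similarity coefficient J of samples A and B can be expressed as:

$$J = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$$

Here M11+M01+M10 can be understood as the number of elements in the union of A and B, and M11 as the number of elements in their intersection. The Jaccard distance of samples A and B, denoted J', is:

$$J' = \frac{M_{01} + M_{10}}{M_{01} + M_{10} + M_{11}}$$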
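The counts M11, M01, and M10 translate directly into code. The sketch below assumes the samples are given as vectors of 0/1 integers; the function names are illustrative, and returning 1 when both samples are all zeros (an empty union) is a convention chosen here, not something stated in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Jaccard similarity coefficient of two binary (0/1) vectors.
double jaccardSimilarity(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(a.size() == b.size());
    std::size_t m11 = 0, m01 = 0, m10 = 0;
    for (std::size_t i = 0; i != a.size(); ++i)
    {
        if (a[i] == 1 && b[i] == 1) ++m11;       // in both sets
        else if (a[i] == 0 && b[i] == 1) ++m01;  // only in B
        else if (a[i] == 1 && b[i] == 0) ++m10;  // only in A
        // dimensions where both are 0 (M00) do not enter the formula
    }
    const std::size_t unionSize = m11 + m01 + m10;
    if (unionSize == 0)
        return 1.0;  // assumed convention: two empty sets are treated as identical
    return static_cast<double>(m11) / static_cast<double>(unionSize);
}

// Jaccard distance: one minus the similarity coefficient.
double jaccardDistance(const std::vector<int>& a, const std::vector<int>& b)
{
    return 1.0 - jaccardSimilarity(a, b);
}
```

For A(0111) and B(1011), M11 = 2, M01 = 1, and M10 = 1, so J = 2/4 = 0.5 and J' = 0.5.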

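11. Pearson correlation coefficient (Pearson Correlation Coefficient)

Before describing the Pearson correlation coefficient in detail, it is worth explaining what the correlation coefficient (Correlation coefficient) and the correlation distance (Correlation distance) are.

The correlation coefficient (Correlation coefficient) is defined as:

$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{D(X)}\, \sqrt{D(Y)}} = \frac{E\{[X - E(X)][Y - E(Y)]\}}{\sqrt{D(X)}\, \sqrt{D(Y)}}$$

(Here E is the mathematical expectation or mean, D is the variance, and the square root of D is the standard deviation. E{ [X-E(X)] [Y-E(Y)]} is called the covariance of the random variables X and Y, denoted Cov(X,Y), that is, Cov(X,Y) = E{ [X-E(X)] [Y-E(Y)]}. The quotient of the covariance of the two variables and the product of their standard deviations is called the correlation coefficient of X and Y, denoted $\rho_{XY}$.)

The correlation coefficient is a measure of the degree of correlation between the random variables X and Y, and its range is [-1,1]. The larger its absolute value, the more strongly X and Y are correlated. When X and Y are exactly linearly related, the correlation coefficient is 1 (positive linear correlation) or -1 (negative linear correlation).

Specifically, for two variables X and Y, the computed correlation coefficient can be interpreted as follows:

When the correlation coefficient is 0, X and Y have no linear relationship.

When Y increases (decreases) as X increases (decreases), the two variables are positively correlated, and the correlation coefficient lies between 0.00 and 1.00.

When Y decreases (increases) as X increases (decreases), the two variables are negatively correlated, and the correlation coefficient lies between -1.00 and 0.00.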
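For two samples of paired observations, the correlation coefficient can be estimated as in the sketch below. The function name `pearsonCorrelation` is illustrative; the code assumes both samples have the same non-zero length and that neither is constant, because a zero variance would make the denominator zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample Pearson correlation coefficient of paired observations x and y.
double pearsonCorrelation(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());

    // Means E(X) and E(Y).
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i != x.size(); ++i) { meanX += x[i]; meanY += y[i]; }
    meanX /= n;
    meanY /= n;

    // Sums of products of deviations; the common 1/n factor cancels in the ratio.
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i != x.size(); ++i)
    {
        const double dx = x[i] - meanX, dy = y[i] - meanY;
        sxy += dx * dy;  // proportional to Cov(X, Y)
        sxx += dx * dx;  // proportional to D(X)
        syy += dy * dy;  // proportional to D(Y)
    }
    assert(sxx > 0.0 && syy > 0.0);  // neither sample may be constant
    return sxy / std::sqrt(sxx * syy);
}
```

The pairs x = (1, 2, 3), y = (2, 4, 6) are exactly linearly related with positive slope and give 1, while y = (6, 4, 2) gives -1.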

The correlation distance is defined as:

$$D_{XY} = 1 - \rho_{XY}$$

   