Preface
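A few days ago I wrote on Weibo: "As of today I owe at least 3 articles: 1. KD trees; 2. neural networks; 3. Chapter 28 of the Art of Programming series. As you can see, the articles on this blog are unlike anything you will find elsewhere. So I wait and wait for a computer, with nothing to do but wait..." Thanks to a friend who lent me a computer (when I borrowed it and thanked him, he said, "I owe finding a job entirely to your blog, so thanking me for such a small favor just isn't right." Sometimes being trusted feels like a kind of pressure; I hope I never let down the trust everyone has placed in me), today I start writing the third article in the Top 10 Algorithms in Data Mining series, namely this one: "From the K-Nearest Neighbor Algorithm to KD Trees and the SIFT+BBF Algorithm".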
Ò»¸öÈ˼á³Ö×Ô¼ºµÄÐËȤÊDZȽÏÄѵģ¬ÒòΪ̫¶àµÄÈËÌ«ÈÝÒ×ΪÍâ½çËù¶¯ÁË£¬¶øÓÈÆäµ±ÄãÎÞ·¨´ÓÖеõ½¶àÉÙʵ¼ÊÐԵĻر¨Ê±£¬ËùÐÒ£¬ÎÒÄÜÒ»Ö±¼á³ÖÏÂÀ´¡£±Ï´ï¸çÀ˹ѧÅÉÓоäÃûÑÔ£º¡°ÍòÎï½ÔÊý¡±£¬×î½ü¶ÁÍ꡸΢»ý·Ö¸ÅÄչʷ¡¹ºóÒ²¸ÐÊܵ½ÁËÕâÒ»µã¡£Í¬Ê±£¬´ÓËã·¨µ½Êý¾ÝÍÚ¾ò¡¢»úÆ÷ѧϰ£¬ÔÙµ½Êýѧ£¬ÆäÖÐÿһ¸öÁìÓòÈκÎÒ»¸öϸ½Ú¶¼ÖµµÃ̽Ë÷ÖÕÉú£¬»òÐí£¬Õâ¾ÍÊÇ¡°ÖÕÉúΪѧ¡±µÄÒâ˼¡£
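You will also see that the K-nearest neighbor algorithm, like decision tree classification, Bayesian classification, and the support vector machine (SVM) covered in the previous two articles of this series, is an algorithm for solving classification problems. This Top 10 data mining algorithms series will go on to cover classification, clustering, association analysis, and prediction/regression problems in turn.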
OK. This article was written in haste; if you find any gaps, problems, or errors, you are welcome to point them out at any time. Your criticism is also one of the things that keeps me writing. Thanks.
Part 1. The K-Nearest Neighbor Algorithm
1.1 What Is the K-Nearest Neighbor Algorithm
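The K-nearest neighbor algorithm (K-Nearest Neighbor algorithm, KNN for short) can, judging from the name alone, be crudely understood as "the K nearest neighbors". When K=1 it becomes the nearest-neighbor algorithm, i.e., finding the single closest neighbor. Why look for neighbors? As an analogy, suppose you arrive in an unfamiliar village and want to find a group of people whose characteristics resemble yours so that you can join them and fit in.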
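Stated formally, the K-nearest neighbor algorithm works as follows: given a training data set and a new input instance, find the K instances in the training set that are closest to the new instance (the K neighbors mentioned above); if the majority of these K instances belong to some class, assign the input instance to that class. With this description in mind, let's look at a figure taken from Wikipedia: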

As the figure shows, there are two classes of sample data, represented by small blue squares and small red triangles, and the green circle in the middle of the figure is the data point to be classified. In other words, we do not yet know which class the green point belongs to (blue squares or red triangles), and our task is to classify this green circle.
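As the saying goes, birds of a feather flock together: you can often judge what kind of person someone is by looking at the friends around them ("know a man by the company he keeps"). We want to decide which class the green circle in the figure belongs to; easy enough, start from its neighbors. But how many neighbors should we look at? From the figure you can also see: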
If K=3, the 3 nearest neighbors of the green point are 2 red triangles and 1 blue square. The minority yields to the majority, so by this statistical vote the green point is classified as a red triangle.
If K=5, the 5 nearest neighbors of the green point are 2 red triangles and 3 blue squares. Again the minority yields to the majority, so the green point is classified as a blue square.
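So when we cannot tell directly which of the known classes a point belongs to, we can rely on statistics: look at where the point lies, weigh the neighbors around it, and assign it to the class with the greater weight. This is the core idea of the K-nearest neighbor algorithm. As the two cases above show, the result can depend on the choice of K.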
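The majority-vote rule described above can be sketched in C++, the language of the distance code later in this article. This is a minimal brute-force sketch under our own assumptions, not the author's implementation: the names `Sample`, `squaredDistance` and `knnClassify` are hypothetical, squared Euclidean distance is used to rank neighbors, every training point is scanned for each query, and vote ties go to the alphabetically first label.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical training-sample type: a feature vector plus its class label.
struct Sample {
    std::vector<double> features;
    std::string label;
};

// Squared Euclidean distance; the square root is skipped because it does not
// change which points are nearest.
double squaredDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Brute-force KNN: label `query` with the majority class among its k nearest
// training samples. Ties go to the alphabetically first label (std::map order).
std::string knnClassify(const std::vector<Sample>& train,
                        const std::vector<double>& query, std::size_t k) {
    assert(k >= 1 && k <= train.size());
    std::vector<std::pair<double, std::size_t>> dist;  // (distance, index)
    dist.reserve(train.size());
    for (std::size_t i = 0; i < train.size(); ++i)
        dist.emplace_back(squaredDistance(train[i].features, query), i);
    // Bring the k smallest distances to the front.
    std::partial_sort(dist.begin(), dist.begin() + static_cast<std::ptrdiff_t>(k), dist.end());
    // Count one vote per neighbor for its label.
    std::map<std::string, int> votes;
    for (std::size_t j = 0; j < k; ++j) ++votes[train[dist[j].second].label];
    std::string best;
    int bestCount = -1;
    for (const auto& kv : votes)
        if (kv.second > bestCount) { best = kv.first; bestCount = kv.second; }
    return best;
}
```

For instance, with hypothetical toy points that put two triangles at distance 1 from the query, one square at 1.5, two more squares at 2, and one far-away triangle, `knnClassify` returns "triangle" for k = 3 and "square" for k = 5, mirroring the two cases above.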
1.2 Distance Measures for Nearest Neighbors
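In section 1.1 we saw that the core of the K-nearest neighbor algorithm is finding the neighbors of an instance point. This immediately raises a series of questions: how do we find neighbors, what criterion decides who counts as a neighbor, and what do we measure it with? These questions are what the distance measures below answer. Some readers may object: I want to find neighbors, i.e. similarity, so why bring distance into it?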
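Because the distance between two instance points in the feature space reflects how similar they are. The feature space of a K-nearest neighbor model is usually the n-dimensional real vector space, and the distance used can be the Euclidean distance or some other distance. Since we have brought up distance, let's go through the common distance measures one by one, as an extension.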
1. Å·ÊϾàÀ룬×î³£¼ûµÄÁ½µãÖ®¼ä»ò¶àµãÖ®¼äµÄ¾àÀë±íʾ·¨£¬ÓÖ³ÆÖ®ÎªÅ·¼¸ÀïµÃ¶ÈÁ¿£¬Ëü¶¨ÒåÓÚÅ·¼¸ÀïµÃ¿Õ¼äÖУ¬Èçµã
x = (x1,...,xn) ºÍ y = (y1,...,yn) Ö®¼äµÄ¾àÀëΪ£º

(1) Euclidean distance between two points $a(x_1,y_1)$ and $b(x_2,y_2)$ in the plane:

$$d_{12}=\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}$$
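(2) Euclidean distance between two points $a(x_1,y_1,z_1)$ and $b(x_2,y_2,z_2)$ in three-dimensional space:

$$d_{12}=\sqrt{(x_1-x_2)^2+(y_1-y_2)^2+(z_1-z_2)^2}$$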
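(3) Euclidean distance between two n-dimensional vectors $a(x_{11},x_{12},\ldots,x_{1n})$ and $b(x_{21},x_{22},\ldots,x_{2n})$:

$$d_{12}=\sqrt{\sum_{k=1}^{n}(x_{1k}-x_{2k})^2}$$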
It can also be written as a vector operation (with $a$ and $b$ as row vectors):

$$d_{12}=\sqrt{(a-b)(a-b)^T}$$
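Based on the above, the Euclidean distance can be coded as follows (although introduced with the two-dimensional case, the function works for vectors of any equal dimension):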
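```cpp
#include <cassert>
#include <cmath>
#include <vector>
using std::vector;

// unixfy: compute the Euclidean distance between two points of equal dimension
double euclideanDistance(const vector<double>& v1, const vector<double>& v2)
{
    assert(v1.size() == v2.size());
    double ret = 0.0;
    for (vector<double>::size_type i = 0; i != v1.size(); ++i)
    {
        ret += (v1[i] - v2[i]) * (v1[i] - v2[i]);  // add the squared difference in dimension i
    }
    return std::sqrt(ret);
}
```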
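2. Manhattan distance. Formally, the Manhattan distance is the L1 distance, or city-block distance: the sum of the lengths of the projections, onto the coordinate axes, of the line segment joining two points in a fixed Cartesian coordinate system of Euclidean space. For example, in the plane, the Manhattan distance between point P1 at $(x_1, y_1)$ and point P2 at $(x_2, y_2)$ is $|x_1-x_2|+|y_1-y_2|$. Note that the Manhattan distance depends on the rotation of the coordinate system, but not on its translation or its reflection about a coordinate axis.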
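Intuitively, imagine driving in Manhattan from one intersection to another. Is the driving distance the straight-line distance between the two points? Obviously not, unless you can drive through buildings. The actual driving distance is this "Manhattan distance", which is where the name comes from. The Manhattan distance is also called the City Block distance.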
(1) Manhattan distance between two points $a(x_1,y_1)$ and $b(x_2,y_2)$ in the plane:

$$d_{12}=|x_1-x_2|+|y_1-y_2|$$
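(2) Manhattan distance between two n-dimensional vectors $a(x_{11},x_{12},\ldots,x_{1n})$ and $b(x_{21},x_{22},\ldots,x_{2n})$:

$$d_{12}=\sum_{k=1}^{n}|x_{1k}-x_{2k}|$$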
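A minimal C++ sketch of the n-dimensional formula, in the same style as the Euclidean-distance code earlier; `manhattanDistance` is a hypothetical name, not from the original.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Manhattan (L1, city-block) distance: sum of absolute coordinate differences.
double manhattanDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k)
        sum += std::fabs(a[k] - b[k]);
    return sum;
}
```

For the points (0, 0) and (3, 4) it returns 7, whereas the straight-line Euclidean distance is 5.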

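3. Chebyshev distance. For two vectors or points p and q with coordinates $p_i$ and $q_i$ respectively, the Chebyshev distance between them is defined as:

$$D_{\text{Chebyshev}}(p,q)=\max_i\left(|p_i-q_i|\right)$$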
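This also equals the limit of the following $L_p$ metric: $\lim_{k\to\infty}\left(\sum_{i=1}^{n}|p_i-q_i|^k\right)^{1/k}$, which is why the Chebyshev distance is also called the $L_\infty$ metric.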
Mathematically, the Chebyshev distance is the metric induced by the uniform norm (also called the supremum norm), and the space it defines is an example of an injective metric space (also called a hyperconvex metric space).
ÔÚÆ½Ã漸ºÎÖУ¬Èô¶þµãp¼°qµÄÖ±½Ç×ø±êÏµ×ø±êΪ £¬ÔòÇбÈÑ©·ò¾àÀëΪ£º ¡£
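Anyone who has played chess may know that a king can move in one step to any of the 8 adjacent squares. What is the minimum number of moves the king needs to go from square $(x_1,y_1)$ to square $(x_2,y_2)$? You will find that it is always $\max(|x_2-x_1|,|y_2-y_1|)$ moves. The distance measure of exactly this kind is called the Chebyshev distance.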
(1) Chebyshev distance between two points $a(x_1,y_1)$ and $b(x_2,y_2)$ in the plane:

$$d_{12}=\max\left(|x_1-x_2|,|y_1-y_2|\right)$$
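(2) Chebyshev distance between two n-dimensional vectors $a(x_{11},x_{12},\ldots,x_{1n})$ and $b(x_{21},x_{22},\ldots,x_{2n})$:

$$d_{12}=\max_i\left(|x_{1i}-x_{2i}|\right)$$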
An equivalent form of this formula is:

$$d_{12}=\lim_{k\to\infty}\left(\sum_{i=1}^{n}|x_{1i}-x_{2i}|^k\right)^{1/k}$$
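A minimal C++ sketch of the n-dimensional Chebyshev distance (the name `chebyshevDistance` is hypothetical). It takes the maximum directly rather than evaluating the limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Chebyshev (L-infinity) distance: the largest absolute coordinate difference.
double chebyshevDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double best = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k)
        best = std::max(best, std::fabs(a[k] - b[k]));
    return best;
}
```

For the king example, `chebyshevDistance({0, 0}, {3, 5})` gives 5, the minimum number of king moves from (0, 0) to (3, 5).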

4. ãɿɷò˹»ù¾àÀë(Minkowski Distance)£¬ãÉÊϾàÀë²»ÊÇÒ»ÖÖ¾àÀ룬¶øÊÇÒ»×é¾àÀëµÄ¶¨Òå¡£
(1) ãÉÊϾàÀëµÄ¶¨Òå
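The Minkowski distance between two n-dimensional variables $a(x_{11},x_{12},\ldots,x_{1n})$ and $b(x_{21},x_{22},\ldots,x_{2n})$ is defined as:

$$d_{12}=\sqrt[p]{\sum_{k=1}^{n}|x_{1k}-x_{2k}|^p}$$

where p is a variable parameter.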
When p=1, it is the Manhattan distance.
When p=2, it is the Euclidean distance.
When p→∞, it is the Chebyshev distance.
So, depending on the parameter p, the Minkowski distance can represent a whole family of distances.
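The whole family can be sketched with one function parameterized by p (hypothetical name `minkowskiDistance`, restricted here to p ≥ 1, where the formula is a true metric). For the points (0, 0) and (3, 4), p = 1 gives the Manhattan value 7, p = 2 gives the Euclidean value 5, and a large p such as 50 gives a value very close to the Chebyshev value 4.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minkowski distance of order p >= 1: (sum_k |a_k - b_k|^p)^(1/p).
double minkowskiDistance(const std::vector<double>& a, const std::vector<double>& b, double p) {
    assert(a.size() == b.size());
    assert(p >= 1.0);  // for p < 1 the triangle inequality fails, so it is not a metric
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k)
        sum += std::pow(std::fabs(a[k] - b[k]), p);
    return std::pow(sum, 1.0 / p);
}
```

Because raising a large difference to a large p can overflow a double, the p → ∞ case is better computed directly as the maximum absolute difference.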
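5. Standardized Euclidean distance. The standardized Euclidean distance is an improvement addressing a shortcoming of the plain Euclidean distance: the components of the data may be distributed very differently (for example, measured on different scales). The idea is that, since the components of each dimension are distributed differently, we first "standardize" every component so that they all have equal mean and variance. As for what the mean and variance should be standardized to, let's first review a little statistics.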
Suppose the mathematical expectation (mean) of sample set X is m and its standard deviation (the square root of the variance) is s. Then the "standardized variable" X* of X is (X-m)/s; the standardized variable has mean 0 and variance 1.
That is, the standardization of a sample set is described by the formula:

$$X^{*}=\frac{X-m}{s}$$

standardized value = (value before standardization - mean of the component) / standard deviation of the component
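A simple derivation then gives the standardized Euclidean distance between two n-dimensional vectors $a(x_{11},x_{12},\ldots,x_{1n})$ and $b(x_{21},x_{22},\ldots,x_{2n})$, where $s_k$ is the standard deviation of the k-th component:

$$d_{12}=\sqrt{\sum_{k=1}^{n}\left(\frac{x_{1k}-x_{2k}}{s_k}\right)^2}$$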
If the reciprocal of the variance, $1/s_k^2$, is regarded as a weight, this formula can be seen as a weighted Euclidean distance (Weighted Euclidean distance).
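A minimal C++ sketch of the standardized Euclidean distance, assuming the per-dimension standard deviations are estimated from a sample set. The helper names `columnStdDev` and `standardizedEuclidean` are hypothetical, and the helper uses the population standard deviation (dividing by N); dividing by N - 1 is an equally common choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-dimension standard deviation of a sample set X (one row per sample).
// Assumption: population standard deviation, i.e. the variance divides by N.
std::vector<double> columnStdDev(const std::vector<std::vector<double>>& X) {
    assert(!X.empty());
    const std::size_t n = X.size(), dim = X[0].size();
    std::vector<double> mean(dim, 0.0), sd(dim, 0.0);
    for (const auto& row : X) {
        assert(row.size() == dim);
        for (std::size_t k = 0; k < dim; ++k) mean[k] += row[k] / n;
    }
    for (const auto& row : X)
        for (std::size_t k = 0; k < dim; ++k)
            sd[k] += (row[k] - mean[k]) * (row[k] - mean[k]) / n;  // accumulate variance
    for (std::size_t k = 0; k < dim; ++k) sd[k] = std::sqrt(sd[k]);
    return sd;
}

// Standardized Euclidean distance: divide each coordinate difference by that
// dimension's standard deviation s[k] before squaring and summing.
double standardizedEuclidean(const std::vector<double>& a, const std::vector<double>& b,
                             const std::vector<double>& s) {
    assert(a.size() == b.size() && a.size() == s.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        assert(s[k] > 0.0);  // a constant dimension has no usable scale
        const double d = (a[k] - b[k]) / s[k];
        sum += d * d;
    }
    return std::sqrt(sum);
}
```

With the two samples (0, 0) and (2, 20), the standard deviations are 1 and 10, so both coordinates contribute equally and the distance is √8, whereas the plain Euclidean distance √404 is dominated by the second coordinate.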
6. Mahalanobis distance (Mahalanobis Distance)
£¨1£©ÂíÊϾàÀ붨Òå
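Given m sample vectors $X_1,\ldots,X_m$ with covariance matrix S and mean vector $\mu$, the Mahalanobis distance from a sample vector X to $\mu$ is:

$$D(X)=\sqrt{(X-\mu)^T S^{-1}(X-\mu)}$$

(Each element of the covariance matrix is the covariance Cov(X,Y) between two components, where Cov(X,Y) = E{[X-E(X)][Y-E(Y)]} and E denotes the mathematical expectation.)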
The Mahalanobis distance between vectors $X_i$ and $X_j$ is defined as:

$$D(X_i,X_j)=\sqrt{(X_i-X_j)^T S^{-1}(X_i-X_j)}$$
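If the covariance matrix is the identity matrix (the components are uncorrelated and each has unit variance), the formula becomes:

$$D(X_i,X_j)=\sqrt{(X_i-X_j)^T(X_i-X_j)}$$

which is just the Euclidean distance.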
If the covariance matrix is diagonal, the formula becomes the standardized Euclidean distance.
(2)ÂíÊϾàÀëµÄÓÅȱµã£ºÁ¿¸ÙÎ޹أ¬Åųý±äÁ¿Ö®¼äµÄÏà¹ØÐԵĸÉÈÅ¡£
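"Weibo user seafood高清版 commented: So the Mahalanobis distance is derived from the covariance matrix; my teacher had been misleading me all along. No wonder I kept running into covariance matrices and positive semidefiniteness when reading Killian's LMNN paper published at NIPS 2005. So that's what it was about."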
7¡¢°ÍÊϾàÀ루Bhattacharyya Distance£©£¬ÔÚͳ¼ÆÖУ¬Bhattacharyya¾àÀë²âÁ¿Á½¸öÀëÉ¢»òÁ¬Ðø¸ÅÂÊ·Ö²¼µÄÏàËÆÐÔ¡£ËüÓëºâÁ¿Á½¸öͳ¼ÆÑùÆ·»òÖÖȺ֮¼äµÄÖØµþÁ¿µÄBhattacharyyaϵÊýÃÜÇÐÏà¹Ø¡£Bhattacharyya¾àÀëºÍBhattacharyyaϵÊýÒÔ20ÊÀ¼Í30Äê´úÔøÔÚÓ¡¶Èͳ¼ÆÑо¿Ëù¹¤×÷µÄÒ»¸öͳ¼ÆÑ§¼ÒA.
BhattacharyaÃüÃû¡£Í¬Ê±£¬BhattacharyyaϵÊý¿ÉÒÔ±»ÓÃÀ´È·¶¨Á½¸öÑù±¾±»ÈÏΪÏà¶Ô½Ó½üµÄ£¬ËüÊÇÓÃÀ´²âÁ¿ÖеÄÀà·ÖÀàµÄ¿É·ÖÀëÐÔ¡£
£¨1£©°ÍÊϾàÀëµÄ¶¨Òå
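For discrete probability distributions p and q over the same domain X, it is defined as:

$$D_B(p,q)=-\ln\left(BC(p,q)\right)$$

where

$$BC(p,q)=\sum_{x\in X}\sqrt{p(x)\,q(x)}$$

is the Bhattacharyya coefficient.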
For continuous probability distributions, the Bhattacharyya coefficient is defined as:

$$BC(p,q)=\int\sqrt{p(x)\,q(x)}\,dx$$
In both cases $0\le BC\le 1$ and $0\le D_B\le\infty$. The Bhattacharyya distance does not satisfy the triangle inequality (by contrast, the Hellinger distance $\sqrt{1-BC(p,q)}$ does).
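For multivariate Gaussian distributions $p_i=\mathcal{N}(\mu_i,\Sigma_i)$:

$$D_B=\frac{1}{8}(\mu_1-\mu_2)^T\Sigma^{-1}(\mu_1-\mu_2)+\frac{1}{2}\ln\left(\frac{\det\Sigma}{\sqrt{\det\Sigma_1\,\det\Sigma_2}}\right),\qquad \Sigma=\frac{\Sigma_1+\Sigma_2}{2}$$

where $\mu_i$ and $\Sigma_i$ are the means and covariances of the distributions. Note that in this case the first term of the Bhattacharyya distance is related to the Mahalanobis distance.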
(2) The Bhattacharyya coefficient
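The Bhattacharyya coefficient is an approximate measure of the amount of overlap between two statistical samples; it can be used to determine the relative closeness of the two samples being considered.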
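Calculating the Bhattacharyya coefficient involves a rudimentary form of integration of the overlap of the two samples: the interval spanned by the values of the two samples is split into a chosen number of partitions, and the number of members of each sample in each partition is used in the following formula:

$$BC(a,b)=\sum_{i=1}^{n}\sqrt{\frac{a_i}{N_a}\cdot\frac{b_i}{N_b}}$$

where, for samples a and b, n is the number of partitions, $a_i$ and $b_i$ are the numbers of members of a and b in the i-th partition, and $N_a$ and $N_b$ are the total sizes of the two samples (dividing by them turns the counts into proportions, as in the discrete definition).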
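A minimal C++ sketch of the binned computation described above, together with $D_B=-\ln BC$. The function names are hypothetical; the inputs are raw bin counts over the same n partitions, normalized inside the function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Bhattacharyya coefficient of two histograms a and b over the same n bins.
// Counts are normalized to proportions, then BC = sum_i sqrt(p_i * q_i).
double bhattacharyyaCoefficient(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double totalA = 0.0, totalB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i] >= 0.0 && b[i] >= 0.0);  // counts cannot be negative
        totalA += a[i];
        totalB += b[i];
    }
    assert(totalA > 0.0 && totalB > 0.0);
    double bc = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        bc += std::sqrt((a[i] / totalA) * (b[i] / totalB));
    return bc;
}

// Bhattacharyya distance D_B = -ln(BC); +infinity when no bin overlaps (BC = 0).
double bhattacharyyaDistance(const std::vector<double>& a, const std::vector<double>& b) {
    return -std::log(bhattacharyyaCoefficient(a, b));
}
```

Histograms with the same shape give BC = 1 and distance 0; histograms with no overlapping bins give BC = 0 and an infinite distance.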
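8. Hamming distance. The Hamming distance between two equal-length strings s1 and s2 is the minimum number of substitutions needed to turn one into the other. For example, the Hamming distance between the strings "1111" and "1001" is 2. Application: information coding (to improve error tolerance, the minimum Hamming distance between codewords should be as large as possible).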
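Perhaps you still don't see what I am getting at. No hurry: look at the interview question compiled as sub-question 3 of problem 78 in my previous blog post, and it will become clear at a glance, as shown in the figure below: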

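```
// Dynamic programming:
// f[i,j] is the minimum edit distance between s[0...i] and t[0...j].
f[i,j] = min { f[i-1,j]+1, f[i,j-1]+1, f[i-1,j-1]+(s[i]==t[j]?0:1) }
// The three terms correspond to deleting s[i], inserting t[j], and
// replacing s[i] with t[j] (no replacement needed if they are equal).
```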
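The two measures can be contrasted in a short C++ sketch: `hammingDistance` counts the differing positions of two equal-length strings, and `editDistance` implements the dynamic-programming recurrence above with a full table. Both names are hypothetical. The table here is indexed by prefix length, so `f[i][j]` corresponds to f[i-1, j-1] in the 0-based notation of the recurrence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hamming distance: number of positions where two equal-length strings differ.
std::size_t hammingDistance(const std::string& s1, const std::string& s2) {
    assert(s1.size() == s2.size());
    std::size_t diff = 0;
    for (std::size_t i = 0; i < s1.size(); ++i)
        if (s1[i] != s2[i]) ++diff;
    return diff;
}

// Edit (Levenshtein) distance via the DP recurrence.
// f[i][j] = distance between the first i chars of s and the first j chars of t.
std::size_t editDistance(const std::string& s, const std::string& t) {
    const std::size_t m = s.size(), n = t.size();
    std::vector<std::vector<std::size_t>> f(m + 1, std::vector<std::size_t>(n + 1, 0));
    for (std::size_t i = 0; i <= m; ++i) f[i][0] = i;  // delete all i characters
    for (std::size_t j = 0; j <= n; ++j) f[0][j] = j;  // insert all j characters
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            const std::size_t cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            f[i][j] = std::min({f[i - 1][j] + 1,           // delete s[i-1]
                                f[i][j - 1] + 1,           // insert t[j-1]
                                f[i - 1][j - 1] + cost});  // replace (free if equal)
        }
    }
    return f[m][n];
}
```

For "1111" and "1001" both measures give 2, but for "abc" and "bca" the Hamming distance is 3 while the edit distance is 2 (delete the leading "a", then append it at the end); the edit distance also works for strings of different lengths.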
The interviewer might then go on to ask: how would you design an algorithm to compare the similarity of two articles? (For discussions of this question, see http://t.cn/zl82CAH, and for an introduction to the simhash algorithm, see http://www.cnblogs.com/linecong/archive/2010/08/28/simhash.html.) This leads to the following discussion of the cosine of the included angle.
(Sub-question 3 of problem 78 in the previous blog post gives several methods; readers may refer to it. Chapter 28 of the Art of Programming series will also discuss this problem in detail.)
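9. Cosine of the included angle (Cosine). In geometry, the cosine of the angle between two vectors measures the difference in their directions; machine learning borrows this concept to measure the difference between sample vectors.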
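(1) The cosine of the angle between vectors $A(x_1,y_1)$ and $B(x_2,y_2)$ in two-dimensional space:

$$\cos\theta=\frac{x_1x_2+y_1y_2}{\sqrt{x_1^2+y_1^2}\,\sqrt{x_2^2+y_2^2}}$$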
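(2) The cosine of the angle between two n-dimensional sample points $a(x_{11},x_{12},\ldots,x_{1n})$ and $b(x_{21},x_{22},\ldots,x_{2n})$:

$$\cos\theta=\frac{a\cdot b}{|a|\,|b|}$$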
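Similarly, for two n-dimensional sample points $a(x_{11},x_{12},\ldots,x_{1n})$ and $b(x_{21},x_{22},\ldots,x_{2n})$, a concept analogous to the cosine of the included angle can be used to measure how similar they are, namely:

$$\cos\theta=\frac{\sum_{k=1}^{n}x_{1k}x_{2k}}{\sqrt{\sum_{k=1}^{n}x_{1k}^2}\,\sqrt{\sum_{k=1}^{n}x_{2k}^2}}$$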
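The cosine ranges over [-1, 1]. The larger the cosine, the smaller the angle between the two vectors; the smaller the cosine, the larger the angle. The cosine reaches its maximum value 1 when the two vectors point in the same direction, and its minimum value -1 when they point in exactly opposite directions.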
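A minimal C++ sketch of the n-dimensional cosine formula (the name `cosineSimilarity` is hypothetical); it rejects zero vectors, for which the angle is undefined.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine of the angle between a and b: (a . b) / (|a| |b|), in [-1, 1].
double cosineSimilarity(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, normA2 = 0.0, normB2 = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        dot += a[k] * b[k];
        normA2 += a[k] * a[k];
        normB2 += b[k] * b[k];
    }
    assert(normA2 > 0.0 && normB2 > 0.0);  // angle undefined for a zero vector
    return dot / std::sqrt(normA2 * normB2);
}
```

Orthogonal vectors give 0, parallel vectors such as (1, 2) and (2, 4) give 1, and opposite vectors give -1.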
10. Jaccard similarity coefficient
(1) The Jaccard similarity coefficient
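The proportion of the elements of the intersection of two sets A and B within their union is called the Jaccard similarity coefficient of the two sets, denoted J(A,B):

$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}$$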
½Ü¿¨µÂÏàËÆÏµÊýÊǺâÁ¿Á½¸ö¼¯ºÏµÄÏàËÆ¶ÈÒ»ÖÖÖ¸±ê¡£
(2) The Jaccard distance
The concept opposite to the Jaccard similarity coefficient is the Jaccard distance.
The Jaccard distance can be expressed as:

$$J_\delta(A,B)=1-J(A,B)=\frac{|A\cup B|-|A\cap B|}{|A\cup B|}$$
The Jaccard distance measures how distinguishable two sets are by the proportion of non-shared elements among all elements of the two sets.
(3) Applications of the Jaccard similarity coefficient and the Jaccard distance
The Jaccard similarity coefficient can be used to measure the similarity of samples.
Example: samples A and B are two n-dimensional vectors in which every dimension takes the value 0 or 1, e.g. A(0111) and B(1011). We treat each sample as a set: 1 means the set contains that element, and 0 means it does not.
M11: the number of dimensions where both A and B are 1
M01: the number of dimensions where A is 0 and B is 1
M10: the number of dimensions where A is 1 and B is 0
M00: the number of dimensions where both A and B are 0
ÒÀ¾ÝÉÏÎĸøµÄ½Ü¿¨µÂÏàËÆÏµÊý¼°½Ü¿¨µÂ¾àÀëµÄÏà¹Ø¶¨Ò壬Ñù±¾AÓëBµÄ½Ü¿¨µÂÏàËÆÏµÊýJ¿ÉÒÔ±íʾΪ£º

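Here M11+M01+M10 can be understood as the number of elements in the union of A and B, and M11 as the number of elements in their intersection (M00 does not appear, since neither set contains those elements). The Jaccard distance of samples A and B is then expressed as J':

$$J'=\frac{M_{01}+M_{10}}{M_{01}+M_{10}+M_{11}}$$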
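A minimal C++ sketch that counts M11, M01 and M10 for two 0/1 vectors and returns J and J'. The names are hypothetical, and returning 1 for two all-zero vectors (empty union) is our own convention, since the formula is 0/0 there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Jaccard similarity of two 0/1 vectors: J = M11 / (M01 + M10 + M11).
// Dimensions where both are 0 (M00) are ignored.
double jaccardSimilarity(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    int m11 = 0, m01 = 0, m10 = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] && b[i]) ++m11;        // both 1
        else if (!a[i] && b[i]) ++m01;  // A is 0, B is 1
        else if (a[i] && !b[i]) ++m10;  // A is 1, B is 0
    }
    const int unionSize = m11 + m01 + m10;
    if (unionSize == 0) return 1.0;  // assumption: two empty sets count as identical
    return static_cast<double>(m11) / unionSize;
}

// Jaccard distance J' = 1 - J = (M01 + M10) / (M01 + M10 + M11).
double jaccardDistance(const std::vector<int>& a, const std::vector<int>& b) {
    return 1.0 - jaccardSimilarity(a, b);
}
```

For the example A(0111) and B(1011): M11 = 2, M01 = 1, M10 = 1, so J = 2/4 = 0.5 and J' = 0.5.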

11. Pearson correlation coefficient (Pearson Correlation Coefficient)
Before discussing the Pearson correlation coefficient in detail, we need to explain what the correlation coefficient (Correlation coefficient) and the correlation distance (Correlation distance) are.
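The correlation coefficient (Correlation coefficient) is defined as:

$$\rho_{XY}=\frac{\operatorname{Cov}(X,Y)}{\sqrt{D(X)}\sqrt{D(Y)}}=\frac{E\{[X-E(X)][Y-E(Y)]\}}{\sqrt{D(X)}\sqrt{D(Y)}}$$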
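(Here E is the mathematical expectation or mean, D is the variance, and the square root of D is the standard deviation. E{[X-E(X)][Y-E(Y)]} is called the covariance of the random variables X and Y, written Cov(X,Y), i.e. Cov(X,Y) = E{[X-E(X)][Y-E(Y)]}. The covariance of the two variables divided by the product of their standard deviations is called the correlation coefficient of X and Y, written $\rho_{XY}$.)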
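The correlation coefficient is a way of measuring the degree of correlation between random variables X and Y, and it ranges over [-1, 1]. The larger its absolute value, the more strongly X and Y are correlated. When X and Y are perfectly linearly related, the correlation coefficient is 1 (positive linear correlation) or -1 (negative linear correlation).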
Concretely, for two variables X and Y, the computed correlation coefficient can be interpreted as follows:
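When the correlation coefficient is 0, X and Y have no linear relationship (they may still be related nonlinearly).
When Y increases (decreases) as X increases (decreases), the two variables are positively correlated and the correlation coefficient lies between 0.00 and 1.00.
When Y decreases (increases) as X increases (decreases), the two variables are negatively correlated and the correlation coefficient lies between -1.00 and 0.00.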
The correlation distance is defined as:

$$D_{XY}=1-\rho_{XY}$$
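A minimal C++ sketch computing the correlation coefficient from paired samples, and the correlation distance $1-\rho_{XY}$. The function names are hypothetical; sample means stand in for the expectations E, and the 1/N normalizing factors are omitted because they cancel in the ratio.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient of paired samples x and y.
// Sample means replace E(X), E(Y); the 1/N factors cancel in the ratio.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    const std::size_t n = x.size();
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanX += x[i]; meanY += y[i]; }
    meanX /= n;
    meanY /= n;
    double cov = 0.0, varX = 0.0, varY = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - meanX, dy = y[i] - meanY;
        cov += dx * dy;
        varX += dx * dx;
        varY += dy * dy;
    }
    assert(varX > 0.0 && varY > 0.0);  // undefined for a constant variable
    return cov / std::sqrt(varX * varY);
}

// Correlation distance D_XY = 1 - rho_XY, ranging over [0, 2].
double correlationDistance(const std::vector<double>& x, const std::vector<double>& y) {
    return 1.0 - pearson(x, y);
}
```

Perfectly increasing data give ρ = 1 (distance 0) and perfectly decreasing data give ρ = -1 (distance 2). Note that y = x² sampled at x = -1, 0, 1 gives ρ = 0 even though y depends on x, which is why a zero coefficient only rules out a linear relationship.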
