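In recent years machine learning and data mining have become very popular and are gradually delivering real-world value. At the same time, more and more machine learning algorithms are moving from academia into industry, and that transition runs into many difficulties. Data imbalance may not be the hardest of them, but it is certainly one of the most important.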
I. Data Imbalance
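In academic research and teaching, many algorithms rest on one basic assumption: the data are evenly distributed across classes. When these algorithms are applied directly to real data, they fail to produce satisfactory results in most cases, because real data tend to be very unevenly distributed and exhibit a "long tail", the so-called "80/20 rule". The figure below shows the distribution of interactions on Sina Weibo: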

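As the figure shows, most posts have a total interaction count (reposts, comments, and likes) between 0 and 5, and posts with many interactions (more than 100) are very rare. If we try to predict which tier a post's interaction count falls into, a predictor that assigns every post to the first tier (0-5) achieves very high accuracy, yet such a predictor has no value at all. How, then, can data imbalance in machine learning be addressed? That is the main topic of this article.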
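To make the point about accuracy concrete, here is a minimal sketch. The tier counts are hypothetical numbers picked for illustration (they are not the real Weibo statistics), and `majority_baseline_accuracy` is just a helper name used here.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a predictor that always outputs the most frequent label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Hypothetical tier counts, chosen only for illustration:
# tier 0 = 0-5 interactions, tiers 1 and 2 = progressively more interactions.
labels = [0] * 9500 + [1] * 450 + [2] * 50
baseline = majority_baseline_accuracy(labels)
print(f"always-predict-tier-0 accuracy: {baseline:.2%}")
```

With these made-up counts the trivial predictor is right on 95% of the posts while never identifying a single popular one, which is why accuracy alone says little on imbalanced data.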
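Strictly speaking, every dataset is imbalanced to some degree, usually as a consequence of the problem itself, but we only care about cases where the class sizes differ drastically. In addition, although many datasets contain multiple classes, this article focuses on binary classification, because a solution to imbalance in the binary case generalizes to the multi-class case. In short, this article discusses how to handle data imbalance in binary classification when the positive and negative sample counts differ by two or more orders of magnitude.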
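Two problems with the same degree of imbalance (that is, similar positive-to-negative ratios) can still differ in difficulty, because difficulty also depends on how much data we have. In the Weibo interaction prediction problem, the data are imbalanced but every tier is large (even the smallest class has tens of thousands of samples), so the problem is usually fairly easy to solve. In cancer diagnosis, by contrast, few people have cancer to begin with, so the data are not only imbalanced but also scarce, which makes the problem very hard. Problems can therefore be ranked from easiest to hardest: big data + balanced < big data + imbalanced < small data + balanced < small data + imbalanced. For a given problem, once the data are in hand, first count how much training data is available, then look at how it is distributed. As a rule of thumb, 5,000 or more samples per class in the training data is enough, and a positive-to-negative difference within one order of magnitude is acceptable and does not really call for imbalance handling (this is purely empirical, has no theoretical basis, and is offered for reference only).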
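The two checks described above (how much data, then how skewed) can be sketched as follows. The thresholds are the article's rules of thumb, not theory; the function name `imbalance_report` and its parameters are made up for this example.

```python
import math
from collections import Counter


def imbalance_report(labels, min_per_class=5000, max_orders=1.0):
    """Summarise class sizes and apply the two rules of thumb from the text."""
    counts = Counter(labels)
    smallest, largest = min(counts.values()), max(counts.values())
    orders = math.log10(largest / smallest)
    return {
        "counts": dict(counts),
        # Rule of thumb 1: at least 5,000 samples in every class.
        "enough_data": smallest >= min_per_class,
        # Rule of thumb 2: classes within one order of magnitude of each other.
        "orders_of_magnitude": orders,
        "needs_rebalancing": orders > max_orders,
    }


# Hypothetical dataset: 100,000 negatives and 1,000 positives.
report = imbalance_report([0] * 100_000 + [1] * 1_000)
print(report)
```

In this made-up case the classes differ by two orders of magnitude and the smaller class is below 5,000 samples, so it falls into the harder "small data + imbalanced" end of the ranking.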
II. How to Address It
The basic idea is to give positive and negative samples an equal say during training, for example through sampling or weighting. For convenience, we call the class with more samples the "majority class" and the class with fewer samples the "minority class".
1. Sampling
Sampling methods process the training set so that an imbalanced dataset becomes a balanced one, which improves the final result in most cases.
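Sampling comes in two forms, oversampling and undersampling. Oversampling makes multiple copies of minority-class samples; undersampling removes some samples from the majority class, or equivalently keeps only a subset of the majority class.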
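The biggest advantage of random sampling is simplicity, but its drawbacks are just as obvious. After oversampling, some samples appear repeatedly in the dataset, so the trained model tends to overfit to some extent. The drawback of undersampling is that the final training set has lost data, so the model learns only part of the overall pattern.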
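Because oversampling duplicates minority samples, the same point appears repeatedly in the high-dimensional feature space. The consequence is that, with luck, the classifier gets many points right at once, and otherwise gets many wrong at once. To mitigate this, a slight random perturbation can be added each time a new data point is generated; experience shows this works very well.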
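A minimal sketch of both random sampling schemes, including the perturbation trick. The article does not say what kind of noise to use, so the Gaussian noise and the `sigma` value below are assumptions, as are the function names and the toy data.

```python
import random


def undersample(majority, minority, rng):
    """Keep a random majority subset of the same size as the minority class."""
    return rng.sample(majority, len(minority)), list(minority)


def oversample_with_jitter(majority, minority, rng, sigma=0.01):
    """Grow the minority class to the majority size by copying random
    minority points and adding small Gaussian noise to every copy."""
    extra = []
    for _ in range(len(majority) - len(minority)):
        base = rng.choice(minority)
        extra.append([v + rng.gauss(0.0, sigma) for v in base])
    return list(majority), list(minority) + extra


# Toy 2-D data: 1,000 majority points and 20 minority points.
rng = random.Random(0)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]

maj_u, min_u = undersample(majority, minority, rng)
maj_o, min_o = oversample_with_jitter(majority, minority, rng)
print(len(maj_u), len(min_u), len(maj_o), len(min_o))
```

A sensible `sigma` depends on the scale of each feature, so in practice it would have to be tuned rather than fixed as it is here.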
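Because undersampling discards information, how can the loss be reduced? The first approach, EasyEnsemble, uses model ensembling: undersample several times (putting samples back between rounds, so that the resulting training sets are independent of one another) to produce several different training sets, train a separate classifier on each, and combine their outputs to obtain the final result. The second approach, BalanceCascade, uses the idea of incremental training (boosting): first undersample once to produce a training set and train a classifier; majority samples that this classifier labels correctly are not put back; then undersample the now smaller majority set to produce a new training set and train a second classifier; and so on, finally combining the outputs of all classifiers. The third approach, NearMiss, uses KNN to try to pick the most representative majority samples. Methods of this kind are computationally expensive; interested readers can refer to Section 3.2.1 of the survey "Learning from Imbalanced Data".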
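The EasyEnsemble idea can be sketched in a few lines. This is a simplified illustration, not the published algorithm: the base learner is a nearest-centroid classifier chosen only to keep the example dependency-free, the models are combined by a plain majority vote, and all function names are made up here.

```python
import random


def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(majority, minority):
    """Stand-in base learner: predict 1 (minority) when a point is closer
    to the minority centroid than to the majority centroid."""
    c_maj, c_min = centroid(majority), centroid(minority)
    return lambda x: 1 if sq_dist(x, c_min) < sq_dist(x, c_maj) else 0


def easy_ensemble(majority, minority, n_models, rng):
    models = []
    for _ in range(n_models):
        # Every round draws from the full majority set, so the
        # balanced training sets are independent of one another.
        subset = rng.sample(majority, len(minority))
        models.append(train_nearest_centroid(subset, minority))

    def predict(x):
        votes = sum(model(x) for model in models)
        return 1 if 2 * votes > len(models) else 0

    return predict


# Toy data: 500 majority points around (0, 0), 20 minority points around (5, 5).
rng = random.Random(0)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
minority = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(20)]
predict = easy_ensemble(majority, minority, n_models=5, rng=rng)
print(predict([5.0, 5.0]), predict([0.0, 0.0]))
```

Each base model sees only 20 majority points, but across the five rounds the ensemble has looked at up to 100 of them, which is how this approach recovers some of the information a single undersampling pass throws away.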
2. Data Synthesis
Data synthesis methods generate additional samples from the existing ones. They have many success stories in small-data settings such as medical image analysis.
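The most common such method is SMOTE, which uses the similarity of minority samples in feature space to generate new samples. For a minority sample $x_i$, randomly pick a sample $\hat{x}_i$ from its K nearest neighbors that belong to the minority class, and generate a new minority sample

$x_{new} = x_i + (\hat{x}_i - x_i) \times \delta$,

where $\delta \in [0, 1]$ is a random number.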

The figure above illustrates SMOTE with K = 6 nearest neighbors; the black squares are the newly generated samples.
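The interpolation step can be sketched as below, assuming a brute-force neighbor search and Euclidean distance. The function and parameter names (`smote`, `n_new_per_sample`) are made up for this example and the four collinear points are toy data.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new_per_sample, k, rng):
    """Generate synthetic minority samples by interpolating towards
    randomly chosen minority-class nearest neighbours."""
    synthetic = []
    for i, x in enumerate(minority):
        # K nearest minority-class neighbours of x, excluding x itself.
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: sq_dist(x, p))[:k]
        for _ in range(n_new_per_sample):
            x_hat = rng.choice(neighbours)
            delta = rng.random()  # random number in [0, 1)
            # x_new = x + (x_hat - x) * delta, applied per coordinate.
            synthetic.append([a + (b - a) * delta for a, b in zip(x, x_hat)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
synthetic = smote(minority, n_new_per_sample=3, k=2, rng=random.Random(0))
print(len(synthetic), synthetic[0])
```

Because every new point lies on the segment between two existing minority points, the synthetic samples here all stay on the same diagonal as the originals.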
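SMOTE synthesizes the same number of new samples for every minority sample, which raises some potential problems: it increases the chance of overlap between classes, and it generates some samples that carry no useful information. Two methods were proposed to address this: Borderline-SMOTE and ADASYN.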
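Borderline-SMOTE looks for the minority samples that actually deserve new synthetic samples. It computes the K nearest neighbors of every minority sample and generates new samples only for those minority samples that have majority samples as more than half of their K nearest neighbors. Intuitively, these samples are surrounded mostly by the majority class and therefore tend to lie on the class boundary. Once the minority samples to synthesize from have been chosen, SMOTE is used to generate the new samples.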
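The selection rule as described here can be sketched like this. It follows the article's wording (more than half of the K neighbors are majority samples), uses a brute-force neighbor search, and runs on hypothetical one-dimensional toy data; the function names are made up.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def majority_neighbour_counts(minority, majority, k):
    """For each minority sample, count the majority samples among its
    K nearest neighbours (searching over both classes)."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    counts = []
    for i, x in enumerate(minority):
        # Minority points come first in `labelled`, so index i is x itself.
        others = labelled[:i] + labelled[i + 1:]
        nearest = sorted(others, key=lambda item: sq_dist(x, item[0]))[:k]
        counts.append(sum(1 for _, label in nearest if label == 0))
    return counts


def borderline_seeds(minority, majority, k):
    """Minority samples with more than half of their K neighbours in the majority class."""
    counts = majority_neighbour_counts(minority, majority, k)
    return [x for x, c in zip(minority, counts) if 2 * c > k]


# Three minority points sit far from the majority class; one sits inside it.
minority = [[0.0], [0.1], [0.2], [5.0]]
majority = [[4.8], [4.9], [5.1], [5.2], [6.0]]
seeds = borderline_seeds(minority, majority, k=3)
print(seeds)
```

Only the point at 5.0 is selected, so SMOTE would then be run from that sample alone instead of from all four.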
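ADASYN generates a different number of new samples for each minority sample, depending on the data distribution. First, set the total number $G$ of new minority samples to generate, according to the desired final degree of balance. Then, for each minority sample $x_i$, compute the distribution ratio

$\Gamma_i = \frac{\Delta_i / K}{Z}$,

where $\Delta_i$ is the number of majority samples among the K nearest neighbors of $x_i$, and $Z$ is a normalization constant chosen so that $\sum_i \Gamma_i = 1$. Finally, the number of new samples to generate for minority sample $x_i$ is

$g_i = \Gamma_i \times G$.

Once these counts are fixed, SMOTE is used to generate the new samples.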
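Here is a minimal sketch of the allocation step only, taking the total number of new samples as an input. The brute-force neighbor search, the rounding of each count to an integer, the function names, and the toy data are all assumptions of this example.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def majority_neighbour_counts(minority, majority, k):
    """Delta_i: majority samples among the K nearest neighbours of each minority sample."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    counts = []
    for i, x in enumerate(minority):
        others = labelled[:i] + labelled[i + 1:]  # drop x itself
        nearest = sorted(others, key=lambda item: sq_dist(x, item[0]))[:k]
        counts.append(sum(1 for _, label in nearest if label == 0))
    return counts


def adasyn_counts(minority, majority, k, total_new):
    """Number of synthetic samples to create from each minority sample."""
    ratios = [c / k for c in majority_neighbour_counts(minority, majority, k)]
    z = sum(ratios)  # normaliser so that the Gamma_i sum to 1
    if z == 0:
        return [0] * len(minority)  # no minority sample touches the majority class
    # Rounding means the counts may not add up to total_new exactly.
    return [round(r / z * total_new) for r in ratios]


minority = [[0.0], [0.1], [0.2], [5.0]]
majority = [[4.8], [4.9], [5.1], [5.2], [6.0]]
counts = adasyn_counts(minority, majority, k=3, total_new=12)
print(counts)
```

On this toy data the minority point buried inside the majority class receives half of the twelve new samples, and the three easy points share the rest.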
3. Weighting
³ýÁ˲ÉÑùºÍÉú³ÉÐÂÊý¾ÝµÈ·½·¨£¬ÎÒÃÇ»¹¿ÉÒÔͨ¹ý¼ÓȨµÄ·½Ê½À´½â¾öÊý¾Ý²»Æ½ºâÎÊÌ⣬¼´¶Ô²»Í¬Àà±ð·Ö´íµÄ´ú¼Û²»Í¬£¬ÈçÏÂͼ£º

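The horizontal axis is the true class and the vertical axis is the predicted class. C(i,j) is the loss incurred when a sample whose true class is j is predicted as i, and its value has to be set according to the actual situation.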
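The difficulty with this method lies in choosing reasonable weights. In practice, the weights are usually set so that the weighted loss is roughly equal across classes. This is not a universal rule, of course, and each problem still has to be analyzed on its own terms.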
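One common way to make the weighted loss roughly equal across classes is to weight each class inversely to its frequency. The sketch below assumes that choice (it is one option, not the only one) and uses hypothetical class counts; the function names are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so that
    every class contributes the same total weight."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {c: n / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Misclassification rate where each sample counts with its class weight."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical data: 990 negatives, 10 positives, and a predictor
# that always answers "negative".
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000
weights = balanced_class_weights(y_true)
error = weighted_error(y_true, y_pred, weights)
print(weights, error)
```

Under these weights the always-negative predictor has a weighted error of one half instead of an unweighted error of 1%, so it no longer looks like a good model.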
4. One-Class Learning
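For scenarios where positive and negative samples are extremely imbalanced, we can look at the problem from a completely different angle and treat it as one-class learning or anomaly detection (novelty detection). These methods focus not on capturing the differences between classes but on modeling one of the classes. Classic work in this area includes the One-class SVM.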
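To show what "modeling one class" means, here is a deliberately simple stand-in. It is not a One-class SVM: it models the plentiful class as a ball around its centroid and flags anything outside. The 0.95 quantile, the function names, and the toy data are assumptions of this sketch.

```python
import math
import random


def fit_one_class(points, quantile=0.95):
    """Model a single class by its centroid plus a distance threshold that
    covers roughly `quantile` of the training points."""
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    dists = sorted(math.dist(p, center) for p in points)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, radius


def is_inlier(x, model):
    center, radius = model
    return math.dist(x, center) <= radius


# Train on the plentiful class only; no minority samples are needed.
rng = random.Random(0)
normal = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
model = fit_one_class(normal)
print(is_inlier([0.0, 0.0], model), is_inlier([100.0, 100.0], model))
```

A One-class SVM plays the same role with a far more flexible boundary; the point of the sketch is only that training never looks at the rare class.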
III. How to Choose
There are many ways to handle data imbalance. The ones above are only the most common, and even those come in many varieties. How do we choose a suitable method for a given problem? Here is some of my own experience.
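When both positive and negative samples are very scarce, use data synthesis. When negative samples are plentiful, positive samples are very scarce, and the ratio is extremely skewed, consider one-class methods. When both positive and negative samples are plentiful and the ratio is not especially skewed, consider sampling or weighting.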
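These three rules can be written down as a small helper. The article gives no numeric cut-offs here, so the `enough` and `extreme_ratio` thresholds below are placeholders to be adjusted per problem, and the function name is made up. The sketch also treats the two classes symmetrically (smaller versus larger) rather than assuming the positives are the rare class.

```python
def suggest_strategy(n_pos, n_neg, enough=5000, extreme_ratio=100):
    """Map class sizes to the rule-of-thumb strategy described in the text.
    `enough` and `extreme_ratio` are hypothetical thresholds."""
    small, large = sorted((n_pos, n_neg))
    ratio = large / max(small, 1)  # guard against an empty class
    if large < enough:
        return "data synthesis"  # both classes are scarce
    if small < enough and ratio >= extreme_ratio:
        return "one-class learning"  # one class plentiful, the other very rare
    if small >= enough and ratio < extreme_ratio:
        return "sampling or weighting"  # both plentiful, ratio not extreme
    return "not covered by the rules of thumb"


print(suggest_strategy(200, 300))
print(suggest_strategy(50, 1_000_000))
print(suggest_strategy(20_000, 200_000))
```

Cases that fall between the three rules are reported as such, since the text gives no guidance for them.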
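Sampling and weighting are mathematically equivalent, but their practical effects differ, especially with classifiers such as Random Forest, whose training procedure itself randomly samples the training set. In that case, if computing resources allow, oversampling is usually somewhat better than weighting.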
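In addition, although oversampling and undersampling both balance the dataset and are equivalent when there is enough data, the two still differ. In practice, my experience is to use oversampling when computing resources are sufficient and the minority class has enough samples, and undersampling otherwise, because oversampling enlarges the training set and hence the training time, while a small training set overfits very easily. For undersampling, if computing resources are relatively plentiful and a good parallel environment is available, choose an ensemble method.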