[Algorithm] TF-IDF: The Algorithm and Its Applications
 
 
 2020-6-24 
 
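Editor's recommendation:
This article explains how to compute TF-IDF, what TF-IDF is used for, and how to extract keywords and summaries from text.
Source: the WeChat account 数据科学与人工智能 (Data Science and Artificial Intelligence), edited and recommended by Anna of 火龙果软件 (Huolongguo Software).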

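Preface

Here is a long article, and I want a computer to extract its keywords (automatic keyphrase extraction) without any human intervention. How can this be done correctly?

This problem touches on several frontier areas of computing, such as data mining, text processing and information retrieval. Surprisingly, though, a very simple classic algorithm gives quite satisfactory results. It is so simple that it needs no advanced mathematics, and anyone can understand it in about 10 minutes. That algorithm is TF-IDF, the subject of this article.

Let's start with an example. Suppose we have a long article titled Bee Farming in China (《中国的蜜蜂养殖》) and want a computer to extract its keywords.

An obvious idea is to find the words that occur most often: if a word is important, it should appear many times in the article. So we count each word's term frequency (TF).

As you have surely guessed, the most frequent words turn out to be the most common words of all, such as 的 (a possessive particle, roughly "of"), 是 ("is") and 在 ("at/in"). These are called stop words ( http://baike.baidu.com/view/3784680.htm ): words that do nothing to help find the answer and must be filtered out.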

Suppose we filter them all out and keep only the remaining words that carry real meaning. This raises another problem: we may find that 中国 (China), 蜜蜂 (bee) and 养殖 (farming) each appear the same number of times. Does that mean they are equally important as keywords?

Clearly not. 中国 is a very common word, while 蜜蜂 and 养殖 are comparatively rare. If the three words appear equally often in an article, it is reasonable to conclude that 蜜蜂 and 养殖 matter more than 中国; that is, in the keyword ranking, 蜜蜂 and 养殖 should come before 中国.

So we need an importance adjustment factor that measures how common a word is. If a word is relatively rare in general yet appears many times in this article, it very likely reflects what is distinctive about the article, and it is exactly the kind of keyword we are looking for.

In statistical terms: on top of term frequency, each word is assigned an "importance" weight. The most common words (的, 是, 在) get the smallest weight, fairly common words (中国) get a small weight, and rarer words (蜜蜂, 养殖) get a large weight. This weight is called the inverse document frequency (IDF), and it decreases as a word becomes more common.

Once a word's TF and IDF are known, multiplying them gives its TF-IDF value. The more important a word is to the article, the higher its TF-IDF. The top few words by TF-IDF are therefore the article's keywords.

The details of the algorithm follow.

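Step 1: compute the term frequency (TF).

$$\text{TF} = \text{number of times the word appears in the article}$$

Because articles differ in length, TF is normalized so that different articles can be compared:

$$\text{TF} = \frac{\text{number of times the word appears in the article}}{\text{total number of words in the article}}$$

or

$$\text{TF} = \frac{\text{number of times the word appears in the article}}{\text{number of times the article's most frequent word appears}}$$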
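A minimal Python sketch of step 1, assuming the text has already been segmented into words (Chinese word segmentation itself is not shown). `term_frequency` and its `method` parameter are illustrative names, not from the original article; the two options correspond to the two ways of normalizing TF: dividing by the article's total word count, or by the count of its most frequent word.

```python
from collections import Counter


def term_frequency(tokens, method="total"):
    """Return {word: normalized TF} for a list of already-segmented tokens.

    method="total": occurrences / total number of words in the document
    method="max":   occurrences / occurrences of the document's most frequent word
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    if method == "total":
        denominator = len(tokens)
    elif method == "max":
        denominator = max(counts.values())
    else:
        raise ValueError(f"unknown method: {method!r}")
    return {word: n / denominator for word, n in counts.items()}


# Hypothetical 1000-word document: "bee" appears 20 times, "filler" fills the rest
doc = ["bee"] * 20 + ["filler"] * 980
print(term_frequency(doc)["bee"])         # 0.02
print(term_frequency(doc, "max")["bee"])  # 20 / 980 ≈ 0.0204
```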

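Step 2: compute the inverse document frequency (IDF).

This step needs a corpus, which serves as a model of how the language is used in general.

$$\text{IDF} = \log\left(\frac{\text{total number of documents in the corpus}}{\text{number of documents containing the word} + 1}\right)$$

The more common a word is, the larger the denominator, so the smaller its IDF, approaching 0. One is added to the denominator so that it can never be 0, which would otherwise happen if no document in the corpus contained the word. The log takes the logarithm of the resulting ratio.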
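The IDF formula can be sketched in Python as follows. The text does not fix the base of the logarithm; base 10 is an assumption here, and changing the base only rescales every IDF by the same factor, so rankings are unaffected. `inverse_document_frequency` is an illustrative name.

```python
import math


def inverse_document_frequency(word, corpus):
    """IDF = log10(number of documents / (documents containing word + 1)).

    `corpus` is a list of documents, each a list of already-segmented tokens.
    The base-10 logarithm is an assumption; any base gives the same ranking.
    """
    n_docs = len(corpus)
    doc_freq = sum(1 for doc in corpus if word in doc)
    return math.log10(n_docs / (doc_freq + 1))


# Hypothetical 10-document corpus: "common" is in 9 documents, "rare" in none,
# and "text" in all 10
corpus = [["common", "text"]] * 9 + [["other", "text"]]
print(inverse_document_frequency("rare", corpus))    # 1.0  (log10(10 / 1))
print(inverse_document_frequency("common", corpus))  # 0.0  (log10(10 / 10))
```

Because of the +1, a word present in every document (here "text") gets a small negative IDF rather than exactly 0.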

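Step 3: compute TF-IDF.

$$\text{TF-IDF} = \text{TF} \times \text{IDF}$$

TF-IDF therefore grows in proportion to how often a word occurs in the document, and shrinks as the number of corpus documents containing the word increases. The algorithm for automatic keyword extraction is now clear: compute the TF-IDF of every word in the document, sort the words in descending order, and take the first few.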
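Putting the three steps together, keyword extraction can be sketched as below. This assumes pre-segmented tokens, TF normalized by document length and a base-10 IDF; `extract_keywords` and `top_k` are illustrative names, and the corpus is made up.

```python
import math
from collections import Counter


def extract_keywords(doc, corpus, top_k=3):
    """Return the top_k (word, TF-IDF) pairs of `doc`, highest score first.

    doc    : list of already-segmented tokens
    corpus : list of documents (token lists) used to estimate IDF
    """
    counts = Counter(doc)
    n_docs = len(corpus)
    doc_sets = [set(d) for d in corpus]  # for fast membership tests
    scores = {}
    for word, n in counts.items():
        tf = n / len(doc)                            # step 1
        df = sum(1 for s in doc_sets if word in s)
        idf = math.log10(n_docs / (df + 1))          # step 2
        scores[word] = tf * idf                      # step 3
    # Sort by descending score; break ties alphabetically for stable output
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical corpus: "the" is in all 10 documents, "china" in 4, "bee" in 1
corpus = [["the", "china"]] * 4 + [["the", "bee"]] + [["the"]] * 5
doc = ["the"] * 4 + ["china"] * 2 + ["bee"] * 2
print([word for word, _ in extract_keywords(doc, corpus, top_k=2)])  # ['bee', 'china']
```

"the" is the most frequent word in `doc` but ends up last with a negative score, because it appears in every corpus document; this is how IDF pushes stop words out of the ranking.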

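Returning to Bee Farming in China, suppose the article is 1,000 words long and 中国 (China), 蜜蜂 (bee) and 养殖 (farming) each appear 20 times, so the TF of each of the three words is 0.02. Next, a Google search shows 25 billion pages containing 的; assume this is the total number of Chinese web pages. There are 6.23 billion pages containing 中国, 48.4 million containing 蜜蜂, and 97.3 million containing 养殖. Their IDF and TF-IDF values are then as follows (using base-10 logarithms; at these page counts the +1 in the denominator makes no visible difference):

Word | Pages containing the word | IDF | TF-IDF
中国 (China) | 6.23 billion | 0.603 | 0.0121
蜜蜂 (bee) | 48.4 million | 2.713 | 0.0543
养殖 (farming) | 97.3 million | 2.410 | 0.0482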

As the table above shows, 蜜蜂 (bee) has the highest TF-IDF, 养殖 (farming) comes second, and 中国 (China) is lowest. (The TF-IDF of 的 would be extremely close to 0.) So if only one word may be chosen, 蜜蜂 is the keyword of this article.
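The IDF and TF-IDF figures for this example can be recomputed in a few lines of Python. Two assumptions: the logarithm is base 10, and the page counts are used directly as document counts (so the +1 is applied to raw page numbers).

```python
import math

TOTAL_PAGES = 250e8  # pages containing 的, taken as the total number of Chinese pages
TF = 20 / 1000       # each word appears 20 times in the 1000-word article

pages_with_word = {
    "中国 (China)": 62.3e8,
    "蜜蜂 (bee)": 0.484e8,
    "养殖 (farming)": 0.973e8,
}

results = {}
for word, doc_freq in pages_with_word.items():
    idf = math.log10(TOTAL_PAGES / (doc_freq + 1))  # base-10 log is an assumption
    results[word] = (idf, TF * idf)
    print(f"{word}: IDF = {idf:.3f}, TF-IDF = {TF * idf:.4f}")
```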

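Besides automatic keyword extraction, TF-IDF has many other uses. In information retrieval, for example, we can compute, for each document, the TF-IDF of every search term (say 中国, 蜜蜂 and 养殖) and add them up to get a TF-IDF score for the whole document. The document with the highest score is the one most relevant to the search terms.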
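A sketch of this retrieval use: score each document by the sum of the TF-IDF values of the query terms, then sort. Estimating IDF from the same small document collection, and the base-10 log, are assumptions made to keep the example self-contained; `rank_documents` is a hypothetical name.

```python
import math
from collections import Counter


def rank_documents(query_terms, docs):
    """Return document indices ordered by summed TF-IDF of the query terms.

    docs: list of already-segmented token lists. IDF is estimated from `docs`
    itself (an assumption to keep the example self-contained).
    """
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]
    idf = {}
    for term in query_terms:
        df = sum(1 for s in doc_sets if term in s)
        idf[term] = math.log10(n_docs / (df + 1))
    scored = []
    for i, doc in enumerate(docs):
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for an empty document
        score = sum(counts[t] / length * idf[t] for t in query_terms)
        scored.append((score, i))
    # Highest score first; ties keep document order
    return [i for score, i in sorted(scored, key=lambda x: (-x[0], x[1]))]


docs = [
    ["china", "bee", "farming", "honey"],
    ["china", "economy", "trade", "growth"],
    ["weather", "rain", "wind", "sun"],
    ["football", "goal", "team", "win"],
    ["music", "song", "band", "tour"],
]
print(rank_documents(["china", "bee", "farming"], docs))  # [0, 1, 2, 3, 4]
```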

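TF-IDF's strengths are that it is simple and fast and that its results match reality reasonably well. Its weakness is that measuring a word's importance purely by frequency is not comprehensive: an important word may not appear very often. In addition, the algorithm ignores word position, so words near the beginning and words near the end are treated as equally important, which is not right. (One remedy is to give extra weight to the first paragraph of the text and to the first sentence of every paragraph.)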
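One possible way to apply the position-weighting remedy is sketched below, under assumptions: the text is already split into paragraphs, sentences and words, and `boost=2.0` is a made-up weight, not a value given in the article. The weighted counts can then replace raw counts when computing TF.

```python
from collections import Counter


def weighted_term_counts(paragraphs, boost=2.0):
    """Word counts that give extra weight to early positions.

    paragraphs: list of paragraphs; each paragraph is a list of sentences,
                each sentence a list of already-segmented tokens.
    Tokens in the first paragraph, or in the first sentence of any paragraph,
    count `boost` times instead of once (boost=2.0 is a hypothetical value).
    """
    counts = Counter()
    for p_idx, paragraph in enumerate(paragraphs):
        for s_idx, sentence in enumerate(paragraph):
            weight = boost if p_idx == 0 or s_idx == 0 else 1.0
            for token in sentence:
                counts[token] += weight
    return counts


paragraphs = [
    [["bee", "farming"], ["bee", "honey"]],  # first paragraph: all sentences boosted
    [["china", "bee"], ["honey", "price"]],  # later paragraph: first sentence boosted
]
counts = weighted_term_counts(paragraphs)
total = sum(counts.values())
tf = {word: c / total for word, c in counts.items()}  # weighted, normalized TF
print(counts.most_common(2))  # [('bee', 6.0), ('honey', 3.0)]
```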

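Finding similar articles

Now let's look at a related problem. Sometimes, besides finding keywords, we also want to find other articles similar to the original one. Google News, for example, lists several similar stories below the main story.

To find similar articles we need cosine similarity. Let me illustrate what cosine similarity is with an example.

For simplicity, let's start with sentences.

Sentence A: 我喜欢看电视，不喜欢看电影。 (I like watching TV, not watching movies.)

Sentence B: 我不喜欢看电视，也不喜欢看电影。 (I don't like watching TV, and I don't like watching movies either.)

How can we compute how similar these two sentences are?

The basic idea: the more similar the words two sentences use, the more similar their content should be. So we can start from word frequencies and compute their similarity.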

Step 1: word segmentation.

Sentence A: 我/喜欢/看/电视，不/喜欢/看/电影。

Sentence B: 我/不/喜欢/看/电视，也/不/喜欢/看/电影。

Step 2: list all the words.

我 (I), 喜欢 (like), 看 (watch), 电视 (TV), 电影 (movie), 不 (not), 也 (also).

Step 3: count word frequencies.

Sentence A: 我 1, 喜欢 2, 看 2, 电视 1, 电影 1, 不 1, 也 0.

Sentence B: 我 1, 喜欢 2, 看 2, 电视 1, 电影 1, 不 2, 也 1.

Step 4: write out the word-frequency vectors.

Sentence A: [1, 2, 2, 1, 1, 1, 0]

Sentence B: [1, 2, 2, 1, 1, 2, 1]
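Steps 3 and 4 in Python, starting from the already-segmented sentences of step 1 (segmentation itself is assumed done, and punctuation is dropped); the vocabulary order follows step 2. `frequency_vector` is an illustrative helper.

```python
from collections import Counter

# Pre-segmented sentences from step 1, punctuation removed
sentence_a = ["我", "喜欢", "看", "电视", "不", "喜欢", "看", "电影"]
sentence_b = ["我", "不", "喜欢", "看", "电视", "也", "不", "喜欢", "看", "电影"]

# Vocabulary in the order listed in step 2
vocab = ["我", "喜欢", "看", "电视", "电影", "不", "也"]


def frequency_vector(tokens, vocab):
    """Count of each vocabulary word in `tokens`, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


print(frequency_vector(sentence_a, vocab))  # [1, 2, 2, 1, 1, 1, 0]
print(frequency_vector(sentence_b, vocab))  # [1, 2, 2, 1, 1, 2, 1]
```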

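The problem now becomes: how similar are these two vectors?

We can picture them as two line segments in space, both starting at the origin ([0, 0, ...]) and pointing in different directions. The two segments form an angle. An angle of 0 degrees means they point the same way and the segments overlap; 90 degrees means they are at right angles, with completely dissimilar directions; 180 degrees means they point in exactly opposite directions. So the size of the angle tells us how similar the vectors are: the smaller the angle, the more similar. (Word-frequency vectors have no negative entries, so the angle between two of them always lies between 0 and 90 degrees.)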

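Take two-dimensional space as an example. In the figure above, a and b are two vectors, and we want to compute the angle θ between them. The law of cosines gives:

$$\cos\theta = \frac{a^2 + b^2 - c^2}{2ab}$$

where a and b are the lengths of the two vectors and c is the length of the third side of the triangle they form.

Suppose vector a is [x1, y1] and vector b is [x2, y2]. The law of cosines can then be rewritten as:

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$$

It can be shown that the same formula holds for n-dimensional vectors. If A and B are two n-dimensional vectors, A = [A1, A2, ..., An] and B = [B1, B2, ..., Bn], then the cosine of the angle θ between them is:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

Applying this formula to sentences A and B gives the cosine of the angle between them:

$$\cos\theta = \frac{1\cdot1 + 2\cdot2 + 2\cdot2 + 1\cdot1 + 1\cdot1 + 1\cdot2 + 0\cdot1}{\sqrt{1^2+2^2+2^2+1^2+1^2+1^2+0^2}\,\sqrt{1^2+2^2+2^2+1^2+1^2+2^2+1^2}} = \frac{13}{\sqrt{12}\,\sqrt{16}} \approx 0.938$$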

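The closer the cosine is to 1, the closer the angle is to 0 degrees and the more similar the two vectors are. This is called cosine similarity. Sentences A and B above are therefore very similar: the angle between them is only about 20.25 degrees.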
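The n-dimensional cosine formula translates directly into Python. `cosine_similarity` is an illustrative helper; it rejects all-zero vectors, for which the cosine is undefined.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = sum(a_i * b_i) / (sqrt(sum(a_i^2)) * sqrt(sum(b_i^2)))."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


a = [1, 2, 2, 1, 1, 1, 0]  # sentence A
b = [1, 2, 2, 1, 1, 2, 1]  # sentence B
cos_theta = cosine_similarity(a, b)
print(round(cos_theta, 3))                           # 0.938
print(round(math.degrees(math.acos(cos_theta)), 2))  # 20.25 (degrees)
```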

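This gives us one algorithm for finding similar articles:

(1) Use TF-IDF to find the keywords of the two articles;

(2) Take a number of keywords from each article (say 20), merge them into one set, and compute each article's term frequency for the words in this set (relative term frequencies can be used to offset differences in article length);

(3) Build a word-frequency vector for each article;

(4) Compute the cosine similarity of the two vectors; the larger the value, the more similar the articles.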
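The four steps can be sketched end to end as below. Assumptions: articles are pre-segmented token lists, IDF uses base-10 logs over a supplied corpus, and "relative term frequency" means count divided by article length. `tfidf_keywords` and `article_similarity` are hypothetical names, and the example data is made up.

```python
import math
from collections import Counter


def tfidf_keywords(doc, corpus, k):
    """Top-k words of `doc` by TF-IDF (TF = count / length, IDF = log10(N / (df + 1)))."""
    counts = Counter(doc)
    doc_sets = [set(d) for d in corpus]

    def score(word):
        df = sum(1 for s in doc_sets if word in s)
        return counts[word] / len(doc) * math.log10(len(corpus) / (df + 1))

    return sorted(counts, key=lambda w: (-score(w), w))[:k]


def article_similarity(doc1, doc2, corpus, k=20):
    """Cosine similarity of two articles over their merged top-k keywords."""
    # (1)-(2) top-k keywords of each article, merged into one vocabulary
    vocab = sorted(set(tfidf_keywords(doc1, corpus, k)) | set(tfidf_keywords(doc2, corpus, k)))
    # (2)-(3) relative term frequency of each vocabulary word in each article
    c1, c2 = Counter(doc1), Counter(doc2)
    v1 = [c1[w] / len(doc1) for w in vocab]
    v2 = [c2[w] / len(doc2) for w in vocab]
    # (4) cosine of the angle between the two vectors
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    return dot / norm if norm else 0.0


a = ["bee", "honey", "hive", "bee"]
b = ["bee", "hive", "honey"]
c = ["stock", "market", "trade"]
corpus = [a, b, c]
print(article_similarity(a, b, corpus, k=3) > article_similarity(a, c, corpus, k=3))  # True
```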
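
Cosine similarity is a very useful measure: whenever you need to compute how similar two vectors are, you can use it.

Automatic summarization

Sometimes a very simple mathematical method can accomplish a very complex task.

The first two parts are good examples. By counting word frequencies alone, we can find keywords and similar articles. These are not the most effective methods, but they are certainly the simplest and easiest to put into practice.

Next, let's discuss how to use word frequencies to summarize an article automatically (automatic summarization).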

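If a 3,000-character article can be condensed into a 150-character summary, readers save a great deal of reading time. A summary written by a person is called a manual summary; one produced by a machine is called an automatic summary. Many websites need this, such as paper repositories, news sites and search engines. The 2007 paper A Survey on Automatic Text Summarization (Dipanjan Das, Andre F.T. Martins, 2007) reviews the automatic summarization algorithms of the time, and word-frequency statistics is one of the most important among them.

This method goes back to The Automatic Creation of Literature Abstracts, a 1958 paper by H.P. Luhn, a scientist at IBM.

Luhn argued that an article's information is contained in its sentences, and that some sentences carry more information than others. Automatic summarization means finding the sentences that carry the most information.

A sentence's information content is measured by its keywords: the more keywords it contains, the more important it is. Luhn proposed the "cluster" to describe a concentration of keywords: a cluster is a sentence fragment containing several keywords.

The figure above is an illustration from Luhn's original paper; the boxed part is one cluster. As long as the distance between keywords is below a threshold, they are considered to be in the same cluster. Luhn suggested a threshold of 4 or 5. In other words, if there are five or more other words between two keywords, the two keywords can be placed in separate clusters.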

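Next, compute an importance score for each cluster:

$$\text{cluster score} = \frac{(\text{number of keywords in the cluster})^2}{\text{number of words in the cluster}}$$

Taking the cluster in the figure above as an example, it contains 7 words, 4 of which are keywords, so its importance score is (4 × 4) / 7 ≈ 2.3.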
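A sketch of Luhn's cluster score for a single sentence, assuming the keyword set is already known (for example from word frequencies). `max_gap=4` takes one of the two thresholds mentioned above, read as "at most 4 other words between neighbouring keywords in a cluster"; `cluster_scores` and `sentence_score` are illustrative names.

```python
def cluster_scores(words, keywords, max_gap=4):
    """Luhn-style cluster scores for one sentence (a list of words).

    Consecutive keywords separated by at most `max_gap` non-keywords belong to
    the same cluster; a cluster spans from its first to its last keyword.
    Score = (keywords in cluster) ** 2 / (words in cluster).
    """
    positions = [i for i, w in enumerate(words) if w in keywords]
    clusters = []
    for pos in positions:
        if clusters and pos - clusters[-1][-1] - 1 <= max_gap:
            clusters[-1].append(pos)  # close enough: extend the current cluster
        else:
            clusters.append([pos])    # too far: start a new cluster
    return [len(c) ** 2 / (c[-1] - c[0] + 1) for c in clusters]


def sentence_score(words, keywords, max_gap=4):
    """Rank a sentence by its best cluster (0.0 if it has no keywords)."""
    return max(cluster_scores(words, keywords, max_gap), default=0.0)


# Hypothetical sentence: 7 words from first to last keyword, 4 of them keywords
words = ["k1", "x", "k2", "k3", "x", "x", "k4"]
print([round(s, 2) for s in cluster_scores(words, {"k1", "k2", "k3", "k4"})])  # [2.29]
```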

Then find the sentences containing the highest-scoring clusters (say, 5 sentences) and put them together; this forms the automatic summary of the article. For a concrete implementation, see chapter 8 of Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites (O'Reilly, 2011); the Python code is available on GitHub.

Luhn's algorithm was later simplified: clusters are no longer distinguished, and only the keywords a sentence contains are considered. Below is an example in pseudocode that only considers the sentence in which each keyword first appears.

Summarizer(originalText, maxSummarySize):

    // Compute word frequencies of the original text, giving an array such as [(10,'the'), (3,'language'), (8,'code')...]
    wordFrequences = getWordCounts(originalText)

    // Filter out stop words; the array becomes [(3, 'language'), (8, 'code')...]
    contentWordFrequences = filtStopWords(wordFrequences)

    // Sort by frequency and drop the counts; the array becomes ['code', 'language'...]
    contentWordsSortbyFreq = sortByFreqThenDropFreq(contentWordFrequences)

    // Split the article into sentences
    sentences = getSentences(originalText)

    // Select the sentence in which each keyword first appears
    setSummarySentences = {}
    foreach word in contentWordsSortbyFreq:
        firstMatchingSentence = search(sentences, word)
        setSummarySentences.add(firstMatchingSentence)
        if setSummarySentences.size() == maxSummarySize:
            break

    // Join the selected sentences in their original order to form the summary
    summary = ""
    foreach sentence in sentences:
        if sentence in setSummarySentences:
            summary = summary + " " + sentence

    return summary
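The simplified pseudocode can be made runnable in Python roughly as follows. This is a sketch, not Classifier4J or OTS: it assumes English text, splits sentences naively on ".", "!" and "?", and uses a tiny hypothetical stop-word list. `summarize` and `words_of` are illustrative names.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (hypothetical; real lists are far longer)
STOP_WORDS = {
    "a", "an", "and", "are", "as", "be", "for", "in", "is", "it",
    "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def words_of(text):
    """Lower-cased English words; Chinese text would need a segmenter instead."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(original_text, max_summary_size):
    """Simplified Luhn summary following the pseudocode above."""
    # Word counts with stop words removed
    content_counts = Counter(w for w in words_of(original_text) if w not in STOP_WORDS)
    # Content words sorted by frequency (ties keep first-seen order), counts dropped
    content_words = [w for w, _ in content_counts.most_common()]
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", original_text) if s.strip()]
    sentence_words = [set(words_of(s)) for s in sentences]

    chosen = set()  # indices of the selected sentences
    for word in content_words:
        if len(chosen) >= max_summary_size:
            break
        # First sentence in which this word appears
        for i, ws in enumerate(sentence_words):
            if word in ws:
                chosen.add(i)
                break
    # Emit the selected sentences in their original order
    return " ".join(s for i, s in enumerate(sentences) if i in chosen)


text = ("Bees make honey. The weather was cold. "
        "Honey bees live in a hive. Farmers keep bees for honey.")
print(summarize(text, 2))  # Bees make honey. The weather was cold.
```

The output shows a limitation of this simplification: once the most frequent words all point to the first sentence, lower-frequency words such as "weather" pull in a less relevant sentence.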

Similar algorithms have already been packaged as tools, for example the SimpleSummariser module of the Java library Classifier4J, the C library OTS, and C# and Python implementations based on Classifier4J.

   