Applying Deep Learning to E-commerce Product Recommendation
 
Source: bigdata.evget.com    Published: 2017-09-12
 

1. Common algorithm approaches

In the e-commerce industry, product recommendation for users has always been a very hot and important topic. There are many fairly mature methods, but each has its pros and cons, roughly as follows:

Item-similarity based: take food A and food B; across dimensions such as price, taste, shelf life, and brand, we can compute how similar they are. Intuitively, if I bought steamed buns (baozi), I am quite likely to grab a box of dumplings on the way home as well.

Pros: handles cold start. As long as you have item data, you can make recommendations even in the early stage of the business when user data is still scarce.

Cons: complex preprocessing. Any single item easily has at least a hundred candidate dimensions; choosing the right ones for the computation takes engineering experience that money cannot buy.

Typical example: Amazon's early recommendation system

Association-rule based: the most common approach mines users' purchase habits, the classic case being "beer and diapers". In practice, though, this method is used the least: association rules need a large amount of data, otherwise the confidence is too low, and once the data volume grows there are better methods available, so there is not much of a highlight here. Typical algorithms in the industry are Apriori, FP-Growth, and the like.

Pros: simple and easy to operate, quick to get started with, and very easy to deploy.

Cons: needs a fairly large amount of data, and accuracy is mediocre.

Typical example: early telecom operators' plan/bundle recommendations

Item-based collaborative filtering: suppose item A has been bought by 小张, 小明, and 小董; item B by 小红, 小丽, and 小晨; and item C by 小张, 小明, and 小李. Intuitively, items A and C have more similar buyer populations (compared with item B), so we can now recommend item C to 小董 and item A to 小李. This recommendation algorithm is quite mature and used by many companies.

Pros: relatively accurate; results are highly interpretable; as a byproduct you also get a popularity ranking of items.

Cons: computationally heavy, data-storage bottlenecks, and poor results for long-tail items.

Typical example: early Yihaodian (一号店) product recommendation

User-based collaborative filtering: suppose user A has bought cola, Sprite, and hotpot soup base; user B has bought toilet paper, clothes, and shoes; and user C has bought hotpot, juice, and 7-Up. Intuitively, users A and C are more similar to each other (compared with user B), so we can now recommend to user A other things user C has bought, and to user C other things user A has bought. The pros and cons are similar to item-based collaborative filtering, so I won't repeat them.

Model-based recommendation: SVD++, eigendecomposition, and so on. The user purchase-behavior matrix is factorized into the product of two weight matrices, one representing users' behavioral features and the other representing the items. At recommendation time, we compute, under the trained factors, how likely the user is to buy each item and recommend accordingly.

Pros: accurate, and works quite well even for long-tail items.

Cons: the amount of computation is very large; the performance and scalability of matrix factorization have always been the constraint.

Typical example: HP's computer recommendation

Time/vote-based ranking: this one is a bit special and rarely used in e-commerce; Twitter, Facebook, and Douban use it more. The question is how to rank comments when the only signals are upvotes and downvotes; for details see an earlier article of mine: 应用：推荐系统-威尔逊区间法 (recommendation with the Wilson score interval).

Deep-learning-based recommendation: the currently popular CNNs (convolutional neural networks), RNNs (recurrent neural networks), and DNNs (deep neural networks) all have examples of being applied to recommendation, but these are still mostly experimental. However, one word2vec-based method is already relatively mature, and it is the focus of this article.

Pros: very accurate recommendations, and relatively modest storage requirements.

Cons: engineering practice is not yet mature, and model training and hyperparameter tuning are tricky.

Typical example: Suning.com's member product recommendation

2. Engineering adoption of item2vec

Suning currently has roughly 400 million products, more than 10,000 product categories, and close to 40 top-level categories. With traditional collaborative filtering computed in real time, server cost and compute capacity are severe constraints. I have written a few application posts before: recommendation-based cross-selling and user-behavior-based recommendation estimation. Since the member R&D department is not the main department applying recommendations, what we wanted was a leaner model that is faster and more efficient while still reasonably accurate, so we took the original word2vec algorithm as the basis and built an itemNvec variant modeled on it.

First, let's break the theory behind itemNvec into parts:

part one: n-gram

How strongly the items bought before and after a target item influence it

Picture the purchase timelines of two users, userA and userB, on Suning.com, with a grey box marking the items we are observing. Ask yourself: if we swapped the items inside userA's and userB's grey boxes, how plausible would the resulting timelines look?

Intuition tells us this would not happen, or at least would not happen often. So we start from an initial assumption: for some users, within a particular category, consumption behavior is sequentially dependent; in other words, what I buy next depends on what I bought before. How do we express this idea in algorithmic terms?

Recall how naive Bayes does spam classification.

Is the sentence "Our company can provide invoices, arms sales, and aircraft-carrier repair" spam?

P1("spam" | "Our company can provide invoices, arms sales, and aircraft-carrier repair")

= p("spam") p("Our company can provide invoices, arms sales, and aircraft-carrier repair" | "spam") / p("Our company can provide invoices, arms sales, and aircraft-carrier repair")

= p("spam") p("invoice", "arms", "carrier" | "spam") / p("invoice", "arms", "carrier")

ͬÀí

P2("normal mail" | "Our company can provide invoices, arms sales, and aircraft-carrier repair")

= p("normal mail") p("invoice", "arms", "carrier" | "normal mail") / p("invoice", "arms", "carrier")

We only need to compare P1 and P2. Under the conditional-independence assumption this can be written directly as:

P1("spam" | "Our company can provide invoices, arms sales, and aircraft-carrier repair")

= p("spam") p("invoice" | "spam") p("arms" | "spam") p("carrier" | "spam")

P2("normal mail" | "Our company can provide invoices, arms sales, and aircraft-carrier repair")

= p("normal mail") p("invoice" | "normal mail") p("arms" | "normal mail") p("carrier" | "normal mail")

Notice, however, that no matter how the word order in "Our company can provide invoices, arms sales, and aircraft-carrier repair" is shuffled, the final verdict does not change. In our setting, though, we want the items bought earlier to have a stronger influence on the later ones.

fridge => washing machine => wardrobe => TV => soda: this ordering of purchases is reasonable.

fridge => washing machine => soda => TV => wardrobe: this ordering is comparatively less likely.

To naive Bayes, however, the two are identical.

So here we take order into account; back to the spam example above:

P1("spam" | "Our company can provide invoices, arms sales, and aircraft-carrier repair")

= p("spam") p("invoice" | "spam") p("arms" | "invoice", "spam") p("carrier" | "arms", "spam")

P2("normal mail" | "Our company can provide invoices, arms sales, and aircraft-carrier repair")

= p("normal mail") p("invoice" | "normal mail") p("arms" | "invoice", "normal mail") p("carrier" | "arms", "normal mail")

Here each word depends only on the word immediately before it; in theory, depending on the previous one to three words is usually acceptable. This order-aware Bayes rests on the famous Markov assumption: the next word depends only on the one or few words before it. I won't give the detailed mathematical derivation here; what matters is the idea.
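
To make the difference concrete, here is a minimal Python sketch with made-up token counts (all names and numbers are purely illustrative, not from our data): a bag-of-words naive Bayes score is unchanged when the tokens are shuffled, while a bigram (Markov) score is not.

import math

# made-up counts of tokens and adjacent token pairs inside "spam" messages
unigram_counts = {"invoice": 40, "arms": 30, "carrier": 20, "repair": 10}
bigram_counts = {("invoice", "arms"): 25, ("arms", "carrier"): 18,
                 ("carrier", "arms"): 1, ("arms", "invoice"): 2}
total_tokens = sum(unigram_counts.values())
vocab = len(unigram_counts)

def p_unigram(w):
    # add-one smoothing so unseen tokens do not zero out the product
    return (unigram_counts.get(w, 0) + 1.0) / (total_tokens + vocab)

def p_bigram(w, prev):
    # probability of w given the previous token, with add-one smoothing
    return (bigram_counts.get((prev, w), 0) + 1.0) / (unigram_counts.get(prev, 0) + vocab)

def bag_of_words_score(tokens):
    # order-independent: a shuffled sentence gets the same score
    return sum(math.log(p_unigram(w)) for w in tokens)

def bigram_score(tokens):
    # order-dependent: each token is conditioned on the one before it
    score = math.log(p_unigram(tokens[0]))
    for prev, w in zip(tokens, tokens[1:]):
        score += math.log(p_bigram(w, prev))
    return score

seq = ["invoice", "arms", "carrier"]
shuffled = ["carrier", "arms", "invoice"]
print(bag_of_words_score(seq), bag_of_words_score(shuffled))  # same value: order does not matter
print(bigram_score(seq), bigram_score(shuffled))              # different values: order matters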

part two: Huffman coding

A storage form for much larger-scale data

The usual user-to-item mapping is implemented with one-hot encoding, which has a huge drawback: storage is extremely sparse and the curse of dimensionality is very likely.

Back to the numbers we started with: Suning currently has about 400 million products, more than 10,000 product categories, nearly 40 top-level categories, and about 300 million members. If we wanted to build a user-item purchase matrix for user-based collaborative filtering, we would need a 400-million by 300-million 0/1 matrix, which is practically impossible. Huffman coding instead stores things in a binary-tree form:

Using Suning.com sales volumes as an example, here is how a binary tree can replace one-hot encoding as the storage format.

Suppose that during Suning's 818 promotion, statistics show a user order chain fridge => washing machine => dryer => TV => wardrobe => diamond (with purchases in that order), where fridges sold 150,000 units in total, washing machines 80,000, dryers 60,000, TVs 50,000, wardrobes 30,000, and diamonds 10,000.

Huffman tree construction

1. Take {15, 8, 6, 5, 3, 1} as the node weights; each starts as a tree with a single node, so there are 6 separate trees.

2. Pick the two trees with the smallest weights, {3} and {1}, and merge them; the new weight is 3 + 1 = 4.

3. Remove the {3} and {1} trees from the node list, and put the new merged tree with weight 3 + 1 = 4 back into the list.

4. Repeat steps 2-3 until only one tree remains.

At each branching we can label the node with the larger weight 1 and the node with the smaller weight 0 (or the other way around). Now, if we need the code for diamond, it is 1000; the code for washing machine is 111. This scheme stores items as 0/1 codes and, at the same time, accounts for the code length of each position, saving storage space.
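
A minimal Python sketch of this construction (using the sales figures above as weights, and assuming the convention that the heavier subtree gets bit 1 and the lighter one bit 0) reproduces the codes in the example, e.g. diamond = 1000 and washing machine = 111:

import heapq
import itertools

# sales volumes (in units of 10,000) from the 818 example above
weights = {"fridge": 15, "washing_machine": 8, "dryer": 6,
           "tv": 5, "wardrobe": 3, "diamond": 1}

counter = itertools.count()  # tie-breaker so heapq never has to compare dicts
# each heap entry: (subtree weight, tie-breaker, {item: partial code})
heap = [(w, next(counter), {name: ""}) for name, w in weights.items()]
heapq.heapify(heap)

while len(heap) > 1:
    w_small, _, codes_small = heapq.heappop(heap)   # lighter subtree  -> bit 0
    w_big, _, codes_big = heapq.heappop(heap)       # heavier subtree  -> bit 1
    merged = {k: "0" + v for k, v in codes_small.items()}
    merged.update({k: "1" + v for k, v in codes_big.items()})
    heapq.heappush(heap, (w_small + w_big, next(counter), merged))

_, _, codes = heap[0]
for name, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(name, code)   # frequent items get short codes, rare items get long codes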

part three: node probability

Maximize the probability of the observed data

For the diamond's position, its Huffman code is 1000, which means that along its path it must be assigned to branch 1 once and to branch 0 three times, and at every split only 0 or 1 can be chosen. Isn't that just like the 0/1 classification in logistic regression? So here we directly use the cross-entropy from logistic regression as the loss function.

In fact, many machine-learning algorithms follow the same pattern: first assume a model, then construct a loss function, train on data to find the parameters that minimize the loss (argmin of the loss function), and plug them back into the original model.

Let's look at the diamond example in detail:

Step 1

p(1 | layer-1 parameters) = sigmoid(layer-1 parameters)

Step 2

p(0 | layer-2 parameters) = 1 - sigmoid(layer-2 parameters)

Similarly for layers 3 and 4:

p(0 | layer-3 parameters) = 1 - sigmoid(layer-3 parameters)

p(0 | layer-4 parameters) = 1 - sigmoid(layer-4 parameters)

Then we simply find, for each layer, the unknown parameters that maximize p(1 | layer-1 parameters) × p(0 | layer-2 parameters) × p(0 | layer-3 parameters) × p(0 | layer-4 parameters). The solution procedure is much like logistic regression: take partial derivatives with respect to the unknown parameters and then optimize with gradient descent (stochastic or batch gradient descent, or Newton-type methods, as needed).
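
As a rough numpy sketch of the quantity being maximized, assuming each layer's score is the dot product of the item vector with that inner node's parameter vector (the vectors below are random placeholders, not trained values), the path probability and cross-entropy loss for the code 1000 look like this:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
dim = 8
item_vec = rng.normal(size=dim)        # hypothetical item (input) vector
node_vecs = rng.normal(size=(4, dim))  # hypothetical parameters of the 4 inner nodes on the path
code = [1, 0, 0, 0]                    # Huffman code of "diamond" from the example above

# probability of taking the observed branch at each inner node:
# p(1 | node) = sigmoid(item . node),  p(0 | node) = 1 - sigmoid(item . node)
probs = [sigmoid(item_vec @ theta) if bit == 1 else 1.0 - sigmoid(item_vec @ theta)
         for bit, theta in zip(code, node_vecs)]
path_prob = np.prod(probs)

# the negative log of this product is the cross-entropy loss that training minimizes
loss = -np.sum(np.log(probs))
print(path_prob, loss)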

part four: approximate neural network

Item similarity

In part three we had the term p(1 | layer-1 parameters); one of those layer-1 parameters is precisely the item vector.

For example:

10 million users have had the purchase sequence "beer => watermelon => razor => Pepsi".

100 thousand users have had the purchase sequence "beer => apple => razor => Pepsi". Under a traditional probabilistic model such as naive Bayes or an n-gram, P(beer => watermelon => razor => Pepsi) >> P(beer => apple => razor => Pepsi); in reality, though, these two groups are essentially the same crowd, and their attribute features ought to be the same.

Here we randomly initialize each item's feature vector, train it with the probabilistic model from part three, and finally obtain the item vectors. Beyond this, a neural network can also be used to do the same thing.

Bengio et al.'s 2001 NIPS paper "A Neural Probabilistic Language Model" describes the approach in detail.

What we need to know here is that, for the finest-grained items, we replace the one-hot vector (0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ...) with an item vector such as (0.8213, 0.8232, 0.6613, 0.1234, ...). A single item vector by itself means nothing, but given a pair of item vectors we can compare their cosine similarity, and likewise compare the similarity of categories or even top-level categories.
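
For instance, a minimal sketch (with made-up 4-dimensional vectors, purely for illustration) of comparing item vectors by cosine similarity:

import numpy as np

def cosine(a, b):
    # cosine similarity between two item vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

watermelon = np.array([0.82, 0.82, 0.66, 0.12])   # made-up item vectors
apple      = np.array([0.80, 0.79, 0.70, 0.10])
razor      = np.array([-0.30, 0.15, -0.90, 0.44])

print(cosine(watermelon, apple))   # close to 1: similar purchase contexts
print(cosine(watermelon, razor))   # lower: different purchase contexts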

3. Python implementation

1. Data loading

# -*- coding:utf-8 -*-
import pandas as pd
import numpy as np
import matplotlib as mt
from gensim.models import word2vec
from sklearn.model_selection import train_test_split

# load order data: each row is a member, columns top1..top10 hold the member's recently purchased item codes
order_data = pd.read_table('C:/Users/17031877/Desktop/SuNing/cross_sell_data_tmp1.txt')
dealed_data = order_data.drop('member_id', axis=1)        # keep only the item columns
dealed_data = pd.DataFrame(dealed_data).fillna(value='')  # empty string for missing items

2. Simple data merging and cleanup

# merge the ten item columns into one space-separated purchase sequence per member
dealed_data = dealed_data['top10'] + " " + dealed_data['top9'] + " " + dealed_data['top8'] + " " + \
              dealed_data['top7'] + " " + dealed_data['top6'] + " " + dealed_data['top5'] + " " + \
              dealed_data['top4'] + " " + dealed_data['top3'] + " " + dealed_data['top2'] + " " + \
              dealed_data['top1']

# split each sequence string into a list of item codes
dealed_data = [s.encode('utf-8').split() for s in dealed_data]

# train / test split
train_data, test_data = train_test_split(dealed_data, test_size=0.3, random_state=42)

3. Model training

# train word2vec on the purchase sequences
# sg=1: skip-gram; sg=0: CBOW
# hs=1: hierarchical softmax (Huffman tree)
# negative=0: no negative sampling
model = word2vec.Word2Vec(train_data, sg=1, min_count=10, window=2, hs=1, negative=0)

Next we use the trained model to produce the items to recommend. There are three approaches here; choose according to the specific business need and the actual data volume:

3.1 Similar-item mapping table

# for each test user, take the last item in the sequence and look up its top-3 most similar items
x = 1000
result = []
result = pd.DataFrame(result)
for i in range(x):
    test_data_split = [s.encode('utf-8').split() for s in test_data[i]]
    k = len(test_data_split)
    last_one = test_data_split[k - 1]                      # the user's most recent item
    last_one_recommended = model.most_similar(last_one, topn=3)
    tmp = last_one_recommended[0] + last_one_recommended[1] + last_one_recommended[2]
    last_one_recommended = pd.concat([pd.DataFrame(last_one), pd.DataFrame(np.array(tmp))], axis=0)
    last_one_recommended = last_one_recommended.T
    result = pd.concat([pd.DataFrame(last_one_recommended), result], axis=0)

Take the items from the user's most recent action and drop those the user has already purchased. What remains are items the user is still interested in but has not bought, perhaps because no suitable or cheap enough option was found. Using the similarity between item vectors, we can directly compute the items most similar to these and recommend them to the user.

3.2 Most likely next purchase

Based on the purchase sequences of historical users, and given what the target user has bought recently, what is he most likely to buy next?

For example, if historical data tells us that users who bought a phone plus a laptop are most likely to buy a backpack within the following week, then we push laptop-bag products to users who recently bought a laptop plus a phone, to trigger that latent demand.

# item vocabulary: stack all ten item columns into one Series
rbind_data = pd.concat(
    [order_data['top1'], order_data['top2'], order_data['top3'], order_data['top4'], order_data['top5'],
     order_data['top6'], order_data['top7'], order_data['top8'], order_data['top9'], order_data['top10']], axis=0)
# candidate item list and its size (assumed: the deduplicated items from rbind_data)
rbind_data_level = rbind_data.drop_duplicates().reset_index(drop=True)
number = len(rbind_data_level)

x = 50
start = []
output = []
score_final = []
for i in range(x):
    score = np.array(-100000000000000)
    name = np.array(-100000000000000)
    newscore = np.array(-100000000000000)
    tmp = test_data[i]
    k = len(tmp)
    last_one = tmp[k - 2]
    tmp = tmp[0:(k - 1)]
    for j in range(number):
        tmp1 = tmp[:]
        target = rbind_data_level[j]
        tmp1.append(target)                 # append the candidate item to the user's sequence
        test_data_split = [tmp1]
        # score the candidate sequence under the trained skip-gram + hierarchical softmax model
        newscore = model.score(test_data_split)
        if newscore > score:
            score = newscore
            name = tmp1[len(tmp1) - 1]
        else:
            pass
    start.append(last_one)
    output.append(name)
    score_final.append(score)

3.3 Associative-memory recommendation

In 3.2 we used the target user's recent purchases and mined patterns from the purchase histories of past buyers to decide what to recommend. There is a similar approach: infer from the target user's single most recent purchase, using what historical users bought around a single purchase as the reference. In detail:

This is also very simple to implement. I did not write the code for it myself, so I won't paste it here; it just uses word2vec's predict_output_word(context_words_list, topn=10), which "reports the probability distribution of the center word given the context words as input to the trained model".
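
A minimal usage sketch would look something like the following; note that gensim's predict_output_word relies on the negative-sampling output weights, so this assumes a model trained with negative sampling rather than the hs=1, negative=0 configuration used above:

from gensim.models import word2vec

# retrain with negative sampling so predict_output_word is available
ns_model = word2vec.Word2Vec(train_data, sg=1, min_count=10, window=2,
                             hs=0, negative=5)

context = test_data[0][-2:]   # e.g. a user's two most recently purchased items
# probability distribution over the "center" item given the context items
print(ns_model.predict_output_word(context, topn=10))   # [(item_code, probability), ...]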

   