How to Do Machine Learning Tasks with Neo4j and Scikit-Learn
 
×÷ÕߣºMark Needham
 2020-7-13
 
Editor's recommendation:
±¾ÎĽ«ÒÔ¹¹½¨Ò»¸ö»úÆ÷ѧϰ·ÖÀàÆ÷ÈÎÎñΪÀý£¬´Ó»ù´¡±³¾°ÖªÊ¶¡¢Ëã·¨Ô­Àíµ½Ëã·¨´úÂëʵÏÖ½øÐÐÈ«ÃæµÄ½²½âÓëÖ¸µ¼£¬Ï£Íû¶ÔÄúµÄѧϰÓÐËù°ïÖú¡£
This article comes from AI科技大本营 (AI Tech Base Camp) and was edited and recommended by Alice of 火龙果软件 (Huolongguo Software).

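Graph algorithms are not a new field, and open-source libraries already offer many powerful implementations. Over the past two years, researchers and scientists in industry have been actively looking for ways to make up for deep learning's lack of interpretability and its inability to perform causal inference, and graph neural networks (GNNs) have become the much-anticipated favorite. As academia and industry pay more and more attention to GNNs, new work keeps appearing, and frameworks built on graph neural networks have followed, such as the now-familiar DGL. The two major deep learning frameworks, PyTorch and TensorFlow, have also started to support the corresponding functionality, and interest in graphs, graph computing, graph databases, and graph machine learning keeps growing.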

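Because of the useful properties of graph data, more and more companies are investing in research on, and use of, machine learning tasks built on graph data, combining graph data with machine learning algorithms to make up for the algorithms' shortcomings and giving the new generation of graph databases a new mission. Many companies build their own graph databases and graph analytics platforms in house, but the open-source or mature tools available off the shelf are still incomplete. For companies that cannot build their own, how should they approach machine learning on graph data? What practical approaches can engineers try in their own work?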

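In this article we offer one approach using two tools that everyone knows well: the graph database Neo4j and scikit-learn. Using the task of building a machine learning classifier as an example, we walk through the background knowledge, the principles of the algorithms, and their implementation in code.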

The Neo4j database

Neo4j is a graph database. The main graph databases today include TigerGraph, Neo4j, Amazon Neptune, JanusGraph, and ArangoDB. In recent years Neo4j has consistently topped graph database rankings, and the rapid rise of knowledge graphs over the past few years has brought it wide attention.

Neo4j is queried mainly with the Cypher language, and its graph analytics algorithms are provided by the Graph Algorithms library. Installing Neo4j Desktop is easy and takes a single click.

Neo4j Desktop download:

https://neo4j.com/download/

We also recommend the book "Graph Algorithms", whose examples are mainly implemented with Neo4j. Its authors, Amy Hodler and Mark Needham, both work at Neo4j.

Read online:

https://neo4j.com/docs/graph-algorithms/current/

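Graph databases are especially useful for analyzing relationships between heterogeneous data points, for example in fraud prevention or Facebook's friend graph. Take relationship prediction in a social network: links are one of the most important building blocks of a complex (social) network, and predicting latent relationships from the existing nodes and links of a social network relies on a core family of algorithms, link prediction. Link prediction is also the core topic of this article, and the Neo4j Graph Algorithms library supports several link prediction algorithms. After a first look at Neo4j, we move on to link prediction algorithms, how to import data into Neo4j, and how to combine scikit-learn with link prediction algorithms to build a machine learning prediction model.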

Link prediction algorithms

(I) What is link prediction?

Á´Â·Ô¤²âÒѾ­±»Ìá³öºÜ¶àÄêÁË¡£2004Ä꣬ÓÉ Jon Kleinberg ºÍ David Liben-Nowell ·¢±íÏà¹ØÂÛÎÄÖ®ºó£¬Á´Â·Ô¤²â²Å±»ÆÕ¼°¿ªÀ´¡£ËûÃǵÄÂÛÎÄΪ¡¶The Link Prediction Problem for Social Networks¡·

Kleinberg and Liben-Nowell approach the link prediction problem from the perspective of social networks, posing it as follows:

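Given a snapshot of a social network, can we infer which new relationships among its members are likely to appear in the future? We can treat this question as the link prediction problem, and derive link prediction methods by analyzing the similarity ("proximity") of the nodes in the network.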

Later, Dr. Jim Webber gave a talk on the development of graph algorithms at GraphConnect San Francisco 2015, in which he used graph theory to explain World War II.

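Besides predicting world wars and friendships in social networks, where else might relationship prediction be useful? We could predict relationships between members of a terrorist organization, between molecules in a biological network, potential co-authorships in a citation network, interest in artists or artworks, and so on. All of these scenarios can make use of link prediction.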

Predicting a link always means predicting a behavior that may happen in the future. In a citation network, for example, we are predicting whether two people are likely to co-author a paper.

(II) Link prediction algorithms

Kleinberg and Liben-Nowell describe a set of methods that can be used for link prediction, shown in the figure below:

Methods described in the Kleinberg and Liben-Nowell paper

Each of these methods computes a score for a pair of nodes, which can be read as the "proximity" of those nodes based on the network topology. The closer two nodes are, the more likely a relationship between them is.

Let's look at a few of these measures to understand how the algorithms work.

(III) Link prediction measures

1. Common neighbors

One of the simplest measures is the number of common neighbors. Ahmad Sadraei explains the concept as follows:

As a predictor, common neighbors captures the idea that two strangers who have a friend in common may be introduced by that friend (which closes a triangle in the graph).

This measure counts the neighbors that a pair of nodes shares. In the graph below, nodes A and D have two common neighbors (B and C), while nodes A and E have only one (B). We therefore consider A and D to be closer, and more likely to become linked in the future.
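As a rough illustration, the measure fits in a few lines of plain Python. This is a minimal sketch that assumes the graph fits in memory as an adjacency dict; the helper names `build_adjacency` and `common_neighbors` are hypothetical and are not part of any Neo4j API. The edges are those of the example graph that is created with Cypher later in this article.

```python
# Edges of the small example graph (A-C, A-B, B-D, B-C, B-E, C-D), treated as undirected.
EDGES = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """Number of neighbours shared by nodes x and y."""
    return len(adj[x] & adj[y])


adj = build_adjacency(EDGES)
print(common_neighbors(adj, "A", "D"))  # 2: B and C
print(common_neighbors(adj, "A", "E"))  # 1: B
```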

2. Adamic Adar (AA index)

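As early as 2003, Lada Adamic and Eytan Adar proposed the Adamic Adar measure while studying prediction problems in social networks. Like common neighbors, the AA index is built on the common neighbors of a pair, but it also weights each common neighbor by its degree, using the inverse of the logarithm of the degree, and then sums these weights over all common neighbors. The sum is the similarity score of the node pair.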

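The degree of a node is its number of neighbors. The intuition behind this measure is that when a triangle is closed in the graph, nodes with a low degree are likely to be more influential. For example, in a social network where two people were introduced by a mutual friend, the likelihood of that introduction depends on how many other pairs of friends that person has. A person with few friends is more likely to introduce a given pair of their friends to each other.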
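A minimal sketch of the Adamic Adar score on the same example graph, computing AA(x, y) as the sum over common neighbours u of 1 / log(deg(u)). It assumes an in-memory adjacency dict and the natural logarithm; the helper names are hypothetical. A neighbour shared by two distinct nodes always has a degree of at least 2, so the logarithm is never zero.

```python
import math

# Edges of the small example graph, treated as undirected.
EDGES = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def adamic_adar(adj, x, y):
    """Sum of 1 / log(degree) over the common neighbours of x and y."""
    # A common neighbour of two distinct nodes has degree >= 2, so log(degree) > 0.
    return sum(1.0 / math.log(len(adj[u])) for u in adj[x] & adj[y])


adj = build_adjacency(EDGES)
score_ad = adamic_adar(adj, "A", "D")  # B has degree 4, C has degree 3
score_ae = adamic_adar(adj, "A", "E")  # only B is shared
print(round(score_ad, 4), round(score_ae, 4))
```

Low-degree neighbours contribute more: C (degree 3) adds more to the A/D score than B (degree 4).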

3. Preferential attachment

This is probably one of the best-known concepts among graph algorithm researchers. It was first proposed by Albert-László Barabási and Réka Albert in their work on scale-free networks. The intuition is that the more relationships a node has, the more likely it is to gain new ones. It is the simplest measure to compute: we just multiply the degrees of the two nodes.
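A minimal sketch of preferential attachment on the same example graph, assuming an in-memory adjacency dict and hypothetical helper names. The score is simply the product of the two degrees.

```python
# Edges of the small example graph, treated as undirected.
EDGES = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def preferential_attachment(adj, x, y):
    """Product of the degrees of x and y."""
    return len(adj[x]) * len(adj[y])


adj = build_adjacency(EDGES)
print(preferential_attachment(adj, "A", "D"))  # 2 * 2 = 4
print(preferential_attachment(adj, "A", "E"))  # 2 * 1 = 2
```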

(IV) Link prediction in the Neo4j Graph Algorithms library

The Neo4j Graph Algorithms library currently covers 6 link prediction algorithms: Adamic Adar, Common Neighbors, Preferential Attachment, Resource Allocation, Same Community, and Total Neighbors.

Common neighbors was covered above; here is a quick look at how the other five work:

(1) Adamic Adar: the sum of the inverse logarithm of the degree of each common neighbor.

(2) Preferential Attachment: the product of the two nodes' degrees.

(3) Resource Allocation: the sum of the inverse degree of each common neighbor.

(4) Same Community: uses a community detection algorithm to check whether the two nodes are in the same community.

(5) Total Neighbors: the number of distinct neighbors that the two nodes have between them.
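The two measures in this list not sketched yet, Resource Allocation and Total Neighbors, can be written the same way. This is a minimal sketch on the example graph with hypothetical helper names; it follows the textbook formulas, and Neo4j's implementation may differ in details. Same Community is left out because it depends on running a community detection algorithm first.

```python
# Edges of the small example graph, treated as undirected.
EDGES = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def resource_allocation(adj, x, y):
    """Sum of 1 / degree over the common neighbours of x and y."""
    return sum(1.0 / len(adj[u]) for u in adj[x] & adj[y])


def total_neighbors(adj, x, y):
    """Number of distinct nodes adjacent to x or y (size of the union of neighbour sets)."""
    return len(adj[x] | adj[y])


adj = build_adjacency(EDGES)
print(resource_allocation(adj, "A", "D"))  # 1/4 (via B) + 1/3 (via C)
print(total_neighbors(adj, "A", "D"))      # union {B, C}
print(total_neighbors(adj, "C", "E"))      # union {A, B, D}
```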

Now let's see how to use the library's common neighbors function, using the example graph from earlier.

First, run the following Cypher statement to create the graph in Neo4j:

UNWIND [["A", "C"], ["A", "B"], ["B", "D"],
["B", "C"], ["B", "E"], ["C", "D"]] AS pair
MERGE (n1:Node {name: pair[0]})
MERGE (n2:Node {name: pair[1]})
MERGE (n1)-[:FRIENDS]-(n2)

Then compute the number of common neighbors of nodes A and D with the following function:

neo4j> MATCH (a:Node {name: 'A'})
MATCH (d:Node {name: 'D'})
RETURN algo.linkprediction.commonNeighbors(a, d);
+-------------------------------------------+
| algo.linkprediction.commonNeighbors(a, d) |
+-------------------------------------------+
| 2.0 |
+-------------------------------------------+
1 row available after 97 ms, consumed after another 15 ms

These nodes have two common neighbors, so their score is 2. Now let's do the same for nodes A and E. Since they share only one neighbor, we expect a score of 1.

neo4j> MATCH (a:Node {name: 'A'})
MATCH (e:Node {name: 'E'})
RETURN algo.linkprediction.commonNeighbors(a, e);
+-------------------------------------------+
| algo.linkprediction.commonNeighbors(a, e) |
+-------------------------------------------+
| 1.0 |
+-------------------------------------------+

As expected, the score is 1. By default the function considers relationships of any type and in either direction. We can also restrict the computation by passing specific parameters:

neo4j> WITH {direction: "BOTH", relationshipQuery: "FRIENDS"}
AS config
MATCH (a:Node {name: 'A'})
MATCH (e:Node {name: 'E'})
RETURN algo.linkprediction.commonNeighbors(a, e, config)
AS score;
+-------+
| score |
+-------+
| 1.0 |
+-------+

To double-check our understanding, let's try another algorithm.

The preferential attachment function returns the product of the two nodes' degrees. For nodes A and D we should get 2*2=4, since both A and D have two neighbors. Let's try it:

neo4j> MATCH (a:Node {name: 'A'})
MATCH (d:Node {name: 'D'})
RETURN algo.linkprediction.preferentialAttachment(a, d)
AS score;
+-------+
| score |
+-------+
| 4.0 |
+-------+

(V) What can we do with link prediction scores?

We now know the basics of link prediction and similarity measures, but we still need to work out how to use these measures to predict links. There are two approaches:

1. Using the measures directly

We can use the scores returned by the link prediction algorithms directly: set a threshold, and predict that a pair of nodes is likely to have a relationship when its score is above that threshold.

In the example above, we could say that every pair of nodes with a preferential attachment score above 3 is likely to be linked, and that pairs scoring 3 or less are not.
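Thresholding can be sketched in plain Python on the example graph. This assumes an in-memory adjacency dict; `predict_links` and the other helpers are hypothetical names, and the threshold of 3 is the one from the paragraph above. Only pairs that are not already linked are scored.

```python
from itertools import combinations

# Edges of the small example graph, treated as undirected.
EDGES = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def preferential_attachment(adj, x, y):
    """Product of the degrees of x and y."""
    return len(adj[x]) * len(adj[y])


def predict_links(adj, threshold):
    """Return unlinked pairs whose preferential attachment score exceeds the threshold."""
    predicted = []
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already linked, nothing to predict
        if preferential_attachment(adj, x, y) > threshold:
            predicted.append((x, y))
    return predicted


adj = build_adjacency(EDGES)
print(predict_links(adj, threshold=3))  # [('A', 'D')]
```

The unlinked pairs are A/D (4), A/E (2), C/E (3) and D/E (2), so only A/D clears a threshold of 3. Picking the threshold is the hard part, which is one reason the rest of this article trains a classifier instead.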

2. Supervised learning

We can use the scores as features to train a binary classifier (supervised learning), and then use that classifier to predict whether a pair of nodes is linked.

In this series we focus on the supervised learning approach.

Building a machine learning classifier

Having chosen supervised learning, we need to answer two questions about the machine learning workflow:

(1) Which machine learning model should we use?

(2) How should we split the data into training and test sets?

(I) The machine learning model

The link prediction measures described above are all computed from similar underlying data, so if we feed them into a machine learning model, we have to deal with correlation between the features.

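Some machine learning models assume that the features they receive are independent of each other. Giving such a model features that violate this assumption can lead to predictions with low accuracy. If we choose one of these models, we need to remove highly correlated features.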

Alternatively, we can take the simpler route of using a model that is less sensitive to correlated features.

Some ensemble methods work well here because they make no such assumption about their input, for example the gradient boosting classifier or the random forest classifier.
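If we do pick a model that assumes independent features, one common way to drop highly correlated features is to inspect the pairwise correlation matrix. The sketch below uses pandas and NumPy; the 0.9 threshold, the helper name `drop_correlated`, and the column names (including the deliberately redundant `cn_x2`) are illustrative assumptions, not values from this tutorial.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.9):
    """Drop one column from every pair whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair of columns is checked once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


features = pd.DataFrame({
    "cn": [1, 2, 3, 4, 5],
    "cn_x2": [2, 4, 6, 8, 10],  # exactly 2 * cn, so perfectly correlated with it
    "pa": [5, 1, 4, 2, 3],
})
reduced, dropped = drop_correlated(features)
print(dropped)  # ['cn_x2']
```

Of each highly correlated pair, the column that appears later is dropped; that ordering rule is a simplification, and in practice you might keep whichever feature is cheaper to compute or easier to interpret.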

(II) Training and test sets

The trickier problem is splitting the data into training and test sets. We can't just split at random, because that could lead to data leakage.

Data leakage occurs when a model inadvertently uses data from outside the training set. This happens easily with graphs, because nodes in the training set may be connected to nodes in the test set.

We need to split the graph into subgraphs to use as training and test sets. If the graph data has a notion of time, this is much easier: we can pick a point in time, use the data before it as the training set, and use the data after it as the test set.

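This is still not a perfect solution: we need to experiment to make sure that the overall network structure of the training and test subgraphs is similar. Once that is done, we have training and test sets made up of pairs of nodes that have a relationship. These are the positive examples for our machine learning model.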
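When each relationship carries a year, the split itself is a simple filter. This is a minimal pandas sketch; the helper name `temporal_split`, the column names, and the edge list are hypothetical, and the split year of 2006 is only chosen to match the one used later in this tutorial.

```python
import pandas as pd


def temporal_split(edges, split_year, year_column="year"):
    """Edges before split_year go to training; edges from split_year onwards go to testing."""
    train = edges[edges[year_column] < split_year]
    test = edges[edges[year_column] >= split_year]
    return train, test


# Hypothetical edge list with the year each relationship first appeared.
edges = pd.DataFrame({
    "node1": [1, 2, 3, 4],
    "node2": [2, 3, 4, 5],
    "year": [2001, 2004, 2006, 2010],
})
train_edges, test_edges = temporal_split(edges, split_year=2006)
print(len(train_edges), len(test_edges))  # 2 2
```

Every edge lands in exactly one of the two sets, so no relationship is shared between them; checking that the two subgraphs have a similar structure is still a separate step.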

Next, let's look at negative examples.

The simplest approach is to use every pair of nodes that does not have a relationship. The problem is that in most scenarios there are far more pairs of nodes without a relationship than pairs with one.

The maximum number of negative examples is:

$$\text{\# negative examples} = (\text{\# nodes})^2 - (\text{\# relationships}) - (\text{\# nodes})$$

If we put all of these negative examples into our training set, we would have a severe class imbalance, with far more negative examples than positive ones.

A model trained on such an imbalanced dataset could achieve very high accuracy simply by predicting that no pair of nodes is linked, which is obviously not what we want.
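The effect is easy to reproduce with scikit-learn's metrics. In this sketch the labels are made up, with one positive for every 54 negatives (roughly the ratio reported later in this tutorial), and the "model" always predicts that no pair is linked.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: one linked pair (1) for every 54 unlinked pairs (0).
y_true = [1] + [0] * 54
# A "model" that always predicts that no pair is linked.
y_pred = [0] * len(y_true)

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)  # share of real links the model found
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

Accuracy comes out at about 98% while recall is zero: the model never finds a single link, which is why accuracy alone is misleading on imbalanced data.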

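So we need to reduce the number of negative examples. One approach described in several link prediction papers is to use pairs of nodes that are a specific number of hops away from each other. This cuts the number of negative examples substantially, although they still far outnumber the positive examples.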
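To make the hop-distance idea concrete, here is a minimal sketch that collects pairs of nodes whose shortest-path distance lies in a given range, using breadth-first search over an in-memory adjacency dict. The 2 to 3 hop range matches the one used later in this tutorial; the function name and the small path graph are illustrative assumptions. Because the distance is at least 2, no returned pair is already directly linked.

```python
from collections import deque


def pairs_within_hops(adj, min_hops=2, max_hops=3):
    """Unordered pairs whose shortest-path distance is between min_hops and max_hops."""
    pairs = set()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue  # do not explore beyond max_hops
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        for target, d in dist.items():
            if min_hops <= d <= max_hops:
                pairs.add(tuple(sorted((source, target))))
    return pairs


# Hypothetical path graph 1-2-3-4-5.
path_graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(sorted(pairs_within_hops(path_graph)))
```

Nodes 1 and 5 are four hops apart, so that pair is left out, which is how the hop limit keeps the candidate set small.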

ΪÁ˽â¾öÑù±¾²»¾ùºâµÄÎÊÌ⣬ÎÒÃÇÒ²¿ÉÒÔ¶Ô¸ºÑù±¾½øÐÐÇ·²ÉÑù£¬»òÕß¶ÔÕýÑù±¾½øÐйý²ÉÑù¡£

(III) Code tutorial: link prediction in practice

With this background on link prediction and a real dataset in hand, we can start the hands-on part. The tutorial builds a machine learning model that predicts whether two authors will co-author a paper.

1. Loading the citation data

ÎÒÃǽ«Ê¹ÓÃÀ´×ÔDBLPÒýÎÄÍøÂçµÄÊý¾Ý£¬ÆäÖаüÀ¨À´×Ô¸÷ÖÖѧÊõÀ´Ô´µÄÒýÎÄÊý¾Ý£¬ÕâÀïÎÒÃÇ»¹ÒªÖØµã¹Ø×¢Ò»Ð©Èí¼þ¿ª·¢»áÒéÉϵÄÊý¾Ý¡£

Run the following Cypher statements to import this subset of the data. If you enable the multi-statement query editor in the Neo4j Browser, you can run them all at once.

// Create constraints
CREATE CONSTRAINT ON (a:Article) ASSERT a.index IS UNIQUE;
CREATE CONSTRAINT ON (a:Author) ASSERT a.name IS UNIQUE;
CREATE CONSTRAINT ON (v:Venue) ASSERT v.name IS UNIQUE;
// Import data from JSON files using the APOC library
CALL apoc.periodic.iterate(
'UNWIND ["dblp-ref-0.json", "dblp-ref-1.json", "dblp-ref-2.json", "dblp-ref-3.json"] AS file
CALL apoc.load.json("https://github.com/mneedham/link-prediction/raw/master/data/" + file)
YIELD value WITH value
RETURN value',
'MERGE (a:Article {index:value.id})
SET a += apoc.map.clean(value,["id","authors","references", "venue"],[0])
WITH a, value.authors as authors, value.references AS citations, value.venue AS venue
MERGE (v:Venue {name: venue})
MERGE (a)-[:VENUE]->(v)
FOREACH(author in authors |
MERGE (b:Author{name:author})
MERGE (a)-[:AUTHOR]->(b))
FOREACH(citation in citations |
MERGE (cited:Article {index:citation})
MERGE (a)-[:CITED]->(cited))',
{batchSize: 1000, iterateList: true});

The figure below shows the data once it has been imported into Neo4j:

2. Building the co-author graph

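The dataset does not contain relationships between authors that describe their collaborations, but we can infer them by finding articles written by several people. The following Cypher statement creates a CO_AUTHOR relationship between authors who have collaborated on at least one article: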

MATCH (a1)<-[:AUTHOR]-(paper)-[:AUTHOR]->(a2:Author)
WITH a1, a2, paper
ORDER BY a1, paper.year
WITH a1, a2, collect(paper)[0].year AS year,
count(*) AS collaborations
MERGE (a1)-[coauthor:CO_AUTHOR {year: year}]-(a2)
SET coauthor.collaborations = collaborations;

We create only one CO_AUTHOR relationship between a pair of authors, even if they have collaborated on several articles. We store a couple of properties on these relationships:

(1) year: the publication year of the first article the authors wrote together

(2) collaborations: the number of articles the authors have collaborated on

Co-authors in Neo4j
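The same inference can be sketched outside Neo4j, which may help when checking what the Cypher statement produces. This minimal Python sketch assumes a hypothetical input format of (year, author list) tuples and made-up article data; like the Cypher version, it keeps one relationship per pair of authors, with the year of their first joint article and a collaboration count.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_edges(articles):
    """Derive one co-author relationship per pair of authors.

    articles: iterable of (year, [author names]) tuples (hypothetical input format).
    Returns {(author1, author2): {"year": first year, "collaborations": count}}.
    """
    edges = defaultdict(lambda: {"year": None, "collaborations": 0})
    for year, authors in articles:
        # Sort and de-duplicate so each unordered pair gets a single key.
        for a1, a2 in combinations(sorted(set(authors)), 2):
            edge = edges[(a1, a2)]
            edge["collaborations"] += 1
            if edge["year"] is None or year < edge["year"]:
                edge["year"] = year
    return dict(edges)


# Hypothetical articles: (publication year, authors).
articles = [
    (2003, ["Alice", "Bob"]),
    (2001, ["Bob", "Alice", "Carol"]),
    (2007, ["Carol", "Dave"]),
]
for pair, props in sorted(build_coauthor_edges(articles).items()):
    print(pair, props)
```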

ÏÖÔÚÒѾ­ÓÐÁ˺ÏÖøÕß¹ØÏµÍ¼±í£¬ÎÒÃÇÐèҪŪÇå³þÈçºÎÔ¤²â×÷ÕßÖ®¼äδÀ´ºÏ×÷µÄ¿ÉÄÜÐÔ£¬ÎÒÃǽ«¹¹½¨Ò»¸ö¶þ½øÖÆ·ÖÀàÆ÷À´Ö´Ðд˲Ù×÷£¬Òò´ËÏÂÒ»²½ÊÇ´´½¨ÑµÁ·Í¼ºÍ²âÊÔͼ¡£

3. Training and test datasets

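As discussed above, we can't split the data into training and test sets at random, because accidentally using data from outside the training data to build the model would cause data leakage. This happens easily with graphs, because pairs of nodes in the training set may be connected to nodes in the test set.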

ΪÁ˽â¾öÕâ¸öÎÊÌ⣬ÎÒÃÇÐèÒª½«ÎÒÃǵÄͼ·ÖΪѵÁ·Í¼ºÍ²âÊÔ×Óͼ£¬ÐÒÔ˵ÄÊÇÒýÎÄͼÖаüº¬ÎÒÃÇ¿ÉÒÔ·Ö¸îµÄʱ¼äÐÅÏ¢¡£ÎÒÃÇ¿ÉÒÔͨ¹ý²ð·ÖÌØ¶¨Äê·ÝµÄÊý¾ÝÀ´´´½¨ÑµÁ·Í¼ºÍ²âÊÔͼ¡£µ«ÊÇ£¬ÎÒÃÇÓ¦¸Ã·Ö¿ªÄÄÒ»ÄêÄØ£¿ÏÈÀ´¿´¿´ºÏ×÷Õß¹²Í¬ºÏ×÷µÄµÚÒ»ÄêµÄ·Ö²¼Çé¿ö£º

(Distribution of collaborations by year)

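It looks like we should split on 2006, which gives each subgraph a reasonable amount of data: all co-authorships that started in 2005 or earlier go into the training graph, and those from 2006 onwards go into the test graph.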

Based on that year, we create explicit CO_AUTHOR_EARLY and CO_AUTHOR_LATE relationships in the graph. The following code creates them:

Training subgraph

MATCH (a)-[r:CO_AUTHOR]->(b)
WHERE r.year < 2006
MERGE (a)-[:CO_AUTHOR_EARLY {year: r.year}]-(b);

Test subgraph

MATCH (a)-[r:CO_AUTHOR]->(b)
WHERE r.year >= 2006
MERGE (a)-[:CO_AUTHOR_LATE {year: r.year}]-(b);

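This split leaves us with 81,096 relationships in the early graph (up to and including 2005) and 74,128 in the late graph (2006 onwards), a 52-48 split. That is a much larger share of data in the test graph than we would usually use, but it should be fine. The relationships in these subgraphs will be the positive examples in our training and test sets, but we also need some negative examples. Negative examples let the model learn to distinguish pairs of nodes that should be linked from pairs that should not.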

As is usual for link prediction problems, there are far more negative examples than positive ones. The maximum number of negative examples is:

$$\text{\# negative examples} = (\text{\# nodes})^2 - (\text{\# relationships}) - (\text{\# nodes})$$

That is, the number of nodes squared, minus the relationships the graph already has, minus the pairs of each node with itself.

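Instead of using almost every possible pair of nodes, we'll use pairs of nodes that are between 2 and 3 hops away from each other. This gives us a much more manageable amount of data. We can generate and query these pairs with the following code: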

MATCH (author:Author)
WHERE (author)-[:CO_AUTHOR_EARLY]-()
MATCH (author)-[:CO_AUTHOR_EARLY*2..3]-(other)
WHERE not((author)-[:CO_AUTHOR_EARLY]-(other))
RETURN id(author) AS node1, id(other) AS node2

This query returns 4,389,478 negative examples and 81,096 positive examples, which means there are about 54 times as many negative examples as positive ones.

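That is still a big imbalance, which means that a model predicting that no pair of nodes is linked would score a very high accuracy. To fix this, we can either upsample the positive examples or downsample the negative examples; we'll use downsampling.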

4. Py2neo, pandas, scikit-learn

Next we'll use the py2neo, pandas, and scikit-learn libraries, all Python-based, installed from PyPI:

pip install py2neo==4.1.3 pandas scikit-learn

(1) The py2neo driver lets data scientists easily combine Neo4j with tools from the Python data science ecosystem. We'll use it to run Cypher queries against Neo4j.

(2) pandas is a BSD-licensed open-source library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

(3) scikit-learn is a very popular machine learning library. We'll use it to build our machine learning model.

(An extended scikit-learn workflow; image from the web)

Once these libraries are installed, import the packages we need and create a database connection:

from py2neo import Graph
import pandas as pd

# The URI and credentials are placeholders: change them to match your Neo4j instance
graph = Graph("bolt://localhost", auth=("neo4j", "neo4jPassword"))

5. Building our training and test sets

We can now write the following code to create a training DataFrame containing positive and negative examples based on the early graph:

# Find positive examples
train_existing_links = graph.run("""
MATCH (author:Author)-[:CO_AUTHOR_EARLY]->(other:Author)
RETURN id(author) AS node1, id(other) AS node2, 1 AS label
""").to_data_frame()
# Find negative examples
train_missing_links = graph.run("""
MATCH (author:Author)
WHERE (author)-[:CO_AUTHOR_EARLY]-()
MATCH (author)-[:CO_AUTHOR_EARLY*2..3]-(other)
WHERE not((author)-[:CO_AUTHOR_EARLY]-(other))
RETURN id(author) AS node1, id(other) AS node2, 0 AS label
""").to_data_frame()
# Remove duplicates
train_missing_links = train_missing_links.drop_duplicates()
# Down sample negative examples to the number of positive examples
train_missing_links = train_missing_links.sample(
    n=len(train_existing_links))
# Create DataFrame from positive and negative examples
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
training_df = pd.concat(
    [train_missing_links, train_existing_links], ignore_index=True)
training_df['label'] = training_df['label'].astype('category')

(An example of the resulting DataFrame)

We do the same to create the test DataFrame, but this time we only consider relationships in the late graph:

# Find positive examples
test_existing_links = graph.run("""
MATCH (author:Author)-[:CO_AUTHOR_LATE]->(other:Author)
RETURN id(author) AS node1, id(other) AS node2, 1 AS label
""").to_data_frame()
# Find negative examples
test_missing_links = graph.run("""
MATCH (author:Author)
WHERE (author)-[:CO_AUTHOR_LATE]-()
MATCH (author)-[:CO_AUTHOR_LATE*2..3]-(other)
WHERE not((author)-[:CO_AUTHOR_LATE]-(other))
RETURN id(author) AS node1, id(other) AS node2, 0 AS label
""").to_data_frame()
# Remove duplicates
test_missing_links = test_missing_links.drop_duplicates()
# Down sample negative examples to the number of positive examples
test_missing_links = test_missing_links.sample(n=len(test_existing_links))
# Create DataFrame from positive and negative examples
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
test_df = pd.concat(
    [test_missing_links, test_existing_links], ignore_index=True)
test_df['label'] = test_df['label'].astype('category')

Next, let's build the machine learning model.

6. Choosing a machine learning algorithm

ÎÒÃǽ«´´½¨Ò»¸öËæ»úÉ­ÁÖ·ÖÀàÆ÷£¬´Ë·½·¨·Ç³£ÊʺÏÊý¾Ý¼¯Öаüº¬Ç¿ÏîºÍÈõÏîµÄÄ£ÐÍ¡£¾¡¹ÜÈõ¹¦ÄÜÓÐʱ»áÓÐËù°ïÖú£¬µ«Ëæ»úÉ­ÁÖ·½·¨¿ÉÒÔÈ·±£ÎÒÃDz»»á´´½¨¹ý¶ÈÄâºÏѵÁ·Êý¾ÝµÄÄ£ÐÍ¡£Ê¹ÓÃÒÔÏ´úÂë´´½¨Ä£ÐÍ£º

from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(n_estimators=30, max_depth=10,
random_state=0)

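Now it's time to engineer some features to train the model on. Feature extraction is a way of distilling large amounts of data and attributes into a set of representative numeric values (features). These features are then used as the model's input so that it can learn to distinguish the categories or values of the learning task.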

7. Generating link prediction features

We start by generating some features with the link prediction functions:

def apply_graphy_features(data, rel_type):
    query = """
    UNWIND $pairs AS pair
    MATCH (p1) WHERE id(p1) = pair.node1
    MATCH (p2) WHERE id(p2) = pair.node2
    RETURN pair.node1 AS node1,
           pair.node2 AS node2,
           algo.linkprediction.commonNeighbors(
               p1, p2, {relationshipQuery: $relType}) AS cn,
           algo.linkprediction.preferentialAttachment(
               p1, p2, {relationshipQuery: $relType}) AS pa,
           algo.linkprediction.totalNeighbors(
               p1, p2, {relationshipQuery: $relType}) AS tn
    """
    pairs = [{"node1": pair[0], "node2": pair[1]}
             for pair in data[["node1", "node2"]].values.tolist()]
    params = {"pairs": pairs, "relType": rel_type}

    features = graph.run(query, params).to_data_frame()
    return pd.merge(data, features, on=["node1", "node2"])

This function runs a query that takes the node pairs from the provided DataFrame and computes the following for each pair: common neighbors (cn), preferential attachment (pa), and total neighbors (tn).
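For testing without a running Neo4j instance, the same three features can be approximated in memory. This is a minimal offline sketch, not the library's implementation: `add_pair_features` is a hypothetical helper, the graph is the small A to E example from earlier, and the labels are made up. It returns a copy of the DataFrame with `cn`, `pa` and `tn` columns, just like `apply_graphy_features`.

```python
import pandas as pd

# Edges of the small example graph, treated as undirected.
EDGES = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def add_pair_features(data, adj):
    """Return a copy of data with cn, pa and tn columns for each (node1, node2) row."""
    out = data.copy()
    pairs = list(zip(out["node1"], out["node2"]))
    out["cn"] = [len(adj[a] & adj[b]) for a, b in pairs]         # common neighbours
    out["pa"] = [len(adj[a]) * len(adj[b]) for a, b in pairs]    # preferential attachment
    out["tn"] = [len(adj[a] | adj[b]) for a, b in pairs]         # total neighbours
    return out


pairs_df = pd.DataFrame({"node1": ["A", "A"], "node2": ["D", "E"], "label": [1, 0]})
result = add_pair_features(pairs_df, build_adjacency(EDGES))
print(result)
```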

We can apply it to our training and test DataFrames as follows:

training_df = apply_graphy_features(training_df, "CO_AUTHOR_EARLY")
test_df = apply_graphy_features(test_df, "CO_AUTHOR")

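For the training DataFrame we compute these metrics using only the early graph, whereas for the test DataFrame we compute them across the whole graph. Using the whole graph for the test features is still fine, because the graph's evolution depends on its entire history, not just on what happened in 2006 and later.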

Training the model

Train the model with the following code:

columns = ["cn", "pa", "tn"]
X = training_df[columns]
y = training_df["label"]
classifier.fit(X, y)

The model is now trained, but we still need to evaluate it.

8. Evaluating the model

ÎÒÃǽ«¼ÆËãÆä׼ȷÐÔ£¬×¼È·ÐÔºÍÕÙ»ØÂÊ£¬¼ÆËã·½·¨¿É²Î¿¼ÏÂͼ£¬scikit-learnÒ²ÄÚÖÃÁ˴˹¦ÄÜ£¬»¹¿ÉÒԵõ½Ä£ÐÍÖÐʹÓõÄÿ¸öÌØÕ÷µÄÖØÒªÐÔ¡£

from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import accuracy_score

def evaluate_model(predictions, actual):
    accuracy = accuracy_score(actual, predictions)
    precision = precision_score(actual, predictions)
    recall = recall_score(actual, predictions)

    metrics = ["accuracy", "precision", "recall"]
    values = [accuracy, precision, recall]
    return pd.DataFrame(data={'metric': metrics, 'value': values})

def feature_importance(columns, classifier):
    features = list(zip(columns, classifier.feature_importances_))
    # Sort features from most to least important
    sorted_features = sorted(features, key=lambda x: x[1] * -1)
    keys = [value[0] for value in sorted_features]
    values = [value[1] for value in sorted_features]
    return pd.DataFrame(data={'feature': keys, 'value': values})
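The formulas behind these scikit-learn functions are easy to check by hand from the confusion-matrix counts. In this sketch the labels and predictions are made up purely to show the arithmetic; the hand-computed values should agree with `accuracy_score`, `precision_score` and `recall_score`.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels and predictions, just to show the formulas.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives

accuracy = (tp + tn) / len(y_true)  # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted links that are real
recall = tp / (tp + fn)             # share of real links that were found

print(accuracy, accuracy_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
```

For link prediction, precision tells us how many predicted collaborations actually happen, while recall tells us how many real collaborations the model catches.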

Run the evaluation:

predictions = classifier.predict(test_df[columns])
y_test = test_df["label"]

evaluate_model(predictions, y_test)

(Accuracy, precision, recall)

The model scores well on all of these measures. Now we can run the following code to see which feature plays the biggest role:

feature_importance(columns, classifier)

(Feature importance)

We can see above that common neighbors (cn) is by far the dominant feature in our model. Common neighbors counts the number of unclosed co-author triangles an author has, so this is perhaps not surprising.

Next, let's add some new features generated by graph algorithms.

9. Triangles and the clustering coefficient

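First, we run the triangle count algorithm on the test graph and the training subgraph. The algorithm returns the number of triangles each node is part of, as well as each node's clustering coefficient. A node's clustering coefficient indicates the likelihood that its neighbors are also connected to each other. Run the following Cypher query in the Neo4j Browser to run the algorithm on the training graph: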

CALL algo.triangleCount('Author', 'CO_AUTHOR_EARLY', {
write:true,
writeProperty:'trianglesTrain',
clusteringCoefficientProperty:'coefficientTrain'});

Then run the following Cypher query to run it on the test graph:

CALL algo.triangleCount('Author', 'CO_AUTHOR', {
write:true,
writeProperty:'trianglesTest',
clusteringCoefficientProperty:'coefficientTest'});
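To see what these two properties mean, here is a minimal in-memory sketch of triangle counting and the local clustering coefficient, C(v) = 2 * T(v) / (k * (k - 1)), where T(v) is the number of triangles through node v and k is its degree. It uses the small A to E example graph from earlier rather than the DBLP data, and assigns a coefficient of 0 to nodes with fewer than two neighbours; Neo4j's implementation may handle such edge cases differently.

```python
from itertools import combinations

# Edges of the small example graph, treated as undirected.
EDGES = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_and_coefficients(adj):
    """Per-node triangle counts and local clustering coefficients."""
    triangles, coefficients = {}, {}
    for node, nbrs in adj.items():
        # Each pair of neighbours that are themselves linked closes one triangle.
        t = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        k = len(nbrs)
        triangles[node] = t
        # Fraction of possible neighbour pairs that are linked; 0 when k < 2.
        coefficients[node] = 2 * t / (k * (k - 1)) if k > 1 else 0.0
    return triangles, coefficients


adj = build_adjacency(EDGES)
triangles, coefficients = triangles_and_coefficients(adj)
print(triangles)
print({node: round(c, 3) for node, c in coefficients.items()})
```

B sits in two triangles (A-B-C and B-C-D) but has four neighbours, so its coefficient is only 1/3, while A and D, whose two neighbours are linked, have a coefficient of 1.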

The nodes now have four new properties: trianglesTrain, coefficientTrain, trianglesTest, and coefficientTest. Let's add them to our training and test DataFrames with the help of the following function:

def apply_triangles_features(data, triangles_prop,
                             coefficient_prop):
    query = """
    UNWIND $pairs AS pair
    MATCH (p1) WHERE id(p1) = pair.node1
    MATCH (p2) WHERE id(p2) = pair.node2
    RETURN pair.node1 AS node1,
           pair.node2 AS node2,
           apoc.coll.min([p1[$triangles], p2[$triangles]])
               AS minTriangles,
           apoc.coll.max([p1[$triangles], p2[$triangles]])
               AS maxTriangles,
           apoc.coll.min([p1[$coefficient], p2[$coefficient]])
               AS minCoeff,
           apoc.coll.max([p1[$coefficient], p2[$coefficient]])
               AS maxCoeff
    """
    pairs = [{"node1": pair[0], "node2": pair[1]}
             for pair in data[["node1", "node2"]].values.tolist()]
    params = {"pairs": pairs,
              "triangles": triangles_prop,
              "coefficient": coefficient_prop}
    features = graph.run(query, params).to_data_frame()
    return pd.merge(data, features, on=["node1", "node2"])

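These measures differ from the ones we have used so far: rather than being computed for a pair of nodes, they are specific to a single node. We can't simply add these values to our DataFrame as, say, node1 triangles or node1 coefficient, because there is no guarantee about the order of the nodes in a pair. We need an approach that does not depend on the order. We can do this by taking the average of the two values, their product, or their minimum and maximum, as we do here: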

training_df = apply_triangles_features(training_df,
"trianglesTrain", "coefficientTrain")
test_df = apply_triangles_features(test_df,
"trianglesTest", "coefficientTest")

Now we can train and evaluate the model again:

columns = [
"cn", "pa", "tn",
"minTriangles", "maxTriangles", "minCoeff", "maxCoeff"
]
X = training_df[columns]
y = training_df["label"]
classifier.fit(X, y)
predictions = classifier.predict(test_df[columns])
y_test = test_df["label"]
display(evaluate_model(predictions, y_test))

(Accuracy, precision, recall)

These features help! Each of our metrics has improved by about 4% compared with the initial model. Which features matter most?

display(feature_importance(columns, classifier))

(Feature importance)

Common neighbors is still the most influential feature, but the triangle features have also become considerably more important.
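The reason for using min and max rather than "node1" and "node2" values is that these summaries do not change when the two nodes in a pair are swapped. This small sketch checks that property for the four summaries mentioned in this section; the helper name `pair_summaries` and the triangle counts are hypothetical.

```python
def pair_summaries(value1, value2):
    """Order-independent summaries of two per-node values."""
    return {
        "min": min(value1, value2),
        "max": max(value1, value2),
        "mean": (value1 + value2) / 2,
        "product": value1 * value2,
    }


# Hypothetical triangle counts for the two nodes in a pair.
print(pair_summaries(3, 5))
print(pair_summaries(5, 3) == pair_summaries(3, 5))  # True
```

The min/max pair keeps both values, whereas mean and product each collapse them into a single number, which is one reason to prefer min and max when the spread between the two nodes may be informative.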

That brings us to the end of this tutorial. Having seen the whole workflow, here are a few questions to think about:

(1) Are there other features we could add? Would they help us build a more accurate model? Perhaps other community detection or even centrality algorithms could help?

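(2) At the moment, the link prediction algorithms in the Graph Algorithms library only work on monopartite graphs (graphs where both nodes in a pair have the same label). The algorithms are based on the topology around the nodes, so if we apply them to nodes with different labels (which are likely to have different topologies), they will not work well. Versions of the link prediction algorithms that work on other kinds of graphs are under consideration.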

 

 

   