Editor's note:
This article uses the task of building a machine learning classifier as a running example, walking through the background knowledge, the algorithm principles, and the code implementation. We hope it helps your learning.
This article comes from AI科技大本营, edited and recommended by Alice of 火龙果软件.
Graph algorithms are not a new field: open-source libraries already offer many powerful implementations. Over the past two years, researchers and scientists have been actively exploring ways to compensate for deep learning's lack of interpretability and inability to perform causal inference, and graph neural networks (GNNs) have become the much-watched "darling" of the field. As academia and industry pay ever more attention to GNNs, new work keeps appearing, GNN frameworks have emerged alongside it (such as the now-familiar DGL), and the two major deep learning frameworks, PyTorch and TensorFlow, have begun to support the corresponding functionality. Interest in graphs, graph computing, graph databases, and graph machine learning keeps rising.
Thanks to the attractive properties of graph data, more and more companies are investing in machine learning tasks built on graphs, combining graph data with machine learning algorithms to compensate for each other's weaknesses and to give the new generation of graph databases a new mission. Quite a few companies build in-house graph databases and graph analytics platforms, but directly usable open-source or mature tools remain limited. For companies without the resources to build their own, how should machine learning on graph data be done? What practical approaches can engineers try in their own work?
In today's article, we offer one solution using two tools everyone knows well: the graph database Neo4j and Scikit-Learn. We will use the task of building a machine learning classifier as a running example, covering everything from background knowledge and algorithm principles to the code implementation.
The Graph Database Neo4j
Neo4j is a graph database. The main graph databases today include TigerGraph, Neo4j, Amazon Neptune, JanusGraph, and ArangoDB. In recent years Neo4j has consistently topped the graph database rankings, and the boom in knowledge graphs over the past few years has brought it wide attention.
Neo4j is driven mainly by the Cypher query language and implements its graph analytics through its Graph Algorithms library. Installing Neo4j Desktop is also very easy, just one click.
Neo4j Desktop download:
https://neo4j.com/download/

We also recommend the book Graph Algorithms, whose worked examples are largely implemented on Neo4j; its authors, Amy Hodler and Mark Needham, are Neo4j employees.

Read it online at:
https://neo4j.com/docs/graph-algorithms/current/
Graph databases are particularly useful for analyzing the relationships between heterogeneous data points, for example in fraud detection or Facebook's friendship graph. Take relationship prediction in a social network: links are one of the most fundamental building blocks of a complex (social) network, and predicting potential relationships from the network's existing nodes and links rests on one core algorithm, link prediction. That algorithm is the heart of today's article. Neo4j's graph algorithms library supports several link prediction algorithms, so after this first look at Neo4j we will move on to link prediction itself: how to import data into Neo4j, and how to combine Scikit-Learn with the link prediction algorithms to build a machine learning prediction model.
Link Prediction Algorithms
(I) What is link prediction?
Link prediction has been around for many years, but it only became widely known after Jon Kleinberg and David Liben-Nowell published their 2004 paper, "The Link Prediction Problem for Social Networks".
Kleinberg and Liben-Nowell framed the problem from the perspective of social networks, as follows:
Given a snapshot of a social network, can we predict which new relationships among its members are likely to appear in the future? We can treat this as the link prediction problem: analyze the proximity of the nodes in the network, and derive link prediction methods from it.
Later, Dr. Jim Webber gave a talk at GraphConnect San Francisco 2015 on the history of graph algorithms, in which he retold the Second World War through graph theory.
Besides world wars and friendships in social networks, where else might relationship prediction be useful? We could predict relationships between members of a terrorist organization, interactions between molecules in a biological network, potential co-authorships in a citation network, or interest in an artist or artwork; all of these scenarios can make use of link prediction.
In every case, predicting a link means predicting some future behavior; in a citation network, for example, we are predicting whether two people are likely to co-author a paper.
(II) Link prediction algorithms
Kleinberg and Liben-Nowell describe a family of algorithms that can be used for link prediction, as shown in the figure below:

The algorithms described in Kleinberg and Liben-Nowell's paper
All of these methods compute a score for a pair of nodes, which can be read as a measure of those nodes' "proximity" based on the network topology: the closer two nodes are, the more likely it is that a link will form between them.
Below we look at a few of these measures to build intuition for how the algorithms work.
(III) Measures behind the algorithms
1. Common neighbors
One of the simplest measures is the common neighbors count. Ahmad Sadraei explains it as follows:
As a predictor, common neighbors captures two strangers who share a mutual friend and may therefore be introduced by that friend (closing a triangle in the graph).
This measure counts the neighbors that a pair of nodes share. As the figure below shows, nodes A and D have two common neighbors (nodes B and C), while nodes A and E have only one (node B). We therefore consider A and D to be closer, and more likely to become connected in the future.
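As a quick illustration, the counting above can be sketched in plain Python; the edge list below mirrors the example graph from the figure (the node names are taken from the example):

```python
# Common-neighbors score on the example graph (A-B, A-C, B-C, B-D, B-E, C-D).
edges = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]

def neighbors(node):
    """Set of nodes adjacent to `node` in the undirected edge list."""
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}

def common_neighbors(n1, n2):
    """Number of neighbors shared by n1 and n2."""
    return len(neighbors(n1) & neighbors(n2))

print(common_neighbors("A", "D"))  # 2 (B and C)
print(common_neighbors("A", "E"))  # 1 (B)
```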

2. Adamic-Adar (the AA index)
Back in 2003, while studying prediction in social networks, Lada Adamic and Eytan Adar proposed the Adamic-Adar algorithm. Like common neighbors, the AA index looks at shared neighbors, but it additionally weights each common neighbor by the inverse logarithm of that neighbor's degree, then sums these weights to produce the similarity score for the node pair.

A node's degree is its number of neighbors. The intuition behind the algorithm is that when a triangle closes, low-degree nodes carry more influence: if two people in a social network were introduced by a mutual friend, the chance of that introduction depends on how many other pairs of friends that person has. Someone with few friends is more likely to introduce a given pair of their friends to each other.
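A hedged sketch of that weighting in plain Python, reusing the toy graph from the common-neighbors example (the edge list is an assumption for illustration):

```python
import math

# Adamic-Adar: each common neighbor contributes 1 / log(degree),
# so low-degree common neighbors count for more.
edges = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]

def neighbors(node):
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}

def adamic_adar(n1, n2):
    return sum(1 / math.log(len(neighbors(c)))
               for c in neighbors(n1) & neighbors(n2))

# A and D share neighbors B (degree 4) and C (degree 3):
print(adamic_adar("A", "D"))  # 1/ln(4) + 1/ln(3)
```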
3. Preferential attachment
This is one of the most familiar concepts among graph researchers, first proposed by Albert-László Barabási and Réka Albert in their work on scale-free networks. The intuition is that the more relationships a node already has, the more likely it is to gain new ones. This is the easiest measure to compute: we simply multiply the degrees of the two nodes.

(IV) Link prediction in the Neo4j graph algorithms library
At the time of writing, the Neo4j graph algorithms library covers six link prediction algorithms: Adamic Adar, Common Neighbors, Preferential Attachment, Resource Allocation, Same Community, and Total Neighbors.
A quick summary of five of them:
(1) Adamic Adar: sum the inverse logarithms of the common neighbors' degrees.
(2) Preferential Attachment: multiply the degrees of the two nodes.
(3) Resource Allocation: sum the inverses of the common neighbors' degrees.
(4) Same Community: use a community detection algorithm to check whether the two nodes belong to the same community.
(5) Total Neighbors: count the distinct neighbors of the two nodes.
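To make three of these measures concrete, here is an illustrative pure-Python version of each, on the same toy graph used in the common-neighbors example (the edge list is an assumption):

```python
# Toy graph from the earlier common-neighbors example.
edges = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]

def neighbors(node):
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}

def preferential_attachment(n1, n2):
    # Product of the two nodes' degrees.
    return len(neighbors(n1)) * len(neighbors(n2))

def resource_allocation(n1, n2):
    # Sum of 1 / degree over common neighbors (no logarithm, unlike Adamic-Adar).
    return sum(1 / len(neighbors(c)) for c in neighbors(n1) & neighbors(n2))

def total_neighbors(n1, n2):
    # Size of the union of the two nodes' neighbor sets.
    return len(neighbors(n1) | neighbors(n2))

print(preferential_attachment("A", "D"))  # 2 * 2 = 4
print(resource_allocation("A", "D"))      # 1/4 + 1/3
print(total_neighbors("A", "D"))          # |{B, C}| = 2
```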
Now let's see how to use the library's common neighbors function, with the earlier example graph.
First run this Cypher statement to create the graph in Neo4j:
UNWIND [["A", "C"], ["A", "B"], ["B", "D"],
        ["B", "C"], ["B", "E"], ["C", "D"]] AS pair
MERGE (n1:Node {name: pair[0]})
MERGE (n2:Node {name: pair[1]})
MERGE (n1)-[:FRIENDS]-(n2)

Then use the following function to compute the common neighbors score of nodes A and D:
neo4j> MATCH (a:Node {name: 'A'})
       MATCH (d:Node {name: 'D'})
       RETURN algo.linkprediction.commonNeighbors(a, d);
+-------------------------------------------+
| algo.linkprediction.commonNeighbors(a, d) |
+-------------------------------------------+
| 2.0                                       |
+-------------------------------------------+
1 row available after 97 ms, consumed after another 15 ms
These nodes have two common neighbors, so their score is 2. Now do the same for nodes A and E. Since they have only one common neighbor, we expect a score of 1.
neo4j> MATCH (a:Node {name: 'A'})
       MATCH (e:Node {name: 'E'})
       RETURN algo.linkprediction.commonNeighbors(a, e);
+-------------------------------------------+
| algo.linkprediction.commonNeighbors(a, e) |
+-------------------------------------------+
| 1.0                                       |
+-------------------------------------------+
As expected, the score is indeed 1. By default the function considers relationships of any type and any direction, but we can also pass parameters to restrict the computation:
neo4j> WITH {direction: "BOTH", relationshipQuery: "FRIENDS"} AS config
       MATCH (a:Node {name: 'A'})
       MATCH (e:Node {name: 'E'})
       RETURN algo.linkprediction.commonNeighbors(a, e, config) AS score;
+-------+
| score |
+-------+
| 1.0   |
+-------+
To double-check our results, let's try another algorithm.
The preferential attachment function returns the product of the two nodes' degrees. For nodes A and D we should get 2 * 2 = 4, since both have two neighbors. Let's try it:
neo4j> MATCH (a:Node {name: 'A'})
       MATCH (d:Node {name: 'D'})
       RETURN algo.linkprediction.preferentialAttachment(a, d) AS score;
+-------+
| score |
+-------+
| 4.0   |
+-------+
(V) What are the link prediction scores good for?
Now that we know the basics of link prediction and its similarity measures, we still need to work out how to use these measures to actually predict links. There are two approaches:
1. Use the measures directly
We can use the score produced by a link prediction algorithm directly, by setting a threshold above which we predict that a pair of nodes will be linked.
In the example above, we could decide that every pair with a preferential attachment score above 3 is likely to be linked, while pairs scoring 3 or less are not.
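A minimal sketch of that threshold rule; only the cutoff of 3 comes from the text, and the pairs and scores below are made-up values:

```python
# Predict a link for every pair whose score exceeds the threshold.
scores = {("A", "D"): 4.0, ("A", "E"): 2.0, ("B", "F"): 3.0}
threshold = 3.0

predicted_links = [pair for pair, score in scores.items() if score > threshold]
print(predicted_links)  # only ("A", "D") scores above 3
```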
2¡¢Óмලѧϰ
ÎÒÃÇ¿ÉÒÔ°Ñ·ÖÊý×÷ÎªÌØÕ÷ȥѵÁ·Ò»¸ö¶þ·ÖÀàÆ÷£¬´Ó¶ø½øÐÐÓмලѧϰ¡£È»ºóÓÃÕâ¸ö¶þ·ÖÀàÆ÷È¥Ô¤²âÒ»¶Ô½ÚµãÊÇ·ñ´æÔÚ¹ØÁª¡£
ÔÚÕâ¸öϵÁн̳ÌÖУ¬ÎÒÃÇ»áÖØµã½éÉÜÓмලѧϰµÄ·½·¨¡£
Building a Machine Learning Classifier
Having decided on supervised learning, we need to answer two questions about the machine learning workflow:
(1) Which machine learning model should we use?
(2) How do we split the data into training and test sets?
(I) The machine learning model
The link prediction measures introduced above are all computed from overlapping information, so choosing a machine learning model means we have to deal with correlation between features.
Some machine learning models assume their input features are independent; when a model is given features that violate this assumption, its prediction accuracy can suffer badly. Whichever model we choose, we should remove highly correlated features.
A simpler option is to use a model that is not so sensitive to correlated features. Some ensemble methods fit the bill, since they make no such assumption about their inputs, for example the gradient boosting classifier or the random forest classifier.
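If we do choose to prune highly correlated features first, a quick pandas sketch (the DataFrame values and the 0.9 cutoff are illustrative assumptions):

```python
import pandas as pd

# Drop any feature that is highly correlated with an earlier one.
df = pd.DataFrame({
    "cn": [1, 2, 3, 4],
    "tn": [2, 4, 6, 8],   # perfectly correlated with "cn"
    "pa": [4, 1, 3, 2],
})

corr = df.corr().abs()
to_drop = [col for i, col in enumerate(df.columns)
           if any(corr.iloc[i, j] > 0.9 for j in range(i))]
print(to_drop)  # "tn" duplicates "cn", so it is dropped
```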
(II) Training and test sets
The thorny part is splitting the data into training and test sets. We cannot just split at random, because that can cause data leakage.
Data leakage happens when a model inadvertently uses data from outside its training set. It is especially easy in graph computations, because nodes in the training set may be connected to nodes in the test set.
We need to split the graph into subgraphs for training and testing. If the graph data has a notion of time, our job is much easier: we can pick a point in time, use the data before it as the training set, and the data after it as the test set.
This is still not a perfect solution; we need to experiment to make sure the overall network structure of the training and test subgraphs is similar. Once that is done, we have training and test sets composed of pairs of nodes that are actually linked. These are the positive examples for the machine learning model.
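The time-based split just described can be sketched with pandas; the edge list, column names, and cutoff year below are illustrative assumptions:

```python
import pandas as pd

# Each row is a relationship stamped with the year it first appeared.
edges = pd.DataFrame({
    "node1": ["a", "a", "b", "c"],
    "node2": ["b", "c", "c", "d"],
    "year":  [2004, 2005, 2006, 2007],
})

cutoff = 2006
train_edges = edges[edges["year"] < cutoff]    # earlier snapshot -> training set
test_edges = edges[edges["year"] >= cutoff]    # later snapshot -> test set
print(len(train_edges), len(test_edges))  # 2 2
```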
Next, negative examples.
In the simplest view, any pair of nodes with no link between them is a negative example. The problem is that in most settings the number of node pairs without a relationship is far larger than the number of pairs with one.
The maximum number of negative examples is:

# negative examples = (# nodes)² - (# relationships) - (# nodes)
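In code, the bound reads as follows (the node and relationship counts are arbitrary illustrative numbers):

```python
def max_negative_examples(num_nodes, num_relationships):
    # (# nodes)^2 - (# relationships) - (# nodes)
    return num_nodes ** 2 - num_relationships - num_nodes

print(max_negative_examples(5, 6))  # 25 - 6 - 5 = 14
```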
If we fed every negative example in the training set to the model, we would have a severe class imbalance, with negatives vastly outnumbering positives.
Trained on such an imbalanced dataset, a model could reach a very respectable accuracy just by predicting that no pair of nodes is ever linked, which is obviously not what we want.
So we need to cut down the number of negative examples. One approach mentioned in several papers is to sample only pairs of nodes that are some fixed distance apart. This reduces the negatives substantially, even though they still far outnumber the positives.
To address the remaining imbalance, we can also undersample the negatives or oversample the positives.
(III) Code tutorial: link prediction in practice
With the background above covered and a real dataset in hand, let's begin the hands-on tutorial: we will build a machine learning model that predicts whether two people are paper co-authors.
1. Loading the citation data
We will use data from the DBLP citation network, which includes citation data from various academic sources. Here we focus on data from a few software development conferences.

Run the following Cypher statements to import this subset of the data. Enable the multi-statement editor in the Neo4j Browser and you can run them all at once.

// Create constraints
CREATE CONSTRAINT ON (a:Article) ASSERT a.index IS UNIQUE;
CREATE CONSTRAINT ON (a:Author) ASSERT a.name IS UNIQUE;
CREATE CONSTRAINT ON (v:Venue) ASSERT v.name IS UNIQUE;

// Import data from JSON files using the APOC library
CALL apoc.periodic.iterate(
  'UNWIND ["dblp-ref-0.json", "dblp-ref-1.json",
           "dblp-ref-2.json", "dblp-ref-3.json"] AS file
   CALL apoc.load.json("https://github.com/mneedham/link-prediction/raw/master/data/" + file)
   YIELD value WITH value
   RETURN value',
  'MERGE (a:Article {index: value.id})
   SET a += apoc.map.clean(value, ["id", "authors", "references", "venue"], [0])
   WITH a, value.authors AS authors, value.references AS citations, value.venue AS venue
   MERGE (v:Venue {name: venue})
   MERGE (a)-[:VENUE]->(v)
   FOREACH(author IN authors |
     MERGE (b:Author {name: author})
     MERGE (a)-[:AUTHOR]->(b))
   FOREACH(citation IN citations |
     MERGE (cited:Article {index: citation})
     MERGE (a)-[:CITED]->(cited))',
  {batchSize: 1000, iterateList: true});
The figure below shows the data once imported into Neo4j:

2. Building the co-author graph
The dataset does not contain relationships between authors describing their collaborations, but we can infer them by finding papers written by more than one person. The following Cypher statement creates a CO_AUTHOR relationship between every pair of authors who wrote at least one paper together:
MATCH (a1)<-[:AUTHOR]-(paper)-[:AUTHOR]->(a2:Author)
WITH a1, a2, paper
ORDER BY a1, paper.year
WITH a1, a2, collect(paper)[0].year AS year,
     count(*) AS collaborations
MERGE (a1)-[coauthor:CO_AUTHOR {year: year}]-(a2)
SET coauthor.collaborations = collaborations;
We create only one CO_AUTHOR relationship between a pair of authors, even if they collaborated on several papers, and we set a couple of properties on these relationships:
(1) a year property: the publication year of the first paper the two authors wrote together
(2) a collaborations property: the number of papers the two authors have co-written

Neo4j ÖеĹ²Í¬×÷Õß
ÏÖÔÚÒѾÓÐÁ˺ÏÖøÕß¹ØÏµÍ¼±í£¬ÎÒÃÇÐèҪŪÇå³þÈçºÎÔ¤²â×÷ÕßÖ®¼äδÀ´ºÏ×÷µÄ¿ÉÄÜÐÔ£¬ÎÒÃǽ«¹¹½¨Ò»¸ö¶þ½øÖÆ·ÖÀàÆ÷À´Ö´Ðд˲Ù×÷£¬Òò´ËÏÂÒ»²½ÊÇ´´½¨ÑµÁ·Í¼ºÍ²âÊÔͼ¡£
3. Training and test datasets
As discussed earlier, we cannot split the data into training and test sets at random, because data leakage can occur if data from outside the training set sneaks into model building. This happens easily when working with graphs, since node pairs in the training set may be connected to nodes in the test set.
Instead we need to split the graph into training and test subgraphs, and fortunately the citation graph contains time information we can split on. We will create the training and test graphs by splitting at a particular year. But which year? Let's look at the distribution of the first year in which each pair of co-authors collaborated:

(Distribution of collaborations per year)
It looks like we should split at 2006, which gives each subgraph a reasonable amount of data: every co-authorship that began before 2006 goes into the training graph, and those from 2006 onward go into the test graph.
We mark this split explicitly in the graph by creating CO_AUTHOR_EARLY and CO_AUTHOR_LATE relationships. The following code creates them:
Training subgraph:
MATCH (a)-[r:CO_AUTHOR]->(b)
WHERE r.year < 2006
MERGE (a)-[:CO_AUTHOR_EARLY {year: r.year}]-(b);
Test subgraph:
MATCH (a)-[r:CO_AUTHOR]->(b)
WHERE r.year >= 2006
MERGE (a)-[:CO_AUTHOR_LATE {year: r.year}]-(b);
This split leaves 81,096 relationships in the early graph and 74,128 in the late graph, a 52-48 ratio. That is a much larger test share than is usual, but it will do here. The relationships in these subgraphs act as the positive examples in the training and test sets, but we also need negative examples, so the model can learn to distinguish node pairs that should have a link from pairs that should not.
As in any link prediction problem, there are far more negative examples than positive ones. The maximum number of negatives equals:
# negative examples = (# nodes)² - (# relationships) - (# nodes)
That is, the number of nodes squared, minus the relationships the graph has, minus the self-pairs.
Rather than using nearly every possible pair, we will only pair up nodes that are two to three hops away from each other, which gives us a much more manageable amount of data. We can generate and query such pairs by running the following code:
MATCH (author:Author)
WHERE (author)-[:CO_AUTHOR_EARLY]-()
MATCH (author)-[:CO_AUTHOR_EARLY*2..3]-(other)
WHERE not((author)-[:CO_AUTHOR_EARLY]-(other))
RETURN id(author) AS node1, id(other) AS node2
This query returns 4,389,478 negative examples against 81,096 positive examples, i.e. 54 times as many negatives as positives.
That is still a large imbalance, which would make a model that predicts a link for every node pair very inaccurate. To address it, we can oversample the positives or downsample the negatives; here we will use the downsampling approach.
4. py2neo, pandas, scikit-learn
Next we will use the py2neo, pandas, and scikit-learn libraries, all Python-based and installable from PyPI:

pip install py2neo==4.1.3 pandas scikit-learn
(1) The py2neo driver lets data scientists easily pair Neo4j with tools from the Python data science ecosystem. We will use it to run Cypher queries against Neo4j.
(2) pandas is a BSD-licensed open-source library providing high-performance, easy-to-use data structures and data analysis tools for Python.
(3) scikit-learn is a hugely popular machine learning library. We will use it to build our machine learning model.

(An expanded view of the Scikit-Learn workflow, image from the web)
With the libraries installed, import the required packages and create the database connection:
from py2neo import Graph
import pandas as pd

graph = Graph("bolt://localhost", auth=("neo4j", "neo4jPassword"))
5. Building our training and test sets
We can now write the following code to create a training DataFrame containing positive and negative examples based on the early graph:
# Find positive examples
train_existing_links = graph.run("""
MATCH (author:Author)-[:CO_AUTHOR_EARLY]->(other:Author)
RETURN id(author) AS node1, id(other) AS node2, 1 AS label
""").to_data_frame()

# Find negative examples
train_missing_links = graph.run("""
MATCH (author:Author)
WHERE (author)-[:CO_AUTHOR_EARLY]-()
MATCH (author)-[:CO_AUTHOR_EARLY*2..3]-(other)
WHERE not((author)-[:CO_AUTHOR_EARLY]-(other))
RETURN id(author) AS node1, id(other) AS node2, 0 AS label
""").to_data_frame()

# Remove duplicates
train_missing_links = train_missing_links.drop_duplicates()

# Down sample negative examples
train_missing_links = train_missing_links.sample(n=len(train_existing_links))

# Create DataFrame from positive and negative examples
training_df = train_missing_links.append(train_existing_links, ignore_index=True)
training_df['label'] = training_df['label'].astype('category')

A sample of the test dataset
Do the same to create the test DataFrame, this time considering only relationships in the late graph:
# Find positive examples
test_existing_links = graph.run("""
MATCH (author:Author)-[:CO_AUTHOR_LATE]->(other:Author)
RETURN id(author) AS node1, id(other) AS node2, 1 AS label
""").to_data_frame()

# Find negative examples
test_missing_links = graph.run("""
MATCH (author:Author)
WHERE (author)-[:CO_AUTHOR_LATE]-()
MATCH (author)-[:CO_AUTHOR_LATE*2..3]-(other)
WHERE not((author)-[:CO_AUTHOR_LATE]-(other))
RETURN id(author) AS node1, id(other) AS node2, 0 AS label
""").to_data_frame()

# Remove duplicates
test_missing_links = test_missing_links.drop_duplicates()

# Down sample negative examples
test_missing_links = test_missing_links.sample(n=len(test_existing_links))

# Create DataFrame from positive and negative examples
test_df = test_missing_links.append(test_existing_links, ignore_index=True)
test_df['label'] = test_df['label'].astype('category')
Next, we create the machine learning model.
6. Choosing a machine learning algorithm
We will create a random forest classifier. This method suits a dataset that mixes strong and weak features: while the weak features will sometimes help, the random forest approach ensures we do not build a model that overfits the training data. Create the model with the following code:
from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(n_estimators=30, max_depth=10, random_state=0)
Now it's time to engineer the features we'll train the model on. Feature extraction distills a large volume of data and attributes into a representative set of numbers (features), which become the input from which we distinguish the classes/values of the learning task.
7. Generating link prediction features
First, generate some features with the link prediction functions:
def apply_graphy_features(data, rel_type):
    query = """
    UNWIND $pairs AS pair
    MATCH (p1) WHERE id(p1) = pair.node1
    MATCH (p2) WHERE id(p2) = pair.node2
    RETURN pair.node1 AS node1,
           pair.node2 AS node2,
           algo.linkprediction.commonNeighbors(
               p1, p2, {relationshipQuery: $relType}) AS cn,
           algo.linkprediction.preferentialAttachment(
               p1, p2, {relationshipQuery: $relType}) AS pa,
           algo.linkprediction.totalNeighbors(
               p1, p2, {relationshipQuery: $relType}) AS tn
    """
    pairs = [{"node1": pair[0], "node2": pair[1]}
             for pair in data[["node1", "node2"]].values.tolist()]
    params = {"pairs": pairs, "relType": rel_type}
    features = graph.run(query, params).to_data_frame()
    return pd.merge(data, features, on=["node1", "node2"])
This function runs a query that takes the node pairs from the supplied DataFrame and computes three scores for each pair: common neighbors (cn), preferential attachment (pa), and total neighbors (tn).
We apply it to the training and test DataFrames as follows:
training_df = apply_graphy_features(training_df, "CO_AUTHOR_EARLY")
test_df = apply_graphy_features(test_df, "CO_AUTHOR")
For the training DataFrame we compute these measures using only the early graph, while for the test DataFrame we compute them across the whole graph. Using the whole graph is reasonable, since the graph's evolution depends on all of time, not only on what happened from 2006 onward.

(Preview of the training and test sets)
Train the model with the following code:
columns = ["cn", "pa", "tn"]

X = training_df[columns]
y = training_df["label"]
classifier.fit(X, y)
The model is now trained, but it still needs to be evaluated.
8. Evaluating the model
We will compute its accuracy, precision, and recall; the figure below shows how each is calculated. scikit-learn has these built in, and we can also retrieve the importance of each feature used by the model.

from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import accuracy_score

def evaluate_model(predictions, actual):
    accuracy = accuracy_score(actual, predictions)
    precision = precision_score(actual, predictions)
    recall = recall_score(actual, predictions)
    metrics = ["accuracy", "precision", "recall"]
    values = [accuracy, precision, recall]
    return pd.DataFrame(data={'metric': metrics, 'value': values})

def feature_importance(columns, classifier):
    features = list(zip(columns, classifier.feature_importances_))
    sorted_features = sorted(features, key=lambda x: x[1] * -1)
    keys = [value[0] for value in sorted_features]
    values = [value[1] for value in sorted_features]
    return pd.DataFrame(data={'feature': keys, 'value': values})
Run the evaluation with:
predictions = classifier.predict(test_df[columns])
y_test = test_df["label"]
evaluate_model(predictions, y_test)

(Accuracy, precision, recall)
The model scores well on every measure. Now run the following code to see which feature plays the biggest role:
feature_importance(columns, classifier)

(Feature importance)
We can see above that common neighbors (cn) is the dominant feature in the model. Common neighbors counts the unclosed co-author triangles an author participates in, so its high importance is not surprising.
Next, we add some new features generated by graph algorithms.
9. Triangles and the clustering coefficient
First, run the triangle count algorithm on the training and test subgraphs. It returns the number of triangles each node participates in, as well as each node's clustering coefficient. A node's clustering coefficient indicates how likely its neighbors are to also be connected to one another. Run the following Cypher query in the Neo4j Browser to execute the algorithm on the training graph:
CALL algo.triangleCount('Author', 'CO_AUTHOR_EARLY', {
  write: true,
  writeProperty: 'trianglesTrain',
  clusteringCoefficientProperty: 'coefficientTrain'});
Then run this Cypher query to execute it on the test graph:
CALL algo.triangleCount('Author', 'CO_AUTHOR', {
  write: true,
  writeProperty: 'trianglesTest',
  clusteringCoefficientProperty: 'coefficientTest'});
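For intuition, the clustering coefficient that Neo4j writes back can be sketched in plain Python on the earlier toy graph (the edge list and helper names are illustrative; Neo4j computes this at scale):

```python
# Local clustering coefficient: the fraction of a node's neighbor
# pairs that are themselves connected.
edges = [("A", "C"), ("A", "B"), ("B", "D"), ("B", "C"), ("B", "E"), ("C", "D")]
edge_set = {frozenset(e) for e in edges}

def neighbors(node):
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}

def clustering_coefficient(node):
    nbrs = sorted(neighbors(node))
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if frozenset((nbrs[i], nbrs[j])) in edge_set)
    return 2 * links / (k * (k - 1))

print(clustering_coefficient("A"))  # B and C are connected, so 1.0
```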
Our nodes now carry four new properties: trianglesTrain, coefficientTrain, trianglesTest, and coefficientTest. With the help of the following function, we add them to the training and test DataFrames:
def apply_triangles_features(data, triangles_prop, coefficient_prop):
    query = """
    UNWIND $pairs AS pair
    MATCH (p1) WHERE id(p1) = pair.node1
    MATCH (p2) WHERE id(p2) = pair.node2
    RETURN pair.node1 AS node1,
           pair.node2 AS node2,
           apoc.coll.min([p1[$triangles], p2[$triangles]]) AS minTriangles,
           apoc.coll.max([p1[$triangles], p2[$triangles]]) AS maxTriangles,
           apoc.coll.min([p1[$coefficient], p2[$coefficient]]) AS minCoeff,
           apoc.coll.max([p1[$coefficient], p2[$coefficient]]) AS maxCoeff
    """
    pairs = [{"node1": pair[0], "node2": pair[1]}
             for pair in data[["node1", "node2"]].values.tolist()]
    params = {"pairs": pairs,
              "triangles": triangles_prop,
              "coefficient": coefficient_prop}
    features = graph.run(query, params).to_data_frame()
    return pd.merge(data, features, on=["node1", "node2"])
These features differ from the ones we have used so far: they are defined on single nodes rather than on node pairs. We cannot simply add them to the DataFrame as node-triangle or node-coefficient columns, because we have no guarantee about the ordering of the nodes within a pair; we need an order-agnostic combination. We can achieve that by averaging the two values, by taking their product, or, as here, by computing their minimum and maximum:
training_df = apply_triangles_features(training_df, "trianglesTrain", "coefficientTrain")
test_df = apply_triangles_features(test_df, "trianglesTest", "coefficientTest")
Now we can train and evaluate again:
columns = [
    "cn", "pa", "tn",
    "minTriangles", "maxTriangles",
    "minCoeff", "maxCoeff"
]

X = training_df[columns]
y = training_df["label"]
classifier.fit(X, y)

predictions = classifier.predict(test_df[columns])
y_test = test_df["label"]
display(evaluate_model(predictions, y_test))
(Accuracy, precision, recall)
Those features really helped: each of our metrics improved by about 4% over the initial model. Which feature matters most now?
display(feature_importance(columns, classifier))

(Feature importance)
Common neighbors is still the most influential feature, but the triangle features now carry significant weight too.
This brings the tutorial to a close. Based on the full workflow, here are some questions to spark further thinking:
(1) Are there other features we could add that would help us build a more accurate model? Perhaps other community detection algorithms, or even centrality algorithms, could help?
(2) At present, the link prediction algorithms in the graph algorithms library only work on monopartite graphs (graphs where both nodes carry the same label), since they are based on the topology of the nodes. They do not work well when applied to nodes with different labels, which may have different topologies, so versions of the link prediction algorithms that work on other kinds of graphs are being considered.