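Editor's recommendation:
This article introduces the application background, framework, and development history of GraphX, and analyzes its implementation through worked examples. We hope it helps your study.
This article comes from Cnblogs (博客园), edited and recommended by Alice of Pitaya Software (火龙果软件).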
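1. Introduction to GraphX
1.1 Application background of GraphX
Spark GraphX is a distributed graph processing framework. Built on the Spark platform, it provides a concise, easy-to-use, and rich set of interfaces for graph computation and graph mining, which greatly simplifies distributed graph processing.
Social networks such as Twitter, Facebook, Weibo, and WeChat contain many chains of relationships between people. These are the places where big data is generated, and all of them need graph computation. Graph processing today is essentially distributed rather than single-machine. Because Spark GraphX is built on top of Spark, it is a distributed graph processing system by nature.
Distributed or parallel graph processing splits a graph into many subgraphs and computes on each subgraph separately. The computation can proceed iteratively in stages, which amounts to computing on the graph in parallel. The figure below shows a simple example of graph computation: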

From the figure we can see that Wikipedia documents can be turned into a Link Table view, the Link Table view can then be analyzed into Hyperlinks, and finally PageRank can be used to derive the Top Communities. On the lower path, the step from Editor Graph to Community can be called Triangle Computation, an algorithm that counts triangles and uses them to discover a community. This analysis shows that graph computation offers many approaches and algorithms, and that graphs and tables can be converted into each other.
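1.2 The GraphX framework
By the time GraphX was designed, vertex-cut partitioning and the GAS (Gather-Apply-Scatter) model were both mature. GraphX was optimized for them in design and implementation, and it seeks the best balance between functionality and performance. As in Spark itself, every submodule has a core abstraction. The core abstraction of GraphX is the Resilient Distributed Property Graph, a directed multigraph with properties attached to both vertices and edges. It extends the Spark RDD abstraction and offers two views, Table and Graph, backed by a single physical copy of the data. Each view has its own operators, which gives GraphX both flexible operations and efficient execution.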

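Like Spark, the GraphX code base is very compact. The core of GraphX is only a little over 3,000 lines of code, and the Pregel model implemented on top of it takes just over 20 lines. The overall code structure of GraphX is shown in the figure below. Most of the implementation revolves around optimizing Partition, which to some extent shows that vertex-cut storage and the corresponding computation optimizations really are the focus and the main difficulty of a graph computation framework.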
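1.3 Development history
- As early as version 0.5, Spark shipped a small Bagel module that provided Pregel-like functionality. That version was still very primitive, weak in both performance and features, and was an experimental product.
- By version 0.8, in view of the industry's growing demand for distributed graph computation, Spark started a separate branch, Graphx-Branch, as an independent graph computation module and began designing and developing GraphX, drawing on GraphLab.
- In version 0.9, the module was formally merged into the main line. Although it was an alpha release, it could already be tried out, and Bagel left the stage. With version 1.0, GraphX formally went into production use.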

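Note that GraphX is still evolving rapidly. From the 0.8 branch to 0.9 and 1.0, every version brought many code improvements and refactorings. According to observation, without changing any code logic or the runtime environment, simply upgrading the version, switching interfaces, and recompiling yields a 10% to 20% performance gain per version. Although there is still a performance gap to GraphLab, GraphX is highly competitive thanks to Spark's integrated pipeline processing, its very active community, and its fast pace of improvement.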
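2. GraphX implementation analysis
As in Spark itself, every submodule has a core abstraction. The core abstraction of GraphX is the Resilient Distributed Property Graph, a directed multigraph with properties attached to both vertices and edges. It extends the Spark RDD abstraction and offers two views, Table and Graph, backed by a single physical copy of the data. Each view has its own operators, which gives GraphX both flexible operations and efficient execution.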

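The underlying design of GraphX has the following key points.
All operations on the Graph view are ultimately converted into RDD operations on the associated Table view. A computation on a graph is therefore logically equivalent to a series of RDD transformations, so a Graph ends up with the three key properties of an RDD: Immutable, Distributed, and Fault-Tolerant, of which immutability is the most important. Logically, every transformation or operation on a graph produces a new graph. Physically, GraphX reuses unchanged vertices and edges to some extent, as an optimization that is transparent to the user.
The physical data shared by the two views consists of two RDDs, RDD[VertexPartition] and RDD[EdgePartition]. Vertices and edges are not actually stored as a table of the form Collection[tuple]. Instead, each VertexPartition/EdgePartition internally stores a block of partitioned data with an index structure, which speeds up traversal under the different views. The immutable index structures are shared across RDD transformations, which reduces computation and storage overhead.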

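Distributed storage of the graph uses the vertex-cut model, and the partitionBy method lets the user choose among different partitioning strategies (PartitionStrategy). A partitioning strategy assigns edges to the EdgePartitions and vertex masters to the VertexPartitions. Each EdgePartition also caches ghost copies of the vertices that its local edges touch. The choice of strategy affects how many ghost copies need to be cached and how evenly edges are balanced across the EdgePartitions, so the best strategy should be chosen according to the structural characteristics of the graph. Four strategies are currently available: EdgePartition2D, EdgePartition1D, RandomVertexCut, and CanonicalRandomVertexCut.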
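To make the trade-off between strategies more concrete, here is a minimal plain-Scala sketch (no Spark required) of how a 1D and a 2D edge partitioner might map an edge to a partition. It follows the idea behind EdgePartition1D and EdgePartition2D but is not copied from GraphX: the object name PartitionSketch, the helper mix, and the mixing constant are assumptions made for illustration.

```scala
object PartitionSketch {
  // Assumed constant: a large prime used to scramble vertex ids before the modulus.
  val MixingPrime = 1125899906842597L

  // Map a vertex id to one of `buckets` slots (assumes the product is not Long.MinValue).
  def mix(id: Long, buckets: Int): Int =
    (math.abs(id * MixingPrime) % buckets).toInt

  // 1D: the partition depends on the source id only,
  // so all out-edges of a vertex are colocated.
  def edgePartition1D(src: Long, dst: Long, numParts: Int): Int =
    mix(src, numParts)

  // 2D: view the partitions as a side x side grid;
  // the source picks the column and the destination picks the row.
  def edgePartition2D(src: Long, dst: Long, numParts: Int): Int = {
    val side = math.ceil(math.sqrt(numParts.toDouble)).toInt
    val col = mix(src, side)
    val row = mix(dst, side)
    (col * side + row) % numParts
  }
}
```

With the 1D scheme, a vertex with many in-edges can still be replicated to every partition. With the 2D scheme, the edges of a vertex are confined to one column and one row of the grid, so the number of ghost copies per vertex stays at roughly 2 * sqrt(numParts). In GraphX itself the strategy is selected with a call such as graph.partitionBy(PartitionStrategy.EdgePartition2D).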
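2.1 Storage model
2.1.1 Graph storage models
Very large graphs are stored in one of two ways: edge-cut or vertex-cut. In 2013, GraphLab 2.0 switched its storage from edge-cut to vertex-cut and achieved a major performance improvement. Vertex-cut is now widely accepted and used in the industry.
- Edge-cut: every vertex is stored once, but some edges are cut and span two machines. The advantage is that it saves storage space. The disadvantage is that, during edge-based computation, an edge whose two vertices sit on different machines requires data to be sent across machines, which produces heavy internal network traffic.
- Vertex-cut: every edge is stored once and appears on only one machine. Vertices with many neighbors are replicated to several machines, which increases storage overhead and introduces a data synchronization problem. The advantage is that it can greatly reduce internal network traffic.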

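Both approaches have pros and cons, but vertex-cut now has the upper hand, and the various distributed graph computation frameworks have moved their underlying storage to vertex-cut. There are two main reasons.
1. Disk prices have fallen, so storage space is no longer a problem, whereas internal network resources have seen no breakthrough. In cluster computing, internal network bandwidth is precious, and time is more valuable than disk. This is similar to the common strategy of trading space for time.
2. In current application scenarios, the vast majority of networks are "scale-free networks" that follow a power-law distribution, so the number of neighbors differs enormously between vertices. Edge-cut would place most of the edges attached to high-degree vertices across different machines, and such a data distribution strains internal network bandwidth even further, so edge-cut storage has gradually been abandoned.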
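2.1.2 GraphX storage model
Following PowerGraph, GraphX stores graphs using vertex-cut and keeps the graph data in three RDDs:
- VertexTable(id, data): id is the vertex id and data is the vertex data
- EdgeTable(pid, src, dst, data): pid is the partition id, src is the source vertex id, dst is the destination vertex id, and data is the edge data
- RoutingTable(id, pid): id is the vertex id and pid is the partition id
The vertex-cut storage implementation is shown in the figure below: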
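The relationship between the edge table and the routing table can be sketched in plain Scala (no Spark required). This is only an illustration of the idea: the object VertexCutTables, the case class EdgeRow, and the partitioner partitionOf (which simply takes the source id modulo the number of partitions and assumes non-negative ids) are hypothetical and are not GraphX internals.

```scala
object VertexCutTables {
  // One row of EdgeTable(pid, src, dst, data).
  final case class EdgeRow(pid: Int, src: Long, dst: Long, data: Int)

  // Hypothetical partitioner: choose the partition from the source id.
  def partitionOf(src: Long, numParts: Int): Int = (src % numParts).toInt

  // Vertex-cut: each edge is stored exactly once, in the partition chosen for it.
  def edgeTable(edges: Seq[(Long, Long, Int)], numParts: Int): Seq[EdgeRow] =
    edges.map { case (src, dst, data) =>
      EdgeRow(partitionOf(src, numParts), src, dst, data)
    }

  // RoutingTable(id, pid): every partition in which a vertex occurs as an edge endpoint.
  // A vertex that maps to more than one partition needs a copy in each of them.
  def routingTable(rows: Seq[EdgeRow]): Map[Long, Set[Int]] =
    rows
      .flatMap(r => Seq(r.src -> r.pid, r.dst -> r.pid))
      .groupBy(_._1)
      .map { case (id, pairs) => id -> pairs.map(_._2).toSet }

  // The 8 edges of a small 6-vertex sample graph.
  val sampleEdges: Seq[(Long, Long, Int)] = Seq(
    (2L, 1L, 7), (2L, 4L, 2), (3L, 2L, 4), (3L, 6L, 3),
    (4L, 1L, 1), (5L, 2L, 2), (5L, 3L, 8), (5L, 6L, 3))
}
```

With two partitions, vertex 2 appears in both partitions (its out-edges land in partition 0, its in-edges in partition 1), so its data must be replicated, while vertex 5 appears only in partition 1.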

2.2 Computation model
2.2.1 Graph computation models
There are already many graph-based parallel computation frameworks, such as Pregel from Google, the open-source Apache frameworks Giraph and HAMA, and the best known, GraphLab. Pregel, HAMA, and Giraph are very similar to one another, and all are based on the BSP (Bulk Synchronous Parallel) model.
Bulk Synchronous Parallel divides a computation into a series of iterations called supersteps. Viewed vertically it is a serial model, and viewed horizontally it is a parallel model. A barrier, that is, a global synchronization point, is placed between every two supersteps, so the next superstep starts only after all parallel computations have finished.

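Each superstep consists of three parts:
1. Computation (compute): each processor performs local computation using the messages passed from the previous superstep and its local data.
2. Message passing: after finishing its computation, each processor sends messages to the other processors associated with it.
3. Global synchronization point: used for overall synchronization. Once all computation and message passing have completed, the next superstep begins.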
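The three parts of a superstep can be sketched as a small single-process simulation in plain Scala. The example propagates the maximum value through a graph. The names BspSketch, propagateMax, and send are hypothetical, and the sketch assumes that every neighbor id also appears as a key of the initial values.

```scala
object BspSketch {
  // Runs supersteps until no messages are in flight and returns the final vertex values.
  def propagateMax(initial: Map[Int, Int], neighbors: Map[Int, Seq[Int]]): Map[Int, Int] = {
    var values = initial
    // Superstep 0: every vertex sends its value to its neighbors.
    var inbox: Map[Int, Seq[Int]] = send(values.keySet, values, neighbors)
    while (inbox.nonEmpty) {
      // 1. Compute: each vertex combines the received messages with its local value.
      val updated = inbox.collect {
        case (v, msgs) if msgs.max > values(v) => v -> msgs.max
      }
      values = values ++ updated
      // 2. Message passing: only vertices whose value changed send messages.
      // 3. Barrier: the new inbox replaces the old one only after every vertex
      //    has finished, so messages are never seen before the next superstep.
      inbox = send(updated.keySet, values, neighbors)
    }
    values
  }

  // Collect the messages sent by the active vertices, grouped by receiver.
  private def send(
      active: Set[Int],
      values: Map[Int, Int],
      neighbors: Map[Int, Seq[Int]]): Map[Int, Seq[Int]] =
    active.toSeq
      .flatMap(v => neighbors.getOrElse(v, Seq.empty).map(n => n -> values(v)))
      .groupBy(_._1)
      .map { case (n, msgs) => n -> msgs.map(_._2) }
}
```

On the chain 1 - 2 - 3 with initial values 5, 1, and 2, the value 5 reaches vertex 2 in one superstep and vertex 3 in the next, after which no vertex changes and the loop stops.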
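2.2.2 GraphX computation model
Like Spark, the Graph class in GraphX provides a rich set of graph operators. The rough structure is shown in the figure below. The official GraphX Programming Guide documents each function in detail, so this article covers only a few methods that need special attention.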

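2.2.2.1 Caching graphs
Every graph is made up of three RDDs, so it takes up more memory. Using cache, unpersist, and checkpoint on graphs therefore needs extra care. To maximize the reuse of edges, the default GraphX interface provides only the unpersistVertices method. To release the edges, you have to call g.edges.unpersist() yourself. This is somewhat inconvenient for users, but it leaves GraphX room for optimization. Following the Pregel code in GraphX, the current best practice for a large graph is: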

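Roughly, because a Graph in GraphX is immutable, after an operation on g is assigned back to g, g is no longer the original g, and since it will be used in the next iteration it must be cached. In addition, a reference to the original graph must first be kept in prevG, and the old graph must be released completely soon after the new graph has been produced. Otherwise, after a dozen or so iterations there will be a memory leak that quickly exhausts the job's cache space.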
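A minimal sketch of this caching pattern is given below. It needs Spark GraphX on the classpath and is an approximation of the idea rather than the Pregel source itself: the object IterativeCaching, the method iterate, and the caller-supplied function step are hypothetical names, while cache, unpersistVertices, and edges.unpersist are the GraphX calls mentioned in the text.

```scala
import org.apache.spark.graphx.Graph

object IterativeCaching {
  // `step` is a hypothetical per-iteration transformation supplied by the caller.
  def iterate[VD, ED](initial: Graph[VD, ED], numIter: Int)(
      step: Graph[VD, ED] => Graph[VD, ED]): Graph[VD, ED] = {
    var g: Graph[VD, ED] = initial
    var prevG: Graph[VD, ED] = null
    var i = 0
    while (i < numIter) {
      // Keep a reference to the old graph so that it can be released later.
      prevG = g
      // The result is a new graph; cache it because the next iteration reads it.
      g = step(g).cache()
      // Force the new graph to be computed while the old one is still cached.
      g.vertices.count()
      g.edges.count()
      // Release the old graph, but never the graph that belongs to the caller.
      if (prevG ne initial) {
        prevG.unpersistVertices(blocking = false)
        prevG.edges.unpersist(blocking = false)
      }
      i += 1
    }
    g
  }
}
```

The two count() calls are one simple way to materialize the new graph before the old one is dropped. Whether releasing the old edges is safe depends on step producing a graph that does not share its edge RDD with the previous one, which is an assumption of this sketch.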
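2.2.2.2 Neighbor aggregation
mrTriplets (mapReduceTriplets) is the most central interface in GraphX. Pregel is built on it as well, so optimizing it affects the performance of GraphX as a whole to a large degree. The simplified definition of the mrTriplets operator is: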

It computes as follows. The map function is applied to every triplet and produces one or more messages, each addressed to either or both of the two vertices of the triplet. The reduce function is applied at every vertex and merges the messages sent to that vertex.
mrTriplets returns a VertexRDD[A] containing the aggregated message (of type A) for each vertex. Vertices that received no message are not included in the returned VertexRDD.
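The map and reduce phases described above can be mimicked on ordinary Scala collections. The sketch below is not the GraphX implementation: the object MrTripletsSketch, its Triplet case class, and the in-memory Map and Seq that stand in for VertexRDD and EdgeRDD are assumptions made for illustration. The sample data is the 6-person graph used in the example section of this article.

```scala
object MrTripletsSketch {
  // An edge together with the attributes of its two endpoint vertices.
  final case class Triplet[VD, ED](srcId: Long, srcAttr: VD, dstId: Long, dstAttr: VD, attr: ED)

  def mapReduceTriplets[VD, ED, A](
      vertices: Map[Long, VD],
      edges: Seq[(Long, Long, ED)])(
      map: Triplet[VD, ED] => Iterator[(Long, A)],
      reduce: (A, A) => A): Map[Long, A] =
    edges
      // map phase: build one triplet per edge and let it emit messages
      .flatMap { case (src, dst, attr) =>
        map(Triplet(src, vertices(src), dst, vertices(dst), attr))
      }
      // reduce phase: merge all messages addressed to the same vertex
      .groupBy(_._1)
      .map { case (id, msgs) => id -> msgs.map(_._2).reduce(reduce) }

  val people: Map[Long, (String, Int)] = Map(
    1L -> ("Alice", 28), 2L -> ("Bob", 27), 3L -> ("Charlie", 65),
    4L -> ("David", 42), 5L -> ("Ed", 55), 6L -> ("Fran", 50))

  val follows: Seq[(Long, Long, Int)] = Seq(
    (2L, 1L, 7), (2L, 4L, 2), (3L, 2L, 4), (3L, 6L, 3),
    (4L, 1L, 1), (5L, 2L, 2), (5L, 3L, 8), (5L, 6L, 3))

  // Oldest follower: send (name, age) of the source to the destination, keep the oldest.
  def oldestFollower: Map[Long, (String, Int)] =
    mapReduceTriplets[(String, Int), Int, (String, Int)](people, follows)(
      t => Iterator(t.dstId -> t.srcAttr),
      (a, b) => if (a._2 > b._2) a else b)
}
```

As in the description above, a vertex that receives no message is simply absent from the result: vertex 5 (Ed) has no incoming edge, so oldestFollower contains no entry for it.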
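In recent versions GraphX has optimized it in several ways, with a major effect on the performance of Pregel and of all the higher-level algorithm libraries. The main optimizations are as follows.
1. Caching for Iterative mrTriplets & Incremental Updates for Iterative mrTriplets: in many graph analysis algorithms, different vertices converge at very different rates, and late in the iteration only a few vertices are still being updated. For vertices that were not updated, the EdgeRDD does not need to refresh its local cache of their values in the next mrTriplets computation, which greatly reduces communication overhead.
2. Indexing Active Edges: a vertex that was not updated does not need to resend messages to its neighbors in the next iteration. So when mrTriplets traverses the edges, it skips any edge whose neighboring vertex values were not updated in the previous iteration, avoiding a large amount of useless computation and communication.
3. Join Elimination: a triplet is a 3-tuple made up of an edge and its two adjacent vertices, and the map function that operates on a triplet often needs only one of the two vertex values. In PageRank, for example, the update of a vertex value depends only on the value of the source vertex and not on the value of the destination vertex it points to. In that case the mrTriplets computation does not need a 3-way join of VertexRDD and EdgeRDD; a 2-way join is enough.
All these optimizations bring GraphX's performance steadily closer to GraphLab's. A gap remains, but the integrated pipeline and the rich programming interface can make up for a small difference in performance.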
2.2.2.3 The evolved Pregel model
The Pregel interface in GraphX does not strictly follow the Pregel model. It is a Pregel model improved with ideas from GAS. It is defined as follows:

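The biggest difference between this mrTriplets-based Pregel model and standard Pregel is that its second parameter list takes three function arguments and does not take a messageList. It does not iterate over messages at a single vertex. Instead, the messages received by a vertex's ghost copies are aggregated and sent to the master copy, and the vprog function is then used to update the vertex value. Receiving and sending messages are both parallelized automatically, so there is no need to worry about super nodes.
A common code template looks like this: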

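This makes the intent behind the design clear. It combines the strengths of Pregel and GAS: the interface is relatively simple, performance is preserved, it works with vertex-cut graph storage, and it can handle large-scale computation on natural graphs that follow a power-law distribution. Also note that the official Pregel implementation is the simplest possible version. For complex business scenarios, it is common to extend it into a customized Pregel.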
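The control flow of this Pregel variant can be sketched on plain Scala collections. The following is a simplified single-process model, not the GraphX source: the object PregelSketch and its Triplet class are hypothetical, an in-memory Map and Seq stand in for the distributed graph, and there is no iteration limit. The three function parameters play the roles that GraphX calls vprog, sendMsg, and mergeMsg.

```scala
object PregelSketch {
  final case class Triplet[VD, ED](srcId: Long, srcAttr: VD, dstId: Long, dstAttr: VD, attr: ED)

  def pregel[VD, ED, A](
      vertices: Map[Long, VD],
      edges: Seq[(Long, Long, ED)],
      initialMsg: A)(
      vprog: (Long, VD, A) => VD,
      sendMsg: Triplet[VD, ED] => Iterator[(Long, A)],
      mergeMsg: (A, A) => A): Map[Long, VD] = {
    // Superstep 0: every vertex receives the initial message.
    var state: Map[Long, VD] =
      vertices.map { case (id, attr) => id -> vprog(id, attr, initialMsg) }
    var active: Set[Long] = state.keySet
    while (active.nonEmpty) {
      // Only edges touching a vertex that just received a message may send.
      // Messages for the same destination are merged before the vertex sees them.
      val inbox: Map[Long, A] = edges
        .filter { case (src, dst, _) => active(src) || active(dst) }
        .flatMap { case (src, dst, attr) =>
          sendMsg(Triplet(src, state(src), dst, state(dst), attr))
        }
        .groupBy(_._1)
        .map { case (id, msgs) => id -> msgs.map(_._2).reduce(mergeMsg) }
      // Apply the vertex program once per vertex, with the single merged message.
      state = state ++ inbox.map { case (id, msg) => id -> vprog(id, state(id), msg) }
      active = inbox.keySet
    }
    state
  }

  val edges: Seq[(Long, Long, Int)] = Seq(
    (2L, 1L, 7), (2L, 4L, 2), (3L, 2L, 4), (3L, 6L, 3),
    (4L, 1L, 1), (5L, 2L, 2), (5L, 3L, 8), (5L, 6L, 3))

  // Single-source shortest paths on the 6-vertex sample graph.
  def shortestPathsFrom(source: Long): Map[Long, Double] = {
    val init: Map[Long, Double] =
      (1L to 6L).map(id => id -> (if (id == source) 0.0 else Double.PositiveInfinity)).toMap
    pregel[Double, Int, Double](init, edges, Double.PositiveInfinity)(
      (_, dist, msg) => math.min(dist, msg),
      t =>
        if (t.srcAttr + t.attr < t.dstAttr) Iterator(t.dstId -> (t.srcAttr + t.attr))
        else Iterator.empty,
      (a, b) => math.min(a, b))
  }
}
```

Calling PregelSketch.shortestPathsFrom(5L) runs the same shortest-path logic as the Pregel call in the example program of section 3.1.2, and on the same edges it should produce the same distances that are listed in the results of section 3.1.3. Note that the vertex program never sees a list of messages, only the single merged value.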
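2.2.2.4 Graph algorithm library
GraphX also provides a library of graph algorithms for analyzing graphs. The latest version already supports six classic graph algorithms, including PageRank, triangle counting, connected components, and shortest paths. The implementations of these algorithms aim at generality. For best performance, you can use them as a reference and modify or extend them to meet your business needs. Studying this code is also a good way to understand best practices in GraphX programming.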
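3. GraphX examples
3.1 Example graph
3.1.1 Introduction to the example
The figure below contains 6 people, each with a name and an age. Their social relationships form 8 edges, and each edge has an attribute. The example builds the vertices, the edges, and the graph, and then demonstrates the graph's properties, transformation operations, structural operations, join operations, and aggregation operations, along with some practical tasks.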

3.1.2 Program code
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object GraphXExample {
  def main(args: Array[String]): Unit = {
    // Suppress log output
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // Set up the runtime environment
    val conf = new SparkConf().setAppName("SimpleGraphX").setMaster("local")
    val sc = new SparkContext(conf)

    // Define the vertices and edges; both are plain Arrays
    // Vertex data type VD: (String, Int)
    val vertexArray = Array(
      (1L, ("Alice", 28)),
      (2L, ("Bob", 27)),
      (3L, ("Charlie", 65)),
      (4L, ("David", 42)),
      (5L, ("Ed", 55)),
      (6L, ("Fran", 50))
    )
    // Edge data type ED: Int
    val edgeArray = Array(
      Edge(2L, 1L, 7),
      Edge(2L, 4L, 2),
      Edge(3L, 2L, 4),
      Edge(3L, 6L, 3),
      Edge(4L, 1L, 1),
      Edge(5L, 2L, 2),
      Edge(5L, 3L, 8),
      Edge(5L, 6L, 3)
    )

    // Build vertexRDD and edgeRDD
    val vertexRDD: RDD[(Long, (String, Int))] = sc.parallelize(vertexArray)
    val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)

    // Build the graph Graph[VD, ED]
    val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)
    println()
    println("Property demo")
    println()
    println("Vertices in the graph with age greater than 30:")
    graph.vertices.filter { case (id, (name, age)) => age > 30 }.collect().foreach {
      case (id, (name, age)) => println(s"$name is $age")
    }

    // Edge operation: find the edges whose attribute is greater than 5
    println("Edges in the graph with attribute greater than 5:")
    graph.edges.filter(e => e.attr > 5).collect().foreach(e =>
      println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
    println()

    // Triplets operation: ((srcId, srcAttr), (dstId, dstAttr), attr)
    println("Triplets with edge attribute > 5:")
    for (triplet <- graph.triplets.filter(t => t.attr > 5).collect()) {
      println(s"${triplet.srcAttr._1} likes ${triplet.dstAttr._1}")
    }
    println()
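    // Degrees operation
    println("Maximum out-degree, in-degree, and degree in the graph:")
    // Keep whichever (vertexId, degree) pair has the larger degree
    def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
      if (a._2 > b._2) a else b
    }
    println("max of outDegrees:" + graph.outDegrees.reduce(max) +
      " max of inDegrees:" + graph.inDegrees.reduce(max) +
      " max of Degrees:" + graph.degrees.reduce(max))
    println()
    println()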
    println("Transformation operations")
    println()
    println("Vertex transformation, vertex age + 10:")
    // The new vertex attribute is (id, (name, age + 10))
    graph.mapVertices { case (id, (name, age)) => (id, (name, age + 10)) }
      .vertices.collect().foreach(v => println(s"${v._2._1} is ${v._2._2}"))
    println()
    println("Edge transformation, edge attribute * 2:")
    graph.mapEdges(e => e.attr * 2).edges.collect().foreach(e =>
      println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
    println()
    println()
    println("Structural operations")
    println()
    println("Subgraph of vertices with age > 30:")
    // Note: the predicate is >= 30, so a vertex aged exactly 30 would also be kept
    val subGraph = graph.subgraph(vpred = (id, vd) => vd._2 >= 30)
    println("All vertices of the subgraph:")
    subGraph.vertices.collect().foreach(v => println(s"${v._2._1} is ${v._2._2}"))
    println()
    println("All edges of the subgraph:")
    subGraph.edges.collect().foreach(e =>
      println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
    println()
    println()
    println("Join operations")
    println()
    val inDegrees: VertexRDD[Int] = graph.inDegrees
    case class User(name: String, age: Int, inDeg: Int, outDeg: Int)

    // Create a new graph whose vertex type VD is User, converted from graph
    val initialUserGraph: Graph[User, Int] = graph.mapVertices {
      case (id, (name, age)) => User(name, age, 0, 0)
    }

    // Join initialUserGraph with inDegrees and outDegrees (RDDs)
    // and fill in the inDeg and outDeg values of initialUserGraph
    val userGraph = initialUserGraph.outerJoinVertices(initialUserGraph.inDegrees) {
      case (id, u, inDegOpt) => User(u.name, u.age, inDegOpt.getOrElse(0), u.outDeg)
    }.outerJoinVertices(initialUserGraph.outDegrees) {
      case (id, u, outDegOpt) => User(u.name, u.age, u.inDeg, outDegOpt.getOrElse(0))
    }

    println("Properties of the joined graph:")
    userGraph.vertices.collect().foreach(v =>
      println(s"${v._2.name} inDeg: ${v._2.inDeg} outDeg: ${v._2.outDeg}"))
    println()
    println("People whose out-degree equals their in-degree:")
    userGraph.vertices.filter {
      case (id, u) => u.inDeg == u.outDeg
    }.collect().foreach {
      case (id, property) => println(property.name)
    }
    println()
    println()
    println("Aggregation operations")
    println()
    println("Oldest follower of each person:")
    // mapReduceTriplets is the Spark 1.x API used in this article;
    // it was deprecated in Spark 1.2 in favor of aggregateMessages
    val oldestFollower: VertexRDD[(String, Int)] = userGraph.mapReduceTriplets[(String, Int)](
      // map: send the source vertex's attributes to the destination vertex
      edge => Iterator((edge.dstId, (edge.srcAttr.name, edge.srcAttr.age))),
      // reduce: keep the oldest follower
      (a, b) => if (a._2 > b._2) a else b
    )
    userGraph.vertices.leftJoin(oldestFollower) { (id, user, optOldestFollower) =>
      optOldestFollower match {
        case None => s"${user.name} does not have any followers."
        case Some((name, age)) => s"${name} is the oldest follower of ${user.name}."
      }
    }.collect().foreach { case (id, str) => println(str) }
    println()
    println()
    println("Practical operations")
    println()
    println("Shortest distance from vertex 5 to every vertex:")
    val sourceId: VertexId = 5L // the source vertex
    val initialGraph = graph.mapVertices((id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity)
    val sssp = initialGraph.pregel(Double.PositiveInfinity)(
      // vprog: keep the smaller of the current and the received distance
      (id, dist, newDist) => math.min(dist, newDist),
      // sendMsg: offer the destination a shorter distance through this edge
      triplet => {
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        } else {
          Iterator.empty
        }
      },
      // mergeMsg: keep the shortest distance
      (a, b) => math.min(a, b)
    )
    println(sssp.vertices.collect().mkString("\n"))
    sc.stop()
  }
}
3.1.3 Results
In IDEA (for how to use IDEA, see lesson 3, "3. Spark Programming Model (Part 2): IDEA Setup and Practice"), first compile GraphXExample.scala. Once it compiles, run it. The output is as follows:
Vertices in the graph with age greater than 30:
David is 42
Fran is 50
Charlie is 65
Ed is 55
Edges in the graph with attribute greater than 5:
2 to 1 att 7
5 to 3 att 8
Triplets with edge attribute > 5:
Bob likes Alice
Ed likes Charlie
Maximum out-degree, in-degree, and degree in the graph:
max of outDegrees:(5,3) max of inDegrees:(2,2) max of Degrees:(2,4)
Transformation operations
Vertex transformation, vertex age + 10:
4 is (David,52)
1 is (Alice,38)
6 is (Fran,60)
3 is (Charlie,75)
5 is (Ed,65)
2 is (Bob,37)
Edge transformation, edge attribute * 2:
2 to 1 att 14
2 to 4 att 4
3 to 2 att 8
3 to 6 att 6
4 to 1 att 2
5 to 2 att 4
5 to 3 att 16
5 to 6 att 6
Structural operations
Subgraph of vertices with age > 30:
All vertices of the subgraph:
David is 42
Fran is 50
Charlie is 65
Ed is 55
All edges of the subgraph:
3 to 6 att 3
5 to 3 att 8
5 to 6 att 3
Join operations
Properties of the joined graph:
David inDeg: 1 outDeg: 1
Alice inDeg: 2 outDeg: 0
Fran inDeg: 2 outDeg: 0
Charlie inDeg: 1 outDeg: 2
Ed inDeg: 0 outDeg: 3
Bob inDeg: 2 outDeg: 2
People whose out-degree equals their in-degree:
David
Bob
Aggregation operations
Oldest follower of each person:
Bob is the oldest follower of David.
David is the oldest follower of Alice.
Charlie is the oldest follower of Fran.
Ed is the oldest follower of Charlie.
Ed does not have any followers.
Charlie is the oldest follower of Bob.
Practical operations
Shortest distance from vertex 5 to every vertex:
(4,4.0)
(1,5.0)
(6,3.0)
(3,8.0)
(5,0.0)
(2,2.0)

3.2 PageRank demo
3.2.1 Introduction to the example
PageRank, also known as web page ranking, Google left-side ranking, or Page ranking, is a link analysis algorithm proposed by Google founders Larry Page and Sergey Brin in 1997 while building an early prototype of their search system. Many important link analysis algorithms today are derived from PageRank. PageRank is a method Google uses to indicate the rank or importance of a web page, and a measure Google uses to judge the quality of a website. After combining all other factors, such as the Title tag and the Keywords tag, Google uses PageRank to adjust the results so that pages with higher rank or importance are placed higher in the search results, which improves the relevance and quality of the results.
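The core iteration of PageRank can be sketched in a few lines of plain Scala. This sketch uses the unnormalized form rank(v) = resetProb + (1 - resetProb) * sum(rank(u) / outDegree(u)) over all edges u -> v, with a reset probability of 0.15, which is, as far as I know, also the form GraphX uses. The object PageRankSketch and the fixed iteration count are assumptions made for illustration. The example program in this article instead calls graph.pageRank with a convergence tolerance.

```scala
object PageRankSketch {
  // Fixed-iteration, unnormalized PageRank over an in-memory edge list.
  def pageRank(
      edges: Seq[(Long, Long)],
      numIter: Int,
      resetProb: Double = 0.15): Map[Long, Double] = {
    val vertices: Seq[Long] = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val outDegree: Map[Long, Int] = edges.groupBy(_._1).map { case (src, es) => src -> es.size }
    // Every vertex starts with rank 1.0.
    var ranks: Map[Long, Double] = vertices.map(v => v -> 1.0).toMap
    for (_ <- 1 to numIter) {
      // Each source splits its current rank evenly among its out-edges.
      val contribs: Map[Long, Double] = edges
        .map { case (src, dst) => dst -> (ranks(src) / outDegree(src)) }
        .groupBy(_._1)
        .map { case (dst, cs) => dst -> cs.map(_._2).sum }
      // A vertex without in-edges keeps only the reset probability.
      ranks = vertices.map { v =>
        v -> (resetProb + (1 - resetProb) * contribs.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

Because the ranks are not normalized to sum to 1, heavily linked pages can end up with values far above 1, which is consistent with the magnitudes in the results of section 3.2.4.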

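3.2.2 Test data
The test data consists of the vertex file graphx-wiki-vertices.txt and the edge file graphx-wiki-edges.txt. Both files can be found in the /data/class9/ directory of the resources that accompany this series. Their formats are:
- Vertex data: a vertex id and the page title

- Edge data: a pair of vertex ids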

3.2.3 Program code
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object PageRank {
  def main(args: Array[String]): Unit = {
    // Suppress log output
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // Set up the runtime environment
    val conf = new SparkConf().setAppName("PageRank").setMaster("local")
    val sc = new SparkContext(conf)

    // Read the data files
    val articles: RDD[String] = sc.textFile("/home/hadoop/IdeaProjects/data/graphx/graphx-wiki-vertices.txt")
    val links: RDD[String] = sc.textFile("/home/hadoop/IdeaProjects/data/graphx/graphx-wiki-edges.txt")

    // Load the vertices and edges (tab-separated fields)
    val vertices = articles.map { line =>
      val fields = line.split('\t')
      (fields(0).toLong, fields(1))
    }
    val edges = links.map { line =>
      val fields = line.split('\t')
      Edge(fields(0).toLong, fields(1).toLong, 0)
    }

    // Cache the graph
    // val graph = Graph(vertices, edges, "").persist(StorageLevel.MEMORY_ONLY_SER)
    val graph = Graph(vertices, edges, "").persist()
    // graph.unpersistVertices(false)

    // Test
    println()
    println("Fetch 5 triplets")
    println()
    graph.triplets.take(5).foreach(println(_))

    // pageRank calls cache() internally, so the earlier persist can only use MEMORY_ONLY
    println()
    println("PageRank computation, fetch the most valuable data")
    println()
    val prGraph = graph.pageRank(0.001).cache()
    // Attach each page's rank to its title; pages without a rank get 0.0
    val titleAndPrGraph = graph.outerJoinVertices(prGraph.vertices) {
      (v, title, rank) => (rank.getOrElse(0.0), title)
    }
    // Print the 10 pages with the highest rank
    titleAndPrGraph.vertices.top(10) {
      Ordering.by((entry: (VertexId, (Double, String))) => entry._2._1)
    }.foreach(t => println(t._2._2 + ": " + t._2._1))
    sc.stop()
  }
}
3.2.4 Results
In IDEA, first compile PageRank.scala. Once it compiles, run it. The output is as follows:
Fetch 5 triplets
((146271392968588,Computer Consoles Inc.),(7097126743572404313,Berkeley Software Distribution),0)
((146271392968588,Computer Consoles Inc.),(8830299306937918434,University of California, Berkeley),0)
((625290464179456,List of Penguin Classics),(1735121673437871410,George Berkeley),0)
((1342848262636510,List of college swimming and diving teams),(8830299306937918434,University of California, Berkeley),0)
((1889887370673623,Anthony Pawson),(8830299306937918434,University of California, Berkeley),0)
PageRank computation, fetch the most valuable data
University of California, Berkeley: 1321.111754312097
Berkeley, California: 664.8841977233583
Uc berkeley: 162.50132743397873
Berkeley Software Distribution: 90.4786038848606
Lawrence Berkeley National Laboratory: 81.90404939641944
George Berkeley: 81.85226118457985
Busby Berkeley: 47.871998218019655
Berkeley Hills: 44.76406979519754
Xander Berkeley: 30.324075347288037
Berkeley County, South Carolina: 28.908336483710308