1. Introduction to GraphX
1.1 Application Background of GraphX
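Spark GraphX is a distributed graph processing framework. Built on the Spark platform, it offers a concise, easy-to-use and rich set of interfaces for graph computation and graph mining, which greatly simplifies distributed graph processing.
Social networks contain many chains of relationships between people; Twitter, Facebook, Weibo and WeChat are examples. These are the places where big data is generated, and all of them need graph computation. Graph processing today is essentially distributed rather than single-machine. Because Spark GraphX runs on top of Spark, it is a distributed graph processing system by nature.
Distributed (or parallel) graph processing splits a graph into many subgraphs and computes on each of them separately. The computation can proceed iteratively, stage by stage, so the graph is processed in parallel. A simple example of graph computation is shown below:

As the figure shows, Wikipedia documents can first be turned into a Link Table view, the Link Table view can then be analysed into a Hyperlinks graph, and finally PageRank can be applied to obtain the Top Communities. On the lower path, the step from Editor Graph to Community is called Triangle Computation, an algorithm that counts triangles and uses them to discover communities. This analysis shows that graph computation has many approaches and algorithms, and also that graphs and tables can be converted into each other.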
1.2 The GraphX Framework
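When GraphX was designed, vertex cut and GAS (Gather-Apply-Scatter) were already mature, so the design and code were optimized for them, seeking the best balance between functionality and performance. As in Spark itself, every submodule has one core abstraction. For GraphX it is the Resilient Distributed Property Graph, a directed multigraph with properties on both vertices and edges. It extends the Spark RDD abstraction and offers two views, Table and Graph, over a single physical copy of the data. Each view has its own operators, which gives both flexible manipulation and efficient execution.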
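To make the two-view idea concrete, here is a minimal sketch in plain Scala collections. It assumes a toy in-memory model and does not use GraphX's own classes; the names PropertyGraphSketch, PropertyGraph, Triplet and people are hypothetical. The point is only that the vertex table and the edge table are stored once, while the triplet (graph) view is derived from them.

```scala
// Illustrative model only; these case classes are not GraphX types.
object PropertyGraphSketch {
  type VertexId = Long
  final case class Edge[ED](srcId: VertexId, dstId: VertexId, attr: ED)
  final case class Triplet[VD, ED](srcAttr: VD, dstAttr: VD, edge: Edge[ED])

  // One physical copy of the data: a vertex table and an edge table.
  final case class PropertyGraph[VD, ED](
      vertices: Map[VertexId, VD],
      edges: Vector[Edge[ED]]) {

    // Graph view: each edge joined with the attributes of both endpoints.
    def triplets: Vector[Triplet[VD, ED]] =
      edges.map(e => Triplet(vertices(e.srcId), vertices(e.dstId), e))

    // Table-style operator on the vertex table; the edge table is reused as is.
    def mapVertices[VD2](f: (VertexId, VD) => VD2): PropertyGraph[VD2, ED] =
      PropertyGraph(vertices.map { case (id, vd) => id -> f(id, vd) }, edges)
  }

  val people: PropertyGraph[String, Int] = PropertyGraph(
    Map((1L, "Alice"), (2L, "Bob")),
    // A directed multigraph may hold parallel edges between the same pair.
    Vector(Edge(2L, 1L, 7), Edge(2L, 1L, 3), Edge(1L, 2L, 1)))
}
```

Because mapVertices returns a new graph that shares the old edge vector, the sketch also hints at how an immutable graph can reuse unchanged parts.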

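Like Spark, the GraphX code base is very compact. The core of GraphX is only a little over 3,000 lines, and the Pregel model implemented on top of it takes just over 20 lines. The figure below shows the overall code structure; most of the implementation revolves around optimizing Partition. To some extent this shows that vertex-cut storage and the corresponding computational optimizations really are the focus and the difficulty of a graph computation framework.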
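1.3 Development History
- As early as version 0.5, Spark shipped a small Bagel module that provided Pregel-like functionality. That version was still very primitive, weak in both performance and features, and was an experimental product.
- By version 0.8, in response to the industry's growing demand for distributed graph computation, Spark started a separate branch, Graphx-Branch, as an independent graph computation module and began designing and developing GraphX, drawing on GraphLab.
- In version 0.9 the module was formally merged into the main branch. Although it was an alpha release it was ready to try out, and Bagel left the stage. With version 1.0, GraphX was officially put into production use.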

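Note that GraphX is still developing rapidly. From the 0.8 branch through 0.9 and 1.0, every release brought substantial improvements and refactoring. According to observations, without changing any code logic or the runtime environment, simply upgrading the version, switching interfaces and recompiling yielded a 10% to 20% performance gain per release. Although there is still a gap to GraphLab's performance, Spark's integrated pipeline processing, its highly active community and the fast pace of improvement make GraphX highly competitive.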
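2. Analysis of the GraphX Implementation
As in Spark itself, every submodule has a core abstraction. GraphX's core abstraction is the Resilient Distributed Property Graph, a directed multigraph with properties on both vertices and edges. It extends the Spark RDD abstraction and has a Table view and a Graph view while needing only one physical copy of the data. Each view has its own operators, which yields flexible operations and efficient execution.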

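The underlying design of GraphX has the following key points.
Every operation on the Graph view is ultimately translated into RDD operations on the associated Table view. Computation on a graph is therefore logically equivalent to a series of RDD transformations, and a Graph inherits the three key properties of an RDD: Immutable, Distributed and Fault-Tolerant. The most important of these is immutability. Logically, every transformation or operation on a graph produces a new graph; physically, GraphX reuses unchanged vertices and edges to some degree, transparently to the user.
The physical data shared by the two views consists of two RDDs, RDD[VertexPartition] and RDD[EdgePartition]. Vertices and edges are not actually stored as a table of Collection[tuple]; instead, each VertexPartition/EdgePartition internally stores a data block with an index structure, which speeds up traversal under the different views. The immutable index structures are shared across RDD transformations, reducing computation and storage overhead.

Distributed graph storage uses the vertex-cut model, and the partitionBy method lets the user choose among different partition strategies (PartitionStrategy). A partition strategy assigns edges to the EdgePartitions and vertex masters to the VertexPartitions; each EdgePartition also caches ghost copies of the vertices attached to its local edges. The choice of strategy affects the number of ghost copies that must be cached and how evenly edges are distributed over the EdgePartitions, so the best strategy should be chosen according to the structure of the graph. Four strategies are currently available: EdgePartition2D, EdgePartition1D, RandomVertexCut and CanonicalRandomVertexCut.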
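The differences between the four strategies are easier to see in code. The sketch below is a simplified stand-in written in plain Scala: the hashing is an assumption chosen for illustration and is not GraphX's exact implementation, and the object name PartitionSketch and the helper bucket are hypothetical. Each function maps an edge (src, dst) to a partition id.

```scala
// Simplified stand-ins for partition strategies; the hashing below is an
// assumption for illustration and is not GraphX's exact implementation.
object PartitionSketch {
  type VertexId = Long
  type PartitionId = Int

  // Maps any hash to a bucket in [0, numParts).
  private def bucket(hash: Int, numParts: Int): PartitionId =
    ((hash % numParts) + numParts) % numParts

  // 1D: all out-edges of a source vertex land in the same partition.
  def edgePartition1D(src: VertexId, dst: VertexId, numParts: Int): PartitionId =
    bucket(src.hashCode, numParts)

  // Random vertex cut: hash the ordered (src, dst) pair.
  def randomVertexCut(src: VertexId, dst: VertexId, numParts: Int): PartitionId =
    bucket((src, dst).hashCode, numParts)

  // Canonical variant: both directions of an edge share a partition.
  def canonicalRandomVertexCut(src: VertexId, dst: VertexId, numParts: Int): PartitionId =
    if (src < dst) bucket((src, dst).hashCode, numParts)
    else bucket((dst, src).hashCode, numParts)

  // 2D: place the edge in a side x side grid, column from src and row from
  // dst, so the edges of one source vertex touch at most `side` partitions.
  def edgePartition2D(src: VertexId, dst: VertexId, numParts: Int): PartitionId = {
    val side = math.ceil(math.sqrt(numParts.toDouble)).toInt
    val col = bucket(src.hashCode, side)
    val row = bucket(dst.hashCode, side)
    (col * side + row) % numParts
  }
}
```

With 9 partitions the 2D grid has side 3, so all edges leaving one source vertex fall into at most 3 partitions, which is what keeps the number of ghost copies bounded.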
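2.1 Storage Model
2.1.1 Graph Storage Models
Very large graphs are stored in one of two ways overall: edge cut or vertex cut. In 2013, GraphLab 2.0 switched its storage from edge cut to vertex cut and gained a major performance improvement; the approach is now widely accepted and used in industry.
- Edge cut: every vertex is stored exactly once, but some edges are cut and span two machines. The advantage is lower storage use. The drawback is that edge-based computation on an edge whose two endpoints are on different machines requires cross-machine data transfer, so internal network traffic is heavy.
- Vertex cut: every edge is stored exactly once and appears on only one machine. Vertices with many neighbours are replicated to several machines, which increases storage overhead and introduces data synchronization problems. The advantage is a large reduction in internal network traffic.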

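Both approaches have pros and cons, but vertex cut currently has the upper hand, and the various distributed graph computation frameworks have moved their underlying storage to it. There are two main reasons.
1. Disk prices have fallen, so storage space is no longer a problem, whereas internal network resources have seen no breakthrough. Network bandwidth is precious in cluster computing, and time is worth more than disk. This is the familiar strategy of trading space for time.
2. In current applications the vast majority of networks are "scale-free networks" that follow a power-law distribution, so the number of neighbours differs enormously between vertices. Edge cut spreads most of the edges of high-degree vertices across different machines, which strains internal network bandwidth even further, so edge-cut storage has gradually been abandoned.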
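The trade-off for a high-degree vertex can be shown with a toy calculation. The sketch below assumes a single hub vertex with 100 neighbours and 4 machines, and places vertices or edges by id modulo the machine count; these choices and the names CutComparison, cutEdges and hubReplicas are hypothetical and serve only to illustrate the argument above.

```scala
// Toy comparison on a hub vertex with many neighbours; the placement rule
// (id modulo machine count) is an assumption chosen for simplicity.
object CutComparison {
  type VertexId = Long
  val machines = 4
  // Hub 0 points at 100 leaf vertices, mimicking a high-degree vertex.
  val edges: Seq[(VertexId, VertexId)] = (1L to 100L).map(leaf => (0L, leaf))

  private def machineOf(id: VertexId): Int = (id % machines).toInt

  // Edge cut: vertices are placed by id; an edge is cut when its endpoints
  // live on different machines, and each cut edge costs network traffic.
  val cutEdges: Int = edges.count { case (s, d) => machineOf(s) != machineOf(d) }

  // Vertex cut: each edge is placed once (here by destination id); a vertex
  // is replicated on every machine that holds one of its edges.
  val hubReplicas: Int = edges.map { case (_, d) => machineOf(d) }.distinct.size
}
```

In this setup 75 of the 100 edges cross machines under edge cut, while under vertex cut the hub only has to be kept in sync across 4 replicas.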
2.1.2 GraphX Storage Model
Following PowerGraph, GraphX stores graphs using vertex cut and keeps the graph data in three RDDs:
- VertexTable(id, data): id is the vertex id and data is the vertex data
- EdgeTable(pid, src, dst, data): pid is the partition id, src is the source vertex id, dst is the destination vertex id and data is the edge data
- RoutingTable(id, pid): id is the vertex id and pid is the partition id
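The relationship between the three tables can be sketched with ordinary Scala collections. The table names follow the list above, but the model itself (the object StorageTablesSketch, the case class EdgeRow and the sample rows) is hypothetical and is not how GraphX represents its partitions internally.

```scala
// Plain-collection model of the three tables; names follow the text above,
// not GraphX's internal classes.
object StorageTablesSketch {
  type VertexId = Long
  type PartitionId = Int
  final case class EdgeRow(pid: PartitionId, src: VertexId, dst: VertexId, data: Int)

  // VertexTable(id, data)
  val vertexTable: Map[VertexId, String] =
    Map((1L, "A"), (2L, "B"), (3L, "C"))

  // EdgeTable(pid, src, dst, data)
  val edgeTable: Seq[EdgeRow] = Seq(
    EdgeRow(0, 1L, 2L, 10),
    EdgeRow(0, 1L, 3L, 20),
    EdgeRow(1, 2L, 3L, 30))

  // RoutingTable(id, pid): for every vertex, the edge partitions that
  // reference it and therefore need a copy of its data.
  val routingTable: Map[VertexId, Set[PartitionId]] =
    edgeTable
      .flatMap(e => Seq(e.src -> e.pid, e.dst -> e.pid))
      .groupBy(_._1)
      .map { case (id, pairs) => id -> pairs.map(_._2).toSet }
}
```

Here vertex 1 is needed only by partition 0, while vertices 2 and 3 are needed by both partitions, so their data would have to be shipped to two places.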
The figure below shows how vertex-cut storage is implemented:

2.2 Computation Model
2.2.1 Graph Computation Models
Many graph-parallel computation frameworks exist today, such as Pregel from Google, the open-source Apache frameworks Giraph and HAMA, and the best known, GraphLab. Pregel, HAMA and Giraph are very similar: all are based on the BSP (Bulk Synchronous Parallel) model.
Bulk Synchronous Parallel divides a computation into a series of iterations called supersteps. Viewed vertically it is a sequential model; viewed horizontally it is a parallel one. A barrier, i.e. a global synchronization point, is placed between every two supersteps, and the next superstep starts only after all parallel computations have finished.

Each superstep consists of three parts:
1. Computation (compute): each processor performs local computation using the messages received from the previous superstep and its local data.
2. Message passing: after finishing its computation, each processor sends messages to the other processors associated with it.
3. Global synchronization point (barrier): once all computation and message passing have completed, the system proceeds to the next superstep.
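The three parts of a superstep can be simulated in a single process. The sketch below is a toy under stated assumptions: each "processor" is one vertex of a 4-vertex ring, the task is to spread the maximum value to everyone, and the names BspSketch, propagateMax and send are hypothetical. It is meant to show the compute / message-passing / barrier rhythm, not a real distributed runtime.

```scala
// Single-process simulation of BSP supersteps; each "processor" is one
// vertex of a small undirected ring, an assumption made for brevity.
object BspSketch {
  type Id = Int
  val neighbours: Map[Id, Seq[Id]] =
    Map(0 -> Seq(1, 3), 1 -> Seq(0, 2), 2 -> Seq(1, 3), 3 -> Seq(2, 0))

  // Returns the final values and the number of supersteps executed.
  def propagateMax(initial: Map[Id, Int]): (Map[Id, Int], Int) = {
    var values = initial
    // Input to the first superstep: every vertex announces its own value.
    var inbox: Map[Id, Seq[Int]] = send(values.keys.toSeq, values)
    var supersteps = 0
    while (inbox.nonEmpty) {
      // 1. Compute: combine local data with last superstep's messages.
      val updated: Map[Id, Int] = inbox.collect {
        case (id, msgs) if msgs.max > values(id) => id -> msgs.max
      }
      values = values ++ updated
      // 2. Message passing: only changed vertices notify their neighbours.
      val outbox = send(updated.keys.toSeq, values)
      // 3. Barrier: the next superstep sees messages only after all work is done.
      inbox = outbox
      supersteps += 1
    }
    (values, supersteps)
  }

  private def send(senders: Seq[Id], values: Map[Id, Int]): Map[Id, Seq[Int]] =
    senders
      .flatMap(s => neighbours(s).map(n => n -> values(s)))
      .groupBy(_._1)
      .map { case (id, msgs) => id -> msgs.map(_._2) }
}
```

The loop ends when a superstep produces no messages, which is the usual halting condition in this style of computation.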
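2.2.2 GraphX Computation Model
Like Spark, GraphX's Graph class provides a rich set of graph operators; the figure below shows the rough structure. The official GraphX Programming Guide documents every function in detail, so this article covers only a few methods that need special attention.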

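2.2.2.1 Graph Caching
Every graph is made up of 3 RDDs, so it takes more memory, and cache, unpersist and checkpoint need to be used with more care. To maximize reuse of edges, GraphX's default interface only provides the unpersistVertices method. To release the edges you have to call g.edges.unpersist() yourself. This is somewhat inconvenient for users but leaves GraphX room for optimization. Following GraphX's own Pregel code, the current best practice for a large graph is to cache each newly produced graph and explicitly release the previous one.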

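The idea is that, because a Graph in GraphX is immutable, after an operation on g is assigned back to g, g is no longer the original graph. Since the new g is used in the next iteration, it must be cached. In addition, a reference to the old graph must first be kept in prevG so that, once the new graph has been produced, the old one can be released completely and quickly. Otherwise a memory leak appears after a dozen or so iterations and the job's cache space is soon exhausted.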
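The pattern described above can be sketched as follows. This is a reconstruction based on the description and on the GraphX methods it names (cache, unpersistVertices, edges.unpersist), not the original listing. The helper name iterate and its parameters step and rounds are hypothetical, and the two count() calls are an assumed way of forcing the new graph to be computed before the old one is released.

```scala
import org.apache.spark.graphx.Graph

// Sketch of the cache/unpersist pattern; `step` and `rounds` are
// hypothetical parameters introduced for illustration.
object IterativeCaching {
  def iterate[VD, ED](initial: Graph[VD, ED], rounds: Int)(
      step: Graph[VD, ED] => Graph[VD, ED]): Graph[VD, ED] = {
    var g: Graph[VD, ED] = initial.cache()
    var prevG: Graph[VD, ED] = null
    for (_ <- 1 to rounds) {
      prevG = g            // keep a handle on the old graph
      g = step(g)          // every operation yields a new graph
      g.cache()            // the new graph is reused in the next round
      g.vertices.count()   // assumed: materialise before dropping the old graph
      g.edges.count()
      prevG.unpersistVertices(blocking = false)
      prevG.edges.unpersist(blocking = false)
    }
    g
  }
}
```

Two caveats apply to this sketch. The step function must return a new graph; if it returned its argument unchanged, the unpersist calls would also drop the graph that was just cached. The graph passed in as initial is released after the first round, so callers should not rely on it staying cached.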
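2.2.2.2 Neighbourhood Aggregation
mrTriplets (mapReduceTriplets) is the most central interface in GraphX. Pregel is built on it as well, so optimizing it has a large effect on the performance of GraphX as a whole. In simplified form, the mrTriplets operator is defined as:
mapReduceTriplets[A](map: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)], reduce: (A, A) => A): VertexRDD[A]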

The computation proceeds in two phases. map is applied to every triplet and generates zero or more messages, each targeted at either or both of the two vertices of the triplet. reduce is applied at every vertex and merges the messages sent to that vertex.
mrTriplets returns a VertexRDD[A] containing the aggregated message (of type A) for each vertex. Vertices that received no message are not included in the returned VertexRDD.
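The map and reduce phases can be modelled with plain Scala collections. The sketch below mirrors the semantics described above but is not GraphX code: the object MrTripletsSketch, its Triplet case class and the parameter names mapFunc and reduceFunc are hypothetical. The data is the six-person example used in section 3.1, and the task is to find each person's oldest follower.

```scala
// Plain-collection model of the map/reduce-over-triplets idea; it mirrors
// the semantics described above but is not GraphX code.
object MrTripletsSketch {
  type VertexId = Long
  final case class Triplet[VD, ED](
      srcId: VertexId, srcAttr: VD, dstId: VertexId, dstAttr: VD, attr: ED)

  def mapReduceTriplets[VD, ED, A](triplets: Seq[Triplet[VD, ED]])(
      mapFunc: Triplet[VD, ED] => Iterator[(VertexId, A)],
      reduceFunc: (A, A) => A): Map[VertexId, A] =
    triplets
      .flatMap(t => mapFunc(t))   // map phase: messages per triplet
      .groupBy(_._1)              // route messages by target vertex
      .map { case (id, msgs) => id -> msgs.map(_._2).reduce(reduceFunc) }

  // People (name, age) and "follows" edges from the example in section 3.1.
  val people: Map[VertexId, (String, Int)] = Map(
    (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)),
    (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50)))
  val follows: Seq[(VertexId, VertexId)] = Seq(
    (2L, 1L), (2L, 4L), (3L, 2L), (3L, 6L), (4L, 1L), (5L, 2L), (5L, 3L), (5L, 6L))
  val triplets: Seq[Triplet[(String, Int), Unit]] =
    follows.map { case (s, d) => Triplet(s, people(s), d, people(d), ()) }

  // Send each follower's (name, age) to the followed vertex; keep the oldest.
  val oldestFollower: Map[VertexId, (String, Int)] =
    mapReduceTriplets[(String, Int), Unit, (String, Int)](triplets)(
      t => Iterator((t.dstId, t.srcAttr)),
      (a, b) => if (a._2 > b._2) a else b)
}
```

Vertex 5 (Ed) has no incoming edge, so it receives no message and is absent from the result, matching the rule that vertices without messages are not returned.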
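Recent versions of GraphX have optimized this operator in several ways that have a major impact on the performance of Pregel and of all higher-level algorithm libraries. The main ones are as follows.
1. Caching for Iterative mrTriplets & Incremental Updates for Iterative mrTriplets: in many graph analysis algorithms, the convergence speed varies widely between vertices, and in late iterations only a few vertices are still updated. For vertices that were not updated, the EdgeRDD does not need to refresh its local cache of their values in the next mrTriplets computation, which greatly reduces communication overhead.
2. Indexing Active Edges: vertices that were not updated do not need to resend messages to their neighbours in the next iteration. When mrTriplets traverses the edges, it skips any edge whose neighbouring vertex values were not updated in the previous iteration, avoiding a great deal of useless computation and communication.
3. Join Elimination: a triplet is a three-part record made up of an edge and its two adjacent vertices, and the map function operating on a triplet often needs only one of the two vertex values. In PageRank, for example, the update of a vertex value depends only on the value of the source vertex and not on the value of the destination vertex it points to. In that case the mrTriplets computation needs only a 2-way join of the VertexRDD and EdgeRDD rather than a 3-way join.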
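The active-edge idea in point 2 can be illustrated with a small index. In the sketch below, "active" is assumed to mean that the source vertex changed in the previous round, and edges are indexed by source so that only the edges of updated vertices are visited. The names ActiveEdgesSketch, bySource and activeEdges are hypothetical, and the real GraphX index lives inside its edge partitions.

```scala
// Toy illustration of skipping inactive edges; "active" here means the
// source vertex changed in the previous round (an assumed simplification).
object ActiveEdgesSketch {
  type VertexId = Long
  val edges: Seq[(VertexId, VertexId)] =
    Seq((1L, 2L), (1L, 3L), (2L, 3L), (3L, 4L), (4L, 1L))

  // Index built once: source vertex -> its outgoing edges.
  val bySource: Map[VertexId, Seq[(VertexId, VertexId)]] = edges.groupBy(_._1)

  // Only the edges of updated vertices are looked up; all others are skipped.
  def activeEdges(updated: Set[VertexId]): Seq[(VertexId, VertexId)] =
    updated.toSeq.flatMap(v => bySource.getOrElse(v, Seq.empty))
}
```

When only vertex 1 was updated, two of the five edges are visited; when nothing was updated, no edge is touched at all.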
All these optimizations bring GraphX's performance steadily closer to GraphLab's. A gap remains, but the integrated pipeline and the rich programming interface can make up for a small difference in performance.
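2.2.2.3 The Evolved Pregel Model
The Pregel interface in GraphX does not strictly follow the Pregel model; it is a Pregel model improved with ideas from GAS. It is defined as follows:
pregel[A](initialMsg: A, maxIterations: Int, activeDirection: EdgeDirection)(vprog: (VertexId, VD, A) => VD, sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)], mergeMsg: (A, A) => A): Graph[VD, ED]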

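The biggest difference between this mrTriplets-based Pregel model and standard Pregel is that its second parameter list takes three function arguments and does not receive a messageList. It does not iterate over messages at a single vertex. Instead, the messages received by a vertex's multiple ghost copies are aggregated and sent to the master copy, and the vprog function is then used to update the vertex value. Receiving and sending messages are both parallelized automatically, so there is no need to worry about super-nodes, i.e. vertices with a very high degree.
A typical use of this interface is the shortest-path computation at the end of the program in section 3.1.2.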

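This makes the intent behind GraphX's design clear. It combines the advantages of Pregel and GAS: the interface is relatively simple, yet performance is preserved, and it can cope with vertex-cut graph storage and handle large-scale computation on natural graphs that follow a power-law distribution. It is also worth noting that the official Pregel is the simplest possible version. For complex business scenarios it is common practice to extend it into a customized Pregel.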
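How the three functions cooperate can be seen in a single-machine model. The sketch below follows the description in this section (one merged message per vertex, vprog applied to it, only active vertices sending) but it is an illustration in plain Scala, not GraphX's Pregel implementation. The object PregelSketch, its Triplet class and the sample values are hypothetical; the edges are those of the example in section 3.1, used here for single-source shortest paths from vertex 5.

```scala
// Single-machine model of the vprog / sendMsg / mergeMsg loop. It follows
// the description above but is a sketch, not GraphX's Pregel implementation.
object PregelSketch {
  type VertexId = Long
  final case class Triplet[VD, ED](
      srcId: VertexId, srcAttr: VD, dstId: VertexId, dstAttr: VD, attr: ED)

  def pregel[VD, ED, A](
      vertices: Map[VertexId, VD],
      edges: Seq[(VertexId, VertexId, ED)],
      initialMsg: A,
      maxIterations: Int = Int.MaxValue)(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: Triplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A): Map[VertexId, VD] = {
    // Every vertex first receives the initial message.
    var current = vertices.map { case (id, vd) => id -> vprog(id, vd, initialMsg) }
    var active: Set[VertexId] = current.keySet
    var i = 0
    while (active.nonEmpty && i < maxIterations) {
      // Only triplets touching an active vertex may send; messages for the
      // same target are merged into a single value.
      val messages: Map[VertexId, A] = edges
        .filter { case (s, d, _) => active(s) || active(d) }
        .flatMap { case (s, d, e) => sendMsg(Triplet(s, current(s), d, current(d), e)) }
        .groupBy(_._1)
        .map { case (id, ms) => id -> ms.map(_._2).reduce(mergeMsg) }
      // vprog sees one merged message, never a message list.
      current = current ++ messages.map { case (id, m) => id -> vprog(id, current(id), m) }
      active = messages.keySet
      i += 1
    }
    current
  }

  // Edge list (src, dst, weight) of the example graph in section 3.1.
  val exampleEdges: Seq[(VertexId, VertexId, Int)] = Seq(
    (2L, 1L, 7), (2L, 4L, 2), (3L, 2L, 4), (3L, 6L, 3),
    (4L, 1L, 1), (5L, 2L, 2), (5L, 3L, 8), (5L, 6L, 3))
  val start: Map[VertexId, Double] =
    (1L to 6L).map(id => id -> (if (id == 5L) 0.0 else Double.PositiveInfinity)).toMap

  val shortest: Map[VertexId, Double] =
    pregel[Double, Int, Double](start, exampleEdges, Double.PositiveInfinity)(
      (_, dist, newDist) => math.min(dist, newDist),
      t =>
        if (t.srcAttr + t.attr < t.dstAttr) Iterator((t.dstId, t.srcAttr + t.attr))
        else Iterator.empty,
      (a, b) => math.min(a, b))
}
```

The loop stops once no vertex receives a message. The resulting distances agree with the shortest-path output listed in section 3.1.3.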
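2.2.2.4 Graph Algorithm Library
GraphX also provides a library of graph algorithms to make graph analysis easier. The latest version supports 6 classic graph algorithms, including PageRank, triangle counting, connected components and shortest paths. These implementations aim at generality. To get the best performance, you can use them as a reference and modify or extend them to meet your business needs. Reading this code is also a good way to learn best practices for GraphX programming.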
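3. GraphX Examples
3.1 Graph Demo
3.1.1 About the Example
The figure below shows 6 people, each with a name and an age. Their social relationships form 8 edges, each with an attribute. The example builds the vertices, edges and graph, and then demonstrates printing graph properties, transformation operations, structural operations, join operations and aggregation operations, finishing with a practical task.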

3.1.2 Program Code
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object GraphXExample {
  def main(args: Array[String]): Unit = {
    // Suppress verbose logging
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // Set up the runtime environment
    val conf = new SparkConf().setAppName("SimpleGraphX").setMaster("local")
    val sc = new SparkContext(conf)

    // Define the vertices and edges; both are Arrays of tuples
    // Vertex data type VD: (String, Int)
    val vertexArray = Array(
      (1L, ("Alice", 28)),
      (2L, ("Bob", 27)),
      (3L, ("Charlie", 65)),
      (4L, ("David", 42)),
      (5L, ("Ed", 55)),
      (6L, ("Fran", 50))
    )
    // Edge data type ED: Int
    val edgeArray = Array(
      Edge(2L, 1L, 7),
      Edge(2L, 4L, 2),
      Edge(3L, 2L, 4),
      Edge(3L, 6L, 3),
      Edge(4L, 1L, 1),
      Edge(5L, 2L, 2),
      Edge(5L, 3L, 8),
      Edge(5L, 6L, 3)
    )

    // Build vertexRDD and edgeRDD
    val vertexRDD: RDD[(Long, (String, Int))] = sc.parallelize(vertexArray)
    val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)

    // Build the graph Graph[VD, ED]
    val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)

    //************************* Graph properties *************************
    println("**********************************************************")
    println("Property demo")
    println("**********************************************************")
    println("Vertices with age greater than 30:")
    graph.vertices.filter { case (id, (name, age)) => age > 30 }.collect.foreach {
      case (id, (name, age)) => println(s"$name is $age")
    }

    // Edge operation: find the edges whose attribute is greater than 5
    println("Edges with attribute greater than 5:")
    graph.edges.filter(e => e.attr > 5).collect.foreach(e => println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
    println

    // Triplets operation: ((srcId, srcAttr), (dstId, dstAttr), attr)
    println("Triplets with edge attribute > 5:")
    for (triplet <- graph.triplets.filter(t => t.attr > 5).collect) {
      println(s"${triplet.srcAttr._1} likes ${triplet.dstAttr._1}")
    }
    println

    // Degrees operation
    println("Maximum out-degree, in-degree and degree in the graph:")
    def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
      if (a._2 > b._2) a else b
    }
    println("max of outDegrees:" + graph.outDegrees.reduce(max) + " max of inDegrees:" + graph.inDegrees.reduce(max) + " max of Degrees:" + graph.degrees.reduce(max))
    println

    //************************* Transformation operations *************************
    println("**********************************************************")
    println("Transformation operations")
    println("**********************************************************")
    println("Vertex transformation, age + 10:")
    // Note: the new vertex attribute is (id, (name, age + 10)),
    // so v._2._1 below is the id and v._2._2 is the (name, age) pair
    graph.mapVertices { case (id, (name, age)) => (id, (name, age + 10)) }
      .vertices.collect.foreach(v => println(s"${v._2._1} is ${v._2._2}"))
    println
    println("Edge transformation, attribute * 2:")
    graph.mapEdges(e => e.attr * 2).edges.collect.foreach(e => println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
    println

    //************************* Structural operations *************************
    println("**********************************************************")
    println("Structural operations")
    println("**********************************************************")
    println("Subgraph of vertices with age >= 30:")
    val subGraph = graph.subgraph(vpred = (id, vd) => vd._2 >= 30)
    println("All vertices of the subgraph:")
    subGraph.vertices.collect.foreach(v => println(s"${v._2._1} is ${v._2._2}"))
    println
    println("All edges of the subgraph:")
    subGraph.edges.collect.foreach(e => println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
    println

    //************************* Join operations *************************
    println("**********************************************************")
    println("Join operations")
    println("**********************************************************")
    val inDegrees: VertexRDD[Int] = graph.inDegrees
    case class User(name: String, age: Int, inDeg: Int, outDeg: Int)

    // Create a new graph whose vertex type VD is User, converting from graph
    val initialUserGraph: Graph[User, Int] = graph.mapVertices { case (id, (name, age)) => User(name, age, 0, 0) }

    // Join initialUserGraph with inDegrees and outDegrees (RDDs) and fill in
    // the inDeg and outDeg values
    val userGraph = initialUserGraph.outerJoinVertices(initialUserGraph.inDegrees) {
      case (id, u, inDegOpt) => User(u.name, u.age, inDegOpt.getOrElse(0), u.outDeg)
    }.outerJoinVertices(initialUserGraph.outDegrees) {
      case (id, u, outDegOpt) => User(u.name, u.age, u.inDeg, outDegOpt.getOrElse(0))
    }

    println("Properties of the joined graph:")
    userGraph.vertices.collect.foreach(v => println(s"${v._2.name} inDeg: ${v._2.inDeg} outDeg: ${v._2.outDeg}"))
    println

    println("People whose in-degree equals their out-degree:")
    userGraph.vertices.filter {
      case (id, u) => u.inDeg == u.outDeg
    }.collect.foreach {
      case (id, property) => println(property.name)
    }
    println

    //************************* Aggregation operations *************************
    println("**********************************************************")
    println("Aggregation operations")
    println("**********************************************************")
    println("Oldest follower of each person:")
    // mapReduceTriplets is the Spark 1.x API; later versions replaced it with aggregateMessages
    val oldestFollower: VertexRDD[(String, Int)] = userGraph.mapReduceTriplets[(String, Int)](
      // map phase: send the source vertex's attributes to the destination vertex
      edge => Iterator((edge.dstId, (edge.srcAttr.name, edge.srcAttr.age))),
      // reduce phase: keep the oldest follower
      (a, b) => if (a._2 > b._2) a else b
    )

    userGraph.vertices.leftJoin(oldestFollower) { (id, user, optOldestFollower) =>
      optOldestFollower match {
        case None => s"${user.name} does not have any followers."
        case Some((name, age)) => s"${name} is the oldest follower of ${user.name}."
      }
    }.collect.foreach { case (id, str) => println(str) }
    println

    //************************* Practical operations *************************
    println("**********************************************************")
    println("Practical operations")
    println("**********************************************************")
    println("Shortest distance from vertex 5 to every vertex:")
    val sourceId: VertexId = 5L // the source vertex
    val initialGraph = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)
    val sssp = initialGraph.pregel(Double.PositiveInfinity)(
      (id, dist, newDist) => math.min(dist, newDist),
      triplet => { // send a shorter distance along the edge, if there is one
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        } else {
          Iterator.empty
        }
      },
      (a, b) => math.min(a, b) // keep the shortest distance
    )
    println(sssp.vertices.collect.mkString("\n"))

    sc.stop()
  }
}
3.1.3 Results
Compile GraphXExample.scala in IDEA (for how to use IDEA, see lesson 3, "3. Spark Programming Model (Part 2): IDEA Setup and Practice"), then run it. The output is as follows:
**********************************************************
Property demo
**********************************************************
Vertices with age greater than 30:
David is 42
Fran is 50
Charlie is 65
Ed is 55
Edges with attribute greater than 5:
2 to 1 att 7
5 to 3 att 8

Triplets with edge attribute > 5:
Bob likes Alice
Ed likes Charlie

Maximum out-degree, in-degree and degree in the graph:
max of outDegrees:(5,3) max of inDegrees:(2,2) max of Degrees:(2,4)

**********************************************************
Transformation operations
**********************************************************
Vertex transformation, age + 10:
4 is (David,52)
1 is (Alice,38)
6 is (Fran,60)
3 is (Charlie,75)
5 is (Ed,65)
2 is (Bob,37)

Edge transformation, attribute * 2:
2 to 1 att 14
2 to 4 att 4
3 to 2 att 8
3 to 6 att 6
4 to 1 att 2
5 to 2 att 4
5 to 3 att 16
5 to 6 att 6

**********************************************************
Structural operations
**********************************************************
Subgraph of vertices with age >= 30:
All vertices of the subgraph:
David is 42
Fran is 50
Charlie is 65
Ed is 55

All edges of the subgraph:
3 to 6 att 3
5 to 3 att 8
5 to 6 att 3

**********************************************************
Join operations
**********************************************************
Properties of the joined graph:
David inDeg: 1 outDeg: 1
Alice inDeg: 2 outDeg: 0
Fran inDeg: 2 outDeg: 0
Charlie inDeg: 1 outDeg: 2
Ed inDeg: 0 outDeg: 3
Bob inDeg: 2 outDeg: 2

People whose in-degree equals their out-degree:
David
Bob

**********************************************************
Aggregation operations
**********************************************************
Oldest follower of each person:
Bob is the oldest follower of David.
David is the oldest follower of Alice.
Charlie is the oldest follower of Fran.
Ed is the oldest follower of Charlie.
Ed does not have any followers.
Charlie is the oldest follower of Bob.

**********************************************************
Practical operations
**********************************************************
Shortest distance from vertex 5 to every vertex:
(4,4.0)
(1,5.0)
(6,3.0)
(3,8.0)
(5,0.0)
(2,2.0)

3.2 PageRank Demo
3.2.1 About the Example
PageRank, also known as web page ranking, page level, Google left-side ranking or Page ranking, is a link analysis algorithm proposed by Google founders Larry Page and Sergey Brin in 1997 while they were building an early prototype of their search system. Many important link analysis algorithms have since been derived from PageRank. PageRank is a method Google uses to indicate the rank or importance of a web page, and it is an important measure of the quality of a website. After combining all other factors, such as Title tags and Keywords tags, Google uses PageRank to adjust the results so that pages with a higher rank or importance are placed higher in the search results, improving the relevance and quality of those results.
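The core of the algorithm fits in a few lines. The following is a minimal sketch by fixed-point iteration in plain Scala, assuming the unnormalised update rank = resetProb + (1 - resetProb) * sum(rank(src) / outDegree(src)) with resetProb = 0.15. The three-page graph, the iteration count and the names PageRankSketch and pageRank are hypothetical, and the code is not the implementation behind graph.pageRank.

```scala
// Minimal PageRank by fixed-point iteration, using the unnormalised update
// rank = resetProb + (1 - resetProb) * sum(rank(src) / outDegree(src)).
// The tiny graph and the iteration count are assumptions for illustration.
object PageRankSketch {
  type VertexId = Long

  def pageRank(
      edges: Seq[(VertexId, VertexId)],
      iterations: Int,
      resetProb: Double = 0.15): Map[VertexId, Double] = {
    val vertices: Set[VertexId] = edges.flatMap { case (s, d) => Seq(s, d) }.toSet
    val outDegree: Map[VertexId, Int] =
      edges.groupBy(_._1).map { case (s, es) => s -> es.size }
    var ranks: Map[VertexId, Double] = vertices.map(v => v -> 1.0).toMap
    for (_ <- 1 to iterations) {
      // Each page shares its rank equally among the pages it links to.
      val contributions: Map[VertexId, Double] = edges
        .map { case (s, d) => d -> ranks(s) / outDegree(s) }
        .groupBy(_._1)
        .map { case (d, cs) => d -> cs.map(_._2).sum }
      ranks = vertices
        .map(v => v -> (resetProb + (1 - resetProb) * contributions.getOrElse(v, 0.0)))
        .toMap
    }
    ranks
  }

  // Pages 1 and 2 link to page 3, which links back to page 1.
  val ranks: Map[VertexId, Double] = pageRank(Seq((1L, 3L), (2L, 3L), (3L, 1L)), 100)
}
```

Page 3, with two incoming links, ends up with the highest rank, and page 2, with none, stays at the reset value. In this form the ranks sum to roughly the number of pages rather than to 1, which would be consistent with the large scores in the run output of section 3.2.4.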

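3.2.2 Test Data
The test data consists of the vertex file graphx-wiki-vertices.txt and the edge file graphx-wiki-edges.txt. Both can be found in the /data/class9/ directory of the resources accompanying this series. Their formats are:
- Vertices: each line holds a vertex id and a page title, separated by a tab

- Edges: each line holds two vertex ids, separated by a tab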

3.2.3 Program Code
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object PageRank {
  def main(args: Array[String]): Unit = {
    // Suppress verbose logging
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // Set up the runtime environment
    val conf = new SparkConf().setAppName("PageRank").setMaster("local")
    val sc = new SparkContext(conf)

    // Read the data files
    val articles: RDD[String] = sc.textFile("/home/hadoop/IdeaProjects/data/graphx/graphx-wiki-vertices.txt")
    val links: RDD[String] = sc.textFile("/home/hadoop/IdeaProjects/data/graphx/graphx-wiki-edges.txt")

    // Load the vertices and edges (tab-separated fields)
    val vertices = articles.map { line =>
      val fields = line.split('\t')
      (fields(0).toLong, fields(1))
    }

    val edges = links.map { line =>
      val fields = line.split('\t')
      Edge(fields(0).toLong, fields(1).toLong, 0)
    }

    // Cache the graph
    //val graph = Graph(vertices, edges, "").persist(StorageLevel.MEMORY_ONLY_SER)
    val graph = Graph(vertices, edges, "").persist()
    //graph.unpersistVertices(false)

    // Quick check of the loaded graph
    println("**********************************************************")
    println("First 5 triplets")
    println("**********************************************************")
    graph.triplets.take(5).foreach(println(_))

    // pageRank calls cache() internally, so the persist above can only use MEMORY_ONLY
    println("**********************************************************")
    println("PageRank computation: the most valuable pages")
    println("**********************************************************")
    val prGraph = graph.pageRank(0.001).cache()

    // Attach each page's rank to its title
    val titleAndPrGraph = graph.outerJoinVertices(prGraph.vertices) {
      (v, title, rank) => (rank.getOrElse(0.0), title)
    }

    // Print the 10 pages with the highest rank
    titleAndPrGraph.vertices.top(10) {
      Ordering.by((entry: (VertexId, (Double, String))) => entry._2._1)
    }.foreach(t => println(t._2._2 + ": " + t._2._1))

    sc.stop()
  }
}
3.2.4 Results
Compile PageRank.scala in IDEA, then run it. The output is as follows:
**********************************************************
First 5 triplets
**********************************************************
((146271392968588,Computer Consoles Inc.),(7097126743572404313,Berkeley Software Distribution),0)
((146271392968588,Computer Consoles Inc.),(8830299306937918434,University of California, Berkeley),0)
((625290464179456,List of Penguin Classics),(1735121673437871410,George Berkeley),0)
((1342848262636510,List of college swimming and diving teams),(8830299306937918434,University of California, Berkeley),0)
((1889887370673623,Anthony Pawson),(8830299306937918434,University of California, Berkeley),0)
**********************************************************
PageRank computation: the most valuable pages
**********************************************************
University of California, Berkeley: 1321.111754312097
Berkeley, California: 664.8841977233583
Uc berkeley: 162.50132743397873
Berkeley Software Distribution: 90.4786038848606
Lawrence Berkeley National Laboratory: 81.90404939641944
George Berkeley: 81.85226118457985
Busby Berkeley: 47.871998218019655
Berkeley Hills: 44.76406979519754
Xander Berkeley: 30.324075347288037
Berkeley County, South Carolina: 28.908336483710308
