Spark Hands-On Primer Series -- 9. Spark Graph Computing: GraphX Introduction and Examples
 
Source: Tuicool    Published: 2017-4-28
 

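1. Introduction to GraphX

1.1 GraphX Application Background

Spark GraphX is a distributed graph processing framework. Built on the Spark platform, it offers a concise, easy-to-use, and rich set of interfaces for graph computation and graph mining, which greatly simplifies distributed graph processing.

Social networks contain many chains of relationships between people; Twitter, Facebook, Weibo, and WeChat are examples. These are all places where big data is generated and where graph computation is needed. Graph processing today is almost always distributed rather than done on a single machine. Because Spark GraphX runs on top of Spark, it is by nature a distributed graph processing system.

Distributed or parallel graph processing splits a graph into many subgraphs and computes on each subgraph separately. The computation can proceed iteratively, in stages, so that the graph as a whole is processed in parallel. Consider a simple example of graph computation:

As the example shows, once the Wikipedia documents are obtained they can be turned into a Link Table view. The Link Table view can then be analysed into Hyperlinks, and finally PageRank can be applied to derive the Top Communities. On the lower path, the step from the Editor Graph to Community can be called Triangle Computation: an algorithm that counts triangles and, on that basis, discovers a community. The analysis shows that graph computation admits many approaches and algorithms, and also that graphs and tables can be converted into each other.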

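1.2 The GraphX Framework

By the time GraphX was designed, vertex-cut partitioning and GAS (Gather-Apply-Scatter) were both mature. The design and the code were optimized for them, seeking the best balance between functionality and performance. As in Spark itself, each submodule has one core abstraction. The core abstraction of GraphX is the Resilient Distributed Property Graph, a directed multigraph in which both vertices and edges carry properties. It extends the Spark RDD abstraction and offers two views, Table and Graph, over a single copy of the physical storage. Each view has its own operators, which gives both flexible operation and efficient execution.

Like Spark, the GraphX code is very compact. The core GraphX code is only a little over 3,000 lines, and the Pregel model implemented on top of it takes just over 20 lines. The overall structure of the GraphX code is shown in the figure below; most of the implementation revolves around optimizing Partition. This suggests, to some extent, that vertex-cut storage and the corresponding computation optimizations really are the focus and the main difficulty of a graph computing framework.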

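1.3 Development History

- As early as version 0.5, Spark shipped a small Bagel module that provided Pregel-like functionality. That version was still very primitive, weak in both performance and features, and should be considered experimental.

- By version 0.8, in view of growing industry demand for distributed graph computation, Spark split off a separate branch, Graphx-Branch, as an independent graph computing module and, drawing on GraphLab, began to design and develop GraphX.

- In version 0.9 the module was formally merged into the main line. Although it was an alpha release it could already be tried out, and Bagel bowed out. In version 1.0 GraphX was formally put into production use.

It is worth noting that GraphX is still evolving quickly: from the 0.8 branch through 0.9 and 1.0, the code of every release has seen considerable improvement and refactoring. According to observation, with no change to code logic or runtime environment, and only a version upgrade, an interface switch, and a recompile, each release brought a 10%~20% performance gain. Although a gap to GraphLab remains, GraphX is highly competitive thanks to Spark's integrated pipeline processing, the very active community, and the rapid pace of improvement.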

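2. Analysis of the GraphX Implementation

As in Spark itself, each submodule has one core abstraction. For GraphX this is the Resilient Distributed Property Graph, a directed multigraph whose vertices and edges both carry properties. It extends the Spark RDD abstraction and provides Table and Graph views over a single copy of the physical storage; each view has its own operators, which yields both flexibility and execution efficiency.

The low-level design of GraphX has the following key points.

Every operation on the Graph view is ultimately translated into RDD operations on the associated Table view. Logically, a computation on a graph is therefore equivalent to a series of RDD transformations. As a result, a Graph has the three key properties of an RDD: Immutable, Distributed, and Fault-Tolerant, of which immutability is the most important. Logically, every transformation or operation on a graph produces a new graph. Physically, GraphX reuses unchanged vertices and edges to some degree as an optimization that is transparent to the user.

The physical data shared by the two views consists of two RDDs, RDD[VertexPartition] and RDD[EdgePartition]. Vertices and edges are not actually stored as a table of Collection[tuple]; instead, each VertexPartition/EdgePartition internally stores a block of partitioned data with an index structure, which speeds up traversal under the different views. The immutable index structure is shared across RDD transformations, which lowers computation and storage overhead.

Distributed storage of the graph uses the vertex-cut model, and the partitionBy method lets the user choose among partitioning strategies (PartitionStrategy). A strategy assigns edges to the EdgePartitions and vertex masters to the VertexPartitions; each EdgePartition also caches ghost copies of the vertices that its local edges touch. The choice of strategy affects how many ghost copies must be cached and how evenly edges are spread across the EdgePartitions, so the best strategy has to be picked according to the structure of the graph. Four strategies are currently available: EdgePartition2D, EdgePartition1D, RandomVertexCut, and CanonicalRandomVertexCut.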
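To make the difference between strategies more concrete, here is a simplified plain-Scala sketch of how three of them might map an edge to a partition. It only illustrates the idea: the object name `PartitionSketch`, the helper `bucket`, and the hash formulas are assumptions made for this example, not GraphX's exact implementation.

```scala
object PartitionSketch {
  type VertexId = Long

  // Map any hash value to a partition index in [0, numParts).
  private def bucket(hash: Int, numParts: Int): Int = Math.floorMod(hash, numParts)

  // 1D: use the source vertex only, so all out-edges of a vertex are colocated.
  def edgePartition1D(src: VertexId, dst: VertexId, numParts: Int): Int =
    bucket(src.hashCode, numParts)

  // Random vertex cut: hash the ordered (src, dst) pair.
  def randomVertexCut(src: VertexId, dst: VertexId, numParts: Int): Int =
    bucket((src, dst).hashCode, numParts)

  // Canonical variant: sort the endpoints first, so that both directions
  // of an edge between the same two vertices land in the same partition.
  def canonicalRandomVertexCut(src: VertexId, dst: VertexId, numParts: Int): Int = {
    val key = if (src < dst) (src, dst) else (dst, src)
    bucket(key.hashCode, numParts)
  }
}
```

In GraphX itself a strategy is applied to an existing graph with a call such as `graph.partitionBy(PartitionStrategy.EdgePartition2D)`.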

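2.1 Storage Model

2.1.1 Graph Storage Models

There are two broad ways to store a very large graph: edge-cut and vertex-cut. In 2013, GraphLab 2.0 switched its storage from edge-cut to vertex-cut and gained a major performance improvement; the approach is now widely accepted and used in industry.

- Edge-cut: each vertex is stored once, but some edges are broken across two machines. The advantage is that storage space is saved. The disadvantage is that, in edge-based computation, an edge whose two vertices sit on different machines requires data to be transferred across machines, so internal network traffic is heavy.

- Vertex-cut: each edge is stored once and appears on only one machine. Vertices with many neighbors are replicated to several machines, which increases storage overhead and raises data synchronization issues. The advantage is a large reduction in internal network traffic.

Both approaches have trade-offs, but vertex-cut now has the upper hand, and distributed graph computing frameworks have moved their underlying storage to vertex-cut. There are two main reasons.

1. Disk prices have fallen, so storage space is no longer the problem, whereas internal network resources have seen no breakthrough. In cluster computing, network bandwidth is precious and time is worth more than disk. This is the familiar strategy of trading space for time.

2. In current applications the vast majority of networks are "scale-free networks" that follow a power-law distribution, so the number of neighbors varies enormously from vertex to vertex. Edge-cut places most of the edges attached to high-degree vertices on different machines, which stretches network bandwidth even further, so edge-cut storage has gradually been abandoned.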
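The trade-off between the two schemes can be sketched on a toy graph in plain Scala. This is a minimal illustration under simple assumptions: vertices are placed by `id % machines` for edge-cut, edges are placed round-robin for vertex-cut, and the names `CutSketch`, `cutEdges`, and `replicas` are hypothetical.

```scala
object CutSketch {
  type Edge = (Long, Long)

  // Edge-cut: vertices are placed by id; an edge is "cut" (needs network
  // traffic) when its two endpoints live on different machines.
  def cutEdges(edges: Seq[Edge], machines: Int): Int =
    edges.count { case (s, d) => s % machines != d % machines }

  // Vertex-cut: edges are placed round-robin; a vertex is replicated on
  // every machine that holds at least one of its edges.
  def replicas(edges: Seq[Edge], machines: Int): Map[Long, Int] = {
    val placed = edges.zipWithIndex.map { case (e, i) => (e, i % machines) }
    placed
      .flatMap { case ((s, d), m) => Seq(s -> m, d -> m) }
      .groupBy(_._1)
      .map { case (v, ms) => v -> ms.map(_._2).distinct.size }
  }
}
```

For a star graph with hub 0 and leaves 1 to 4 on two machines, edge-cut leaves two of the four edges crossing machines, while vertex-cut keeps every edge local and replicates only the hub.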

2.1.2 The GraphX Storage Model

Following PowerGraph, GraphX stores graphs using Vertex-Cut and keeps the graph data in three RDDs:

- VertexTable(id, data): id is the vertex id, data is the vertex data

- EdgeTable(pid, src, dst, data): pid is the partition id, src is the source vertex id, dst is the destination vertex id, data is the edge data

- RoutingTable(id, pid): id is the vertex id, pid is the partition id
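The relationship between the three tables can be sketched with plain Scala case classes. The row types and the `routingTable` helper below are hypothetical names chosen for illustration; GraphX's internal classes are organised differently.

```scala
object TableSketch {
  type VertexId = Long
  type PartitionId = Int

  final case class VertexRow[VD](id: VertexId, data: VD)
  final case class EdgeRow[ED](pid: PartitionId, src: VertexId, dst: VertexId, data: ED)
  final case class RoutingRow(id: VertexId, pid: PartitionId)

  // The routing table records, for every vertex, each edge partition
  // that references it, so vertex data can be shipped to the right places.
  def routingTable[ED](edges: Seq[EdgeRow[ED]]): Seq[RoutingRow] =
    edges
      .flatMap(e => Seq(RoutingRow(e.src, e.pid), RoutingRow(e.dst, e.pid)))
      .distinct
      .sortBy(r => (r.id, r.pid))
}
```

A vertex that appears in several edge partitions gets one routing row per partition, which is exactly the replication that vertex-cut accepts in exchange for keeping each edge on a single machine.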

The vertex-cut storage implementation is shown in the figure below:

2.2 Computation Model

2.2.1 Graph Computation Models

Many graph-based parallel computing frameworks already exist, such as Pregel from Google, the open-source Apache frameworks Giraph and HAMA, and the best known, GraphLab. Pregel, HAMA, and Giraph are very similar, and all are based on the BSP (Bulk Synchronous Parallel) model.

Bulk Synchronous Parallel divides a computation into a series of iterations called supersteps. Viewed vertically it is a serial model; viewed horizontally it is a parallel one. A barrier, that is, a global synchronization point, is placed between every two supersteps, and the next superstep starts only once all parallel computation has finished.

Each superstep consists of three parts:

1. Compute: each processor performs local computation using the messages passed from the previous superstep and its local data.

2. Message passing: after finishing its computation, each processor passes messages to the other processors associated with it.

3. Global synchronization point: used for overall synchronization; once all computation and message passing have completed, the next superstep begins.
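The three parts of a superstep can be sketched as a small single-machine simulation in plain Scala. The example propagates the maximum value through an undirected graph; `BspSketch`, `propagateMax`, and `deliver` are hypothetical names, and the loop boundary stands in for the barrier.

```scala
object BspSketch {
  // neighbours: adjacency list; values: the initial local data of each processor.
  // Returns the final values and the number of supersteps executed.
  def propagateMax(neighbours: Map[Int, Seq[Int]], values: Map[Int, Int]): (Map[Int, Int], Int) = {
    var state = values
    // Initial round: every processor sends its value to its neighbours.
    var inbox = deliver(neighbours, state.keySet, state)
    var supersteps = 0
    while (inbox.nonEmpty) {
      supersteps += 1
      // 1. Compute: combine local data with messages from the previous superstep.
      val updated = inbox.collect {
        case (p, msgs) if msgs.max > state(p) => p -> msgs.max
      }
      state = state ++ updated
      // 2. Message passing: only processors whose value changed send messages.
      // 3. Barrier: the next iteration starts only after all messages are delivered.
      inbox = deliver(neighbours, updated.keySet, state)
    }
    (state, supersteps)
  }

  // Collect the messages sent by `senders` into one inbox per receiving processor.
  private def deliver(
      neighbours: Map[Int, Seq[Int]],
      senders: Set[Int],
      state: Map[Int, Int]): Map[Int, Seq[Int]] =
    senders.toSeq
      .flatMap(p => neighbours.getOrElse(p, Nil).map(n => n -> state(p)))
      .groupBy(_._1)
      .map { case (n, ms) => n -> ms.map(_._2) }
}
```

The computation stops when a superstep produces no messages, which is the usual termination condition in Pregel-style systems.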

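2.2.2 The GraphX Computation Model

Like Spark, GraphX's Graph class provides a rich set of graph operators, whose rough structure is shown in the figure below. A detailed description of every function can be found in the official GraphX Programming Guide; this article covers only a few methods that deserve attention.

2.2.2.1 Caching Graphs

Each graph consists of 3 RDDs, so it takes up more memory, and cache, unpersist, and checkpoint on a graph need correspondingly more care. To reuse edges as much as possible, the default GraphX interface provides only the unpersistVertices method. To release the edges, g.edges.unpersist() has to be called explicitly. This is somewhat inconvenient for the user, but it gives GraphX room for optimization. Following the Pregel code in GraphX, the current best practice for a large graph is: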
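A minimal sketch of that loop is given below, modelled on the cache-then-release pattern used in GraphX's Pregel implementation. The object `GraphCacheSketch`, the method `iterate`, and the caller-supplied `step` function are hypothetical; `step` is assumed to return a new graph rather than its argument.

```scala
import org.apache.spark.graphx.Graph

object GraphCacheSketch {
  // Apply `step` numIter times, caching each new graph and releasing the old one.
  def iterate[VD, ED](initial: Graph[VD, ED], numIter: Int)(
      step: Graph[VD, ED] => Graph[VD, ED]): Graph[VD, ED] = {
    var g = initial.cache()
    var prevG: Graph[VD, ED] = null
    for (_ <- 0 until numIter) {
      prevG = g                 // keep a handle on the old graph
      g = step(g).cache()       // the new graph is reused next round, so cache it
      g.vertices.count()        // materialize the new graph before dropping the old one
      g.edges.count()
      prevG.unpersistVertices(blocking = false)
      prevG.edges.unpersist(blocking = false)
    }
    g
  }
}
```

A call such as `GraphCacheSketch.iterate(graph, 10)(_.mapVertices((_, v) => v + 1))` would run ten rounds while holding at most two generations of the graph in the cache.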

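The gist is that, because a Graph in GraphX is immutable, once an operation on g is assigned back to g, g is no longer the original g. The new g will be used in the next iteration, so it must be cached. In addition, prevG must first hold a reference to the original graph, and once the new graph has been produced the old graph should be released completely and promptly. Otherwise, after a dozen or so iterations, memory leaks and the job's cache space is quickly exhausted.

2.2.2.2 Neighborhood Aggregation

mrTriplets (mapReduceTriplets) is the most central interface in GraphX. Pregel is built on it as well, so optimizing it affects the performance of the whole of GraphX to a large degree. In simplified form, the mrTriplets operator takes a map function and a reduce function and returns a VertexRDD[A].

Its computation proceeds as follows. The map function is applied to every triplet and generates one or more messages, each addressed to either or both of the two vertices associated with the triplet. The reduce function is applied at every vertex and merges the messages sent to that vertex.

mrTriplets finally returns a VertexRDD[A] that holds the aggregated message (of type A) for each vertex. Vertices that received no message are not included in the returned VertexRDD.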
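The map and reduce phases described above can be modelled over ordinary Scala collections. This is only a sketch of the semantics: `MrTripletsSketch` and its simplified `Triplet` class are hypothetical stand-ins, and the real operator works on distributed RDDs and returns a VertexRDD.

```scala
object MrTripletsSketch {
  type VertexId = Long

  // A triplet is an edge together with the attributes of both of its endpoints.
  final case class Triplet[VD, ED](
      srcId: VertexId, srcAttr: VD, dstId: VertexId, dstAttr: VD, attr: ED)

  def mapReduceTriplets[VD, ED, A](triplets: Seq[Triplet[VD, ED]])(
      mapFn: Triplet[VD, ED] => Iterator[(VertexId, A)],
      reduceFn: (A, A) => A): Map[VertexId, A] =
    triplets
      .flatMap(t => mapFn(t))   // map: each triplet emits zero or more messages
      .groupBy(_._1)            // group messages by target vertex
      .map { case (id, msgs) => id -> msgs.map(_._2).reduce(reduceFn) } // reduce per vertex
}
```

Because the result is built only from emitted messages, a vertex that receives nothing simply has no entry, mirroring the behaviour described for the returned VertexRDD.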

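In recent versions GraphX has optimized this operator in several ways, with a major effect on the performance of Pregel and of all the higher-level algorithm toolkits. The main points are as follows.

1. Caching for Iterative mrTriplets & Incremental Updates for Iterative mrTriplets: in many graph analysis algorithms, convergence speed differs greatly between vertices, and in late iterations only a few vertices are updated. For vertices that have not been updated, the EdgeRDD does not need to refresh its local cache of their values on the next mrTriplets call, which greatly reduces communication overhead.

2. Indexing Active Edges: a vertex that was not updated does not need to resend messages to its neighbors in the next iteration. When mrTriplets traverses the edges, it therefore skips any edge whose neighboring vertex values were not updated in the previous iteration, avoiding a large amount of useless computation and communication.

3. Join Elimination: a triplet is a 3-tuple made of an edge and its two neighboring vertices, and the map function that operates on a triplet often needs only one of the two vertex values. In PageRank, for example, the update of a vertex value depends only on the value of the source vertex and not on the value of the destination vertex it points to. In that case mrTriplets does not need a 3-way join of VertexRDD and EdgeRDD; a 2-way join is enough.

All these optimizations bring the performance of GraphX steadily closer to that of GraphLab. A gap remains, but the integrated pipeline and the rich programming interface can make up for a small difference in performance.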
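In the `aggregateMessages` API that superseded mapReduceTriplets from Spark 1.2 onward, the caller can state explicitly which triplet fields are needed. The sketch below, with the hypothetical names `JoinEliminationSketch` and `sumOfSources`, reads only the source attribute and says so through `TripletFields.Src`, which is the kind of situation join elimination targets.

```scala
import org.apache.spark.graphx.{Graph, TripletFields, VertexRDD}

object JoinEliminationSketch {
  // For every vertex, sum the attributes of its in-neighbours.
  // Only srcAttr is read, so TripletFields.Src tells GraphX that
  // destination attributes do not have to be joined in.
  def sumOfSources[ED](graph: Graph[Double, ED]): VertexRDD[Double] =
    graph.aggregateMessages[Double](
      ctx => ctx.sendToDst(ctx.srcAttr),
      _ + _,
      TripletFields.Src)
}
```

Leaving the third argument out falls back to `TripletFields.All`, which is always correct but may ship more vertex data than the message function actually uses.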

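2.2.2.3 The Evolved Pregel Model

The Pregel interface in GraphX does not strictly follow the Pregel model; it is a Pregel model improved with ideas from GAS. Its first parameter list takes the initial message, the maximum number of iterations, and the active edge direction, and its second takes three functions: vprog, sendMsg, and mergeMsg.

The biggest difference between this mrTriplets-based Pregel model and standard Pregel is that its second parameter list accepts three function parameters and does not accept a messageList. It does not iterate over messages at a single vertex. Instead, the messages received by a vertex's ghost copies are aggregated and sent to the master copy, and the vprog function is then used to update the vertex value. Receiving and sending messages are both parallelized automatically, so there is no need to worry about super nodes.

A common code template looks like this: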
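A minimal sketch of such a template, assuming a graph whose vertex attribute has already been initialised to the vertex's own id. It propagates the smallest id through each connected component; the names `PregelTemplateSketch` and `minLabel` are hypothetical, while `pregel`, `EdgeDirection`, and the three function roles come from the GraphX API.

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx.{EdgeDirection, Graph, VertexId}

object PregelTemplateSketch {
  def minLabel[ED: ClassTag](graph: Graph[VertexId, ED], maxIter: Int = 20): Graph[VertexId, ED] =
    graph.pregel(Long.MaxValue, maxIter, EdgeDirection.Either)(
      // vprog: update the vertex value from the already-merged message
      (id, label, msg) => math.min(label, msg),
      // sendMsg: emit a message only when it would lower the neighbour's label
      triplet =>
        if (triplet.srcAttr < triplet.dstAttr) Iterator((triplet.dstId, triplet.srcAttr))
        else if (triplet.dstAttr < triplet.srcAttr) Iterator((triplet.srcId, triplet.dstAttr))
        else Iterator.empty,
      // mergeMsg: combine two messages bound for the same vertex
      (a, b) => math.min(a, b))
}
```

A typical call would be `PregelTemplateSketch.minLabel(graph.mapVertices((id, _) => id))`. Note that vprog receives a single merged message rather than a list, which is the difference from standard Pregel discussed above.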

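The intent behind this design is clear. It combines the strengths of Pregel and GAS: the interface stays relatively simple while performance is preserved, it works with vertex-cut graph storage, and it can handle large computations on natural graphs that follow a power-law distribution. It is also worth noting that the official Pregel version is the simplest possible one. For complex business scenarios, extending it into a customized Pregel is a very common practice.

2.2.2.4 Graph Algorithm Toolkit

GraphX also provides a toolkit of graph algorithms that makes it easier to analyse graphs. The latest version at the time of writing supports 6 classic graph algorithms, including PageRank, triangle counting, connected components, and shortest paths. The implementations of these algorithms aim above all at generality. To get the best performance, they can be used as a reference and then modified and extended to meet business needs. Reading this code is also a good way to learn best practices for GraphX programming.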
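As an illustration of how the toolkit is called, the sketch below wraps two of the bundled algorithms. The wrapper object `ToolkitSketch` and its method names are hypothetical; `connectedComponents` and `ShortestPaths.run` are part of GraphX, and the latter counts hops along edge direction and ignores edge attributes.

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.graphx.lib.ShortestPaths

object ToolkitSketch {
  // Connected components: each vertex is labelled with the
  // smallest vertex id found in its component.
  def componentOf[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Map[VertexId, VertexId] =
    graph.connectedComponents().vertices.collect().toMap

  // Number of hops from each vertex to the landmark; vertices that
  // cannot reach the landmark are left out of the result.
  def hopsTo[VD, ED: ClassTag](graph: Graph[VD, ED], landmark: VertexId): Map[VertexId, Int] =
    ShortestPaths.run(graph, Seq(landmark)).vertices.collect().toMap
      .collect { case (id, spMap) if spMap.contains(landmark) => id -> spMap(landmark) }
}
```

Both helpers collect results to the driver, which is reasonable only for small graphs; on large graphs the returned vertex RDDs would normally be kept distributed.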

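3. GraphX Examples

3.1 Graph Demo

3.1.1 About the Example

The example graph has 6 people, each with a name and an age. Their social relationships form 8 edges, and each edge has an attribute. The demo below builds the vertices, the edges, and the graph, then walks through graph properties, transformation operations, structural operations, join operations, and aggregation operations, and finishes with a practical task.

3.1.2 Program Code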

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object GraphXExample {
  def main(args: Array[String]): Unit = {
    // Suppress noisy logging
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // Set up the runtime environment
    val conf = new SparkConf().setAppName("SimpleGraphX").setMaster("local")
    val sc = new SparkContext(conf)

    // Define vertices and edges; both are Arrays built from tuples / Edge objects
    // Vertex data type VD: (String, Int)
    val vertexArray = Array(
      (1L, ("Alice", 28)),
      (2L, ("Bob", 27)),
      (3L, ("Charlie", 65)),
      (4L, ("David", 42)),
      (5L, ("Ed", 55)),
      (6L, ("Fran", 50))
    )
    // Edge data type ED: Int
    val edgeArray = Array(
      Edge(2L, 1L, 7),
      Edge(2L, 4L, 2),
      Edge(3L, 2L, 4),
      Edge(3L, 6L, 3),
      Edge(4L, 1L, 1),
      Edge(5L, 2L, 2),
      Edge(5L, 3L, 8),
      Edge(5L, 6L, 3)
    )

    // Build vertexRDD and edgeRDD
    val vertexRDD: RDD[(Long, (String, Int))] = sc.parallelize(vertexArray)
    val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)

    // Build the graph Graph[VD, ED]
    val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)

    //*************************** Graph properties ***************************
    println("**********************************************************")
    println("Property demo")
    println("**********************************************************")
    println("Vertices with age greater than 30:")
    graph.vertices.filter { case (id, (name, age)) => age > 30 }.collect.foreach {
      case (id, (name, age)) => println(s"$name is $age")
    }

    // Edge operation: find the edges whose attribute is greater than 5
    println("Edges with attribute greater than 5:")
    graph.edges.filter(e => e.attr > 5).collect.foreach(e => println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
    println()

    // Triplets operation: ((srcId, srcAttr), (dstId, dstAttr), attr)
    println("Triplets with edge attribute > 5:")
    for (triplet <- graph.triplets.filter(t => t.attr > 5).collect) {
      println(s"${triplet.srcAttr._1} likes ${triplet.dstAttr._1}")
    }
    println()

    // Degrees operation
    println("Maximum out-degree, in-degree, and degree in the graph:")
    def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
      if (a._2 > b._2) a else b
    }
    println("max of outDegrees:" + graph.outDegrees.reduce(max) + " max of inDegrees:" + graph.inDegrees.reduce(max)
      + " max of Degrees:" + graph.degrees.reduce(max))
    println()

    //*************************** Transformation operations ***************************
    println("**********************************************************")
    println("Transformation operations")
    println("**********************************************************")
    println("Vertex transformation, age + 10:")
    // Note: the new vertex attribute is (id, (name, age + 10)), so below
    // v._2._1 is the vertex id and v._2._2 is the (name, age) pair
    graph.mapVertices { case (id, (name, age)) => (id, (name, age + 10)) }
      .vertices.collect.foreach(v => println(s"${v._2._1} is ${v._2._2}"))
    println()
    println("Edge transformation, attribute * 2:")
    graph.mapEdges(e => e.attr * 2).edges.collect.foreach(e => println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
    println()

    //*************************** Structural operations ***************************
    println("**********************************************************")
    println("Structural operations")
    println("**********************************************************")
    println("Subgraph of vertices with age > 30:")
    val subGraph = graph.subgraph(vpred = (id, vd) => vd._2 > 30)
    println("All vertices of the subgraph:")
    subGraph.vertices.collect.foreach(v => println(s"${v._2._1} is ${v._2._2}"))
    println()
    println("All edges of the subgraph:")
    subGraph.edges.collect.foreach(e => println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
    println()

    //*************************** Join operations ***************************
    println("**********************************************************")
    println("Join operations")
    println("**********************************************************")
    val inDegrees: VertexRDD[Int] = graph.inDegrees
    case class User(name: String, age: Int, inDeg: Int, outDeg: Int)

    // Create a new graph whose vertex type VD is User, converted from graph
    val initialUserGraph: Graph[User, Int] = graph.mapVertices { case (id, (name, age)) => User(name, age, 0, 0) }

    // Join initialUserGraph with inDegrees and outDegrees (RDDs) and fill in inDeg and outDeg
    val userGraph = initialUserGraph.outerJoinVertices(initialUserGraph.inDegrees) {
      case (id, u, inDegOpt) => User(u.name, u.age, inDegOpt.getOrElse(0), u.outDeg)
    }.outerJoinVertices(initialUserGraph.outDegrees) {
      case (id, u, outDegOpt) => User(u.name, u.age, u.inDeg, outDegOpt.getOrElse(0))
    }

    println("Attributes of the joined graph:")
    userGraph.vertices.collect.foreach(v => println(s"${v._2.name} inDeg: ${v._2.inDeg} outDeg: ${v._2.outDeg}"))
    println()

    println("People whose out-degree equals their in-degree:")
    userGraph.vertices.filter {
      case (id, u) => u.inDeg == u.outDeg
    }.collect.foreach {
      case (id, property) => println(property.name)
    }
    println()

    //*************************** Aggregation operations ***************************
    println("**********************************************************")
    println("Aggregation operations")
    println("**********************************************************")
    println("Oldest follower of each person:")
    // mapReduceTriplets is the Spark 1.x API; it was deprecated in Spark 1.2
    // in favour of aggregateMessages
    val oldestFollower: VertexRDD[(String, Int)] = userGraph.mapReduceTriplets[(String, Int)](
      // map step: send the source vertex's attributes to the destination vertex
      edge => Iterator((edge.dstId, (edge.srcAttr.name, edge.srcAttr.age))),
      // reduce step: keep the oldest follower
      (a, b) => if (a._2 > b._2) a else b
    )

    userGraph.vertices.leftJoin(oldestFollower) { (id, user, optOldestFollower) =>
      optOldestFollower match {
        case None => s"${user.name} does not have any followers."
        case Some((name, age)) => s"${name} is the oldest follower of ${user.name}."
      }
    }.collect.foreach { case (id, str) => println(str) }
    println()

    //*************************** Practical operations ***************************
    println("**********************************************************")
    println("Practical operations")
    println("**********************************************************")
    println("Shortest distance from vertex 5 to each vertex:")
    val sourceId: VertexId = 5L // the source vertex
    val initialGraph = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)
    val sssp = initialGraph.pregel(Double.PositiveInfinity)(
      // vprog: keep the smaller of the current and the incoming distance
      (id, dist, newDist) => math.min(dist, newDist),
      triplet => { // sendMsg: relax the edge using its weight
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        } else {
          Iterator.empty
        }
      },
      (a, b) => math.min(a, b) // mergeMsg: shortest distance
    )
    println(sssp.vertices.collect.mkString("\n"))

    sc.stop()
  }
}

3.1.3 Results

In IDEA (for how to use IDEA, see lesson 3 of this series, "3. Spark Programming Model (Part 2) -- Setting Up IDEA and Hands-On Practice"), first compile GraphXExample.scala; once it compiles, run it. The output is as follows:

**********************************************************
Property demo
**********************************************************
Vertices with age greater than 30:
David is 42
Fran is 50
Charlie is 65
Ed is 55
Edges with attribute greater than 5:
2 to 1 att 7
5 to 3 att 8

Triplets with edge attribute > 5:
Bob likes Alice
Ed likes Charlie

Maximum out-degree, in-degree, and degree in the graph:
max of outDegrees:(5,3) max of inDegrees:(2,2) max of Degrees:(2,4)

**********************************************************
Transformation operations
**********************************************************
Vertex transformation, age + 10:
4 is (David,52)
1 is (Alice,38)
6 is (Fran,60)
3 is (Charlie,75)
5 is (Ed,65)
2 is (Bob,37)

Edge transformation, attribute * 2:
2 to 1 att 14
2 to 4 att 4
3 to 2 att 8
3 to 6 att 6
4 to 1 att 2
5 to 2 att 4
5 to 3 att 16
5 to 6 att 6

**********************************************************
Structural operations
**********************************************************
Subgraph of vertices with age > 30:
All vertices of the subgraph:
David is 42
Fran is 50
Charlie is 65
Ed is 55

All edges of the subgraph:
3 to 6 att 3
5 to 3 att 8
5 to 6 att 3

**********************************************************
Join operations
**********************************************************
Attributes of the joined graph:
David inDeg: 1 outDeg: 1
Alice inDeg: 2 outDeg: 0
Fran inDeg: 2 outDeg: 0
Charlie inDeg: 1 outDeg: 2
Ed inDeg: 0 outDeg: 3
Bob inDeg: 2 outDeg: 2

People whose out-degree equals their in-degree:
David
Bob

**********************************************************
Aggregation operations
**********************************************************
Oldest follower of each person:
Bob is the oldest follower of David.
David is the oldest follower of Alice.
Charlie is the oldest follower of Fran.
Ed is the oldest follower of Charlie.
Ed does not have any followers.
Charlie is the oldest follower of Bob.

**********************************************************
Practical operations
**********************************************************
Shortest distance from vertex 5 to each vertex:
(4,4.0)
(1,5.0)
(6,3.0)
(3,8.0)
(5,0.0)
(2,2.0)

3.2 PageRank Demo

3.2.1 About the Example

PageRank, also known as web page ranking, page level, Google left-side ranking, or Page ranking, is a link analysis algorithm proposed by Google founders Larry Page and Sergey Brin in 1997 while they were building an early prototype of their search system. Many important link analysis algorithms in use today are derived from PageRank. PageRank is a method Google uses to indicate the rank or importance of a web page, and a key measure by which Google judges the quality of a site. After blending in all other factors, such as the Title tag and the Keywords tag, Google uses PageRank to adjust the results so that pages with higher "rank/importance" move up in the search results, which improves the relevance and quality of those results.
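The core iteration can be sketched in a few lines of plain Scala. The sketch assumes the unnormalised update rule documented for GraphX, PR[i] = resetProb + (1 - resetProb) * sum(PR[j] / outDeg[j]) over the in-neighbours j, with every rank starting at 1.0; the names `PageRankSketch` and `pageRank` are hypothetical, and a fixed iteration count replaces the tolerance-based stopping rule.

```scala
object PageRankSketch {
  // links: (source, destination) pairs; returns unnormalised ranks after numIter iterations.
  def pageRank(links: Seq[(Long, Long)], numIter: Int, resetProb: Double = 0.15): Map[Long, Double] = {
    val vertices = links.flatMap { case (s, d) => Seq(s, d) }.distinct
    val outDegree = links.groupBy(_._1).map { case (s, es) => s -> es.size }
    var ranks = vertices.map(v => v -> 1.0).toMap
    for (_ <- 0 until numIter) {
      // Each page splits its current rank evenly among the pages it links to.
      val contribs = links
        .map { case (s, d) => d -> (ranks(s) / outDegree(s)) }
        .groupBy(_._1)
        .map { case (d, cs) => d -> cs.map(_._2).sum }
      // New rank = reset term + damped sum of the incoming contributions.
      ranks = vertices.map(v => v -> (resetProb + (1.0 - resetProb) * contribs.getOrElse(v, 0.0))).toMap
    }
    ranks
  }
}
```

Because the ranks are not scaled to sum to 1, heavily linked pages can reach values far above 1, which is consistent with the magnitudes seen in the results later in this section.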

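3.2.2 Test Data

The test data consists of the vertex file graphx-wiki-vertices.txt and the edge file graphx-wiki-edges.txt. Both files can be found in the /data/class9/ directory of the resources that accompany this series. Their formats are:

- Vertex data: a vertex id and a page title

- Edge data: a pair of vertex ids

3.2.3 Program Code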

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object PageRank {
  def main(args: Array[String]): Unit = {
    // Suppress noisy logging
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // Set up the runtime environment
    val conf = new SparkConf().setAppName("PageRank").setMaster("local")
    val sc = new SparkContext(conf)

    // Read the data files
    val articles: RDD[String] = sc.textFile("/home/hadoop/IdeaProjects/data/graphx/graphx-wiki-vertices.txt")
    val links: RDD[String] = sc.textFile("/home/hadoop/IdeaProjects/data/graphx/graphx-wiki-edges.txt")

    // Load vertices: tab-separated vertex id and page title
    val vertices = articles.map { line =>
      val fields = line.split('\t')
      (fields(0).toLong, fields(1))
    }

    // Load edges: tab-separated source id and destination id
    val edges = links.map { line =>
      val fields = line.split('\t')
      Edge(fields(0).toLong, fields(1).toLong, 0)
    }

    // Cache the graph ("" is the default attribute for vertices that appear only in edges)
    //val graph = Graph(vertices, edges, "").persist(StorageLevel.MEMORY_ONLY_SER)
    val graph = Graph(vertices, edges, "").persist()
    //graph.unpersistVertices(false)

    // Sanity check
    println("**********************************************************")
    println("Fetch 5 triplets")
    println("**********************************************************")
    graph.triplets.take(5).foreach(println(_))

    // The pageRank algorithm calls cache() internally, so the persist above can only use MEMORY_ONLY
    println("**********************************************************")
    println("PageRank computation: the most valuable entries")
    println("**********************************************************")
    val prGraph = graph.pageRank(0.001).cache()

    // Attach each page's rank to its title
    val titleAndPrGraph = graph.outerJoinVertices(prGraph.vertices) {
      (v, title, rank) => (rank.getOrElse(0.0), title)
    }

    // Print the 10 pages with the highest rank
    titleAndPrGraph.vertices.top(10) {
      Ordering.by((entry: (VertexId, (Double, String))) => entry._2._1)
    }.foreach(t => println(t._2._2 + ": " + t._2._1))

    sc.stop()
  }
}

3.2.4 Results

In IDEA, first compile PageRank.scala; once it compiles, run it. The output is as follows:

**********************************************************
Fetch 5 triplets
**********************************************************
((146271392968588,Computer Consoles Inc.),(7097126743572404313,Berkeley Software Distribution),0)
((146271392968588,Computer Consoles Inc.),(8830299306937918434,University of California, Berkeley),0)
((625290464179456,List of Penguin Classics),(1735121673437871410,George Berkeley),0)
((1342848262636510,List of college swimming and diving teams),(8830299306937918434,University of California, Berkeley),0)
((1889887370673623,Anthony Pawson),(8830299306937918434,University of California, Berkeley),0)

**********************************************************
PageRank computation: the most valuable entries
**********************************************************
University of California, Berkeley: 1321.111754312097
Berkeley, California: 664.8841977233583
Uc berkeley: 162.50132743397873
Berkeley Software Distribution: 90.4786038848606
Lawrence Berkeley National Laboratory: 81.90404939641944
George Berkeley: 81.85226118457985
Busby Berkeley: 47.871998218019655
Berkeley Hills: 44.76406979519754
Xander Berkeley: 30.324075347288037
Berkeley County, South Carolina: 28.908336483710308

 

   