
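Preface
Google Search has long been a tool I use every day, and I have marveled at the accuracy of its results countless times. I also do SEO for Google to promote my own blog. After a few months of effort, my blog reached PR 2 and has collected tens of thousands of external links. Looking back, I am still amazed by PageRank!
PageRank, the algorithm that changed the world!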
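1. Introduction to the PageRank Algorithm
PageRank is Google's proprietary algorithm for measuring how important a given web page is relative to the other pages in the search engine's index. It was invented by Larry Page and Sergey Brin in the late 1990s. PageRank turned the value of links into a ranking factor.
PageRank lets links "vote"
A page's "vote count" is determined by the importance of all the pages that link to it; a hyperlink to a page counts as one vote for that page. A page's PageRank is computed recursively from the importance of all the pages linking to it (its "inbound pages"). A page with many inbound links gets a higher rank, whereas a page with no inbound links has no rank at all.
In one sentence: a page that is linked to from many high-quality pages is very likely a high-quality page itself.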
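The PageRank computation rests on two basic assumptions:
1. Quantity assumption: the more inbound links a page node receives from other pages, the more important the page is.
2. Quality assumption: the inbound links pointing to page A differ in quality, and a high-quality page passes more weight through its links to other pages. So the higher the quality of the pages pointing to page A, the more important page A is.
Three factors matter for raising a page's PageRank:
1. The number of backlinks
2. Whether the backlinks come from pages with a high PageRank
3. The number of links on the source pages of those backlinks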
2. How the PageRank Algorithm Works
Initial stage: the pages and the links between them form a directed graph, and every page is assigned the same PageRank value. After several rounds of computation, each page arrives at its final PageRank value. With every round, each page's current PageRank value is updated.
In one round of updating, each page divides its current PageRank value evenly among its outbound links, so each link carries a corresponding weight. Each page then sums the weights carried by all inbound links pointing to it, which gives its new PageRank score. Once every page has an updated PageRank value, one round of the computation is complete.
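One update round as described above can be sketched in R. This is only a minimal illustration, assuming the four-page link structure from the worked example later in this post; `out_links` and `pagerank_round` are names made up for the sketch and are not part of the original code.

```r
# Out-links per page, assumed from the four-page example:
# page 1 -> 2,3,4; page 2 -> 3,4; page 3 -> 4; page 4 -> 2
out_links <- list(c(2, 3, 4), c(3, 4), 4, 2)

# One round: every page splits its current score evenly over its out-links,
# then every page sums the shares it receives over its in-links.
pagerank_round <- function(pr, out_links) {
  new_pr <- numeric(length(pr))
  for (src in seq_along(out_links)) {
    targets <- out_links[[src]]
    share <- pr[src] / length(targets)
    new_pr[targets] <- new_pr[targets] + share
  }
  new_pr
}

pr0 <- rep(1 / 4, 4)                  # every page starts with the same score
pr1 <- pagerank_round(pr0, out_links)
print(pr1)
```

Starting from 1/4 for every page, one round gives 0, 1/3, 5/24 and 11/24, which still sum to 1. This sketch does not include the damping factor.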
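1). Principle
PageRank is built on the random surfer model. Its basic idea is that the importance ranking of web pages is determined by the link relationships among them: the algorithm relies on the link structure between pages to evaluate each page's rank and importance. A page's PR value depends not only on the number of pages linking to it, but also on the importance of those linking pages, which in turn depends on the pages that link to them.
PageRank has two key properties:
Transitivity of PR value: when page A links to page B, part of A's PR value is passed to B
Transitivity of importance: an important page passes more weight than an unimportant one
2). Formula: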

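PR(pi): the PageRank value of page pi
n: the total number of pages
pi: the individual pages p1, p2, p3
M(i): the set of pages that link to pi
L(j): the number of outbound links on page pj
d: the damping factor, i.e. the probability that, at any moment, a user who has reached a page keeps following links
(1-d=0.15): the probability that the user stops clicking and jumps to a random new URL
Range: 0 < d ≤ 1; Google sets it to 0.85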
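The formula itself is missing from this copy of the post. Based on the symbol definitions above, and on the dProbabilityMatrix function in section 3, it is presumably the standard damped PageRank formula. The following is a reconstruction under that assumption, not a copy of the original:

```latex
PR(p_i) = \frac{1-d}{n} + d \sum_{p_j \in M(i)} \frac{PR(p_j)}{L(j)}
```

The first term is the share every page receives from random jumps. The second term sums, over each page p_j linking to p_i, the PageRank of p_j divided evenly among its L(j) outbound links.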
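3). Worked example: data for 4 pages

Figure description:
The page with ID=1 links to pages 2, 3 and 4, so a user on page 1 jumps to each of 2, 3 and 4 with probability 1/3
The page with ID=2 links to pages 3 and 4, so a user on page 2 jumps to each of 3 and 4 with probability 1/2
The page with ID=3 links to page 4, so a user on page 3 jumps to 4 with probability 1
The page with ID=4 links to page 2, so a user on page 4 jumps to 2 with probability 1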
Build the adjacency list:
Source page   Target pages
1             2,3,4
2             3,4
3             4
4             2
Build the adjacency matrix (square matrix):
Columns: source pages
Rows: target pages
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    1    0    0    1
[3,]    1    1    0    0
[4,]    1    1    1    0
Convert to a probability matrix (transition matrix):
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]  1/3    0    0    1
[3,]  1/3  1/2    0    0
[4,]  1/3  1/2    1    0
From the link relationships we have thus built the "transition matrix".
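The conversion from adjacency matrix to transition matrix can also be written without a loop. The sketch below is one possible vectorized form, assuming the 4x4 adjacency matrix shown above; `out_degree` and `M` are illustrative names, and `sweep` is used here as an alternative to the loop-based function in section 3.

```r
# Adjacency matrix from the example: columns are source pages, rows are targets.
A <- matrix(c(0, 0, 0, 0,
              1, 0, 0, 1,
              1, 1, 0, 0,
              1, 1, 1, 0), nrow = 4, byrow = TRUE)

out_degree <- colSums(A)            # number of out-links per source page
out_degree[out_degree == 0] <- 1    # guard against division by zero
M <- sweep(A, 2, out_degree, "/")   # divide each column by its out-degree

print(M)
print(colSums(M))                   # every column with out-links sums to 1
```

Each column of `M` is the probability distribution of where a user on that source page goes next, which is why each column sums to 1 in this example.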
3. Single-Machine Implementation in R
Create the data file: page.csv
PageRank is implemented below in 3 ways:
1. Without the damping factor
2. With the damping factor
3. Directly with R's eigenvalue function
1). Without the damping factor
R implementation
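# Build the adjacency matrix (rows: target pages, columns: source pages)
adjacencyMatrix<-function(pages){
  n<-max(apply(pages,2,max))
  A <- matrix(0,n,n)
  for(i in 1:nrow(pages)) A[pages[i,]$dist,pages[i,]$src]<-1
  A
}
# Convert to a probability (transition) matrix by normalizing each column
probabilityMatrix<-function(G){
  cs <- colSums(G)
  cs[cs==0] <- 1   # avoid dividing by zero for pages without out-links
  n <- nrow(G)
  A <- matrix(0,nrow(G),ncol(G))
  for (i in 1:n) A[i,] <- A[i,] + G[i,]/cs
  A
}
# Iteratively approximate the dominant eigenvector (power iteration)
eigenMatrix<-function(G,iter=100){
  iter<-10   # note: this overrides the iter argument, so the results below come from 10 iterations
  n<-nrow(G)
  x <- rep(1,n)
  for (i in 1:iter) x <- G %*% x
  x/sum(x)
}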
> pages<-read.table(file="page.csv",header=FALSE,sep=",")
> names(pages)<-c("src","dist");pages
src dist
1 1 2
2 1 3
3 1 4
4 2 3
5 2 4
6 3 4
7 4 2
> A<-adjacencyMatrix(pages);A
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 1 0 0 1
[3,] 1 1 0 0
[4,] 1 1 1 0
> G<-probabilityMatrix(A);G
[,1] [,2] [,3] [,4]
[1,] 0.0000000 0.0 0 0
[2,] 0.3333333 0.0 0 1
[3,] 0.3333333 0.5 0 0
[4,] 0.3333333 0.5 1 0
> q<-eigenMatrix(G,100);q
[,1]
[1,] 0.0000000
[2,] 0.4036458
[3,] 0.1979167
[4,] 0.3984375
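Interpreting the results:
Page ID=1 has a PR value of 0, because no page links to it
Page ID=2 has a PR value of 0.4036, the highest weight: both 1 and 4 link to 2, page 4 carries a high weight, and since page 4's only link points to 2, its weight is passed on without loss
Page ID=3 has a PR value of 0.1979: pages 1 and 2 both link to 3, but they also link to other pages, so their weight is split and page 3's PR is not high
Page ID=4 has a PR value of 0.3984, a high weight, because pages 1, 2 and 3 all link to it
From these results we see that page ID=1 has a PR value of 0, which means it cannot pass any weight on to other pages, and that makes the calculation unreasonable! So we add the damping factor d to correct for pages that no link points to, which guarantees that every page's minimum PR value is > 0.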
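Because eigenMatrix resets iter to 10 internally, the values above are a 10-iteration approximation. The sketch below shows one way to iterate until the vector stops changing, assuming the undamped transition matrix from this example; `power_iterate`, `tol` and `max_iter` are hypothetical names introduced for illustration only.

```r
# Undamped transition matrix from the example (columns: sources, rows: targets).
G <- matrix(c(0,   0,   0, 0,
              1/3, 0,   0, 1,
              1/3, 1/2, 0, 0,
              1/3, 1/2, 1, 0), nrow = 4, byrow = TRUE)

# Power iteration that stops once successive vectors differ by less than tol.
power_iterate <- function(G, tol = 1e-10, max_iter = 1000) {
  x <- rep(1 / nrow(G), nrow(G))
  for (k in seq_len(max_iter)) {
    x_new <- as.vector(G %*% x)
    x_new <- x_new / sum(x_new)          # keep the scores summing to 1
    if (sum(abs(x_new - x)) < tol) break
    x <- x_new
  }
  list(rank = x_new, iterations = k)
}

res <- power_iterate(G)
print(round(res$rank, 4))
```

For this matrix the iteration settles at 0, 0.4, 0.2 and 0.4, so pages 2 and 4 end up tied in the limit and page 1 stays at 0, which is the problem the damping factor addresses.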
2). With the damping factor
Add a function: dProbabilityMatrix
# Convert to a probability matrix, taking the damping factor d into account
dProbabilityMatrix<-function(G,d=0.85){
  cs <- colSums(G)
  cs[cs==0] <- 1
  n <- nrow(G)
  delta <- (1-d)/n
  A <- matrix(delta,nrow(G),ncol(G))
  for (i in 1:n) A[i,] <- A[i,] + d*G[i,]/cs
  A
}
> pages<-read.table(file="page.csv",header=FALSE,sep=",")
> names(pages)<-c("src","dist");pages
src dist
1 1 2
2 1 3
3 1 4
4 2 3
5 2 4
6 3 4
7 4 2
> A<-adjacencyMatrix(pages);A
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 1 0 0 1
[3,] 1 1 0 0
[4,] 1 1 1 0
> G<-dProbabilityMatrix(A);G
[,1] [,2] [,3] [,4]
[1,] 0.0375000 0.0375 0.0375 0.0375
[2,] 0.3208333 0.0375 0.0375 0.8875
[3,] 0.3208333 0.4625 0.0375 0.0375
[4,] 0.3208333 0.4625 0.8875 0.0375
> q<-eigenMatrix(G,100);q
[,1]
[1,] 0.0375000
[2,] 0.3738930
[3,] 0.2063759
[4,] 0.3822311
With the damping factor, page ID=1 now has a value: PR(1)=(1-d)/n=(1-0.85)/4=0.0375, which is the minimum value, taken by a page with no inbound links.
3). Directly with R's eigenvalue function
Add a function: calcEigenMatrix
# Compute the dominant eigenvector directly with eigen()
calcEigenMatrix<-function(G){
  x <- Re(eigen(G)$vectors[,1])
  x/sum(x)
}
> pages<-read.table(file="page.csv",header=FALSE,sep=",")
> names(pages)<-c("src","dist");pages
src dist
1 1 2
2 1 3
3 1 4
4 2 3
5 2 4
6 3 4
7 4 2
> A<-adjacencyMatrix(pages);A
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 1 0 0 1
[3,] 1 1 0 0
[4,] 1 1 1 0
> G<-dProbabilityMatrix(A);G
[,1] [,2] [,3] [,4]
[1,] 0.0375000 0.0375 0.0375 0.0375
[2,] 0.3208333 0.0375 0.0375 0.8875
[3,] 0.3208333 0.4625 0.0375 0.0375
[4,] 0.3208333 0.4625 0.8875 0.0375
> q<-calcEigenMatrix(G);q
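[1] 0.0375000 0.3732476 0.2067552 0.3824972
Computing the eigenvector directly avoids the explicit loop and makes the program run more efficiently. The values differ slightly from the iterative result because eigenMatrix ran only 10 iterations.
Once you understand how PageRank works, building a PageRank model in R is very easy. In practice, we also prefer to build the model in a relatively simple way first and, once it is validated, reimplement it in another language for production use!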
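As a sanity check on the eigenvector approach, one can verify that the returned vector really is a fixed point of the damped matrix. The sketch below rebuilds the damped matrix for this example with d = 0.85; `M`, `lambda` and `q` are names chosen for illustration only.

```r
# Damped transition matrix for the four-page example, d = 0.85
d <- 0.85
M <- matrix(c(0,   0,   0, 0,
              1/3, 0,   0, 1,
              1/3, 1/2, 0, 0,
              1/3, 1/2, 1, 0), nrow = 4, byrow = TRUE)
G <- (1 - d) / 4 + d * M

e <- eigen(G)
lambda <- Re(e$values[1])      # dominant eigenvalue, expected to be 1
q <- Re(e$vectors[, 1])
q <- q / sum(q)                # normalize so the scores sum to 1

print(lambda)
print(max(abs(G %*% q - q)))   # close to 0 when G q = q holds
```

Since every column of G sums to 1, its dominant eigenvalue is 1, and the matching eigenvector is the stationary PageRank distribution.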