|
The PageRank algorithm is one of the best-known methods for
ranking (weighting) web pages. The algorithm was developed and
published by Sergey Brin and Larry Page at Stanford University in
the late 1990s. PageRank can be understood as a measure of the
importance of a web page.
The overall ranking of a page in search results is a function of
on- and off-page factors. On-page factors are, for example, the
title, description, headings, and plain text. An off-page factor
apart from PageRank is the anchor text of incoming links.
PageRank itself depends only on the link structure: neither the
content nor the URL of a page plays a role (such parameters are
on-page factors). Moreover, PageRank makes no distinction between
internal and external links.
The citation (link) graph of the web is an important resource
that has largely gone unused in existing web search engines.
Google created maps containing as many as 518 million of these
hyperlinks, a significant sample of the total. These maps allow
rapid calculation of a web page's "PageRank", an objective
measure of its citation importance that corresponds well with
people's subjective idea of importance. Because of this
correspondence, PageRank is an excellent way to prioritize the
results of web keyword searches. For most popular subjects, a
simple text matching search that is restricted to web page
titles performs admirably when PageRank prioritizes the results
(demo available at google.stanford.edu).
Assume page A has pages T1...Tn which point to it (i.e., are
citations). The parameter d is a damping factor which can be set
between 0 and 1; it is usually set to 0.85. There are more
details about d in the next section. C(A) is defined as the
number of links going out of page A. The PageRank of a page A is
given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
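Note that the PageRanks are intended to form a probability
distribution over web pages, so that the sum of all web pages'
PageRanks is one. With the formula exactly as written, however,
the values sum to the number of pages N when every page has at
least one outgoing link; replacing the (1-d) term with (1-d)/N
gives values that sum to one.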
PageRank or PR(A) can be calculated using a simple iterative algorithm,
and corresponds to the principal eigenvector of the normalized
link matrix of the web.
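The iterative calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation: the function name pagerank, the three-page example graph, the starting value of 1.0 per page, and the fixed number of iterations are all hypothetical choices made here. The sketch also assumes that every page has at least one outgoing link, so C(T) is never zero.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all pages.

    links maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    pages = list(links)
    # Hypothetical starting point: every page begins with a rank of 1.0.
    pr = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to this page.
            incoming = sum(
                pr[t] / len(links[t]) for t in pages if page in links[t]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 3))
```

In this small graph, C receives links from both A and B and ends up with the highest value, while B, which is cited only by A and shares A's vote with C, ends up with the lowest. A fixed iteration count keeps the sketch short; a practical version would more likely stop once the values change by less than some tolerance.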
|