PageRank: The In-depth Explanation

A brief History of PageRank

PageRank was the first core algorithm in Google Search and is named after Larry Page (co-founder of Google). PageRank aims at estimating the importance of a webpage on the basis of number and quality of links it receives. In short it is a link analysis algorithm.

PageRank was developed as part of a research project in 1996 by Larry Page and Sergey Brin while they were studying in Stanford University. Rajeev Motwani and Terry Winograd were co-authors of the first paper about the PageRank project.

It might come as a surprise to many people but the PageRank US patent was actually assigned to Stanford University, who gave its exclusive license rights to Google in exchange for 1.8 million Google shares, which it later sold for US$336 million in the year 2005.

Is PageRank still important?

While there are hundreds of signals Google uses to rank web pages but the importance of PageRank in SEO shouldn’t be underestimated. The main theme behind PageRank, which is link analysis is still a primary part of Google’s core ranking algorithms and would remain so for many many years to come.

Original Algorithmic definition of PageRank

PageRank as defined in the original paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine” by Sergey Brin and Lawrence Page (also co-founders of Google) is as follows:

We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.

Stanford technical report on PageRank “The PageRank Citation Ranking: Bringing Order to the Web”

The paper that provided a detailed explanation of PageRank was titled “The PageRank Citation Ranking: Bringing Order to the Web”. It was a technical report submitted by Lawrence Page, and Sergey Brin, and Rajeev Motwani, and Terry Winograd in the year 1999 to Stanford.

The paper describes PageRank as a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. In the Introductory section of the report they clearly mention that they take advantage of the link structure of the Web to produce a global “importance” ranking of every web page. This ranking, called PageRank, helps search engines and users quickly make sense of the vast heterogeneity of the World Wide Web.

The paper goes on to explain the link structure of the time. While estimates vary, the current graph of the crawlable Web has roughly 150 million nodes (pages) and 1.7 billion edges (links). Every page has some number of forward links (outedges) and backlinks (inedges). Web pages vary greatly in terms of the number of backlinks they have. For example, the Netscape home page has 62,804 backlinks in our current database compared to most pages which have just a few backlinks. Generally, highly linked pages are more “important” than pages with few links.

This is the definition of PageRank as explained in the paper. Let u be a web page. Then let Fu be the set of pages u points to and Bu be the set of pages that point to u. Let Nu = |Fu| be the number of links from u and let c be a factor used for normalization (so that the total rank of all web pages is constant). We begin by defining a simple ranking, R which is a slightly simplified version of PageRank:

Other important sections of the report includes Computing PageRank, managing dangling links, PageRank Implementation, Convergence Properties, Searching with PageRank, Personalized PageRank and Its applications.

The primary US Patent on PageRank “Method for node ranking in a linked database”

US Patent number 6,285,999 which was filed on January 9, 1998 and assigned to The Board of Trustees of the Leland Stanford Junior University (Stanford, CA) on September 4, 2001, was the primary US Patent relating to PageRank. The Patent was invented by Lawrence Page, also a co-founder of Google. The title of the patent is “Method for node ranking in a linked database”. What is claimed according to the patent are the following:

  1. A computer implemented method of scoring a plurality of linked documents, comprising: 
    • obtaining a plurality of documents, at least some of the documents being linked documents, at least some of the documents being linking documents, and at least some of the documents being both linked documents and linking documents, each of the linked documents being pointed to by a link in one or more of the linking documents; 
    • assigning a score to each of the linked documents based on scores of the one or more linking documents and
    •  processing the linked documents according to their scores.
  2. The method of claim 1, wherein the assigning includes: 
    • identifying a weighting factor for each of the linking documents, the weighting factor being dependent on the number of links to the one or more linking documents, and 
    • adjusting the score of each of the one or more linking documents based on the identified weighting factor. 
  3. The method of claim 1, wherein the assigning includes: 
    • identifying a weighting factor for each of the linking documents, the weighting factor being dependent on an estimation of a probability that a linking document will be accessed, and 
    • adjusting the score of each of the one or more linking documents based on the identified weighting factor. 
  4. The method of claim 1, wherein the assigning includes: 
    • identifying a weighting factor for each of the linking documents, the weighting factor being dependent on the URL, host, domain, author, institution, or last update time of the one or more linking documents, and 
    • adjusting the score of each of the one or more linking documents based on the identified weighting factor. 
  5. The method of claim 1, wherein the assigning includes:
    • identifying a weighting factor for each of the linking documents, the weighting factor being dependent on whether the one or more linking documents are selected documents or roots, and
    • adjusting the score of each of the one or more linking documents based on the identified weighting factor.
  6. The method of claim 1, wherein the assigning includes:
    • identifying a weighting factor for each of the linking documents, the weighting factor being dependent on the importance, visibility or textual emphasis of the links in the one or more linking documents, and
    • adjusting the score of each of the one or more linking documents based on the identified weighting factor.
  7. The method of claim 1, wherein the assigning includes:
    • identifying a weighting factor for each of the linking documents, the weighting factor being dependent on a particular user’s preferences, the rate at which users access the one or more linking documents, or the importance of the one or more linking documents, and
    • adjusting the score of each of the one or more linking documents based on the identified weighting factor.
  8. A computer implemented method of determining a score for a plurality of linked documents, comprising:
    • obtaining a plurality of linked documents;
    • selecting one of the linked documents; 
    • assigning a score to the selected document that is dependent on scores of documents that link to the selected document; and 
    • processing the linked documents according to their scores.
  9. A computer implemented method of ranking a plurality of linked documents, comprising: 
    • obtaining a plurality of documents, at least some of the documents being linked documents and at least some of the documents being linking documents, at least some of the linking documents also being linked documents, each of the linked documents being pointed to by a link in one or more of the linking documents; 
    • generating an initial estimate of a rank for each of the linked documents; 
    • updating the estimate of the rank for each of the linked documents using ranks for the one or more linking documents; and 
    • processing the linked documents according to their updated ranks. 
  10. A computer implemented method of ranking a plurality of linked documents, comprising: 
    • automatically performing a random traversal of a plurality of linked documents, the random traversal including selecting a random link to traverse in a current linked document; 
    • for each linked document that is traversed, assigning a rank to the linked document that is dependent on the number of times the linked document has been traversed; and 
    • processing the plurality of linked documents according to their rank. 
  11. The method of claim 10, wherein there is a predetermined probability that the next linked document to be traversed will be a random one according to a distribution of the plurality of linked documents. 
  12. The method of claim 1, wherein the processing includes: 
    • displaying links to the linked documents as a directory listing. 
  13. The method of claim 1, wherein the processing includes: 
    • displaying links to the linked documents, and 
    • displaying annotations representing the score of each of the linked documents. 
  14. The method of claim 13, wherein the annotations are bars, icons, or text. 
  15. The method of claim 1, further comprising: 
    • processing the linked documents based on textual matching. 
  16. The method of claim 15, wherein the textual matching includes matching anchor text associated with the links. 
  17. The method of claim 1, further comprising: 
    • processing the linked documents based on groupings of the linked documents. 
  18. A computer-readable medium that stores instructions executable by one or more processing devices to perform a method for determining scores for a plurality of linked documents, comprising: 
    • instructions for obtaining a plurality of documents, at least some of the documents being linked documents, at least some of the documents being linking documents, and at least some of the documents being both linked documents and linking documents, each of the linked documents being pointed to by a link in one or more of the linking documents; 
    • instructions for determining a score for each of the linked documents based on scores for the one or more linking documents; and 
    • instructions for processing the linked documents according to their scores. 
  19. A computer-readable medium that stores instructions executable by one or more processors to perform a method for scoring documents, comprising: 
    • instructions for searching a plurality of documents, at least some of the documents being linked documents and at least some of the documents being linking documents, at least some of the linking documents also being linked documents, each of the linked documents being pointed to by a link in one or more of the linking documents; 
    • instructions for scoring each of the linked documents based on scores for the one or more linking documents; and 
    • instructions for providing the linked documents based on their scores. 
  20. The method of claim 1, wherein the assigning a score includes: 
    • determining the score based on (1) a number of the linking documents that link to the linked document and (2) an importance of the linking documents. 
  21. The method of claim 20, wherein the importance of the linking documents is based on a number of documents that link to the linking documents. 
  22. The method of claim 1, wherein the assigning a score includes: 
    • associating one or more backlinks with each of the linked documents, each of the backlinks corresponding to one of the linking documents that links to the linked document, 
    • assigning a weight to each of the backlinks, and 
    • determining a score for each of the linked documents based on a number of backlinks for the linked document and the weights assigned to the backlinks. 
  23. The method of claim 22, wherein the processing of the linked documents includes: 
    • organizing the linked documents based on the determined scores. 
  24. The method of claim 22, wherein the assigning a weight includes: 
    • assigning different weights to at least some of the backlinks associated with at least one of the linked documents. 
  25. The method of claim 1, wherein the assigning a score includes: 
    • associating one or more backlinks with each of the linked documents, each of the backlinks corresponding to one of the linking documents that links to the linked document, 
    • assigning a weight to each of the backlinks, and 
    • determining a score for each of the linked documents based on a sum of the weights assigned to the backlinks associated with the linked document. 
  26. The method of claim 25, wherein the weights assigned to each of the backlinks are independent of text of the corresponding linking documents. 
  27. The method of claim 1, wherein the assigning a score includes: 
    • determining the score primarily based on linking information. 
  28. The method of claim 1, wherein the assigning a score includes: 
    • determining the score substantially independent of user-query content. 
  29. The method of claim 1, wherein the assigning a score includes: 
    • iteratively determining the score for a linked document, the score being primarily based on document-linking information and substantially independent of user-query content.
If you like it please share

Leave a Reply

Your email address will not be published.