Theoretical Computer Science
ds.algorithms advice-request formal-modeling ni.networking-internet
Updated Fri, 20 May 2022 17:59:08 GMT

PageRank for a Non-Random Searcher


I'm looking to adapt the PageRank algorithm as a centrality measure in a network. This network however, unlike the "random surfer" of the original paper on PageRank, or the random library browser for Eigenfactor.org, doesn't have a random browser who can leave and jump off to some other network. The theoretical reader is reading only this literature, and reading it completely.

As I understand it, the damping factor in the usual implementation of PageRank is 1 - probability of the random surfer jumping to a different site, and is usually set at 0.85. Is it reasonable then, in an entirely closed network, to set this value = 1.0, or is there something I'm not seeing?

Some details of the network, which would probably be helpful:

All the networks I will be looking at are fairly small, less than 1000 nodes, and directed. They're citation networks - with papers as nodes and edges as citations between papers, so inherently there are no isolated nodes not connected to any other nodes, as their inclusion in the network is conditional on there being a link to or from the network. There's no reason to believe the network is strongly connected - indeed, I'm pretty sure they're inherently not.




Solution

I'm not familiar at all with details of the PageRank theory, but here's an intuitive answer: Suppose you have a huge connected graph plus a single isolated vertex that you wish to reach. Without random jumps there's no hope to stop surfing. Does the algorithm exclude such bad instances? More generally if the graph is disconnected the jumps would be necessary to reach every vertex.





Comments (2)

  • +0 – If a vertex has no incoming edges (is a rank source), then it will be unreachable and have 0 rank. If a vertex has no outgoing edges (is a rank sink), then rank will not be conserved in subsequent iterations. I.e., a vertex being weakly (but not strongly) connected causes problems; it doesn't have to be isolated. — Aug 15, 2011 at 17:03  
  • +0 – There are a number of unreachable "rank source" nodes in the network, but why is it necessarily bad for them to have 0 rank. If what you're interested in is the relative scale of "A is more important than B", why do nodes need to have a positive rank? There's a single rank sink, but I'm less worried about it's rank being stable, as its an artificial construct that's there to let you calculate betweenness centrality. — Aug 15, 2011 at 21:02