GNN Notes : LEC 04

1. PageRank

main idea

Regard links as votes : Page is more important if it has more links

Different Importance of links : links from important links are more important

Combination : A point is important if it is pointed by other important pages

Formula

Common Formulation

$r_i$ : the $i$-th node’s rank or score

$d_i$ : the out-degree of $i$-th node
$$
r_j = \sum_{i->j}\frac{r_i}{d_i}
$$
Matrix Formulation

$A_{ij}$ : equal to 1 if edge from node $i$ to $j$ exists, else 0

$sum_by_row(A)$ : a vector means the out-degrees of nodes in graph

Stochastic adjacency matrix M : if $j->i$, $M_{ij} = 1 / d_j$

$M = (A / sum_by_row(A))^T$

so that $r = M \cdot r$

Eigenvector Formulation

we could regard $r = M \cdot r$ as $1\cdot r = M \cdot r$, so that $r$ would be the eigenvector with the eigenvalue 1

2. How to solve PageRank ?

Power Iteration

algorithm :

  1. assign each node an initial page rank

  2. repeat until convergence $\sum_i|r_i^{t+1} - r_i^t| <= \varepsilon \Leftrightarrow |r^{t+1}-r^t|_1 <=varepsilon$:
    $$
    \begin{align}
    &r_j^{t+1} = \sum_{i->j} \frac{r_i^t}{d_i}\
    \Leftrightarrow &r = M\cdot r
    \end{align}
    $$

Problem : converge

problem in converge :

dead ends problem : no out-links

Spider traps : all out-links within the group

solution : teleport

At each time step, with the probability $\beta$, follow a link at random; with the probability $1-\beta$, jump to a random node

commonly, $0.8 < \beta <0.9$

Especially, when surfer arrive at a dead end, it has $1.0$ probability to jump to a random node

Formulation as follow (assume that $\sum_{i} r_i = 1$):
$$
\begin{align}
&r_j=\sum_{i\rightarrow j}\beta\frac{r_i}{d_i} + \frac1N(1-\beta)\
\Leftrightarrow &P = \beta \cdot M + \frac1N(1-\beta)
\end{align}
$$

3. Personalized PageRank and Random Walks with Restarts

PageRank, PPR and RWR

difference : the points could be teleported to

PageRank : teleport to anywhere on the graph

PPR : teleport to only a subset of graph

RWR : always teleport to the staring point

Random Walks with Restarts

Idea :

  1. Every node has some importance
  2. Importance gets evenly split among all edges and pushed to the neighbors

algorithm :

  1. Given a set of query_nodes $Q$, randomly select a node $q_i$ and make a random walk
  2. make a step to a random neighbor and record visit or with $1 -\beta$ probability to restart from $q_j \in Q$
  3. the nodes with the highest visit count have highest proximity to $Q$
1
2
3
4
5
6
item = Q.sample_by_weight()
for i in range(N_STEPS):
item = item.get_random_neighbor()
item.visited_time += 1
if random >= beta :
item = Q.sample_by_weight()

considers :

  • multiple connections
  • multiple paths
  • directed and undirected connection
  • degree of node

4. Matrix Factorization

Embedding and Matrix Factorization

Given : measure the similarity of nodes with adjacency matrix $A$

then : $Z^TZ \approx A$

so that : $\arg \min_Z ||A-Z^TZ||_2$

Exactly : we make matrix factorization of adjacency matrix $A$