Lec 03 Node Embedding
1. Overview
Goal : efficient, task-independent feature learning for machine learning on graphs
Why Embeddings ?
- similarity in the embedding space reflects similarity in the original network
- embeddings encode network information
- embeddings can be used in many downstream tasks
2. Encoder and Decoder
Learning Node Embeddings
- The encoder maps nodes to embeddings
- Define a similarity function in the original network
- The decoder maps pairs of embeddings to a similarity score
- Optimize the parameters of the encoder so that similarity in the embedding space matches similarity in the original network
3. Shallow Encoding : Embedding lookup
$$
\mathrm{Enc}(v_i) = z_{v_i} = Z \cdot v_i \\
(v_i)_j = \left\{ \begin{array}{rcl}
0, & i \ne j \\
1, & i = j
\end{array} \right. \\
Z \in \mathbb{R}^{d\times |V|}
$$
Weakness : a large number of parameters must be optimized, since the embedding matrix $Z$ has $d \times |V|$ entries (one column per node)
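A minimal numpy sketch of the lookup encoder together with the dot-product decoder from section 2 (the matrix sizes and random initialization are purely illustrative):

```python
import numpy as np

num_nodes, dim = 5, 3                    # illustrative sizes
Z = np.random.rand(dim, num_nodes)       # embedding matrix: one d-dimensional column per node

def encode(i):
    """Enc(v_i) = Z @ v_i, where v_i is the one-hot indicator of node i."""
    v = np.zeros(num_nodes)
    v[i] = 1.0
    return Z @ v                         # identical to the column lookup Z[:, i]

def similarity(i, j):
    """Decoder: dot product between the two node embeddings."""
    return encode(i) @ encode(j)
```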
4. Random Walk : Overview
Random walk embedding algorithm
1. Run short fixed-length random walks starting from each node $u$ with strategy $R$
2. Collect $N_R(u)$ for each node $u$, where $N_R(u)$ is the multiset of all nodes visited on the walks from step 1
3. Optimize the embeddings by gradient descent on the loss function
$$
\arg \min_{z_u, u\in V} \sum_{u \in V} \sum_{v \in N_R(u)} - \log \frac{\exp(z_u^Tz_v)}{\sum_{n\in V} \exp(z_u^Tz_n)}
$$
Weakness : expensive to compute
The softmax normalization runs over all nodes, so evaluating the loss takes $O(|V|^2)$ time, which is prohibitively expensive for large graphs
Solution : Negative Sampling
To avoid the expensive normalization over all nodes, approximate the log-probability term as follows
$$
\log \frac{\exp(z_u^Tz_v)}{\sum_{n\in V} \exp(z_u^Tz_n)} \approx \log (\sigma(z_u^Tz_v)) - \sum_{i=1}^k \log(\sigma(z_u^Tz_{n_i})) \\
\sigma(x) = \frac{1}{1+\exp(-x)}
$$
In the formula, $n_i$ denotes a negative node sampled from a random background distribution over nodes (in practice, proportional to node degree)
Empirically, negative sampling performs well with $k$ in the range $[5,20]$
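A minimal numpy sketch of the negative-sampled term for a single pair $(u,v)$ with $v \in N_R(u)$, following the approximation above; how the $k$ negatives are drawn (e.g., proportional to degree) is left outside the function:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_log_prob(Z, u, v, negatives):
    """Approximate log P(v | u) with negative sampling, as in the formula above.

    Z         : |V| x d matrix of node embeddings (one row per node)
    u, v      : node u and one of its random-walk neighbours v in N_R(u)
    negatives : k node indices sampled from the background distribution
    """
    pos = np.log(sigmoid(Z[u] @ Z[v]))
    neg = sum(np.log(sigmoid(Z[u] @ Z[n])) for n in negatives)
    return pos - neg   # the training loss sums -1 times this over all (u, v) pairs
```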
5. The Random Walk Strategy
5.1 DeepWalk
Just run fixed-length, unbiased random walks
Weakness : the resulting notion of node similarity is fixed and too constrained
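A minimal sketch of the fixed-length, unbiased walks DeepWalk relies on, assuming the graph is given as an adjacency-list dict:

```python
import random

def random_walk(graph, start, length):
    """Fixed-length, unbiased random walk starting at `start`.

    graph: dict mapping each node to a list of its neighbours.
    """
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# N_R(u) from section 4 is then the multiset of nodes visited on several such walks from u.
```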
5.2 Node2Vec
Idea : use flexible, biased random walks that can trade off between local and global views of the network; local neighborhoods are explored in a BFS-like manner, global ones in a DFS-like manner
Hyperparameters
- Return parameter $p$ : controls the probability of returning to the previous node
- In-out parameter $q$ : controls whether the walk behaves more like BFS (staying close) or DFS (moving outward)
Example for node2vec
For the graph below, assume the previous node is $A$ and the current node is $B$
(Figure : example graph in which the current node $B$ has neighbors $A$, $C$, $E$, and $F$, and the walk has just arrived at $B$ from $A$)
The next node is chosen by sampling from the following distribution :
$$
\text{next node} = \left\{\begin{array}{rcl}
A, & \text{prob} = 1 / p \\
C, & \text{prob} = 1 / q \\
E, & \text{prob} = 1 \\
F, & \text{prob} = 1
\end{array}\right.
$$
Note : these probabilities are not normalized
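A sketch of one biased node2vec step that uses exactly these unnormalized weights; the dict-of-sets graph representation and the concrete edge set in the usage comment are assumptions chosen to be consistent with the example above:

```python
import random

def node2vec_step(graph, prev, curr, p, q):
    """Sample the next node of a node2vec walk given the previous and current node."""
    neighbours = list(graph[curr])
    weights = []
    for nxt in neighbours:
        if nxt == prev:               # distance 0 from prev: return to the previous node
            weights.append(1.0 / p)
        elif nxt in graph[prev]:      # distance 1 from prev: local, BFS-like move
            weights.append(1.0)
        else:                         # distance 2 from prev: outward, DFS-like move
            weights.append(1.0 / q)
    # random.choices normalizes the unnormalized weights internally
    return random.choices(neighbours, weights=weights, k=1)[0]

# An edge set consistent with the example above (assumption):
# graph = {"A": {"B", "E", "F"}, "B": {"A", "C", "E", "F"},
#          "C": {"B"}, "E": {"A", "B"}, "F": {"A", "B"}}
# node2vec_step(graph, prev="A", curr="B", p=p, q=q) returns A, C, E or F
# with the (normalized) probabilities listed above.
```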
6. Entire Graph Embedding
6.1 Sum or average
algorithm :
- Run a standard node embedding technique on the graph $G$
- Sum or average all node embeddings in $G$ to obtain $z_G$ (see the sketch below)
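A minimal sketch of this aggregation, assuming the node embeddings are given as a $|V| \times d$ numpy array:

```python
import numpy as np

def graph_embedding(node_embeddings, mode="mean"):
    """z_G as the sum or the average of all node embeddings in the graph."""
    Z = np.asarray(node_embeddings)          # shape (|V|, d)
    return Z.sum(axis=0) if mode == "sum" else Z.mean(axis=0)
```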
6.2 Virtual Node
algorithm :
- Introduce a virtual node that represents the entire graph and connect it by edges to (some or all) nodes in the graph
- Run a standard node embedding technique and use the virtual node's embedding as the graph embedding
6.3 Anonymous Walk Embedding : v1
algorithm :
- Run random walks on the graph $G$
- Anonymization : in each walk, replace every node with the index of the position at which that node first occurs in the walk
- Count how often each distinct anonymous walk appears among the anonymized walks from the previous step
- The embedding entry $Z_G[i]$ is the empirical probability of the $i$-th anonymous walk (see the sketch below)
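A small sketch of the anonymization and counting steps; walks are passed as node sequences, and returning a dict keyed by anonymous walk (instead of a fixed-dimension vector $Z_G$) is a simplification:

```python
from collections import Counter

def anonymize(walk):
    """Replace each node by the index of the position of its first occurrence."""
    first_seen = {}
    anon = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen) + 1
        anon.append(first_seen[node])
    return tuple(anon)

# e.g. both ("A", "B", "C", "B", "C") and ("C", "D", "B", "D", "B") map to (1, 2, 3, 2, 3)

def walk_distribution(walks):
    """Z_G: empirical probability of each anonymous walk among the sampled walks."""
    counts = Counter(anonymize(w) for w in walks)
    total = sum(counts.values())
    return {anon_walk: c / total for anon_walk, c in counts.items()}
```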
Dimension of the embedding $Z_G$ :
The number of distinct anonymous walks is determined by the walk length, so the dimension of the embedding depends on the walk length as well
How many random walks m do we need?
We want the estimated distribution to have error of more than $\varepsilon$ with probability less than $\delta$, which requires
$$
m = \left\lceil \frac{2}{\varepsilon^2}\left(\log(2^n-2)-\log(\delta)\right) \right\rceil
$$
where $n$ is the number of distinct anonymous walks of the chosen walk length
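As a quick sanity check of the formula, a small computation; the example values ($n = 877$ distinct anonymous walks of length 7, $\varepsilon = 0.1$, $\delta = 0.01$) are illustrative:

```python
import math

def num_walks_needed(n, eps, delta):
    """m = ceil( (2 / eps^2) * (log(2^n - 2) - log(delta)) )."""
    return math.ceil(2 / eps**2 * (math.log(2**n - 2) - math.log(delta)))

# e.g. n = 877 (anonymous walks of length 7), eps = 0.1, delta = 0.01  ->  m = 122500
print(num_walks_needed(877, 0.1, 0.01))
```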
6.4 Anonymous Walk Embedding : v2
Notation :
$\eta$ : the number of anonymous-walk types for which embeddings are learned (the set $Z$ below has $\eta$ elements, and the softmax in the objective normalizes over them)
$Z$ : the set of walk embeddings, where $z_i$ is the embedding of the $i$-th type of anonymous walk
$\Delta$ : the window size
$z_G$ : the embedding of the entire graph $G$
$w_{i,j}$ : the $j$-th anonymous walk in the series sampled starting from node $i$
algorithm :
Learn the embeddings of anonymous walks $Z = \{z_i \mid i=1,2,\dots,\eta\}$ as well as $z_G$
a) For each node $i$, sample $T$ anonymous walks starting from it, obtaining the sequence $[w_{i1},w_{i2},\dots,w_{iT}]$
b) For a window of size $\Delta$, learn to predict each walk from the walks that co-occur with it in the window (together with $z_G$), by optimizing
$$
\arg \min_{Z,z_G,b,U} \sum_{i \in G}\sum_{t=\Delta}^{T-\Delta} -\log P(w_{i,t} \mid \{w_{i,t-\Delta},w_{i,t-\Delta+1},\dots,w_{i,t+\Delta},z_G\}) \\
P(w_{i,t} \mid \{w_{i,t-\Delta},w_{i,t-\Delta+1},\dots,w_{i,t+\Delta},z_G\}) = \frac{\exp(y(w_{i,t}))}{\sum_{j=1}^\eta \exp(y(w_j))} \\
y(w_{i,t}) = b + U \cdot \mathrm{cat}\left(\frac{1}{2\Delta}\sum_{j=-\Delta}^{\Delta} z_{w_{i,t+j}},\ z_G\right)
$$
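One possible numpy reading of the scoring function $y(\cdot)$ and the per-window loss term; the shapes of $U$ and $b$ (one output row and bias entry per anonymous-walk type) are assumptions, since the notes do not spell them out:

```python
import numpy as np

rng = np.random.default_rng(0)

eta, d = 5, 8                         # number of anonymous-walk types, embedding dimension
Z = rng.normal(size=(eta, d))         # z_i: embedding of the i-th anonymous-walk type
z_G = rng.normal(size=d)              # graph embedding, learned jointly
U = rng.normal(size=(eta, 2 * d))     # output weights (assumption: one row per walk type)
b = rng.normal(size=eta)              # output bias (assumption: one entry per walk type)

def window_loss(target, window):
    """-log P(w_{i,t} | co-occurring walks, z_G) for one window position.

    target : index of the walk type to predict, w_{i,t}
    window : indices of the 2*Delta walk types surrounding position t
    """
    ctx = Z[window].mean(axis=0)                    # (1 / 2Delta) * sum of window embeddings
    feat = np.concatenate([ctx, z_G])               # cat(average context embedding, z_G)
    y = b + U @ feat                                # y(w_j) for every walk type j
    return -(y[target] - np.log(np.exp(y).sum()))   # negative log-softmax of the target
```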
7. How to use Embeddings
Clustering on the points $z_i$
- community detection
Node classification based on the point $z_i$
- predict the label of node $i$ from its embedding
Link prediction : predict edge $(i,j)$ based on $(z_i,z_j)$, combining the two embeddings with one of the operators below (a sketch follows the list)
- concatenate : $[z_i,z_j]$
- Hadamard : $z_i * z_j$ (element-wise product)
- Sum / Avg : $z_i + z_j$
- Distance : $||z_i-z_j||_2$
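A small numpy sketch of the four operators listed above:

```python
import numpy as np

def edge_features(z_i, z_j, mode="hadamard"):
    """Combine two node embeddings into a feature for predicting the edge (i, j)."""
    z_i, z_j = np.asarray(z_i), np.asarray(z_j)
    if mode == "concat":
        return np.concatenate([z_i, z_j])
    if mode == "hadamard":
        return z_i * z_j                      # element-wise product
    if mode == "sum":
        return z_i + z_j                      # or (z_i + z_j) / 2 for the average
    if mode == "distance":
        return np.linalg.norm(z_i - z_j)      # L2 distance, a single scalar feature
    raise ValueError(f"unknown mode: {mode}")
```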
Graph classification by $z_G$