Lec 05 Collective Classification
1. Overview
Question : Given a graph with labels on some nodes, how do we assign labels to all other nodes in the network?
Intuition : Similar nodes are connected
Techniques : Relational Classification, Iterative Classification, Belief Propagation
Explanation of “similar nodes are connected” from a social-science perspective:
Homophily : the tendency of individuals to associate and bond with similar others
Influence : social connection can influence the individual characteristics of a person
2. Motivation for Classification
similar nodes are typically close together or directly connected in the network
classification label of node $v$ may depend on 1) features of $v$; 2) labels of the nodes in $v$’s neighborhood; 3) features of the nodes in $v$’s neighborhood
3. Problem definition
Given :
adjacency matrix $A$ over $n$ nodes
some nodes with labels $Y_i$ and some unlabeled nodes
the features of nodes $F$
Goal :
Predict the labels of the unlabeled nodes, i.e. find $P(Y_v)$ for each unlabeled node $v$ given the features and the network
4. Pipeline of Collective Classification
Hypothesis : Markov assumption, i.e. the label $Y_v$ of node $v$ depends only on the labels of its neighbors: $P(Y_v) = P(Y_v \mid N_v)$
Pipeline :
- Local Classifier : used for initial label assignment
    - predicts a node’s label using only its own features, without network information
- Relational Classifier : captures correlations between nodes
    - labels a node based on the labels and/or features of its neighbors
- Collective Inference : propagates correlations through the network (a generic sketch follows this list)
    - applies the relational classifier to each node iteratively until the inconsistency between neighboring labels is minimized
    - the outcome is affected by the network structure
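Putting the three parts together, a minimal generic sketch of this pipeline (an illustrative assumption, not the lecture’s code: `phi_local` and `phi_relational` stand for whatever classifiers are plugged in):

```python
import numpy as np

def collective_classification(X, A, y, phi_local, phi_relational, num_iters=10):
    """Generic collective-classification loop (sketch).

    X : (n, d) node feature matrix      A : (n, n) adjacency matrix
    y : length-n label array, np.nan where the label is unknown
    phi_local(X) -> initial predictions (local classifier)
    phi_relational(X, A, pred) -> updated predictions (relational classifier)
    """
    labeled = ~np.isnan(y)

    # 1) Local classifier: initial assignment from node features alone
    pred = phi_local(X).astype(float)
    pred[labeled] = y[labeled]

    # 2) + 3) Relational classifier applied round after round (collective inference)
    for _ in range(num_iters):
        pred = phi_relational(X, A, pred)
        pred[labeled] = y[labeled]  # ground-truth labels never change
    return pred
```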
5. Relational Classification
Basic Idea : the class probability $P(Y_v = c)$ of node $v$ is a weighted average of the class probabilities of its neighbors
Steps :
Initialization : set each labeled node $v$ to its ground-truth label $Y_v^*$, and each unlabeled node to $Y_v = 0.5$ (for binary classification)
Update : update all nodes (e.g. in a random order) with the following formula until convergence:
$$
P(Y_v = c) = \frac 1 {\sum_{(v,u)\in E}A_{v,u}}\sum_{(v,u)\in E}A_{v,u}\, P(Y_u=c)
$$
where $A$ is the adjacency matrix, whose entries may carry edge weight or strength information
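A minimal NumPy sketch of this update for a binary label (assumptions beyond the lecture: unlabeled nodes are marked with `np.nan`, updates are synchronous rather than in random node order, and the function name is illustrative):

```python
import numpy as np

def relational_classify(A, y, num_iters=100, tol=1e-6):
    """Probabilistic relational classifier for binary labels.

    A : (n, n) weighted adjacency matrix (symmetric for an undirected graph)
    y : length-n float array; 0.0 / 1.0 for labeled nodes, np.nan for unlabeled
    Returns P(Y_v = 1) for every node.
    """
    labeled = ~np.isnan(y)
    p = np.where(labeled, y, 0.5)             # unlabeled nodes start at 0.5

    for _ in range(num_iters):
        # weighted average of the neighbors' class probabilities
        new_p = (A @ p) / np.maximum(A.sum(axis=1), 1e-12)
        new_p[labeled] = y[labeled]           # ground-truth labels stay fixed
        if np.abs(new_p - p).max() < tol:     # convergence is not guaranteed
            return new_p
        p = new_p
    return p
```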
Challenges :
Convergence is not guaranteed
the model cannot use node feature information
6. Iterative Classification
Main Idea : classify node $v$ using its attributes (features) $f_v$ together with a summary $z_v$ of the labels of its neighbor set $N_v$
Approach :
$\phi_1(f_v)$ : predicts the label of node $v$ based on $f_v$ alone
$\phi_2(f_v,z_v)$ : predicts the label based on both $f_v$ and $z_v$
Possible choices for the summary $z_v$ :
- histogram of the number of each label in $N_v$
- most common label in $N_v$
- number of different labels in $N_v$
Steps :
- Phase 1: classify based on node attributes alone
    - train both classifiers $\phi_1, \phi_2$ on the training set, where all labels are known
- Phase 2: iterate till convergence on the test set (a sketch follows this list)
    - Initialization : set the labels with $\phi_1$
    - Iteration : update $z_v$ based on $Y_u, \forall u \in N_v$, and then update $Y_v = \phi_2(f_v, z_v)$
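A minimal sketch of the two phases (assumptions not in the lecture: logistic regression stands in for $\phi_1,\phi_2$, $z_v$ is the neighbor-label histogram, and the train/test graphs come as separate adjacency matrices):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def label_histogram(A, y, num_classes):
    """z_v: histogram of the (predicted) labels of v's neighbors."""
    Z = np.zeros((A.shape[0], num_classes))
    for c in range(num_classes):
        Z[:, c] = A @ (y == c).astype(float)  # weighted counts if A has weights
    return Z

def iterative_classification(X_train, A_train, y_train,
                             X_test, A_test, num_classes=2, num_iters=10):
    # Phase 1: train phi_1 (features only) and phi_2 (features + summary)
    # on a fully labeled training graph
    phi1 = LogisticRegression().fit(X_train, y_train)
    Z_train = label_histogram(A_train, y_train, num_classes)
    phi2 = LogisticRegression().fit(np.hstack([X_train, Z_train]), y_train)

    # Phase 2: bootstrap the test labels with phi_1, then iterate phi_2
    y_test = phi1.predict(X_test)
    for _ in range(num_iters):
        Z_test = label_histogram(A_test, y_test, num_classes)
        y_new = phi2.predict(np.hstack([X_test, Z_test]))
        if np.array_equal(y_new, y_test):     # convergence is not guaranteed
            break
        y_test = y_new
    return y_test
```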
Challenges :
Convergence is not guaranteed
7. Loopy Belief Propagation
Notation
- Label-label potential matrix $\psi$ : $\psi(Y_i,Y_j)$ is proportional to the probability of node $j$ being in class $Y_j$ given that it has a neighbor $i$ in class $Y_i$
- Prior belief $\phi$ : $\phi_i(Y_i)$ is proportional to the probability of node $i$ being in class $Y_i$
- $m_{i\rightarrow j}(Y_j)$ : $i$’s message / estimate that node $j$ is in class $Y_j$
- $\mathcal L$ : set of all classes / labels
Steps
Initialize all messages to 1
Repeat: each node $i$ sends a message to each neighbor $j$ via
$$
m_{i\rightarrow j}(Y_j) = \sum_{Y_i\in \mathcal L} \psi(Y_i,Y_j)\,\phi_i(Y_i) \prod_{k\in N_i\setminus\{j\}} m_{k\rightarrow i}(Y_i),\quad\forall Y_j \in \mathcal L
$$
Result
After convergence, compute the belief $b_i(Y_i)$, i.e. node $i$’s belief of being in class $Y_i$:
$$
b_i(Y_i) = \phi_i(Y_i)\prod_{j \in N_i}m_{j\rightarrow i}(Y_i),\quad\forall Y_i \in \mathcal L
$$
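A compact NumPy sketch of synchronous loopy BP under these formulas (one assumption beyond the lecture: messages are normalized each round to avoid numerical under/overflow, which does not change the beliefs after their own normalization):

```python
import numpy as np

def loopy_bp(edges, n, psi, phi, num_iters=20):
    """Synchronous loopy belief propagation on an undirected graph.

    edges : list of undirected edges (i, j)
    psi   : (C, C) label-label potential matrix psi[y_i, y_j]
    phi   : (n, C) prior beliefs phi[i, y_i]
    Returns the (n, C) matrix of normalized beliefs b_i(Y_i).
    """
    C = psi.shape[0]
    nbrs = [[] for _ in range(n)]
    msgs = {}                                 # directed messages m[(i, j)][y_j]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
        msgs[(i, j)] = np.ones(C)             # initialize all messages to 1
        msgs[(j, i)] = np.ones(C)

    for _ in range(num_iters):
        new_msgs = {}
        for (i, j) in msgs:
            # phi_i times all messages into i, except the one coming from j
            prod = phi[i].copy()
            for k in nbrs[i]:
                if k != j:
                    prod *= msgs[(k, i)]
            m = psi.T @ prod                  # sum over Y_i of psi(Y_i, Y_j) * prod(Y_i)
            new_msgs[(i, j)] = m / m.sum()    # normalization for stability
        msgs = new_msgs

    beliefs = phi.copy()                      # b_i = phi_i * product of incoming messages
    for i in range(n):
        for k in nbrs[i]:
            beliefs[i] *= msgs[(k, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```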
Advantages
- easy to program and parallelize
- can be applied to any graphical model with any form of potentials (e.g. higher order than pairwise)
Challenges
- convergence is not guaranteed, especially in graphs with many closed loops