Networks Reading Group: Model Selection

Clement Lee

2021-03-10 (Wed)

Background

Approach 1: Fixing \(K\) in advance

Approach 2: Inferring \(K\) as a parameter

Approach 3: Selecting via a criterion

Notation

A. Likelihood modularity

B. Complete data log-likelihood

\[ \begin{aligned} \pi(Y,Z|\theta) &= \pi(Y|Z,\theta)\times\pi(Z|\theta)\\ \log\pi(Y,Z|\theta) &= \log\pi(Y|Z,\theta)+\log\pi(Z|\theta) \end{aligned} \]
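To make the notation concrete, here is a minimal sketch of the complete-data log-likelihood for a standard undirected Bernoulli SBM without self-loops (all function and variable names are illustrative, not from the paper):

```python
import numpy as np

def complete_data_loglik(Y, Z, theta, alpha):
    """Complete-data log-likelihood log pi(Y, Z | theta) for an
    undirected Bernoulli SBM without self-loops.

    Y:     (n, n) symmetric 0/1 adjacency matrix
    Z:     (n,) block labels in {0, ..., K-1}
    theta: (K, K) symmetric block connection probabilities
    alpha: (K,) group membership probabilities
    """
    iu = np.triu_indices(len(Z), k=1)      # each dyad counted once
    p = theta[Z[iu[0]], Z[iu[1]]]          # edge probability per dyad
    y = Y[iu]
    log_y_given_z = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    log_z = np.sum(np.log(alpha[Z]))       # log pi(Z | theta)
    return log_y_given_z + log_z
```

The two summands correspond exactly to the two terms in the decomposition above.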

C. Integrated complete data log-likelihood (ICL)

\[ \begin{aligned} \log\pi(Y,Z) &= \log\int\pi(Y,Z|\theta)\pi(\theta)d\theta \end{aligned} \]

D. Approximate ICL

\[ \begin{aligned} \log\pi(Y,Z) \approx \max_{\theta}\log\pi(Y,Z|\theta)-\frac{K^2}{2}\log\left(n(n-1)\right)-\frac{K-1}{2}\log n \end{aligned} \]
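A direct transcription of this penalty (helper names are illustrative; the \(K^2/2\) term counts block parameters over the \(n(n-1)\) ordered dyads, the \((K-1)/2\) term the mixing proportions):

```python
import numpy as np

def icl_penalty(n, K):
    """Penalty part of the approximate ICL: K^2/2 block parameters
    over n(n-1) ordered dyads, (K-1)/2 mixing proportions over n nodes."""
    return (K**2 / 2) * np.log(n * (n - 1)) + (K - 1) / 2 * np.log(n)

def approx_icl(max_complete_loglik, n, K):
    # max over theta of log pi(Y, Z | theta), minus the BIC-style penalty
    return max_complete_loglik - icl_penalty(n, K)
```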

E. Observed data log-likelihood

\[ \begin{aligned} \log\pi(Y|\theta) &= \log\left(\sum_{Z}\pi(Y,Z|\theta)\right) \\ &= \log\left(\sum_{Z}\left[\pi(Y|Z,\theta)\times\pi(Z|\theta)\right]\right) \end{aligned} \]
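The sum runs over all \(K^n\) assignments, so it is only computable directly for tiny networks; a brute-force sketch makes the definition concrete (assuming a standard undirected Bernoulli SBM; names illustrative):

```python
import itertools
import numpy as np

def observed_data_loglik(Y, theta, alpha):
    """log pi(Y | theta) by brute force: sum the complete-data
    likelihood over all K^n label assignments (tiny n only)."""
    n, K = Y.shape[0], theta.shape[0]
    iu = np.triu_indices(n, k=1)
    y = Y[iu]
    terms = []
    for Z in itertools.product(range(K), repeat=n):
        Z = np.array(Z)
        p = theta[Z[iu[0]], Z[iu[1]]]
        log_y_given_z = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        log_z = np.sum(np.log(alpha[Z]))
        terms.append(log_y_given_z + log_z)
    m = max(terms)                          # log-sum-exp for stability
    return m + np.log(np.sum(np.exp(np.array(terms) - m)))
```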

F. Approximate observed data log-likelihood

\[ \begin{aligned} \log\pi(Y|\theta) &= E_Q\left[\log\pi(Y,Z|\theta) - \log Q(Z)\right] + D_{KL}\left(Q(Z)||\pi(Z|Y,\theta)\right)\\ \log\pi(Y|\theta) &\approx E_Q\left[\log\pi(Y,Z|\theta) - \log Q(Z)\right] \end{aligned} \]
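The first line is an exact identity (the ELBO plus the KL gap) and holds for any distribution \(Q(Z)\); it can be checked numerically on a toy two-node model where everything is enumerable (all values illustrative):

```python
import numpy as np

# Toy check of  log pi(Y|theta) = ELBO(Q) + KL(Q || posterior)  for any Q(Z):
# two nodes, K = 2, theta and alpha fixed at illustrative values.
theta = np.array([[0.8, 0.1], [0.1, 0.6]])
alpha = np.array([0.5, 0.5])
y12 = 1                                    # the single dyad is an edge

# complete-data likelihood pi(Y, Z | theta) over the 4 assignments
assigns = [(a, b) for a in range(2) for b in range(2)]
joint = np.array([theta[a, b] ** y12 * (1 - theta[a, b]) ** (1 - y12)
                  * alpha[a] * alpha[b] for a, b in assigns])

log_evidence = np.log(joint.sum())         # log pi(Y | theta)
posterior = joint / joint.sum()            # pi(Z | Y, theta)

Q = np.array([0.4, 0.3, 0.2, 0.1])         # an arbitrary distribution over Z
elbo = np.sum(Q * (np.log(joint) - np.log(Q)))
kl = np.sum(Q * (np.log(Q) - np.log(posterior)))

assert np.isclose(elbo + kl, log_evidence)
```

The approximation then amounts to dropping the (non-negative) KL term, so the ELBO is a lower bound on the observed-data log-likelihood.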

G. Marginal log-likelihood

\[ \begin{aligned} \log\pi(Y) &= \log\int\pi(Y|\theta)\pi(\theta)d\theta \end{aligned} \]

\[ \begin{aligned} \log\pi(Y) &= E_Q\left[\log\pi(Y,Z) - \log Q(Z)\right] + D_{KL}\left(Q(Z)||\pi(Z|Y)\right)\\ \log\pi(Y) &\approx E_Q\left[\log\pi(Y,Z) - \log Q(Z)\right] \end{aligned} \]

H. Bayesian information criterion (BIC)

Some penalties

Yet another direction

\[ \begin{aligned} \log\pi(Y|Z)&=\log\int\pi(Y|Z,\theta)\pi(\theta)d\theta\\ &\approx \sup_{\theta}\log\pi(Y|Z,\theta)-\frac{1}{2}\frac{K(K+1)}{2}\log\frac{n(n-1)}{2}\\ &\approx \sup_{\theta}\log\pi(Y|Z,\theta)-\frac{K(K+1)}{2}\log n \end{aligned} \]

\[ \begin{aligned} \frac{1}{2}\frac{K(K+1)}{2}\log\frac{n(n-1)}{2} &\approx \frac{1}{2}\frac{K(K+1)}{2}\log\frac{n^2}{2}\\ &= \frac{1}{2}\frac{K(K+1)}{2}\log n^2 - \frac{1}{2}\frac{K(K+1)}{2}\log2\\ &= \frac{K(K+1)}{2}\log n - \frac{1}{2}\frac{K(K+1)}{2}\log2 \end{aligned} \]
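Numerically, the dropped \(\log 2\) constant is negligible next to the \(\log n\) term; e.g. with illustrative \(n = 1000\), \(K = 3\):

```python
import numpy as np

n, K = 1000, 3
c = K * (K + 1) / 2                        # number of block parameters (undirected)

exact = 0.5 * c * np.log(n * (n - 1) / 2)  # the penalty before approximation
approx = c * np.log(n)                     # the leading term kept
dropped = 0.5 * c * np.log(2)              # the constant dropped

# exact equals approx - dropped, up to replacing log(n-1) by log(n)
gap = exact - (approx - dropped)
```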

Let’s go with their approximation

\[ \begin{aligned} \text{ICL} - \log\pi(Z) = \log\pi(Y,Z)-\log\pi(Z)&\approx \sup_{\theta}\log\pi(Y|Z,\theta)-\frac{K(K+1)}{2}\log n\\ \text{ICL} + \log\tau(Z_K)&\approx \sup_{\theta}\log\pi(Y|Z,\theta)-\frac{K(K+1)}{2}\log n \end{aligned} \]

Putting a prior over \(Z\) & \(K\)

\[ \begin{aligned} \log\pi(Z) &= \log\left[\tau(Z_K)\right]^{-\lambda} = \log\left(K^{-\lambda n}\right) = -\lambda n\log K \end{aligned} \]

Deriving the criterion & penalty

\[ \begin{aligned} \text{ICL} &\approx \sup_{\theta}\log\pi(Y|Z,\theta)-\frac{K(K+1)}{2}\log n + \log\pi(Z)\\ &\approx \sup_{\theta}\log\pi(Y|Z,\theta)-\frac{K(K+1)}{2}\log n - \lambda n \log K\\ &\approx \sup_{\theta}\log\pi(Y|Z,\theta)-\left(\frac{K(K+1)}{2}\log n + \lambda n \log K\right) \\ l(K) &= \max_{Z} \sup_{\theta}\log\pi(Y|Z,\theta)-\left(\frac{K(K+1)}{2}\log n + \lambda n \log K\right) \end{aligned} \]
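Putting the pieces together, a crude sketch of computing \(l(K)\): the inner maximization over \(Z\) is done here by greedy one-node relabelling from random starts (a stand-in for whatever search an actual implementation would use), and \(\sup_{\theta}\) by plugging in the block-wise MLE. Assumes an undirected Bernoulli SBM; all names illustrative:

```python
import numpy as np

def profile_loglik(Y, Z, K):
    """sup_theta log pi(Y | Z, theta) for an undirected Bernoulli SBM:
    plug in the block-wise MLE theta_ab = O_ab / n_ab."""
    ll = 0.0
    for a in range(K):
        for b in range(a, K):
            ia, ib = np.where(Z == a)[0], np.where(Z == b)[0]
            if a == b:
                n_ab = len(ia) * (len(ia) - 1) / 2
                o_ab = Y[np.ix_(ia, ia)].sum() / 2
            else:
                n_ab = len(ia) * len(ib)
                o_ab = Y[np.ix_(ia, ib)].sum()
            for cnt in (o_ab, n_ab - o_ab):  # x * log(x / n_ab), with 0 log 0 = 0
                if cnt > 0:
                    ll += cnt * np.log(cnt / n_ab)
    return ll

def criterion(Y, K, lam=1.0, n_restarts=20, rng=None):
    """l(K): penalized profile likelihood, maximizing over Z by greedy
    one-node relabelling from random starts (illustrative only)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = Y.shape[0]
    best = -np.inf
    for _ in range(n_restarts):
        Z = rng.integers(K, size=n)
        improved = True
        while improved:
            improved = False
            for i in range(n):
                for k in range(K):
                    Z2 = Z.copy(); Z2[i] = k
                    if profile_loglik(Y, Z2, K) > profile_loglik(Y, Z, K):
                        Z, improved = Z2, True
        best = max(best, profile_loglik(Y, Z, K))
    return best - (K * (K + 1) / 2 * np.log(n) + lam * n * np.log(K))
```

One would then evaluate `criterion(Y, K)` over a range of candidate \(K\) and keep the maximizer.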

Wait, they also mentioned ICL

Misspecifying \(K\)

\[ \begin{aligned} l(K^{'}) - l(K) &= \left(\max_{Z\in\left[K^{'}\right]^n}\sup_{\theta}\log\pi(Y|Z,\theta) - \log\pi(Y|Z^{*},\theta^{*})\right) \\ &\quad- \left(\max_{Z\in\left[K\right]^n}\sup_{\theta}\log\pi(Y|Z,\theta) - \log\pi(Y|Z^{*},\theta^{*})\right) \\ &\quad+ \left(\frac{K^{'}(K^{'}+1)}{2}\log n - \frac{K(K+1)}{2}\log n\right) + \left(\lambda n \log K^{'} - \lambda n \log K\right)\\ &=\text{A log-likelihood ratio with misspecified $K$}\\ &\\ &\quad - \text{A log-likelihood ratio with correct $K$ (and therefore asymptotically distributed as $\chi^2/2$)}\\ &\\ &\quad + \left(\frac{K^{'}(K^{'}+1)}{2} - \frac{K(K+1)}{2}\right)\log n + \lambda n \log\frac{K^{'}}{K} \end{aligned} \]

The theoretical results (that I skipped)

\[ \begin{aligned} \Pr\left(l(K^{'})>l(K)\right)\rightarrow 0\qquad\text{as}\quad n\rightarrow\infty \end{aligned} \]

\[ \begin{aligned} \Pr\left(l(K^{'})>l(K)\right)\rightarrow 1\qquad\text{as}\quad n\rightarrow\infty,\qquad\text{for}\quad K^{'}>K \end{aligned} \]

Degree-corrected SBM

\[ \begin{aligned} \Pr\left(l(K^{'})>l(K)\right)\rightarrow 0\qquad\text{as}\quad n\rightarrow\infty \end{aligned} \]
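For reference, the degree-corrected SBM of Karrer & Newman (2011) attaches a degree parameter to each node in a Poisson edge model; its profile log-likelihood given \(Z\) has a well-known closed form, sketched here (names illustrative):

```python
import numpy as np

def dcsbm_profile_loglik(Y, Z, K):
    """Profile log-likelihood of the Poisson degree-corrected SBM
    (Karrer & Newman, 2011) given labels Z, up to constants:
        sum_{a,b} m_ab * log( m_ab / (kappa_a * kappa_b) )
    where m_ab counts edge endpoints between groups a and b and
    kappa_a is the total degree of group a."""
    deg = Y.sum(axis=1)
    kappa = np.array([deg[Z == a].sum() for a in range(K)])
    ll = 0.0
    for a in range(K):
        for b in range(K):
            m_ab = Y[np.ix_(np.where(Z == a)[0], np.where(Z == b)[0])].sum()
            if m_ab > 0:                    # 0 log 0 = 0 convention
                ll += m_ab * np.log(m_ab / (kappa[a] * kappa[b]))
    return ll
```

Swapping this profile likelihood into the penalized criterion (with the corresponding parameter count in the penalty) gives the DC-SBM version of the selection rule.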

Simulation results

Real applications

          Hu et al. (2019)   Saldana et al. (2017)   Chen & Lei (2018)
SBM       5                  10                      3
DC-SBM    3                  1                       1

          Hu et al. (2019)   Saldana et al. (2017)   Chen & Lei (2018)
DC-SBM    2                  1                       2

Model selection between SBM & DC-SBM?

Thank you!