2017-03-24

https://digitalcivics.io

A computing research lab in

  • Human-computer interaction
  • Social and ubiquitous computing

Themes:

  • Health and social care
  • Education
  • Politics

Learning Circle

Online learning platform

Different from Massive Open Online Courses (MOOCs)

  • passive learning

Students learn from interaction with each other and mentors/experts

  • active learning
  • the line between student and teacher becomes less clear

Connecting Classes

  • Just like an ordinary class, but students take notes via Twitter or write blogs
  • #CClasses and class-specific hashtags are used to extract and collect data
  • Students can interact on the social network with each other, as well as with people outside the classroom, e.g. experts in the subject
  • Expand the learning space beyond the physical classroom

Social network analysis

Feed Finder

  • An app for finding breastfeeding-friendly places
  • User review system

  • How about other ideas that can be turned into a geolocation-based app?
  • What if we let the community decide what app to develop?

App Movement

Support/sharing stage

  • Start a movement, and share to friends on social network
  • Potential users support and share

Design stage

  • Use the templates provided by the developers
  • Vote on the design aspects of the app

Launch stage

  • Similar to Feed Finder, users search for places and give reviews

App Movement

Not all movements reach the target support and get launched

Some successful apps

The sharing of a movement is like spreading an "epidemic" on the social network

This motivates us to develop a network epidemic model for the sharing data

Epidemic models

Susceptible-Infected-Recovered (SIR) model

Epidemic models

Compartment models

  • Each individual is in exactly one stage (compartment) at any time
  • Individuals may move from one stage to another

Classical assumptions governing the dynamics

  • Markovian
  • Homogeneous mixing

See e.g. Andersson and Britton (2000)

Epidemic models

Modelling heterogeneity

Multiple levels of mixing, e.g. Ball, Mollison, and Scalia-Tomba (1997); Britton, Kypraios, and O’Neill (2011)

Not quite applicable to our data

  • No information on such levels in the social network

Epidemic models

Network modelling

Previous work

  • Britton and O’Neill (2002)
  • Neal and Roberts (2005)
  • Focus on inference algorithms

Based on the Bernoulli random graph (BRG)

  • \(\Pr\)(any two individuals are connected) = \(p\)
  • This probability is independent of all other connections
  • Not quite realistic for social networks

Network models

Barabási and Albert (1999): Preferential attachment (PA) model

  • Nodes join the network sequentially, each bringing \(\mu\) new edges
  • Each existing node receives a new edge with probability proportional to its current degree
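A minimal Python sketch of this growth rule, assuming the graph starts from the single edge 0 \(\leftrightarrow\) 1 and that a new node with fewer than \(\mu\) predecessors simply links to all of them (both are assumptions of the sketch; `simulate_pa` is a hypothetical name). The \(\mu\) targets are drawn one at a time with probability proportional to current degree, without replacement, so a newcomer never sends two edges to the same node.

```python
import random

def simulate_pa(m, mu, seed=None):
    """Grow a preferential-attachment (PA) graph on nodes 0..m-1 (adjacency sets).

    Assumptions of this sketch: start from the single edge 0 <-> 1; a new node
    with fewer than mu predecessors links to all of them.
    """
    rng = random.Random(seed)
    adj = [set() for _ in range(m)]
    adj[0].add(1)
    adj[1].add(0)
    for i in range(2, m):
        candidates = list(range(i))
        weights = [len(adj[j]) for j in candidates]  # current degrees
        targets = []
        for _ in range(min(mu, i)):
            # one weighted draw, then drop the winner: sampling without replacement
            k = rng.choices(range(len(candidates)), weights=weights)[0]
            targets.append(candidates.pop(k))
            weights.pop(k)
        for j in targets:  # degrees change only after node i has picked all targets
            adj[i].add(j)
            adj[j].add(i)
    return adj

adj = simulate_pa(1000, 3, seed=42)
degrees = sorted((len(a) for a in adj), reverse=True)
print(degrees[:10])  # the largest degrees in this simulated network
```

Comparing the largest degrees with the bulk of the nodes (most of which keep only a few edges) gives a quick feel for the heavy-tailed degree distribution discussed next.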


Network models

Power-law degree distribution

Network models

Small path lengths

  • Six degrees of separation

High clustering

  • Three typical nodes \(A,B\) and \(C\)
  • Assume \(A\leftrightarrow B\) and \(B\leftrightarrow C\)
  • Clustering coefficient: how likely it is that \(A\leftrightarrow C\)
  • BRG: each pair is connected independently of all other pairs a priori, so the clustering coefficient is just \(p\)
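Because the three pairs are independent in a BRG, \(\Pr(A\leftrightarrow C\mid A\leftrightarrow B,\,B\leftrightarrow C)=p\). The Python sketch below checks this empirically. `brg` and `transitivity` are hypothetical helpers, and "clustering coefficient" is taken here to mean the global version (the fraction of connected triples A-B-C that are closed by an edge A-C), which is one of several common definitions.

```python
import random
from itertools import combinations

def brg(m, p, seed=None):
    """Bernoulli random graph: each of the m(m-1)/2 pairs is an edge independently with prob. p."""
    rng = random.Random(seed)
    adj = [set() for _ in range(m)]
    for a, b in combinations(range(m), 2):
        if rng.random() < p:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def transitivity(adj):
    """Global clustering coefficient: share of paths A-B-C (centred at B) closed by an edge A-C."""
    closed = paths = 0
    for b in range(len(adj)):
        for a, c in combinations(sorted(adj[b]), 2):
            paths += 1
            closed += c in adj[a]
    return closed / paths if paths else 0.0

print(transitivity(brg(200, 0.1, seed=0)))  # expected to be near p = 0.1
```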

Network models

Epidemic modelling

Compartment models represented by ordinary differential equations

  • \(\displaystyle\frac{dS}{dt}=-\beta IS, \quad\qquad\frac{dI}{dt}=\beta IS-\gamma I, \quad\qquad\frac{dR}{dt}=\gamma I\)
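A forward-Euler sketch of these equations in Python. It assumes \(S\), \(I\) and \(R\) are population proportions (the slide leaves the scaling implicit), and the parameter values are purely illustrative. Since the three derivatives sum to zero, \(S+I+R\) is conserved up to rounding.

```python
def sir_euler(beta, gamma, s0, i0, t_max, dt=0.01):
    """Forward-Euler solution of dS/dt = -beta*I*S, dI/dt = beta*I*S - gamma*I, dR/dt = gamma*I.

    S, I, R are treated as proportions (an assumption); returns a list of (t, S, I, R).
    """
    s, i, r = s0, i0, 0.0
    path = [(0.0, s, i, r)]
    for step in range(1, int(round(t_max / dt)) + 1):
        new_inf = beta * i * s * dt   # flow S -> I during this step
        new_rec = gamma * i * dt      # flow I -> R during this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        path.append((step * dt, s, i, r))
    return path

path = sir_euler(beta=0.5, gamma=0.1, s0=0.99, i0=0.01, t_max=100.0)
print(path[-1])  # (t, S, I, R) at the final time
```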

Incorporate network aspects as covariates

  • E.g. the same parameter values for nodes with the same degree
  • Comprehensive review by Pastor-Satorras et al. (2015)

Quite remote from the statistical models introduced earlier

Ingredients

Population: \(m\) nodes/individuals

The epidemic times \(\boldsymbol{I}=(I_1,I_2,\ldots,I_m)\)

  • Susceptible-Infected (SI) model
  • Markovian, constant infection rate \(\beta\)

The underlying graph/network \(\boldsymbol{G}=\{G_{ij}\}_{m\times m}\)

  • PA model, parameter \(\mu\)
  • \(G_{ij}=1\) if \(i\leftrightarrow j\) for \(i\neq j\), \(0\) otherwise

The transmission tree \(\boldsymbol{P}=\{P_{ij}\}_{m\times m}\)

  • \(P_{ij}=1\) if node \(j\) is infected by \(i\), \(0\) otherwise

Model

Step 1: Obtain new edges for the nodes

Index the nodes by their order of entering the network

One edge to start with: nodes \(1\leftrightarrow2\)

Node \(i~(>2)\) brings in \(x_i\) new edges

  • A constant \(x_i=\mu\), as in the original PA model, is too restrictive
  • \(x_i=y_i \wedge (i-1)=\min(y_i,i-1)\), where \(y_i\sim\) Poisson\((\mu)\), since node \(i\) can connect to at most \(i-1\) existing nodes
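A sketch of this sampling step in Python. Python's standard library has no Poisson sampler, so the sketch assumes Knuth's multiplication method, which is adequate for small \(\mu\); `poisson` and `sample_new_edge_counts` are hypothetical names. Nodes are labelled \(3,\ldots,m\) as on the slide.

```python
import math
import random

def poisson(mu, rng):
    """Knuth's multiplication method for one Poisson(mu) draw (fine for small mu)."""
    limit, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def sample_new_edge_counts(m, mu, seed=None):
    """x_i = min(y_i, i - 1) with y_i ~ Poisson(mu), for nodes i = 3..m (1-based labels)."""
    rng = random.Random(seed)
    return {i: min(poisson(mu, rng), i - 1) for i in range(3, m + 1)}

print(sample_new_edge_counts(8, 3.0, seed=1))  # one draw of x_3, ..., x_8 with m = 8, mu = 3
```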

Model

Step 1: Obtain new edges for the nodes

Toy example: \(m=8,\mu=3\)

  • \(x_3=2,~x_4=3,~x_5=4,~x_6=2,~x_7=3,~x_8=4\)
## [1] "G"
##                     
## [1,] . 1 1 1 1 . 1 1
## [2,] 1 . 1 1 1 . . 1
## [3,] 1 1 . 1 1 . 1 .
## [4,] 1 1 1 . 1 1 . 1
## [5,] 1 1 1 1 . 1 1 1
## [6,] . . . 1 1 . . .
## [7,] 1 . 1 . 1 . . .
## [8,] 1 1 . 1 1 . . .

Relationship between \(x_i\) and \(G_{ij}\)

  • For \(i>2\), \(x_i=\sum_{j=1}^{i-1}G_{ij}\)
  • \(x_i\) is the sum of row \(i\) of \(\boldsymbol{G}\) to the left of the main diagonal (equivalently, by symmetry, the sum of column \(i\) above it)
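The relationship can be checked on the toy example by transcribing the printed matrix (reading "." as 0) and summing each row to the left of the diagonal. This Python snippet is illustrative only; `new_edge_counts` is a hypothetical name.

```python
# Toy adjacency matrix from the slide, nodes 1..8 in network order ("." read as 0)
G = [
    [0, 1, 1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
]

def new_edge_counts(G):
    """x_i = sum_{j<i} G_ij for nodes i = 3..m, keyed by 1-based label."""
    return {i + 1: sum(G[i][:i]) for i in range(2, len(G))}

print(new_edge_counts(G))  # {3: 2, 4: 3, 5: 4, 6: 2, 7: 3, 8: 4}
```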

Model

Step 1: Obtain new edges for the nodes

Likelihood: \[ \begin{aligned} L_1(\boldsymbol{G};\mu) &=\prod_{i=3}^{m} \left[\frac{e^{-\mu}\mu^{x_i}}{x_i!}\right]^{\boldsymbol{1}\{0~\leq~x_i~<~i-1\}} \left[\sum_{z=i-1}^\infty\frac{e^{-\mu}\mu^{z}}{z!}\right]^{\boldsymbol{1}\{x_i~=~i-1\}}\\ &=\frac{e^{-\mu(m-2)}}{\prod_{i=3}^{m}(x_i!)}\prod_{i=3}^{m} \mu^{\left[\sum_{j=1}^{i-1}G_{ij}\boldsymbol{1}\left\{0\leq\sum_{j=1}^{i-1}G_{ij}<i-1\right\}\right]}\\ &\quad\times\prod_{i=3}^{m}\left[(i-1)!\sum_{z=i-1}^{\infty}\frac{\mu^z}{z!}\right]^{\boldsymbol{1}\left\{\sum_{j=1}^{i-1}G_{ij}=i-1\right\}} \end{aligned} \]
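A Python sketch of \(\log L_1(\boldsymbol{G};\mu)\), computing \(x_i\) from the lower triangle of \(\boldsymbol{G}\) and switching to the truncated tail when \(x_i=i-1\). It assumes \(\mu>0\). The hypothetical helper `log_poisson_tail` computes \(\Pr(y_i\geq i-1)\) as one minus the lower sum when \(i-1\leq\mu\) (the tail is then not small) and by summing the upper tail otherwise; that split is a numerical choice of this sketch, not part of the model.

```python
import math

def log_poisson_tail(k, mu):
    """log P(Y >= k) for Y ~ Poisson(mu), mu > 0."""
    if k <= mu:  # tail is not small, so the complement of the lower sum is accurate
        below = sum(math.exp(-mu + z * math.log(mu) - math.lgamma(z + 1)) for z in range(k))
        return math.log(1.0 - below)
    # k > mu: tail = pmf(k) * (1 + mu/(k+1) + mu^2/((k+1)(k+2)) + ...), all ratios < 1
    series, term, z = 1.0, 1.0, k
    while term > 1e-17 * series:
        z += 1
        term *= mu / z
        series += term
    return -mu + k * math.log(mu) - math.lgamma(k + 1) + math.log(series)

def log_L1(G, mu):
    """log L_1(G; mu) for a 0/1 adjacency matrix G (list of rows) in network order."""
    log_l = 0.0
    for i in range(3, len(G) + 1):      # 1-based node labels i = 3..m
        x = sum(G[i - 1][: i - 1])      # x_i = sum_{j<i} G_ij
        if x < i - 1:                   # ordinary Poisson(mu) log-probability
            log_l += -mu + x * math.log(mu) - math.lgamma(x + 1)
        else:                           # x_i = i - 1: all mass P(y_i >= i - 1)
            log_l += log_poisson_tail(i - 1, mu)
    return log_l
```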

Model

Step 2: Preferentially attach the edges to build the network

When node \(i\) enters

  • \(x_i\) existing nodes chosen from \(\{1,2,\ldots,i-1\}\)
  • Weighted sampling without replacement
  • Weight of existing node \(j = \frac{\sum_{k=1}^{i-1}G_{kj}}{\sum_{l=1}^{i-1}\sum_{k=1}^{i-1}G_{kl}}\)

Exact likelihood

  • Go through all \(x_i!\) permutations of the selected nodes
  • When \(x=1,2,3,4,5,6,\ldots,~x!=1,2,6,24,120,720,\ldots\)
  • Computationally infeasible for large \(x_i\)
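To make the factorial cost concrete, here is a Python sketch of the exact probability that weighted sampling without replacement picks a given set of \(x\) nodes: it sums the successive-draw probability over all \(x!\) orderings. In the model, `weights` would be the degrees of nodes \(1,\ldots,i-1\) among the first \(i-1\) nodes; the function name is hypothetical.

```python
from itertools import permutations

def prob_chosen_set(weights, chosen):
    """Exact P(successive weighted draws without replacement yield exactly the set `chosen`).

    Loops over all len(chosen)! orderings, so the cost grows factorially with x_i.
    """
    total = float(sum(weights))
    p_set = 0.0
    for order in permutations(chosen):
        p, remaining = 1.0, total
        for j in order:
            p *= weights[j] / remaining  # pick j among the nodes not yet drawn
            remaining -= weights[j]
        p_set += p
    return p_set

print(prob_chosen_set([1, 1, 2], [0, 1]))  # two of three nodes with degrees 1, 1, 2
```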

Approximate likelihood

  • Weighted sampling with replacement

Model

Step 2: Preferentially attach the edges to build the network

Contribution by node \(i\)'s new edges: \[ L_{2i}=x_i!\times\prod_{j=1}^{i-1}\left(\frac{\sum_{k=1}^{i-1}G_{kj}} {\sum_{l=1}^{i-1}\sum_{k=1}^{i-1}G_{kl}}\right)^{G_{ij}} \]

Likelihood by the process of adding new edges: \[ L_2(\boldsymbol{G}):=\prod_{i=3}^{m}L_{2i}= \prod_{i=3}^{m}(x_i!)\times \prod_{i=3}^{m}\prod_{j=1}^{i-1}\left(\frac{\sum_{k=1}^{i-1}G_{kj}} {\sum_{l=1}^{i-1}\sum_{k=1}^{i-1}G_{kl}}\right)^{G_{ij}} \]
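A Python sketch of \(\log L_2(\boldsymbol{G})\) for a graph whose nodes are in network order. Weights use degrees within the subgraph of the first \(i-1\) nodes, as in the formula. An edge to a node of degree zero has probability zero under preferential attachment, so the sketch returns \(-\infty\) in that case (a choice of this sketch). The approximation is least accurate when \(x_i\) is close to \(i-1\): for \(i=3\) with \(x_3=2\), sampling without replacement picks nodes 1 and 2 with certainty, while \(L_{2i}=2!\times\frac12\times\frac12=\frac12\).

```python
import math

def log_L2(G):
    """log L_2(G) under the with-replacement approximation; G in network order."""
    log_l = 0.0
    for i in range(2, len(G)):                       # 0-based i is node i+1 >= 3
        # degrees within the subgraph of the first i nodes (nodes 1..i on the slide)
        deg = [sum(G[k][j] for k in range(i)) for j in range(i)]
        total = sum(deg)
        x = sum(G[i][:i])
        log_l += math.lgamma(x + 1)                  # log x_i!
        for j in range(i):
            if G[i][j]:
                if deg[j] == 0:                      # a zero-weight node can never be picked
                    return -math.inf
                log_l += math.log(deg[j] / total)
    return log_l
```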

Model

Step 3: Spread the epidemic on the given network

Index the nodes by their epidemic (temporal) order

Infected node \(i\) makes infectious contacts

  • with its network neighbours
  • at the points of a Poisson process with rate \(\beta\sum_{j=1}^{m}G_{ij}\)

The likelihood does not depend on the transmission tree \(\boldsymbol{P}\)

\[ \begin{aligned} \pi(\boldsymbol{I}|\boldsymbol{G},\beta) &=\beta^{m-1}\exp\left(-\beta\sum\sum_{(i,j):G_{ij}=1}\left[(I_j-I_i)\vee0\right]\right)\\ &=\beta^{m-1}\exp\left(-\beta\sum_{i=1}^{m-1}\sum_{j=i+1}^{m}G_{ij}(I_j-I_i)\right) \end{aligned} \]
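In code, this log-likelihood reduces to a single "exposure" sum over edges. A minimal Python sketch, assuming the nodes are labelled in epidemic order so that \(I_1<I_2<\cdots<I_m\); the function names are hypothetical.

```python
import math

def exposure(G, I):
    """sum_{i<j} G_ij (I_j - I_i), with nodes labelled in epidemic order."""
    m = len(G)
    return sum(G[i][j] * (I[j] - I[i]) for i in range(m - 1) for j in range(i + 1, m))

def log_lik_I(G, I, beta):
    """log pi(I | G, beta) = (m - 1) log beta - beta * exposure(G, I)."""
    return (len(I) - 1) * math.log(beta) - beta * exposure(G, I)

G3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(exposure(G3, [0.0, 1.0, 3.0]))  # 6.0
```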

Model

Step 3: Spread the epidemic on the given network

So where has \(\boldsymbol{P}\) gone?

  • "Uniform distribution on the set of all possible infection pathways" (Britton and O’Neill 2002)

\[ \pi(\boldsymbol{P}|\boldsymbol{G})\propto \prod_{j=2}^m \frac{1}{~\sum_{i=1}^{j-1}G_{ij}~} \times \prod_{i=1}^{m-1}\prod_{j=i+1}^{m}\mathbf{1}\left\{P_{ij}\leq G_{ij}\right\} \]
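Drawing from this uniform distribution amounts to letting each node \(j\geq2\) pick its infector uniformly among its earlier-infected neighbours, which is where the factor \(1/\sum_{i<j}G_{ij}\) comes from. A Python sketch, assuming \(\boldsymbol{G}\) is already in epidemic order and every node after the first has at least one earlier neighbour; the helper names are hypothetical.

```python
import math
import random

def sample_tree(G, seed=None):
    """Draw a transmission tree P uniformly over the pathways allowed by G.

    Nodes are in epidemic order (0-based here); node j >= 1 picks its infector
    uniformly among earlier-infected neighbours i < j with G[i][j] = 1.
    """
    rng = random.Random(seed)
    m = len(G)
    P = [[0] * m for _ in range(m)]
    for j in range(1, m):
        parents = [i for i in range(j) if G[i][j]]
        if not parents:
            raise ValueError(f"node {j + 1} has no earlier-infected neighbour")
        P[rng.choice(parents)][j] = 1
    return P

def log_prob_tree(P, G):
    """Log of the slide's expression: -sum_j log(sum_{i<j} G_ij), or -inf if P uses a non-edge."""
    m = len(G)
    if any(P[i][j] > G[i][j] for i in range(m) for j in range(m)):
        return -math.inf
    counts = [sum(G[i][j] for i in range(j)) for j in range(1, m)]
    if 0 in counts:
        return -math.inf
    return -sum(math.log(c) for c in counts)
```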

Posterior of \(\boldsymbol{G}\) involves \(\boldsymbol{P}\)

  • If \(P_{ij}=1\), \(G_{ij}(=G_{ji})=1\) with probability \(1\) a posteriori
  • If \(P_{ij}=0\), the posterior of \(G_{ij}\) is derived in the Bayesian inference step

Model

Step 4: Connect the network and the epidemic

The epidemic order is not necessarily the same as the network order

  • Fix the labelling of nodes by epidemic order
  • Introduce a variable \(\boldsymbol{\sigma}\), a permutation of \(\{1,2,\ldots,m\}\)

Convert from epidemic order to network order

  • Replace \(\boldsymbol{G}\) by \(\boldsymbol{G}_{\boldsymbol{\sigma}}=f(\boldsymbol{G},\boldsymbol{\sigma})\) when computing \(L_1(\cdot;\mu)\) and \(L_2(\cdot)\)

Model

Step 4: Connect the network and the epidemic

Toy example continued: \(m=8,\mu=3\)

  • \(\boldsymbol{\sigma}=(5,3,8,2,6,4,7,1)\)
## [1] "G"
##                     
## [1,] . 1 1 1 1 . 1 1
## [2,] 1 . 1 1 1 . . 1
## [3,] 1 1 . 1 1 . 1 .
## [4,] 1 1 1 . 1 1 . 1
## [5,] 1 1 1 1 . 1 1 1
## [6,] . . . 1 1 . . .
## [7,] 1 . 1 . 1 . . .
## [8,] 1 1 . 1 1 . . .
## [1] "G_sigma = f(G, sigma)"
##                     
## [1,] . 1 1 1 1 1 1 1
## [2,] 1 . . 1 . 1 1 1
## [3,] 1 . . 1 . 1 . 1
## [4,] 1 1 1 . . 1 . 1
## [5,] 1 . . . . 1 . .
## [6,] 1 1 1 1 1 . . 1
## [7,] 1 1 . . . . . 1
## [8,] 1 1 1 1 . 1 1 .
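The printed toy matrices are consistent with reading \(f\) as a simultaneous permutation of rows and columns, \((\boldsymbol{G}_{\boldsymbol{\sigma}})_{ab}=G_{\sigma_a\sigma_b}\). The Python check below assumes that reading (the name `relabel` is hypothetical) and reproduces \(\boldsymbol{G}_{\boldsymbol{\sigma}}\) row for row.

```python
# Toy matrices transcribed from the slide ("." read as 0); sigma uses 1-based labels
G = [
    [0, 1, 1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
]
G_sigma_slide = [
    [0, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 0],
]
sigma = [5, 3, 8, 2, 6, 4, 7, 1]

def relabel(G, sigma):
    """Hypothetical f(G, sigma): (G_sigma)[a][b] = G[sigma_a][sigma_b], permuting rows and columns."""
    idx = [s - 1 for s in sigma]  # convert 1-based labels to 0-based indices
    return [[G[a][b] for b in idx] for a in idx]

assert relabel(G, sigma) == G_sigma_slide
```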

Bayesian inference

\(\boldsymbol{G}\) is usually unknown

\[ \begin{aligned} &\pi(\boldsymbol{G},\boldsymbol{\sigma},\beta,\mu|\boldsymbol{P},\boldsymbol{I})\\ &\propto\pi(\boldsymbol{P},\boldsymbol{I},\boldsymbol{G},\boldsymbol{\sigma},\beta,\mu)\\ &=\pi(\boldsymbol{P},\boldsymbol{I}|\boldsymbol{G},\boldsymbol{\sigma},\beta,\mu)~\pi(\boldsymbol{G},\boldsymbol{\sigma},\beta,\mu)\\ &=\pi(\boldsymbol{P}|\boldsymbol{G})~\pi(\boldsymbol{I}|\boldsymbol{G},\beta)~\pi(\boldsymbol{G}|\boldsymbol{\sigma},\beta,\mu)~\pi(\boldsymbol{\sigma},\beta,\mu)\qquad(\boldsymbol{P}~\bot~\boldsymbol{I}~\text{given}~\boldsymbol{G})\\ &=\pi(\boldsymbol{P}|\boldsymbol{G})~\pi(\boldsymbol{I}|\boldsymbol{G},\beta)~L_1(\boldsymbol{G}_{\boldsymbol{\sigma}};\mu)~L_2(\boldsymbol{G}_{\boldsymbol{\sigma}})~\pi(\boldsymbol{\sigma})~\pi(\beta)~\pi(\mu) \end{aligned} \]

A Markov chain Monte Carlo (MCMC) algorithm is then straightforward

  • A similar latent-variable approach applies if \(\boldsymbol{P}\) is also unknown

Bayesian inference

Uninformative priors

\(\beta\sim\text{Gamma}(a_\beta, \text{rate}=b_\beta)\)

\(\mu\sim\text{Gamma}(a_\mu, \text{rate}=b_\mu)\)

\(\pi(\boldsymbol{\sigma})=(m!)^{-1}\)

Bayesian inference

Posteriors

\(\beta|\ldots\sim \text{Gamma}\left(a_\beta+m-1,\text{rate}=b_\beta+\sum_{i=1}^{m-1}\sum_{j=i+1}^{m}G_{ij}(I_j-I_i)\right)\)

\(\pi(\mu|\ldots)\propto L_1(\boldsymbol{G}_{\boldsymbol{\sigma}};\mu)~\pi(\mu)\)

\(\pi(\boldsymbol{\sigma}|\ldots)\propto L_1(\boldsymbol{G}_{\boldsymbol{\sigma}};\mu)~L_2(\boldsymbol{G}_{\boldsymbol{\sigma}})\)
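Of these updates, only the one for \(\beta\) is available in closed form. A Python sketch of that Gibbs draw follows; note that `random.gammavariate` is parameterised by shape and scale, so the rate has to be inverted. `gibbs_beta` is a hypothetical name, and the prior values and exposure in the usage line are illustrative, not those used in the study.

```python
import random

def gibbs_beta(a_beta, b_beta, m, exposure, rng):
    """One Gibbs draw: beta | ... ~ Gamma(a_beta + m - 1, rate = b_beta + exposure).

    exposure = sum_{i<j} G_ij (I_j - I_i), with nodes in epidemic order.
    """
    shape = a_beta + m - 1
    rate = b_beta + exposure
    # random.gammavariate(alpha, beta) takes a SCALE parameter, i.e. 1 / rate
    return rng.gammavariate(shape, 1.0 / rate)

# Illustrative values only: a vague prior and a made-up exposure
rng = random.Random(2017)
beta = gibbs_beta(a_beta=0.001, b_beta=0.001, m=100, exposure=250.0, rng=rng)
```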

Exploring permutation space

  • Bezáková, Kalai, and Santhanam (2006)
  • Random insertion
  • More efficient than random swap
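A Python sketch of a random-insertion proposal and one Metropolis step that uses it. Removing the element at a uniformly chosen position and reinserting it at a uniformly chosen position is a symmetric proposal (the reverse move simply exchanges the two positions), so the acceptance probability is just the ratio of target densities. Here `log_target` stands for \(\log L_1(\boldsymbol{G}_{\boldsymbol{\sigma}};\mu)+\log L_2(\boldsymbol{G}_{\boldsymbol{\sigma}})\) and is left as a caller-supplied function; all names are hypothetical.

```python
import math
import random

def random_insertion(sigma, rng):
    """Remove a uniformly chosen element and reinsert it at a uniformly chosen position."""
    new = list(sigma)
    item = new.pop(rng.randrange(len(new)))
    new.insert(rng.randrange(len(new) + 1), item)
    return new

def metropolis_sigma(sigma, log_target, rng):
    """One Metropolis step for sigma; the insertion proposal is symmetric."""
    proposal = random_insertion(sigma, rng)
    log_ratio = log_target(proposal) - log_target(sigma)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return sigma
```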

Bayesian inference

Posteriors

\(\Pr(G_{ij}=1|P_{ij}=1,\ldots)=1\)

\(\boldsymbol{G}_0\): the same as \(\boldsymbol{G}\) except that \(G_{ij}\) (and \(G_{ji}\)) is set to \(0\)

\(\boldsymbol{G}_1\): the same as \(\boldsymbol{G}\) except that \(G_{ij}\) (and \(G_{ji}\)) is set to \(1\)

\(\Pr(G_{ij}=0|P_{ij}=0,\boldsymbol{G}_{-ij},\ldots)\propto \displaystyle\frac{\quad\pi(\boldsymbol{G}_0|\boldsymbol{\sigma},\beta,\mu)\quad} {\sum_{k=1,k\neq i}^{j-1}G_{kj}}\)

\(\Pr(G_{ij}=1|P_{ij}=0,\boldsymbol{G}_{-ij},\ldots)\propto \displaystyle\frac{\pi(\boldsymbol{G}_1|\boldsymbol{\sigma},\beta,\mu)~e^{-\beta(I_j-I_i)}} {\sum_{k=1,k\neq i}^{j-1}G_{kj}+1}\)
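Normalising these two weights gives the Gibbs probability for a single edge. A Python sketch, assuming the caller supplies the two log network-prior terms \(\log\pi(\boldsymbol{G}_0|\ldots)\) and \(\log\pi(\boldsymbol{G}_1|\ldots)\); the count \(\sum_{k\neq i,k<j}G_{kj}\) is at least 1 here, because \(P_{ij}=0\) means node \(j\) was infected by some other neighbour \(k\). The function name is hypothetical.

```python
import math

def prob_edge_on(log_prior_G0, log_prior_G1, beta, I_i, I_j, n_other):
    """Pr(G_ij = 1 | P_ij = 0, G_-ij, ...) from the two unnormalised weights on the slide.

    n_other = sum_{k != i, k < j} G_kj (>= 1 because j has an infector k != i).
    """
    log_w0 = log_prior_G0 - math.log(n_other)
    log_w1 = log_prior_G1 - beta * (I_j - I_i) - math.log(n_other + 1)
    top = max(log_w0, log_w1)  # log-sum-exp style normalisation for stability
    w0, w1 = math.exp(log_w0 - top), math.exp(log_w1 - top)
    return w1 / (w0 + w1)
```

A draw is then `G_ij = int(rng.random() < p)` for `p = prob_edge_on(...)`, with \(G_{ji}\) set to the same value.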

Simulation study

Scenarios

Simulate network only, estimate \((\mu,\boldsymbol{\sigma})\)

  • Good

Simulate network & epidemic, estimate \((\mu,\beta,\boldsymbol{\sigma})\) given \(\boldsymbol{G}\) & \(\boldsymbol{I}\)

  • Good

Simulate network & epidemic, estimate \((\mu,\beta,\boldsymbol{\sigma},\boldsymbol{G})\) given \(\boldsymbol{P}\) & \(\boldsymbol{I}\)

  • Problematic

Simulation study

Identifiability

Posterior of \(\mu\) does not depend on its true value

Simulation study

Identifiability

\(\beta\) is not identifiable either

Simulation study

Identifiability

What about \(\ldots \alpha=\beta\times\mu\)?

Simulation study

Observations

Inverse relationship between the epidemic rate \(\beta\) and the parameter characterising network connectedness

  • Average number of new edges \(\mu\) in the PA model
  • Edge inclusion probability \(p\) in the BRG model
    • Echoing an observation by Britton and O’Neill (2002)
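A heuristic reading of this trade-off (not a formal result), based on the posterior of \(\beta\) from the Bayesian inference step, with \(S\) the exposure sum:

```latex
\mathbb{E}[\beta\mid\ldots]=\frac{a_\beta+m-1}{b_\beta+S},
\qquad
S=\sum_{i=1}^{m-1}\sum_{j=i+1}^{m}G_{ij}(I_j-I_i)\;\geq\;0
```

Each extra edge \(i\leftrightarrow j\) with \(i<j\) adds a non-negative term \(I_j-I_i\) to \(S\), and a PA network with larger \(\mu\) tends to have more edges, so for fixed infection times a larger \(\mu\) is offset by a smaller \(\beta\).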

Identifiability

  • Identifying the single parameter \((\alpha)\) is as good as we can get
  • Interpretation: network scaled epidemic rate

Summary

Review on network epidemics

  • Statistical models assume a BRG, which is unrealistic for social networks
  • Physical models focus on the dynamics, which makes inference difficult

Model and inference

  • PA model grows the network, SI model spreads the epidemic
  • Gibbs steps for \(\beta\) and individual edges \(G_{ij}\), Metropolis steps for \(\mu\) and \(\boldsymbol{\sigma}\)

Simulation study

  • The product of \(\beta\) and \(\mu\) is identifiable, but neither parameter is individually

Application - App Movement

Still waiting for results

Typical times taken per iteration:

  • \(m=100: 0.88\text{s}\)
  • \(m=200: 14\text{s}\)

Computational time \(O(m^4)\)

  • Number of potential edges \((G_{ij})\) to update: \(O(m^2)\)
  • Number of computations involved in updating one edge: \(O(m^2)\)

Data sets worth applying the model to

  • \(m = 350\) and up
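Assuming the \(O(m^4)\) scaling holds (the two timings above agree with it: \(0.88\,\text{s}\times2^4\approx14\,\text{s}\)), a rough projection for \(m=350\) is:

```latex
t(350)\approx t(200)\left(\frac{350}{200}\right)^{4}
=14\,\text{s}\times1.75^{4}
\approx14\,\text{s}\times9.38
\approx131\,\text{s per iteration}
```

That is over two minutes per iteration, which illustrates why the current MCMC algorithm does not scale to these data sets.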

Future work

Apply to App Movement data

  • Sample sizes \(m\) much larger than in the simulation study
  • Current MCMC algorithm not scalable with \(m\)

Compare with models that use a BRG

  • One possible way: calculate the marginal likelihoods

Marginal MCMC methods by simulating the network

  • Simulate directly according to the preferential attachment rule
  • Takes much less time than updating the edges one by one

Bibliography

Andersson, Håkan, and Tom Britton. 2000. Stochastic Epidemic Models and Their Statistical Analysis. Lecture Notes in Statistics 151. Springer, New York.

Ball, Frank, D. Mollison, and G. Scalia-Tomba. 1997. “Epidemics with Two Levels of Mixing.” Annals of Applied Probability 7: 46–89.

Barabási, Albert-László, and Réka Albert. 1999. “Emergence of Scaling in Random Networks.” Science 286 (5439): 509–12.

Bezáková, Ivona, Adam Kalai, and Rahul Santhanam. 2006. “Graph Model Selection Using Maximum Likelihood.” In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. International Machine Learning Society.

Britton, Tom, and Philip D. O’Neill. 2002. “Bayesian Inference for Stochastic Epidemics in Populations with Random Social Structure.” Scandinavian Journal of Statistics 29 (3): 375–90.

Britton, Tom, Theodore Kypraios, and Philip D. O’Neill. 2011. “Inference for Epidemics with Three Levels of Mixing: Methodology and Application to a Measles Outbreak.” Scandinavian Journal of Statistics 38: 578–99. doi:10.1111/j.1467-9469.2010.00726.x.

Neal, Peter, and Gareth Roberts. 2005. “A Case Study in Non-Centering for Data Augmentation: Stochastic Epidemics.” Statistics and Computing 15: 315–27.

Pastor-Satorras, Romualdo, Claudio Castellano, Piet Van Mieghem, and Alessandro Vespignani. 2015. “Epidemic Processes in Complex Networks.” Reviews of Modern Physics 87 (3): 925–79. doi:10.1103/RevModPhys.87.925.