A Social Network Analysis of Articles on Social Network Analysis

Clement Lee
(Joint work with Darren J Wilkinson)
Statistics Research Group Internal Seminar

2018-10-19 (Fri)

Outline

  1. Data:
    • Citation network of articles on social network analysis
  2. Model:
    • To cluster the articles into groups
    • A modified mixed membership stochastic block model
  3. Application:
    • Fit model to citation network
    • Visualise memberships & network

Social Network Analysis

Three main groups of articles

  1. Generative models
    • E.g. small-world model, preferential attachment model
  2. Exponential random graph models
    • Log-density is a linear combination of summary statistics
  3. Community detection and latent models
    • Algorithms that cluster nodes
    • Models that uncover latent structure
    • E.g. stochastic block model, latent space model

Citation Network

  • 135 nodes (articles), 1118 edges (citations)
  • If A cites B, B cannot cite A

Adjacency Matrix Representation

Arbitrary order

Topological Order

Our data is a directed acyclic graph (DAG)
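
As a minimal sketch of why topological order matters for the adjacency matrix: if nodes are sorted so that each article appears before every article it cites, all entries fall strictly above the diagonal. The toy edge list and the function names below are hypothetical, and Kahn's algorithm is used only as one standard way to obtain such an order.

```python
from collections import deque


def topological_order(n_nodes, edges):
    """Return one topological order of a DAG using Kahn's algorithm.

    edges is a list of (citing, cited) pairs; each node is placed
    before every node it cites.
    """
    out_neighbours = {v: [] for v in range(n_nodes)}
    in_degree = [0] * n_nodes
    for citing, cited in edges:
        out_neighbours[citing].append(cited)
        in_degree[cited] += 1

    queue = deque(v for v in range(n_nodes) if in_degree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in out_neighbours[v]:
            in_degree[w] -= 1
            if in_degree[w] == 0:
                queue.append(w)

    if len(order) != n_nodes:
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order


def adjacency_matrix(n_nodes, edges, order):
    """Adjacency matrix with rows and columns arranged according to order."""
    position = {v: k for k, v in enumerate(order)}
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for citing, cited in edges:
        matrix[position[citing]][position[cited]] = 1
    return matrix


# Hypothetical toy citation network: (citing, cited) pairs.
toy_edges = [(3, 1), (3, 0), (1, 0), (2, 0), (3, 2)]
toy_order = topological_order(4, toy_edges)
toy_matrix = adjacency_matrix(4, toy_edges, toy_order)
```

Under this ordering every nonzero entry of `toy_matrix` lies above the diagonal, which is the triangular pattern a topologically ordered adjacency matrix of a DAG has.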

Clustering/Community Detection

Spinglass algorithm

Clustering/Community Detection

Walktrap algorithm

Group-to-group probabilities

Citing \ Cited   Group 1   Group 2   Group 3
Group 1          0.70      0.10      0.15
Group 2          0.20      0.60      0.05
Group 3          0.02      0.07      0.50
  • Each density can be read as the probability that an article in the row group cites an article in the column group
  • Neither rows nor columns need to sum to 1
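
One way to obtain such a table from data, sketched under the assumption that every article already has a single known group label: count the citations between each pair of groups and divide by the number of ordered article pairs. The function name and the toy inputs are hypothetical. The sketch counts all ordered pairs of distinct articles, as for a general directed graph.

```python
def block_densities(adjacency, groups, n_groups):
    """Empirical citation density for each (citing group, cited group) pair."""
    edge_counts = [[0] * n_groups for _ in range(n_groups)]
    pair_counts = [[0] * n_groups for _ in range(n_groups)]
    n = len(adjacency)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # an article does not cite itself
            g, h = groups[i], groups[j]
            pair_counts[g][h] += 1
            edge_counts[g][h] += adjacency[i][j]
    # Density is undefined (nan) when a group pair has no article pairs.
    return [
        [
            edge_counts[g][h] / pair_counts[g][h]
            if pair_counts[g][h] > 0
            else float("nan")
            for h in range(n_groups)
        ]
        for g in range(n_groups)
    ]


# Hypothetical toy data: 4 articles, 2 groups.
toy_adjacency = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
toy_groups = [0, 0, 1, 1]
toy_densities = block_densities(toy_adjacency, toy_groups, 2)
```

Each entry is computed separately for its own group pair, so nothing forces a row or a column of the result to sum to 1.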

Stochastic Block Model (SBM)

Citing \ Cited   Group 1   Group 2   Group 3
Group 1          0.70      0.10      0.15
Group 2          0.20      0.60      0.05
Group 3          0.02      0.07      0.50
  • Example: Article A cites article B
  • Also assume A is in group 1, B in group 2
  • P(A cites B, A in 1, B in 2)
    = P(A in 1) × P(B in 2) × P(A cites B | A in 1, B in 2)
    = P(A in 1) × P(B in 2) × 0.10
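
The factorisation above can be checked numerically. The group probabilities P(A in 1) = 0.5 and P(B in 2) = 0.3 used here are made-up values for illustration only; the slides do not give them. Only the 0.10 entry comes from the table.

```python
# Group-to-group probabilities from the table (rows: citing, columns: cited).
block_probs = [
    [0.70, 0.10, 0.15],
    [0.20, 0.60, 0.05],
    [0.02, 0.07, 0.50],
]

# Hypothetical group probabilities, assumed purely for this example.
p_a_in_1 = 0.5
p_b_in_2 = 0.3

# Groups 1 and 2 are indices 0 and 1 in the zero-based table.
p_cite_given_groups = block_probs[0][1]
joint = p_a_in_1 * p_b_in_2 * p_cite_given_groups
print(f"P(A cites B, A in 1, B in 2) = {joint:.3f}")
```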

Stochastic Block Model (SBM)

Holland, Laskey, and Leinhardt (1983), Social Networks

  • Ingredients of likelihood
    1. The group-to-group probabilities
    2. The latent groups the articles belong to
  • The articles are hard clustered
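
A minimal sketch of how the two ingredients combine, assuming each possible citation is an independent Bernoulli draw given the groups of the two articles. It evaluates only the edge part of the log-likelihood and leaves out any term for the group labels themselves. The function name and the toy inputs are hypothetical.

```python
import math


def sbm_log_likelihood(adjacency, groups, block_probs):
    """Log-probability of the observed edges given hard group labels."""
    n = len(adjacency)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-citations
            p = block_probs[groups[i]][groups[j]]
            # Bernoulli log-probability of an edge or a non-edge.
            total += math.log(p) if adjacency[i][j] else math.log(1.0 - p)
    return total


# Hypothetical toy case: article 0 (group 0) cites article 1 (group 1).
toy_block_probs = [[0.7, 0.1], [0.2, 0.6]]
toy_loglik = sbm_log_likelihood([[0, 1], [0, 0]], [0, 1], toy_block_probs)
```

In the toy case the result is log(0.1) + log(1 − 0.2): one observed citation from group 0 to group 1, and one absent citation from group 1 to group 0.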

Mixed Membership SBM

Airoldi et al. (2008), JMLR

  • Ingredients of likelihood
    1. The group-to-group probabilities
    2. The latent groups the articles belong to,
      for their pairwise interactions
    3. The memberships of the articles
  • The articles are soft clustered
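
The three ingredients can be illustrated by simulating from the mixed membership SBM as described by Airoldi et al. (2008) for general directed graphs (not the DAG modification introduced next). Each article draws a membership vector. For every ordered pair, the citing and the cited article each draw a group for that interaction only, and the edge is a Bernoulli draw with the corresponding group-to-group probability. The function names, the Dirichlet parameter and the seed are illustrative choices.

```python
import random


def sample_dirichlet(rng, alpha):
    """Draw from a Dirichlet distribution via normalised gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_mmsb(n_nodes, block_probs, alpha, seed=0):
    """Simulate memberships and a directed adjacency matrix from an MMSB."""
    rng = random.Random(seed)
    n_groups = len(block_probs)
    # Ingredient 3: one membership vector per article (soft clustering).
    memberships = [sample_dirichlet(rng, alpha) for _ in range(n_nodes)]
    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue
            # Ingredient 2: latent groups drawn afresh for this pair.
            g = rng.choices(range(n_groups), weights=memberships[p])[0]
            h = rng.choices(range(n_groups), weights=memberships[q])[0]
            # Ingredient 1: group-to-group probability decides the edge.
            adjacency[p][q] = int(rng.random() < block_probs[g][h])
    return memberships, adjacency


example_block_probs = [
    [0.70, 0.10, 0.15],
    [0.20, 0.60, 0.05],
    [0.02, 0.07, 0.50],
]
example_memberships, example_adjacency = sample_mmsb(
    n_nodes=6, block_probs=example_block_probs, alpha=[0.5, 0.5, 0.5]
)
```

Because an article draws a new group for every interaction, its membership vector can spread across several groups rather than committing to one.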

Modifications

  • Our proposed model is for DAGs
  • The number of latent variables is halved
  • The topological order is an extra parameter, as it is not unique
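
A rough sketch of both points. The counting rests on an assumption about where the halving comes from, which the slides do not spell out: the mixed membership SBM has two latent group indicators per ordered pair of articles, whereas in a DAG only the pairs consistent with the topological order need them. The non-uniqueness is shown by brute-force enumeration on a hypothetical toy DAG. All function names are illustrative.

```python
from itertools import permutations


def mmsb_latent_count(n_nodes):
    """Two indicators for each of the n(n-1) ordered pairs."""
    return 2 * n_nodes * (n_nodes - 1)


def dag_latent_count(n_nodes):
    """Two indicators for each of the n(n-1)/2 pairs allowed by the order
    (an assumed reading of the halving, not stated in the source)."""
    return 2 * (n_nodes * (n_nodes - 1) // 2)


def all_topological_orders(n_nodes, edges):
    """Brute-force enumeration; only feasible for very small graphs."""
    orders = []
    for perm in permutations(range(n_nodes)):
        position = {v: k for k, v in enumerate(perm)}
        # Valid if every citing node comes before the node it cites.
        if all(position[citing] < position[cited] for citing, cited in edges):
            orders.append(list(perm))
    return orders


# 135 articles, as in the citation network.
full_count = mmsb_latent_count(135)
halved_count = dag_latent_count(135)

# Hypothetical toy DAG: (citing, cited) pairs.
toy_dag_edges = [(3, 1), (3, 0), (1, 0), (2, 0), (3, 2)]
toy_orders = all_topological_orders(4, toy_dag_edges)
```

In the toy DAG, articles 1 and 2 do not cite each other, so they can appear in either order and the graph has two valid topological orders.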

Number of Groups

  • Can be modelled, e.g. Peixoto (2018)
  • Not incorporated in our model (yet)
  • Fit model with different numbers of groups

Statistical Inference

  • Airoldi et al. (2008) used variational Bayes
    • Fast, but accuracy not guaranteed
  • We use a regular Gibbs sampler in MCMC
    • Feasible for the size of our data
  • Potentially more efficient / scalable alternatives
    • Stochastic gradient MCMC (Li, Ahn, and Welling 2016)
    • Collapsed Gibbs sampler
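
To give a flavour of a single Gibbs step, here is a sketch of one conditional update, under the assumption (not stated in the slides) that each group-to-group probability has an independent Beta(a, b) prior. Given the current latent groups, the conditional posterior of each probability is then Beta(a + edges, b + non-edges). The function name, the prior parameters and the toy counts are hypothetical, and a full sampler would alternate this step with updates of the latent groups and memberships.

```python
import random


def gibbs_update_block_probs(edge_counts, pair_counts, a=1.0, b=1.0, rng=None):
    """Draw every group-to-group probability from its Beta full conditional.

    edge_counts[g][h]: citations currently attributed to groups (g, h).
    pair_counts[g][h]: article pairs currently attributed to groups (g, h).
    a, b: assumed Beta prior parameters (an illustrative choice).
    """
    rng = rng or random.Random()
    n_groups = len(edge_counts)
    return [
        [
            rng.betavariate(
                a + edge_counts[g][h],
                b + pair_counts[g][h] - edge_counts[g][h],
            )
            for h in range(n_groups)
        ]
        for g in range(n_groups)
    ]


# Hypothetical counts for two groups.
toy_edge_counts = [[14, 2], [4, 12]]
toy_pair_counts = [[20, 20], [20, 20]]
toy_draw = gibbs_update_block_probs(
    toy_edge_counts, toy_pair_counts, rng=random.Random(1)
)
```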

Back to Citation Network

  • Prior knowledge of 3 main groups - manual clustering
  • Fit model with 3, 4, 5 & 6 groups; results shown for 4 groups

Group-to-group Probabilities

Mixed Memberships

Network plot

Membership projection

Topological Order

For 3, 4, 5 & 6 groups

Summary

  • Model
    • A modified mixed membership stochastic block model
    • Suitable for directed acyclic graphs
  • Inference
    • Number of latent variables is halved in the Gibbs sampler
  • Application
    • Citation network of articles on social network analysis
    • Revealed 3 main groups (+ 1 miscellaneous group)
  • Next
    • Model the number of groups
    • Inference alternatives
    • Apply to other data e.g. software dependencies

Airoldi, Edoardo M., David M. Blei, Stephen E. Fienberg, and Eric P. Xing. 2008. “Mixed Membership Stochastic Blockmodels.” Journal of Machine Learning Research 9: 1981–2014.

Holland, Paul W., Kathryn Blackmond Laskey, and Samuel Leinhardt. 1983. “Stochastic Blockmodels: First Steps.” Social Networks 5 (2): 109–37.

Li, Wenzhe, Sungjin Ahn, and Max Welling. 2016. “Scalable MCMC for Mixed Membership Stochastic Blockmodels.” In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 51:723–31. Proceedings of Machine Learning Research.

Peixoto, Tiago P. 2018. “Nonparametric Weighted Stochastic Block Models.” Physical Review E 97: 012306.