Research focus
Price Paid Data from Land Registry
Repeat Sales model in spatial econometrics
A spatio-temporal model
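The spatio-temporal model is not specified on this slide. As background only, here is a minimal sketch of the classic non-spatial repeat-sales regression of Bailey, Muth and Nourse, which such models build on. Each property sold twice contributes one row: the log price ratio is regressed on period dummies (\(-1\) at the first sale, \(+1\) at the second), with period 0 as the base. The `pairs` input format and the function name are hypothetical. The spatial and temporal structure of the actual model is not represented here.

```python
import numpy as np

def repeat_sales_index(pairs, n_periods):
    """Basic repeat-sales (Bailey-Muth-Nourse) index, non-spatial sketch.

    pairs: list of (t1, price1, t2, price2) with 0 <= t1 < t2 < n_periods.
    Returns a price index with period 0 normalised to 1.
    """
    X = np.zeros((len(pairs), n_periods - 1))  # one dummy per period except the base
    y = np.empty(len(pairs))
    for r, (t1, p1, t2, p2) in enumerate(pairs):
        y[r] = np.log(p2 / p1)      # log price change between the two sales
        if t1 > 0:
            X[r, t1 - 1] = -1.0     # first sale enters with -1
        X[r, t2 - 1] = 1.0          # second sale enters with +1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    return np.exp(np.concatenate([[0.0], beta]))  # base period log-index = 0
```

With prices growing exactly 10% per period, pairs such as `(0, 100.0, 1, 110.0)` and `(1, 110.0, 2, 121.0)` recover an index of 1, 1.1, 1.21.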
For updating augmented data in MCMC algorithms
Models with a large number of latent variables
Update 1, \(k\), or all of them in each iteration?
Optimal when \(k \times\) (acceptance rate) is maximised
* C. Lee and P. Neal (2018). Optimal Scaling of the Independence Sampler: Theory and Practice, Bernoulli 24 (3), 1636-1652.
Optimal \(k\) is such that acceptance rate \(\approx\) 23.4%
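To make the trade-off concrete, here is a toy sketch, not the setting of the cited paper. It assumes \(n\) latent variables that are i.i.d. \(N(0,1)\) under the target. At each iteration, \(k\) of them are chosen at random and fresh values are proposed from an independence proposal \(N(0, s^2)\) with a hypothetical \(s = 1.2\). Larger blocks move more variables per step but are accepted less often. The efficiency proxy from the slide is \(k \times\) acceptance rate. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np

def block_independence_sampler(n=100, k=10, prop_sd=1.2, iters=2000, seed=0):
    """Toy independence sampler updating k of n latent variables per iteration.

    Target: n iid N(0, 1); proposal for each updated component: N(0, prop_sd^2).
    Returns the empirical acceptance rate.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)

    def log_w(z):
        # log importance weight log(target / proposal), constants dropped
        return np.sum(-0.5 * z**2 + 0.5 * (z / prop_sd) ** 2)

    accepted = 0
    for _ in range(iters):
        idx = rng.choice(n, size=k, replace=False)   # block of k latent variables
        y = rng.normal(scale=prop_sd, size=k)        # independent proposal
        log_alpha = log_w(y) - log_w(x[idx])         # independence-sampler ratio
        if np.log(rng.uniform()) < log_alpha:
            x[idx] = y
            accepted += 1
    return accepted / iters

if __name__ == "__main__":
    for k in (1, 5, 10, 25, 50, 100):
        acc = block_independence_sampler(k=k)
        print(f"k={k:3d}  acceptance={acc:.3f}  k*acceptance={k * acc:.2f}")
```

Scanning \(k\) and choosing the value with the largest \(k \times\) acceptance rate mirrors the criterion above. This toy makes no claim to reproduce the 23.4% result.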
App Movement
Sharing of movement \(\approx\) spreading an “epidemic”
Generate a network by preferential attachment rules
Spread the epidemic by a Susceptible-Infected model
* C. Lee, A. Garbett and D. J. Wilkinson (2018). A network epidemic model for online commissioning data, Statistics and Computing 28 (4), 891-904.
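The two ingredients above can be illustrated separately with a simplified sketch. It is not the model of the cited paper, where network growth and infection are modelled jointly. Here, a tree is grown by linear preferential attachment: each new node links to one existing node with probability proportional to (degree + 1), an assumed weighting. Then a discrete-time Susceptible-Infected process runs on it, with a hypothetical per-contact infection probability `beta`. In an SI model, infected nodes never recover.

```python
import random

def preferential_attachment(n, seed=None):
    """Grow an undirected tree where new nodes attach with prob. ∝ degree + 1."""
    rng = random.Random(seed)
    neighbours = {0: set()}
    targets = [0]  # node i appears (degree_i + 1) times in this list
    for new in range(1, n):
        old = rng.choice(targets)
        neighbours[new] = {old}
        neighbours[old].add(new)
        targets.extend([new, new, old])  # new: weight 2; old: weight +1
    return neighbours

def si_epidemic(neighbours, beta, steps, seed=None, initial=0):
    """Discrete-time SI model; returns the cumulative infected count per step."""
    rng = random.Random(seed)
    infected = {initial}
    counts = [1]
    for _ in range(steps):
        newly = set()
        for i in infected:
            for j in neighbours[i]:
                if j not in infected and rng.random() < beta:
                    newly.add(j)  # transmission along edge (i, j)
        infected |= newly
        counts.append(len(infected))
    return counts

if __name__ == "__main__":
    g = preferential_attachment(500, seed=1)
    print(si_epidemic(g, beta=0.3, steps=20, seed=2))
```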
Computational issues
Connections between two individuals are observed only if an infection passed between them
The missing connections are treated as latent variables and inferred
Simulation study reveals identifiability issues
To understand temporal behaviour on Twitter
To model short-term growth of retweets
Data: ~4 hours of tweets & retweets with #gots7 on 2017-07-16, the Game of Thrones Season 7 premiere
* C. Lee and D. J. Wilkinson (2018). A hierarchical model of non-homogeneous Poisson processes for Twitter retweets, ArXiv e-prints.
Retweet growth over time
Retweet count vs Follower count
A hierarchical Bayesian model
The retweets of the \(i\)th original tweet are fitted by a non-homogeneous Poisson process
Estimation: latent variables & universal parameters
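Simulated retweet times for comparison with the actual data can be generated from any non-homogeneous Poisson process by Lewis-Shedler thinning, sketched below. The intensity \(\lambda(t) = a e^{-bt}\) and the values \(a = 50\), \(b = 0.5\) are hypothetical placeholders, not the intensity or estimates from the cited paper. In a hierarchical version, each original tweet \(i\) would have its own parameters drawn from a shared distribution. Thinning only requires an upper bound `lam_max` on the intensity over the window.

```python
import math
import random

def simulate_nhpp(intensity, t_max, lam_max, seed=None):
    """Lewis-Shedler thinning on [0, t_max]; requires intensity(t) <= lam_max."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(lam_max)   # candidate from homogeneous process
        if t > t_max:
            return events
        if rng.random() < intensity(t) / lam_max:
            events.append(t)            # keep candidate with prob. lambda(t)/lam_max

# Hypothetical decaying retweet intensity (retweets per hour after the tweet)
a, b = 50.0, 0.5
decay = lambda t: a * math.exp(-b * t)
times = simulate_nhpp(decay, t_max=4.0, lam_max=a, seed=42)
```

The expected number of events over the window is \(\int_0^4 50 e^{-0.5t}\,dt = 100(1 - e^{-2}) \approx 86.5\).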
Actual data vs Simulated data
To review > 100 papers on social network analysis (SNA)
To cluster them according to how they cite each other
Allowing papers to belong to multiple groups, as many of them are interdisciplinary
Stochastic block model for clustering nodes in a network
Mixed membership version - soft clustering
A citation network is a directed acyclic graph (DAG)
The data and the application are equally important
* C. Lee and D. J. Wilkinson (2018). A social network analysis of articles on social network analysis, ArXiv e-prints.
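A mixed-membership stochastic block model can be illustrated by its generative process. The sketch below follows the standard MMSB construction (Dirichlet memberships, per-pair group indicators, block matrix `B`). To respect the DAG structure, it makes the simplifying assumption that papers are indexed in publication order and that paper \(p\) may cite only an earlier paper \(q < p\). This adaptation, the function name and the parameter values are illustrative assumptions, not the cited paper's model.

```python
import numpy as np

def simulate_mmsb_dag(n, alpha, B, seed=None):
    """Generate a citation DAG from a mixed-membership SBM (sketch).

    alpha: Dirichlet parameter (length K); B: K x K citation probabilities.
    Returns memberships pi (n x K) and adjacency A with A[p, q] = 1 if p cites q.
    """
    rng = np.random.default_rng(seed)
    K = B.shape[0]
    pi = rng.dirichlet(alpha, size=n)        # soft cluster memberships
    A = np.zeros((n, n), dtype=int)
    for p in range(n):
        for q in range(p):                   # only earlier papers can be cited
            z_send = rng.choice(K, p=pi[p])  # group p acts in when citing
            z_recv = rng.choice(K, p=pi[q])  # group q acts in when cited
            A[p, q] = rng.random() < B[z_send, z_recv]
    return pi, A
```

Because every edge points from a later paper to an earlier one, `A` is strictly lower triangular and hence acyclic. Inference would invert this process to estimate `pi` and `B`.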
To understand large-scale human behaviour from mobile phone data
* M. Vanhoof, C. Lee and Z. Smoreda (To appear). Performance and sensitivities of home detection on mobile phone data, Big Data Meets Survey Science, Monograph of BigSurv18 Conference.
Assigning home locations from calling patterns
Validating with census data
Monthly variation
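Two simple home-detection criteria can be sketched as follows: the cell tower with the most events overall, and the same count restricted to night-time events. The record format `(user, tower, timestamp)` and the night window 19:00 to 07:00 are assumptions for illustration. The cited chapter compares its own set of criteria and parameter choices. Even this toy shows the sensitivity the slide refers to: the same records can yield different home towers under different rules.

```python
from collections import Counter, defaultdict
from datetime import datetime

def detect_home(records, night_only=False):
    """Assign each user the tower with the most (optionally night-time) events.

    records: iterable of (user_id, tower_id, datetime). Night = 19:00-07:00 (assumed).
    """
    counts = defaultdict(Counter)
    for user, tower, ts in records:
        if night_only and not (ts.hour >= 19 or ts.hour < 7):
            continue  # skip daytime events under the night-time rule
        counts[user][tower] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}

records = [
    ("u1", "A", datetime(2017, 3, 1, 22)),
    ("u1", "A", datetime(2017, 3, 2, 23)),
    ("u1", "B", datetime(2017, 3, 2, 10)),
    ("u1", "B", datetime(2017, 3, 3, 11)),
    ("u1", "B", datetime(2017, 3, 3, 14)),
]
```

Here `detect_home(records)` picks tower `"B"` (three daytime events), while `detect_home(records, night_only=True)` picks `"A"`. Grouping records by month before applying a rule would expose monthly variation.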
The digital library of the Association for Computing Machinery (ACM): http://dl.acm.org
What are the emerging/hot topics?
Data | Visualisation/Empirical | Modelling |
---|---|---|
Temporal | Retweets | |
Spatial + Temporal | Home detection | House price index |
Network | Network of SNA papers | |
Network + Temporal | Network of CHI papers | Network epidemic; future work → |
The same ultimate goal: model-based clustering
Apply to the network of > 6000 CHI papers