2018-01-26 (Fri)

#thehandmaidstale

Outline

  1. Extend one kind of non-homogeneous Poisson process (NHPP) and apply it to single sequences of event times:
    • original tweets of a hashtag
    • retweets of one original
  2. Model retweet count vs follower count by censored regression
  3. Model retweets of all originals by a hierarchical NHPP model
    • follower count as a covariate
  4. Application

The power law and hybrid processes

NHPP (one-dimensional)

If a sequence of events is assumed to arise from an NHPP with intensity function \(h(t)\geq 0\), then:

  • The random variable of the number of events within \([t_1,t_2]\) follows the Poisson distribution with mean \(\displaystyle\int_{t_1}^{t_2}h(t)dt\)
  • The numbers of events in disjoint intervals are independent
  • Also define the cumulative intensity \(H(t)=\displaystyle\int_0^t h(u)du\)

Power law process

  • \(h(t)=\gamma t^{-\lambda}\), where \(\gamma>0,\lambda<1\)
  • In reliability, it is sometimes called the Weibull process
  • N.B. \(\neq\) a renewal process with power-law distributed interarrival times
  • Common special case: homogeneous Poisson process
    • Power law process with \(\lambda=0\)
    • Renewal process with exponentially distributed interarrival times
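
A power law process can be simulated by time rescaling: arrival times of a unit-rate Poisson process, mapped through \(H^{-1}\). A minimal sketch, assuming Python with NumPy (function and variable names are illustrative, not from the talk):

```python
import numpy as np

def simulate_power_law_process(g, lam, T, rng=None):
    """Simulate a power law process with intensity h(t) = g * t**(-lam)
    on [0, T] by time rescaling: if S_1 < S_2 < ... are arrival times of a
    unit-rate Poisson process, then t_k = H^{-1}(S_k) are arrival times of
    the NHPP with cumulative intensity H(t) = g * t**(1 - lam) / (1 - lam)."""
    rng = np.random.default_rng(rng)
    H_T = g * T ** (1 - lam) / (1 - lam)   # expected total number of events
    s, times = 0.0, []
    while True:
        s += rng.exponential(1.0)          # next unit-rate arrival in rescaled time
        if s > H_T:
            break
        times.append(((1 - lam) * s / g) ** (1 / (1 - lam)))  # H^{-1}(s)
    return np.array(times)

times = simulate_power_law_process(g=5.0, lam=0.5, T=100.0, rng=42)
```

Setting `lam=0` reduces this to a homogeneous Poisson process, matching the special case above.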

Duane plot (Duane 1964)

How to check if the power law process is appropriate for the data?

  • Let \(\{t_i,i=1,2,\ldots,n\}\) be a sequence of event times, \(0\leq t_1\leq t_2\leq\cdots\leq t_n\)
  • Plot \(t_i/i\) vs \(t_i\) on log-log scale
  • If the data are indeed generated from the power law process, the plot will give approximately a straight line with slope \(\lambda\)

Duane plot

Why \(t_i/i\) vs \(t_i\) on log-log scale?

  • Denote \(N(t)\) as the number of events in the interval \([0,t]\)
  • Expectation \(\displaystyle E[N(t)]=H(t)=\int_0^t h(u)du=\frac{\gamma t^{1-\lambda}}{1-\lambda}\)
  • So \(t_i\) should be such that \[ \frac{\gamma t_i^{1-\lambda}}{1-\lambda}\approx i \quad\Leftrightarrow\quad \log\left(\frac{t_i}{i}\right)\approx\log\left(\frac{1-\lambda}{\gamma}\right)+\lambda\log t_i\]
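
As a sanity check, the Duane slope can be recovered numerically. A sketch assuming Python with NumPy, with a simulated power law process standing in for real data (all names illustrative):

```python
import numpy as np

# Simulate a power law process with known lambda by inverting
# H(t) = g * t**(1 - lam) / (1 - lam), then recover lambda as the
# fitted slope of log(t_i / i) against log(t_i).
rng = np.random.default_rng(0)
g, lam, T = 2.0, 0.4, 1e4
s = np.cumsum(rng.exponential(1.0, size=2000))   # unit-rate arrivals
s = s[s <= g * T ** (1 - lam) / (1 - lam)]
t = ((1 - lam) * s / g) ** (1 / (1 - lam))       # event times

i = np.arange(1, len(t) + 1)
keep = i >= 10                  # the first few Duane points are very noisy
slope, intercept = np.polyfit(np.log(t[keep]), np.log(t[keep] / i[keep]), 1)
```

The fitted `slope` should be close to the true \(\lambda=0.4\).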

Sidetrack

How to check if some (non-temporal) data are Weibull distributed?

  • Consider a sequence \(\{x_i,i=1,2,\ldots,n\}\) assumed to be iid
  • Plot \(-\log\hat{S}(x_i)\) vs \(x_i\) on log-log scale, where \(\hat{S}(x_i)\) is the empirical survival function
  • If \(\{x_i\}\) indeed comes from the Weibull distribution, the plot will give approximately a straight line with slope equal to the shape parameter
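
The same idea in code, a sketch with simulated Weibull data (Python assumed; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
shape, scale, n = 1.7, 3.0, 5000
x = np.sort(rng.weibull(shape, size=n)) * scale

# Empirical survival function at the order statistics; i/(n+1) avoids log(0)
S_hat = 1.0 - np.arange(1, n + 1) / (n + 1)

# For Weibull data, -log S(x) = (x / scale)**shape, i.e. a straight line
# with slope equal to the shape parameter when both axes are logged
slope, _ = np.polyfit(np.log(x), np.log(-np.log(S_hat)), 1)
```

The fitted `slope` should be close to the true shape parameter 1.7.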

Duane plot for #thehandmaidstale

Extending the power law process

  • Later events seem to occur at a rate slower than expected by the power law process
  • We call an NHPP with \(h(t)=\gamma t^{-\lambda}e^{-\theta t}\) the hybrid process, where \(\gamma>0,\lambda<1,\theta\geq0\)
  • Alternatively called "power law with exponential cutoff" (Mathews et al. 2017)
  • When \(\theta=0\), the hybrid process becomes the power law process
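
The closed form \(H(t)=\gamma\,\Gamma(1-\lambda,\theta t)\,\theta^{\lambda-1}\), with \(\Gamma(\cdot,\cdot)\) the lower incomplete gamma function, can be checked against numerical integration. A sketch assuming Python with SciPy, where `scipy.special.gammainc` is the *regularised* lower incomplete gamma:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn, gammainc

def hybrid_intensity(t, g, lam, theta):
    """h(t) = g * t**(-lam) * exp(-theta * t)"""
    return g * t ** (-lam) * np.exp(-theta * t)

def hybrid_cumulative(t, g, lam, theta):
    """H(t): gammainc is SciPy's regularised lower incomplete gamma,
    so it is multiplied back by Gamma(1 - lam)."""
    if theta == 0:
        return g * t ** (1 - lam) / (1 - lam)     # power law special case
    return g * theta ** (lam - 1) * gamma_fn(1 - lam) * gammainc(1 - lam, theta * t)

g, lam, theta, t = 0.8, 0.45, 0.3, 5.0
H_closed = hybrid_cumulative(t, g, lam, theta)
H_numeric, _ = quad(hybrid_intensity, 0.0, t, args=(g, lam, theta))
```

The two values should agree to numerical precision (the \(t^{-\lambda}\) singularity at zero is integrable, and `quad` never evaluates the integrand at the endpoints).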

Fitting the hybrid process

  • Data: retweets of top original only
  • Inference: maximum likelihood
  • Estimates: \(\left(\hat{\lambda},\hat{\theta},\hat{\gamma}\right)=(0.45,0,0.261)\)
  • Including \(\theta\) doesn't improve the fit here
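
The NHPP log-likelihood for one sequence is \(\sum_j\log h(t_j)-H(T)\); for the power law process this is \(m\log\gamma-\lambda\sum_j\log t_j-\gamma T^{1-\lambda}/(1-\lambda)\). A minimal maximum-likelihood sketch on simulated data, assuming Python with SciPy (optimiser, seeds, and starting values are illustrative, not the talk's actual implementation):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, times, T):
    """Negative log-likelihood of the power law process on [0, T]:
    l = m*log(g) - lam*sum(log t_j) - g*T**(1-lam)/(1-lam),
    with g optimised on the log scale to keep it positive."""
    log_g, lam = params
    if lam >= 1:
        return np.inf
    return -(len(times) * log_g - lam * np.sum(np.log(times))
             - np.exp(log_g) * T ** (1 - lam) / (1 - lam))

# Simulate data with known parameters, then recover them by maximum likelihood
rng = np.random.default_rng(2)
g_true, lam_true, T = 3.0, 0.45, 1000.0
s = np.cumsum(rng.exponential(1.0, size=1000))
s = s[s <= g_true * T ** (1 - lam_true) / (1 - lam_true)]
times = ((1 - lam_true) * s / g_true) ** (1 / (1 - lam_true))

res = minimize(neg_log_lik, x0=[0.0, 0.5], args=(times, T), method="Nelder-Mead")
g_hat, lam_hat = np.exp(res.x[0]), res.x[1]
```

The estimates should land near the true \((\gamma,\lambda)=(3.0,0.45)\), up to sampling noise.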

Going back to the whole data set

Fitting the hybrid process (cont'd)

  • Data: retweets of all originals
  • Hierarchical approach: a distribution to govern the parameter values for retweets of each original
  • Question: can we actually explain the distribution of the parameters (and that of the retweet count) by some covariates?

Censored regression

Terminology

For the \(i\)-th original, where \(i=1,2,\ldots,n\)

Retweet count \(m_i\)
Transformed retweet count \(m_i^{*}=\log(1+m_i)\)
Follower count \(x_i\)
Mean-centred follower count \(x_i^{*}=\log(1+x_i)-\displaystyle\frac{1}{n}\sum_{k=1}^n\log(1+x_k)\)

Exploratory analysis

Model

  • Censored regression (Tobin 1958) \[ m_i^{*}=\max(0,\alpha+\beta x_i^{*}+\epsilon_i) \] \[ \epsilon_i~\overset{\text{iid}}{\sim}~\text{N}(0,\tau^{-1}) \]
  • Note the difference with \[ m_i^{*}=\max(0,\alpha+\beta x_i^{*})+\epsilon_i \]
  • \(x_i^{*}\) orthogonalises \(\alpha\) and \(\beta\) in inference

Extending the original censored regression

  • Terms with higher powers of \(x_i^{*}\) can be added to potentially improve fit
  • Linear and quadratic fits give maximised log-likelihoods of 828.4 and 841.47, respectively
  • Therefore, the following censored regression will be used in the hierarchical model: \[ m_i^{*}=\max\left(0,\alpha+\beta x_i^{*}+\kappa\left(x_i^{*}\right)^2+\epsilon_i\right) \]
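
The censored-regression likelihood contributes a normal density for uncensored points and \(\Phi(-\mu_i/\sigma)\) for points censored at zero; the quadratic version can then be fitted by maximum likelihood. A hedged sketch on synthetic data (Python assumed; all parameter values illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_neg_log_lik(params, y, x):
    """m* = max(0, a + b*x + k*x^2 + eps), eps ~ N(0, sigma^2).
    Uncensored points contribute a normal density; points censored at zero
    contribute Phi(-(a + b*x + k*x^2) / sigma)."""
    a, b, k, log_sigma = params
    sigma = np.exp(log_sigma)
    mu = a + b * x + k * x ** 2
    cens = y <= 0
    ll = norm.logpdf(y[~cens], loc=mu[~cens], scale=sigma).sum()
    ll += norm.logcdf(-mu[cens] / sigma).sum()
    return -ll

# Synthetic check with known coefficients
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(0.0, 1.0, n)
y = np.maximum(0.0, 0.5 + 1.0 * x + 0.3 * x ** 2 + rng.normal(0.0, 0.8, n))

res = minimize(tobit_neg_log_lik, x0=[0.0, 0.0, 0.0, 0.0], args=(y, x),
               method="Nelder-Mead", options={"maxiter": 10000})
a_hat, b_hat, k_hat = res.x[:3]
```

The recovered coefficients should sit near the true \((\alpha,\beta,\kappa)=(0.5,1.0,0.3)\).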

Graphical fit

Hierarchical model for all retweets

Recall

  • \(h(t)=\gamma t^{-\lambda}e^{-\theta t}, H(t)=\gamma\Gamma(1-\lambda,\theta t)\theta^{\lambda-1}\)
    • \(\Gamma(\cdot,\cdot)\) is the lower incomplete gamma function
  • \(h(t)\) and \(H(t)\) increase linearly with \(\gamma\)
  • \(\gamma\) can be seen as a proxy for the retweet count
  • Making \(\gamma\) depend on (mean-centred) follower count removes the need to model (transformed) retweet count separately

Complete model specification

  • The \(i\)-th original occurs at time \(t_i\)
  • Its \(m_i\) retweets follow a hybrid process with intensity \[ h_i(t)=\gamma_i(t-t_i)^{-\lambda}e^{-\theta(t-t_i)}, t\geq t_i \]
  • Location shift from \(t\) to \(t-t_i\): retweets occur relative to the creation of the \(i\)-th original
  • Relationship of \(\gamma_i\) and \(x_i^{*}\) (mean-centred follower count): \[ \log(1+\gamma_i) = \max\left(0,\alpha+\beta x_i^{*}+\kappa\left(x_i^{*}\right)^2+\epsilon_i\right) \]
  • As before, \(\epsilon_i~\overset{\text{iid}}{\sim}~\text{N}(0,\tau^{-1})\)
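
The generative side of this specification (for the \(\theta=0\) case) can be sketched as a forward simulation; everything here is an illustrative assumption in Python, not the talk's code:

```python
import numpy as np

def simulate_retweets(t_orig, x_star, alpha, beta, kappa, lam, tau, T, rng=None):
    """Forward-simulate the hierarchical model (theta = 0 case):
    log(1 + gamma_i) = max(0, alpha + beta*x_i* + kappa*(x_i*)^2 + eps_i),
    eps_i ~ N(0, 1/tau); retweets of the i-th original then follow a power
    law process in (t - t_i), simulated by time rescaling."""
    rng = np.random.default_rng(rng)
    eps = rng.normal(0.0, 1.0 / np.sqrt(tau), size=len(t_orig))
    lin = alpha + beta * x_star + kappa * x_star ** 2 + eps
    gammas = np.expm1(np.maximum(0.0, lin))        # gamma_i >= 0
    retweets = []
    for t_i, g in zip(t_orig, gammas):
        if g == 0.0 or t_i >= T:
            retweets.append(np.array([]))
            continue
        H_max = g * (T - t_i) ** (1 - lam) / (1 - lam)
        s = np.cumsum(rng.exponential(1.0, size=int(3 * H_max) + 20))
        s = s[s <= H_max]
        retweets.append(t_i + ((1 - lam) * s / g) ** (1 / (1 - lam)))
    return gammas, retweets

t_orig = np.array([0.0, 1.0, 5.0])
x_star = np.array([-1.0, 0.0, 2.0])
gammas, retweets = simulate_retweets(t_orig, x_star, alpha=0.5, beta=0.8,
                                     kappa=0.1, lam=0.4, tau=4.0, T=20.0, rng=7)
```

Originals whose latent linear predictor falls below zero get \(\gamma_i=0\) and hence no retweets, mirroring the censoring in the regression.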

All notations

  • Scalars
    • Observation period: \(T~\left(\geq t_n\geq t_{n-1}\geq\cdots\geq t_1\geq0\right)\)
    • Parameters: \(\alpha,\beta,\kappa,\lambda,\tau,\theta\)
  • For each original
    • \(i\)-th original's & its retweets' times: \(\boldsymbol{t}_i=(t_i,t_{i1},t_{i2},\ldots,t_{im_i})\)
    • \(\gamma_i\) not a parameter but a function of \(\alpha,\beta,\kappa,\epsilon_i\) and \(x_i^{*}\)
  • Vectors of length \(n\)
    • Retweet counts: \(\boldsymbol{m}=(m_1,m_2,\ldots,m_n)\)
    • All event times: \(\boldsymbol{t}=\{\boldsymbol{t}_1,\boldsymbol{t}_2,\ldots,\boldsymbol{t}_n\}\)
    • (Mean-centred) follower counts: \(\boldsymbol{x}^{*}=(x_1^{*},x_2^{*},\ldots,x_n^{*})\)
    • Random errors / latent variables: \(\boldsymbol{\epsilon}=(\epsilon_1,\epsilon_2,\ldots,\epsilon_n)\)

Likelihood

\[ f(\boldsymbol{m},\boldsymbol{t}|\boldsymbol{x}^{*},\alpha,\beta,\kappa,\lambda,\tau,\boldsymbol{\epsilon},\theta=0) = \prod_{i:m_i>0}\left[\boldsymbol{1}_{\{\gamma_i>0\}}\gamma_i^{m_i}\prod_{j=1}^{m_i}(t_{ij}-t_{i})^{-\lambda}\right] \] \[ \times\exp\left(-\left(1-\lambda\right)^{-1}\sum_{i=1}^{n}\gamma_i(T-t_i)^{1-\lambda}\right) \] \[ f(\boldsymbol{m},\boldsymbol{t}|\boldsymbol{x}^{*},\alpha,\beta,\kappa,\lambda,\tau,\boldsymbol{\epsilon},\theta>0) = \prod_{i:m_i>0}\left[\boldsymbol{1}_{\{\gamma_i>0\}}\gamma_i^{m_i}\prod_{j=1}^{m_i}(t_{ij}-t_{i})^{-\lambda}\right] \] \[ \times\exp\left(-\theta^{\lambda-1}\sum_{i=1}^{n}\left[\gamma_i\Gamma(1-\lambda,\theta(T-t_i))\right] - \theta\sum_{i:m_i>0}\sum_{j=1}^{m_i}(t_{ij}-t_{i})\right)\]

\(\theta=0\) vs \(\theta>0\)

  • We are actually interested in which one is adequate
    • Hierarchical model of power law processes \((\theta=0)\)
    • Hierarchical model of hybrid processes \((\theta>0)\)
  • Treat them as two different models
    • \(f(\boldsymbol{m},\boldsymbol{t}|\boldsymbol{x}^{*},\alpha,\beta,\kappa,\lambda,\tau,\boldsymbol{\epsilon},\theta=0) \rightarrow f(\boldsymbol{m},\boldsymbol{t}|\boldsymbol{x}^{*},\boldsymbol{\eta}_0,M=0)\), where \(\boldsymbol{\eta}_0=(\alpha,\beta,\kappa,\lambda,\tau,\boldsymbol{\epsilon})\)
    • \(f(\boldsymbol{m},\boldsymbol{t}|\boldsymbol{x}^{*},\alpha,\beta,\kappa,\lambda,\tau,\boldsymbol{\epsilon},\theta>0) \rightarrow f(\boldsymbol{m},\boldsymbol{t}|\boldsymbol{x}^{*},\boldsymbol{\eta}_1,M=1)\), where \(\boldsymbol{\eta}_1=(\alpha,\beta,\kappa,\lambda,\tau,\boldsymbol{\epsilon},\theta)\)
  • Infer \(M\) via model selection during MCMC inference

Inference

Bayesian approach because of:

  • The presence of latent variables \(\boldsymbol{\epsilon}\)
  • Model selection required between \(M=0\) and \(M=1\)

Independent priors: \[ \alpha,\beta,\kappa\sim\text{N}(\text{mean}=0,~\text{sd}=100) \] \[ \tau,1-\lambda\sim\text{Gamma}(\text{shape}=1,~\text{rate}=0.001) \] \[ (\text{For}~M=1)~\theta\sim\text{Gamma}(\text{shape}=1,~\text{rate}=0.001) \] \[ M: \pi(M=0) = 1 - \pi(M=1) \]

MCMC (individual models)

Metropolis-within-Gibbs (MWG) algorithm

  • Update each scalar parameter individually
  • Metropolis steps:
    • \(\alpha,\beta,\kappa,\lambda\)
    • \(\epsilon_i, i = 1,2,\ldots,n\)
    • (For \(M=1\)) \(\theta\)
  • Gibbs step: \(\tau\)
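
The update pattern — random-walk Metropolis for most parameters, conjugate Gibbs for \(\tau\) (under the stated priors, \(\tau\mid\boldsymbol{\epsilon}\sim\text{Gamma}(1+n/2,\,0.001+\frac{1}{2}\sum\epsilon_i^2)\)) — can be illustrated on a toy normal model. This is a sketch of the pattern in Python, not the actual sampler:

```python
import numpy as np

def mwg_toy(y, n_iter=5000, prior_rate=0.001, rng=None):
    """Toy Metropolis-within-Gibbs for y_i ~ N(mu, 1/tau), flat prior on mu,
    tau ~ Gamma(shape=1, rate=prior_rate): random-walk Metropolis for mu,
    conjugate Gibbs draw for tau -- the same structure as the talk's sampler
    (Metropolis for alpha, beta, kappa, lambda, eps_i, theta; Gibbs for tau)."""
    rng = np.random.default_rng(rng)
    n, mu, tau = len(y), 0.0, 1.0
    draws = np.empty((n_iter, 2))
    for it in range(n_iter):
        prop = mu + rng.normal(0.0, 0.1)           # Metropolis step for mu
        log_ratio = -0.5 * tau * (np.sum((y - prop) ** 2) - np.sum((y - mu) ** 2))
        if np.log(rng.uniform()) < log_ratio:
            mu = prop
        # Gibbs step: tau | rest ~ Gamma(1 + n/2, prior_rate + sum((y-mu)^2)/2)
        tau = rng.gamma(1 + n / 2, 1.0 / (prior_rate + 0.5 * np.sum((y - mu) ** 2)))
        draws[it] = mu, tau
    return draws

y = np.random.default_rng(4).normal(2.0, 0.5, size=200)   # truth: mu=2, tau=4
draws = mwg_toy(y, rng=5)
mu_hat, tau_hat = draws[1000:].mean(axis=0)
```

NumPy's `gamma` takes a *scale* parameter, hence the reciprocal of the rate.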

MCMC (model selection)

Gibbs variable selection (Dellaportas, Forster, and Ntzoufras 2002)

Required quantities

  1. Likelihoods: \(f(\boldsymbol{m},\boldsymbol{t}|\boldsymbol{x}^{*},\boldsymbol{\eta}_0,M=0)\) & \(f(\boldsymbol{m},\boldsymbol{t}|\boldsymbol{x}^{*},\boldsymbol{\eta}_1,M=1)\)
  2. Model priors: \(\pi(M=0)\) & \(\pi(M=1)\)
  3. Prior of \(\theta\) under \(M=1\): \(\pi(\theta|M=1)\)
  4. Pseudoprior of \(\theta\) under \(M=0\): \(\pi(\theta|M=0)\)

MCMC (model selection)

If current value of \(M=1\):

  1. Draw \(\boldsymbol{\eta}_1\) using the individual MWG algorithm (previous slide)
  2. Split \(\boldsymbol{\eta}_1\) into \(\boldsymbol{\eta}_0\) and \(\theta\) \(\quad\Leftrightarrow\quad\boldsymbol{\eta}_1=(\boldsymbol{\eta}_0,\theta)\)
  3. Calculate \(A_0\) and \(A_1\):
    • \(A_0 = f(\boldsymbol{m},\boldsymbol{t}|\boldsymbol{x}^{*},\boldsymbol{\eta}_0,M=0)\pi(\theta|M=0)\pi(M=0)\)
    • \(A_1 = f(\boldsymbol{m},\boldsymbol{t}|\boldsymbol{x}^{*},\boldsymbol{\eta}_1,M=1)\pi(\theta|M=1)\pi(M=1)\)
  4. Set \(M\) to \(0\) with prob. \(\displaystyle\frac{A_0}{A_0+A_1}\) or \(1\) with prob. \(\displaystyle\frac{A_1}{A_0+A_1}\)

MCMC (model selection)

If current value of \(M=0\):

  1. Draw \(\boldsymbol{\eta}_0\) using the individual MWG algorithm
  2. Draw \(\theta\) from pseudoprior \(\pi(\theta|M=0)\)
  3. Combine \(\boldsymbol{\eta}_0\) and \(\theta\) to form \(\boldsymbol{\eta}_1\) \(\quad\Leftrightarrow\quad\boldsymbol{\eta}_1=(\boldsymbol{\eta}_0,\theta)\)
  4. Calculate \(A_0\) and \(A_1\) as before
  5. Set \(M\) to \(0\) or \(1\) with probabilities as before
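
In practice the switch step is computed on the log scale for numerical stability. A minimal sketch (Python assumed; the log-likelihoods and log-priors are taken as already-computed numbers, which is an assumption of this illustration):

```python
import numpy as np

def model_switch(loglik0, loglik1, log_pseudo_m0, log_prior_m1,
                 log_pi_m0, log_pi_m1, rng):
    """One model-switch step of Gibbs variable selection:
      log A_0 = log f(m,t | eta_0, M=0) + log pi(theta | M=0) + log pi(M=0)
      log A_1 = log f(m,t | eta_1, M=1) + log pi(theta | M=1) + log pi(M=1)
    then set M = 1 with probability A_1 / (A_0 + A_1)."""
    logA0 = loglik0 + log_pseudo_m0 + log_pi_m0
    logA1 = loglik1 + log_prior_m1 + log_pi_m1
    p1 = 1.0 / (1.0 + np.exp(min(logA0 - logA1, 700.0)))  # clip to avoid overflow
    return 1 if rng.uniform() < p1 else 0

rng = np.random.default_rng(6)
# Equal model priors, M=1 fitting much better: the step should pick M=1 nearly always
picks = [model_switch(-500.0, -490.0, -1.0, -1.0, np.log(0.5), np.log(0.5), rng)
         for _ in range(100)]
```

Working with \(\log A_1-\log A_0\) rather than the raw \(A\)'s matters here, since the likelihoods are far below floating-point range.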

Bayes factor

  • The empirical proportion of \(M=1\) in the converged chain is the estimate for the posterior probability \(\pi(M=1|\boldsymbol{m},\boldsymbol{t},\boldsymbol{x}^{*})\)
  • Similarly for \(\pi(M=0|\boldsymbol{m},\boldsymbol{t},\boldsymbol{x}^{*})\)
  • Bayes factor for model \(1\) over model \(0\) is \[ B_{10}=\frac {\hat{\pi}(M=1|\boldsymbol{m},\boldsymbol{t},\boldsymbol{x}^{*})} {\hat{\pi}(M=0|\boldsymbol{m},\boldsymbol{t},\boldsymbol{x}^{*})} \left/\frac {\pi(M=1)} {\pi(M=0)} \right. \]
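
The estimator is essentially a one-liner; a sketch (Python assumed):

```python
import numpy as np

def bayes_factor_10(M_chain, prior_m1=0.5):
    """B_10 = (posterior odds of M=1, estimated by the proportion of M=1
    in the converged chain) divided by (prior odds of M=1)."""
    p1_hat = np.asarray(M_chain, dtype=float).mean()
    return (p1_hat / (1.0 - p1_hat)) / (prior_m1 / (1.0 - prior_m1))

# e.g. if 60 of 100 post-burn-in iterations visited M=1 under equal model priors
B10 = bayes_factor_10([1] * 60 + [0] * 40)   # (0.6/0.4) / 1 = 1.5
```

If the chain never (or always) visits one model, the empirical odds degenerate; unequal model priors can be used to rebalance the visits, with the prior odds dividing back out in \(B_{10}\).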

Application

#thehandmaidstale

Season 1 finale, 2017-06-14

#thehandmaidstale

  • Observation period: \(T\approx21\) hours
  • Originals: \(n=2043\), of which \(265\) were ever retweeted
  • Retweets: \(\displaystyle\sum_{i=1}^nm_i=971,~\underset{1\leq i\leq n}{\max}m_i=204\)

#gots7

Season 7 premiere, 2017-07-16

#gots7

  • Observation period: \(T\approx4.4\) hours
  • Originals: \(n=25420\), of which \(3145\) were ever retweeted
  • Retweets: \(\displaystyle\sum_{i=1}^nm_i=29751,~\underset{1\leq i\leq n}{\max}m_i=3204\)

Posterior density

Bayes factor & model selection

#thehandmaidstale

  • \(B_{10}=6.572\)
    • Hierarchical model of hybrid processes selected
  • Mathews et al. (2017): "exponential cutoff" phenomenon for tweets collected over a similar duration of 24 hours

#gots7

  • \(B_{10}=1.219\times 10^{-9}\)
    • Hierarchical model of power law processes selected
  • Mathews et al. (2017): power law phenomenon for tweets collected over a similar duration of 3 hours

Prediction

Summary

  • Use Duane plot as a diagnostic for the power law process, which is extended to the hybrid process
  • Model relationship between follower count and retweet count using censored regression
  • Construct a hierarchical model for retweets by combining the hybrid process and censored regression
  • Infer the model choice between power law and hybrid processes in MCMC algorithm
  • Selected models for different data consistent with previous findings

Bibliography

Dellaportas, Petros, Jonathan J Forster, and Ioannis Ntzoufras. 2002. “On Bayesian Model and Variable Selection Using MCMC.” Statistics and Computing 12: 27–36.

Duane, J. T. 1964. “Learning Curve Approach to Reliability Monitoring.” IEEE Transactions on Aerospace 2 (2): 563–66.

Mathews, Peter, Lewis Mitchell, Giang Nguyen, and Nigel Bean. 2017. “The Nature and Origin of Heavy Tails in Retweet Activity.” In Proceedings of the 26th International Conference on World Wide Web Companion, 1493–8. WWW ’17 Companion. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee. doi:10.1145/3041021.3053903.

Tobin, James. 1958. “Estimation of Relationships for Limited Dependent Variables.” Econometrica 26 (1): 24–36. doi:10.2307/1907382.