From the power law to extreme value mixture distributions

Clement Lee (joint work with Emma Eastoe and Aiden Farrell)

2024-07-10

Outline

  • Extending the power law for network degrees
    • claimed to be ubiquitous
  • Mixture distribution
    • incorporating extreme value theory
  • Application
    • useful even for simulated data
    • comparison with real data
  • Studying the fit over time

 

Introduction

Power laws

  • Continuous: Pareto distribution (\(\alpha>1\))

\[\begin{align*} f(y) &\propto y^{-\alpha},\qquad{}y>y_0 \\ \log f(y) &= -\alpha\log{}y+c \\ F(y) &= 1-(y/y_0)^{-(\alpha-1)} \\ \log\left(1-F(y)\right) &= -(\alpha-1)\log{}y + c^{*} \end{align*}\]

 

  • Discrete: Zipf distribution

\[\begin{align*} p(x) &\propto x^{-\alpha},\qquad{}x=x_0,x_0+1,\ldots \\ \log p(x) &= -\alpha\log{}x+c \end{align*}\]

  • Approximate linearity for survival function on log-log scale
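The two log-log relationships above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the truncation point `x_max` and the value `alpha = 2.5` are illustrative assumptions, not taken from the slides. It computes the Zipf pmf on a long but finite support, takes the survival function as \(P(X\ge{}x)\), and measures both slopes on the log-log scale.

```python
import math

def zipf_pmf(alpha, x0=1, x_max=200_000):
    """pmf p(x) proportional to x^(-alpha) on x0..x_max.

    The support is truncated far out in the tail (an assumption made here
    so that the normalising constant is a finite sum).
    """
    weights = [x ** (-alpha) for x in range(x0, x_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def survival(pmf):
    """S(x) = P(X >= x) at each support point, via a reverse cumulative sum."""
    out, running = [], 0.0
    for p in reversed(pmf):
        running += p
        out.append(running)
    return out[::-1]

def loglog_slope(values, x0, x_lo, x_hi):
    """Slope of log(values) against log(x) between the points x_lo and x_hi."""
    y_lo, y_hi = values[x_lo - x0], values[x_hi - x0]
    return (math.log(y_hi) - math.log(y_lo)) / (math.log(x_hi) - math.log(x_lo))

alpha = 2.5
pmf = zipf_pmf(alpha)
surv = survival(pmf)

pmf_slope = loglog_slope(pmf, 1, 10, 1000)    # exactly -alpha, up to rounding
surv_slope = loglog_slope(surv, 1, 10, 1000)  # close to -(alpha - 1), not exact
print(pmf_slope, surv_slope)
```

The pmf slope equals \(-\alpha\) because the normalising constant cancels, whereas the survival slope is only close to \(-(\alpha-1)\), which is why the slide describes the linearity as approximate.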

Degrees of real networks

 

Why the power law?

  • Seemingly ubiquitous for networks
    • and other kinds of data
  • “Nice” models imply power law degree distribution
  • Preferential attachment (Barabási and Albert 1999)
    • Undirected: Hofstad (2016b)
    • Directed: Bollobás et al. (2001)
    • Nonlinear: Oliveira and Spencer (2005),
      Rudas, Tóth, and Valkó (2007)
  • Generalised random graphs
    • Hofstad (2016a)

 

  • Workflow
    1. Plot degrees on log-log scale
    2. Fit some distribution to (subset of) data,
      claim degrees follow the power law (or not)
    3. Claim network comes from preferential attachment model (or not)
  • Criticisms
    1. Visualisation not good enough
    2. Inadequacy of distributions;
      testing procedure not fit for purpose
    3. Sufficient but not necessary condition

1. Visualisation not good enough

 

  • Valero, Pérez-Casany, and Duarte-López (2022)

Really a straight line?

  • Survival function visualises large degrees better

 

  • “Curved” downwards

2. Inadequacy of distributions

| Method | Reason | Issue |
|---|---|---|
| Subset data above \(u\) and choose optimal \(u\) by Kolmogorov-Smirnov statistic (Clauset, Shalizi, and Newman 2009) | Small degrees deviate from straight line | Requires additional procedure; likelihood under different \(u\) not comparable |
| Lognormal (Clauset, Shalizi, and Newman 2009; Buzsáki and Mizuseki 2014) | To improve overall fit | Light-tailed; inherently continuous |
| Weibull / stretched exponential (Malevergne, Pisarenko, and Sornette 2005) | To improve overall fit | Inherently continuous |
| Zipf-polylog / power law with exponential cut-off (Valero, Pérez-Casany, and Duarte-López 2022; Pastor-Satorras and Vespignani 2001) | To improve overall fit; to accommodate “curvature” | Light-tailed |
| Incorporating extreme value methods (Voitalov et al. 2019) | Large degrees deviate from straight line | Assumes heavy-tailed; inherently continuous |
| Double power law (Ayed, Lee, and Caron 2019) | To accommodate “curvature” | Assumes heavy-tailed |
| Mixture of Zipfs (Jung and Phoa 2021) | Small degrees deviate from straight line | Assumes heavy-tailed |

Extreme value mixture distribution

Primer: relationships

Rightmost column

Schematic of spliced mixture distribution

  • Slope: \(-\alpha\) (\(<-1\))

  • Tail heaviness: \(1/(\alpha-1)\)

  • \(\theta\in(0,1]\)

  • Zipf\((\alpha)\) when \(\theta=1\)

  • Blue’s tail heaviness: \(\xi\)
  • Brown’s tail heaviness: \(1/(\alpha-1)\)
    • had the power law extended beyond \(u\)
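To make the schematic concrete, here is a minimal sketch of one way such a spliced pmf could be written. The slides only name \(\alpha\), \(\theta\), \(u\) and \(\xi\), so the exact form below is an assumption: a truncated body proportional to \(x^{-\alpha}\theta^{x}\) on \(1,\ldots,u\), and a discretised generalised Pareto tail above \(u\). The scale `sigma` and the tail weight `phi` are hypothetical parameters introduced for illustration, and the authors' actual parametrisation may differ.

```python
import math

def gpd_survival(z, xi, sigma):
    """Survival function of the generalised Pareto distribution at z >= 0."""
    if xi == 0.0:
        return math.exp(-z / sigma)
    base = 1.0 + xi * z / sigma
    return base ** (-1.0 / xi) if base > 0.0 else 0.0

def spliced_pmf(x, alpha, theta, u, xi, sigma, phi):
    """Assumed spliced pmf: power-law-type body up to u, discretised GPD tail above u.

    Body (1 <= x <= u): proportional to x^(-alpha) * theta^x, total mass 1 - phi.
    Tail (x > u): phi times the GPD probability of the interval (x-1-u, x-u].
    """
    if x < 1:
        return 0.0
    if x <= u:
        norm = sum(k ** (-alpha) * theta ** k for k in range(1, u + 1))
        return (1.0 - phi) * x ** (-alpha) * theta ** x / norm
    return phi * (gpd_survival(x - 1 - u, xi, sigma) - gpd_survival(x - u, xi, sigma))

# Illustrative values only, not estimates from any data set.
params = dict(alpha=2.2, theta=0.98, u=30, xi=0.4, sigma=20.0, phi=0.05)
n_max = 5000
total = sum(spliced_pmf(x, **params) for x in range(1, n_max + 1))

# The two tail heaviness values contrasted on the slide.
actual_tail = params["xi"]                     # tail heaviness of the fitted tail
implied_tail = 1.0 / (params["alpha"] - 1.0)   # had the power law extended beyond u
print(total, actual_tail, implied_tail)
```

With \(\theta=1\) the body reduces to a truncated Zipf\((\alpha)\), and the tail terms telescope, so the pmf sums to one over the full support.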

Bayesian inference

CRAN dependencies

 

  • Mixture distribution improves fit
  • Better than alternatives

From preferential attachment model

 

  • Not clear-cut even when simulated from true model
  • Uncertainty & finite sample behaviour
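A simulation of this kind can be sketched in a few lines. The code below is a minimal linear preferential attachment simulator in the spirit of Barabási and Albert (1999), written with the Python standard library only. It is not the simulation set-up used for the slides: the seed graph, the number of edges per new node `m` and the network size are assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_pa_degrees(n_nodes, m=2, seed=1):
    """Degrees from a simple linear preferential attachment simulation.

    Starts from a complete graph on m + 1 nodes (an assumed seed graph).
    Each new node attaches to m distinct existing nodes, each chosen with
    probability proportional to its current degree.
    """
    rng = random.Random(seed)
    endpoints = []  # every node appears once per incident edge
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            endpoints += [i, j]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            # a uniform draw from the endpoint list is degree-proportional
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            endpoints += [new, t]
    return Counter(endpoints)  # node -> degree

def empirical_survival(degrees):
    """Empirical P(D >= d) for each observed degree d."""
    counts = Counter(degrees.values())
    n = sum(counts.values())
    surv, remaining = {}, n
    for d in sorted(counts):
        surv[d] = remaining / n
        remaining -= counts[d]
    return surv

n_nodes, m = 5000, 2
degrees = simulate_pa_degrees(n_nodes, m=m)
surv = empirical_survival(degrees)
for d in sorted(surv)[:5]:
    print(d, surv[d])
```

For this linear model the limiting degree distribution is known to have power-law exponent 3, so the implied tail heaviness is \(1/(3-1)=0.5\). Plotting `surv` on the log-log scale for a single finite run gives a sense of how far one sample can sit from that limit.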

Tail heaviness: actual vs implied

  • Away from \(y=x\) line
  • Difficult to have sustained growth according to \(\alpha\)

 

  • Indication that the whole of the data could follow the power law
  • Still huge uncertainty in \(\xi\)
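The "implied" tail heaviness compared against \(\xi\) here can be traced back to the Pareto survival function from the introduction. A short derivation, assuming a generalised Pareto tail with shape \(\xi>0\) and a scale written as \(\sigma\) (a symbol introduced here, not used on the slides):

\[\begin{align*} 1-F(y) &= (y/y_0)^{-(\alpha-1)} && \text{(power law)} \\ 1-G(y) &= \left(1+\xi{}y/\sigma\right)^{-1/\xi} \sim \left(\xi{}y/\sigma\right)^{-1/\xi},\qquad{}y\to\infty && \text{(generalised Pareto)} \\ 1/\xi &= \alpha-1 \quad\Rightarrow\quad \xi = 1/(\alpha-1) && \text{(matching exponents)} \end{align*}\]

For example, \(\alpha=3\) implies a tail heaviness of \(0.5\). A fitted \(\xi\) below \(1/(\alpha-1)\) therefore indicates a tail lighter than the one the power law would give had it extended beyond \(u\).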

Evolution

Relatively stable

 

Tail steadily lighter than implied by power law

Summary

Next steps

Hypothesis testing

  • \(H_0\): Data follows the power law
    • not necessarily arising as iid samples from a distribution
  • Test statistic
    • distance between estimates of \(\xi\) and \(1/(\alpha-1)\)
    • possibly utilising \(u\) as well
    • what is the distribution under \(H_0\)?

 

Underlying network model

  • Evolution of the raw data
    • A generalised linear model
    • Different to Jeong, Néda, and Barabási (2003) who studied overall growth
  • Is preferential attachment still evident?
    • How do lighter-than-power-law heavy tails arise?
    • What modifications to the model are required? Fitness, aging / fatigue?
  • Prove modified network model \(\Rightarrow\) desired limiting degree distribution

Thank you!

References

Ayed, Fadhel, Juho Lee, and François Caron. 2019. “Beyond the Chinese Restaurant and Pitman-Yor Processes: Statistical Models with Double Power-Law Behavior.” ArXiv E-Print. http://arxiv.org/abs/1902.04714.

Barabási, Albert-László, and Réka Albert. 1999. “Emergence of Scaling in Random Networks.” Science 286 (5439): 509–12. https://doi.org/10.1126/science.286.5439.509.

Bollobás, B, O Riordan, J Spencer, and G Tusnády. 2001. “The Degree Sequence of a Scale-Free Random Graph Process.” Random Structures & Algorithms 18 (3): 279–90. https://doi.org/10.1002/rsa.1009.

Buzsáki, G, and K Mizuseki. 2014. “The Log-Dynamic Brain: How Skewed Distributions Affect Network Operations.” Nature Reviews Neuroscience 15: 264–78. https://doi.org/10.1038/nrn3687.

Clauset, A., C. R. Shalizi, and M. E. J. Newman. 2009. “Power-Law Distributions in Empirical Data.” SIAM Review 51 (4): 661–703. https://doi.org/10.1137/070710111.

Hofstad, Remco van der. 2016a. “Generalized Random Graphs.” In Random Graphs and Complex Networks, 183–215. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press. https://doi.org/10.1017/9781316779422.009.

———. 2016b. “Preferential Attachment Models.” In Random Graphs and Complex Networks, 256–300. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press. https://doi.org/10.1017/9781316779422.011.

Jeong, H, Z Néda, and A L Barabási. 2003. “Measuring Preferential Attachment in Evolving Networks.” Europhysics Letters 61 (4): 567–72. https://doi.org/10.1209/epl/i2003-00166-9.

Jung, Hohyun, and Frederick Kin Hing Phoa. 2021. “A Mixture Model of Truncated Zeta Distributions with Applications to Scientific Collaboration Networks.” Entropy 23 (5). https://doi.org/10.3390/e23050502.

Malevergne, Y, V Pisarenko, and D Sornette. 2005. “Empirical Distributions of Stock Returns: Between the Stretched Exponential and the Power Law?” Quantitative Finance 5 (4): 379–401. https://doi.org/10.1080/14697680500151343.

Oliveira, R, and J Spencer. 2005. “Connectivity Transitions in Networks with Super-Linear Preferential Attachment.” Internet Mathematics 2 (2): 121–63. https://doi.org/10.1080/15427951.2005.10129101.

Pastor-Satorras, Romualdo, and Alessandro Vespignani. 2001. “Epidemic Dynamics and Endemic States in Complex Networks.” Physical Review E 63 (6): 066117. https://doi.org/10.1103/PhysRevE.63.066117.

Rudas, A, B Tóth, and B Valkó. 2007. “Random Trees and General Branching Processes.” Random Structures & Algorithms 31 (2): 186–202. https://doi.org/10.1002/rsa.20137.

Valero, Jordi, Marta Pérez-Casany, and Ariel Duarte-López. 2022. “The Zipf-Polylog Distribution: Modeling Human Interactions Through Social Networks.” Physica A 603. https://doi.org/10.1016/j.physa.2022.127680.

Voitalov, Ivan, Pim van der Hoorn, Remco van der Hofstad, and Dmitri Krioukov. 2019. “Scale-Free Networks Well Done.” Physical Review Research 1 (3): 033034. https://doi.org/10.1103/PhysRevResearch.1.033034.