From the power law to extreme value mixture distributions

Clement Lee (Newcastle University)

2024-06-30

Outline

  • Networks \(\times\) extreme value theory
  • Extending the power law for degrees
  • Mixture distribution
    • even for data simulated from “true” random graph models
    • finite sample behaviour
  • Compare results for real & simulated data
    • make models more realistic?
  • Studying the evolution of real data

 

1. Introduction

Degree distribution

 

Why the power law?

  • Seemingly ubiquitous
  • Potentially generated from “nice” models
    • Preferential attachment (Barabási-Albert)
    • Generalised random graphs

 

Really a straight line?

  • Survival function / complementary CDF

 

  • “Curved” downwards
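A minimal sketch of how the empirical survival function might be computed before plotting it on the log-log scale. The function name and the degree values are made up for illustration. It uses \(\Pr(X \ge x)\) rather than \(\Pr(X > x)\) so that the largest observation keeps a positive value and can be logged.

```python
from collections import Counter
import math


def empirical_survival(degrees):
    """Return the sorted distinct values x and the empirical P(X >= x) for each."""
    n = len(degrees)
    counts = Counter(degrees)
    xs = sorted(counts)
    surv = []
    remaining = n  # number of observations >= current x
    for x in xs:
        surv.append(remaining / n)
        remaining -= counts[x]
    return xs, surv


# Made-up degrees, for illustration only
degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 20]
xs, surv = empirical_survival(degrees)

# Points to plot: a pure power law would lie close to a straight line
log_points = [(math.log(x), math.log(s)) for x, s in zip(xs, surv)]
```

If the points bend below the straight line at large degrees, that is the "curved downwards" pattern noted above.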

Tackling various issues

2. Extreme value mixture distribution

Primer: relationships

Schematic, on log-log scale

  • Slope: \(-\alpha\), where \(\alpha>1\), so the slope is below \(-1\)

  • Tail heaviness: \(1/(\alpha-1)\)

  • \(\theta\in(0,1]\)

  • Reduced to Zipf\((\alpha)\) when \(\theta=1\)

  • Blue’s tail heaviness: \(\xi\)
  • Brown’s tail heaviness: \(1/(\alpha-1)\)
    • had the power law extended beyond \(u\)
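The schematic can be sketched in code under some assumptions. Below the threshold \(u\), the mass is taken proportional to \(x^{-\alpha}\theta^{x}\), which reduces to the Zipf form when \(\theta=1\). Above \(u\), a discretised generalised Pareto with shape \(\xi>0\) is assumed. The scale `sigma`, the tail proportion `phi`, and all function names are hypothetical and introduced only for illustration. This is one plausible parameterisation, not necessarily the one used in the talk.

```python
def body_weight(x, alpha, theta):
    """Unnormalised body mass x^(-alpha) * theta^x; theta = 1 gives the Zipf form."""
    return x ** (-alpha) * theta ** x


def igp_survival(x, u, xi, sigma):
    """Assumed tail: P(X > x | X > u) for integer x >= u, with xi > 0."""
    return (1.0 + xi * (x - u) / sigma) ** (-1.0 / xi)


def mixture_pmf(x, alpha, theta, u, phi, xi, sigma):
    """P(X = x): power-law-type body on 1..u, integer generalised Pareto above u."""
    if x < 1:
        return 0.0
    if x <= u:
        # Body carries total mass 1 - phi, normalised over 1..u
        norm = sum(body_weight(k, alpha, theta) for k in range(1, u + 1))
        return (1.0 - phi) * body_weight(x, alpha, theta) / norm
    # Tail carries total mass phi; the pmf is a difference of survival values
    return phi * (igp_survival(x - 1, u, xi, sigma) - igp_survival(x, u, xi, sigma))


# Made-up parameter values, for illustration only
params = dict(alpha=2.5, theta=0.99, u=10, phi=0.05, xi=0.5, sigma=2.0)
total = sum(mixture_pmf(x, **params) for x in range(1, 20001))
```

With these values the masses on 1 to 20000 sum to 1 up to the small remainder left in the far tail, which is a quick sanity check that the body and tail weights are consistent.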

Bayesian inference

CRAN dependencies

 

  • Mixture distribution improves fit
  • Better than alternatives

Simulated data

 

  • Not clear-cut even when simulated from true model
  • Uncertainty & finite sample behaviour
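To see the finite sample behaviour first-hand, degrees can be simulated from a simple preferential attachment scheme. The sketch below is an assumption-laden illustration: the function name, the starting graph, and the choice of `m` distinct targets per new node are choices made here, not necessarily the simulation set-up behind the slide.

```python
import random


def simulate_pa_degrees(n_nodes, m=2, seed=0):
    """Degrees of a simple preferential attachment graph (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 nodes, so every node has degree m
    degree = [m] * (m + 1)
    # Each node appears in `stubs` once per unit of degree, so a uniform draw
    # from `stubs` selects a node with probability proportional to its degree
    stubs = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:  # m distinct existing nodes
            targets.add(rng.choice(stubs))
        degree.append(m)
        for t in targets:
            degree[t] += 1
            stubs.extend([t, new])
    return degree


degrees = simulate_pa_degrees(1000, m=2, seed=1)
```

Re-running with different seeds and sizes gives a feel for how much the upper tail of the degrees varies from one finite sample to another.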

Tail heaviness: actual vs implied

  • Indication that the whole of the data could follow the power law
  • Still huge uncertainty in \(\xi\)
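A short derivation of the "implied" tail heaviness, assuming the power law with exponent \(\alpha>1\) held over the whole range and approximating the sum by an integer-free integral. The power of \(x\) in the survival function is then matched with the generalised Pareto tail.

```latex
\[
p(x) = \frac{x^{-\alpha}}{\zeta(\alpha)}, \qquad x = 1, 2, \ldots, \quad \alpha > 1
\]
\[
\Pr(X > x) = \frac{1}{\zeta(\alpha)} \sum_{k=x+1}^{\infty} k^{-\alpha}
\approx \frac{1}{\zeta(\alpha)} \int_{x}^{\infty} t^{-\alpha}\,\mathrm{d}t
= \frac{x^{-(\alpha-1)}}{(\alpha-1)\,\zeta(\alpha)}
\]
\[
\Pr(X > x) \propto x^{-1/\xi} \text{ for large } x
\quad\Longrightarrow\quad
\frac{1}{\xi} = \alpha - 1
\quad\Longleftrightarrow\quad
\xi = \frac{1}{\alpha-1}
\]
```

For example, \(\alpha=3\) implies a tail heaviness of \(0.5\), and \(\alpha=2\) implies \(1\).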

 

  • Away from \(y=x\) line
  • Difficult to have sustained growth according to \(\alpha\)
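One way to quantify "away from the \(y=x\) line" is the share of posterior draws in which \(\xi\) falls below \(1/(\alpha-1)\). The sketch below assumes paired draws of \(\xi\) and \(\alpha\) are available. The function names and the numbers are hypothetical.

```python
def implied_tail_heaviness(alpha):
    """Tail heaviness 1 / (alpha - 1) implied by a power law with exponent alpha."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1")
    return 1.0 / (alpha - 1.0)


def fraction_below_implied(xi_draws, alpha_draws):
    """Share of paired draws with xi < 1 / (alpha - 1)."""
    pairs = list(zip(xi_draws, alpha_draws))
    below = sum(1 for xi, alpha in pairs if xi < implied_tail_heaviness(alpha))
    return below / len(pairs)


# Made-up paired draws, for illustration only
xi_draws = [0.30, 0.45, 0.55, 0.40]
alpha_draws = [2.8, 3.0, 3.2, 2.9]
share = fraction_below_implied(xi_draws, alpha_draws)
```

A share close to 1 would suggest a tail lighter than the power law implies, while a share near 0.5 would be consistent with the power law applying to the whole of the data.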

3. Evolution over time

Tail heaviness relatively stable

So is the power law exponent

Tail steadily lighter than implied by power law

Summary

  • Degrees seemingly follow the power law
    • wholly or partially
  • Extreme value mixture distribution
    • power law \((\alpha)\) in the body
    • integer generalised Pareto \((\xi,\ldots)\) in the tail
  • Comparing \(\xi\) and \(1/(\alpha-1)\)
    • indicates whether the power law applies to the whole of the data
  • Application: CRAN dependencies
    • seeming stability over time
  • Next steps
    • evolution of the raw data
    • what model (modification) leads to such behaviour

 
