Statlib

Statlib provides the foundational framework for classical, contemporary, and emerging research in mathematical statistics. Drawing deeply from diverse domains, including real, functional, and convex analysis, measure theory, stochastic calculus, combinatorics, probability, and information theory, theoretical statistics demands absolute precision. As LLM-generated proofs become increasingly prevalent across these sub-disciplines, the need for rigorous, machine-verified mathematical theories has never been more urgent. While Lean's Mathlib is structured for broad mathematical domains, Statlib leverages its rich API ecosystem to deliver a streamlined, specialized library dedicated exclusively to theoretical statistics.

Development begins with the core foundations of statistical inference within decision-theoretic and local asymptotic-theory frameworks. However, the library's architecture is explicitly designed for dynamic growth, scaling to encompass nonparametric and semiparametric statistics, alongside modern frontiers such as high-dimensional statistics, conformal inference, and frameworks for information-theoretic and computational trade-offs.

To support this vision, our workflow centers on coordinating targeted formalization projects across both classical and modern methods, developing comprehensive tutorials to onboard future contributors, and establishing a collaborative forum to address shared architectural themes and implementation challenges.

We would like to thank Jeremy Avigad (Carnegie Mellon University) and Rémy Degenne (Inria center at the University of Lille) for support and encouragement in the process of initiating and developing this library.

Our team of initial contributors, listed in alphabetical order, includes Yongxi (Aaron) Lin (Carnegie Mellon University), Debarghya Mukherjee (Boston University), Rajarshi Mukherjee (Harvard University), Fred Rajasekaran (Stanford University), and Zixiao Jolene Wang (Harvard University).