Contributing to Statlib

Thank you for your interest in contributing to Statlib! Whether you are a statistician new to Lean, a Lean expert curious about statistics, or a CS / AI researcher interested in formalization tools, we are building with you. If anything in this document is unclear, please ask on the Statlib Zulip channel.

Contents

About Statlib

Statlib is, first, a community. Our goal is to develop the practices, shared vocabulary, and reusable infrastructure that let statisticians actually use Lean as a tool for research - to write in it, communicate through it, and verify their work with it.

We are a research-driven community. We target frontier topics through many kinds of contributions. Given that many probability-related results are missing from Mathlib and would be useful across a range of frontier directions in statistical research, we will also formalize the fundamentals of probability theory from the bottom up. The scope of Statlib is shaped by the interests of our maintainers, advisors, and active contributors.

Statlib is not an autoformalization project. Several projects of that kind already exist - typically one team using an LLM to translate a book or paper end-to-end. Statlib is different: its definitions and conventions are developed together with statisticians who use the language in their own research, with the aim of building a trusted core that future statistical research can use and rely on.

Relationship to Mathlib

Statlib depends on Mathlib. We do not duplicate Mathlib's MeasureTheory or ProbabilityTheory. Some of Statlib may live outside Mathlib until the project is mature and the code is reusable. Statlib's mission is to support specific statistical research topics.

How to contribute

You are welcome to contribute in any way you see fit and have the most fun with.

Project maintainers or leaders may use blueprints, GitHub project dashboards, or GitHub issues to maintain a project, claim or disclaim tasks, and coordinate through Zulip.

New to Lean? See the official Lean installation instructions.

Submitting a pull request

Most contributions come in as pull requests that add this kind of content to the library.

We ask that PRs:

The role of AI

We believe agentic coding plays a substantial role in growing the library. We require transparency about that use and treat the resulting metadata as a research resource.

Our ultimate goal is not to be the first to formalize something. We believe digestion, explanation, and reusability matter more than generation. We therefore work to avoid PRs that create a maintenance-burden for the community, which we define as a PR meeting two or more of the following criteria:

Therefore, for any PR to Statlib, the contributor is expected to be able to honestly state:

I have read and understood every line of code in this PR. I take responsibility for its correctness, style, and fit with the surrounding library. If a reviewer raises a question, I can answer it without re-running the AI tool.

For background, see the Mathlib Zulip discussion #mathlib4 > What to do with "slop" PRs.

Working with us across fields

How best should we formalize statistics in Lean, and what does the future of research look like as formalization matures? We are an experimental library and welcome researchers from all areas to propose projects of many kinds: computer science projects, dataset-building efforts, automatic theorem-proving tools, and research on human-AI collaboration. This is a space to experiment with what research, education, and related work may look like in the future.

Style and documentation

Statlib follows the Mathlib style guide for layout, naming, and proof formatting.

We will add more guidelines for governance, todo lists, continuous integration, and tutorials soon.

If you spot anything in this document that should be improved - clarified, expanded, contradicted by experience, or just plain wrong - please open an issue or a PR. This document is itself a living artifact of the project.