Contemporary Python Packaging (2023)
This document lays out a set of Python packaging practices. I don’t claim they are best practices, but they fit my needs, and might fit yours.
Validity
This was written in January 2023, superseding previous posts from 2020 and 2019.
As of this writing, Python 3.7 is approaching its end-of-life and many packages have already set a minimum version of Python 3.8. This document should be superseded or disregarded no later than the Python 3.9 end-of-life. If you cite this as a justification for your behavior, please stop doing so at that time.
Summary
- Describe all of your project metadata in pyproject.toml
- Use hatchling to build sdists and wheels
- Use hatch-vcs to extract the version from git and, optionally, write into a module
- Test your packages with a build-test-deploy workflow in CI
A Cookiecutter template implementing these suggestions can be found at https://github.com/effigies/cookiecutter-packaging.
Changes from previous revision
- Drop setuptools and versioneer
- Reduce focus on support for legacy versions of pip or build tools
- Add section on continuous integration (CI) for testing packages
Likely future changes
- Adjust to use src/ layout instead of flat
- Discussion of namespace packages
- Discussion of versioning multiple packages within a monorepo
- Testing environments using hatch or tox
Perspective
My position in the Python ecosystem will color my perspective and approach to packaging, and, presumably, how much weight you give what I think.
I work primarily on neuroimaging packages. I am currently the lead maintainer of NiBabel and keep the lights on at Nipype 1.x while the transition to Pydra progresses. I also maintain PyBIDS, fMRIPrep and a few other packages closely related to these.
These packages have different requirements. A couple worth mentioning are:
- The nipy packages have historically included git hash references and version information to allow users to provide detailed debugging information. Preserving this while updating the infrastructure has taken effort.
- Pydra aims to keep a similar import infrastructure to nipype, so that from nipype.interfaces import fsl becomes from pydra.tasks import fsl, while separating out the tasks into task packages. pydra.tasks.fsl is provided by a pydra-fsl package.
A couple of years ago, I became a maintainer of versioneer, which was, until a few years ago, the predominant way in Python to extract versions from Git history. As a result, I’ve become a bit more familiar with the internals and development trajectory of setuptools.
On the whole, I believe the Python packaging ecosystem is moving in a positive direction. The changes in this document relative to previous versions reflect the evolution of the standards and tooling more than any change in my philosophy. In particular, the standardization of declarative package metadata and editable installs means that the choice of build backends is now less fraught, and we are free to choose the simplest that gets the job done.
Desiderata
Motivating my recommendations are a few desiderata, in rough order of importance:
1. Installation should work, from source, on fairly old systems. Debian Stable (11; “bullseye”) is my touchstone here.
2. Prefer declarative syntax, and limit dynamic metadata, as much as possible.
3. Enable revision-based versions, with minimal opportunity for error.
4. Limit custom code to the absolute minimum. (Partially redundant with limiting dynamic metadata.)
To operationalize (1), the following approaches should all install correctly:
pip install .
python -m build && pip install dist/*.tar.gz
python -m build && pip install dist/*.whl
And development/editable mode should work:
pip install -e .
To operationalize (3), all of the above should produce an install with the same version string, and setting the version should be done from a version control tag if possible. Assuming a git repository, the following should also work:
git archive -o archive.tar.gz $TAG && pip install archive.tar.gz
Recommendations
I recommend using hatchling and hatch-vcs.
pyproject.toml
Create a pyproject.toml file. This contains [build-system], [project], and [tool] tables.
Build system
Python has the notion of build “frontends” and “backends”. The frontend is the command you call, such as pip or python -m build, while the backend is the tool the frontend invokes to turn your source code into a built package or installed module.
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"
hatchling is the build backend used by hatch, which is a larger project management tool. hatch-vcs wraps setuptools_scm to retrieve the version from VCS and write a version file to disk.
Project metadata
Most packaging metadata can be set declaratively in the [project] table. See Declaring project metadata for the full specification of this table.
The following skeleton can be used as a model.
[project]
name = "project-name"
description = "A package"
readme = "README.md"
requires-python = ">=3.8"
license = { file="LICENSE" }
authors = [
    { name="You", email="your@email.tld" },
]
classifiers = [
    "Programming Language :: Python :: 3",
]
dependencies = []
dynamic = ["version"]
[project.urls]
"Homepage" = "https://github.com/your/package"
Note the dynamic = ["version"] field, which is required to use hatch-vcs to determine the version.
I strongly recommend including the requires-python field. This will prevent pip from attempting to install your package on incompatible systems. When you drop Python 3.8, or any other version, update requires-python to avoid breaking downstream tools that still install on now-unsupported versions.
Tool configuration
Versioning
All that is needed to set the version from git now is:
[tool.hatch.version]
source = "vcs"
If you would like to make the version accessible from Python, for example:
import project_name
print(project_name.__version__)
Then you need to write the version to a file. Add the following to pyproject.toml:
[tool.hatch.build.hooks.vcs]
version-file = "project_name/_version.py"
And ensure that the version is imported into __init__.py:
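# _version.py is generated at build/install time and may be absent
# in a plain source checkout, so tolerate the ImportError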
try:
from ._version import version as __version__
except ImportError:
pass
Note that this file will be modified every time you build or pip install the package, so you should add _version.py to your .gitignore.
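For example, given the version-file path above, add the following line to your .gitignore:
project_name/_version.py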
To allow git archive to inject version information that will be picked up, create .git_archival.txt:
node: $Format:%H$
node-date: $Format:%cI$
describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$
ref-names: $Format:%D$
And create/add the following line to .gitattributes:
.git_archival.txt export-subst
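As a quick sanity check (assuming a tar implementation that can read an archive from stdin), you can confirm that the placeholders are substituted when an archive is generated:
git archive HEAD | tar -xOf - .git_archival.txt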
Package data
Hatchling defaults to including everything tracked by your VCS in your sdist, and everything inside the package directories of your sdist in your wheels. You can customize each of these individually:
[tool.hatch.build.targets.sdist]
exclude = [".git_archival.txt"]
[tool.hatch.build.targets.wheel]
packages = ["project_name"]
exclude = [
    "project_name/data/test_data",
]
Exclude .git_archival.txt so that it appears in git archives but not in sdists, which already contain package metadata and a generated version-file.
By specifying packages, you can have additional directories with scripts for managing your repository without confusing hatch.
In passing, you might consider removing large test data from your wheel while leaving it in your sdist. Linux distributions will often prefer an sdist over a repository as the canonical source for a version, so including all package data there is a good idea. It’s becoming common to have many virtual environments installed on a system, so keeping the footprint of installed wheels to a minimum is good practice.
Notes on new build systems and legacy operating systems
The standardization of build frontends and backends has dramatically reduced my concerns about backwards compatibility. As long as a user is able to upgrade pip with pip install --upgrade --user pip, essentially any build system becomes available. The main place where this could become problematic is on systems that can’t fetch from PyPI due to network access restrictions. In this case, distributing pre-built wheels that do not need any backend processing is probably cleanest.
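For example, a wheel can be built once on a machine with network access and copied to the restricted system, where it installs without contacting PyPI (assuming a pure-Python package or matching platforms):
python -m build --wheel
pip install dist/*.whl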
Why not setuptools?
In this post I recommend moving away from setuptools. On the one hand, this is because I think the new approaches are better and cleaner. On the other hand, as the various build systems converge on supporting common standards, setuptools is becoming less stable.
Setuptools seems to be placed in an impossible situation: it must simultaneously support decades of legacy package specifications and a collection of new standards that resulted from dissatisfaction with how things have been done for those decades. As a result, churn is very high at the moment. Major versions are bumped multiple times a year, with support for new standards arriving alongside deprecations or breakages of unadvertised but longstanding behavior.
So it’s difficult to incrementally adopt new standards while relying on the setuptools-specific bits to continue working as always. At a certain point it became less work to learn a new build backend than to keep on with setuptools. The good news is that there is a choice of backends, and 90% of the configuration should be the same, no matter which you choose.
The next jump, if it comes, should be less painful.
Continuous Integration
Whether you use the recommended approach or not, it’s worth checking that your packages build correctly. In addition to users of your packages, you may have third-party packagers that will prepare your package to be installed via conda or a Linux distribution-specific package manager. Testing your packages outside your repository reduces the chances of distributing broken packages.
The following is a minimal GitHub Actions specification for testing packages, but it should be easily translatable to other continuous integration services.
on:
  push:
    branches:
      - master
      - maint/*
    tags:
      - "*"
  pull_request:
    branches:
      - master
      - maint/*

defaults:
  run:
    shell: bash

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v4
        with:
          python-version: 3
      - run: pip install --upgrade build twine
      - name: Build sdist and wheel
        run: python -m build
      - run: twine check dist/*
      - name: Upload sdist and wheel artifacts
        uses: actions/upload-artifact@v3
        with:
          name: dist
          path: dist/
      - name: Build git archive
        run: mkdir archive && git archive -v -o archive/archive.tgz HEAD
      - name: Upload git archive artifact
        uses: actions/upload-artifact@v3
        with:
          name: archive
          path: archive/

  test-package:
    runs-on: ubuntu-latest
    needs: [build]
    strategy:
      matrix:
        package: ['wheel', 'sdist', 'archive']
    steps:
      - name: Download sdist and wheel artifacts
        if: matrix.package != 'archive'
        uses: actions/download-artifact@v3
        with:
          name: dist
          path: dist/
      - name: Download git archive artifact
        if: matrix.package == 'archive'
        uses: actions/download-artifact@v3
        with:
          name: archive
          path: archive/
      - uses: actions/setup-python@v4
        with:
          python-version: 3
      - name: Display Python version
        run: python -c "import sys; print(sys.version)"
      - name: Update pip
        run: pip install --upgrade pip
      - name: Install wheel
        if: matrix.package == 'wheel'
        run: pip install dist/*.whl
      - name: Install sdist
        if: matrix.package == 'sdist'
        run: pip install dist/*.tar.gz
      - name: Install archive
        if: matrix.package == 'archive'
        run: pip install archive/archive.tgz
      - name: Install test extras
        run: pip install project-name[test]
      - name: Run tests
        run: pytest --doctest-modules -v --pyargs project_name

  publish:
    runs-on: ubuntu-latest
    needs: [test-package]
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')
    steps:
      - uses: actions/download-artifact@v3
        with:
          name: dist
          path: dist/
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}
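Note that the test-package job runs pip install project-name[test]. This assumes that a test extra is declared in pyproject.toml; a minimal sketch, with pytest as a stand-in test dependency:
[project.optional-dependencies]
test = ["pytest"]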
To use the publish step, you will need to add a PyPI token to your encrypted secrets.
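One way to do this is with the GitHub CLI, storing a token generated in your PyPI account settings under the name the workflow expects:
gh secret set PYPI_API_TOKEN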