Contemporary Python Packaging (2023)

A set of practices for packaging Python modules.
Author: Chris Markiewicz

Published: 16 January 2023

This document lays out a set of Python packaging practices. I don’t claim they are best practices, but they fit my needs, and might fit yours.

Validity

This was written in January 2023, superseding previous posts from 2020 and 2019.

As of this writing, Python 3.7 is approaching its end-of-life and many packages have already set a minimum version of Python 3.8. This document should be superseded or disregarded no later than the Python 3.9 end-of-life. If you cite this as a justification for your behavior, please stop doing so at that time.

Summary

  • Describe all of your project metadata in pyproject.toml
  • Use hatchling to build sdists and wheels
  • Use hatch-vcs to extract the version from git and, optionally, write it to a module
  • Test your packages with a build-test-deploy workflow in CI

A Cookiecutter template implementing these suggestions can be found at https://github.com/effigies/cookiecutter-packaging.

Changes from previous revision

  • Drop setuptools and versioneer
  • Reduce focus on support for legacy versions of pip or build tools
  • Add section on continuous integration (CI) for testing packages

Likely future changes

  • Adjust to use src/ layout instead of flat
  • Discussion of namespace packages
  • Discussion of versioning multiple packages within a monorepo
  • Testing environments using hatch or tox

Perspective

My position in the Python ecosystem will color my perspective and approach to packaging, and, presumably, how much weight you give what I think.

I work primarily on neuroimaging packages. I am currently the lead maintainer of NiBabel and keep the lights on at Nipype 1.x while the transition to Pydra progresses. I also maintain PyBIDS, fMRIPrep and a few other packages closely related to these.

These packages have different requirements. Two worth mentioning are:

  1. The nipy packages have historically included git hash references and version information to allow users to provide detailed debugging information. Preserving this while updating the infrastructure has taken effort.
  2. Pydra aims to keep a similar import infrastructure to nipype, so that from nipype.interfaces import fsl becomes from pydra.tasks import fsl, while separating out the tasks into task packages. pydra.tasks.fsl is provided by a pydra-fsl package.

A couple of years ago, I became a maintainer of versioneer, which until recently was the standard way in Python to extract versions from Git history. As a result, I’ve become a bit more familiar with the internals and development trajectory of setuptools.

On the whole, I believe the Python packaging ecosystem is moving in a positive direction. The changes in this document relative to previous versions reflect the evolution of the standards and tooling more than any change in my philosophy. In particular, the standardization of declarative package metadata and of editable installs means that the choice of build backend is now less fraught, and we are free to choose the simplest one that gets the job done.

Desiderata

Motivating my recommendations are a few desiderata, in rough order of importance:

  1. Installation should work, from source, on fairly old systems. Debian Stable (11; “bullseye”) is my touchstone here.
  2. Prefer declarative syntax, and limit dynamic metadata, as much as possible.
  3. Enable revision-based versions, with minimal opportunity for error.
  4. Limit custom code to the absolute minimum. (Partially redundant with limiting dynamic metadata.)

To operationalize (1), the following approaches should all install correctly:

  • pip install .
  • python -m build && pip install dist/*.tar.gz
  • python -m build && pip install dist/*.whl

And development/editable mode should work:

  • pip install -e .

To operationalize (3), all of the above should produce an install with the same version string, and setting the version should be done from a version control tag if possible. Assuming a git repository, the following should also work:

  • git archive -o archive.tar.gz $TAG && pip install archive.tar.gz
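These checks can be automated with a small Python helper that installs one artifact into a throwaway virtual environment and reads back the version recorded in its metadata. This is a sketch rather than part of any tool: "project-name" is the placeholder distribution name used throughout this post, and the helper names are hypothetical.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def version_after_install(target: str, dist_name: str = "project-name") -> str:
    """Install `target` (a path or requirement) into a fresh venv and return
    the version recorded in the installed package metadata."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "env"
        venv.create(env_dir, with_pip=True)
        # The interpreter lives in a different place on Windows
        exe = "Scripts/python.exe" if sys.platform == "win32" else "bin/python"
        python = str(env_dir / exe)
        subprocess.run([python, "-m", "pip", "install", "--quiet", target], check=True)
        result = subprocess.run(
            [python, "-c",
             f"from importlib.metadata import version; print(version({dist_name!r}))"],
            check=True, capture_output=True, text=True,
        )
        return result.stdout.strip()


def versions_agree(versions: dict) -> bool:
    """True if every install method (a non-empty dict) reported one version."""
    return len(set(versions.values())) == 1
```

Call version_after_install once per artifact (with explicit file names, since pip does not expand shell globs when run this way) and pass the resulting dict to versions_agree. Installing an sdist or archive builds it, so pip will fetch the build requirements in an isolated environment.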

Recommendations

I recommend using hatchling and hatch-vcs.

pyproject.toml

Create a pyproject.toml file. This contains [build-system], [project], and [tool] tables.

Build system

Python has the notion of build “frontends” and “backends”. The frontend is the command you call, such as pip or python -m build, while the backend is the tool the frontend calls to turn your source code into a built package or installed module.

[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"

hatchling is the build backend used by hatch, which is a larger project management tool. hatch-vcs wraps setuptools_scm to retrieve the version from VCS and write a version file to disk.
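To make the frontend/backend split concrete: a frontend reads the build-backend string, imports the object it names, and calls standardized hooks on it. The sketch below shows only the resolution step described in PEP 517, where the string is either a module path or module:object. Real frontends such as pip and build also create an isolated environment containing the requires packages and call the hooks in a subprocess; the function name here is hypothetical.

```python
import importlib


def load_backend(spec: str):
    """Resolve a PEP 517 build-backend string to the object frontends call.

    The string is either "module.path" or "module.path:object.path".
    """
    module_path, _, object_path = spec.partition(":")
    backend = importlib.import_module(module_path)
    # Walk any dotted attribute path that follows the colon
    for attr in filter(None, object_path.split(".")):
        backend = getattr(backend, attr)
    return backend
```

With hatchling installed and the project root as the working directory, load_backend("hatchling.build").build_sdist("dist") would invoke the sdist hook, which returns the basename of the file it wrote.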

Project metadata

Most packaging metadata can be set declaratively in the [project] table. See the “Declaring project metadata” specification for the full description of this table.

The following skeleton can be used as a model.

[project]
name = "project-name"
description = "A package"
readme = "README.md"
requires-python = ">=3.8"
license = { file="LICENSE" }
authors = [
  { name="You", email="your@email.tld" },
]
classifiers = [
  "Programming Language :: Python :: 3",
]
dependencies = []
dynamic = ["version"]

[project.urls]
"Homepage" = "https://github.com/your/package"

Note the dynamic = ["version"] field, which is required to use hatch-vcs to determine the version.
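The rule behind dynamic is simple: version is required, so it must either be given a static value or be listed in dynamic, but not both, and name may never be dynamic. A build backend rejects a pyproject.toml that breaks these rules. The hypothetical function below is a simplified sketch of those checks, operating on an already-parsed [project] table.

```python
def check_dynamic_version(project: dict) -> None:
    """Simplified check of how a parsed [project] table declares its version.

    Raises ValueError on the mistakes a build backend would reject.
    """
    dynamic = project.get("dynamic", [])
    if "name" in dynamic:
        raise ValueError("name must always be given statically")
    static = "version" in project
    is_dynamic = "version" in dynamic
    if static and is_dynamic:
        raise ValueError("version is set statically and also listed in dynamic")
    if not static and not is_dynamic:
        raise ValueError("version must be set statically or listed in dynamic")
```

On Python 3.11+, the table can be loaded with the standard library: open pyproject.toml in binary mode and pass tomllib.load(f)["project"] to the function.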

I strongly recommend including the requires-python field. This will prevent pip from attempting to install on incompatible systems. When you drop support for 3.8 (or any other version), update requires-python accordingly, to avoid breaking downstream users and tools that still install on unsupported versions.

Tool configuration

Versioning

All that is needed to set the version from git now is:

[tool.hatch.version]
source = "vcs"
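With no further configuration, hatch-vcs relies on setuptools_scm’s default version scheme (guess-next-dev) and local scheme (node-and-date). The sketch below approximates how a tag, a commit distance and a commit hash become a version string, assuming a clean working tree and a plain X.Y.Z tag. It is not a reimplementation of setuptools_scm, which also handles pre-release tags, untagged repositories and dirty trees; the function name is hypothetical.

```python
def guess_next_dev(tag: str, distance: int, node: str) -> str:
    """Approximate setuptools_scm's default version for a clean tree.

    tag:      most recent tag, e.g. "v1.2.3"
    distance: number of commits since that tag
    node:     abbreviated commit hash, e.g. "abc1234"
    """
    release = tag[1:] if tag.startswith("v") else tag
    if distance == 0:
        # Building exactly at the tag: the tag is the version
        return release
    # Otherwise, bump the last component and mark it as a dev release,
    # with the commit hash as the local version label
    parts = release.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return f"{'.'.join(parts)}.dev{distance}+g{node}"
```

So a build four commits after v1.2.3 reports a development release of the next patch version (1.2.4.dev4+g followed by the hash), while a build from the tag itself reports exactly 1.2.3.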

If you would like to make the version accessible from Python, for example:

import project_name
print(project_name.__version__)

Then you need to write it to a file. Add the following to pyproject.toml:

[tool.hatch.build.hooks.vcs]
version-file = "project_name/_version.py"

And ensure that the version is imported into __init__.py:

try:
    from ._version import version as __version__
except ImportError:
    # _version.py only exists after a build or install; use a placeholder
    __version__ = "0+unknown"

Note that _version.py will be rewritten every time you build or pip install the package, so you should add it to your .gitignore.
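If you would rather not rely on a generated file, the installed package metadata can be queried instead with importlib.metadata, which is in the standard library from Python 3.8. This sketch assumes the distribution name "project-name"; the helper name and the "0+unknown" placeholder are my own choices, not a convention of hatch or hatch-vcs.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name: str) -> str:
    """Return the installed version of `dist_name` from its metadata.

    Falls back to a placeholder when the distribution is not installed,
    e.g. when importing straight from an unbuilt source checkout.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "0+unknown"
```

In __init__.py this would become __version__ = get_version("project-name"). Either way, the value is the version recorded when the package was built or installed, so an editable install will not notice new commits until it is reinstalled.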

To allow git archive to inject version information that hatch-vcs can pick up, create .git_archival.txt:

node: $Format:%H$
node-date: $Format:%cI$
describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$
ref-names: $Format:%D$

Then add the following line to .gitattributes (creating the file if necessary):

.git_archival.txt  export-subst
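When git archive runs, export-subst replaces the $Format placeholders in .git_archival.txt with real values, and setuptools_scm (via hatch-vcs) reads the result when no .git directory is available. The hypothetical helpers below sketch how the describe-name line splits into a tag, a commit distance and a commit hash, and how an unsubstituted file (a plain checkout, or a missing .gitattributes entry) can be detected.

```python
import re

# git describe output: "<tag>-<distance>-g<hash>", or just "<tag>" when on a tag
_DESCRIBE = re.compile(r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<node>[0-9a-f]+)$")


def parse_archival(text: str) -> dict:
    """Split .git_archival.txt into key/value pairs (on the first colon only)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_from_archival(text: str):
    """Return (tag, distance, node), or None if git did not substitute the file."""
    fields = parse_archival(text)
    describe = fields.get("describe-name", "")
    if not describe or "$Format" in describe:
        return None  # placeholders left untouched
    match = _DESCRIBE.match(describe)
    if match:
        return match["tag"], int(match["distance"]), match["node"]
    # Exactly on a tag: take the abbreviated hash from the node field
    return describe, 0, fields.get("node", "")[:7]
```

The (tag, distance, node) triple is the raw material that the version scheme turns into a version string.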

Package data

Hatchling defaults to including everything in your VCS into your sdist and everything inside the package directories of your sdist into your wheels. You can customize each of these individually:

[tool.hatch.build.targets.sdist]
exclude = [".git_archival.txt"]

[tool.hatch.build.targets.wheel]
packages = ["project_name"]
exclude = [
    "project_name/data/test_data",
]

Exclude .git_archival.txt so that it appears in git archives but not in sdists, which already have package metadata and version files created.

By specifying packages, you can have additional directories with scripts for managing your repository without confusing hatch.

In passing, you might consider removing large test data from your wheel while leaving it in your sdist. Linux distributions often prefer the sdist over the repository as the canonical source for a release, so including all package data there is a good idea. It’s also becoming common to have many virtual environments on a single system, so keeping the footprint of installed wheels to a minimum is good practice.
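Before uploading, you can confirm that the exclusions took effect by listing the members of the built artifacts with the standard library’s zipfile (a wheel is a zip archive) and tarfile modules. The helper names are hypothetical, and project_name/data/test_data/ is the placeholder path from the configuration above.

```python
import tarfile
import zipfile


def wheel_members(path: str) -> list:
    """List the files inside a wheel."""
    with zipfile.ZipFile(path) as whl:
        return whl.namelist()


def sdist_members(path: str) -> list:
    """List the files inside a .tar.gz sdist."""
    with tarfile.open(path, "r:gz") as sdist:
        return sdist.getnames()


def leaked(members: list, prefix: str) -> list:
    """Return the archive members under `prefix`, e.g. sdist-only test data."""
    return [name for name in members if name.startswith(prefix)]
```

For example, leaked(wheel_members(path_to_wheel), "project_name/data/test_data/") should be empty. The same data should still appear in the sdist, where member paths sit under a top-level directory named after the project and version.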

Notes on new build systems and legacy operating systems

The standardization of build frontends and backends has dramatically reduced my concerns about backwards compatibility. As long as a user is able to upgrade pip with pip install --upgrade --user pip, essentially any build system becomes available. The main place where this could become problematic is on systems that can’t fetch from PyPI due to network access restrictions. In this case, distributing pre-built wheels that do not need any backend processing is probably cleanest.
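The pip upgrade matters most for editable installs: pip added support for PEP 660, which pyproject-only backends such as hatchling need for pip install -e ., in release 21.3. A minimal sketch of such a check (the function name is hypothetical, and it only compares the major and minor version numbers):

```python
import re


def pip_supports_pep660(pip_version: str) -> bool:
    """True if this pip release (21.3 or later) supports PEP 660 editable installs."""
    match = re.match(r"(\d+)\.(\d+)", pip_version)
    if match is None:
        raise ValueError(f"unrecognized pip version: {pip_version!r}")
    return (int(match[1]), int(match[2])) >= (21, 3)
```

Against the current environment this would be pip_supports_pep660(importlib.metadata.version("pip")); if it returns False, upgrading pip as described above is the fix.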

Why not setuptools?

In this post I recommend moving away from setuptools. On the one hand, this is because I think the new approaches are better and cleaner. On the other hand, as the various build systems converge on supporting common standards, setuptools is becoming less stable.

Setuptools seems to be in an impossible situation: it must simultaneously support decades of legacy package specifications and a collection of new standards that arose from dissatisfaction with how things have been done for decades. As a result, the churn is very high at the moment. Major versions are bumped multiple times a year, with support for new standards coming alongside deprecations or breakages of unadvertised but longstanding behavior.

So it’s difficult to incrementally adopt new standards while relying on the setuptools-specific bits to continue working as always. At a certain point it became less work to learn a new build backend than to keep on with setuptools. The good news is that there is a choice of backends, and 90% of the configuration should be the same, no matter which you choose.

The next jump, if it comes, should be less painful.

Continuous Integration

Whether you use the recommended approach or not, it’s worth checking that your packages build correctly. In addition to users of your packages, you may have third-party packagers that will prepare your package to be installed via conda or a Linux distribution-specific package manager. Testing your packages outside your repository reduces the chances of distributing broken packages.

The following is a minimal GitHub Actions specification for testing packages, but it should be easily translatable to other continuous integration services.

on:
  push:
    branches:
      - master
      - maint/*
    tags:
      - "*"
  pull_request:
    branches:
      - master
      - maint/*

defaults:
  run:
    shell: bash

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v4
        with:
          python-version: 3
      - run: pip install --upgrade build twine
      - name: Build sdist and wheel
        run: python -m build
      - run: twine check dist/*
      - name: Upload sdist and wheel artifacts
        uses: actions/upload-artifact@v3
        with:
          name: dist
          path: dist/
      - name: Build git archive
        run: mkdir archive && git archive -v -o archive/archive.tgz HEAD
      - name: Upload git archive artifact
        uses: actions/upload-artifact@v3
        with:
          name: archive
          path: archive/

  test-package:
    runs-on: ubuntu-latest
    needs: [build]
    strategy:
      matrix:
        package: ['wheel', 'sdist', 'archive']
    steps:
      - name: Download sdist and wheel artifacts
        if: matrix.package != 'archive'
        uses: actions/download-artifact@v3
        with:
          name: dist
          path: dist/
      - name: Download git archive artifact
        if: matrix.package == 'archive'
        uses: actions/download-artifact@v3
        with:
          name: archive
          path: archive/
      - uses: actions/setup-python@v4
        with:
          python-version: 3
      - name: Display Python version
        run: python -c "import sys; print(sys.version)"
      - name: Update pip
        run: pip install --upgrade pip
      - name: Install wheel
        if: matrix.package == 'wheel'
        run: pip install dist/*.whl
      - name: Install sdist
        if: matrix.package == 'sdist'
        run: pip install dist/*.tar.gz
      - name: Install archive
        if: matrix.package == 'archive'
        run: pip install archive/archive.tgz
      - name: Install test extras
        run: pip install "project-name[test]"
      - name: Run tests
        run: pytest --doctest-modules -v --pyargs project_name

  publish:
    runs-on: ubuntu-latest
    needs: [test-package]
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')
    steps:
      - uses: actions/download-artifact@v3
        with:
          name: dist
          path: dist/
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}


To use the publish step, you will need to add a PyPI API token to your repository’s encrypted secrets as PYPI_API_TOKEN.
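Since desideratum (3) asks that every install report the same version, one extra check could go in the build job right after python -m build: make sure dist/ holds exactly one sdist and one wheel, and that their filenames carry the same version. This sketch assumes standard filenames ({name}-{version}.tar.gz and {name}-{version}-{python}-{abi}-{platform}.whl, optionally with a build tag); the function names are hypothetical.

```python
from pathlib import Path


def wheel_version(filename: str) -> str:
    """Version field of {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl."""
    parts = Path(filename).name[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"not a wheel filename: {filename}")
    return parts[1]


def sdist_version(filename: str) -> str:
    """Version field of a {name}-{version}.tar.gz sdist filename."""
    name = Path(filename).name
    if not name.endswith(".tar.gz"):
        raise ValueError(f"not a .tar.gz sdist: {filename}")
    # rsplit tolerates older sdists whose project name contains hyphens
    return name[: -len(".tar.gz")].rsplit("-", 1)[1]


def check_dist(dist_dir: str = "dist") -> str:
    """Require exactly one sdist and one wheel with matching versions."""
    dist = Path(dist_dir)
    wheels = sorted(dist.glob("*.whl"))
    sdists = sorted(dist.glob("*.tar.gz"))
    if len(wheels) != 1 or len(sdists) != 1:
        raise RuntimeError(f"expected one wheel and one sdist, got {wheels + sdists}")
    wv, sv = wheel_version(wheels[0].name), sdist_version(sdists[0].name)
    if wv != sv:
        raise RuntimeError(f"version mismatch: wheel {wv} vs sdist {sv}")
    return wv
```

Saved as a script ending in print(check_dist()), it could run as an extra run: step before the artifacts are uploaded.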

References

PEPs