This page lists specific things that
- the Python packaging community wants
- are fairly well-scoped
- would happen much faster if the Packaging Working Group got funding to achieve them (through donations or grants/directed gifts)
Please contact the Packaging WG by emailing [email protected] to ask us to estimate how much one of these improvements would cost; we'll get back to you within a few business days.
This is roughly prioritized by urgency and impact, but is not a roadmap.
The packaging ecosystem relies heavily on access to project metadata. This data is currently difficult to access, because traditionally it was calculated "on demand" and only available by downloading the full distribution file. PEP 643 specifies a mechanism that allows projects to expose static metadata from source distributions where available (critically, project name and version are required to be static), and PEP 658 allows package indexes to expose metadata without requiring a download of the full distribution file.
These two PEPs are currently not implemented throughout the packaging ecosystem. In particular, Warehouse does not yet allow PEP 643 metadata to be uploaded, and it does not implement PEP 658 for either wheels or source distributions. And setuptools does not yet implement PEP 643 (while a full implementation involves complex backward compatibility questions, a simple implementation of static name and version metadata could be relatively straightforward).
If these two PEPs were implemented as described above, so that static metadata is available for a significant proportion of package queries, tools that consume package metadata, such as pip and poetry, could use that data to simplify and optimise package handling, and ad hoc tools would be able to reliably access package metadata without needing to implement complex download and build processes.
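To make the consumer side concrete, here is a minimal sketch of how a tool might decide whether a field in an sdist's PKG-INFO can be trusted without running a build. It relies on the PEP 643 rules (metadata version 2.2 or later, with build-time fields listed under `Dynamic`); the function name `static_field` and the sample metadata are illustrative assumptions, not part of any existing tool.

```python
from email.parser import HeaderParser

# Hypothetical sample PKG-INFO, for illustration only.
PKG_INFO = """\
Metadata-Version: 2.2
Name: example-pkg
Version: 1.0
Requires-Python: >=3.8
Dynamic: Requires-Dist
"""


def static_field(pkg_info, field):
    """Return the values of `field` if the sdist guarantees them, else None.

    None means "unknown without building"; an empty list means the field
    is statically known to be absent.
    """
    msg = HeaderParser().parsestr(pkg_info)
    raw_version = msg.get("Metadata-Version", "1.0")
    version = tuple(int(part) for part in raw_version.split("."))
    if version < (2, 2):
        # Before PEP 643 there is no guarantee that sdist metadata is static.
        return None
    dynamic = {value.strip().lower() for value in msg.get_all("Dynamic", [])}
    if field.lower() in dynamic:
        # The backend computes this field at build time.
        return None
    return msg.get_all(field, [])
```

With the sample above, `Name` and `Requires-Python` can be read directly, while `Requires-Dist` is reported as unknown, so a consumer would still need to build or fetch a wheel to learn the dependencies.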
I am writing this proposal from the point of view of potential consumers of the relevant metadata. I have not reached out to the setuptools or Warehouse projects to establish whether they would be interested in having this work funded, but I believe that we need to find some way of speeding up adoption of these standards to allow the ecosystem to move forward.
PyTorch, TensorFlow, and many other Python packages (especially science packages) suffer from cross-platform installability problems, which affect both users and developers. Packagers and users prefer using built distributions (usually in the wheel format); publishing built distributions increases convenience for end users because source code is pre-compiled, which significantly reduces install time (e.g., from 10+ minutes to several seconds).
Supporting the multifarious Linux platforms is something we've been lagging on; we are still finishing up the rollout of manylinux2010 and recently approved the new standard manylinux2014. But even so, packagers will have to build their own wheels to release packages, which can be fiddly, brittle, and time-consuming.
We'd like help to:
- Fully implement & maintain conda-press. Conda-press is a tool that takes conda packages and turns them into wheels, without recompiling. This makes it very fast to create a wheel out of an existing package. It usually works. However, there have been a variety of bugs and maintenance issues that require more development (and perhaps a refactor) to address.
- Create a generic wheel-building service to make releases faster and more robust
We need funding for specification research and writing, backend and frontend development, testing, DevOps/infrastructure/platform services, user experience work, technical writing for end users, project management, and community outreach.
The packaging tool auditwheel "is a command line tool to facilitate the creation of Python wheel packages for Linux (containing pre-compiled binary extensions) that are compatible with a wide variety of Linux distributions" and key standards. It can inspect a wheel, checking whether it is standards-compliant. It can also repair a wheel. If a wheel depends on libraries that are not on the system, it can rewrite that wheel and inject libraries needed, parsing and rewriting ELF data. It can also repair the relevant manylinux tag(s) on a wheel.
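As a small illustration of what a platform tag is, the sketch below reads the tag(s) a wheel claims from its filename, following the standard wheel naming convention. It is not how auditwheel works: auditwheel inspects the ELF binaries inside the wheel, whereas this only reads the claim in the filename. The helper names are hypothetical.

```python
from typing import List


def wheel_platform_tags(filename: str) -> List[str]:
    """Return the platform tags encoded in a wheel filename.

    Wheel filenames have the form
    name-version[-build]-python-abi-platform.whl, and a compressed tag
    set separates multiple platform tags with dots.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %r" % filename)
    return parts[-1].split(".")


def claims_manylinux(filename: str) -> bool:
    """True if any platform tag in the filename is a manylinux tag."""
    return any(tag.startswith("manylinux") for tag in wheel_platform_tags(filename))
```

A cross-platform successor to auditwheel would need an equivalent of the binary inspection step for Windows and macOS; the filename check above is the only part that is already platform-neutral.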
However, no such utility exists on Windows, and so package maintainers on Windows face trouble creating wheels and debugging their packages. And the similar utility for macOS does not share auditwheel's code and user interface.
Therefore, developers would like to add Windows and macOS support to auditwheel. Porting auditwheel to Windows would make it much easier to make Windows wheels, and porting it to macOS would reduce duplication on the packaging maintainers' side, and reduce the proliferation of quirky tools that individual package maintainers need to learn about.
A simpler and more consistent cross-platform workflow will make it easier for package maintainers to use generic off-the-shelf automation. More maintainers will be able to leverage available automation (GitHub Actions, Travis CI, Azure pipelines, and potentially a future PyPI wheel-building service) to speed up releases and reduce grunt work. Also, this will especially be useful for scientific programmers, since they often create Python applications or libraries that include binaries written in other languages, and wheel distributions of those packages are prone to complication.
We need funding for backend development, hardware, testing, continuous integration platform services, technical writing for end users, project management, and community outreach.
We need funding to ensure core packaging tools work well with each other; currently they aren't seamlessly interoperable. See the integration-test project. This will help us get faster at testing and rolling out bugfixes and features for all Python packaging and distribution tools: well-known projects like pip, virtualenv, and wheel, but also all the downstream projects that depend on them.
The Python Package Index, a key platform for Python developers, has a browser interface, but most people use PyPI by hitting its API endpoints with client applications such as pip. PyPI has a minimal download API that does not implement many features that users have requested. The lack of a full-featured download API in Warehouse (the PyPI codebase) blocks many improvements, including:
- Light-bandwidth metadata-only API calls and JSON standardization that would enable better downloads, installations, dependency resolution features, and troubleshooting for pip and other clients
- RSS feeds that other platforms could reuse to get PyPI updates in user tooling
- Security notification feeds
- Caching for the bandersnatch mirroring client
We'd like to architect and implement a new Warehouse download API to support these features, and deprecate and decommission the old endpoints. This requires backend development work, technical writing, user experience research, and publicity and coordination work within Python's community.
There is a part of the Python standard library called distutils, and some users directly use it. We want users to instead switch to the supported toolchain, which uses setuptools, and we want to move all the functionality from distutils into setuptools. This requires backend development work, technical writing, project management, and publicity work within Python's community.
The documentation for setuptools has grown messily over time and is difficult to browse and navigate. Also, the legacy documentation for distutils and the current setuptools docs heavily overlap in content. These references thus trip up even experienced developers who want to understand these fundamental utilities.
We need funding for several weeks of technical writer work and developer review to:
- de-duplicate the distutils and setuptools documentation, making the latter independent of the former
- re-organize the setuptools documentation
Reproducible builds allow developers to independently verify that a distributed software package was not tampered with. Since a considerable number of Python packages use setuptools, adding support for reproducible builds to this build backend can help improve security in the Python ecosystem as a whole.
Some preliminary work that can help structure this activity is available (see pypa/setuptools#2133, pypa/setuptools#1512 and pypa/wheel#362); however, the effort was never concluded. Funding would be used to finalize the development of this feature.
Important tasks are:
- Support the SOURCE_DATE_EPOCH environment variable for both sdists and wheels.
- Make both sdists and wheels independent of umask.
- Ensure that C/C++ extensions compiled with setuptools are reproducible.
- Document the process of verifying an sdist or wheel.
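The first two tasks can be illustrated with a small sketch that writes an archive whose bytes depend only on the file contents and a supplied timestamp. This is not setuptools code; `build_archive` and `source_date_epoch` are hypothetical helpers, and the fixed 0o644 mode is an assumption chosen to show independence from umask.

```python
import io
import os
import tarfile


def source_date_epoch(default=0):
    """Read the timestamp to embed from SOURCE_DATE_EPOCH, if it is set."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def build_archive(files, epoch):
    """Build an uncompressed tar archive deterministically.

    `files` maps archive member names to bytes. Members are written in
    sorted order with a fixed mtime, mode, and owner, so the output does
    not depend on dict ordering, the clock, the umask, or the user.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = epoch          # taken from SOURCE_DATE_EPOCH, not the clock
            info.mode = 0o644           # fixed, so the umask has no effect
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two builds with the same inputs and the same epoch produce identical bytes, which is what lets a third party verify a published artifact by rebuilding it. A real sdist is also gzip-compressed, and the gzip header carries its own timestamp that would need to be fixed in the same way.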
This project might require coordination with other tools in the ecosystem (e.g. wheel).
If we audit and update PyPI metadata for existing projects based on already-uploaded artifacts, we can publish information about what packages depend on each other and on certain environments, and ensure a high-quality API for many tools to reuse and build upon. The current PyPI upload API relies on the upload client extracting the metadata and supplying it with the first upload request, and that isn't a valid assumption for older upload clients. Currently, our constraint is a combination of developer time, compute resources, and privileged backend database access; funding would break this bottleneck.
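As a rough sketch of what such an audit does per artifact, the function below reads the core metadata out of an already-built wheel without installing it. It assumes the standard wheel layout (a single top-level `*.dist-info/METADATA` file); the function name and the returned dictionary shape are illustrative, not Warehouse's actual schema.

```python
import io
import zipfile
from email.parser import HeaderParser


def wheel_core_metadata(wheel_bytes):
    """Extract name, version, and dependencies from a wheel's METADATA file."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as archive:
        candidates = [
            name
            for name in archive.namelist()
            if name.count("/") == 1 and name.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            raise ValueError("expected exactly one top-level METADATA file")
        text = archive.read(candidates[0]).decode("utf-8")
    msg = HeaderParser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        "requires_dist": msg.get_all("Requires-Dist", []),
    }
```

Running this over every stored file is where the compute and database-access costs mentioned above come from; source distributions are harder still, because their metadata may only be available after a build.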
User experience research, and UX and development implementation work, would make it easier for packagers to create configuration files. We aim to use the UX research work from improvements in pip's user experience and build on them to improve the larger experience of packaging for Python in general.
Our packaging ecosystem relies on a particular structured data format (classifiers) to indicate a package's legal license. However, our current system allows for ambiguity that makes some downstream data display incoherent or very difficult, and doesn't allow for some license specificity that downstream consumers (such as Libraries.io and similar projects) need. Fixing this is a fairly small project, involving Python development, public communications, project management, and potentially a few hours of legal counsel for review.
pip currently uses requirements.txt to specify dependencies; it can specify versions of packages but not hashes. The newer pipfile format can include hashes, which some users prefer. But pip doesn't yet support pipfile, so many users are blocked from using hashes to better secure their Python runtimes. We have made some progress toward standardizing an interoperable lockfile format, but we need to finish that design standardization and consensus-gathering work and implement it in pip, pipenv, and related tools. Other attempts reached the PEP stage, but ultimately were rejected. We'd need Python engineering work and project management to develop and deploy this.
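Whichever file format ends up carrying the pinned hashes, the check a client performs on each downloaded file is small. The sketch below assumes a hypothetical `algorithm:hexdigest` pin string and a short allowlist of algorithms; neither is taken from a finished lockfile specification.

```python
import hashlib
import hmac

# Assumption: a lockfile would only permit a few strong algorithms.
ALLOWED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def matches_pinned_hash(data, pinned):
    """Check downloaded bytes against a pin such as 'sha256:<hexdigest>'."""
    algorithm, _, expected = pinned.partition(":")
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError("unsupported hash algorithm: %r" % algorithm)
    actual = hashlib.new(algorithm, data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected.lower())
```

An installer would refuse to install any file for which this returns False, so a compromised index or mirror cannot substitute different contents for a pinned release.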
Right now, there are ways for package maintainers to test and share draft versions of their upcoming releases, but they cause friction and confusion. So we want to add staged releases -- a temporary state that a release can be in, where PyPI has it and can evaluate it, but hasn't published it yet.
This will:
- let project owners/maintainers preview/test how their package metadata displays on the website, and review where their fresh releases are out of compliance with site and interoperability requirements (preventing the problem of maintainers wanting to re-upload removed files)
- help cross-platform package maintainers coordinate dozens of wheels built on multiple machines for simultaneous release
- provide an interoperability check for toolchain developers, and a testing site for people learning packaging
- simplify packagers' upload configuration files
- reduce complexity that currently forces maintainers to use confusing "dev" or prerelease version numbers
- improve security of package uploads, by allowing maintainers to scope upload API tokens to the newly staged package
- prevent package name conflicts
- simplify infrastructure maintenance and reduce confusing documentation by letting us take down the separate test.pypi.org staging site
- provide pre-release warnings to maintainers of packages that fail metadata checks (such as rejecting or warning for packages without Python requirements metadata, or manylinux wheels that fail auditwheel checks). As we increase the packaging ecology's strictness regarding metadata standards compliance, there will be an intermediate period where we warn maintainers/owners about failing strictness checks but do not yet block releases on those new stricter checks; the package preview feature will help us provide soft warnings during that period.
We'll need database support for understanding the release state ("is this published or not"), user experience and developer support, and testing, security, infrastructure, and project management support.
It's difficult to roll out new features gradually to PyPI's test site or to selected test users. A feature flag system would help us do targeted outreach to particular groups of users, deploy more confidently, and roll back changes when needed. We'd need user experience, frontend and backend engineering, data analytics, and project management support to develop and deploy this.
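One common way to get both gradual rollout and targeted groups is deterministic bucketing, sketched below. This is an illustration of the general technique, not a design for Warehouse; the function name, the flag names, and the allowlist parameter are all hypothetical.

```python
import hashlib


def flag_enabled(flag, username, percent, allowlist=frozenset()):
    """Decide whether a feature flag is on for a given user.

    Users on the allowlist always get the feature (targeted outreach).
    Everyone else is placed in a stable bucket from 0 to 99 derived from
    the flag and username, so raising `percent` only ever adds users.
    """
    if username in allowlist:
        return True
    digest = hashlib.sha256(("%s:%s" % (flag, username)).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Setting `percent` back to 0 acts as the rollback switch, and because the bucket depends on the flag name, different features reach different subsets of users.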
Python packagers who need help currently create SourceForge and GitHub tickets, email mailing lists, tweet at maintainers, and so on. A unified user support ticket system, integrated into Warehouse, would:
- help managers, entrepreneurs, and academics reserve specific package names
- support username changes
- give users a reporting system to quickly flag malware and spam
- provide a transfer system for abandoned/unmaintained projects
- reduce work for PyPI's core developers who currently have to sift through user support issues to find bug reports and feature requests
- enable PyPI admins to better delegate support and moderation work to volunteers
We need funding for backend and frontend development, testing and security checks, DevOps/infrastructure/platform services (including API/email integration), user experience work, technical writing for end users, project management, and community outreach.
Python packaging tools that interact with package indexes, such as pip (pypa/pip#4475) and twine (pypa/twine#362), currently have only simple authentication support for securing private sources, such as basic access authentication. Open source tool maintainers acknowledge that, when third-party indexes are used, organisational policies sometimes require stronger authentication methods, such as single sign-on. We believe it's beneficial to develop a pluggable Python library that the packaging tools can depend on to provide additional authentication methods. But we lack both the use cases and the domain knowledge in this area. We are looking for funding and expertise from organisations.
We are interested in developing a shared interface and implementation for various alternative authentication methods. Support can be developed for both tools (and maybe more), so organisations can choose to install the libraries they need, for example to use Kerberos to secure their private package indexes. The work involved would include development, research, project management, and technical writing work towards the following tasks:
- Survey various authentication methods, and how they can be implemented as a pluggable library.
- Develop an interface that tools (e.g. pip) can implement to detect authentication method support, and call into the library that provides it.
- Develop and maintain libraries that implement the various auth methods for users to install when support is needed.
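To give a sense of the shape such an interface could take, here is a purely hypothetical sketch. None of these names exist in pip or twine, and the bearer-token provider stands in for a real method such as Kerberos or single sign-on, which would involve a proper handshake.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import urlsplit


class AuthProvider:
    """Hypothetical interface an installable auth library would implement."""

    name = "base"

    def supports(self, index_url: str) -> bool:
        raise NotImplementedError

    def headers(self, index_url: str) -> Dict[str, str]:
        raise NotImplementedError


class StaticTokenProvider(AuthProvider):
    """Stand-in provider that sends a bearer token to one configured host."""

    name = "static-token"

    def __init__(self, host: str, token: str) -> None:
        self.host = host
        self.token = token

    def supports(self, index_url: str) -> bool:
        return urlsplit(index_url).hostname == self.host

    def headers(self, index_url: str) -> Dict[str, str]:
        return {"Authorization": "Bearer " + self.token}


def select_provider(
    providers: Iterable[AuthProvider], index_url: str
) -> Optional[AuthProvider]:
    """Return the first installed provider that can handle this index."""
    for provider in providers:
        if provider.supports(index_url):
            return provider
    return None
```

In this sketch the tool only asks which provider supports a given index and what to attach to the request, so new methods can be installed without changing the tool itself. How providers would be discovered and configured is exactly the kind of question the survey task above would need to answer.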
The wheel project, the reference implementation of the official binary distribution format for Python, is approaching version 1.0. The milestone includes "Provide a public API": "Since there is clearly a need for wheel to function as a library as well, a public API should be defined and documented." The stability implied by the 1.x version number, and the public API, will improve other tools' ability to call and reuse wheel, thus reducing duplication and improving other toolmakers' ability to move faster and maintain their codebases more easily. We would like support for design and implementation, community coordination, and technical writing.
To mitigate account takeover attacks, where attackers upload malicious code in existing popular packages, we need to continue improving our support for MFA and use of API tokens instead of password-based auth. This work would involve research, development, and technical writing to finalise and implement existing proposals including:
- Require API Tokens for upload if MFA is enabled (upload currently bypasses MFA)
- Improve 2FA Account Recovery request fulfillment process
- Other issues tagged tokens
Once these technical prerequisites are satisfied, we would be able to revisit discussion of MFA policy - including encouraging MFA requirements for popular packages, or even mandating it for all users as some other package registries are considering. This subsequent work would involve additional development, project management, and community engagement as we determine and implement authentication policy, including ongoing user support to handle a much greater frequency of account recovery requests.
To scale up our anti-abuse moderation and help package maintainers with security response, we need to be able to, for instance, mark a release as deprecated or a project as unsupported. This means we need a generic system to add, edit, and remove administrative attributes ("flags" or "statuses") to individual projects and releases. We need support to do the architectural design to implement this. (See notes from this meeting.)
To keep PyPI's users secure, we want to give them an opt-in communication channel to hear about security vulnerabilities for the packages they use. Implementing this would also give us architectural support to warn or block pip users who try to install a PyPI package that's been found to be broken or malware. We need funding for user experience work, development, testing, infrastructure, potentially platform services (e.g., SMS), and community outreach.
Recent research on weaknesses in the npm supply chain identified 2,818 maintainer email addresses at expired domains, affecting 8,494 packages. Such weaknesses can be mitigated by multi-factor authentication, but generally make targeted account hijacking trivially easy.
Because registration expiry dates are public via whois records, PyPI could warn maintainers with email addresses at soon-to-expire domains. As a further enhancement, PyPI could block password resets via any email address at a domain which has expired and been re-registered after the 30 day renewal grace period. This gives defenders a natural time-based advantage over attackers.
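The decision logic described above can be sketched as a pure function over dates that would come from whois records and PyPI's own data. The 30-day warning window, the function name, and the idea of detecting re-registration by comparing the domain's current registration date with the date the address was verified are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: how far ahead of expiry maintainers would be warned.
WARN_WINDOW = timedelta(days=30)


def email_domain_action(today, expires_on, registered_on, verified_on):
    """Classify a maintainer email address by the state of its domain.

    today:         current date
    expires_on:    registration expiry date from whois
    registered_on: start of the domain's current registration, from whois
    verified_on:   when the maintainer verified this address on PyPI
    """
    if registered_on > verified_on:
        # The current registration began after the address was verified,
        # so the domain lapsed and someone registered it again.
        return "block_reset"
    if expires_on - today <= WARN_WINDOW:
        return "warn"
    return "ok"
```

Because a lapsed domain cannot be re-registered by a third party until the renewal grace period has passed, the warning always has a chance to reach the legitimate maintainer first.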
Funding would be used for backend development, security engineering, project management, system administration, outreach to package maintainers, and ongoing operational costs.
Since anyone can upload a package to PyPI, malicious users might upload malware, which would then harm users. To mitigate this risk, the PSF previously obtained funding to add some malware detection in Warehouse in late 2019, but the goals for the relevant milestone were more ambitious than funding allowed for. The malware detection system is currently in limbo: an interesting prototype with limited practical impact because of the astounding number of false positives. To protect users from malware, we still need to:
- Make Malware Verdicts Auditable - Right now, verdicts are removed once the associated project/package is removed. We need to change the backend to retain past verdicts so we can evaluate and improve the efficacy of this system.
- Detect packages being published with typo'ish names - Add a typosquatting check.
- Implement a more robust malware detector - One current check relies on simple pattern matching with YARA. A better approach requires parsing the package code into an AST.
- YARA rules for setup.py not ignoring comments - Related to the above issue.
- Release a canonical dataset for developers to write their checks against, and iterate on to improve detection accuracy.
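The difference between pattern matching and AST-based checks can be seen in a small sketch. It flags direct calls to a couple of builtins chosen only for illustration; the set of names and the function are hypothetical and far simpler than a real detector. The source is parsed, never executed.

```python
import ast

# Assumption: an illustrative list, not a real detection rule set.
SUSPICIOUS_CALLS = {"exec", "eval"}


def suspicious_calls(source):
    """Return (name, line) pairs for direct calls to suspicious builtins."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in SUSPICIOUS_CALLS
        ):
            found.append((node.func.id, node.lineno))
    return sorted(found, key=lambda item: item[1])


# A harmless setup.py whose only "exec(" is inside a comment.
BENIGN_SETUP = """\
# exec(open("payload").read())  (old experiment, disabled)
from setuptools import setup
setup(name="demo")
"""

# A setup.py that really does call exec on decoded data.
RISKY_SETUP = """\
import base64
exec(base64.b64decode("cHJpbnQoMSk="))
"""
```

A plain text search for `exec(` matches both files, which is the kind of false positive described above, whereas the AST walk reports nothing for the first file and one call on line 2 for the second. Obfuscated calls would still need further checks.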
We also want to set up a partnership with VirusTotal or a similar third-party virus checking service during the check development to scan every uploaded package. Integration with a third-party virus scanner is low-hanging fruit that could move the needle on PyPI package security.
Funding would be used for backend development, security engineering, project management, system administration, and publicity to stakeholders. Ideally, AV integrations would be donated by the vendors.
Some TODOs that were on this page have now received funding!
(This is now funded and we hired people to work on this project. The new resolver is in beta.)
We're partway through a next-generation rewrite of the dependency resolver within pip, Python's package download and installation tool. The project ran into massive technical debt, but the refactoring is nearly finished and prototype functionality is in alpha now. (In-depth explanation by Sebastian Awwad of the problem & our approach, lead developer Pradyun Gedam's initial plan, 2017 status updates, and GitHub issue #988 tracking progress and June 2019 status update, and issue #6536 for planning rollout.)
Funding would support user experience, communications/publicity, and testing work (including developing robust testing/CI infrastructure) as well as core feature development and review.
We need to finish the resolver because so many other improvements are blocked on it:
- adding an "upgrade-all" command to pip
- warning when trying to download or build wheels from an incompatible set of packages/requirements
- adding a no-implicit-upgrades strategy
- making PyPI and pip enforce metadata compliance more strictly
- warning the user when uninstalling a package that other packages depend on
- properly respecting constraints
- recording requested and installed extras
- option to show what versions of packages are currently available
- listing packages' dependencies and dependents on PyPI
- minimizing duplication of work between pip and pipenv
- better pipenv functionality
- package namespace support
- moving more code out of Python's standard library so we can release improvements faster
and it would fix so many dependency issues for our users:
- Django installation conflict
- cherrypy/six/cheroot installation conflict
- Spyder downgrade requirement
- boto3/bravado dependency failure
- Ansible/PyOpenSSL/cryptography failure
- extras installation failure
- extras upgrade failure
- breaking installed packages
- elasticsearch/requests failure
- hatch, another packaging tool
And in our larger ecology, this causes installation problems for:
- conda's compatibility with pip
- the Servo browser engine
- numpy and scipy
- Canonical's DevOps tool Juju
- a Cap'n Proto implementation
- toil, awscli, and boto3
- the Mozilla website & icalendar
- certbot, in the past and possibly the future
- TurboGears
- a JIRA API client library
- a WebSocket protocol test suite
- Robot Operating System tooling
This is now funded, thanks to the Chan Zuckerberg Initiative and Mozilla Open Source Support.
pip's user experience needs to improve by providing better error messages and prompts, logs, output, and reporting, and becoming more consistent across features, to fit the user's mental model better, make hairy problems easier to untangle, and reduce unintended data loss. pip's maintainers have a list of TODOs and need funding so that user experience researchers, UX designers, developers, and technical writers can spend dedicated time addressing them.