Support for offline operation (e.g. using local copy of PyPA advisory repo as vulnerability service) #698
Comments
Thanks for the detailed report @riwoodward! We greatly appreciate it. Your understanding is correct: I'll let @di opine here as well, but here's my thoughts:
So TL;DR: I think we could do this if (1) we can get some kind of stability guarantees about the Advisory DB's format, and (2) doing so doesn't make our ultimate plans to integrate into |
I don't think we should try to integrate against the advisory DB directly, mostly for the reasons @woodruffw mentioned. I do think we could possibly support some kind of local file cache, but I think I need to understand the use case a bit more before I can say that would be a useful thing to add. Overall I don't think I really understand how or why this feature will be used... assuming you have a lockfile and a snapshot of an advisory database, nothing is going to change: at the time you're online to get the snapshot, you already know about all the possible vulnerabilities for the subset of dependencies you have. Any offline audit with these two things is always going to produce the same result. The main point of pip-audit is to find new vulnerabilities in dependencies you're using as they are discovered. The other use case is being aware of existing vulnerabilities when adopting new dependencies, but again, I don't see how you could be introducing new dependencies somewhere offline where you wouldn't be able to also query for vulnerabilities. Is the general goal just to remove a dependency on an external service? That's why we provide PyPI as a provider, because you already have a dependency on PyPI anyways for everything else you need for installation. |
Thanks both for the comments. As @di suggests, perhaps I can flesh out the real-world use case more clearly. In high-security environments, CI runners, build boxes, etc. are often kept offline. Required dependencies can be mirrored to a dedicated local mirror machine on the LAN. Pip-audit sounds ideal for "finding new vulnerabilities in dependencies as they're discovered", as @di suggests, which could be achieved using e.g. scheduled CI pipelines where a job includes the vulnerability check. The problem, however, is: how can one keep the pip-audit information source updated without Internet access? Without an up-to-date vulnerability database, new vulnerabilities in a project won't be discovered and the developer won't be alerted automatically. Pip-audit is still useful to developers if they run the command manually during development on machines with Internet access, but I believe there is an opportunity to make it even more useful by adding "offline" support. I like the approach of mirroring the PyPA advisory database to an internal LAN mirror, since then you're only storing the vulnerability info. Mirroring the whole PyPI JSON API would take significantly more space, bandwidth, etc., and most of that wouldn't be useful data. If you're aware of a good open-source tool for mirroring the API, by the way, please let me know, as it seems only partially supported in most packages that do this (e.g. devpi). At present, pip-audit hard-codes the PyPI URL anyway, so even a local JSON API mirror would still need work to implement pip-audit support. I understand the concerns @woodruffw raises about using PyPA's repo though. For now, I guess I'll just have to use my hacked approach from the start of this issue, as that does work well for me. But I'll keep an eye on the project in case this (or another offline method) becomes an official feature. Thanks again for the great tool too! |
I guess what I'm missing is: why does the check need to happen offline? Why can't it happen at the time you would "download" the cache, which you would need to be online for anyways? |
Banks and similar environments have dedicated security teams that have their own idea of safety and convenience. They might agree to establish a very narrow channel to download configuration data, a vulnerability database, etc. Large organizations also have a policy to "accept risk". This might translate to having a vulnerability database which is a few days or a week old. Cyber security people are held accountable for their decisions, hence they want to stay in control; they define the price. They want to make sure that an attacker has as small an attack surface as possible. They're not totally wrong: a lot can happen with direct access to the Internet. Supply chain attacks are another enormous danger, followed by social engineering. Bottom line: air-gapped environments are real. Solutions are needed. See GitLab's security scanning offerings for an example. They didn't invent that themselves; banks and other customers pushed them. I know it first-hand. |
I'm sympathetic to the idea that there are contexts/setups where these checks should happen offline, but I think the engineering points in #698 (comment) are still outstanding: we can't (reasonably) support an offline mode in |
We are executing GitLab tests offline and there is no pip-audit offline mode available for now. See pypa/pip-audit#698
Linking things together: #805 presents a similar need. |
Is your feature request related to a problem? Please describe.
Currently two vulnerability services are offered, `pypi` and `osv`, but both rely on pip-audit retrieving information from the Internet (e.g. with URLs specified in the code). This is a problem when operating on offline machines or with limited Internet access. As it stands, I don't think pip-audit can operate without an available Internet connection?
Describe the solution you'd like
I would like an option to execute pip-audit using a locally available copy of the advisory database. For example, I could maintain a local mirror on my network for the PyPA advisory database repo (https://github.com/pypa/advisory-database), then when pip-audit needs to be run on any offline machine on my network, I could simply retrieve from the local mirror and pass the path for this to pip-audit for it to use as a vulnerability service. This would be particularly useful for offline / air-gapped CI systems.
I hacked together a quick implementation for this and it works well. I just modified the `query` function of `pip_audit/_service/pypi.py` to read from a local copy of the PyPA advisory database repo, with the path passed as an env var (e.g. `export PIPAUDITDB=~/advisory-database`).
I guess to make this an officially supported feature, the path to the PyPA repo could be specified using the `-s SERVICE, --vulnerability-service SERVICE` arg? Or make another option beyond `osv` and `pypi`, e.g. `pypa-repo`, with another arg for the path to said repo?
I note that offline indexes are already supported using the `--index-url` arg, so this could be complementary?
If interested, I could put together a PR (i.e. taking the above approach and adding error handling, proper use of args, tidy up etc)? I wanted to see what you thought of the method / proposed approach first though.
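For illustration only, a lookup against a local checkout might look roughly like the sketch below. This is not the modified `query` function from the hack described above, and it does not use any pip-audit internals. It assumes the checkout stores one OSV-format record per file under `vulns/<normalized-package-name>/`, and that each record enumerates affected releases in an `affected[].versions` list. The function name `local_advisories` and its `parse` and `db_root` parameters are hypothetical; `PIPAUDITDB` is the env var name from the example above. The parser is passed in so that the sketch itself needs only the standard library.

```python
import os
import re
from pathlib import Path

# Env var name taken from the example above; not an official pip-audit setting.
DB_ENV_VAR = "PIPAUDITDB"


def canonical_name(name):
    """Normalize a project name PEP 503 style: lowercase, runs of -_. become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def local_advisories(package, version, parse, db_root=None):
    """Return the parsed records under db_root that list `version` as affected.

    `parse` turns the text of one advisory file into a dict.
    Assumed layout: <db_root>/vulns/<normalized-name>/*.yaml
    """
    root = Path(db_root or os.environ[DB_ENV_VAR]).expanduser()
    package_dir = root / "vulns" / canonical_name(package)
    if not package_dir.is_dir():
        return []  # no advisories recorded for this package

    hits = []
    for path in sorted(package_dir.glob("*.yaml")):
        record = parse(path.read_text(encoding="utf-8"))
        for affected in record.get("affected", []):
            # Simplification: exact string match against the enumerated
            # versions; no version normalization or range evaluation.
            if version in affected.get("versions", []):
                hits.append(record)
                break
    return hits
```

With PyYAML installed, this could be called as `local_advisories("django", "3.2", yaml.safe_load)` once `PIPAUDITDB` points at the checkout. A real implementation would also need to handle version normalization and the stability concerns about the database format raised in the comments.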
Describe alternatives you've considered
Running a local PyPI mirror including the JSON advisory info could work but that would be considerably more effort and resource usage to achieve the same goal.
Additional context
Really great tool otherwise - thanks!