--skip-existing gives misleading feedback #1074

Open
1 task done
lsloan opened this issue Mar 20, 2024 · 7 comments

Comments

@lsloan

lsloan commented Mar 20, 2024

Is there an existing issue for this?

  • I have searched the existing issues (open and closed), and could not find an existing issue

What keywords did you use to search existing issues?

skip

What operating system are you using?

macOS

If you selected 'Other', describe your Operating System here

No response

What version of Python are you running?

$ python --version
Python 3.11.8

How did you install twine? Did you use your operating system's package manager or pip or something else?

$ python3 -m pip install --upgrade twine

What version of twine do you have installed (include the complete output)

$ twine --version
twine version 5.0.0 (importlib-metadata: 7.0.2, keyring: 24.3.1, pkginfo: 1.10.0, requests: 2.31.0, requests-toolbelt: 1.0.0, urllib3: 2.2.1)

Which package repository are you using?

test.pypi.org

Please describe the issue that you are experiencing

When I run twine upload with the --skip-existing flag, it reports that it skipped the existing files, and it gives warnings but no errors. However, it also shows the colorful progress bar for those files, which appears to indicate that it actually DID upload the skipped files.

Please list the steps required to reproduce this behaviour

  1.  $ python3 -m twine upload --repository testpypi --skip-existing dist/*
     Uploading mobyfubarbbq-0.0.1.post1-py3-none-any.whl
     100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 22.7/22.7 kB • 00:00 • 21.7 MB/s
     WARNING  Skipping mobyfubarbbq-0.0.1.post1-py3-none-any.whl because it appears to already exist                                  
     Uploading mobyfubarbbq-0.0.1rc2-py3-none-any.whl
     100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 22.6/22.6 kB • 00:00 • 42.0 MB/s
     Uploading mobyfubarbbq-0.0.1.post1.tar.gz
     100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36.7/36.7 kB • 00:00 • 67.2 MB/s
     WARNING  Skipping mobyfubarbbq-0.0.1.post1.tar.gz because it appears to already exist                                            
     Uploading mobyfubarbbq-0.0.1rc2.tar.gz
     100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36.7/36.7 kB • 00:00 • 82.9 MB/s

Anything else you'd like to mention?

I expect it to only show the WARNING lines without the Uploading and progress bar lines for mobyfubarbbq-0.0.1.post1.tar.gz and mobyfubarbbq-0.0.1.post1-py3-none-any.whl.

@lsloan lsloan added the bug label Mar 20, 2024
@sigmavirus24
Member

We can only determine if something exists reliably if we attempt to upload it. That's why there's a progress bar. Hiding it and showing it only when the upload succeeds could work, but I don't believe that would provide any value to the user. Additionally, I don't believe we are able to hide it retroactively, but maybe the underlying library has improved since I last looked.

@di
Member

di commented Apr 19, 2024

"We can only determine if something exists reliably if we attempt to upload it."

Is this true? package_is_uploaded uses the JSON API to determine if the file already exists:

twine/twine/repository.py

Lines 202 to 232 in 67e87ef

def package_is_uploaded(
    self, package: package_file.PackageFile, bypass_cache: bool = False
) -> bool:
    # NOTE(sigmavirus24): Not all indices are PyPI and pypi.io doesn't
    # have a similar interface for finding the package versions.
    if not self.url.startswith((LEGACY_PYPI, WAREHOUSE, OLD_WAREHOUSE)):
        return False
    safe_name = package.safe_name
    releases = None
    if not bypass_cache:
        releases = self._releases_json_data.get(safe_name)
    if releases is None:
        url = f"{LEGACY_PYPI}pypi/{safe_name}/json"
        headers = {"Accept": "application/json"}
        response = self.session.get(url, headers=headers)
        if response.status_code == 200:
            releases = response.json()["releases"]
        else:
            releases = {}
        self._releases_json_data[safe_name] = releases
    packages = releases.get(package.metadata.version, [])
    for uploaded_package in packages:
        if uploaded_package["filename"] == package.basefilename:
            return True
    return False

and this happens before upload:

if upload_settings.skip_existing and repository.package_is_uploaded(package):
    logger.warning(skip_message)
    continue
resp = repository.upload(package)

My read is that we'd only need to upload if a file doesn't appear in the JSON response for a project, and that this would only fail to upload if the file once existed but had been deleted.
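
The pre-upload check described above can be sketched as a standalone function. This is only an illustration, not twine's code: `filename_in_releases` and `sample_releases` are hypothetical names, and the data shape (a "releases" mapping from version strings to lists of entries with a "filename" key) is assumed from the `package_is_uploaded` snippet.

```python
from typing import Any, Dict, List

# Assumed shape of the "releases" mapping, as implied by package_is_uploaded:
# version string -> list of file entries, each with a "filename" key.
Releases = Dict[str, List[Dict[str, Any]]]


def filename_in_releases(releases: Releases, version: str, filename: str) -> bool:
    """Return True if `filename` is listed under `version` in `releases`."""
    return any(
        entry.get("filename") == filename for entry in releases.get(version, [])
    )


# Hypothetical sample data, modelled on the filenames in this report.
sample_releases: Releases = {
    "0.0.1.post1": [
        {"filename": "mobyfubarbbq-0.0.1.post1-py3-none-any.whl"},
        {"filename": "mobyfubarbbq-0.0.1.post1.tar.gz"},
    ],
}

for name, version in [
    ("mobyfubarbbq-0.0.1.post1.tar.gz", "0.0.1.post1"),
    ("mobyfubarbbq-0.0.1rc2.tar.gz", "0.0.1rc2"),
]:
    # A listed file can be skipped without sending any bytes; anything
    # else still has to go through a real upload attempt.
    listed = filename_in_releases(sample_releases, version, name)
    print(f"{name}: {'skip' if listed else 'upload'}")
```

A file that was uploaded once and later deleted would not be listed, so the upload would still be attempted and could still be rejected by the index.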

@sigmavirus24
Member

With third party package indices there isn't a JSON API

@satmandu

Does the upload url return a 200 response if the file has already been uploaded? Could that be used to avoid an upload if --skip-existing is set?

In Chromebrew we just check for a 200 in the output of curl -sI <url> for our upload URL to determine if a file has been uploaded (though this doesn't check to see if the file has been properly uploaded), and then avoid an upload if that is the case.
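
A rough Python equivalent of that `curl -sI <url>` check, as a sketch only: `head_status` and `already_uploaded` are hypothetical helpers, and treating a 200 as "already uploaded" is an assumption that holds only for indices that serve files at a predictable URL.

```python
import urllib.error
import urllib.request
from typing import Callable


def head_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request (headers only, no body) and return the status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is still useful here.
        return error.code


def already_uploaded(url: str, get_status: Callable[[str], int] = head_status) -> bool:
    """Assumption: a 200 response for the file URL means the file is present."""
    return get_status(url) == 200
```

As noted above, this only shows that the URL answers with a 200; it does not verify that the file was uploaded properly.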

@sigmavirus24
Member

It returns a 409 if I remember correctly, but more generally a 4xx response even if not a 409. This is also trivially confirmed by trying to reupload a release artifact without using this flag. I'm on a phone so I can't reproduce for you.
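
To illustrate the idea of classifying the response after an upload attempt, here is a minimal sketch. The names `DUPLICATE_STATUS_CODES` and `is_duplicate_rejection` are hypothetical, and the set of status codes is an assumption: 409 is the code recalled above, and other indices may reject duplicates with a different 4xx code.

```python
# Assumption: status codes a given index uses to reject a duplicate file.
# 409 (Conflict) is the one mentioned in this thread; adjust per index.
DUPLICATE_STATUS_CODES = frozenset({409})


def is_duplicate_rejection(status_code: int, skip_existing: bool) -> bool:
    """Return True if a rejected upload should be reported as a skip.

    Only applies when the caller asked to skip existing files; otherwise
    the rejection should surface as an ordinary error.
    """
    return skip_existing and status_code in DUPLICATE_STATUS_CODES


for code in (409, 403):
    verdict = "skip" if is_duplicate_rejection(code, skip_existing=True) else "error"
    print(f"{code}: {verdict}")
```

Because the decision depends on the response, the file contents have already been sent by the time it is made, which is why the progress bar appears before the warning.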

@satmandu

Hmm, to be parsimonious with bandwidth, would it make sense to search the index for an uploaded file before attempting an upload? Or to use the distinction in return codes and look only at the headers returned for an upload URL before uploading? 409 vs. something else should be sufficient to tell them apart, if they are indeed different.

On the non-PyPI GitLab package registry that we also use, we can definitely just look for the 200 code... Hmm, maybe that's GitLab only?

@sigmavirus24
Member

Every non-PyPI registry will be different, in my experience. Honestly, life would be simpler if there were a PyPI facade they could all use instead, but alas, the NIH is strong amongst a lot of these companies.

4 participants