Get image extensions when not present in --img-name-col column #3

Open
thompsonmj opened this issue Jun 5, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@thompsonmj
Contributor

The values in the column used to write the filename may not include a file extension. A user might prefer to have images saved with extensions in their filenames.

An additional optional Boolean flag could let the user request this, something like:

  -e, --infer-extension  Infer the appropriate file extension if one is not present in the --img-name-col (default: False)

If this is switched on, some options for doing this (mentioned in the discussion for #1) could be:

First, double-check that there isn't already a valid image extension in the --img-name-col value, to avoid writing a file named something like image.png.png. (If an extension is present and the user passes -e, I can't imagine they did so intentionally.)

To detect an extension:

  • Use e.g. filetype to detect the appropriate extension based on the data itself and use that when writing the filename.
    or
  • Use the URL column to check for an extension and/or Content-Disposition header
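
A minimal sketch of the guard and the URL-based option above, using only the standard library. The function names and the IMAGE_EXTENSIONS set are hypothetical and not part of the project; the filetype-based and header-based options are not shown here.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Assumed set of extensions treated as "already valid"; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff", ".webp"}


def has_image_extension(name: str) -> bool:
    """Return True if the name already ends in a known image extension."""
    return os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS


def extension_from_url(url: str) -> Optional[str]:
    """Return the image extension found in the URL path, or None."""
    path = urlparse(url).path  # drops the query string and fragment
    ext = os.path.splitext(path)[1].lower()
    return ext if ext in IMAGE_EXTENSIONS else None


def build_filename(name: str, url: str) -> str:
    """Append an extension inferred from the URL unless one is already present."""
    if has_image_extension(name):
        return name  # avoids names like image.png.png
    ext = extension_from_url(url)
    return name + ext if ext else name
```

If the URL carries no usable extension, the name is returned unchanged, so a caller could then try another detection method.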
@thompsonmj thompsonmj added the enhancement New feature or request label Jun 5, 2024
@egrace479
Member

Another option would be response.headers['Content-Type']. For an image, this should return something along the lines of image/png, in which case a simple .split("/")[1] would give us the proper file extension.
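
As a rough sketch of that idea: a Content-Type value can carry parameters after a semicolon (for example a charset), so it may be safer to strip those before splitting. The helper name image_subtype is hypothetical.

```python
from typing import Optional


def image_subtype(content_type: str) -> Optional[str]:
    """Return the subtype of an image/* Content-Type value, or None."""
    # Drop any parameters, e.g. "image/png; charset=binary" -> "image/png".
    media_type = content_type.split(";", 1)[0].strip().lower()
    main, _, sub = media_type.partition("/")
    # Only accept image types with a non-empty subtype.
    return sub if main == "image" and sub else None
```

For image/png this gives the same result as .split("/")[1], and it returns None for non-image responses such as an HTML error page.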

@thompsonmj
Contributor Author

That sounds like a good first place to look. For MIME type image/jpeg, we might want the extension to be .jpg instead? Otherwise, for most common image types, I think we can be pretty sure that .split("/")[1] will give a good extension (a subtype like svg+xml would need special handling).

Is response.headers['Content-Type'] guaranteed to always be there though? It's probably rare for it to be missing, but we might want a backup check in case it's absent.
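
One way both concerns might be handled, sketched under assumptions: a small override table maps subtypes such as jpeg to the preferred extension, and a missing or non-image header returns a caller-supplied fallback. The names extension_from_headers and SUBTYPE_OVERRIDES are hypothetical. The lookup uses .get() on a mapping; the headers object from requests is case-insensitive, while a plain dict is not.

```python
from typing import Mapping, Optional

# Assumed overrides for subtypes that differ from the desired extension.
SUBTYPE_OVERRIDES = {"jpeg": ".jpg", "svg+xml": ".svg"}


def extension_from_headers(
    headers: Mapping[str, str], fallback: Optional[str] = None
) -> Optional[str]:
    """Infer a file extension from Content-Type, or return the fallback."""
    content_type = headers.get("Content-Type")
    if not content_type:
        return fallback  # header absent or empty: let the caller try something else
    media_type = content_type.split(";", 1)[0].strip().lower()
    main, _, sub = media_type.partition("/")
    if main != "image" or not sub:
        return fallback
    return SUBTYPE_OVERRIDES.get(sub, "." + sub)
```

Returning the fallback (None by default) leaves room for a backup check, such as inspecting the data itself or the URL.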

@johnbradley

Another way servers convey the appropriate filename is via content disposition:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition#as_a_response_header_for_the_main_body
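
A possible sketch for reading the filename from that header, assuming the raw header value is available as a string. It leans on the standard library's email.message.Message to parse the parameters; the function names are hypothetical.

```python
import os
from email.message import Message
from typing import Optional


def filename_from_content_disposition(value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, if any."""
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() reads the filename parameter and unquotes it.
    return msg.get_filename()


def extension_from_content_disposition(value: str) -> Optional[str]:
    """Return the lowercased extension of the advertised filename, or None."""
    filename = filename_from_content_disposition(value)
    if not filename:
        return None
    ext = os.path.splitext(filename)[1].lower()
    return ext or None
```

Many servers do not send this header for images, so it would likely work best as one check among several rather than the only source.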


3 participants