Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Image Processor Fast RT-DETR #34354

Merged

Conversation

yonigozlan
Copy link
Member

@yonigozlan yonigozlan commented Oct 23, 2024

What does this PR do?

Adds a fast image processor for RT-DETR. Follows issue #33810.
This image processor is a result of this work on comparing different image processing method.

The diffs look bad but this PR is almost exclusively made up of # Copied from based on the fast image processor for DETR!

Implementation

See #34063

Usage

Except for the fact that it only returns torch tensors, this fast processor is fully compatible with the current one.
It can be instantiated through AutoImageProcessor with use_fast=True, or through the Class directly:

from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("PekingU/rtdetr_r50vd", use_fast=True)
from transformers import RTDetrImageProcessorFast

processor = RTDetrImageProcessorFast.from_pretrained("PekingU/rtdetr_r50vd")

Usage is the same as the current processor, except for the device kwarg:

from torchvision.io import read_image
images = torchvision.io.read_image(image_path)
processor = RTDetrImageProcessorFast.from_pretrained("PekingU/rtdetr_r50vd")
images_processed = processor(images , return_tensors="pt", device="cuda")

If device is not specified:

  • If the input images are tensors, the processing will be done on the device of the images.
  • If the inputs are PIL or Numpy images, the processing is done on CPU.

Performance gains

  • Average over 10% of the COCO 2017 validation dataset, with batch_size=1.

benchmark_results_full_pipeline_rt_detr_fast_single


  • Average over 10% of the COCO 2017 validation dataset, with batch_size=8.

benchmark_results_full_pipeline_rt_detr_fast_batched


Tests

  • The new image processor is tested on all the tests of the current processor.
  • I have also added a consistency test for processing on GPU vs CPU.

Who can review?

@ArthurZucker Pinging you directly as there is almost no "new" code here.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yonigozlan yonigozlan marked this pull request as ready for review October 23, 2024 17:41
Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just a few nits - saw you were removing the kwargs validation as well, can we also do that for detr?

Comment on lines +137 to +84
def prepare_coco_detection_annotation(
image,
target,
return_segmentation_masks: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return_segmentation_masks is unused here, could either reuse the one from detr_fast so we have a # Copied from statement here?

Copy link
Member Author

@yonigozlan yonigozlan Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's a bit weird, return_segmentation_masks is present in several places, but rt-detr does not support segmentation. I added a copied from here, with an Ignore copy for the segmentation part (as otherwise we would need to import/copy a function that would never be used.

EDIT: actually the Ignore copy makes the CI crash, not sure why, so I left it as is for now...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Ignore copy does not work like this 😉 forget about it in this case!

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!
A few more nits, overall good, IMO all your graph should be placed in the documentation as well and not just on the PR description!

Comment on lines +137 to +84
def prepare_coco_detection_annotation(
image,
target,
return_segmentation_masks: bool = False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Ignore copy does not work like this 😉 forget about it in this case!

@yonigozlan yonigozlan force-pushed the add-image-processing-fast-rtdetr branch from b22bf32 to 2960859 Compare October 29, 2024 18:08
@yonigozlan
Copy link
Member Author

@ArthurZucker Refactored DETR and RT-DETR image processor fast to loop as few times as possible over annotations and images, and added some docs!

@ArthurZucker
Copy link
Collaborator

Does this improve the perf you saw? 😉

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating with me! 🤗

@yonigozlan
Copy link
Member Author

yonigozlan commented Oct 30, 2024

Thanks for iterating with me! 🤗

Thank you!

Does this improve the perf you saw? 😉

Hmm hard to tell, maybe very slightly when on GPU, as on CPU the potential gains are overshadowed by the processing time. But at least it's cleaner that way! :)

@yonigozlan yonigozlan merged commit 48872fd into huggingface:main Oct 30, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants