
[Semantic Segmentation] - Add SegFormer Architecture, Weight Conversion Script and Presets #1883

Open · wants to merge 68 commits into master

Conversation

@DavidLandup0 (Collaborator) commented Sep 26, 2024

This PR adds:

  • SegFormerBackbone and presets
  • SegFormerImageSegmenter and presets for Cityscapes and ADE20k (B0...B5 each)
  • Conversion script
  • Tests

Basic Usage

```python
import numpy as np

import keras_hub

preprocessor = keras_hub.models.ImageSegmenterPreprocessor.from_preset("segformer_b0_ade20k_512")
segmenter = keras_hub.models.SegFormerImageSegmenter.from_preset("segformer_b0_ade20k_512")
segmenter(np.random.rand(1, 512, 512, 3))
```

End-to-end example with preprocessor:

```python
import urllib.request

import numpy as np
from PIL import Image

import keras_hub

preprocessor = keras_hub.models.ImageSegmenterPreprocessor.from_preset("segformer_b0_ade20k_512")
segmenter = keras_hub.models.SegFormerImageSegmenter.from_preset("segformer_b0_ade20k_512")

img_url = "https://www.vanorohotel.com/wp-content/uploads/2021/07/drz-vanoro_6737.jpg"
urllib.request.urlretrieve(img_url, "image.png")

img = np.array(Image.open("image.png").resize((512, 512)))
img = np.expand_dims(img, 0)
inputs = preprocessor(img)
outs = segmenter(inputs)
```

[image: segmentation output]
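The segmenter's output is a tensor of per-pixel class logits. A minimal numpy sketch of collapsing them into a class-index mask, assuming an output of shape `(batch, height, width, num_classes)` (the layout the example above produces):

```python
import numpy as np

def logits_to_mask(logits: np.ndarray) -> np.ndarray:
    """Collapse per-pixel class logits (B, H, W, C) into a class-index mask (B, H, W)."""
    return np.argmax(logits, axis=-1)

# Toy logits for a 1x4x4 image with 3 classes; class 1 wins everywhere.
logits = np.zeros((1, 4, 4, 3))
logits[..., 1] = 1.0
mask = logits_to_mask(logits)
```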

Training Pipeline Example

A few examples in the notebook below:

  • Instantiation of Backbone and Segmenter with MiT Encoder
  • Running on input images
  • Training pipeline with TFDS on Oxford IIIT Pets as an example

https://colab.research.google.com/drive/1EBNg6nPKx_KzyRuQQtHZ_PG_Nsf2pAg2#scrollTo=V9Ub4NHKCx9e

After a few minutes of training from scratch (both encoder and segmenter):

[images: example predictions after brief training]

@DavidLandup0 DavidLandup0 marked this pull request as draft September 26, 2024 13:05
@DavidLandup0 DavidLandup0 marked this pull request as ready for review September 29, 2024 09:52
@DavidLandup0 DavidLandup0 changed the title [Semantic Segmentation] - SegFormer (and MiTs) [Semantic Segmentation] - SegFormer (MixTransformer-based) Sep 29, 2024
@DavidLandup0 DavidLandup0 changed the title [Semantic Segmentation] - SegFormer (MixTransformer-based) [Semantic Segmentation] - Add SegFormer Sep 29, 2024
```python
segformer_backbone = keras_hub.models.SegFormerBackbone(backbone=encoder)
segformer = keras_hub.models.SegFormerImageSegmenter(backbone=segformer_backbone, num_classes=4)
```
@DavidLandup0 (Collaborator, Author):

To be updated to SegFormerImageSegmenter.from_preset() once the config.json files are uploaded to Kaggle

@divyashreepathihalli (Collaborator):

Is this PR ready for review?

@DavidLandup0 (Collaborator, Author):

@divyashreepathihalli In essence, yes, but there is some noise when running predictions with the converted weights (example image below of HuggingFace outputs vs. our outputs). I'm looking into the numerics again, but the rest of the PR is ready for review :)

[image: HuggingFace outputs vs. our outputs]

@DavidLandup0 (Collaborator, Author):

Found the issue: a transpose call was incorrectly shuffling the axis order of a latent in the encoder. I'll get the presets up on Kaggle now.

[image: corrected outputs]
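For intuition only (a toy numpy illustration, not the actual encoder code): an incorrect axis permutation, e.g. a `reshape` where a `transpose` was intended, can yield a tensor of a plausible shape but with scrambled contents, so this class of bug surfaces as noisy predictions rather than a shape error:

```python
import numpy as np

# A latent of shape (batch, seq_len, channels).
latent = np.arange(24).reshape(1, 4, 6)

# Correct: transpose swaps the axes and reorders elements accordingly.
correct = latent.transpose(0, 2, 1)   # (1, 6, 4)

# Buggy: reshape produces the same shape but keeps the flat element
# order, silently scrambling which value belongs to which position.
scrambled = latent.reshape(1, 6, 4)   # (1, 6, 4), wrong contents
```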

@DavidLandup0 DavidLandup0 changed the title [Semantic Segmentation] - Add SegFormer Architecture (+ configs for random initialization) [Semantic Segmentation] - Add SegFormer Architecture, Weight Conversion Script and Presets Oct 16, 2024
@divyashreepathihalli (Collaborator) left a comment:

Amazing!! The PR looks great. Just a few comments.

@@ -0,0 +1,99 @@
# Licensed under the Apache License, Version 2.0 (the "License");

remove copyright banner

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#

remove copyright

"""

backbone_cls = SegFormerBackbone


Please also add the preprocessor flow; follow SAM as an example.
This will involve adding an image converter and a task preprocessor, then passing the preprocessor here to the task model.
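A framework-free sketch of that wiring, with illustrative class names only (the real keras_hub converter/preprocessor APIs differ in their details):

```python
import numpy as np

class ImageConverter:
    """Rescales raw images to what the backbone expects."""

    def __init__(self, scale=1.0 / 255.0):
        self.scale = scale

    def __call__(self, images: np.ndarray) -> np.ndarray:
        # Real converters also resize; here we assume inputs already
        # match the model's expected spatial size.
        return images.astype("float32") * self.scale

class TaskPreprocessor:
    """Wraps the converter; the task model calls this before the backbone."""

    def __init__(self, image_converter: ImageConverter):
        self.image_converter = image_converter

    def __call__(self, x, y=None):
        x = self.image_converter(x)
        return x if y is None else (x, y)

preprocessor = TaskPreprocessor(ImageConverter())
batch = np.full((1, 512, 512, 3), 255, dtype="uint8")
out = preprocessor(batch)
```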

@@ -0,0 +1,34 @@
# Licensed under the Apache License, Version 2.0 (the "License");

remove copyright banner

@@ -0,0 +1,191 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

remove copyright banner


"""

backbone_cls = MiTBackbone

not needed

@@ -0,0 +1,83 @@
# Licensed under the Apache License, Version 2.0 (the "License");

Remove the copyright banner in all files; I'll stop commenting on each occurrence.

```python
self.backbone = backbone
self.preprocessor = preprocessor
self.dropout = keras.layers.Dropout(0.1)
self.output_segmentation = keras.layers.Conv2D(
```

Should we call this output_segmentation_head?

```python
image_converter_cls = SegFormerImageConverter

@preprocessing_function
def call(self, x, y=None, sample_weight=None):
```

How about transformations to y, which would be the masks in the training set? If the images are resized/transformed, the masks should be too.
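One practical detail here: masks should be resized with nearest-neighbor sampling, since interpolating integer class ids would blend labels into meaningless in-between values. A pure-numpy sketch (illustrative only, not the actual preprocessing code):

```python
import numpy as np

def resize_mask_nearest(mask: np.ndarray, size: tuple) -> np.ndarray:
    """Nearest-neighbor resize for an integer label mask of shape (H, W).

    Nearest-neighbor keeps every output pixel equal to some input label;
    bilinear resizing would average class ids, producing invalid labels.
    """
    h, w = mask.shape
    new_h, new_w = size
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return mask[rows[:, None], cols[None, :]]

# Upscale a 2x2 mask to 4x4: each label becomes a 2x2 block.
mask = np.array([[0, 1], [2, 3]])
resized = resize_mask_nearest(mask, (4, 4))
```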
