Clarification on JSON Lines Dataset for Multi-Task Fine-Tuning of Florence-2 #323

mariaalfaroc · 2024-10-14T16:50:24Z

Hi everyone,

I came across the notebook discussing how to fine-tune Florence-2 for Object Detection, and I have a question regarding the structure of the JSON Lines dataset when fine-tuning for multiple tasks.

Specifically, how should the dataset be formatted if I want to fine-tune for more than one task?

Should the prefix field be a list of task ID strings, with the suffix field holding a list of strings that represent the answer for each task? For example, would the following structure be correct?

```json
{
  "prefix": ["<OD>", "<OCR>"],
  "suffix": [
    "ace of hearts<loc_345><loc_315><loc_582><loc_721>2 of hearts<loc_709><loc_115><loc_888><loc_509>3 of hearts<loc_529><loc_228><loc_735><loc_613>4 of hearts<loc_98><loc_421><loc_415><loc_845>",
    "answer_for_ocr"
  ]
}
```

Additionally, is there a guide available on how to format datasets for each task?

I appreciate any guidance on this!

Thank you!

LinasKo (Maintainer) · 2024-10-15T07:01:15Z

Hi @mariaalfaroc 👋

I don't have an answer, but I suggest looking at maestro. That's our newest project, aimed explicitly at fine-tuning multimodal models. Note that the next two weeks are intense, so the maestro team might be slow to respond.

Here's where @SkalskiP talks about the data format for Florence-2: YouTube.

mariaalfaroc (Author) · 2024-10-15T07:44:43Z

Hi @LinasKo,

Thanks for your response!

I've reviewed the maestro documentation and the YouTube tutorial. However, both focus on fine-tuning Florence-2 for a single task at a time: either Object Detection (OD) or Visual Question Answering (VQA).

For OD, a sample annotation from the dataset looks like this:

```json
{
  "image": "IMG_20220316_165139_jpg.rf.e4c229a9128494d17992cbe88af575df.jpg",
  "prefix": "<OD>",
  "suffix": "9 of diamonds<loc_141><loc_18><loc_404><loc_465>jack of diamonds<loc_589><loc_120><loc_789><loc_454>queen of diamonds<loc_308><loc_482><loc_570><loc_966>king of diamonds<loc_549><loc_477><loc_777><loc_904>10 of diamonds<loc_396><loc_75><loc_613><loc_458>"
}
```
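To make the OD target format concrete: the suffix is one flat string in which each object appears as its label followed by four `<loc_N>` tokens. Below is a minimal Python sketch that splits such a string back into objects. It assumes the tokens are ordered x1, y1, x2, y2 and hold Florence-2's quantized coordinates (bins 0 to 999) rather than pixels; the regexes and `parse_od_suffix` are illustrative only, not part of maestro or the Florence-2 processor.

```python
import re

# Assumed OD suffix layout: a class label followed by four <loc_N> tokens
# (x1, y1, x2, y2), where N is a coordinate quantized into bins 0-999.
OBJECT_PATTERN = re.compile(r"([^<]+)((?:<loc_\d+>){4})")
LOC_PATTERN = re.compile(r"<loc_(\d+)>")


def parse_od_suffix(suffix: str) -> list[tuple[str, tuple[int, ...]]]:
    """Split an OD suffix into (label, (x1, y1, x2, y2)) pairs of bin indices."""
    objects = []
    for label, locs in OBJECT_PATTERN.findall(suffix):
        box = tuple(int(value) for value in LOC_PATTERN.findall(locs))
        objects.append((label.strip(), box))
    return objects


# Usage with the first two objects of the sample suffix.
sample = "9 of diamonds<loc_141><loc_18><loc_404><loc_465>jack of diamonds<loc_589><loc_120><loc_789><loc_454>"
for label, box in parse_od_suffix(sample):
    print(label, box)
```

Converting bins back to pixels needs the image size and the processor's exact dequantization rule, which this sketch deliberately leaves out.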

For VQA, it appears as:

```json
{
  "image": "IMG_20220316_165139_jpg.rf.e4c229a9128494d17992cbe88af575df.jpg",
  "prefix": "<VQA> How many cards are in the image?",
  "suffix": "5"
}
```
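In a JSON Lines dataset, each annotation like the OD and VQA samples is stored as one JSON object on its own line. The following is a minimal sketch using only Python's standard `json` module. The helper names `write_jsonl` and `read_jsonl` are hypothetical, and any filename can be used.

```python
import json
from pathlib import Path

# Example record copied from the VQA sample annotation.
records = [
    {
        "image": "IMG_20220316_165139_jpg.rf.e4c229a9128494d17992cbe88af575df.jpg",
        "prefix": "<VQA> How many cards are in the image?",
        "suffix": "5",
    },
]


def write_jsonl(path: Path, rows: list[dict]) -> None:
    """Write one JSON object per line (the JSON Lines convention)."""
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```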

What I'd like to know is: how should the annotations be structured if I want to fine-tune Florence-2 for both OD and VQA simultaneously? Would this structure be valid? Is this even possible?

```json
{
  "image": "IMG_20220316_165139_jpg.rf.e4c229a9128494d17992cbe88af575df.jpg",
  "prefix": ["<OD>", "<VQA> How many cards are in the image?"],
  "suffix": [
    "9 of diamonds<loc_141><loc_18><loc_404><loc_465>jack of diamonds<loc_589><loc_120><loc_789><loc_454>queen of diamonds<loc_308><loc_482><loc_570><loc_966>king of diamonds<loc_549><loc_477><loc_777><loc_904>10 of diamonds<loc_396><loc_75><loc_613><loc_458>",
    "5"
  ]
}
```
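If the training loader expects exactly one prefix string and one suffix string per line, as in both single-task samples, one possible workaround is to treat the list-based annotation as an intermediate form and expand it into one JSONL line per task, with every line pointing at the same image. This is an assumption, not something confirmed by the maestro docs or this thread. The helper `flatten_multitask` is hypothetical, and the OD suffix is truncated to one box for brevity.

```python
import json


def flatten_multitask(record: dict) -> list[dict]:
    """Expand one image with parallel prefix/suffix lists into one record per task.

    Hypothetical helper: assumes the trainer accepts a single string in
    "prefix" and "suffix", as in the single-task OD and VQA samples.
    """
    prefixes, suffixes = record["prefix"], record["suffix"]
    if len(prefixes) != len(suffixes):
        raise ValueError("prefix and suffix lists must have the same length")
    return [
        {"image": record["image"], "prefix": prefix, "suffix": suffix}
        for prefix, suffix in zip(prefixes, suffixes)
    ]


multi = {
    "image": "IMG_20220316_165139_jpg.rf.e4c229a9128494d17992cbe88af575df.jpg",
    "prefix": ["<OD>", "<VQA> How many cards are in the image?"],
    "suffix": ["9 of diamonds<loc_141><loc_18><loc_404><loc_465>", "5"],
}

# Print the JSON Lines that would be written for this image.
for row in flatten_multitask(multi):
    print(json.dumps(row))
```

With this layout, tasks are mixed at the sample level. Shuffling the file interleaves OD and VQA examples across batches, and the task token in each prefix tells the model which output to produce.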

Thank you so much again! :)

LinasKo Oct 15, 2024
Maintainer

@SkalskiP, do we have an answer for this?

@mariaalfaroc, ping me next week - I might be able to ask him in-person 🙂

siddiquemu Oct 29, 2024

@LinasKo, any conclusion on this issue? I'm also interested in the JSON dataset structure for training Florence-2 on multiple tasks jointly.

LinasKo Oct 30, 2024
Maintainer

Thanks for your patience. I need another week, I'm afraid 😉 It's a busy time for us 🙂
