fix: (1) Add `StatcanDialogueDatasetRetrieval` (2) Fix `DRESModel.encode_conversations` to allow list of dictionaries #779

xhluca · 2024-05-21T16:05:33Z

This pull request introduces two changes:

New dataset: StatcanDialogueDatasetRetrieval
Fix: Change DRESModel.encode_conversations to allow conversations composed of list of dictionaries, alongside lists of strings (e.g. Topicoqa). Also fix batch_size parameter in DRESModel.encode_conversations (b5141e8)

Checklist for adding MMTEB dataset

Reason for dataset addition:

Other

change to s2p: fix: (1) Add StatcanDialogueDatasetRetrieval (2) Fix DRESModel.encode_conversations to allow list of dictionaries #779 (comment)
pin to specific version of dataset: fix: (1) Add StatcanDialogueDatasetRetrieval (2) Fix DRESModel.encode_conversations to allow list of dictionaries #779 (comment)
Test: evaluation = MTEB(task_langs=["en", "de"]) # Only select datasets which are "en", "de" or "en-de"

Notes

I have tested that the dataset runs with the mteb package.

To test it, I have used the following code:

from mteb import MTEB
from mteb.evaluation.evaluators import DRESModel


class DummyModel:
    def encode(self, texts, *args, **kwargs):
        return texts


evaluation = MTEB(tasks=["StatcanDialogueDatasetRetrieval"])
evaluation.load_tasks_data()
sdd = evaluation.tasks[0]

q = sdd.queries["french"]["test"]["Q12210"]

doc_ids = list(sdd.relevant_docs["french"]["test"]["Q12210"].keys())
answer_1 = sdd.corpus["french"]["test"][doc_ids[0]]
answer_2 = sdd.corpus["french"]["test"][doc_ids[1]]


dres = DRESModel(model=DummyModel())

assert isinstance(q, list) is True, "Query is not a list"

encoded = dres.encode_conversations([q], batch_size=1)
print(encoded[0])
# user: bonjour; operator: Bonjour, je m'appelle Kelly C. Comment puis-je vous aider?; user: je cherches des données sur le secteur du porc canadien; user: je suis vraiment perdu dans le site; operator: Un instant; operator: Veuillez consulter l'hyperlien suivant: Production animale (filtre: porc) (https://www150.statcan.gc.ca/n1/fr/sujets/agriculture_et_alimentati; user: avez vous des données sur les exportations?; operator: un instant; user: en fait je veux mesurer l'importance de ce secteur en termes de production, de revenus, d'exportation et d'emploi.; operator: Activité commerciale internationale pour le code 1 du SH (https://www5.statcan.gc.ca/cimt-cicm/commodities-marchandises?lang=fra&ch +Live+animals+and+animal+products.&refMonth=7&refYr=2020&freq=6&countryId=999&usaState=0&provId=1&dataTransformation=0&searchStr=&monthStr=July); operator: et Activité commerciale internationale pour le code 2 du SH (https://www5.statcan.gc.ca/cimt-cicm/commodities-marchandises?lang=fra& +Live+animals+and+animal+products.&refMonth=7&refYr=2020&freq=6&countryId=999&usaState=0&provId=1&dataTransformation=0&searchStr=&monthStr=July); operator: Pour la production, je vous invite à consulter le premier hyperlien que j'ai partagé. Il y aurait des tableaux comme 32-10-0126-01 (https://w pour les exports, veuillez consulter les hyperliens ci-dessus; operator: pour les revenus et emplois, un instant; user: pour les exports, je vois des données mensuelles; user: est il possible d'avoir des données annuelles, un cumul?; operator: Oui, vous avez l'option de changer la fréquence; operator: Est-ce que vous voyez l'option?; user: exemple pour 2019, je ferais quoi pour avoir les données de toute l'année; operator: Vous devez changer l'année à 2019, et la fréquence à annuel. Vous devez ensuite extraire les données. Exemple: Tableau 980-0002 (https://w lang=fra&getSectionId()=1&dataTransformation=0&refYr=2019&refMonth=7&freq=12&countryId=0&getUsaState()=0&provId=1&retrieve=Extraire&country=null&trad; user: et le mois, je vais choisir quoi; operator: pour le revenu, veuillez consulter le tableau suivant: 32-10-0136-01 (https://www150.statcan.gc.ca/t1/tbl1/fr/tv.action?pid=3210013601&re; operator: Vous pouvez selectionner le mois que vous voulez.; user: super merci; user: pour l'emploi; user: etes vous avec moi?; operator: oui, un instant; user: si je veux connaitre la place du canada dans le monde en matière de porc, en termes de production et d'exportation; operator: Je suis toujours à la recherche de vos données. Merci de votre patience.; user: un gros merci

I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).

Looking at the scores (recall at k):

"recall_at_1": 0.03704,
"recall_at_3": 0.07407,
"recall_at_5": 0.09877,
"recall_at_10": 0.15895,
"recall_at_20": 0.17747,
"recall_at_100": 0.39738,
"recall_at_1000": 0.72222,

we can see that the recall at k from 1 to 1000 consistently, which means it is a fairly challenging task (r@10 is only 0.15) but non-random.

xhluca · 2024-05-21T16:07:14Z

@orionw @vaibhavad would love if you can review this PR! There's a few changes i want to add but feel free to leave feedback now if you wish!

orionw · 2024-05-21T16:20:59Z

The current code looks like a great start -- the remainder todos are just filling in the things marked TBD and adding results/points.

mteb/tasks/Retrieval/multilingual/StatcanDialogueDatasetRetrieval.py

xhluca · 2024-05-22T19:56:56Z

Seems like language filtering does not work, or i'm running the wrong command:

from mteb import MTEB
evaluation = MTEB(task_langs=["de"])
print(evaluation.available_tasks)  # StatcanDialogueDatasetRetrieval will appear

vaibhavad · 2024-05-22T20:10:25Z

@xhluca,

I don't think this is the correct API. It is show all tasks, regardless of category, language etc. Can you try evaluation.print_selected_tasks() instead?

xhluca · 2024-05-22T20:27:59Z

Thanks, seems print_selected_tasks works.

…s, alongside lists of strings (e.g. Topicoqa). Also fix batch_size parameter in encode_conversations

xhluca · 2024-05-22T23:14:35Z

@vaibhavad @orionw As discussed in this comment, a fixed was needed for DRESModel.encode_conversations (i.e. convert_conv_history_to_query) in order to allow a conversation to be represented as a list of dict rather than a list of string. This format is directly compatible with huggingface-style conversations (see this tutorial). See this commit for the details: b5141e8

Let me know if this fix makes sense (and whether I should tag this PR as bug fix or something else). Let me know if this change to DRESModel.encode_conversations should be classified under something else in 779.json.

orionw

Have some minor comments, but I like the changes.

No need to tag the PR (not sure we use that?) but definitely add points for the dataset, 1-2 for a bugfix, and also the reviewers.

orionw · 2024-05-23T18:39:16Z

mteb/tasks/Retrieval/multilingual/StatcanDialogueDatasetRetrieval.py

+        description="A Dataset for Retrieving Data Tables through Conversations with Genuine Intents, available in English and French.",
+        dataset={
+            "path": "McGill-NLP/statcan-dialogue-dataset-retrieval",
+            "revision": "v1.0",


This should be the git commit hash of the dataset, if possible.

mteb/evaluation/evaluators/RetrievalEvaluator.py

xhluca · 2024-05-24T01:39:41Z

Huh, seems tests failed after i merged changes from main to this branch...

orionw · 2024-05-24T14:26:23Z

Huh, seems tests failed after i merged changes from main to this branch...

I think those are fixed by #803 if you merge in main again, agree they weren't caused by this PR

orionw

Once the tests pass and @vaibhavad approves, will enable automerge

xhluca · 2024-05-25T01:35:41Z

@vaibhavad can you approve if everything has been covered?

vaibhavad · 2024-05-27T18:37:21Z

Approved and merged, thanks for the great work @xhluca!

xhluca and others added 3 commits May 17, 2024 20:23

WIP

92ebd7f

Merge branch 'embeddings-benchmark:main' into add-statcan

4e96794

Merge branch 'embeddings-benchmark:main' into add-statcan

eea434a

xhluca marked this pull request as ready for review May 21, 2024 16:07

orionw self-requested a review May 21, 2024 16:21

vaibhavad self-requested a review May 21, 2024 18:17

vaibhavad reviewed May 21, 2024

View reviewed changes

mteb/tasks/Retrieval/multilingual/StatcanDialogueDatasetRetrieval.py Outdated Show resolved Hide resolved

vaibhavad reviewed May 21, 2024

View reviewed changes

mteb/tasks/Retrieval/multilingual/StatcanDialogueDatasetRetrieval.py Show resolved Hide resolved

vaibhavad reviewed May 21, 2024

View reviewed changes

mteb/tasks/Retrieval/multilingual/StatcanDialogueDatasetRetrieval.py Outdated Show resolved Hide resolved

vaibhavad reviewed May 21, 2024

View reviewed changes

mteb/tasks/Retrieval/multilingual/StatcanDialogueDatasetRetrieval.py Outdated Show resolved Hide resolved

vaibhavad requested changes May 21, 2024

View reviewed changes

mteb/tasks/Retrieval/multilingual/StatcanDialogueDatasetRetrieval.py Show resolved Hide resolved

isaac-chung assigned vaibhavad May 22, 2024

Update metadata based on reviewer requested changes

abacfef

xhluca and others added 5 commits May 22, 2024 18:13

Finalize metadata

97eef03

add statcandialoguedatasetretrival to mteb.tasks.retrieval

9c4cc71

Merge branch 'embeddings-benchmark:main' into add-statcan

30ea9a6

Convert query from JSON string to dictionary (parsed using json)

e5932b9

Fix: DRESModel to allow conversations composed of list of dictionarie…

b5141e8

…s, alongside lists of strings (e.g. Topicoqa). Also fix batch_size parameter in encode_conversations

xhluca changed the title ~~[WIP] Add statcan dialogue dataset~~ [WIP] (1) Add StatcanDialogueDatasetRetrieval (2) Fix DRESModel.encode_conversations to allow list of dictionaries May 22, 2024

Add baseline results

8310df9

xhluca changed the title ~~[WIP] (1) Add StatcanDialogueDatasetRetrieval (2) Fix DRESModel.encode_conversations to allow list of dictionaries~~ fix: (1) Add StatcanDialogueDatasetRetrieval (2) Fix DRESModel.encode_conversations to allow list of dictionaries May 22, 2024

orionw requested changes May 23, 2024

View reviewed changes

xhluca added 2 commits May 23, 2024 21:33

Change revision to hash

9da3558

Add points

8a98530

Merge branch 'main' into add-statcan

56787a4

xhluca requested review from orionw and vaibhavad May 24, 2024 01:39

Fix incorrect object reference

994ed30

orionw approved these changes May 24, 2024

View reviewed changes

xhluca added 2 commits May 24, 2024 12:11

Merge branch 'embeddings-benchmark:main' into add-statcan

b3772fa

Merge branch 'embeddings-benchmark:main' into add-statcan

fcf017a

vaibhavad approved these changes May 27, 2024

View reviewed changes

vaibhavad merged commit 7943ff0 into embeddings-benchmark:main May 27, 2024
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: (1) Add `StatcanDialogueDatasetRetrieval` (2) Fix `DRESModel.encode_conversations` to allow list of dictionaries #779

fix: (1) Add `StatcanDialogueDatasetRetrieval` (2) Fix `DRESModel.encode_conversations` to allow list of dictionaries #779

xhluca commented May 21, 2024 •

edited by vaibhavad

Loading

xhluca commented May 21, 2024

orionw commented May 21, 2024

xhluca commented May 22, 2024

vaibhavad commented May 22, 2024 •

edited

Loading

xhluca commented May 22, 2024

xhluca commented May 22, 2024 •

edited

Loading

orionw left a comment

orionw May 23, 2024

xhluca commented May 24, 2024

orionw commented May 24, 2024

orionw left a comment •

edited

Loading

xhluca commented May 25, 2024

vaibhavad commented May 27, 2024

fix: (1) Add StatcanDialogueDatasetRetrieval (2) Fix DRESModel.encode_conversations to allow list of dictionaries #779

fix: (1) Add StatcanDialogueDatasetRetrieval (2) Fix DRESModel.encode_conversations to allow list of dictionaries #779

Conversation

xhluca commented May 21, 2024 • edited by vaibhavad Loading

Checklist for adding MMTEB dataset

Other

Notes

xhluca commented May 21, 2024

orionw commented May 21, 2024

xhluca commented May 22, 2024

vaibhavad commented May 22, 2024 • edited Loading

xhluca commented May 22, 2024

xhluca commented May 22, 2024 • edited Loading

orionw left a comment

Choose a reason for hiding this comment

orionw May 23, 2024

Choose a reason for hiding this comment

xhluca commented May 24, 2024

orionw commented May 24, 2024

orionw left a comment • edited Loading

Choose a reason for hiding this comment

xhluca commented May 25, 2024

vaibhavad commented May 27, 2024

fix: (1) Add `StatcanDialogueDatasetRetrieval` (2) Fix `DRESModel.encode_conversations` to allow list of dictionaries #779

fix: (1) Add `StatcanDialogueDatasetRetrieval` (2) Fix `DRESModel.encode_conversations` to allow list of dictionaries #779

xhluca commented May 21, 2024 •

edited by vaibhavad

Loading

vaibhavad commented May 22, 2024 •

edited

Loading

xhluca commented May 22, 2024 •

edited

Loading

orionw left a comment •

edited

Loading