fix: evaluate missing splits #1268

thivyanth · 2024-10-01T15:16:44Z

Addresses #1260

Checklist

- Run tests locally with make test to make sure nothing is broken.
  - 944 passed, 236 skipped, 55 warnings in 232.13s (0:03:52)
- Run the formatter with make lint.

isaac-chung

@thivyanth Thanks for taking a first stab at it! For any new functionality, we usually add a test / test cases to help confirm that the added code works. Could you please add a test for this? Specifically the following cases:

- Results exist and no splits are missing.
- Results exist and one split is missing.

I can review the PR afterwards.
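For illustration, the two requested cases could be sketched as plain pytest-style functions. This is only a sketch: `find_missing_splits` is a hypothetical stand-in for whatever helper the PR ends up exposing, it is not part of mteb, and the split names are arbitrary.

```python
def find_missing_splits(existing_splits, requested_splits):
    """Hypothetical helper: requested splits with no saved results, in request order."""
    existing = set(existing_splits)
    return [split for split in requested_splits if split not in existing]


def test_results_exist_no_missing_splits():
    # Every requested split already has results, so nothing needs to be re-run.
    assert find_missing_splits(["train", "test"], ["train", "test"]) == []


def test_results_exist_one_missing_split():
    # Only "test" is absent from the saved results, so only "test" is returned.
    assert find_missing_splits(["train"], ["train", "test"]) == ["test"]
```

In the real test suite these would presumably run a small task end to end against saved results rather than call a helper directly; the sketch only pins down the expected behaviour for each case.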

KennethEnevoldsen

Looking good, but still a few things to adjust.

KennethEnevoldsen · 2024-10-03T09:29:29Z

mteb/evaluation/MTEB.py

-                if save_path.exists() and not overwrite_results:
-                    logger.info(
-                        f"{task.metadata.name} results already exists. Loading results from disk. Set overwrite_results=True to overwrite."
+                existing_results = self.load_existing_results(save_path)


You should use MTEBResults.from_disk(path) to load results (see line 354).

KennethEnevoldsen · 2024-10-03T09:30:58Z

mteb/evaluation/MTEB.py

+                        logger.info(
+                            f"{task.metadata.name} results exist but missing splits: {missing_splits}. Running evaluation for missing splits."
+                        )
+                        task_eval_splits = missing_splits


This will overwrite the existing file further down, resulting in a new file that contains only the newly evaluated splits and drops the ones that were already saved. You will have to merge the results objects.
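A minimal sketch of the kind of merge meant here, using plain dictionaries keyed by split name rather than the actual results class. The function name `merge_split_scores`, the dictionary layout, and the score values are assumptions made up for illustration; the real merge would need to operate on the results objects themselves.

```python
from copy import deepcopy


def merge_split_scores(existing, new):
    """Combine per-split scores; splits present in `new` replace those in `existing`."""
    merged = deepcopy(existing)    # copy so the previously saved results are not mutated
    merged.update(deepcopy(new))   # add (or replace) the freshly evaluated splits
    return merged


# Illustrative values only, not real benchmark numbers.
existing = {"train": [{"hf_subset": "default", "main_score": 0.71}]}
new = {"test": [{"hf_subset": "default", "main_score": 0.64}]}

merged = merge_split_scores(existing, new)
print(sorted(merged))  # ['test', 'train']
```

Writing `merged` to disk, instead of only the new results, keeps the previously evaluated splits in the file.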

KennethEnevoldsen · 2024-10-03T09:31:22Z

mteb/evaluation/MTEB.py

+    def load_existing_results(self, save_path):
+        if save_path.exists():
+            with open(save_path) as f:
+                return json.load(f)
+        return None


Suggested change

-    def load_existing_results(self, save_path):
-        if save_path.exists():
-            with open(save_path) as f:
-                return json.load(f)
-        return None

Not needed; see the comment above.

KennethEnevoldsen · 2024-10-03T09:32:51Z

mteb/evaluation/MTEB.py

+    def compare_splits_and_subsets(self, existing_results, task_eval_splits):
+        missing_splits = []
+        for split in task_eval_splits:
+            if split not in existing_results:
+                missing_splits.append(split)
+        return missing_splits


Suggested change

-    def compare_splits_and_subsets(self, existing_results, task_eval_splits):
-        missing_splits = []
-        for split in task_eval_splits:
-            if split not in existing_results:
-                missing_splits.append(split)
-        return missing_splits
+    @staticmethod
+    def compare_splits_and_subsets(existing_results: MTEBResults, task_eval_splits: list[str]) -> list[str]:
+        missing_splits = []
+        for split in task_eval_splits:  # this will need to be adapted to MTEBResults object
+            if split not in existing_results:
+                missing_splits.append(split)
+        return missing_splits
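One way the adaptation to a results object might look, as a rough sketch only. `StandInResults` is a hypothetical placeholder, and the assumption that per-split scores live in a `scores` mapping keyed by split name should be checked against the actual MTEBResults class before relying on it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StandInResults:
    """Hypothetical placeholder for the results object (assumed layout)."""

    scores: Dict[str, List[dict]] = field(default_factory=dict)


def compare_splits(existing_results: StandInResults, task_eval_splits: List[str]) -> List[str]:
    """Return the requested splits that have no entry in the saved scores."""
    return [split for split in task_eval_splits if split not in existing_results.scores]


saved = StandInResults(scores={"train": [{"hf_subset": "default"}]})
print(compare_splits(saved, ["train", "test"]))  # ['test']
```

The membership check moves from the object itself to its per-split mapping, which is the part the inline comment in the suggestion flags as needing adaptation.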

thivyanth added 2 commits October 1, 2024 11:05

implement partial evaluation for missing splits

5b538f9

lint

7cec0d7

thivyanth mentioned this pull request Oct 1, 2024

Only skip benchmarking if split results are the same too #1260

Open

isaac-chung requested changes Oct 1, 2024

KennethEnevoldsen requested changes Oct 3, 2024

isaac-chung assigned thivyanth Oct 3, 2024

isaac-chung changed the title ~~Feature/missing-splits-evaluation~~ fix: evaluate missing splits Oct 4, 2024
