
fix: evaluate missing splits #1268

Open: wants to merge 2 commits into main


@thivyanth commented Oct 1, 2024

Addresses #1260

Checklist

@Muennighoff @isaac-chung

  • Run tests locally using make test to make sure nothing is broken.
    • 944 passed, 236 skipped, 55 warnings in 232.13s (0:03:52)
  • Run the formatter using make lint.

@isaac-chung (Collaborator) left a comment


@thivyanth Thanks for taking a first stab at it! For any new functionality, we usually add tests to confirm that the added code works. Could you please add tests for the following cases:

  1. Results exist and no splits are missing.
  2. Results exist and one split is missing.

I can review the PR afterwards.
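A minimal sketch of the two requested cases as pytest-style test functions. It assumes a hypothetical free function `find_missing_splits` that stands in for the PR's `compare_splits_and_subsets`, so the logic can be tested without an MTEB instance; a real test would more likely write a results file to a temporary directory and run the evaluation against it.

```python
# Hypothetical stand-in for the PR's compare_splits_and_subsets helper, written
# as a free function so it can be tested in isolation.
def find_missing_splits(existing_splits: set[str], task_eval_splits: list[str]) -> list[str]:
    """Return the requested splits that have no saved results, preserving order."""
    return [split for split in task_eval_splits if split not in existing_splits]


def test_results_exist_no_missing_splits() -> None:
    # Case 1: every requested split already has results on disk.
    assert find_missing_splits({"dev", "test"}, ["dev", "test"]) == []


def test_results_exist_one_missing_split() -> None:
    # Case 2: "dev" was evaluated earlier, "test" was not.
    assert find_missing_splits({"dev"}, ["dev", "test"]) == ["test"]


if __name__ == "__main__":
    test_results_exist_no_missing_splits()
    test_results_exist_one_missing_split()
    print("both cases pass")
```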

@KennethEnevoldsen (Contributor) left a comment


Looking good, but still a few things to adjust.

if save_path.exists() and not overwrite_results:
    logger.info(
        f"{task.metadata.name} results already exists. Loading results from disk. Set overwrite_results=True to overwrite."
    )
    existing_results = self.load_existing_results(save_path)

You should use MTEBResults.from_disk(path) to load results (see line 354).

logger.info(
    f"{task.metadata.name} results exist but missing splits: {missing_splits}. Running evaluation for missing splits."
)
task_eval_splits = missing_splits

This will overwrite the existing file further down, producing a new file in which the previously evaluated splits are missing. You will have to merge the existing and new results objects.
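A minimal sketch of the merge step. It assumes, without verifying it here, that per-split scores live in a dict keyed by split name; `merge_split_scores` is a hypothetical helper, not part of mteb. The merged scores, rather than only the newly computed ones, would then be written back to `save_path`.

```python
# Hypothetical sketch: assumes each results object keeps its scores in a dict
# keyed by split name, e.g. {"test": [{"main_score": 0.68}]}.
def merge_split_scores(
    existing: dict[str, list[dict]], new: dict[str, list[dict]]
) -> dict[str, list[dict]]:
    """Combine scores loaded from disk with scores for the newly evaluated splits."""
    merged = dict(existing)  # shallow copy: the loaded results are not mutated
    for split, scores in new.items():
        merged[split] = scores  # a freshly evaluated split replaces any stale entry
    return merged


existing = {"dev": [{"main_score": 0.71}]}  # illustrative numbers only
new = {"test": [{"main_score": 0.68}]}
merged = merge_split_scores(existing, new)
print(sorted(merged))  # ['dev', 'test']
```

Letting a newly evaluated split overwrite a same-named entry is a design choice; since only missing splits are evaluated, such a conflict should not normally occur.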

Comment on lines +506 to +510
def load_existing_results(self, save_path):
    if save_path.exists():
        with open(save_path) as f:
            return json.load(f)
    return None

Suggested change: delete these lines.

Not needed; see the comment above.

Comment on lines +512 to +517
def compare_splits_and_subsets(self, existing_results, task_eval_splits):
    missing_splits = []
    for split in task_eval_splits:
        if split not in existing_results:
            missing_splits.append(split)
    return missing_splits

Suggested change, replacing the lines above:

@staticmethod
def compare_splits_and_subsets(existing_results: MTEBResults, task_eval_splits: list[str]) -> list[str]:
    missing_splits = []
    for split in task_eval_splits:  # this will need to be adapted to the MTEBResults object
        if split not in existing_results:
            missing_splits.append(split)
    return missing_splits
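One way the "adapted to the MTEBResults object" note could play out, assuming the results object exposes a `scores` dict keyed by split name (an assumption to check against the actual class). `ResultsStub` is hypothetical and only mimics that single attribute so the sketch runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class ResultsStub:
    """Hypothetical stand-in for MTEBResults; only models a `scores` dict keyed by split."""

    scores: dict[str, list[dict]] = field(default_factory=dict)


def compare_splits_and_subsets(existing_results: ResultsStub, task_eval_splits: list[str]) -> list[str]:
    # Check the per-split scores mapping rather than the results object itself.
    return [split for split in task_eval_splits if split not in existing_results.scores]


existing = ResultsStub(scores={"dev": [{"main_score": 0.5}]})
print(compare_splits_and_subsets(existing, ["dev", "test"]))  # ['test']
```

This sketch only compares splits; checking missing subsets (languages) within a split would need a further lookup into each split's score entries.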

@isaac-chung changed the title from "Feature/missing-splits-evaluation" to "fix: evaluate missing splits" on Oct 4, 2024