Failures to correctly evaluate non-single letter answers #709

Open
jamesbraza opened this issue Nov 19, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@jamesbraza
Collaborator

For the qa_prompt:

Q: What is my office's zip code?

Options:
A) 94107
B) 94106
C) cheesecake
D) Insufficient information to answer this question
E) -8

And the qa_answer (where the correct option is A):

the answer is 94106 or 94107

The LLM evaluation says:

The single letter answer is B.

This gets converted to "T" (the first character of the response, not the letter it names), which is then declared LitQAEvaluation.INCORRECT.

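LitQAEvaluation.INCORRECT is actually the right output here, since the answer hedges between 94106 and 94107 rather than committing to A. But we got it right for the wrong reason: "T" is not even one of the options.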
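A minimal sketch of the failure and one possible fix, assuming the parser takes the first character of the LLM response (the "T" above suggests this, but the actual evaluation code is not shown in this issue). `naive_extract`, `extract_letter`, and `VALID_LETTERS` are hypothetical names for illustration only, not paper-qa's API. The fix looks for a standalone uppercase option letter and takes the last one, so a leading article like "A single..." doesn't win.

```python
import re
from typing import Optional

# Hypothetical: the option letters offered in the qa_prompt above (A through E).
VALID_LETTERS = "ABCDE"


def naive_extract(response: str) -> str:
    # Hypothetical reproduction of the bug: take the first character,
    # so "The single letter answer is B." yields "T".
    return response.strip()[0].upper()


def extract_letter(response: str, valid: str = VALID_LETTERS) -> Optional[str]:
    """Return the last standalone option letter in the response, or None."""
    # \b on both sides means the letter must be a word by itself,
    # e.g. the "B" in "is B." but not the "T" in "The".
    matches = re.findall(rf"\b([{valid}])\b", response)
    # Taking the last match skips a leading article like "A single ...".
    return matches[-1] if matches else None


if __name__ == "__main__":
    response = "The single letter answer is B."
    print(naive_extract(response), extract_letter(response))  # T B
```

Returning `None` when no option letter is found lets the caller treat the response as unparseable instead of silently grading it INCORRECT. Taking the last match is a heuristic: a response like "B. A is wrong." would still be misread, so a stricter prompt or output format may be preferable.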
