
HellaSwag numbers? #31

Open
forresti opened this issue Jan 2, 2024 · 2 comments
forresti commented Jan 2, 2024

Great work on this project!
In Table 20 of the LLAMA-2 paper, it says that LLAMA-2 gets 77.2 accuracy on HellaSwag. The LLAMA-2 paper isn't clear on whether this is zero-shot, but Table 20 of the Falcon paper confirms that it is zero-shot. However, in Table 25 of the Wanda paper, it says that LLAMA-2 Dense gets 57.17 accuracy on HellaSwag.

This seems like a large gap. Could you help me to understand the gap? E.g. are there multiple metrics, or multiple versions of the dataset, or something else that could cause a gap like this?

Eric-mingjie (Collaborator) commented
Hi, thanks for the question. I think this might be related to the metrics: the EleutherAI evaluation harness reports two metrics, acc and acc_norm, and in the paper we report the acc metric. However, based on the log file from our experiments, for LLaMA-2-7B the acc_norm number on HellaSwag is 76.00.
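For context, here is a minimal sketch of how the two metrics differ on a multiple-choice task like HellaSwag: acc picks the completion with the highest raw summed log-likelihood, while acc_norm first normalizes each log-likelihood by the completion's length (the harness uses byte length). The function name and all numbers below are illustrative assumptions, not real model outputs or harness code.

```python
def pick(loglikelihoods, lengths=None):
    """Return the index of the chosen completion.

    acc:      argmax of the raw summed log-likelihood.
    acc_norm: argmax of log-likelihood divided by completion length
              (a sketch of the harness's byte-length normalization).
    """
    if lengths is None:
        scores = loglikelihoods
    else:
        scores = [ll / n for ll, n in zip(loglikelihoods, lengths)]
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical 4-way item: a short wrong ending vs. a longer correct one.
# Raw log-likelihood favors the short ending, so acc and acc_norm can
# disagree on the same model outputs.
lls = [-12.0, -30.0, -35.0, -33.0]   # summed log-likelihoods per ending
lengths = [10, 40, 38, 41]           # completion lengths in bytes

print(pick(lls))           # acc choice -> 0 (short ending wins on raw score)
print(pick(lls, lengths))  # acc_norm choice -> 1 (normalization flips the pick)
```

This length-dependence is why acc and acc_norm can differ by many points on HellaSwag, and why numbers from different papers are only comparable if they report the same metric.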

forresti commented Jan 3, 2024

Thank you! Do you also have an acc_norm number for your experiments?
