HellaSwag numbers? #31

forresti · 2024-01-02T07:19:45Z

Great work on this project!
Table 20 of the LLAMA-2 paper reports that LLAMA-2 gets 77.2 accuracy on HellaSwag. The LLAMA-2 paper isn't clear on whether this is zero-shot, but Table 20 of the Falcon paper confirms that it is. However, Table 25 of the Wanda paper reports that LLAMA-2 Dense gets 57.17 accuracy on HellaSwag.

This seems like a large gap. Could you help me to understand the gap? E.g. are there multiple metrics, or multiple versions of the dataset, or something else that could cause a gap like this?

Eric-mingjie · 2024-01-02T08:32:46Z

Hi, thanks for the question. I think this might be related to the metrics: the EleutherAI benchmark reports two metrics, acc and acc_norm. In the paper, we report the acc metric. However, based on the log file from our experiments, the acc_norm number for LLaMA-2-7B on HellaSwag is 76.00.
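
To illustrate how acc and acc_norm can disagree on the same example, here is a minimal sketch. It assumes acc picks the ending with the highest summed log-likelihood and acc_norm picks the ending with the highest log-likelihood divided by the ending's length in characters (the exact normalization may differ between harness versions). The function names and the toy numbers are hypothetical, not taken from the Wanda logs.

```python
# Toy illustration of acc vs. acc_norm on a multiple-choice item.
# Assumption: acc_norm divides each log-likelihood by the ending's
# character length; real harness versions may normalize differently.

def pick_acc(loglikelihoods):
    """Index of the ending with the highest raw log-likelihood."""
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])


def pick_acc_norm(loglikelihoods, endings):
    """Index of the ending with the highest length-normalized log-likelihood."""
    return max(
        range(len(endings)),
        key=lambda i: loglikelihoods[i] / len(endings[i]),
    )


# Hypothetical item: the correct ending (index 1) is much longer, so its
# summed log-likelihood is lower even though it is more likely per character.
endings = ["sits.", "walks over to the bench and sits down slowly."]
loglikelihoods = [-6.0, -18.0]
gold = 1

acc_correct = pick_acc(loglikelihoods) == gold
acc_norm_correct = pick_acc_norm(loglikelihoods, endings) == gold
print(acc_correct, acc_norm_correct)
```

Because raw log-likelihood penalizes longer endings, a dataset with endings of varying length, such as HellaSwag, can show a sizable gap between the two metrics for the same model.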

forresti · 2024-01-03T08:40:52Z

Thank you! Do you also have an acc_norm number for your experiments?
