Great work on this project!
In Table 20 of the LLAMA-2 paper, it says that LLAMA-2 gets 77.2 accuracy on HellaSwag. The LLAMA-2 paper isn't clear on whether this is zero-shot, but Table 20 of the Falcon paper confirms that it is zero-shot. However, in Table 25 of the Wanda paper, it says that LLAMA-2 Dense gets 57.17 accuracy on HellaSwag.
This seems like a large gap. Could you help me to understand the gap? E.g. are there multiple metrics, or multiple versions of the dataset, or something else that could cause a gap like this?
Hi, thanks for the question. I think this is related to the metrics: the EleutherAI benchmark reports two metrics, acc and acc_norm. In the paper, we report the acc metric. However, based on the log file from our experiments, it seems that the acc_norm number for LLaMA-2-7B on HellaSwag is 76.00, which is much closer to the 77.2 reported in the LLaMA-2 paper.
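To illustrate how the two metrics can disagree on the same example, here is a minimal sketch. It assumes that acc picks the ending with the highest summed log-likelihood, while acc_norm first divides each log-likelihood by the ending's byte length. The helper names, endings, and log-likelihood values below are hypothetical and are not taken from our logs or from the benchmark code.

```python
def pick_acc(loglikelihoods):
    """Index of the ending with the highest raw (summed) log-likelihood."""
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])


def pick_acc_norm(loglikelihoods, endings):
    """Index of the ending with the highest length-normalized log-likelihood.

    Assumption: normalization divides by the UTF-8 byte length of each ending.
    """
    return max(
        range(len(endings)),
        key=lambda i: loglikelihoods[i] / len(endings[i].encode("utf-8")),
    )


# Hypothetical example: the correct ending (index 1) is much longer,
# so its summed log-likelihood is lower even though it is likelier per byte.
endings = ["sits.", "continues walking down the street with the dog."]
loglikelihoods = [-6.0, -20.0]
gold = 1

acc_correct = pick_acc(loglikelihoods) == gold
acc_norm_correct = pick_acc_norm(loglikelihoods, endings) == gold
print(f"acc correct: {acc_correct}, acc_norm correct: {acc_norm_correct}")
```

Under this assumption, the raw score tends to favor short endings, which would explain why acc can be well below acc_norm on a dataset like HellaSwag where the candidate endings vary in length.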