LLM benchmark notebook (code, math)

Project Description

A notebook structure for evaluating LLMs. It currently runs 5 tests in Python code and 5 tests in math, and is intended to grow to cover translation, general question answering and basic knowledge, generation speed (tokens/sec), and more.
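
To illustrate the shape such an eval can take, here is a minimal sketch of a prompt-plus-checker test harness. It is not the notebook's actual code; every name in it is hypothetical.

```python
# Hypothetical sketch: each test pairs a prompt with a checker that
# inspects the model's answer. Names are illustrative only.

def check_reverse_string(answer: str) -> bool:
    # Code test: accept common Python string-reversal idioms.
    return "[::-1]" in answer or "reversed(" in answer

def check_arithmetic(answer: str) -> bool:
    # Math test: expect the product 6 * 7 = 42 in the answer.
    return "42" in answer

TESTS = [
    ("Write a Python function that reverses a string.", check_reverse_string),
    ("What is 6 * 7?", check_arithmetic),
]

def score(generate):
    """Run every test through `generate` (prompt -> model text)
    and return the fraction of tests passed."""
    passed = sum(checker(generate(prompt)) for prompt, checker in TESTS)
    return passed / len(TESTS)
```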

The LLMbenchmark notebook currently includes eval/benchmark code for four LLMs: Yi (6B), Vicuna (7B), Mistral (7B), and Gemma (2B). More information and explanations on how to run each notebook and its evals can be found in the notebook's first cell.

You can recreate the eval for any LLM by copying the cells of an existing model and replacing the model-specific parts with your own model's name. Using the YourOwnLLMbenchmark Kaggle Notebook as a template and following the instructions it contains, you can reproduce the eval for the LLM of your choice (see the sketch below).
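
As an illustration of what "model-specific parts" usually means in practice, here is a minimal sketch using the Hugging Face transformers library, where swapping models amounts to changing a single model identifier. The id and variable names below are examples, not the notebook's exact code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The model-specific part: swap this Hugging Face model id for the one
# you want to benchmark (e.g. "01-ai/Yi-6B", "lmsys/vicuna-7b-v1.5",
# "google/gemma-2b"). This id is an example, not the notebook's code.
MODEL_ID = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# device_map="auto" requires the `accelerate` package; it places the
# model on a GPU when one is available (as on Kaggle or Colab).
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```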

Table of contents

  1. LLMbenchmarkEvals.ipynb: LLM eval/benchmark code for Yi (6B), Vicuna (7B), Mistral (7B), and Gemma (2B); returns a benchmark table for each model. More details are in the first cell of the notebook itself.

  2. LLMbenchmarkEvals-template.ipynb: copy the cells and replace the model-specific spots with the name of the LLM you want to evaluate. More details are in the first cell of the notebook itself.

Compatibility

  • Kaggle
  • Google Colab

Instructions on how to run each notebook are in its top cell.

Credits and contact

  • Celeste Deudon
  • SMILE R&D

License

Apache 2.0

About

A notebook for benchmarking various LLMs.
