EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

315 Open 839 Closed

Issues list

Is there a good method to set the generation_kwargs for tasks so that different kv caches can be used for optimization?

#2506 opened Nov 19, 2024 by CoderChen01

Load dataset error

#2505 opened Nov 19, 2024 by junming-yang

french_bench_xnli dataset doesn't exist

Label: asking questions (For asking for clarification / support on library usage.)

#2501 opened Nov 18, 2024 by jgcb00

[Performance significantly drop when increase the batch_size]

Label: bug (Something isn't working.)

#2498 opened Nov 17, 2024 by yushengsu-thu

OOM Issues in MMLU Evaluation with lm_eval Using vllm as Backend

#2490 opened Nov 14, 2024 by wchen61

Overwrite default tasks

#2487 opened Nov 13, 2024 by jonoillar

Cannot reproduce LLaMA 3 8B on hendrycks_math

Label: validation (For validation of task implementations.)

#2479 opened Nov 11, 2024 by liuxiaozhu01

Why Different Versions Make a Big Difference in HellaSwag zero-shot

Label: validation (For validation of task implementations.)

#2478 opened Nov 11, 2024 by cquxl

MMLU-Pro Generation -> Generative

#2472 opened Nov 8, 2024 by rimashahbazyan

[QUESTION] Does lm-eval support direct evaluation from json?

#2470 opened Nov 8, 2024 by AaronZLT

Issue with Perplexity Score Using max_length > 1024 in lm-evaluation-harness

#2467 opened Nov 7, 2024 by fnusid

task load return error

#2466 opened Nov 7, 2024 by pod2c

Evaluate local tasks failed when using lm-eval --tasks <local-folder>

#2463 opened Nov 7, 2024 by ChenXiaoTemp

auto-detect batchsize finding too large a batchsize to fit in VRAM at the end when used with multi-gpu

Label: bug (Something isn't working.)

#2458 opened Nov 5, 2024 by SmerkyG

how to compute tasks' metrics by their sub-tasks

#2448 opened Oct 31, 2024 by xiaobo-Chen

Why is using vLLM via lm-eval-harness slower than using vLLM directly?

Label: asking questions (For asking for clarification / support on library usage.)

#2445 opened Oct 30, 2024 by WuXnkris

Wrong format of the few-shot examples in mgsm_direct tasks

Labels: good first issue (Good for newcomers); validation (For validation of task implementations.)

#2444 opened Oct 30, 2024 by zxcvuser

Improve preprocessing for paws-x and xnli tasks

Labels: feature request (A feature that isn't implemented yet.); good first issue (Good for newcomers)

#2442 opened Oct 30, 2024 by zxcvuser

vllm with tensor_parallel_mode is not working at all because of multiprocessing problem

#2431 opened Oct 28, 2024 by 95jinchul

Task winogrande does not work in 0-shot setting together with --apply_chat_template

#2430 opened Oct 27, 2024 by ArtemBiliksin

GPU with GGFU LLM

#2429 opened Oct 25, 2024 by Znbne

Llama3.1-8B-Instruct evaluation fails

Label: asking questions (For asking for clarification / support on library usage.)

#2428 opened Oct 25, 2024 by Isaaclgz

test speculative decode accuracy

Label: asking questions (For asking for clarification / support on library usage.)

#2424 opened Oct 24, 2024 by baoqianmagik

For asking for clarification / support on library usage.

#2423 opened Oct 24, 2024 by sorobedio

bbh_zeroshot fails during to a custom filter issue.

Label: bug (Something isn't working.)

#2422 opened Oct 23, 2024 by shamanez
