Fixing strided perplexity calculation for fixed-length models #34394

Open
wants to merge 39 commits into main
Conversation

forrestdavis

What does this PR do?

Fixes an issue with the perplexity of fixed-length models evaluated with a strided window. The current calculation averages the per-window losses as if every window scored the same number of tokens, which skews the result when windows have different target lengths. This PR computes a token-weighted average instead. The update is just to perplexity.md in the docs.

Fixes #34138
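
For reference, below is a minimal sketch of the token-weighted average the doc update describes. The checkpoint and input text are illustrative; the actual change lives in the example in `perplexity.md`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2").to(device)
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

# Any long text works; this input is purely illustrative.
encodings = tokenizer(" ".join(["Hello world."] * 2000), return_tensors="pt")

max_length = model.config.n_positions  # 1024 for GPT-2
stride = 512
seq_len = encodings.input_ids.size(1)

nll_sum = 0.0
n_tokens = 0
prev_end_loc = 0
for begin_loc in range(0, seq_len, stride):
    end_loc = min(begin_loc + max_length, seq_len)
    trg_len = end_loc - prev_end_loc  # the final window may score fewer tokens
    input_ids = encodings.input_ids[:, begin_loc:end_loc].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # only score tokens new to this window

    with torch.no_grad():
        outputs = model(input_ids, labels=target_ids)
        neg_log_likelihood = outputs.loss  # mean NLL over this window's scored tokens

    # Weight each window by the number of tokens it scored, rather than
    # averaging the per-window means, which over-weights a short final window.
    num_valid_tokens = (target_ids != -100).sum().item()
    num_loss_tokens = num_valid_tokens - target_ids.size(0)  # labels shift right by one
    nll_sum += neg_log_likelihood * num_loss_tokens
    n_tokens += num_loss_tokens

    prev_end_loc = end_loc
    if end_loc == seq_len:
        break

ppl = torch.exp(nll_sum / n_tokens)
```

Taking `nll_sum / n_tokens` instead of the mean of stacked per-window losses keeps a short final window from being over-weighted.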

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@ArthurZucker

yonigozlan and others added 26 commits October 24, 2024 20:00
* add support for non-nested images and add tests

* add tests error scenario

* fix style

* added single and no image to error tests
* fix onnx non-exportable inplace op

* mistral, qwen2, qwen2_vl, starcoder2

* fixup copies
* fix right pad llavas

* device mismatch
* no filter

* no filter

* no filter

---------

Co-authored-by: ydshieh <[email protected]>
* better example

* Update src/transformers/generation/configuration_utils.py

* Update src/transformers/generation/logits_process.py

* nits
* Fix bnb training test: compatibility with OPTSdpaAttention
* fix

* fix and test use_cache test

* style

* remove atol
* update

* update

* update

---------

Co-authored-by: ydshieh <[email protected]>
Fix batch size handling in prediction_loop for DataLoaderShard (huggingface#34343)

* Fix batch size handling in prediction_loop for DataLoaderShard

Updated the prediction_loop method in the Trainer class to correctly handle the batch size when using DataLoaderShard. This ensures that the batch size is retrieved from total_batch_size in distributed training scenarios, preventing a NoneType-related TypeError during evaluation.

* Update src/transformers/trainer.py

Co-authored-by: Zach Mueller <[email protected]>

* Applied the fix to remove unused imports

---------

Co-authored-by: Zach Mueller <[email protected]>
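
For illustration, a sketch of the batch-size lookup this commit describes, assuming accelerate's `DataLoaderShard` (which exposes `total_batch_size`) may wrap the evaluation dataloader; the helper name is hypothetical:

```python
def infer_eval_batch_size(dataloader) -> int:
    # accelerate's DataLoaderShard reports the effective batch size across
    # processes via `total_batch_size`; a plain DataLoader only has
    # `batch_size`, which can be None when a batch_sampler is used.
    batch_size = getattr(dataloader, "total_batch_size", None)
    if batch_size is None:
        batch_size = getattr(dataloader, "batch_size", None)
    if batch_size is None:
        raise TypeError("Could not determine batch size from the evaluation dataloader.")
    return batch_size
```
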
* exclude fsdp from delay_optimizer_creation

* add test case for trainer: FSDP mode and fp8 as mixed precision

* rearrange imports

* ruff formatted

* adapt _init_fsdp to fp8

* use _init_fsdp only when resume_from_checkpoint

* In case of FSDP, self.layer will be a CheckpointWrapper, which has no len() method

* delete _init_fsdp

* solve conflict

* fix conflict

* make fixup
* Add _determine_best_metric and new saving logic.

1. Logic to determine the best metric was separated out from
`_save_checkpoint`.
2. In `_maybe_log_save_evaluate`, whether or not a new best metric was
achieved is determined after each evaluation, and if the save strategy
is "best" then the TrainerControl is updated accordingly.

* Added SaveStrategy.

Same as IntervalStrategy, but with a new attribute called BEST.

* IntervalStrategy -> SaveStrategy

* IntervalStrategy -> SaveStrategy for save_strat.

* Interval -> Save in docstring.

* Updated docstring for save_strategy.

* Added SaveStrategy and made according changes.

`save_strategy` previously followed `IntervalStrategy` but now follows
`SaveStrategy`.

Changes were made accordingly to the code and the docstring.

* Changes from `make fixup`.

* Removed redundant metrics argument.

* Added new test_save_best_checkpoint test.

1. Checks for both cases where `metric_for_best_model` is explicitly
provided and when it's not provided.
2. The first case should have two checkpoints saved, whereas the second
should have three saved.

* Changed should_training_end saving logic.

The Trainer saves a checkpoint at the end of training by default as
long as `save_strategy != SaveStrategy.NO`. This condition was modified
to also account for `SaveStrategy.BEST`, since it would be
counterintuitive to request only the best checkpoint and still have the
last one saved as well.

* `args.metric_for_best_model` default to loss.

* Undo metric_for_best_model update.

* Remove checking metric_for_best_model.

* Added test cases for loss and no metric.

* Added error for metric and changed default best_metric.

* Removed unused import.

* `new_best_metric` -> `is_new_best_metric`

Co-authored-by: Arthur <[email protected]>

* Applied `is_new_best_metric` to all.

Changes were made for consistency and also to fix a potential bug.

---------

Co-authored-by: Arthur <[email protected]>
Co-authored-by: Zach Mueller <[email protected]>
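
A sketch of the best-metric check outlined in these commits; the names mirror the commit messages (`determine_best_metric`, `is_new_best_metric`), but the body is illustrative rather than the merged code, and `state` stands in for a `TrainerState`-like object with a `best_metric` attribute:

```python
import numpy as np

def determine_best_metric(state, metrics: dict, metric_for_best_model: str,
                          greater_is_better: bool) -> bool:
    """Return True when this evaluation produced a new best metric."""
    key = metric_for_best_model if metric_for_best_model.startswith("eval_") \
        else f"eval_{metric_for_best_model}"
    metric_value = metrics.get(key)
    if metric_value is None:
        raise KeyError(f"save_strategy='best' requires `{key}` in the eval metrics.")
    # Direction of "better" depends on the metric (accuracy vs. loss).
    op = np.greater if greater_is_better else np.less
    is_new_best_metric = state.best_metric is None or op(metric_value, state.best_metric)
    if is_new_best_metric:
        state.best_metric = metric_value
    return is_new_best_metric
```

With `save_strategy="best"`, the evaluation hook would then set `control.should_save = is_new_best_metric` so only improving checkpoints are written.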
[docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details (huggingface#34322)

* [docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details

* [docs] correct input documentation for MISTRAL model to reference `input_ids` instead of `decoder_input_ids`

* [docs] clarify cache_position description in MISTRAL model documentation
…33980)

* docs: ko: model_doc/barthez.md

* feat: nmt draft

---------

Co-authored-by: Steven Liu <[email protected]>
…Arabic (huggingface#33034)

* Add docs/source/ar/fast_tokenizers.md to Add_docs_source_ar_fast_tokenizers.md

* Update _toctree.yml

* Update _toctree.yml

* Update docs/source/ar/_toctree.yml

Co-authored-by: Abdullah Mohammed <[email protected]>

* Update docs/source/ar/fast_tokenizers.md

Co-authored-by: Abdullah Mohammed <[email protected]>

* Update docs/source/ar/fast_tokenizers.md

Co-authored-by: Abdullah Mohammed <[email protected]>

* Update docs/source/ar/fast_tokenizers.md

Co-authored-by: Abdullah Mohammed <[email protected]>

* Update docs/source/ar/fast_tokenizers.md

Co-authored-by: Abdullah Mohammed <[email protected]>

* Update docs/source/ar/fast_tokenizers.md

Co-authored-by: Abdullah Mohammed <[email protected]>

* Update docs/source/ar/fast_tokenizers.md

Co-authored-by: Abdullah Mohammed <[email protected]>

* Update docs/source/ar/fast_tokenizers.md

Co-authored-by: Abdullah Mohammed <[email protected]>

* Update docs/source/ar/fast_tokenizers.md

Co-authored-by: Abdullah Mohammed <[email protected]>

* Update docs/source/ar/fast_tokenizers.md

Co-authored-by: Abdullah Mohammed <[email protected]>

* Update docs/source/ar/fast_tokenizers.md

Co-authored-by: Abdullah Mohammed <[email protected]>

---------

Co-authored-by: Abdullah Mohammed <[email protected]>
* enable average tokens across devices

* reduce earlier in case model needs it

* simplify if statement

* reformat code to make ruff happy

* add doc for argument: average_tokens_across_devices

* cannot find world size when pytorch is unavailable

* format code

---------

Co-authored-by: Zach Mueller <[email protected]>
Co-authored-by: Arthur <[email protected]>
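
The heart of the change above can be sketched as a single reduction over per-device token counts, with the guard motivated by the "cannot find world size" commit; the function name is hypothetical:

```python
import torch
import torch.distributed as dist

def global_num_loss_tokens(labels: torch.Tensor) -> torch.Tensor:
    """Count the tokens that contribute to the loss, summed across devices."""
    num_tokens = (labels != -100).sum()
    if dist.is_available() and dist.is_initialized():
        # Sum the counts from every rank so each device normalizes its loss
        # by the global token count rather than its local one.
        dist.all_reduce(num_tokens, op=dist.ReduceOp.SUM)
    return num_tokens
```

Dividing the summed loss by this global count keeps the normalization consistent when tokens are unevenly distributed across devices.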
* add depth postprocessing for GLPN

* remove previous temp fix for glpn tests

* Style changes for GLPN's `post_process_depth_estimation`

Co-authored-by: Arthur <[email protected]>

* additional style fix

---------

Co-authored-by: Arthur <[email protected]>
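
A usage sketch for the new GLPN post-processing, assuming it follows the depth-estimation convention of returning one dict per image; the checkpoint and file name are illustrative:

```python
import torch
from PIL import Image
from transformers import GLPNForDepthEstimation, GLPNImageProcessor

processor = GLPNImageProcessor.from_pretrained("vinvino02/glpn-kitti")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-kitti")

image = Image.open("street.jpg")  # illustrative local file
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Interpolate the raw depth map back to the original (height, width).
results = processor.post_process_depth_estimation(outputs, target_sizes=[image.size[::-1]])
predicted_depth = results[0]["predicted_depth"]
```
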
* fix llavas

* code style

* green ci
* fix

* fix mistral
techkang and others added 4 commits October 29, 2024 09:39
…gface#34482)

* use a tiny model to test the generation config, which avoids a timeout

* remove trailing whitespace
…ng (huggingface#33200)

* feat: Added int conversion and unwrapping

* test: added tests for post_process_keypoint_detection of SuperPointImageProcessor

* docs: changed docs to include post_process_keypoint_detection method and switched from opencv to matplotlib

* test: changed test to not depend on SuperPointModel forward

* test: added missing require_torch decorator

* docs: changed pyplot parameters for the keypoints to be more visible in the example

* tests: changed import torch location to make test_flax and test_tf

* Revert "tests: changed import torch location to make test_flax and test_tf"

This reverts commit 39b32a2.

* tests: fixed import

* chore: applied suggestions from code review

Co-authored-by: NielsRogge <[email protected]>

* tests: fixed import

* tests: fixed import (bis)

* tests: fixed import (ter)

* feat: added choice of type for target_size and changed tests accordingly

* docs: updated code snippet to reflect the addition of target size type choice in post process method

* tests: fixed imports (...)

* tests: fixed imports (...)

* style: formatting file

* docs: fixed typo from image[0] to image.size[0]

* docs: added output image and fixed some tests

* Update docs/source/en/model_doc/superpoint.md

Co-authored-by: Pavel Iakubovskii <[email protected]>

* fix: included SuperPointKeypointDescriptionOutput in TYPE_CHECKING if statement and changed tests results to reflect changes to SuperPoint from absolute keypoints coordinates to relative

* docs: changed SuperPoint's docs to print output instead of just accessing

* style: applied make style

* docs: added missing output type and precision in docstring of post_process_keypoint_detection

* perf: deleted loop to perform keypoint conversion in one statement

* fix: moved keypoint conversion at the end of model forward

* docs: changed SuperPointInterestPointDecoder to SuperPointKeypointDecoder class name and added relative (x, y) coordinates information to its method

* fix: changed type hint

* refactor: removed unnecessary brackets

* revert: SuperPointKeypointDecoder to SuperPointInterestPointDecoder

* Update docs/source/en/model_doc/superpoint.md

Co-authored-by: Pavel Iakubovskii <[email protected]>

---------

Co-authored-by: Steven Bucaille <[email protected]>
Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Pavel Iakubovskii <[email protected]>
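
A usage sketch of `post_process_keypoint_detection` as described in these commits, assuming it takes `(height, width)` pairs for the original images; the checkpoint and file name are illustrative:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, SuperPointForKeypointDetection

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")

image = Image.open("photo.jpg")  # illustrative local file
inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The model emits relative (x, y) keypoints; post-processing scales them back
# to pixel coordinates. image.size is (width, height), hence the reversal.
results = processor.post_process_keypoint_detection(outputs, [image.size[::-1]])
keypoints = results[0]["keypoints"]
scores = results[0]["scores"]
```
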
* check

* check

* check

* check

* add docstring

---------

Co-authored-by: ydshieh <[email protected]>
Collaborator

@ArthurZucker left a comment


Thanks for updating

@ArthurZucker
Collaborator

Could you resolve conflicts with main? 🤗

hlky and others added 6 commits October 29, 2024 11:40
* Separator in regex

* Standardize separator for relative path in auto generated message

* open() encoding

* Replace `\` in the output of `os.path.abspath`

---------

Co-authored-by: Arthur <[email protected]>
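
A sketch of the separator normalization these commits describe; the helper name is hypothetical:

```python
import os

def relative_posix_path(path: str, repo_root: str) -> str:
    """Build the repo-relative path with a fixed separator so the
    auto-generated message (and any regex matching it) is identical
    on Windows and POSIX."""
    return os.path.relpath(os.path.abspath(path), repo_root).replace(os.sep, "/")
```
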
* fix regression

* add test for torchao

* expected output

* better fix
@forrestdavis
Author

Thanks! I see that there were changes to address this. If those are preferred, that's fine with me. I think the proposal here is simpler, but that's subjective.

Development

Successfully merging this pull request may close these issues.

Incorrect average calculation in Perplexity of fixed-length models