fix checkpoint #66

Open
wants to merge 1 commit into
base: main

@gameofdimension commented Jun 11, 2024

  1. Make `use_reentrant=True` explicit, because it defaults to true when it is not passed.
  2. Fix gradient checkpointing when it is used with dropout turned on. With `preserve_rng_state=False`, dropout breaks: the forward pass recomputed during backward draws a different dropout mask than the original forward pass, so gradients flow into the wrong input cells.

This can be seen in a 300-epoch training run that fails with `preserve_rng_state=False` and `use_checkpoint=True`.
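The mask mismatch can also be reproduced in isolation. The following is a minimal sketch, not code from this PR: `block` and `grad_matches_forward_mask` are hypothetical names, and it assumes a PyTorch version in which `torch.utils.checkpoint.checkpoint` accepts both `use_reentrant` and `preserve_rng_state`. It compares the dropout mask seen in the forward output with the mask implied by the input gradient.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


def block(x):
    # Hypothetical stand-in for a checkpointed layer that contains dropout.
    return F.dropout(x, p=0.5, training=True)


def grad_matches_forward_mask(preserve_rng_state):
    torch.manual_seed(0)
    x = torch.ones(1000, requires_grad=True)
    y = checkpoint(
        block,
        x,
        use_reentrant=True,
        preserve_rng_state=preserve_rng_state,
    )
    y.sum().backward()
    # x is all ones, so y is nonzero exactly where the forward mask kept a value.
    forward_mask = y.detach() != 0
    # The gradient is nonzero exactly where the recomputed mask kept a value.
    backward_mask = x.grad != 0
    return torch.equal(forward_mask, backward_mask)


print(grad_matches_forward_mask(True))
print(grad_matches_forward_mask(False))
```

With `preserve_rng_state=True` the two masks should coincide, since the RNG state is restored before the recomputation. With `preserve_rng_state=False` the recomputation draws fresh random numbers, so the gradient lands on a different set of elements than the ones that survived dropout in the forward pass.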

Some samples from the end of one failed training run:

image
