We can see that the engine stopped at epoch 2 and, when the engine is resumed, epochs are counted from 3. Epoch 3 does not run the full 10 iterations and ends early (after 6 iterations).
Maybe the expected behaviour on resume would be to count epochs from 2 and count 10 iterations before switching to epoch 4?
It's due to the iter_counter local variable (in run_once_on_dataset), which is initialized as state.iteration % epoch_length (3 in this case) and causes the engine to stop 7 batches later, when iter_counter == epoch_length.
Hence, iter_counter should start from zero.
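To make the arithmetic concrete, here is a toy model of the counter logic described above. It is not Ignite's actual code: the function name and the counter_from_zero flag are hypothetical, and the starting iteration of 13 is an assumed value chosen so that state.iteration % epoch_length is 3 with epoch_length = 10.

```python
def batches_until_epoch_end(start_iteration, epoch_length, counter_from_zero):
    """Count the batches processed before iter_counter == epoch_length fires.

    Simplified sketch of the behaviour discussed in this thread,
    not the real Engine implementation.
    """
    # Reported behaviour: the counter starts at state.iteration % epoch_length.
    # Suggested behaviour: the counter starts at zero.
    iter_counter = 0 if counter_from_zero else start_iteration % epoch_length
    processed = 0
    while True:
        processed += 1
        iter_counter += 1
        if iter_counter == epoch_length:
            break
    return processed


# Assumed values: resumed at iteration 13 with 10 iterations per epoch.
print(batches_until_epoch_end(13, 10, counter_from_zero=False))  # 7
print(batches_until_epoch_end(13, 10, counter_from_zero=True))   # 10
```

In this model the counter that starts at the modulo value cuts the resumed epoch short by exactly that many batches, while the zero-based counter always runs a full epoch_length batches.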
I agree with Sadra's comment here: #2645 (comment)
that the final iteration should be 34 == epoch_length * (max_epochs - 1) + (iteration_to_stop % epoch_length)
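As a quick check of that formula, the sketch below evaluates it directly. The helper name is hypothetical. The inputs are assumptions inferred from this thread rather than stated in it: epoch_length = 10, plus max_epochs = 4 and iteration_to_stop = 14, which are consistent with stopping in epoch 2 and with the resumed epoch ending after 6 iterations.

```python
def expected_final_iteration(epoch_length, max_epochs, iteration_to_stop):
    """Evaluate the formula quoted above for the final iteration of a resumed run."""
    return epoch_length * (max_epochs - 1) + (iteration_to_stop % epoch_length)


# Assumed values (see the note above): 10 * (4 - 1) + (14 % 10)
print(expected_final_iteration(epoch_length=10, max_epochs=4, iteration_to_stop=14))  # 34
```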
The Engine's current behaviour when we resume a run from a terminated state is the following:
Output:
We can see that the engine stopped at epoch 2 and, when the engine is resumed, epochs are counted from 3. Epoch 3 does not run the full 10 iterations and ends early (after 6 iterations).
Maybe the expected behaviour on resume would be to count epochs from 2 and count 10 iterations before switching to epoch 4?
@sadra-barikbin what do you think ?
EDIT:
a suggestion from #2645 (comment)