Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements for resuming from terminated engine #2669

Open
vfdev-5 opened this issue Aug 23, 2022 · 1 comment
Open

Improvements for resuming from terminated engine #2669

vfdev-5 opened this issue Aug 23, 2022 · 1 comment

Comments

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Aug 23, 2022

Current Engine's behaviour when we resume the run from terminated state is the following:

from ignite.engine import Engine, Events
from ignite.utils import setup_logger, logging


def func(engine, batch):
    print(engine.state.epoch, engine.state.iteration, " | ", batch)


max_epochs = 4
data = range(10)
engine = Engine(func)
engine.logger = setup_logger("engine", level=logging.DEBUG)


@engine.on(Events.ITERATION_COMPLETED(once=14))
def terminate():
    print("-> terminate")
    engine.terminate()

engine.run(data, max_epochs=max_epochs)
engine.run(data, max_epochs=max_epochs)
print(engine.state.epoch, engine.state.iteration)
Output:
2022-08-23 15:42:07,681 engine INFO: Engine run starting with max_epochs=4.
2022-08-23 15:42:07,682 engine DEBUG: 0 | 0, Firing handlers for event Events.STARTED
2022-08-23 15:42:07,683 engine DEBUG: 1 | 0, Firing handlers for event Events.EPOCH_STARTED
2022-08-23 15:42:07,683 engine DEBUG: 1 | 0, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,685 engine DEBUG: 1 | 0, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,687 engine DEBUG: 1 | 1, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,688 engine DEBUG: 1 | 1, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,688 engine DEBUG: 1 | 1, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,689 engine DEBUG: 1 | 1, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,690 engine DEBUG: 1 | 2, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,691 engine DEBUG: 1 | 2, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,692 engine DEBUG: 1 | 2, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,692 engine DEBUG: 1 | 2, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,693 engine DEBUG: 1 | 3, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,693 engine DEBUG: 1 | 3, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,695 engine DEBUG: 1 | 3, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,696 engine DEBUG: 1 | 3, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,697 engine DEBUG: 1 | 4, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,698 engine DEBUG: 1 | 4, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,698 engine DEBUG: 1 | 4, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,700 engine DEBUG: 1 | 4, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,701 engine DEBUG: 1 | 5, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,702 engine DEBUG: 1 | 5, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,703 engine DEBUG: 1 | 5, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,704 engine DEBUG: 1 | 5, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,706 engine DEBUG: 1 | 6, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,707 engine DEBUG: 1 | 6, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,708 engine DEBUG: 1 | 6, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,708 engine DEBUG: 1 | 6, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,709 engine DEBUG: 1 | 7, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,711 engine DEBUG: 1 | 7, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,712 engine DEBUG: 1 | 7, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,714 engine DEBUG: 1 | 7, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,714 engine DEBUG: 1 | 8, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,716 engine DEBUG: 1 | 8, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,717 engine DEBUG: 1 | 8, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,717 engine DEBUG: 1 | 8, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,719 engine DEBUG: 1 | 9, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,719 engine DEBUG: 1 | 9, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,720 engine DEBUG: 1 | 9, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,720 engine DEBUG: 1 | 9, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,721 engine DEBUG: 1 | 10, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,722 engine DEBUG: 1 | 10, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,722 engine DEBUG: 1 | 10, Firing handlers for event Events.EPOCH_COMPLETED
2022-08-23 15:42:07,723 engine INFO: Epoch[1] Complete. Time taken: 00:00:00.039
2022-08-23 15:42:07,723 engine DEBUG: 2 | 10, Firing handlers for event Events.EPOCH_STARTED
2022-08-23 15:42:07,724 engine DEBUG: 2 | 10, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,724 engine DEBUG: 2 | 10, Firing handlers for event Events.DATALOADER_STOP_ITERATION
2022-08-23 15:42:07,725 engine DEBUG: 2 | 10, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,725 engine DEBUG: 2 | 11, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,726 engine DEBUG: 2 | 11, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,727 engine DEBUG: 2 | 11, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,727 engine DEBUG: 2 | 11, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,728 engine DEBUG: 2 | 12, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,728 engine DEBUG: 2 | 12, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,729 engine DEBUG: 2 | 12, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,729 engine DEBUG: 2 | 12, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,730 engine DEBUG: 2 | 13, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,733 engine DEBUG: 2 | 13, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,734 engine DEBUG: 2 | 13, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,734 engine DEBUG: 2 | 13, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,735 engine DEBUG: 2 | 14, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,736 engine DEBUG: 2 | 14, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,737 engine INFO: Terminate signaled. Engine will stop after current iteration is finished.
2022-08-23 15:42:07,737 engine DEBUG: 2 | 14, Firing handlers for event Events.TERMINATE_SINGLE_EPOCH
2022-08-23 15:42:07,738 engine DEBUG: 2 | 14, Firing handlers for event Events.TERMINATE
2022-08-23 15:42:07,739 engine INFO: Epoch[2] Complete. Time taken: 00:00:00.015
2022-08-23 15:42:07,739 engine DEBUG: 2 | 14, Firing handlers for event Events.COMPLETED
2022-08-23 15:42:07,740 engine INFO: Engine run complete. Time taken: 00:00:00.058
2022-08-23 15:42:07,741 engine INFO: Engine run resuming from iteration 14, epoch 2 until 4 epochs
2022-08-23 15:42:07,741 engine DEBUG: 2 | 14, Firing handlers for event Events.STARTED
2022-08-23 15:42:07,742 engine DEBUG: 3 | 14, Firing handlers for event Events.EPOCH_STARTED
2022-08-23 15:42:07,743 engine DEBUG: 3 | 14, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,743 engine DEBUG: 3 | 14, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,744 engine DEBUG: 3 | 15, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,745 engine DEBUG: 3 | 15, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,746 engine DEBUG: 3 | 15, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,746 engine DEBUG: 3 | 15, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,747 engine DEBUG: 3 | 16, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,748 engine DEBUG: 3 | 16, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,748 engine DEBUG: 3 | 16, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,749 engine DEBUG: 3 | 16, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,749 engine DEBUG: 3 | 17, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,750 engine DEBUG: 3 | 17, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,750 engine DEBUG: 3 | 17, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,751 engine DEBUG: 3 | 17, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,751 engine DEBUG: 3 | 18, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,754 engine DEBUG: 3 | 18, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,755 engine DEBUG: 3 | 18, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,755 engine DEBUG: 3 | 18, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,756 engine DEBUG: 3 | 19, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,757 engine DEBUG: 3 | 19, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,758 engine DEBUG: 3 | 19, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,758 engine DEBUG: 3 | 19, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,759 engine DEBUG: 3 | 20, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,759 engine DEBUG: 3 | 20, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,761 engine DEBUG: 3 | 20, Firing handlers for event Events.EPOCH_COMPLETED
2022-08-23 15:42:07,761 engine INFO: Epoch[3] Complete. Time taken: 00:00:00.018
2022-08-23 15:42:07,762 engine DEBUG: 4 | 20, Firing handlers for event Events.EPOCH_STARTED
2022-08-23 15:42:07,762 engine DEBUG: 4 | 20, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,763 engine DEBUG: 4 | 20, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,763 engine DEBUG: 4 | 21, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,764 engine DEBUG: 4 | 21, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,764 engine DEBUG: 4 | 21, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,765 engine DEBUG: 4 | 21, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,766 engine DEBUG: 4 | 22, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,766 engine DEBUG: 4 | 22, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,767 engine DEBUG: 4 | 22, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,768 engine DEBUG: 4 | 22, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,769 engine DEBUG: 4 | 23, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,769 engine DEBUG: 4 | 23, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,770 engine DEBUG: 4 | 23, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,771 engine DEBUG: 4 | 23, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,771 engine DEBUG: 4 | 24, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,772 engine DEBUG: 4 | 24, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,773 engine DEBUG: 4 | 24, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,774 engine DEBUG: 4 | 24, Firing handlers for event Events.DATALOADER_STOP_ITERATION
2022-08-23 15:42:07,774 engine DEBUG: 4 | 24, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,775 engine DEBUG: 4 | 25, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,776 engine DEBUG: 4 | 25, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,776 engine DEBUG: 4 | 25, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,777 engine DEBUG: 4 | 25, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,778 engine DEBUG: 4 | 26, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,779 engine DEBUG: 4 | 26, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,780 engine DEBUG: 4 | 26, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,781 engine DEBUG: 4 | 26, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,781 engine DEBUG: 4 | 27, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,782 engine DEBUG: 4 | 27, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,783 engine DEBUG: 4 | 27, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,783 engine DEBUG: 4 | 27, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,784 engine DEBUG: 4 | 28, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,784 engine DEBUG: 4 | 28, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,785 engine DEBUG: 4 | 28, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,785 engine DEBUG: 4 | 28, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,786 engine DEBUG: 4 | 29, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,787 engine DEBUG: 4 | 29, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,787 engine DEBUG: 4 | 29, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,788 engine DEBUG: 4 | 29, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,788 engine DEBUG: 4 | 30, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,789 engine DEBUG: 4 | 30, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,790 engine DEBUG: 4 | 30, Firing handlers for event Events.EPOCH_COMPLETED
2022-08-23 15:42:07,791 engine INFO: Epoch[4] Complete. Time taken: 00:00:00.029
2022-08-23 15:42:07,791 engine DEBUG: 4 | 30, Firing handlers for event Events.COMPLETED
2022-08-23 15:42:07,792 engine INFO: Engine run complete. Time taken: 00:00:00.051

1 1  |  0
1 2  |  1
1 3  |  2
1 4  |  3
1 5  |  4
1 6  |  5
1 7  |  6
1 8  |  7
1 9  |  8
1 10  |  9
2 11  |  0
2 12  |  1
2 13  |  2
2 14  |  3
-> terminate
3 15  |  0
3 16  |  1
3 17  |  2
3 18  |  3
3 19  |  4
3 20  |  5
4 21  |  6
4 22  |  7
4 23  |  8
4 24  |  9
4 25  |  0
4 26  |  1
4 27  |  2
4 28  |  3
4 29  |  4
4 30  |  5

(4, 30)

We can see that the engine stopped at epoch 2 and when engine is resumed, epochs are counted from 3. The epoch 3 does not count 10 iterations and ends earlier (after 6 iterations).

Maybe, an expected behaviour while resuming could be to count epochs from 2 and count 10 iterations before switching to the epoch 4 ?

@sadra-barikbin what do you think ?

EDIT:
a suggestion from #2645 (comment)

It's due to iter_counter local variable (in run_once_on_dataset) which has been initialized as state.iteration % epoch_length (3) and causes engine to stop 7 batches later when iter_counter == epoch_length.

Hence, iter_counter should start with zero.

@vfdev-5
Copy link
Collaborator Author

vfdev-5 commented Aug 29, 2022

I agree with Sadra's comment here: #2645 (comment)
that final iteration should be 34 == epoch_length * (max_epochs - 1) + (iteration_to_stop % epoch_length)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant