In the `_next_batch` method of `TrainPipelineSparseDist`, we check whether the new `dataloader_iter` is the same object as the original `dataloader_iter`, and we fetch the next batch only if they differ. However, when the dataloader is created with `persistent_workers=True`, `iter(dataloader)` returns the same iterator instance for every epoch. As a result, the identity check never passes and no data can be fetched once the epoch count exceeds 1.

https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/train_pipeline/train_pipelines.py#L578
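A minimal sketch (not using torch) of why the identity check breaks: `ToyLoader` is a hypothetical stand-in that mimics how `DataLoader` caches and reuses its iterator when `persistent_workers=True`, so `iter()` yields the same object across epochs and an `is` comparison cannot detect an epoch boundary.

```python
class ToyLoader:
    """Hypothetical stand-in mimicking DataLoader's iterator caching."""

    def __init__(self, data, persistent_workers=False):
        self.data = data
        self.persistent_workers = persistent_workers
        self._iterator = None

    def __iter__(self):
        if self.persistent_workers:
            # Reuse the cached iterator across epochs, as DataLoader
            # does when persistent_workers=True.
            if self._iterator is None:
                self._iterator = iter(self.data)
            return self._iterator
        # Non-persistent: a fresh iterator per epoch.
        return iter(self.data)


persistent = ToyLoader([1, 2], persistent_workers=True)
epoch1_iter = iter(persistent)
epoch2_iter = iter(persistent)
print(epoch1_iter is epoch2_iter)  # True -> identity check sees "same iter"

regular = ToyLoader([1, 2])
print(iter(regular) is iter(regular))  # False -> new iterator each epoch
```

With the persistent loader, a check like `new_iter is not self._dataloader_iter` stays `False` forever, which matches the behavior described above.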