Analogous to the `max_tasks_per_child` keyword argument of `concurrent.futures.ProcessPoolExecutor` (added in Python 3.11) and the `maxtasksperchild` keyword argument of `multiprocessing.pool.Pool` (added in Python 3.2), it would be great to be able to control how many tasks a loky subprocess completes before it is flushed and replaced with a new subprocess.
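For reference, the standard-library behaviour mentioned above can be observed with a small sketch like the following. It uses only `multiprocessing` and says nothing about loky itself; `pids_per_task` is a hypothetical helper name chosen for illustration, and the pool is limited to a single worker so that the recycling is easy to see.

```python
import multiprocessing as mp
import os
from collections import Counter


def pids_per_task(n_tasks, max_tasks, start_method=None):
    """Run n_tasks trivial tasks in a one-worker pool and return the PID that ran each.

    Hypothetical helper for illustration only; not part of loky or the stdlib.
    """
    ctx = mp.get_context(start_method)  # None selects the platform default
    with ctx.Pool(processes=1, maxtasksperchild=max_tasks) as pool:
        # Each apply_async call counts as one task toward maxtasksperchild.
        results = [pool.apply_async(os.getpid) for _ in range(n_tasks)]
        return [r.get(timeout=60) for r in results]


if __name__ == "__main__":
    pids = pids_per_task(n_tasks=6, max_tasks=2)
    # Tasks handled per worker PID; no PID should have handled more than two.
    print(Counter(pids))
```

With `maxtasksperchild=2`, the single worker is replaced after every second task, so six tasks should be spread over several short-lived processes. The request here is for a comparable knob on loky's executor.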
Our dask workers are currently consistently failing with `loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.`
This is most likely caused by upstream memory leaks in `lxml`: the same loky pool subprocesses run for 5+ hours and eventually hit our 60 GiB memory limit. Periodically flushing the workers (we use the `spawn` start method) would most likely fix these errors.
Many thanks!