Is Databricks auto-scaling supposed to work with joblib-spark? #45

Open
pauljohn32 opened this issue Jan 5, 2023 · 1 comment

@pauljohn32

joblib-spark will use the nodes that are up/available when the cluster is started, but it never triggers the Databricks system to wake up another node, even when the node it is using is running on all cores.

I found some comments from 2020 saying that auto-scaling with Spark is generally problematic compared to auto-scaling in Azure itself. So maybe auto-scaling is not supposed to work here.

Should it work?
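
For context, here is a minimal sketch of the setup I mean (the dataset and estimator are just illustrative). Run on an autoscaling Databricks cluster, it keeps the existing cores busy but never wakes another node:

```python
from joblib import parallel_backend
from joblibspark import register_spark
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

register_spark()  # make the "spark" backend available to joblib

X, y = make_classification(n_samples=10_000, n_features=50)

# Even with n_jobs=-1, the work only spreads across cores that are
# already online; the cluster never scales up to add nodes.
with parallel_backend("spark", n_jobs=-1):
    cross_val_score(RandomForestClassifier(), X, y, cv=20)
```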

@kashishsehgal73

Looking at the source code (https://github.com/joblib/joblib-spark/blob/master/joblibspark/backend.py):

If we pass n_jobs = -1, then at line 112 the backend resets n_jobs to the value returned by the Spark context call at line 128, which is the number of currently active cores in the cluster. It does not ask for more workers at all.

So Databricks never receives any signal or demand to autoscale. In short, joblib-spark is not designed to autoscale Databricks clusters at all.
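
For reference, the behavior described above boils down to something like this (paraphrased and simplified from backend.py; the method names here are approximate, not the verbatim source):

```python
# Paraphrased sketch of the n_jobs handling in joblibspark/backend.py
# (simplified; not the exact source).
def effective_n_jobs(self, n_jobs):
    # Ask the running Spark context how many tasks can execute
    # concurrently right now, i.e. the cores on nodes already up.
    max_concurrent = self._get_max_num_concurrent_tasks()
    if n_jobs == -1:
        # n_jobs=-1 is capped at the *current* capacity. No request
        # for additional executors is ever issued, so Databricks sees
        # no pending demand and never triggers a scale-up.
        n_jobs = max_concurrent
    return n_jobs
```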

But we should push for this feature: most Spark workloads are moving to the cloud, and autoscaling is an integral part of cost saving and efficiency.
