Anyone running self-hosted Spark + Dagster on K8s? #17713
-
I'm wondering the same myself at the moment. We've been running Dagster on EKS for years with great success. A few years ago, we also ran Spark on EKS orchestrated by Dagster, and basically wrote ops that would execute and monitor Spark applications ourselves.

At the time, the primary Dagster/Spark integration was step launchers, which abstracted external processes like Spark behind Dagster idioms. You could effectively launch a Spark application as a step, and I/O would be passed through IO managers. This was awkward with Spark due to data volume concerns: we never really wanted to load a Spark data frame into main memory on a Dagster pod.

It seems that, as recently as a week ago, Dagster superseded (deprecated?) step launchers with Dagster Pipes, which works very much like the way we initially DIY'd our Spark integration: the Pipes client is responsible for submitting the Spark application to the K8s API for execution. Unfortunately, Dagster does not ship a Spark+Kubernetes Pipes client out of the box, just Databricks, AWS Glue, and AWS EMR (+ Serverless). A little disappointing. Perhaps the generic dagster-k8s Pipes client is sufficient?
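For what it's worth, here's a minimal sketch of what that could look like with the generic `PipesK8sClient` from dagster-k8s. Everything concrete here is an assumption: the image `my-registry/my-spark-app:latest`, the `spark` namespace, and the `local:///app/job.py` entrypoint are hypothetical, and it presumes the image bundles both spark-submit and dagster-pipes so the driver can report back to Dagster:

```python
from dagster import AssetExecutionContext, Definitions, asset
from dagster_k8s import PipesK8sClient


@asset
def my_spark_asset(context: AssetExecutionContext, pipes_k8s_client: PipesK8sClient):
    # Launch a pod whose container runs spark-submit in client mode, so the
    # driver lives in the Pipes-launched pod and can stream logs/metadata
    # back to Dagster via the dagster-pipes protocol.
    return pipes_k8s_client.run(
        context=context,
        namespace="spark",  # assumption: namespace where Spark pods may run
        image="my-registry/my-spark-app:latest",  # hypothetical image
        command=[
            "/opt/spark/bin/spark-submit",
            "--master", "k8s://https://kubernetes.default.svc",
            "--deploy-mode", "client",
            "--conf", "spark.kubernetes.container.image=my-registry/my-spark-app:latest",
            "local:///app/job.py",  # hypothetical application entrypoint
        ],
    ).get_materialize_result()


defs = Definitions(
    assets=[my_spark_asset],
    resources={"pipes_k8s_client": PipesK8sClient()},
)
```

If that pattern holds up, the monitoring and log forwarding we used to hand-roll would come for free from Pipes.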
-
I am curious about the problems you have had, if any, running Spark + Dagster self-hosted on k8s.
So far I have been thinking of using Google's Spark operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator), but I am not sure how it would integrate with job submission, given that the operator is driven through the Kubernetes API; see the sketch below.
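Submission itself seems straightforward enough. A sketch, assuming the operator's `v1beta2` `SparkApplication` CRD is installed in the cluster; the namespace, app name, and image below are hypothetical:

```python
from kubernetes import client, config

# Assumes the spark-on-k8s-operator and its CRDs are installed.
config.load_kube_config()  # or config.load_incluster_config() inside a pod

spark_app = {
    "apiVersion": "sparkoperator.k8s.io/v1beta2",
    "kind": "SparkApplication",
    "metadata": {"name": "pyspark-pi", "namespace": "spark-jobs"},  # hypothetical
    "spec": {
        "type": "Python",
        "mode": "cluster",
        "image": "my-registry/my-spark-app:latest",  # hypothetical image
        "mainApplicationFile": "local:///app/job.py",
        "sparkVersion": "3.5.0",
        "driver": {"cores": 1, "memory": "1g", "serviceAccount": "spark"},
        "executor": {"cores": 1, "instances": 2, "memory": "1g"},
    },
}

# Creating the custom resource is the "submit"; the operator takes it
# from there and spins up the driver and executor pods.
api = client.CustomObjectsApi()
api.create_namespaced_custom_object(
    group="sparkoperator.k8s.io",
    version="v1beta2",
    namespace="spark-jobs",
    plural="sparkapplications",
    body=spark_app,
)
```

A Dagster op could then poll the resource's `status` (e.g. via `get_namespaced_custom_object`) to track the application through to completion. What I'm less sure about is how that monitoring loop compares to what Pipes gives you.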