Anyone running self-hosted Spark + Dagster on K8s? #17713
-
I'm wondering the same myself at the moment. We've been running Dagster on EKS for years with great success. A few years ago, we also ran Spark on EKS orchestrated by Dagster, and basically wrote ops that would execute and monitor Spark applications ourselves.

At the time, the primary Dagster/Spark integration was step launchers, which abstracted external processes like Spark behind Dagster idioms. You could effectively launch a Spark application as a step, and I/O would be passed through IO managers. This was awkward with Spark due to data volume concerns: we never really wanted to load a Spark data frame into main memory on a Dagster pod.

It seems that, as recently as a week ago, Dagster superseded (deprecated?) step launchers with Dagster Pipes, which works very much like the way we initially DIY'd our Spark integration: the Pipes client is responsible for submitting the Spark application to the K8s API for execution. Unfortunately, Dagster does not ship a Spark+Kubernetes Pipes client out of the box, just Databricks, AWS Glue, and AWS EMR (+ Serverless). A little disappointing. Perhaps the generic dagster-k8s Pipes client is sufficient?
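For what it's worth, here's a minimal sketch of what that could look like with the generic `PipesK8sClient` from dagster-k8s. Everything concrete here is an assumption: the image `my-registry/my-spark-app:latest`, the `spark` namespace, and the `local:///app/job.py` entrypoint are hypothetical, and it presumes the image bundles both spark-submit and dagster-pipes so the driver can report back to Dagster:

```python
from dagster import AssetExecutionContext, Definitions, asset
from dagster_k8s import PipesK8sClient


@asset
def my_spark_asset(context: AssetExecutionContext, pipes_k8s_client: PipesK8sClient):
    # Launch a pod whose container runs spark-submit in client mode, so the
    # driver lives in the Pipes-launched pod and can stream logs/metadata
    # back to Dagster via the dagster-pipes protocol.
    return pipes_k8s_client.run(
        context=context,
        namespace="spark",  # assumption: namespace where Spark pods may run
        image="my-registry/my-spark-app:latest",  # hypothetical image
        command=[
            "/opt/spark/bin/spark-submit",
            "--master", "k8s://https://kubernetes.default.svc",
            "--deploy-mode", "client",
            "--conf", "spark.kubernetes.container.image=my-registry/my-spark-app:latest",
            "local:///app/job.py",  # hypothetical application entrypoint
        ],
    ).get_materialize_result()


defs = Definitions(
    assets=[my_spark_asset],
    resources={"pipes_k8s_client": PipesK8sClient()},
)
```

If that pattern holds up, the monitoring and log forwarding we used to hand-roll would come for free from Pipes.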
-
I am curious about the problems you have had, if any, running Spark + Dagster self-hosted on k8s.
So far I have been thinking of using Google's Spark operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator), but I am not sure how it would integrate with job submission, given that the operator is driven through the Kubernetes API; see the sketch below.
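Submission itself seems straightforward enough. A sketch, assuming the operator's `v1beta2` `SparkApplication` CRD is installed in the cluster; the namespace, app name, and image below are hypothetical:

```python
from kubernetes import client, config

# Assumes the spark-on-k8s-operator and its CRDs are installed.
config.load_kube_config()  # or config.load_incluster_config() inside a pod

spark_app = {
    "apiVersion": "sparkoperator.k8s.io/v1beta2",
    "kind": "SparkApplication",
    "metadata": {"name": "pyspark-pi", "namespace": "spark-jobs"},  # hypothetical
    "spec": {
        "type": "Python",
        "mode": "cluster",
        "image": "my-registry/my-spark-app:latest",  # hypothetical image
        "mainApplicationFile": "local:///app/job.py",
        "sparkVersion": "3.5.0",
        "driver": {"cores": 1, "memory": "1g", "serviceAccount": "spark"},
        "executor": {"cores": 1, "instances": 2, "memory": "1g"},
    },
}

# Creating the custom resource is the "submit"; the operator takes it
# from there and spins up the driver and executor pods.
api = client.CustomObjectsApi()
api.create_namespaced_custom_object(
    group="sparkoperator.k8s.io",
    version="v1beta2",
    namespace="spark-jobs",
    plural="sparkapplications",
    body=spark_app,
)
```

A Dagster op could then poll the resource's `status` (e.g. via `get_namespaced_custom_object`) to track the application through to completion. What I'm less sure about is how that monitoring loop compares to what Pipes gives you.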