Daft by default (without Ray) already uses all cores. Today it does require partitioning to do so most effectively, but it will already make efficient use of CPUs for tasks like parallel data loading. We are working on a new execution model that makes this even more efficient without partitioning (it will be released in about a month or so). Daft + Ray also uses all cores, but only if your data is partitioned. The main difference is that this mode also allows for out-of-core and distributed processing. Using Ray introduces some overhead, but the overhead is worth it if the data is large. In summary, because we optimized for distributed computing, Daft today relies on partitioning to efficiently utilize resources on both the local and Ray runners. The new execution engine in the works will make this easier without partitioning.
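A minimal sketch of what the two modes look like in practice (not from the thread; the file path, partition count, and Ray address are placeholders):

```python
import daft

# Ray runner: same DataFrame API, plus out-of-core and distributed execution.
# Set the runner *before* building any DataFrames. Commented out here so the
# script also runs without a Ray cluster; the address is a placeholder.
# daft.context.set_runner_ray(address="ray://head-node:10001")

# Local runner (the default): already multi-threaded, but today it parallelizes
# more effectively when the data is split into multiple partitions.
df = daft.read_parquet("data/*.parquet")  # placeholder path
df = df.repartition(8)                    # illustrative count; tune to cores/data size
print(df.count_rows())
```

On either runner, the repartition step is what lets the work fan out across cores today; the upcoming execution engine mentioned above should make that step unnecessary.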
I am running some tests in Fabric with and without a Ray cluster. Am I correct that default Daft uses a single core (just like pandas) and Daft + Ray uses all available cores by default? Like Spark, do I need to partition the data to utilize the cores effectively? (I tried with and without partitioning and saw a big improvement.)