What is the problem the feature request solves?
Writing to Arrow vectors through Comet's `ArrowWriter` class only uses `setSafe` on the underlying vectors. This incurs a capacity and resize check for every write. For `CometSparkToColumnarExec` we know the sizes ahead of time, so this is unnecessary overhead when performing a lot of writes.
Describe the potential solution
For `ArrowWriter` we should:
- Expose a way to set the capacities for column vectors. Note that `setInitialCapacity` on the underlying `BaseFixedWidthVector` doesn't actually allocate anything, so we may need to call `allocateNew` as well.
- Extend `ArrowWriter` to have unsafe methods that call `set` on the underlying vectors, probably with assertions that we can elide in release builds.
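
To make the trade-off concrete, here is a minimal sketch using a toy stand-in for a fixed-width vector. `ToyIntVector` and `ToyWriter` are hypothetical names invented for illustration and are not Arrow or Comet classes; they only mimic the shape of `setSafe`, `set`, and `allocateNew` described above. The checked path pays a capacity check on every write, while the preallocated path allocates once and guards each write with a Java `assert`, which is skipped unless the JVM runs with `-ea`.

```java
import java.util.Arrays;

// Hypothetical toy stand-in for a fixed-width int vector (not Arrow's real class).
final class ToyIntVector {
    private int[] data = new int[0];
    private int capacityChecks = 0;

    // Allocate room for `capacity` values up front.
    void allocateNew(int capacity) {
        data = new int[capacity];
    }

    int capacity() {
        return data.length;
    }

    // Number of capacity checks performed by setSafe so far.
    int capacityChecks() {
        return capacityChecks;
    }

    // Checked write: tests capacity on every call and grows the buffer if needed.
    void setSafe(int index, int value) {
        capacityChecks++;
        if (index >= data.length) {
            data = Arrays.copyOf(data, Math.max(index + 1, data.length * 2));
        }
        data[index] = value;
    }

    // Unchecked write: the caller guarantees capacity. The assert is only
    // evaluated when assertions are enabled (java -ea).
    void set(int index, int value) {
        assert index >= 0 && index < data.length : "index outside preallocated capacity";
        data[index] = value;
    }

    int get(int index) {
        return data[index];
    }
}

// Hypothetical writer showing both strategies side by side.
final class ToyWriter {
    // Size unknown to the vector: every write goes through the checked path.
    static ToyIntVector writeSafe(int[] values) {
        ToyIntVector vector = new ToyIntVector();
        for (int i = 0; i < values.length; i++) {
            vector.setSafe(i, values[i]);
        }
        return vector;
    }

    // Size known ahead of time: allocate once, then use unchecked writes.
    static ToyIntVector writePreallocated(int[] values) {
        ToyIntVector vector = new ToyIntVector();
        vector.allocateNew(values.length);
        for (int i = 0; i < values.length; i++) {
            vector.set(i, values[i]);
        }
        return vector;
    }
}
```

Both methods produce the same contents; the difference is that `writePreallocated` never enters the capacity check. Whether the real saving is measurable in Comet would need benchmarking against the actual Arrow vectors.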
Additional context
No response