Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement bloom_filter_agg #987

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

mbutrovich
Copy link
Contributor

@mbutrovich mbutrovich commented Sep 30, 2024

Which issue does this PR close?

Closes #846.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@mbutrovich mbutrovich marked this pull request as draft September 30, 2024 19:39
fn state(&mut self) -> Result<Vec<ScalarValue>> {
// TODO(Matt): There might be a more efficient way to do this by transmuting since calling
// state() on an Accumulator is considered destructive.
let state_sv = ScalarValue::Binary(Some(self.state_as_bytes()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to avoid the copy, which may be too ugly , would be to store bloom filter data as an Option<>

So instead of

pub struct SparkBloomFilter {
    bits: SparkBitArray,
    num_hash_functions: u32,
}

Something like

pub struct SparkBloomFilter {
    bits: Option<SparkBitArray>
    num_hash_functions: u32,
}

And then you could basically use Option::take to take the value and leave a None in its place

let Some(bits) = self.bits.take() else {
  return Err(invalid state)
};

// do whatever you want now you have the owned `bits`

@codecov-commenter
Copy link

codecov-commenter commented Oct 2, 2024

Codecov Report

Attention: Patch coverage is 76.47059% with 4 lines in your changes missing coverage. Please review.

Project coverage is 34.41%. Comparing base (c3023c5) to head (bf22902).
Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
.../scala/org/apache/comet/serde/QueryPlanSerde.scala 76.47% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #987      +/-   ##
============================================
+ Coverage     34.03%   34.41%   +0.38%     
- Complexity      875      889      +14     
============================================
  Files           112      112              
  Lines         43289    43428     +139     
  Branches       9572     9627      +55     
============================================
+ Hits          14734    14947     +213     
+ Misses        25521    25437      -84     
- Partials       3034     3044      +10     
Flag Coverage Δ
?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for bloom_filter_agg
3 participants