ESQL: TOP support for strings #113183

nik9000 · 2024-09-19T13:37:17Z

Adds support to the `TOP` aggregation for `keyword` and `text` field types.

Closes #109849

github-actions · 2024-09-19T13:37:31Z

Documentation preview:

✨ Changed pages

elasticsearchmachine · 2024-09-19T13:37:42Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-09-19T13:37:43Z

Hi @nik9000, I've created a changelog YAML for you.

nik9000 · 2024-09-19T13:39:50Z

...lugin/esql/compute/src/main/java/org/elasticsearch/compute/data/sort/BucketedSortCommon.java

+/**
+ * Components common to BucketedSort implementations.
+ */
+class BucketedSortCommon implements Releasable {


I yanked this out because it looked like it'd be safe to share at least a little code. I didn't plug this into the X-BucketedSort classes yet. But I think it's just about the same thing.

nik9000 · 2024-09-19T13:42:49Z

...ompute/src/test/java/org/elasticsearch/compute/aggregation/TopIpAggregatorFunctionTests.java


+public class TopIpAggregatorFunctionTests extends AbstractTopBytesRefAggregatorFunctionTests {


I yanked the bytes behavior to a common class. It's tiny, but feels like it saves a bit of copy and paste and the compiler will tell you the variant bits.

nik9000 · 2024-09-19T14:23:01Z

buildkite run buildkite/docs-build-pr

ivancea

Looks good!

ivancea · 2024-09-19T14:10:35Z

...lugin/esql/compute/src/main/java/org/elasticsearch/compute/data/sort/BucketedSortCommon.java

+    long endIndex(long rootIndex) {
+        return rootIndex + bucketSize;
+    }
+
+    long requiredSize(long rootIndex) {
+        return rootIndex + bucketSize;
+    }


Should we merge those?

Oh, huh, that makes sense. Will do.

ivancea · 2024-09-19T14:12:20Z

...ck/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/sort/IpBucketedSort.java

-        this.order = order;
-        this.bucketSize = bucketSize;
-        heapMode = new BitArray(0, bigArrays);
+        this.common = new BucketedSortCommon(bigArrays, order, bucketSize);


No inheritance? Shouldn't final methods be safe to use?

I sure could have inherited it. I started that way because it felt easier, but with the ctor with the sub-types and the closing and... for that, at least, it felt easier to compose.
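To make the compose-versus-inherit trade-off concrete, here is a minimal sketch of the composed shape. The `Common` and `IpSort` classes are hypothetical stand-ins, not the real `BucketedSortCommon` or `IpBucketedSort`, and they leave out `BigArrays` and the heap-mode bit array entirely.

```java
final class CompositionSketch {
    /** Hypothetical stand-in for the shared state; not the real BucketedSortCommon. */
    static final class Common implements AutoCloseable {
        final boolean ascending;
        final int bucketSize;
        private boolean closed;

        Common(boolean ascending, int bucketSize) {
            this.ascending = ascending;
            this.bucketSize = bucketSize;
        }

        /** First index past the bucket that starts at rootIndex. */
        long endIndex(long rootIndex) {
            return rootIndex + bucketSize;
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical per-type sort that owns a Common instead of extending it. */
    static final class IpSort implements AutoCloseable {
        final Common common;

        IpSort(boolean ascending, int bucketSize) {
            // One line of wiring replaces the copied field assignments.
            this.common = new Common(ascending, bucketSize);
        }

        long endIndex(long rootIndex) {
            return common.endIndex(rootIndex);
        }

        @Override
        public void close() {
            // The owner decides when the shared piece is released.
            common.close();
        }
    }
}
```

With this shape the per-type class keeps its own constructor and close order, which seems to be the part that felt awkward with a superclass.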

ivancea · 2024-09-19T14:22:23Z

.../esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/aggregate/TopTests.java

+            if (DataType.isString(valueType) == false) {
+                continue;
+            }
+            suppliers.add(new TestCaseSupplier(List.of(valueType), () -> {


There's a MultiRowTestCaseSupplier.stringCases(), maybe use it with the other cases? It has a param for the expected DataType

ivancea · 2024-09-19T14:24:35Z

...gin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/X-TopAggregator.java.st

@@ -100,7 +103,13 @@ $endif$
        private final $Name$BucketedSort sort;

        private GroupingState(BigArrays bigArrays, int limit, boolean ascending) {
+$if(BytesRef)$
+            // TODO pass the breaker in from the DriverContext


⚠️✋👮 Is this TODO for this PR or for later? If for later, I foresee it will be there for a long time :hehe:

Yeah. I feel like I'll either have to do it in the next follow up or, well, it'll wait.

It'll be more consistent if we do it, but we do only ever use the request breaker so it is safe enough as it is.

ivancea · 2024-09-19T14:51:33Z

...gin/esql/compute/src/main/java/org/elasticsearch/compute/data/sort/BytesRefBucketedSort.java

+                    values.set(start + i, null);
+                }
+
+                // TODO: Make use of heap structures to faster iterate in order instead of copying and sorting


It's unrelated, but I'm thinking now: Isn't this still n log n? Would this really be better than an in-place sort?

Saying this because we have this comment everywhere, and I'm not sure if it really can be done. Maybe I'm missing some trick

Iterating the heap is O(n), right? We aren't removing and re-heaping. We're just iterating in order.

Also it'd save a copy.

Iterating, yes. But sorting, it's still n log n for a heap. Unless our heapify keeps it "sortable". But I'd say that would be slower.
To sort it, the heap tells us the min value. But then, the next candidates are the 2 children. Then, it would be 3 potential candidates (1 child + 2 grand-children), and so on. Worst case, it's like re-heapifying on every iteration.

Sorry, yeah, it's still n log n. I presume it's better because it can rely on the heap property being there already. But I agree, it's probably not worth spending a ton of time on.
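The candidate-frontier idea described above can be sketched as follows. This is an illustration on a plain `int[]` min-heap, not code from `BytesRefBucketedSort`, and `HeapOrderSketch` is a hypothetical name. It reads the heap in place, but it needs a second priority queue of candidate indices, and each poll or add on that queue costs O(log n), so the total is still O(n log n).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class HeapOrderSketch {
    /** Returns the values of an array-backed binary min-heap in ascending order without modifying it. */
    static List<Integer> inOrder(int[] heap) {
        List<Integer> out = new ArrayList<>(heap.length);
        if (heap.length == 0) {
            return out;
        }
        // Frontier of heap indices whose parent has already been emitted, ordered by value.
        Comparator<Integer> byValue = (a, b) -> Integer.compare(heap[a], heap[b]);
        PriorityQueue<Integer> frontier = new PriorityQueue<>(byValue);
        frontier.add(0);
        while (!frontier.isEmpty()) {
            int i = frontier.poll();
            out.add(heap[i]);
            // Emitting a node makes its two children the next possible candidates.
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            if (left < heap.length) {
                frontier.add(left);
            }
            if (right < heap.length) {
                frontier.add(right);
            }
        }
        return out;
    }
}
```

For the valid min-heap `{1, 3, 2, 7, 4, 5}` this yields `[1, 2, 3, 4, 5, 7]`. Whether it beats copy-and-sort in practice would need measuring, since the frontier can grow to roughly half the heap.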

elasticsearchmachine · 2024-09-19T16:08:23Z

Hi @nik9000, I've updated the changelog YAML for you.

nik9000 · 2024-09-19T16:52:26Z

@ivancea, I believe I've fixed the things you mentioned. Can you think of anything else that's left for this one?

ivancea · 2024-09-20T10:26:53Z

...gin/esql/compute/src/main/java/org/elasticsearch/compute/data/sort/BytesRefBucketedSort.java

@@ -382,13 +377,10 @@ private BreakingBytesRefBuilder clearedBytesAt(long index) {

    @Override
    public final void close() {
-        Releasables.close(() -> {


Oh, I thought this was an interesting, safe trick. Some reason to swap to wrap()? For future cases

Mostly paranoia about one of them failing. `wrap` will keep closing the rest even if one close fails.

I feel like I go to a lot of trouble to call these methods to make sure closing happens right. Partly that's paranoia - it can't fail. But partly that's just so readers see it and say "the normal close code" - they see a call to wrap and stuff as "normal"
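As an illustration of the "keep closing even if one fails" behavior discussed here, the sketch below uses only the JDK. `closeAll` is a hypothetical helper, not the Elasticsearch `Releasables` implementation, and it makes no claim about what `Releasables.wrap` does internally.

```java
final class CloseAllSketch {
    /** Closes every resource in order, keeps going after a failure, then rethrows the first failure. */
    static void closeAll(AutoCloseable... resources) {
        RuntimeException first = null;
        for (AutoCloseable resource : resources) {
            try {
                if (resource != null) {
                    resource.close();
                }
            } catch (Exception e) {
                if (first == null) {
                    first = new RuntimeException("close failed", e);
                } else {
                    // Later failures are attached so nothing is silently lost.
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

A hand-written lambda that closes resources one after another would stop at the first exception and leak the rest, which is the failure mode the paranoia is about.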

elasticsearchmachine · 2024-09-23T17:01:57Z

💚 Backport successful

Status	Branch	Result
✅	8.x

nik9000 added 3 commits September 19, 2024 08:52

ESQL: TOP support for strings

d809372

Adds support to the `TOP` aggregation for `keyword` and `text` field types.

Merge branch 'main' into esql_top_bytes

4d5ca13

Use common

c58b20c

nik9000 added the >feature, :Analytics/ES|QL, v8.16.0, and v9.0.0 labels Sep 19, 2024

nik9000 requested a review from a team as a code owner September 19, 2024 13:37

nik9000 requested a review from ivancea September 19, 2024 13:37

elasticsearchmachine added the Team:Analytics label Sep 19, 2024

Update docs/changelog/113183.yaml

4e82923

nik9000 added 2 commits September 19, 2024 09:39

Its fine

8574c0b

Merge remote-tracking branch 'nik9000/esql_top_bytes' into esql_top_b…

87c2aff

…ytes

nik9000 commented Sep 19, 2024

Merge branch 'main' into esql_top_bytes

024b23f

ivancea reviewed Sep 19, 2024

nik9000 added 4 commits September 19, 2024 11:24

Skip meta tests

7f09d0b

Fixup

5b78031

Liek this

84db9e4

Update docs/changelog/113183.yaml

0f51410

nik9000 added 2 commits September 19, 2024 16:01

Merge branch 'main' into esql_top_bytes

58c9266

Merge remote-tracking branch 'nik9000/esql_top_bytes' into esql_top_b…

b5d3906

…ytes

ivancea approved these changes Sep 20, 2024

Merge branch 'main' into esql_top_bytes

b601e78

nik9000 added the auto-merge and auto-backport-and-merge labels Sep 23, 2024

elasticsearchmachine merged commit 58021c3 into elastic:main Sep 23, 2024
15 checks passed

nik9000 deleted the esql_top_bytes branch September 23, 2024 17:00

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 23, 2024

ESQL: TOP support for strings (elastic#113183)

7e78e51

Adds support to the `TOP` aggregation for `keyword` and `text` field types. Closes elastic#109849

nik9000 mentioned this pull request Sep 23, 2024

[8.x] ESQL: TOP support for strings (#113183) #113408

Merged

elasticsearchmachine pushed a commit that referenced this pull request Sep 25, 2024

ESQL: TOP support for strings (#113183) (#113408)

0e6bbb0

Adds support to the `TOP` aggregation for `keyword` and `text` field types. Closes #109849
