Skip to content

Commit

Permalink
Spatial search functions support multi-valued fields in compute engine (
Browse files Browse the repository at this point in the history
#112063)

* Using enum to control mv-predicate combinations with ANY or ALL

* Update docs/changelog/112063.yaml

* Fix changelog

* Refactored to use generic MvCombiner for more flexibility

This opens the door to combiners which work with more types than just boolean.

* Spotless, and disabled failing cartesian-point tests

Reported the failing cases at #112102

* Fix changelog with better summary

* Fix changelog with better summary and highlight text

* More spotless checks

* Remove low-value comment edit

* Refined MvCombiner to maintain state to deal with ST_CONTAINS

We have a special case in ST_CONTAINS in that lucenes triangle-tree implementation causes a situation where we need to reject contains results when there are other geometries that do not contain, but do intersect. This does not make sense from a pure geospatial perspective, but is a necessary consequence of the triangle-tree.

* Code review fixes, and fix for long doc-values MV

* Cleanup and fix fold() serialization

The fold was returning the intermediate ContainsResult, which cannot be serialized, instead of the correct final boolean result.

* Fix to multi-contains-multi case using BitArray

Since a multi-value field should be seen as an alternative to a geometry collection, it is insufficient to consider `ANY` for multi-value contains.

There are two approaches to this:
* Pre-build a geometry collection before converting to docValuesReader
* Maintain more state so we can assert that all components are contained within at least one of the field values

In an effort to minimize the changes to the generated code, the second approach was taken, and in fact was achievable without any changes at all to generated code.

However, this approach uses BigArrays, and does not get the correct one passed in. We need to change generated code a small bit to pass that in. We'll do that in a followup commit, but only if the alternative approach of creating a combined multi-value docValueReader is deemed more complex.

* Fix to multi-contains-multi case using GeometryCollection

This is an alternative approach to the previous one which used a BitArray to maintain state. Now we rely entirely on the internals of the DocValuesReader, and instead pre-create the GEOMETRYCOLLECTION of all the values in the multi-value field, so the triangle tree already considers the necessary combinations.

This approach moves the responsibility of iterating over the multi-value from the generated code into the non-generated code. In total the number of lines of code goes down, as fewer code paths are possible.

* Add addition fixed issue to changelog

* Added csv-spec tests for testing multi-valued geometries

* More tests for multi-value literals and one fix in fold()

* Fix bug with doc values extraction for non-indexed fields for centroid

Initially this work was about adding more tests, but discovered the bug at #112505. This commit fixes hat issue and expands the tests in a few areas:
* PhysicalPlanOptimizerTests expanded to verify that physical planning now considers if the field has doc-values
* SpatialPushDownPointsTestCase simple point-in-polygon tests expanded to consider ST_CENTROID as well, so that this behaviour is tested better there

* Note that this PR also fixes the doc-values field extract bug

This could have been fixed in a separate PR, but fixing it here was needed because the tests we wrote were failing without it.

* Multi-point test cases

* Added capability to prevent test failing on older clusters

Also removed a test that was sensitive to multi-node cluster results ordering

* Support BlockBuilder multivalue combining for ST_WITHIN

This is similar too, but simpler than the ST_CONTAINS solution.
In addition we added support for two fields to handle multi-values by using ST_CONTAINS surrogate with parameters swapped.

* Require capability for BWC tests

* Added multivalue fields tests for points

* Support multivalues for CONTAINS/WITHIN between two fields

This included taking into account that CONTAINS and WITHIN are not symmetrical in the case that the indexed geometry contains multiple intersecting polygons.

We need to document this behaviour.

* Small optimization to not create collections over single geometries

* Simplification of iterating over multi-value BytesRef

* Update docs/changelog/112063.yaml

* Update docs/changelog/112063.yaml

* Added back removed bug-fix link

* Merge conflict

* Support point doc-values for ST_WITHIN

* Simplify ST_CONTAINS to not consider intersecting polygons

This turns out to already be handled by combined doc-values

* Last CONTAINS evaluators moved to BlockBuilder approach

* Revert usage of MyCombiner in spatial predicates

Since ST_CONTAINS and ST_WITHIN could not use the ANY/ALL logic and needed to first collect all values into a single geometry before applying the predicate, we decided to move ST_INTERSECTS and ST_DISJOINT to this same approach so all spatial predicates have the same level of complexity and are easier to maintain.

* Revert ability to perform ANY/ALL predicate evaluations

This was only being used by the spatial predicates, and since they have reverted to doing this logic internally, we remove this capability from the code-base. If we wish to implement ANY/ALL logic in any other predicates, this could be brought back by reverting this commit.

* Simplify code paths for evaluators

Now that all evaluators use the Block.Builder approach we can move all the common code down to the SpatialRelations class.

This means that all static evaluator methods now contain only a single line of code, and all of them are identical between all four spatial functions, making comparison and maintenance much easier.

* Cleanup code for easier review

* Fixed bug with empty multivalue params and doc-values

This was failing a test in ENRICH

* After renaming the evaluator parameters we need to update the unit tests
  • Loading branch information
craigtaverner authored Sep 13, 2024
1 parent ac27e73 commit b8a24b1
Show file tree
Hide file tree
Showing 60 changed files with 2,799 additions and 2,163 deletions.
32 changes: 32 additions & 0 deletions docs/changelog/112063.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
pr: 112063
summary: Spatial search functions support multi-valued fields in compute engine
area: ES|QL
type: bug
issues:
- 112102
- 112505
- 110830
highlight:
title: "ESQL: Multi-value fields supported in Geospatial predicates"
body: |-
Supporting multi-value fields in `WHERE` predicates is a challenge due to not knowing whether `ALL` or `ANY`
of the values in the field should pass the predicate.
For example, should the field `age:[10,30]` pass the predicate `WHERE age>20` or not?
This ambiguity does not exist with the spatial predicates
`ST_INTERSECTS` and `ST_DISJOINT`, because the choice between `ANY` or `ALL`
is implied by the predicate itself.
Consider a predicate checking a field named `location` against a test geometry named `shape`:
* `ST_INTERSECTS(field, shape)` - true if `ANY` value can intersect the shape
* `ST_DISJOINT(field, shape)` - true only if `ALL` values are disjoint from the shape
This works even if the shape argument is itself a complex or compound geometry.
Similar logic exists for `ST_CONTAINS` and `ST_WITHIN` predicates, but these are not as easily solved
with `ANY` or `ALL`, because a collection of geometries contains another collection if each of the contained
geometries is within at least one of the containing geometries. Evaluating this requires that the multi-value
field is first combined into a single geometry before performing the predicate check.
* `ST_CONTAINS(field, shape)` - true if the combined geometry contains the shape
* `ST_WITHIN(field, shape)` - true if the combined geometry is within the shape
notable: false
Original file line number Diff line number Diff line change
Expand Up @@ -122,8 +122,19 @@ public static Tuple<Page, List<String>> loadPageFromCsv(URL source, Map<String,

record CsvColumn(String name, Type type, BuilderWrapper builderWrapper) implements Releasable {
void append(String stringValue) {
if (stringValue.startsWith("\"") && stringValue.endsWith("\"")) { // string value
stringValue = stringValue.substring(1, stringValue.length() - 1).replace(ESCAPED_COMMA_SEQUENCE, ",");
if (stringValue.startsWith("\"") && stringValue.endsWith("\"")) {
// string value
String[] mvStrings = stringValue.substring(1, stringValue.length() - 1).split("\",\\s*\"");
if (mvStrings.length > 1) {
builderWrapper().builder().beginPositionEntry();
for (String mvString : mvStrings) {
mvString = mvString.replace(ESCAPED_COMMA_SEQUENCE, ",");
builderWrapper().append().accept(mvString.length() == 0 ? null : type.convert(mvString));
}
builderWrapper().builder().endPositionEntry();
return;
}
stringValue = mvStrings[0].replace(ESCAPED_COMMA_SEQUENCE, ",");
} else if (stringValue.contains(",")) {// multi-value field
builderWrapper().builder().beginPositionEntry();

Expand Down Expand Up @@ -376,7 +387,20 @@ public static ExpectedResults loadCsvSpecValues(String csv) {
}
List<Object> listOfMvValues = new ArrayList<>();
for (String mvValue : multiValues) {
listOfMvValues.add(columnTypes.get(i).convert(mvValue.trim().replace(ESCAPED_COMMA_SEQUENCE, ",")));
try {
listOfMvValues.add(columnTypes.get(i).convert(mvValue.trim().replace(ESCAPED_COMMA_SEQUENCE, ",")));
} catch (IllegalArgumentException e) {
throw new IllegalArgumentException(
"Error parsing multi-value field ["
+ columnNames.get(i)
+ "] with value ["
+ mvValue
+ "] on row "
+ values.size(),
e
);

}
}
rowValues.add(listOfMvValues);
} else {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,8 @@ public class CsvTestsDataLoader {
private static final TestsDataset COUNTRIES_BBOX_WEB = new TestsDataset("countries_bbox_web");
private static final TestsDataset AIRPORT_CITY_BOUNDARIES = new TestsDataset("airport_city_boundaries");
private static final TestsDataset CARTESIAN_MULTIPOLYGONS = new TestsDataset("cartesian_multipolygons");
private static final TestsDataset MULTIVALUE_GEOMETRIES = new TestsDataset("multivalue_geometries");
private static final TestsDataset MULTIVALUE_POINTS = new TestsDataset("multivalue_points");
private static final TestsDataset DISTANCES = new TestsDataset("distances");
private static final TestsDataset K8S = new TestsDataset("k8s", "k8s-mappings.json", "k8s.csv").withSetting("k8s-settings.json");
private static final TestsDataset ADDRESSES = new TestsDataset("addresses");
Expand Down Expand Up @@ -104,6 +106,8 @@ public class CsvTestsDataLoader {
Map.entry(COUNTRIES_BBOX_WEB.indexName, COUNTRIES_BBOX_WEB),
Map.entry(AIRPORT_CITY_BOUNDARIES.indexName, AIRPORT_CITY_BOUNDARIES),
Map.entry(CARTESIAN_MULTIPOLYGONS.indexName, CARTESIAN_MULTIPOLYGONS),
Map.entry(MULTIVALUE_GEOMETRIES.indexName, MULTIVALUE_GEOMETRIES),
Map.entry(MULTIVALUE_POINTS.indexName, MULTIVALUE_POINTS),
Map.entry(DATE_NANOS.indexName, DATE_NANOS),
Map.entry(K8S.indexName, K8S),
Map.entry(DISTANCES.indexName, DISTANCES),
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
{
"properties": {
"abbrev": {
"type": "keyword"
},
"name": {
"type": "text"
},
"scalerank": {
"type": "integer"
},
"type": {
"type": "keyword"
},
"location": {
"type": "geo_point",
"index": false,
"doc_values": false
},
"country": {
"type": "keyword"
},
"city": {
"type": "keyword"
},
"city_location": {
"type": "geo_point"
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
{
"properties": {
"id": {
"type": "long"
},
"intersects": {
"type": "boolean"
},
"contains": {
"type": "boolean"
},
"shape": {
"type": "geo_shape"
},
"smaller": {
"type": "geo_shape"
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"properties": {
"id": {
"type": "long"
},
"intersects": {
"type": "boolean"
},
"within": {
"type": "boolean"
},
"centroid": {
"type": "geo_point"
},
"location": {
"type": "geo_point"
},
"subset": {
"type": "geo_point"
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
id:l, intersects:boolean, contains:boolean, shape:geo_shape, smaller:geo_shape
0, true, true, ["GEOMETRYCOLLECTION(POLYGON ((-10 -10\, 0 -10\, 0 0\, -10 0\, -10 -10))\, POLYGON ((0 0\, 10 0\, 10 10\, 0 10\, 0 0)))"], ["GEOMETRYCOLLECTION(POLYGON ((-9 -9\, -1 -9\, -1 -1\, -9 -1\, -9 -9))\, POLYGON ((1 1\, 9 1\, 9 9\, 1 9\, 1 1)))"]
1, true, true, ["MULTIPOLYGON( ((-10 -10\, 0 -10\, 0 0\, -10 0\, -10 -10))\, ((0 0\, 10 0\, 10 10\, 0 10\, 0 0)))"], ["MULTIPOLYGON( ((-9 -9\, -1 -9\, -1 -1\, -9 -1\, -9 -9))\, ((1 1\, 9 1\, 9 9\, 1 9\, 1 1)))"]
2, true, true, ["POLYGON ((-15 -15\, 15 -15\, 15 15\, -15 15\, -15 -15))"], ["POLYGON ((-14 -14\, 14 -14\, 14 14\, -14 14\, -14 -14))"]
3, true, true, ["POLYGON ((-15 -15\, 15 -15\, 15 15\, -15 15\, -15 -15))", "POLYGON ((15 15\, 25 15\, 25 25\, 15 25\, 15 15))"], ["POLYGON ((-14 -14\, 14 -14\, 14 14\, -14 14\, -14 -14))", "POLYGON ((16 16\, 24 16\, 24 24\, 16 24\, 16 16))"]
4, true, true, ["POLYGON ((-10 -10\, 0 -10\, 0 0\, -10 0\, -10 -10))", "POLYGON ((0 0\, 10 0\, 10 10\, 0 10\, 0 0))"], ["POLYGON ((-9 -9\, -1 -9\, -1 -1\, -9 -1\, -9 -9))", "POLYGON ((1 1\, 9 1\, 9 9\, 1 9\, 1 1))"]
5, true, false, ["POLYGON ((-5 -5\, 5 -5\, 5 5\, -5 5\, -5 -5))"], ["POLYGON ((-4 -4\, 4 -4\, 4 4\, -4 4\, -4 -4))"]
6, true, false, ["POLYGON ((-5 -5\, 5 -5\, 5 5\, -5 5\, -5 -5))", "POLYGON ((15 15\, 25 15\, 25 25\, 15 25\, 15 15))"], ["POLYGON ((-4 -4\, 4 -4\, 4 4\, -4 4\, -4 -4))", "POLYGON ((16 16\, 24 16\, 24 24\, 16 24\, 16 16))"]
7, true, false, ["POLYGON ((-9 -9\, -1 -9\, -1 -1\, -9 -1\, -9 -9))", "POLYGON ((1 1\, 9 1\, 9 9\, 1 9\, 1 1))"], ["POLYGON ((-8 -8\, -2 -8\, -2 -2\, -8 -2\, -8 -8))", "POLYGON ((2 2\, 8 2\, 8 8\, 2 8\, 2 2))"]
8, false, false, ["POLYGON ((15 15\, 25 15\, 25 25\, 15 25\, 15 15))"], ["POLYGON ((16 16\, 24 16\, 24 24\, 16 24\, 16 16))"]
9, false, false, ["POLYGON ((-25 -25\, -15 -25\, -15 -15\, -25 -15\, -25 -25))", "POLYGON ((15 15\, 25 15\, 25 25\, 15 25\, 15 15))"], ["POLYGON ((-24 -24\, -16 -24\, -16 -16\, -24 -16\, -24 -24))", "POLYGON ((16 16\, 24 16\, 24 24\, 16 24\, 16 16))"]
10, true, false, ["POLYGON ((-15 -15\, 15 -15\, 15 15\, -15 15\, -15 -15))", "POLYGON ((5 5\, 15 5\, 15 15\, 5 15\, 5 5))"], ["POLYGON ((-14 -14\, 14 -14\, 14 14\, -14 14\, -14 -14))", "POLYGON ((6 6\, 14 6\, 14 14\, 6 14\, 6 6))"]
11, true, false, ["POLYGON ((-11 -11\, 1 -11\, 1 1\, -11 1\, -11 -11))", "POLYGON ((-1 -1\, 11 -1\, 11 11\, -1 11\, -1 -1))"], ["POLYGON ((-10 -10\, 0 -10\, 0 0\, -10 0\, -10 -10))", "POLYGON ((0 0\, 10 0\, 10 10\, 0 10\, 0 0))"]
Loading

0 comments on commit b8a24b1

Please sign in to comment.