Spatial search functions support multi-valued fields in compute engine (#112063)

* Using enum to control mv-predicate combinations with ANY or ALL

* Update docs/changelog/112063.yaml

* Fix changelog

* Refactored to use generic MvCombiner for more flexibility

This opens the door to combiners which work with more types than just boolean.

* Spotless, and disabled failing cartesian-point tests

Reported the failing cases at #112102

* Fix changelog with better summary

* Fix changelog with better summary and highlight text

* More spotless checks

* Remove low-value comment edit

* Refined MvCombiner to maintain state to deal with ST_CONTAINS

We have a special case in ST_CONTAINS in that Lucene's triangle-tree implementation causes a situation where we need to reject contains results when there are other geometries that do not contain, but do intersect. This does not make sense from a pure geospatial perspective, but is a necessary consequence of the triangle-tree.

* Code review fixes, and fix for long doc-values MV

* Cleanup and fix fold() serialization

The fold was returning the intermediate ContainsResult, which cannot be serialized, instead of the correct final boolean result.

* Fix to multi-contains-multi case using BitArray

Since a multi-value field should be seen as an alternative to a geometry collection, it is insufficient to consider `ANY` for multi-value contains. There are two approaches to this:
  * Pre-build a geometry collection before converting to docValuesReader
  * Maintain more state so we can assert that all components are contained within at least one of the field values

In an effort to minimize the changes to the generated code, the second approach was taken, and in fact was achievable without any changes at all to generated code. However, this approach uses BigArrays, and does not get the correct one passed in. We need to change generated code a small bit to pass that in. We'll do that in a followup commit, but only if the alternative approach of creating a combined multi-value docValueReader is deemed more complex.
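
To illustrate why `ANY` is insufficient and what "maintain more state" means, here is a minimal sketch of the second approach. It is not the code from this PR: the class and method names are hypothetical, `java.util.BitSet` stands in for the BitArray, and `java.awt.geom.Rectangle2D` stands in for real geometries and their doc-values readers.

```java
import java.awt.geom.Rectangle2D;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: rectangles stand in for geometries, BitSet for BitArray.
public class MultiContainsSketch {

    /** ANY semantics: true as soon as one field value contains one component. */
    static boolean containsAny(List<Rectangle2D> fieldValues, List<Rectangle2D> components) {
        for (Rectangle2D value : fieldValues) {
            for (Rectangle2D component : components) {
                if (value.contains(component)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Stateful semantics: every component must be contained in at least one field value. */
    static boolean containsAll(List<Rectangle2D> fieldValues, List<Rectangle2D> components) {
        // One bit per component, set once some field value contains that component.
        BitSet covered = new BitSet(components.size());
        for (Rectangle2D value : fieldValues) {
            for (int i = 0; i < components.size(); i++) {
                if (value.contains(components.get(i))) {
                    covered.set(i);
                }
            }
        }
        return covered.cardinality() == components.size();
    }

    public static void main(String[] args) {
        List<Rectangle2D> field = List.of(
            new Rectangle2D.Double(0, 0, 10, 10),
            new Rectangle2D.Double(20, 0, 10, 10));
        // Second component lies outside both field values.
        List<Rectangle2D> query = List.of(
            new Rectangle2D.Double(1, 1, 2, 2),
            new Rectangle2D.Double(50, 50, 2, 2));
        System.out.println(containsAny(field, query)); // prints "true"
        System.out.println(containsAll(field, query)); // prints "false"
    }
}
```

With `ANY` the row matches even though one component of the query lies outside every field value, which is the case the extra state is meant to reject.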

* Fix to multi-contains-multi case using GeometryCollection

This is an alternative approach to the previous one which used a BitArray to maintain state. Now we rely entirely on the internals of the DocValuesReader, and instead pre-create the GEOMETRYCOLLECTION of all the values in the multi-value field, so the triangle tree already considers the necessary combinations. This approach moves the responsibility of iterating over the multi-value from the generated code into the non-generated code. In total the number of lines of code goes down, as fewer code paths are possible.

* Add additional fixed issue to changelog

* Added csv-spec tests for testing multi-valued geometries

* More tests for multi-value literals and one fix in fold()

* Fix bug with doc values extraction for non-indexed fields for centroid

Initially this work was about adding more tests, but we discovered the bug at #112505. This commit fixes that issue and expands the tests in a few areas:
  * PhysicalPlanOptimizerTests expanded to verify that physical planning now considers if the field has doc-values
  * SpatialPushDownPointsTestCase simple point-in-polygon tests expanded to consider ST_CENTROID as well, so that this behaviour is tested better there

* Note that this PR also fixes the doc-values field extract bug

This could have been fixed in a separate PR, but fixing it here was needed because the tests we wrote were failing without it.

* Multi-point test cases

* Added capability to prevent test failing on older clusters

Also removed a test that was sensitive to multi-node cluster results ordering

* Support BlockBuilder multivalue combining for ST_WITHIN

This is similar to, but simpler than, the ST_CONTAINS solution. In addition we added support for two fields to handle multi-values by using an ST_CONTAINS surrogate with parameters swapped.
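
The "surrogate with parameters swapped" idea can be sketched as follows. This is only an illustration for single, simple geometries: the class and method names are hypothetical, and `java.awt.geom.Rectangle2D` stands in for the real geometry types.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical sketch: within(a, b) is evaluated as contains(b, a).
public class WithinViaContainsSketch {

    static boolean contains(Rectangle2D outer, Rectangle2D inner) {
        return outer.contains(inner);
    }

    /** WITHIN implemented by delegating to CONTAINS with the arguments swapped. */
    static boolean within(Rectangle2D inner, Rectangle2D outer) {
        return contains(outer, inner);
    }

    public static void main(String[] args) {
        Rectangle2D big = new Rectangle2D.Double(0, 0, 10, 10);
        Rectangle2D small = new Rectangle2D.Double(2, 2, 3, 3);
        System.out.println(within(small, big)); // prints "true"
        System.out.println(within(big, small)); // prints "false"
    }
}
```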

* Require capability for BWC tests

* Added multivalue fields tests for points

* Support multivalues for CONTAINS/WITHIN between two fields

This included taking into account that CONTAINS and WITHIN are not symmetrical in the case that the indexed geometry contains multiple intersecting polygons. We need to document this behaviour.

* Small optimization to not create collections over single geometries

* Simplification of iterating over multi-value BytesRef

* Update docs/changelog/112063.yaml

* Update docs/changelog/112063.yaml

* Added back removed bug-fix link

* Merge conflict

* Support point doc-values for ST_WITHIN

* Simplify ST_CONTAINS to not consider intersecting polygons

This turns out to already be handled by combined doc-values

* Last CONTAINS evaluators moved to BlockBuilder approach

* Revert usage of MvCombiner in spatial predicates

Since ST_CONTAINS and ST_WITHIN could not use the ANY/ALL logic and needed to first collect all values into a single geometry before applying the predicate, we decided to move ST_INTERSECTS and ST_DISJOINT to this same approach so all spatial predicates have the same level of complexity and are easier to maintain.

* Revert ability to perform ANY/ALL predicate evaluations

This was only being used by the spatial predicates, and since they have reverted to doing this logic internally, we remove this capability from the code-base. If we wish to implement ANY/ALL logic in any other predicates, this could be brought back by reverting this commit.

* Simplify code paths for evaluators

Now that all evaluators use the Block.Builder approach we can move all the common code down to the SpatialRelations class. This means that all static evaluator methods now contain only a single line of code, and all of them are identical between all four spatial functions, making comparison and maintenance much easier.

* Cleanup code for easier review

* Fixed bug with empty multivalue params and doc-values

This was failing a test in ENRICH

* After renaming the evaluator parameters we need to update the unit tests