-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose indexers #32
base: master
Are you sure you want to change the base?
Expose indexers #32
Conversation
@benbovy This is not ready yet and mostly meant for discussing if and how we'd expose indexers. But feel free to have a look and leave comments as soon as you see fit. |
I agree that it's valuable to provide users with some API so that they can do something with the indices. Actually, I think that we could rename A possible signature would look like: ds.xoak.query(
indexers: Dict,
return_distance: bool = True,
query_kwargs: Optional[Dict] = None,
**indexer_kwargs
) (or alternatively expand Likewise, we could later extend the API with other common methods like |
I'll reduce this to just providing a query method for now. Passing arbitrary kwargs adds a lot of complexity because we loose an easy way of knowing about the structure of the data returned by then indexer, then. |
Should we wait for this PR to be merged before moving the repository to xarray-contrib? |
Do as you like. I can easily reopen this PR once it's moved.
|
Doc build fails with unrelated "Command killed due to excessive memory consumption". The rest is (finally) good for a review. I've decided to skip the more elaborate examples for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great @willirath! Overall that looks good to me, I've just left a few minor comments.
@@ -64,6 +72,34 @@ def query_brute_force(dataset, dataset_dims_shape, indexer, indexer_dims_shape, | |||
return dataset.isel(indexers=pos_indexers) | |||
|
|||
|
|||
def query_brute_force_indexers( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could avoid duplicating this function and have query_brute_force
simply return pos_indexers
.
@@ -224,16 +224,13 @@ def _get_pos_indexers(self, indices, indexers): | |||
|
|||
return pos_indexers | |||
|
|||
def sel( | |||
def query( | |||
self, indexers: Mapping[Hashable, Any] = None, **indexers_kwargs: Any | |||
) -> Union[xr.Dataset, xr.DataArray]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the return type is rather Dict[Hashable, Variable]
.
It would be nice if query
would eventually return the distances too. Maybe returning Tuple[DataArray, DataArray]
would make sense instead of trying to return both indexers and distances in the same xarray object. This is closer to what the underlying indexes usually return. Also DataArray
(even without any coordinate) is more common than Variable
. We can save this for later, though.
@@ -16,6 +16,14 @@ def test_s2point(geo_dataset, geo_indexer, geo_expected): | |||
xr.testing.assert_equal(ds_sel.load(), geo_expected.load()) | |||
|
|||
|
|||
def test_s2point_via_query(geo_dataset, geo_indexer, geo_expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unnecessarily adds a lot of tests via the parametrized fixtures, but it's probably fine for now.
@@ -13,6 +13,15 @@ def test_scipy_kdtree(xyz_dataset, xyz_indexer, xyz_expected): | |||
xr.testing.assert_equal(ds_sel.load(), xyz_expected.load()) | |||
|
|||
|
|||
def test_scipy_kdtree_via_indexer(xyz_dataset, xyz_indexer, xyz_expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_scipy_kdtree_via_indexer(xyz_dataset, xyz_indexer, xyz_expected): | |
def test_scipy_kdtree_via_query(xyz_dataset, xyz_indexer, xyz_expected): |
@@ -7,3 +7,4 @@ Examples | |||
introduction | |||
dask_support | |||
custom_indexes | |||
query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pooch
is missing in environment.yml
to load the xarray tutorial dataset in this example notebook.
@@ -38,10 +38,13 @@ def test_sklearn_balltree_options(): | |||
assert ds.xoak._index._index_adapter._index_options == {'leaf_size': 10} | |||
|
|||
|
|||
def test_sklearn_geo_balltree(geo_dataset, geo_indexer, geo_expected): | |||
def test_sklearn_geo_balltree(geo_dataset, geo_indexer, geo_expected, geo_expected_indexers): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we split this into two functions like for the other indexes?
This splits the
.sel
method into two steps (getting the positional indexers), and actually doing the selection.The aim of this is to provide an easy way of obtaining the indices for use in downstream applications like horizontal interpolation etc.
.query
)[ ] Add query keyword arguments and pass them to the underlying index.query()
method for calculating distances between target and selected points[ ] Using the indexers for interpolation (on a structured grid)[ ] Using the indexers for a two-step process with step one just getting the nodes for a polyline and step two querying all intermediate points.Test on binder: https://mybinder.org/v2/gh/willirath/xoak/expose-indexers?filepath=doc%2Fexamples