implement the PintIndex #163
Review comment on `pint_xarray/index.py`:

```python
class PintMetaIndex(Index):
    # TODO: inherit from MetaIndex once that exists
```
Hmm, actually I'm not sure what a `MetaIndex` class would look like. So far we've used the generic term "meta-index" to refer to indexes that wrap one or several other indexes, but I don't know if there will be a need to provide a generic class for that.
I agree, it doesn't really look like we need a base class for that, but I noticed that a few methods don't make sense for meta-indexes, `from_variables` for example. It's probably fine to use the default for those, though.
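As a rough illustration of the "meta-index" idea without a dedicated base class, here is a dependency-free sketch that simply holds another index and delegates to it. `WrappedIndex` and `MetaIndex` are hypothetical toy classes standing in for a concrete index like `PandasIndex` and a wrapper like `PintMetaIndex`; they are not xarray's `Index` API. `from_variables` is left raising `NotImplementedError`, which is (as far as I know) also what xarray's base `Index` does by default.

```python
class WrappedIndex:
    """Hypothetical toy index (stand-in for something like PandasIndex)."""

    def __init__(self, labels):
        # map each label to its integer position
        self._positions = {label: pos for pos, label in enumerate(labels)}

    @classmethod
    def from_variables(cls, labels):
        # a concrete index knows how to build itself from raw labels
        return cls(labels)

    def sel(self, label):
        return self._positions[label]


class MetaIndex:
    """Hypothetical meta-index: wraps an existing index via composition."""

    def __init__(self, index):
        self.index = index

    @classmethod
    def from_variables(cls, labels):
        # a wrapper cannot guess which index to wrap from raw labels,
        # so keep the "not implemented" default
        raise NotImplementedError(f"{cls.__name__} must wrap an existing index")

    def sel(self, label):
        # delegate the lookup to the wrapped index; a real wrapper would
        # add its own behaviour (e.g. unit handling) around this call
        return self.index.sel(label)


meta = MetaIndex(WrappedIndex.from_variables(["a", "b", "c"]))
position = meta.sel("b")
```

Composition keeps the wrapper independent of the wrapped index's type, which is the main reason a shared base class does not seem necessary here.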
Here are a few comments. Happy to answer questions if any. There are some `Index` methods like … For some other methods like … The general approach used in the Xarray indexes refactor relies heavily on the type of the indexes (at least when we need to compare them with each other). That's not super flexible with the … I wonder whether … Regarding `Index` methods like … You should also be careful when converting the units of indexed coordinates, as they may get out of sync with their index. Since there's no concept of a "duck" index, the easiest approach would probably be to drop the index (and maybe reconstruct it from scratch) when the coordinates are updated.
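To make the out-of-sync risk concrete, here is a toy, dependency-free sketch: an index built from coordinate values in metres keeps answering in metres after the coordinate itself is converted to kilometres, and only a rebuild fixes it. `LabelIndex` is a hypothetical stand-in, not an xarray class.

```python
class LabelIndex:
    """Hypothetical toy index, built once from coordinate values."""

    def __init__(self, values):
        # map each coordinate value to its position
        self._positions = {value: pos for pos, value in enumerate(values)}

    def lookup(self, label):
        # return the position of `label`, or None if it is not indexed
        return self._positions.get(label)


# coordinate values in metres and an index built from them
coord_m = [1000.0, 2000.0, 3000.0]
index = LabelIndex(coord_m)

# converting the coordinate to kilometres does not touch the index ...
coord_km = [value / 1000 for value in coord_m]

# ... so the old index no longer matches the coordinate values
stale = index.lookup(2.0)

# dropping the index and rebuilding it from the converted values fixes this
index = LabelIndex(coord_km)
fresh = index.lookup(2.0)
```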
@keewis I have been looking into this once again and now I think I better understand what you'd like to achieve with the …

Wrap index coordinate variables as unit-aware variables

I'm not familiar with pint, but if a …

```python
class PintMetaIndex:
    def create_variables(self, variables=None):
        index_vars = self.index.create_variables(variables)

        index_vars_units = {}
        for name, var in index_vars.items():
            # attach the unit stored for this coordinate to the raw index data
            data = array_attach_unit(var.data, self.units[name])
            var_units = xr.Variable(var.dims, data, attrs=var.attrs, encoding=var.encoding)
            index_vars_units[name] = var_units

        return index_vars_units
```

We cannot use `IndexVariable` since (if I remember correctly) it coerces the data as a …

Set new Pint index(es)

Since …

```python
@register_dataset_accessor("pint")
class PintDatasetAccessor:
    def quantify(self, units=_default, unit_registry=None, **unit_kwargs):
        ...
        ds_xindexes = self.ds.xindexes
        new_indexes, new_index_vars = ds_xindexes.copy_indexes()

        for idx, idx_vars in ds_xindexes.group_by_index():
            # keep only the units of the coordinates backed by this index
            idx_units = {k: v for k, v in units.items() if k in idx_vars}
            new_idx = PintMetaIndex(idx, idx_units)
            new_indexes.update({k: new_idx for k in idx_vars})
            new_index_vars.update(new_idx.create_variables(idx_vars))

        new_coords = xr.Coordinates(new_index_vars, new_indexes)

        # needs https://github.com/pydata/xarray/pull/8094 to work properly
        ds_updated_temp = self.ds.assign_coords(new_coords)
        ...
```

It is still useful to implement …

```python
class PintMetaIndex:
    @classmethod
    def from_variables(cls, variables, *, options):
        # build the wrapped index, then read the unit from the options
        index = xr.indexes.PandasIndex.from_variables(variables, options={})
        units_dict = {index.index.name: options.get("units")}
        return cls(index, units_dict)
```

```python
ds = xr.Dataset(coords={"x": [1, 2]})
ds_units = ds.drop_indexes("x").set_xindex("x", PintMetaIndex, units="m")
```

Data selection

Beware that …
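The per-index bookkeeping in the `quantify` sketch (grouping coordinates that share one index, then keeping only the units of those coordinates) can be tried out without xarray. This is a minimal sketch assuming indexes are given as a plain `{coordinate name: index object}` mapping; `group_by_index` and `units_per_index` here are hypothetical helpers that only roughly mirror what `xindexes.group_by_index()` returns.

```python
from collections import defaultdict


def group_by_index(indexes):
    """Group coordinate names that share the same index object."""
    groups = defaultdict(list)
    objects = {}
    for name, index in indexes.items():
        # key by identity: several coordinates can share one index object
        objects[id(index)] = index
        groups[id(index)].append(name)
    return [(objects[key], names) for key, names in groups.items()]


def units_per_index(indexes, units):
    """Return one (index, {coordinate name: unit}) pair per distinct index."""
    pairs = []
    for index, names in group_by_index(indexes):
        # keep only the units of the coordinates backed by this index
        index_units = {k: v for k, v in units.items() if k in names}
        pairs.append((index, index_units))
    return pairs


shared = object()  # e.g. one index backing two coordinates
other = object()
pairs = units_per_index(
    {"x": shared, "y": shared, "t": other},
    {"x": "m", "y": "m", "t": "s", "z": "K"},  # "z" has no index and is ignored
)
```

Keying by `id()` rather than by the object itself avoids relying on how a given index class defines `__eq__` and `__hash__`.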
nit: I would rename …
Further comments: Implementing … For …
@benbovy, with a few tweaks to your suggestions this:

```
In [1]: import xarray as xr
   ...: import pint_xarray
   ...:
   ...: ureg = pint_xarray.unit_registry
   ...: ds = xr.tutorial.open_dataset("air_temperature")
   ...: q = ds.pint.quantify({"lat": "degrees", "lon": "degrees"})
   ...: q.sel(lat=ureg.Quantity(75, "deg").to("rad"))
.../xarray/core/indexes.py:473: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  index = pd.Index(np.asarray(array), **kwargs)
Out[1]:
<xarray.Dataset>
Dimensions:  (time: 2920, lon: 53)
Coordinates:
    lat      float32 [deg] 75.0
  * lon      (lon) float32 [deg] 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lon) float32 [K] 241.2 242.5 243.5 ... 241.48999 241.79
Indexes:
    lon      PintMetaIndex
    time     PintMetaIndex
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day). These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
```

does work 🎉 The only hiccup is that somehow we seem to call …
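The selection above works because the requested label is converted to the index's units before the lookup. Below is a dependency-free sketch of that idea, assuming a hypothetical `ToyPintIndex` with a hard-coded degree/radian conversion table in place of a pint unit registry. It illustrates the general technique only and is not how `PintMetaIndex` is implemented.

```python
import math

# hypothetical conversion factors: value_in_unit * factor == value_in_degrees
TO_DEGREES = {"deg": 1.0, "rad": 180.0 / math.pi}


class ToyPintIndex:
    """Hypothetical index whose labels are stored as magnitudes in one unit."""

    def __init__(self, labels, unit):
        self.labels = list(labels)
        self.unit = unit

    def sel(self, value, unit):
        # 1. convert the requested label into the index's unit
        converted = value * TO_DEGREES[unit] / TO_DEGREES[self.unit]
        # 2. look up the bare magnitude; isclose absorbs conversion round-off
        for position, label in enumerate(self.labels):
            if math.isclose(label, converted):
                return position
        raise KeyError(f"{value} {unit} not found in index")


lat = ToyPintIndex([65.0, 70.0, 75.0], "deg")
position = lat.sel(math.radians(75), "rad")
```

Using `math.isclose` instead of `==` is a choice made for the toy: converting 75° to radians and back can introduce round-off, so an exact match might fail.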
(the failing tests are expected; I will have to update some of the workaround code)
@keewis Excellent!
Hmm, the warning is weird. `safe_cast_to_index` is called when a new `PandasIndex` object is created, but there's no reason to create one in your example.
Small suggestion: you could implement `PintMetaIndex._repr_inline_` so that it displays the type of the wrapped index (e.g., `PintMetaIndex(PandasIndex)`).
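A minimal sketch of the suggested `_repr_inline_`, assuming xarray passes a `max_width` argument when rendering the `Indexes:` section (the `None` check below is only defensive). The two classes are toy stand-ins named after the real ones only so the output reads as in the suggestion; they are not the pint-xarray or xarray classes.

```python
class PandasIndex:
    """Toy stand-in named after xarray's PandasIndex (not the real class)."""


class PintMetaIndex:
    """Toy stand-in for the wrapper, showing only the inline repr."""

    def __init__(self, index, units):
        self.index = index
        self.units = units

    def _repr_inline_(self, max_width):
        # wrapper type plus wrapped type, e.g. "PintMetaIndex(PandasIndex)"
        text = f"{type(self).__name__}({type(self.index).__name__})"
        if max_width is None or len(text) <= max_width:
            return text
        # too long for the available width: truncate and mark with "..."
        return text[: max(max_width - 3, 0)] + "..."


index = PintMetaIndex(PandasIndex(), {"lat": "deg"})
```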
Regarding all the errors in CI …
Would that just delegate to the underlying index, or also wrap it (probably the former, but I wanted to make sure)? In any case, I wonder if we should just fix …
In the case of …
Yes, definitely. I think I fixed it in one of my open PRs in the Xarray repo.
In theory, we could also use `sel` and `loc` directly, but that would not allow us to change the units of the result (at least as far as I can tell).
With this, all the tests pass locally (except the doctests, which I haven't tried yet).
With the recent commits, this should be ready for review (cc @TomNicholas, @benbovy, @jthielen). Note that even though the index implements a couple of the methods of the custom index base class, the wrapper methods (…
This is cool!
So right now `indexes` is a valid argument to the `DataArray` constructor but not to the `Dataset` constructor?
What exactly are you referring to? Not sure if this is it, but the entire code base is built upon …
I was actually referring to the top-level example in your GitHub comment:

Usage, for anyone who wants to play around with it:

```python
import pint_xarray
import xarray as xr
from pint_xarray.index import PintMetaIndex

ureg = pint_xarray.unit_registry

ds = xr.tutorial.open_dataset("air_temperature")
arr = ds.air

new_arr = xr.DataArray(
    arr.variable,
    coords={
        "lat": arr.lat.variable,
        "lon": arr.lon.variable,
        "time": arr.time.variable,
    },
    indexes={
        # wrap the existing indexes, taking the units from the attributes
        "lat": PintMetaIndex(arr.xindexes["lat"], {"lat": arr.lat.attrs.get("units")}),
        "lon": PintMetaIndex(arr.xindexes["lon"], {"lon": arr.lon.attrs.get("units")}),
        "time": arr.xindexes["time"],
    },
    fastpath=True,
)
new_arr.sel(
    lat=ureg.Quantity([75, 70, 65], "deg"),
    lon=ureg.Quantity([200, 202.5], "deg"),
)
```

This will fail at the moment because …

But that seems to be the case when I look at the …
Yes, but this is really for internal use along with the …
Great to see this ready @keewis! I went through the changes and it looks good to me (I can't say much about the pint-specific logic, though).
That comment is also two years old; you'd use the …
I don't wish to block or postpone this PR in any way. But can someone give a quick overview of how the workflow changes with this? From the commit I see that …
Right now … In new PRs, we can extend the index step by step and deprecate most of the workaround methods in the …
Again unrelated, but speaking of deprecation: I have played around with …
Thanks all, I'll merge this now and we can continue improving in separate PRs. The only thing I was slightly worried about is backwards compatibility, but given that this is such a major improvement I guess we can get away with it (also, we have been recommending …
As mentioned in #162, it is possible to get the indexing functions to work, although there is still no public API.
I also still don't quite understand how other methods work since the refactor, so this only implements `sel`.
Usage, for anyone who wants to play around with it: …
This will fail at the moment because `xarray` treats `dask` arrays differently from duck-dask arrays, but passing single values works!

- `PintMetaIndex` #162, closes #205 (Wrong units when using `da.integrate()`), closes #218 (Support for `set_xindex`?)
- `pre-commit run --all-files`
- `whats-new.rst`
- `api.rst`