Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clean-up and implement the conversion methods #11

Merged
merged 33 commits into from
Jul 10, 2020

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Jun 27, 2020

As described in #6, this aims to implement working Dataset.pint.to and DataArray.pint.to methods.

@keewis
Copy link
Collaborator Author

keewis commented Jun 28, 2020

@jthielen, @TomNicholas: here's an overview of the current implementation of to:

usage examples of DataArray.pint.to
In [1]: import xarray as xr 
   ...: import numpy as np 
   ...: import pintxarray 
   ...: import pint 
   ...: ureg = pint.UnitRegistry()

In [2]: da = xr.DataArray( 
   ...:     dims="x", 
   ...:     data=np.linspace(0, 1, 4) * ureg.m, 
   ...:     coords={"u": ("x", [2, 4, 6, 8] * ureg.s)}, 
   ...: )

In [3]: da.pint.to("mm")
Out[3]: 
<xarray.DataArray (x: 4)>
<Quantity([   0.          333.33333333  666.66666667 1000.        ], 'millimeter')>
Coordinates:
    u        (x) int64 <Quantity([2 4 6 8], 'second')>
Dimensions without coordinates: x

In [4]: da.pint.to(ureg.mm)
Out[4]: 
<xarray.DataArray (x: 4)>
<Quantity([   0.          333.33333333  666.66666667 1000.        ], 'millimeter')>
Coordinates:
    u        (x) int64 <Quantity([2 4 6 8], 'second')>
Dimensions without coordinates: x

In [5]: da.pint.to({da.name: "mm"})
Out[5]: 
<xarray.DataArray (x: 4)>
<Quantity([   0.          333.33333333  666.66666667 1000.        ], 'millimeter')>
Coordinates:
    u        (x) int64 <Quantity([2 4 6 8], 'second')>
Dimensions without coordinates: x

In [6]: da.pint.to("mm", u="ms")
Out[6]: 
<xarray.DataArray (x: 4)>
<Quantity([   0.          333.33333333  666.66666667 1000.        ], 'millimeter')>
Coordinates:
    u        (x) float64 <Quantity([2000. 4000. 6000. 8000.], 'millisecond')>
Dimensions without coordinates: x

In [7]: da.pint.to({da.name: "mm", "u": "ms"})
Out[7]: 
<xarray.DataArray (x: 4)>
<Quantity([   0.          333.33333333  666.66666667 1000.        ], 'millimeter')>
Coordinates:
    u        (x) float64 <Quantity([2000. 4000. 6000. 8000.], 'millisecond')>
Dimensions without coordinates: x

In [8]: da.pint.to({da.name: "mm"}, u="ms")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-e092d5c447d0> in <module>
----> 1 da.pint.to({da.name: "mm"}, u="ms")

.../pintxarray/accessors.py in to(self, units, registry, **unit_kwargs)
    265             )
    266 
--> 267         units = either_dict_or_kwargs(units, unit_kwargs, "to")
    268 
    269         return conversion.convert_units(self.da, units)

.../pintxarray/accessors.py in either_dict_or_kwargs(positional, keywords, method_name)
     43             )
     44         if keywords:
---> 45             raise ValueError(
     46                 "cannot specify both keyword and positional "
     47                 f"arguments to .{method_name}"

ValueError: cannot specify both keyword and positional arguments to .to
usage examples of Dataset.pint.to
In [1]: import xarray as xr 
   ...: import numpy as np 
   ...: import pintxarray 
   ...: import pint 
   ...: ureg = pint.UnitRegistry()

In [2]: ds = xr.Dataset( 
   ...:     data_vars={"a": ("x", np.linspace(0, 1, 4) * ureg.m)}, 
   ...:     coords={"u": ("x", [2, 4, 6, 8] * ureg.ms)}, 
   ...: )

In [3]: ds.pint.to(a="mm", u="ms")
Out[3]: 
<xarray.Dataset>
Dimensions:  (x: 4)
Coordinates:
    u        (x) int64 <Quantity([2 4 6 8], 'millisecond')>
Dimensions without coordinates: x
Data variables:
    a        (x) float64 <Quantity([   0.          333.33333333  666.66666667...

In [4]: ds.pint.to({"a": "mm", "u": "ms"})
Out[4]: 
<xarray.Dataset>
Dimensions:  (x: 4)
Coordinates:
    u        (x) int64 <Quantity([2 4 6 8], 'millisecond')>
Dimensions without coordinates: x
Data variables:
    a        (x) float64 <Quantity([   0.          333.33333333  666.66666667...

In [5]: ds.pint.to({"a": "mm"}, u="ms")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-49ddaee6c1c1> in <module>
----> 1 ds.pint.to({"a": "mm"}, u="ms")

.../pintxarray/accessors.py in to(self, units, **unit_kwargs)
    376 
    377     def to(self, units=None, **unit_kwargs):
--> 378         units = either_dict_or_kwargs(units, unit_kwargs, "to")
    379 
    380         return conversion.convert_units(self.ds, units)

.../pintxarray/accessors.py in either_dict_or_kwargs(positional, keywords, method_name)
     43             )
     44         if keywords:
---> 45             raise ValueError(
     46                 "cannot specify both keyword and positional "
     47                 f"arguments to .{method_name}"

ValueError: cannot specify both keyword and positional arguments to .to

In a future PR we could then add support for contexts via a contexts keyword-only argument (variables named contexts then cannot be converted using kwargs but this is still possible using the positional argument).

Also, we could add pseudo-support for converting units in indexes: take the string unit from the units attribute, use it to construct a quantity, convert it to the new units, set the new index values using the magnitude of the converted quantity, and then update the attribute.

@keewis
Copy link
Collaborator Author

keewis commented Jun 28, 2020

Also, @TomNicholas: there's still only a single check visible. I can't seem to be able to set up an account so I can't verify this, but reading the documentation it seems that it needs read / write access to checks.

@keewis keewis marked this pull request as ready for review June 29, 2020 15:08
@keewis
Copy link
Collaborator Author

keewis commented Jul 1, 2020

@TomNicholas, this should be ready for review and merge.

Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

One comment: there is now this separation between high-level functions (e.g. attach_units) and internal ones (e.g. array_attach_units). Should they be separated with preceding underscores? Should some of the top-level ones be public API? In particular the testing assert functions might be useful for users or downstream developers.

Comment on lines 19 to 20
pytest.param(None, id="without registry"),
pytest.param(unit_registry, id="with registry"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know you could assign tests ids like this - cool!

@keewis
Copy link
Collaborator Author

keewis commented Jul 8, 2020

hmm... for now I'd say let's consider everything imported by __init__.py public and everything else private. Once we have a more thorough API documentation, we could state that everything included there is public and everything else isn't (which I think is what xarray does).

I agree that adding a pintxarray.testing module to the public API would definitely be useful.

@jthielen
Copy link
Collaborator

jthielen commented Jul 8, 2020

@keewis The implementation of to here looks great. I agree that it would be good to follow-up on this with contexts support and index conversion pseudo-support. The latter is particularly important in most of my own use cases, so if you don't get around to implementing it soon, I'll try to do so. In MetPy I added that index conversion on a separate accessor method, but here it looks like it could integrate well into the main to method.

@keewis
Copy link
Collaborator Author

keewis commented Jul 8, 2020

with two approvals I guess it is time to merge? not yet, the docstrings are missing

Also, @jthielen, if you already implemented the pseudo-index support once it might be faster for you to do so again. Go ahead if you want to and have the time.

@keewis
Copy link
Collaborator Author

keewis commented Jul 8, 2020

ah, with the relocation the CI setup is gone. @TomNicholas, do you think you could set this up again?

@keewis
Copy link
Collaborator Author

keewis commented Jul 8, 2020

I added the docstrings (and found two bugs in the process), so this is ready for another round of reviews.

Copy link
Collaborator

@jthielen jthielen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me pending CI getting fixed and passing. Just a couple questions (that don't need to hold up merging) on things I noticed when looking at this closer:

pint_xarray/accessors.py Outdated Show resolved Hide resolved

units = either_dict_or_kwargs(units, unit_kwargs, "to")

return conversion.convert_units(self.da, units)

def to_base_units(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should to_base_units also be adjusted analogously to use a util from conversion or is it fine as-is?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_base_units uses Quantity.to_base_units, so I'm not quite sure how we'd be able to use a conversion function here. Did you want to add a function to conversion that calls .to_base_units? If so, then yes, we could use that to reduce duplicated code.

We should also allow passing in a dict / kwargs to control which of the DataArray's variables should be converted (defaulting to "convert all variables").

Both should be a new PR, though.

units = either_dict_or_kwargs(units, unit_kwargs, "to")

return conversion.convert_units(self.ds, units)

def to_base_units(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here: not sure how to use conversion, but we should definitely add a possibility to only convert certain variables.

@TomNicholas
Copy link
Member

ah, with the relocation the CI setup is gone. @TomNicholas, do you think you could set this up again?

Thanks for flagging that the CI has gone. I think I've fixed it - @keewis can you just push an empty commit to this branch to see if that triggers the CI on this PR?

@keewis
Copy link
Collaborator Author

keewis commented Jul 9, 2020

the CI seems to work again

@keewis
Copy link
Collaborator Author

keewis commented Jul 9, 2020

if the code citing style I used (stating the import path) is acceptable, this should be ready to be merged. I'm not quite sure what the best citing style would be, so if we agree on a certain style I'm happy to use that instead.

@jthielen
Copy link
Collaborator

jthielen commented Jul 9, 2020

@keewis Not to get too nit-picky, but I believe the terms of the Apache License require us to include "all copyright, patent, trademark, and attribution notices from the Source form of the Work" along with "a copy of this License", and to also state changes.

So, since we're already including the Apache License (since that is pint-xarray's license as well), we should be fine just including "Copyright 2014-2019, xarray Developers" to that comment to cover the copyright notice requirement and add something like "reused with modification" or whatever other phrasing you find best to cover the stating changes requirement.

@keewis
Copy link
Collaborator Author

keewis commented Jul 10, 2020

Not to get too nit-picky

You're not, I just have zero experience with code citing. I looked at the xarray code base for reference, and there the import style reference is mostly used, but then there's also the licenses folder which contains each individual project's license, even if it is a duplicate (there are multiple licenses (MIT?) that are almost identical). And for the register_*_accessor functions, pandas simply states:

# Ported with modifications from xarray
# https://github.com/pydata/xarray/blob/master/xarray/core/extensions.py

while the license is included in their LICENSES folder.

So now, I'm thinking of using this:

# based on xarray.core.utils.either_dict_or_kwargs                                                                                                                                                                                
# https://github.com/pydata/xarray/blob/v0.15.1/xarray/core/utils.py#L249-L268                                                                                                                                                    

and including xarray's license in a licenses folder. Not sure if that's enough?

@jthielen
Copy link
Collaborator

So now, I'm thinking of using this:

# based on xarray.core.utils.either_dict_or_kwargs                                                                                                                                                                                
# https://github.com/pydata/xarray/blob/v0.15.1/xarray/core/utils.py#L249-L268                                                                                                                                                    

and including xarray's license in a licenses folder. Not sure if that's enough?

That sounds great except for one thing: it doesn't include the copyright notice (since Apache, unlike BSD and MIT, doesn't include the copyright notice as part of the standard license text). So, I think you'll just have to add that somewhere. At the top of licenses/XARRAY_LICENSE is likely easiest, but perhaps a licenses/README.md or the code comment could be good too.

@keewis
Copy link
Collaborator Author

keewis commented Jul 10, 2020

sure, although pandas doesn't seem to have done that (see pandas-dev/pandas@eee83e2)

@jthielen
Copy link
Collaborator

Hmm...I'm pretty sure then pandas messed up on that one, since the license is clear about requiring "all copyright, patent, trademark, and attribution notices". I'll submit a PR to pandas and see what they say.

@keewis
Copy link
Collaborator Author

keewis commented Jul 10, 2020

thanks

@keewis
Copy link
Collaborator Author

keewis commented Jul 10, 2020

done. Once you got a response from the pandas devs, we can change this again.

Maybe it's worth pinging @shoyer to resolve this?

@keewis
Copy link
Collaborator Author

keewis commented Jul 10, 2020

so if the code citation good enough for now, let's merge?

@jthielen jthielen merged commit 4f3b0bc into xarray-contrib:master Jul 10, 2020
@keewis keewis deleted the to branch July 10, 2020 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants