Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantity 2.0 #91

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Quantity 2.0 #91

wants to merge 1 commit into from

Conversation

nstarman
Copy link
Member

@nstarman nstarman commented Dec 27, 2023

Signed-off-by: nstarman <[email protected]>
@pllim
Copy link
Member

pllim commented Feb 1, 2024

I totally missed this one, so I am confused why there is a APE 23 and APE 25 here. And why report is in PDF instead of Markdown.

Is this related to the following?

@nstarman
Copy link
Member Author

nstarman commented Feb 1, 2024

so I am confused why there is a APE 23 and APE 25 here

I'll clean that up.

The report is in PDF instead of Markdown.

The report was for the Moore grant, not the APE. I'm including the PDF to eventually turn into an APE, written in Markdown.

@byrdie
Copy link

byrdie commented Mar 12, 2024

@nstarman, I think this looks really good so far, I'm quite excited to work with other arrays inside Quantity!

I was wondering if you have given any thought to supporting arrays with labeled axes (e.g. xarray) during this effort? I think supporting xarray (or the other way around, xarray supporting Quantity) is really important going forward, and I think it's worth considering adding first-class support this feature for Quantity 2.0.

@MridulS
Copy link

MridulS commented Mar 12, 2024

xarray already has support for units via https://github.com/xarray-contrib/pint-xarray, so making astropy.units compatible with xarray would solve this right?

@byrdie
Copy link

byrdie commented Mar 12, 2024

I guess it depends on how we want to interact with the rest of the astropy ecosystem, for example: how can we make astropy.coordinates work with and propagate labeled axes? Is it easier to convert the underlying data buffers of SkyCoord etc. to instances of xarray, or make Quantity wrap xarray so only have to make minimal changes to astropy.coordinates?

@nstarman
Copy link
Member Author

Support for x-array has to use their framework, which is how pint does it.
There already are unofficial libraries, like https://github.com/KaoruNishikawa/xarray-quantity to do this. With Quantity 2.0 we should make a coordinated package official version.

@mhvk
Copy link
Contributor

mhvk commented Jun 14, 2024

@rieder, @LourensVeen - this is the document for Quantity wrapping that I was talking about.

@mhvk
Copy link
Contributor

mhvk commented Jun 27, 2024

@nstarman - I've been thinking a bit more about the Quantity 2.0 things, and one question I wonder about is the subclassing vs a protocal logic. I think I understand the basic idea well enough -- and like the idea for Quantity itself being defined by a protocol -- but just wanted to be sure that, effectively, with Quantity 2.0, any code that relies on isinstance(q, (np.ndarrarray, dask.array.array)) is out of luck, correct? I.e., correct use is via the Array API, though duck typing will work in most cases. (Note, this is really just for clarification, I never liked isinstance checks much and the Protocol logic seems to formalize duck typing in a nice way... That said, dask.array.array does not yet have __array_namespace__, so we should not run too far ahead...)

Somewhat related, but more on the implementation, I found the use of the interface to get methods somewhat confusing. I definitely see the logic of why composition makes defining a grid of quantity subclasses and various array types easier, but to use __getattr__ to get methods seems contrary to the logic of being easy to inspect, and also seems worrisome from a performance perspective (in Column, we include a Cython __getitem__ just because going through an extra step in python is slow...). I guess the idea is that one would also override __dir__ to help tab completion? I ask since it does seem creating a new subclass DaskQuantity(AbstractQuantity, DaskInterface) would trivially make the methods available.

Anyway, really just for clarification!

@mhvk
Copy link
Contributor

mhvk commented Jun 27, 2024

Second question is more of a nitpick - I'm not enchanted with to_unit and to_unit_value. I also think that getting all the unit libraries to agree on that is going to be difficult. One option here would be to generalize the use of <<, which basically replaces .to() (though missing equivalencies, of course - though perhaps those are rare enough one could envision doing it with a context manager). Since Quantity now becomes a super thin wrapper for which initialization is fast, perhaps (q << new_unit).value is not so bad? Anyway, mostly to suggest one could leave this out of the protocol altogether. (One might go further, and define the protocol by dunder methods?)

@nstarman
Copy link
Member Author

nstarman commented Jun 27, 2024

but just wanted to be sure that, effectively, with Quantity 2.0, any code that relies on isinstance(q, (np.ndarrarray, dask.array.array)) is out of luck, correct?

Yeah. With Quantity as a wrapper, then it would fail isinstance checks. But since it's Array API compatible it would pass isinstance(x, ArrayAPI) and hasattr(x, "__array_namespace__") checks, where the former is a run-time checkable Protocol and the latter is the suggested attribute check

(as a note, the simple @runtimecheckable; class ArrayAPI(Protocol):; def __array_namespace__(self): ... operates the exact same as hasattr(x, "__array_namespace__")).

I found the use of the interface to get methods somewhat confusing...
a new subclass DaskQuantity(AbstractQuantity, DaskInterface) would trivially make the methods available.

Agreed that the interface bit is the hardest. I have a lot of conflicting thoughts about introducing subclasses like this. If you take a look at pint they had some of these same difficulties and have an interesting resolution with a mix of pros and cons. Maybe we should discuss further offline!

Either way, I think my suggestion would be start by not implementing the interface portion. Even without the interface we get everything that's in the Array API. All that the interface enabled was support for things like DaskArray.compute() or JaxArray.at[].set. These are important, for sure, but both the interface solution or subclassing are 100% compatible with starting without the interface portion. Then Quantity and its heirarchy becomes the simple starting point of

from abc import ABCMeta
from dataclasses import dataclass
from typing import final

@dataclass(frozen=True, slots=True)
class AbstractQuantity(metaclass=ABCMeta):
    value: ...
    unit: Unit


@final
@dataclass(frozen=True, slots=True)
class Quantity(AbstractQuantity):
    pass


@final
@dataclass(frozen=True, slots=True)
class Angle(AbstractQuantity):
    wrap_angle: Quantity

We can add the subclassing or interface layers after!

@nstarman
Copy link
Member Author

I'm not enchanted with to_unit and to_unit_value.
I also think that getting all the unit libraries to agree on that is going to be difficult.

That's for sure. But IMO still worthwhile pursuing. I would be interested to reach to folks in the fall.
I predict that to_units would be the most amenable to folks. .to() is quite generic and, following the Array API's choices on to_device, it's better to be specific. That's not to say we should deprecate to here at Astropy! I think we need to keep it around for a very long while, only that to should be an alias that directly calls to_units.

Anyway, mostly to suggest one could leave this out of the protocol altogether.

It's interesting. The most useful bit about the protocol is that it's a concrete definition of the duck type. In many respects getting community agreement on the protocol is key to defining it. That being said, my top realistic choice would be for us to define a Protocol, then assemble other unit libraries to collaborate on a Quantity API under the auspices of the Consortium for Python Data API Standards (https://data-apis.org). Our Protocol would then become a subclass of the common Protocol, which adds on any Astropy-specific functionality, e.g. __lshift__.

@mhvk
Copy link
Contributor

mhvk commented Jun 29, 2024

One more thing: you mention wanting quantities to be immutable - is that meant to be just for the instance, or also the underlying array? For numpy arrays, that basically is a non-starter - one has to do in-place stuff with large arrays, both to get decent performance and to reduce memory use. In principle, one could get a new quantity wrapper out, but am not sure if that is not more confusing rather than less: after r = np.add(q1, q2, out=q1) I really expect r is q1 (in fact, lots of tests in astropy follow that pattern). But since this is different for dask, possibly Quantity itself being immutable makes sense (as long as initialization is really fast...).

On <<: the logic for suggesting this was that we can shift the problem of changing units to the units, rather than to the quantity: for q << new_unit, the unit has to implement __rlshift__ and then, if the argument is a Quantity, effectively do q.__class__(q.unit.to(self, q.value), self). Assuming the existence of array << unit (or array * unit), the unit already has to know about the quantity protocol, so in that sense it doesn't add more complexity. That said, since Quantity carries the unit, it is not strange to assume it knows the unit protocol... Anyway, only by having it on Quantity can one implement in-place unit conversion as in q <<= new_unit.

p.s. Now followed the links to the Protocol PEP -- thanks for including those in the document -- and becoming quite sold on that. Am less clear about the abstract-final pattern, but less important at this moment.

@mhvk
Copy link
Contributor

mhvk commented Jun 29, 2024

On the interface, we could indeed start without, though I guess it will take quite a bit of getting used to to avoid methods like q.mean()... Though presumably some/many array types do not implement those anyway...

@nstarman
Copy link
Member Author

On the interface, we could indeed start without, though I guess it will take quite a bit of getting used to to avoid methods like q.mean()... Though presumably some/many array types do not implement those anyway...

I would advocate starting with just the Array API.
We can add much more easily than we can remove.
If we choose to add, we can implement a .mean() method through the array API.

def mean(self):
    xp =  self.__array_namespace__()
    return xp.mean(self)

@nstarman
Copy link
Member Author

you mention wanting quantities to be immutable - is that meant to be just for the instance, or also the underlying array?

Just Quantity itself! I agree we shouldn't do anything to the underlying object, except maybe convert it to an array if it's a list.

Am less clear about the abstract-final pattern, but less important at this moment.

Really the abstract-final pattern can be summarized as: concrete classes can never inherit from other concrete classes, only abstract ones. This actually ends up simplifying a lot of code / ensuring classes don't get too complex. See https://docs.kidger.site/equinox/pattern/.

So for Quantity this would mean Quantity -> SpecificTypeQuantity -> Angle would be refactored as

  1. AbstractQuantity -> Quantity
  2. AbstractQuantity -> SpecificTypeQuantity -> Angle, where SpecificTypeQuantity is also an ABC.

Amongst the many benefits, adding an AbstractQuantity above Quantity would help with the transition to Quantity 2.0 as both the new and old Quantity classes can inherit from AbstractQuantity! So people can use the base class as a unified check.

@mhvk
Copy link
Contributor

mhvk commented Jun 30, 2024

Mostly a reminder, that when discussing other array types in the coordination meeting, one that was considered of particular interest, was zarr.

@TomNicholas
Copy link

TomNicholas commented Aug 6, 2024

Hi, xarray dev here 👋 I just saw this issue thanks to @byrdie .

@keewis and I are the ones who made xarray work with pint, and wrote the blog post about that. I didn't even know that https://github.com/KaoruNishikawa/xarray-quantity existed!

We would love to see integration between xarray and astropy.units. I feel like we currently have two parallel stacks that do a lot of the same things, and making units work is one of the most important things in order to bridge that divide.

I would advocate starting with just the Array API.

We can go into details, but in general if your units array class implements the array API then most things in xarray should "just work".

@mhvk
Copy link
Contributor

mhvk commented Sep 13, 2024

@TomNicholas - belatedly, also a direct reply to your note: we just discussed how we want to proceed with this "Quantity 2.0" project, and your note made clear that the first alternative to ndarray we will try to support is xarray -- we really want to start by just relying on the Array API, and it is great to hear xarray is far along in that regard. We'll get back to you when we (inevitably) run into snags...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants