Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement the inline repr #22

Merged
merged 12 commits into from
Aug 26, 2020
Merged

implement the inline repr #22

merged 12 commits into from
Aug 26, 2020

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Aug 6, 2020

As discussed in hgrecco/pint#1133, this monkeypatches the _repr_inline_ method introduced in pydata/xarray#4248 onto pint.Quantity, resulting in:

In [1]: import xarray as xr 
   ...: import numpy as np 
   ...: import pint 
   ...: import pint_xarray 
   ...:  
   ...: ds = xr.Dataset({"a": ("x", np.linspace(0, 1, 100), {"units": "m"})}) 
   ...: display(ds) 
   ...:  
   ...: quantified = ds.pint.quantify() 
   ...: display(quantified)
<xarray.Dataset>
Dimensions:  (x: 100)
Dimensions without coordinates: x
Data variables:
    a        (x) float64 0.0 0.0101 0.0202 0.0303 ... 0.9697 0.9798 0.9899 1.0
<xarray.Dataset>
Dimensions:  (x: 100)
Dimensions without coordinates: x
Data variables:
    a        (x) float64 [m] 0.0 0.0101 0.0202 0.0303 ... 0.9798 0.9899 1.0

However, this uses xarray.core.formatting.format_array_flat which is not public API (maybe it should be?).

Also, no tests, yet, because I don't know how to best check a repr.

@codecov
Copy link

codecov bot commented Aug 6, 2020

Codecov Report

Merging #22 into master will decrease coverage by 1.08%.
The diff coverage is 78.02%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
- Coverage   87.25%   86.16%   -1.09%     
==========================================
  Files           9       11       +2     
  Lines         706      795      +89     
==========================================
+ Hits          616      685      +69     
- Misses         90      110      +20     
Impacted Files Coverage Δ
pint_xarray/formatting.py 74.35% <74.35%> (ø)
pint_xarray/__init__.py 100.00% <100.00%> (ø)
pint_xarray/tests/test_formatting.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4347a32...35ff26e. Read the comment docs.

@keewis
Copy link
Collaborator Author

keewis commented Aug 6, 2020

thinking about this some more, this doesn't work for anything other than numpy.ndarray objects. For now I guess we could use the magnitude's _repr_inline_ (falling back to __repr__ if that doesn't exist) and only use format_array_flat if the magnitude is a ndarray.

However, as we nest deeper (e.g. xarray(pint(uncertainties(dask(sparse(cupy))))) – for argument's sake, let's assume that this actually makes sense) this might break or become really complicated. Does anyone have any ideas how to deal with that?

If I'm simply missing something we have that discussion here, otherwise I guess we should open a issue on xarray's issue tracker.

@jthielen
Copy link
Collaborator

jthielen commented Aug 6, 2020

Yes, I agree that format_array_flat should probably just apply to magnitude being an ndarray.

I think a cascading series of _repr_inline_ should work for nested arrays, so long as

  • the metadata of the higher nested objects is considered the priority (if not, then we're back to a fully managed solution to the likes of Design: nested _meta dask/dask#5329)
  • small max lengths are handled gracefully (i.e., a minimum where it is just like Dask.Array(...), then ..., then nothing)
  • we're okay with the lowest arrays in large nesting chains not having any information show up in the inline repr (situation where there is not enough characters to even describe the full nesting has to be accounted for somehow)
  • it can be adopted without too much complaint across the ecosystem

Assuming all this, then each layer of the nesting will reduce the max length of the inline repr string available to the layers below it, until a layer reaches a reasonable minimum where it "gives up". At least that's the natural design that I inferred from the simple _repr_inline_(max_width) API.

All that being said, it might still be good to bring up on xarray's end since this is a more general issue with inline reprs of nested duck arrays, with nothing pint-specific other than it being the motivating use-case.

@jthielen
Copy link
Collaborator

jthielen commented Aug 6, 2020

Also, as far as tests go, I'd think a unit test for this could be as simple as

assert q._repr_inline_(40) == "expected 40 character inline string repr..."

for a couple combinations of dtypes, shapes, and max widths. I'd see no need for integration tests making sure the final repr is as expected (since that's more on xarray's end).

@keewis
Copy link
Collaborator Author

keewis commented Aug 20, 2020

not sure how to test that we forward to duck arrays properly, but that's something that depends on pydata/xarray#4324.

So I guess this is ready for review?

@keewis keewis mentioned this pull request Aug 22, 2020
16 tasks
@jthielen
Copy link
Collaborator

My only suggestion is vendoring xarray.core.formatting.format_array_flat instead of using it from xarray. I've gotten burned before relying on xarray's private API. If at some point format_array_flat becomes public API, then we can remove the vendoring and just rely on that.

Otherwise looks good!

@keewis
Copy link
Collaborator Author

keewis commented Aug 23, 2020

Vendoring format_array_flat would increase the number of direct dependencies (we'd depend on pandas) and also pull in 12 other functions, so this might be pretty painful. We could make it semi-public in xarray instead? Any thoughts on this, @shoyer?

@shoyer
Copy link

shoyer commented Aug 23, 2020

👍 for vendoring. I suspect you can make use a much simplified version of the associated helper functions, because pandas is really only used for datetime formatting.

@keewis
Copy link
Collaborator Author

keewis commented Aug 23, 2020

well, although it doesn't really make sense, this works:

In [5]: import pint
   ...: import pandas as pd
   ...: 
   ...: ureg = pint.UnitRegistry()
   ...: ureg.Quantity(pd.date_range("2010-01-01", periods=10).to_numpy(), "m")
Out[5]: 
array(['2010-01-01T00:00:00.000000000', '2010-01-02T00:00:00.000000000',
       '2010-01-03T00:00:00.000000000', '2010-01-04T00:00:00.000000000',
       '2010-01-05T00:00:00.000000000', '2010-01-06T00:00:00.000000000',
       '2010-01-07T00:00:00.000000000', '2010-01-08T00:00:00.000000000',
       '2010-01-09T00:00:00.000000000', '2010-01-10T00:00:00.000000000'],
      dtype='datetime64[ns]') <Unit('meter')>

if we disregard that, vendoring would indeed be much easier.

@keewis
Copy link
Collaborator Author

keewis commented Aug 23, 2020

with the vendoring the coverage decreases, but I think that's fine as long as we keep it in sync with the xarray version.

@jthielen
Copy link
Collaborator

Looks good now with the vendoring.

I did catch one more thing on another go-through (sorry about that). Hopefully this won't matter in practice, but the pathological case where max_width - len(units_repr) - 3 < 0 doesn't look to be explicitly handled. There should probably at least be a test that it degrades properly and does not error? Not sure what we actually want the repr to look like in that case (perhaps just ...)?

@keewis
Copy link
Collaborator Author

keewis commented Aug 24, 2020

I switched to using the vendored maybe_truncate and added a test for max_width - len(units) - 3 < 0. Right now, it will keep the units, no matter how short the leftover width becomes. Not sure if that's okay? The displayed repr of dask and sparse arrays (which are special cased in xarray) will also be longer than max_width if necessary.

@jthielen
Copy link
Collaborator

I switched to using the vendored maybe_truncate and added a test for max_width - len(units) - 3 < 0. Right now, it will keep the units, no matter how short the leftover width becomes. Not sure if that's okay? The displayed repr of dask and sparse arrays (which are special cased in xarray) will also be longer than max_width if necessary.

Seems okay to me based on the existing behavior as you mentioned. If it becomes a problem, it can always be adjusted later.

@keewis keewis merged commit 857a8e1 into xarray-contrib:master Aug 26, 2020
@keewis keewis deleted the inline-repr branch August 26, 2020 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants