-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error messages #8264
Comments
I completely agree. I think part of the problem is that because we generally have good internal abstractions for operations, an error gets thrown from deep within e.g. |
Yes I definitely agree, this can be an issue, #2078 is a great example there. (that said, I still think there are areas we can do better without big changes — even if we gave the index of the object which failed it would be quite helpful...) |
I've been using pandas a bit recently, and it has even worse error messages (though also the objects are less complex, so maybe less impactful). Here's an example: df['a']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File /opt/homebrew/lib/python3.9/site-packages/pandas/core/indexes/base.py:3790, in Index.get_loc(self, key)
3789 try:
-> 3790 return self._engine.get_loc(casted_key)
3791 except KeyError as err:
File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'a'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[1], line 1
----> 1 df['a']
File /opt/homebrew/lib/python3.9/site-packages/pandas/core/frame.py:3896, in DataFrame.__getitem__(self, key)
3894 if self.columns.nlevels > 1:
3895 return self._getitem_multilevel(key)
-> 3896 indexer = self.columns.get_loc(key)
3897 if is_integer(indexer):
3898 indexer = [indexer]
File /opt/homebrew/lib/python3.9/site-packages/pandas/core/indexes/base.py:3797, in Index.get_loc(self, key)
3792 if isinstance(casted_key, slice) or (
3793 isinstance(casted_key, abc.Iterable)
3794 and any(isinstance(x, slice) for x in casted_key)
3795 ):
3796 raise InvalidIndexError(key)
-> 3797 raise KeyError(key) from err
3798 except TypeError:
3799 # If we have a listlike key, _check_indexing_error will raise
3800 # InvalidIndexError. Otherwise we fall through and re-raise
3801 # the TypeError.
3802 self._check_indexing_error(key)
KeyError: 'a' We're a bit better at the comparable case: da.sel(lat=10)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File /opt/homebrew/lib/python3.9/site-packages/pandas/core/indexes/base.py:3790, in Index.get_loc(self, key)
3789 try:
-> 3790 return self._engine.get_loc(casted_key)
3791 except KeyError as err:
File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:3576, in pandas._libs.hashtable.Float32HashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:3600, in pandas._libs.hashtable.Float32HashTable.get_item()
KeyError: 10.0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
File /opt/homebrew/lib/python3.9/site-packages/xarray/core/indexes.py:772, in PandasIndex.sel(self, labels, method, tolerance)
771 try:
--> 772 indexer = self.index.get_loc(label_value)
773 except KeyError as e:
File /opt/homebrew/lib/python3.9/site-packages/pandas/core/indexes/base.py:3797, in Index.get_loc(self, key)
3796 raise InvalidIndexError(key)
-> 3797 raise KeyError(key) from err
3798 except TypeError:
3799 # If we have a listlike key, _check_indexing_error will raise
3800 # InvalidIndexError. Otherwise we fall through and re-raise
3801 # the TypeError.
KeyError: 10.0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[4], line 1
----> 1 da.sel(lat=10)
File /opt/homebrew/lib/python3.9/site-packages/xarray/core/dataarray.py:1590, in DataArray.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
1480 def sel(
1481 self,
1482 indexers: Mapping[Any, Any] | None = None,
(...)
1486 **indexers_kwargs: Any,
1487 ) -> Self:
1488 """Return a new DataArray whose data is given by selecting index
1489 labels along the specified dimension(s).
1490
(...)
1588 Dimensions without coordinates: points
1589 """
-> 1590 ds = self._to_temp_dataset().sel(
1591 indexers=indexers,
1592 drop=drop,
1593 method=method,
1594 tolerance=tolerance,
1595 **indexers_kwargs,
1596 )
1597 return self._from_temp_dataset(ds)
File /opt/homebrew/lib/python3.9/site-packages/xarray/core/dataset.py:3047, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
2986 """Returns a new dataset with each array indexed by tick labels
2987 along the specified dimension(s).
2988
(...)
3044 DataArray.sel
3045 """
3046 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 3047 query_results = map_index_queries(
3048 self, indexers=indexers, method=method, tolerance=tolerance
3049 )
3051 if drop:
3052 no_scalar_variables = {}
File /opt/homebrew/lib/python3.9/site-packages/xarray/core/indexing.py:193, in map_index_queries(obj, indexers, method, tolerance, **indexers_kwargs)
191 results.append(IndexSelResult(labels))
192 else:
--> 193 results.append(index.sel(labels, **options))
195 merged = merge_sel_results(results)
197 # drop dimension coordinates found in dimension indexers
198 # (also drop multi-index if any)
199 # (.sel() already ensures alignment)
File /opt/homebrew/lib/python3.9/site-packages/xarray/core/indexes.py:774, in PandasIndex.sel(self, labels, method, tolerance)
772 indexer = self.index.get_loc(label_value)
773 except KeyError as e:
--> 774 raise KeyError(
775 f"not all values found in index {coord_name!r}. "
776 "Try setting the `method` keyword argument (example: method='nearest')."
777 ) from e
779 elif label_array.dtype.kind == "b":
780 indexer = label_array
KeyError: "not all values found in index 'lat'. Try setting the `method` keyword argument (example: method='nearest')." As one case of doing this well: there's useful info we could provide when there's an index error:
@TomNicholas makes a good point that many of the errors are raised deeper in the call stack which doesn't have the broader context, such as a view of the full object. One option is for higher-level code to catch the exception, and call a function with the full context which is designed to write a good error message. This also means that it's OK to have lower performance requirements — we only pay the cost of constructing the message (including, for example, searching for similar values in an index) when we hit an error. There could also be an option for skipping this in an environment where perf is more important than friendliness. I would love for xarray to be the equivalent of the rust compiler here — known for being surprisingly helpful, such that going back to another tool feels like you're without a teammate! This might not seem like a traditional funding proposal, but I do think it could make a good candidate for one:
|
I agree with everything you just wrote @max-sixty 🙏 Another thing to keep in mind would be the support for exception groups in python 3.11. I imagine there could be a lot of use cases for these in xarray, and @keewis and I have already been discussing using them in pint-xarray (xarray-contrib/pint-xarray#144 (comment)) and in datatree (xarray-contrib/datatree#264). |
Is your feature request related to a problem?
Coming back to xarray, and using it based on what I remember from a year ago or so, means I make lots of mistakes. I've also been using it outside of a repl, where error messages are more important, given I can't explore a dataset inline.
Some of the error messages could be much more helpful. Take one example:
The second sentence is nice. But the first could be give us much more information:
join=...
? Are they off by 1 or are they completely different types?testing.assert_equal
produces pretty nice errors, as a comparisonHaving these good is really useful, lets folks stay in the flow while they're working, and it signals that we're a well-built, refined library.
Describe the solution you'd like
I'm not sure the best way to surface the issues — error messages make for less legible contributions than features or bug fixes, and the primary audience for good error messages is often the opposite of those actively developing the library. They're also more difficult to manage as GH issues — there could be scores of marginal issues which would often be out of date.
One thing we do in PRQL is have a file that snapshots error messages
test_bad_error_messages.rs
, which can then be a nice contribution to change those from bad to good. I'm not sure whether that would work here (python doesn't seem to have a great snapshotter,pytest-regtest
is the best I've found; I wrotepytest-accept
but requires doctests).Any other ideas?
Describe alternatives you've considered
No response
Additional context
A couple of specific error-message issues:
The text was updated successfully, but these errors were encountered: