
Subclassing from pandera.api.dataframe.model.DataFrameModel errors on annotated but not initialized fields starting with an underscore #1765

Open
adzcai opened this issue Jul 26, 2024 · 0 comments
Labels
bug Something isn't working


adzcai commented Jul 26, 2024

Describe the bug

If I create a generic subclass of pandera.api.dataframe.model.DataFrameModel that has an annotated but uninitialized field whose name starts with an underscore, and then parameterize it with concrete types, DataFrameModel.__class_getitem__ raises a KeyError while collecting the fields in DataFrameModel._collect_fields.
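The key detail is that _foo is annotated but never assigned. This plain-Python sketch uses no pandera, and the class name Model is only illustrative. It shows that such a name is reported by typing.get_type_hints but has no entry in the __dict__ of the class or of any of its bases. The traceback's attrs[field_name] lookup expects to find it there.

```python
from typing import get_type_hints


class Model:
    a: int = 0  # annotated and assigned: in __annotations__ and in __dict__
    _foo: int   # annotated only: in __annotations__ but not in __dict__


# Merge the __dict__ of the class and all its bases (illustrative stand-in,
# roughly what the issue describes cls._get_model_attrs() doing).
attrs = {}
for klass in reversed(Model.__mro__):
    attrs.update(vars(klass))

hints = get_type_hints(Model)
print("_foo" in hints)  # True
print("_foo" in attrs)  # False
print("a" in attrs)     # True
```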

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

from typing import Generic
from pandera.api.dataframe.model import DataFrameModel, TDataFrame, TSchema

class Schema(DataFrameModel[TDataFrame, TSchema], Generic[TDataFrame, TSchema]):
    _foo: int

from pandera.api.pandas.container import DataFrameSchema
import pandas as pd

x: Schema[pd.DataFrame, DataFrameSchema]

The last line raises the following error:

KeyError                                  Traceback (most recent call last)
Cell In[4], line 10
      7 from pandera.api.pandas.container import DataFrameSchema
      8 import pandas as pd
---> 10 x: Schema[pd.DataFrame, DataFrameSchema]

File ~/micromamba/envs/virgo/lib/python3.12/site-packages/pandera/api/dataframe/model.py:189, in DataFrameModel.__class_getitem__(cls, item)
    187 param_dict: Dict[TypeVar, Type[Any]] = dict(zip(__parameters__, item))
    188 extra: Dict[str, Any] = {"__annotations__": {}}
--> 189 for field, (annot_info, field_info) in cls._collect_fields().items():
    190     if isinstance(annot_info.arg, TypeVar):
    191         if annot_info.arg in param_dict:

File ~/micromamba/envs/virgo/lib/python3.12/site-packages/pandera/api/dataframe/model.py:359, in DataFrameModel._collect_fields(cls)
    357 fields = {}
    358 for field_name, annotation in annotations.items():
--> 359     field = attrs[field_name]  # __init_subclass__ guarantees existence
    360     if not isinstance(field, FieldInfo):
    361         raise SchemaInitError(
    362             f"'{field_name}' can only be assigned a 'Field', "
    363             + f"not a '{type(field)}.'"
    364         )

KeyError: '_foo'

Expected behavior

Subscripting the model should succeed. _foo shouldn't be collected as a field, since its name starts with an underscore.

Desktop:

  • OS: macOS
  • Browser: Safari
  • Version: 0.20.3

Additional context

It seems that _foo is never in the dict attrs = cls._get_model_attrs(). That method looks through the __dict__s of the superclasses, and because _foo is only annotated and never assigned, it has no entry there.
Maybe non-fields should be filtered out of the annotations dict as well.
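The following sketch shows the suggested filter. It assumes the rule is simply that names starting with an underscore are not fields. It is a toy model of the lookup in _collect_fields, not pandera's code. The names is_field, model_attrs, collect_fields_unfiltered and collect_fields_filtered are hypothetical. The public field a is given an explicit value as a stand-in for whatever pandera's __init_subclass__ assigns to public fields declared without a value.

```python
from typing import Any, Dict, get_type_hints


def is_field(name: str) -> bool:
    # Hypothetical rule: private names (leading underscore) are not fields.
    return not name.startswith("_")


def model_attrs(cls: type) -> Dict[str, Any]:
    # Hypothetical stand-in for cls._get_model_attrs(): merge the __dict__
    # of the class and its bases, with subclasses overriding parents.
    attrs: Dict[str, Any] = {}
    for klass in reversed(cls.__mro__):
        attrs.update(vars(klass))
    return attrs


def collect_fields_unfiltered(cls: type) -> Dict[str, Any]:
    # Mirrors the failing lookup: assumes every annotated name is in attrs.
    attrs = model_attrs(cls)
    return {name: attrs[name] for name in get_type_hints(cls)}


def collect_fields_filtered(cls: type) -> Dict[str, Any]:
    # Suggested fix: drop non-field names from the annotations first.
    attrs = model_attrs(cls)
    return {name: attrs[name] for name in get_type_hints(cls) if is_field(name)}


class Model:
    a: int = 0
    _foo: int  # annotated but never assigned


try:
    collect_fields_unfiltered(Model)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: '_foo'

print(collect_fields_filtered(Model))  # {'a': 0}
```

Filtering the annotations, rather than catching the KeyError, keeps the lookup strict for public names. A public annotated name with no entry in attrs would still raise in this toy version.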

adzcai added the bug label on Jul 26, 2024