Sklearn pipeline compatibility and pandas dependencies #406

Open
PaulWestenthanner opened this issue May 6, 2023 · 0 comments

Expected Behavior

Full compatibility with sklearn pipelines.

Actual Behavior

We are only compatible with sklearn's pandas output mode.
By default, a multi-step pipeline in which an encoder is not the first step, e.g.

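from sklearn.pipeline import Pipeline

# SomePreprocessor and SomeEncoder are placeholders for real transformers.
Pipeline(
    steps=[
        ("preprocessor", SomePreprocessor().set_output(transform="pandas")),
        ("encoder", SomeEncoder()),
    ]
)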

will fail unless the user either calls set_output(transform="pandas") on the preceding step, as in the example above, or enables pandas mode globally via sklearn.set_config(transform_output="pandas").

This is inconvenient for users and can lead to unexpected errors.

Steps to Reproduce the Problem

  1. Create a sklearn Pipeline whose first step is a preprocessor and whose second step is an encoder.
  2. Call fit_transform on the pipeline. This raises an error because category encoders works with DataFrames internally, and after the first transform an array is passed on whose column names differ from what the encoder expects.
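The root cause in step 2 can be observed without any encoder. The sketch below uses `SimpleImputer` as an example preprocessor (an assumption, any sklearn transformer in default output mode behaves similarly) and shows that the original column names are gone after the first transform. Re-wrapping the array in a DataFrame is shown only to illustrate why name-based lookups stop matching; it is not a claim about what the encoders do internally.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

X = pd.DataFrame(
    {"color": ["red", "red", np.nan, "blue"], "size": ["S", "M", "M", np.nan]},
    dtype=object,
)

# Default output mode: the transformer returns a NumPy array, not a DataFrame.
imputed = SimpleImputer(strategy="most_frequent").fit_transform(X)
print(type(imputed).__name__)
print(hasattr(imputed, "columns"))

# Wrapping the array in a DataFrame without names yields integer labels,
# so a lookup by the original names ("color", "size") no longer matches.
rewrapped = pd.DataFrame(imputed)
print(list(rewrapped.columns))
print("color" in rewrapped.columns)
```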

Potential Solution

To fix this, we would need to become independent of pandas internally. This is quite difficult and requires refactoring all encoders, mainly in how feature names are determined. The benefit is also rather small, since there is an easy workaround for the uncommon case where the encoder is not the first step of a multi-step pipeline.
However, if a major refactoring is done for a potential version 3, we could include this as well.
