Feature request: min_group_size for other category encoders, e.g. one hot #279
Comments
I added this parameter to OrdinalEncoder, since there was also issue #81, which was basically asking for the same thing. I can create a PR if this is still needed.
Hi @glevv
@PaulWestenthanner I think I started on this and made a beta version, but after testing it I realized that I would have to go deeper and change a lot of things, and I did not have spare time for that then.
Ok nice! Do you want to continue working on it? Otherwise I might look into it in March/April
I don't think I will work on it in the near future. You can give it a go
I would like to work on it.
Hi Julia,
Hi @PaulWestenthanner! I refactored the code in count.py so that the functionality can be extracted and pulled into the base class, letting all classes inherit this feature. Some comments that I'd like to make: There's an attribute called
I would recommend to leave out this parameter and always go for the default for the following reasons:
If we do this refactor, the code shrinks by about a third and becomes easier to understand and maintain. What do you think?
@PaulWestenthanner Another simplification I'd suggest: there is currently another attribute called if the I think this parameter should be omitted because it seems to lead to inconsistent behavior and changing names. I suggest having a default name-wrangling scheme with no custom option. Then we get rid of this parameter and the code gets easier to understand. For name wrangling, I would suggest a name that's like If we do the above two suggestions, the min_group feature can be controlled by just one parameter, called
My suggestion for the integration into BaseEncoder: We add the Then all the subclasses have it without us having to change their code.
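To make the inheritance idea concrete, here is a minimal sketch of how a base class could apply small-group lumping once so that every subclass gets it for free. All names here (`SketchBaseEncoder`, `SketchOneHotEncoder`, `SketchOrdinalEncoder`, `other_label`, `_lump`) are hypothetical and only illustrate the pattern; this is not the library's `BaseEncoder` or its actual API, and it assumes a group is kept when its count is at least `min_group_size`.

```python
from collections import Counter


class SketchBaseEncoder:
    """Hypothetical base class: lumps small groups before delegating to subclasses."""

    def __init__(self, min_group_size=None, other_label="other"):
        self.min_group_size = min_group_size  # None disables lumping
        self.other_label = other_label        # assumed fixed default name
        self.kept_ = None

    def fit(self, values):
        counts = Counter(values)
        if self.min_group_size is None:
            self.kept_ = set(counts)
        else:
            # Assumption: a category survives if it occurs at least min_group_size times.
            self.kept_ = {c for c, n in counts.items() if n >= self.min_group_size}
        self._fit(self._lump(values))
        return self

    def transform(self, values):
        return self._transform(self._lump(values))

    def _lump(self, values):
        # Small groups (and categories unseen during fit) map to other_label.
        return [v if v in self.kept_ else self.other_label for v in values]

    def _fit(self, values):
        raise NotImplementedError

    def _transform(self, values):
        raise NotImplementedError


class SketchOneHotEncoder(SketchBaseEncoder):
    def _fit(self, values):
        self.columns_ = sorted(set(values))

    def _transform(self, values):
        return [[int(v == c) for c in self.columns_] for v in values]


class SketchOrdinalEncoder(SketchBaseEncoder):
    def _fit(self, values):
        self.mapping_ = {c: i for i, c in enumerate(sorted(set(values)), start=1)}

    def _transform(self, values):
        # -1 marks a value with no mapping (e.g. other_label never seen in fit).
        return [self.mapping_.get(v, -1) for v in values]


data = ["a", "a", "b", "b", "c"]
one_hot = SketchOneHotEncoder(min_group_size=2).fit(data)
ordinal = SketchOrdinalEncoder(min_group_size=2).fit(data)
print(one_hot.columns_)
print(one_hot.transform(["a", "c", "z"]))
print(ordinal.transform(["a", "c"]))
```

Neither subclass mentions `min_group_size` in its own code; the lumping lives entirely in the shared `fit`/`transform` wrappers, which is the point of pulling it into the base class.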
Hi Julia, thanks for all the work you've put into this. I really like your suggestions:
CountEncoder() has a min_group_size parameter that sets the minimum number of observations a group needs in order not to be lumped together with other small groups. It would be nice to have this for the other encoder classes.
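For readers unfamiliar with the behaviour being requested, here is a rough pure-Python sketch of the lumping step followed by a simple count encoding. The helper names (`lump_small_groups`, `count_encode`) and the `"other"` label are made up for illustration, and the threshold convention (a group is kept when its count is at least `min_group_size`) is an assumption, not a statement about the library's internals.

```python
from collections import Counter


def lump_small_groups(values, min_group_size, other_label="other"):
    """Replace categories seen fewer than min_group_size times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_group_size else other_label for v in values]


def count_encode(values):
    """Replace each category with the number of times it occurs."""
    counts = Counter(values)
    return [counts[v] for v in values]


colors = ["red", "red", "red", "blue", "green", "blue", "teal"]
lumped = lump_small_groups(colors, min_group_size=2)
print(lumped)                # green and teal are merged into "other"
print(count_encode(lumped))  # the merged group is counted as one category
```

Because the lumping is a preprocessing step on the category labels, the same `lumped` list could be fed to a one-hot or ordinal encoding instead of the count encoding shown here.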