-
-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FilterAUC and missing values #60
Comments
Could this also apply to more filters than just |
why dont we throw an error? if NAs are there. |
i guess ignoring the NA-based obs in the calculation is "cleanest" and most robust as michel suggested. but we should probably then do this in a global place, unit test this properly and also document it visibly |
a3e43f9 replaced Metrics and ignores NAs, but we still need tests and check the behaviour of other filters. |
Ignoring Nas is actually wrong after thinking about this more pls don't merge / release this without further discussion |
If you have a feature with 98% missing values and for the remainder there is a high or perfect correlation with the target that feature would get a very high score. That's wrong? Nas should be an error for filters. And users should transparently impute them. Agreed? |
@mllg Looking at |
For filters, Bernd suggested throwing an error. I assume this is the safest way to deal with this. If we want to allow missing values by just removing them (as |
Right now it seems like NAs are removed prior to score calculation Lines 42 to 46 in 13988d3
I would add an assertion which checks for NAs in any feature and apply this to every filter with a descriptive error message to use a pipeop to impute these values? |
FilterAUC
operates on features with missing values by just ranking the missing values last (default inrank()
). I'm not sure that this is statistically sound.I'd suggest removing them and calculate the AUC on the remaining observations.
@berndbischl @pat-s ?
The text was updated successfully, but these errors were encountered: