Enable cohort filtering for "Predicted Y" #1722
base: main
Conversation
Signed-off-by: Gaurav Gupta <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #1722 +/- ##
==========================================
- Coverage 92.52% 91.84% -0.68%
==========================================
Files 55 35 -20
Lines 2127 699 -1428
==========================================
- Hits 1968 642 -1326
+ Misses 159 57 -102
Flags with carried forward coverage won't be shown.
else:
    if not is_spark(self.dataset):
        if not isinstance(self.dataset, pd.DataFrame):
            df[PRED_Y] = self.model.predict(self.dataset)
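The snippet above stores predictions on the dataframe so the cohort filter can reference them. As a rough self-contained illustration of that idea only (the column names, the stand-in predictions, and the query string here are assumptions, not the PR's actual code):

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 40, 31], "y_true": [0, 1, 1]})

# Stand-in for model.predict(...); a real model would supply these values.
predictions = [0, 1, 0]

# Add the predicted values before applying the cohort filter.
df["pred_y"] = predictions

# Cohort filtering via a pandas query on the prediction column.
cohort = df.query("pred_y == 1")
print(len(cohort))
```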
Here we may be calculating PRED_Y even if we don't need it for filtering?
That is why the calculation was previously done under has_classification_outcome instead, so that we would only call predict (which is expensive) if we really need it:
if has_classification_outcome:
    if PRED_Y in df:
        pred_y = df[PRED_Y]
    else:
        # calculate directly via prediction on model
        pred_y = self.model.predict(
            df.drop(columns=[TRUE_Y, ROW_INDEX]))
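The lazy pattern described in this comment can be sketched as a small self-contained example. The constant values and the stub model below are assumptions made for illustration; only the reuse-or-predict logic mirrors the snippet above:

```python
import pandas as pd

# Column-name constants mirroring the snippet above (assumed values).
PRED_Y = "Predicted Y"
TRUE_Y = "True Y"
ROW_INDEX = "Index"


class StubModel:
    """Stand-in model: predicts 1 when the feature sum is positive."""

    def __init__(self):
        self.calls = 0  # track how often the expensive predict runs

    def predict(self, df):
        self.calls += 1
        return (df.sum(axis=1) > 0).astype(int)


def get_pred_y(df, model, has_classification_outcome):
    """Only compute predictions when needed, reusing cached ones."""
    if not has_classification_outcome:
        return None
    if PRED_Y in df:
        # reuse precomputed predictions instead of calling predict again
        return df[PRED_Y]
    # calculate directly via prediction on model
    return model.predict(df.drop(columns=[TRUE_Y, ROW_INDEX]))


df = pd.DataFrame({"f0": [1, -2, 3],
                   TRUE_Y: [1, 0, 1],
                   ROW_INDEX: [0, 1, 2]})
model = StubModel()
pred_y = get_pred_y(df, model, has_classification_outcome=True)
print(list(pred_y), model.calls)
```

Caching the result back onto the dataframe (`df[PRED_Y] = pred_y`) means subsequent calls hit the cheap branch and never re-invoke predict.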
We can just have a check similar to has_classification_outcome, such as has_predicted_y: only call the predict function if a "Predicted Y" filter is passed here.
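A minimal sketch of that suggestion, assuming the cohort filters arrive as a list of dicts with a `column` key (the filter shape and the helper name are assumptions, not the library's actual API):

```python
PRED_Y = "Predicted Y"  # assumed column-name constant


def has_predicted_y_filter(filters):
    """Return True if any cohort filter references the "Predicted Y" column.

    Assumes each filter is a dict with a 'column' key; the real filter
    schema may differ.
    """
    return any(f.get("column") == PRED_Y for f in filters or [])


# Only call the expensive predict when a "Predicted Y" filter is present.
filters = [{"column": "Predicted Y", "method": "includes", "arg": [1]}]
print(has_predicted_y_filter(filters))  # True
print(has_predicted_y_filter([]))       # False
print(has_predicted_y_filter(None))     # False
```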
model_task,
filters=filters)
# @pytest.mark.skip("Skipping this test due to a bug condition " |
remove commented code here (to re-enable test)
return validation_data
def run_different_error_analyzers(validation_data, |
nit: maybe rename this to just run_error_analyzers. I think this is shorter and more succinct, since "different" doesn't add information to the method name.
Description
This PR enables cohort filtering capability for "Predicted Y". It enables this by adding the predicted values to the dataframe before the cohort filtering logic, which uses the pandas query function. This PR has the following changes:
- cohort_filter.py: adds the predicted values to the dataframe. If the predictions are available a priori, then it adds those predictions.
- PredictionAnalyzer, which takes predictions instead of the model.
- The Predicted Y case.
Checklist