Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable cohort filtering for "Predicted Y" #1722

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

gaugup
Copy link
Contributor

@gaugup gaugup commented Sep 15, 2022

Description

This PR enables cohort filtering capability for "Predicted Y". It enables this by adding the predicted values to the dataframe before the cohort filtering logic using the pandas query function. This PR has the following changes:-

  1. Add predicted y values in all cases in cohort_filter.py. If the predictions are available apriori then it adds those predictions.
  2. Adds tests in test_cohort_filter.py to run tests from PredictionAnalyzer which takes predictions instead of the model.
  3. Enables cohort filtering for Predicted Y case.
  4. Other changes in erroranalysis package to handle scenario when predicted y is present on the filtered data.

Checklist

  • I have added screenshots above for all UI changes.
  • I have added e2e tests for all UI changes.
  • Documentation was updated if it was needed.

Signed-off-by: Gaurav Gupta <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Sep 15, 2022

Codecov Report

Merging #1722 (71bfc81) into main (32408d5) will decrease coverage by 0.68%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1722      +/-   ##
==========================================
- Coverage   92.52%   91.84%   -0.68%     
==========================================
  Files          55       35      -20     
  Lines        2127      699    -1428     
==========================================
- Hits         1968      642    -1326     
+ Misses        159       57     -102     
Flag Coverage Δ
unittests 91.84% <ø> (-0.68%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 20 files with indirect coverage changes

1 similar comment
else:
if not is_spark(self.dataset):
if not isinstance(self.dataset, pd.DataFrame):
df[PRED_Y] = self.model.predict(self.dataset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we may be calculating PRED_Y even if we don't need it for filtering?
That is why the calculation was done in has_classification_outcome instead previously, so we would only call predict (which is expensive) if we really need it:

        if has_classification_outcome:
            if PRED_Y in df:
                pred_y = df[PRED_Y]
            else:
                # calculate directly via prediction on model
                pred_y = self.model.predict(
                    df.drop(columns=[TRUE_Y, ROW_INDEX]))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just have a similar check to has_classification_outcome, such as has_predicted_Y, if there is a "Predicted Y" filter passed here - and only then call the predict function.

model_task,
filters=filters)

# @pytest.mark.skip("Skipping this test due to a bug condition "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented code here (to re-enable test)

return validation_data


def run_different_error_analyzers(validation_data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe rename this to just
run_error_analyzers
I think this is shorter and more succinct, since "different" doesn't add more information to the method name

1 similar comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants