Enable cohort filtering for "Predicted Y" #1722
base: main
Conversation
Signed-off-by: Gaurav Gupta <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #1722 +/- ##
==========================================
- Coverage 92.52% 91.84% -0.68%
==========================================
Files 55 35 -20
Lines 2127 699 -1428
==========================================
- Hits 1968 642 -1326
+ Misses 159 57 -102
Flags with carried forward coverage won't be shown.
else:
    if not is_spark(self.dataset):
        if not isinstance(self.dataset, pd.DataFrame):
            df[PRED_Y] = self.model.predict(self.dataset)
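The snippet above stores predictions on the dataframe so the cohort filter can reference them. As a rough self-contained illustration of that idea only (the column names, the stand-in predictions, and the query string here are assumptions, not the PR's actual code):

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 40, 31], "y_true": [0, 1, 1]})

# Stand-in for model.predict(...); a real model would supply these values.
predictions = [0, 1, 0]

# Add the predicted values before applying the cohort filter.
df["pred_y"] = predictions

# Cohort filtering via a pandas query on the prediction column.
cohort = df.query("pred_y == 1")
print(len(cohort))
```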
Here we may be calculating PRED_Y even if we don't need it for filtering?
That is why the calculation was previously done under has_classification_outcome instead, so that we would only call predict (which is expensive) if we really need it:
if has_classification_outcome:
    if PRED_Y in df:
        pred_y = df[PRED_Y]
    else:
        # calculate directly via prediction on model
        pred_y = self.model.predict(
            df.drop(columns=[TRUE_Y, ROW_INDEX]))
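The lazy pattern described in this comment can be sketched as a small self-contained example. The constant values and the stub model below are assumptions made for illustration; only the reuse-or-predict logic mirrors the snippet above:

```python
import pandas as pd

# Column-name constants mirroring the snippet above (assumed values).
PRED_Y = "Predicted Y"
TRUE_Y = "True Y"
ROW_INDEX = "Index"


class StubModel:
    """Stand-in model: predicts 1 when the feature sum is positive."""

    def __init__(self):
        self.calls = 0  # track how often the expensive predict runs

    def predict(self, df):
        self.calls += 1
        return (df.sum(axis=1) > 0).astype(int)


def get_pred_y(df, model, has_classification_outcome):
    """Only compute predictions when needed, reusing cached ones."""
    if not has_classification_outcome:
        return None
    if PRED_Y in df:
        # reuse precomputed predictions instead of calling predict again
        return df[PRED_Y]
    # calculate directly via prediction on model
    return model.predict(df.drop(columns=[TRUE_Y, ROW_INDEX]))


df = pd.DataFrame({"f0": [1, -2, 3],
                   TRUE_Y: [1, 0, 1],
                   ROW_INDEX: [0, 1, 2]})
model = StubModel()
pred_y = get_pred_y(df, model, has_classification_outcome=True)
print(list(pred_y), model.calls)
```

Caching the result back onto the dataframe (`df[PRED_Y] = pred_y`) means subsequent calls hit the cheap branch and never re-invoke predict.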
We can just have a check similar to has_classification_outcome, such as has_predicted_y: only call the predict function if a "Predicted Y" filter is passed here.
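A minimal sketch of that suggestion, assuming the cohort filters arrive as a list of dicts with a `column` key (the filter shape and the helper name are assumptions, not the library's actual API):

```python
PRED_Y = "Predicted Y"  # assumed column-name constant


def has_predicted_y_filter(filters):
    """Return True if any cohort filter references the "Predicted Y" column.

    Assumes each filter is a dict with a 'column' key; the real filter
    schema may differ.
    """
    return any(f.get("column") == PRED_Y for f in filters or [])


# Only call the expensive predict when a "Predicted Y" filter is present.
filters = [{"column": "Predicted Y", "method": "includes", "arg": [1]}]
print(has_predicted_y_filter(filters))  # True
print(has_predicted_y_filter([]))       # False
print(has_predicted_y_filter(None))     # False
```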
model_task,
filters=filters)
# @pytest.mark.skip("Skipping this test due to a bug condition " |
remove commented code here (to re-enable test)
return validation_data
def run_different_error_analyzers(validation_data, |
nit: maybe rename this to just run_error_analyzers. I think this is shorter and more succinct, since "different" doesn't add information to the method name.
Description
This PR enables cohort filtering capability for "Predicted Y". It enables this by adding the predicted values to the dataframe before the cohort filtering logic, which uses the pandas query function. This PR has the following changes:
- cohort_filter.py: adds the predicted values to the dataframe. If the predictions are available a priori, then it adds those predictions.
- PredictionAnalyzer, which takes predictions instead of the model.
- The Predicted Y case.
Checklist