
Add robust metric #122

Open
wants to merge 16 commits into main
Conversation

Contributor

@TimotheeMathieu TimotheeMathieu commented Jun 25, 2021

This PR uses the Huber robust mean estimator to build a robust metric.

Description: one of the big challenges of robust machine learning is that the usual scoring scheme (cross-validation with MSE, for instance) is not robust. If the dataset contains outliers, then the test sets in cross-validation may contain outliers too, and the cross-validation MSE would then report a huge error for our robust algorithm on any corrupted data. This is one reason why robust methods cannot be competitive in Kaggle regression challenges: the error computation itself is not robust.
This PR proposes a robust metric that allows us to compute, for instance, a robust cross-validation MSE.

Example:

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn_extra.robust import make_huber_metric

robust_mse = make_huber_metric(mean_squared_error, c=9)
# c = 9 -> more than 99% of a standard normal is within [-3, 3],
# hence more than 99% of its square is within [0, 9].

y_true = np.random.normal(size=100)
y_true_cor = y_true.copy()
y_true_cor[42] = 20 # this is an outlier in the test set
y_pred = np.random.normal(size=100)


print('MSE on uncorrupted : %.3f ' % (mean_squared_error(y_true, y_pred)))
print('Robust MSE on uncorrupted : %.3f ' % (robust_mse(y_true, y_pred)))
print('MSE on corrupted : %.3f ' % (mean_squared_error(y_true_cor, y_pred)))
print('Robust MSE on corrupted : %.3f ' % (robust_mse(y_true_cor, y_pred)))

This returns

MSE on uncorrupted : 2.152 
Robust MSE on uncorrupted : 2.072 
MSE on corrupted : 7.202 
Robust MSE on corrupted : 2.072 

Contributor

@rth left a comment


Thanks @TimotheeMathieu !

I was wondering if @lorentzenchr has an opinion on this by any chance?

I think we would at least need more tests, for instance checking that cross-validation works with the resulting metric. Also, I think all scikit-learn metrics support sample_weight. Would it make sense to add it here?

@lorentzenchr

Having a Huber loss available as a metric makes sense for models fitted with the Huber loss.

Be aware that the Huber loss elicits something in between the median and the expectation, so it is not really clear what you get/estimate. The omnipresent point about the MSE not being robust has at least 2 important aspects:

  • The (estimation of the) expectation (mean) is not robust in general.
  • There are alternatives for the MSE, in particular for positive targets, that elicit the expectation, e.g. all tweedie deviances.
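As a quick illustration of the second point, scikit-learn already exposes Tweedie deviances as metrics; a minimal sketch (not part of this PR), assuming positive targets:

```python
import numpy as np
from sklearn.metrics import mean_tweedie_deviance

rng = np.random.default_rng(0)
y_true = rng.gamma(shape=2.0, scale=1.0, size=100)  # positive targets
y_pred = np.full_like(y_true, y_true.mean())        # constant baseline prediction

# power=2 is the Gamma deviance: it penalizes relative errors, so a single
# very large target inflates it far less than the squared error would.
d = mean_tweedie_deviance(y_true, y_pred, power=2)
print(d)
```

(`power=1` would give the Poisson deviance, `power=0` the plain MSE.)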

Last but not least, my all time favorite reference: https://arxiv.org/abs/0912.0902

Contributor Author

TimotheeMathieu commented Jun 26, 2021

Thanks for the comments.

@lorentzenchr what I did is not the Huber loss. It is a robust estimator of the mean applied to the squared errors.
I used the MSE only as an example; I can also build a robust version of the mean absolute error with make_huber_metric(mean_absolute_error, c=9). This is very different, because the aim is still to estimate the MSE or mean absolute error, just while ignoring the outliers. I don't use a different loss function; I use a different way to estimate the mean in MEAN squared error and MEAN absolute error, because the empirical mean is not robust while the Huber estimator is.
This can be confusing for people used to the Huber loss, but in fact this is very different, and since it is also due to Huber I can't really change the name.

  • This is always robust, and we know this from theory.
  • We still estimate the MSE or MAE; we don't really change the metric being estimated, we just make its estimation robust. So it can still be interpreted as one would usually interpret an MSE or MAE.
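To make the distinction concrete, here is a minimal sketch of the idea, not the actual sklearn-extra implementation (`huber_mean` is a hypothetical helper): a Huber location estimate, computed by iterative reweighting, applied to the squared errors.

```python
import numpy as np

def huber_mean(x, c=9.0, n_iter=50):
    """Illustrative Huber location estimate via iterative reweighting.

    Points with |x - mu| <= c keep weight 1; points further away are
    down-weighted by c / |x - mu|, so outliers barely move the estimate.
    """
    mu = np.median(x)  # robust starting point
    for _ in range(n_iter):
        dist = np.abs(x - mu)
        w = np.where(dist <= c, 1.0, c / np.maximum(dist, 1e-12))
        mu = np.sum(w * x) / np.sum(w)
    return mu

rng = np.random.default_rng(0)
errors = rng.normal(size=100) ** 2   # squared residuals
errors_cor = errors.copy()
errors_cor[42] = 400.0               # one corrupted squared error

# The plain mean jumps after corruption; the Huber estimate barely moves.
print(errors_cor.mean(), huber_mean(errors_cor))
```

With c = 9, all typical squared residuals keep full weight, so on clean data the estimate stays close to the plain mean; it only departs from it when extreme values appear.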

If you want references, there is for instance Robust estimation of a location parameter by Huber or, more recently, Challenging the empirical mean and empirical variance: a deviation study by Catoni.

EDIT : I added an explanation in the user guide that gives some equations to explain this.

@lorentzenchr

@TimotheeMathieu Thanks for the explanation. Now I get it. Something that could be mentioned in the example is the trimmed mean as a simpler entry point to robust estimation.
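For instance, the trimmed mean applied to the squared errors could be sketched with `scipy.stats.trim_mean` (an illustration, not part of this PR):

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)
errors = rng.normal(size=100) ** 2   # squared residuals
errors[42] = 400.0                   # one corrupted squared error

# Drop the most extreme 5% on each side before averaging, so the
# single corrupted value is discarded instead of dominating the mean.
print(errors.mean(), trim_mean(errors, proportiontocut=0.05))
```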
