
Robust irls for regression #130

Open · wants to merge 8 commits into main
Conversation

TimotheeMathieu (Contributor)
New optimization scheme for RobustWeightedRegressor.
It uses IRLS (iteratively reweighted least squares), which is faster in low dimension than the SGD counterpart.

More extensive experiments would be needed to compare the IRLS solver to the SGD solver, but on simple examples IRLS seems to perform rather well.

@rth (Contributor) left a comment
A few comments below. Thanks for this PR. Also, I don't understand how this works: I would have expected to see some code implementing the IRLS solver, but I couldn't find any.

sklearn_extra/robust/robust_weighted_estimator.py (outdated review threads, resolved)
if self.solver == "SGD":
    base_estimator.partial_fit(X, y)
else:
    base_estimator.fit(X, y)
rth (Contributor)

Why this difference? If it's OK to call fit, can't we call fit for all estimators? For SGD it's mostly equivalent.

TimotheeMathieu (Contributor, Author)

partial_fit only does one iteration; fit does not. Alternatively, I could use fit with max_iter=1. Would that be better?
It is important to use only one iteration because there may be outliers in the data: training on the whole dataset would mean that many steps are non-robust, and with SGD's decreasing step size the estimator may never recover. This is different for IRLS.
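For illustration, here is a minimal sketch (not code from this PR) of the distinction discussed above, using scikit-learn's SGDRegressor: a single partial_fit call performs one pass over the data, whereas fit with max_iter=1 and tol=None also performs a single pass, so the two are roughly equivalent up to stopping-criterion and learning-rate schedule details.

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(100)

# One epoch via partial_fit
est_partial = SGDRegressor(random_state=0)
est_partial.partial_fit(X, y)

# One epoch via fit (emits a ConvergenceWarning since max_iter is reached)
est_single = SGDRegressor(max_iter=1, tol=None, random_state=0)
est_single.fit(X, y)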

TimotheeMathieu commented Oct 19, 2021

The IRLS algorithm is a bit implicit, I agree. The principle is that we do a least-squares fit at each iteration (cf. line 392), and this least squares is reweighted using the custom robust weights:

for epoch in range(self.max_iter):
    weights = ...
    base_estimator.fit(X, y, sample_weight=weights)

So this is exactly an IRLS algorithm.
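For readers unfamiliar with IRLS, here is a self-contained sketch of that outer loop. The Huber-style weighting below is purely illustrative; it is not the actual robust weighting computed by RobustWeightedRegressor, and Ridge is used as a stand-in for the inner least-squares estimator.

import numpy as np
from sklearn.linear_model import Ridge

def irls_sketch(X, y, max_iter=20, c=1.35):
    base_estimator = Ridge(alpha=1.0)
    weights = np.ones(len(y))
    for epoch in range(max_iter):
        # full weighted least-squares fit at every outer iteration
        base_estimator.fit(X, y, sample_weight=weights)
        residuals = y - base_estimator.predict(X)
        scale = np.median(np.abs(residuals)) + 1e-12
        # Huber-type weights: downweight samples with large residuals
        weights = np.minimum(1.0, c * scale / np.maximum(np.abs(residuals), 1e-12))
    return base_estimator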

rth commented Oct 20, 2021

I see your point. I just find calling this option solver a bit confusing: as far as I understand, it would mix two different things, the inner solver and this outer iterative approach? We are iteratively reweighting an estimator with a least-squares loss in any case, and in that sense one could argue that it's IRLS for SGDClassifier too, can't we?

Wouldn't it be clearer to accept base_estimator as a parameter with a default of SGD? Also, I really think we should use Ridge instead of LinearRegression there, also to be consistent with SGDClassifier; otherwise we are not optimizing the same loss, and the solver shouldn't affect the loss. This would also remove the need for the sgd_kwargs parameter. In particular, if one wanted to do a grid search over the optimal regularization, this would currently not be obvious with the solver parameter.

So ideally, RobustWeightedRegressor().get_params() should also display the hyperparameters of the inner estimator, which is a prerequisite for using GridSearchCV. And it should if it's a meta-estimator, I think (we would need to check how it's done in scikit-learn meta-estimators, I don't remember).
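As a point of reference (an illustration with Pipeline, not code from this PR), scikit-learn meta-estimators expose inner hyperparameters through get_params(deep=True) using the "<step>__<param>" naming, which is what GridSearchCV relies on:

from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge

pipe = Pipeline([("model", Ridge())])
print(pipe.get_params(deep=True))   # contains e.g. "model__alpha"
pipe.set_params(model__alpha=0.5)   # nested parameters use the double-underscore syntax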

TimotheeMathieu (Contributor, Author)

> I see your point. I just find calling this option solver a bit confusing: as far as I understand, it would mix two different things, the inner solver and this outer iterative approach? We are iteratively reweighting an estimator with a least-squares loss in any case, and in that sense one could argue that it's IRLS for SGDClassifier too, can't we?

Yes, you are right, although we can use alternative losses for SGD. Maybe I can replace "solver" by "iterative_scheme" or something like that; it could take 'SGD' or 'OLS' (or 'Ridge') as input, but the algorithm is a little different for the two approaches: with SGD you do only one iteration in each loop, while for least squares we do the whole optimization at each step. Maybe I can use a base_estimator and then check: if max_iter is among its parameters, I set it to 1; otherwise I do the whole optimization at each (outer) step.
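A hypothetical sketch of that dispatch (the helper name and logic are illustrative, not from this PR):

def prepare_inner_estimator(base_estimator):
    # If the estimator exposes max_iter (e.g. SGD-like estimators), force a
    # single inner iteration per outer reweighting step; otherwise it will be
    # fully refit at each outer step (IRLS-like behaviour).
    if "max_iter" in base_estimator.get_params():
        base_estimator.set_params(max_iter=1)
    return base_estimator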

> Wouldn't it be clearer to accept base_estimator as a parameter with a default of SGD? Also, I really think we should use Ridge instead of LinearRegression there, also to be consistent with SGDClassifier; otherwise we are not optimizing the same loss, and the solver shouldn't affect the loss. This would also remove the need for the sgd_kwargs parameter. In particular, if one wanted to do a grid search over the optimal regularization, this would currently not be obvious with the solver parameter.

Yes, it is doable, but I really want to keep the SGD one-iteration-per-loop approach because it helps with stability. If you do the whole optimization at each iteration, there can be instability because in the first (outer) iterations the weights are not robust yet. OK for Ridge, but I don't understand why you say that this would remove the need for sgd_kwargs?

> So ideally, RobustWeightedRegressor().get_params() should also display the hyperparameters of the inner estimator, which is a prerequisite for using GridSearchCV. And it should if it's a meta-estimator, I think (we would need to check how it's done in scikit-learn meta-estimators, I don't remember).

I will check.

rth commented Oct 20, 2021

> OK for Ridge, but I don't understand why you say that this would remove the need for sgd_kwargs?

It wouldn't. I mean, if we go with supporting multiple base estimators, we should probably rename it to something that is independent of SGD (i.e. estimator_kwargs instead of ridge_kwargs and sgd_kwargs), or ideally use something that works with get_params/set_params and is compatible with GridSearchCV.
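Sketch of what that would enable (using Pipeline as a stand-in for the meta-estimator; step and parameter names are illustrative): once the inner estimator's parameters are reachable via get_params/set_params, a grid search over its regularization uses the standard nested syntax.

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge

meta = Pipeline([("base_estimator", Ridge())])
search = GridSearchCV(meta, param_grid={"base_estimator__alpha": [0.1, 1.0, 10.0]})
# search.fit(X, y) would then tune the inner Ridge's regularization directly.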
