
Robust irls for regression #130

Open · wants to merge 8 commits into main
Conversation

TimotheeMathieu (Contributor)
New optimization scheme for RobustWeightedRegressor.
It uses IRLS (iteratively reweighted least squares), which is faster in low dimension than the SGD counterpart.

More extensive experiments would be needed to compare the IRLS solver to the SGD solver, but on simple examples IRLS seems to perform rather well.

@rth (Contributor) left a comment
A few comments below. Thanks for this PR. Also, I don't understand how this works: I would have expected to see some code implementing the IRLS solver, but I couldn't find any.

sklearn_extra/robust/robust_weighted_estimator.py (outdated review threads, resolved)
if self.solver == "SGD":
    base_estimator.partial_fit(X, y)
else:
    base_estimator.fit(X, y)
rth (Contributor)

Why this difference? If it's OK to call fit, can't we call fit for all estimators? For SGD it's mostly equivalent.

TimotheeMathieu (Contributor, Author)

partial_fit only does one iteration; fit does not. Alternatively, I could use fit with max_iter=1. Would that be better?
It is important to use only one iteration because there may be outliers in the data: training on the whole dataset would mean that many steps are non-robust, and with SGD's decreasing step size the estimator may never recover. This is different for IRLS.
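For illustration, here is a minimal sketch (not code from this PR) of the distinction discussed above, using scikit-learn's SGDRegressor: a single partial_fit call performs one pass over the data, whereas fit with max_iter=1 and tol=None also performs a single pass, so the two are roughly equivalent up to stopping-criterion and learning-rate schedule details.

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(100)

# One epoch via partial_fit
est_partial = SGDRegressor(random_state=0)
est_partial.partial_fit(X, y)

# One epoch via fit (emits a ConvergenceWarning since max_iter is reached)
est_single = SGDRegressor(max_iter=1, tol=None, random_state=0)
est_single.fit(X, y)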

TimotheeMathieu commented Oct 19, 2021

The IRLS algorithm is a bit implicit, I agree. The principle is that we do a least-squares fit at each iteration (cf. line 392), and this least squares is reweighted using the custom robust weights:

for epoch in range(self.max_iter):
    weights = ...
    base_estimator.fit(X, y, sample_weight=weights)

So this is exactly an IRLS algorithm.
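For readers unfamiliar with IRLS, here is a self-contained sketch of that outer loop. The Huber-style weighting below is purely illustrative; it is not the actual robust weighting computed by RobustWeightedRegressor, and Ridge is used as a stand-in for the inner least-squares estimator.

import numpy as np
from sklearn.linear_model import Ridge

def irls_sketch(X, y, max_iter=20, c=1.35):
    base_estimator = Ridge(alpha=1.0)
    weights = np.ones(len(y))
    for epoch in range(max_iter):
        # full weighted least-squares fit at every outer iteration
        base_estimator.fit(X, y, sample_weight=weights)
        residuals = y - base_estimator.predict(X)
        scale = np.median(np.abs(residuals)) + 1e-12
        # Huber-type weights: downweight samples with large residuals
        weights = np.minimum(1.0, c * scale / np.maximum(np.abs(residuals), 1e-12))
    return base_estimator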

rth commented Oct 20, 2021

I see your point. I just find calling this option solver a bit confusing: as far as I understand, it would mix two different things, the inner solver and this outer iterative approach? We are iteratively reweighting an estimator with a least-squares loss in any case, and in that sense one could argue that it's IRLS for SGDClassifier too, can't we?

Wouldn't it be clearer to accept base_estimator as a parameter with a default of SGD? Also, I really think we should use Ridge instead of LinearRegression there, also to be consistent with SGDClassifier; otherwise we are not optimizing the same loss, and the solver shouldn't affect the loss. This would also remove the need for the sgd_kwargs parameter. In particular, if one wanted to do a grid search over the optimal regularization, this would currently not be obvious with the solver parameter.

So ideally, RobustWeightedRegressor().get_params() should also display the hyperparameters of the inner estimator, which is a prerequisite for using GridSearchCV. And it should if it's a meta-estimator, I think (we would need to check how it's done in scikit-learn meta-estimators, I don't remember).
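As a point of reference (an illustration with Pipeline, not code from this PR), scikit-learn meta-estimators expose inner hyperparameters through get_params(deep=True) using the "<step>__<param>" naming, which is what GridSearchCV relies on:

from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge

pipe = Pipeline([("model", Ridge())])
print(pipe.get_params(deep=True))   # contains e.g. "model__alpha"
pipe.set_params(model__alpha=0.5)   # nested parameters use the double-underscore syntax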

TimotheeMathieu (Contributor, Author)

> I see your point. I just find calling this option solver a bit confusing: as far as I understand, it would mix two different things, the inner solver and this outer iterative approach? We are iteratively reweighting an estimator with a least-squares loss in any case, and in that sense one could argue that it's IRLS for SGDClassifier too, can't we?

Yes, you are right, although we can use alternative losses for SGD. Maybe I can replace "solver" by "iterative_scheme" or something like that; it could take 'SGD' or 'OLS' (or 'Ridge') as input, but the algorithm is a little different for the two approaches: with SGD you do only one iteration in each loop, while for least squares we do the whole optimization at each step. Maybe I can use a base_estimator and then check: if max_iter is among its parameters, I set it to 1; otherwise I do the whole optimization at each (outer) step.
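A hypothetical sketch of that dispatch (the helper name and logic are illustrative, not from this PR):

def prepare_inner_estimator(base_estimator):
    # If the estimator exposes max_iter (e.g. SGD-like estimators), force a
    # single inner iteration per outer reweighting step; otherwise it will be
    # fully refit at each outer step (IRLS-like behaviour).
    if "max_iter" in base_estimator.get_params():
        base_estimator.set_params(max_iter=1)
    return base_estimator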

> Wouldn't it be clearer to accept base_estimator as a parameter with a default of SGD? Also, I really think we should use Ridge instead of LinearRegression there, also to be consistent with SGDClassifier; otherwise we are not optimizing the same loss, and the solver shouldn't affect the loss. This would also remove the need for the sgd_kwargs parameter. In particular, if one wanted to do a grid search over the optimal regularization, this would currently not be obvious with the solver parameter.

Yes, it is doable, but I really want to keep the SGD one-iteration-per-loop approach because it helps with stability. If you do the whole optimization at each iteration, there can be instability because in the first (outer) iterations the weights are not robust yet. OK for Ridge, but I don't understand why you say that this would remove the need for sgd_kwargs?

> So ideally, RobustWeightedRegressor().get_params() should also display the hyperparameters of the inner estimator, which is a prerequisite for using GridSearchCV. And it should if it's a meta-estimator, I think (we would need to check how it's done in scikit-learn meta-estimators, I don't remember).

I will check.

rth commented Oct 20, 2021

> OK for Ridge, but I don't understand why you say that this would remove the need for sgd_kwargs?

It wouldn't. I mean, if we go with supporting multiple base estimators, we should probably rename it to something that is independent of SGD (i.e. estimator_kwargs instead of ridge_kwargs and sgd_kwargs), or ideally use something that works with get_params/set_params and is compatible with GridSearchCV.
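Sketch of what that would enable (using Pipeline as a stand-in for the meta-estimator; step and parameter names are illustrative): once the inner estimator's parameters are reachable via get_params/set_params, a grid search over its regularization uses the standard nested syntax.

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge

meta = Pipeline([("base_estimator", Ridge())])
search = GridSearchCV(meta, param_grid={"base_estimator__alpha": [0.1, 1.0, 10.0]})
# search.fit(X, y) would then tune the inner Ridge's regularization directly.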
