Robust IRLS for regression #130
base: main
Conversation
A few comments below. Thanks for this PR. Also, I don't understand how this works: I would have expected to see some code implementing the IRLS solver, but I couldn't find any.
```python
if self.solver == "SGD":
    base_estimator.partial_fit(X, y)
else:
    base_estimator.fit(X, y)
```
Why this difference? If it's OK to call fit, then can't we call fit for all estimators? For SGD it's mostly equivalent.
partial_fit only does one iteration; fit does not. Alternatively, I could use fit with max_iter=1, would this be better?

It is important to use only one iteration because there may be outliers in the data, and training on the whole dataset would mean that many of the steps are non-robust; with SGD and a decreasing step size, the estimator may never recover from them. This is different for IRLS.
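To make the contrast concrete, here is a minimal sketch of the scheme described above: one partial_fit epoch per outer loop, with the weights recomputed between steps. The `sgd_robust_fit` name and the crude `1/(1+|r|)` downweighting are illustrative stand-ins, not the PR's actual code or its Huber/MOM weights.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

def sgd_robust_fit(X, y, max_iter=100):
    # Decreasing step size: early non-robust steps would be hard to
    # undo later, hence only a single epoch per outer iteration.
    est = SGDRegressor(learning_rate="invscaling")
    weights = np.ones(len(y))
    for epoch in range(max_iter):
        est.partial_fit(X, y, sample_weight=weights)  # one pass only
        residuals = y - est.predict(X)
        # Stand-in robust weighting: downweight large residuals.
        weights = 1.0 / (1.0 + np.abs(residuals))
    return est
```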
The IRLS algorithm is a bit implicit, I agree. The principle is that we do least squares at each iteration (cf. line 392), and this least squares is reweighted using the custom robust weights:

```python
for epoch in range(self.max_iter):
    weights = ...
    base_estimator.fit(X, y, sample_weight=weights)
```

So this is exactly an IRLS algorithm.
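Spelled out as a self-contained sketch of the loop just described: a full weighted least squares fit at each step, with the weights recomputed from the residuals. The `huber_weights` helper and the choice of `LinearRegression` are assumptions for illustration, not the PR's implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def huber_weights(residuals, c=1.35):
    """Illustrative Huber-style weights: unit weight for small residuals,
    weights decaying as c/|r| for large ones."""
    scale = np.median(np.abs(residuals)) + 1e-12  # crude robust scale
    r = np.abs(residuals) / scale
    return np.where(r <= c, 1.0, c / r)

def irls_fit(X, y, max_iter=20):
    """IRLS sketch: each outer step runs a complete weighted least squares."""
    base_estimator = LinearRegression()
    weights = np.ones(len(y))
    for epoch in range(max_iter):
        base_estimator.fit(X, y, sample_weight=weights)
        residuals = y - base_estimator.predict(X)
        weights = huber_weights(residuals)
    return base_estimator
```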
I see your point. I just find calling this option `solver` a bit confusing. Wouldn't it be clearer to accept a base estimator directly? So ideally, this would also remove the need for `sgd_kwargs`.
Yes, you are right, although we can use alternative losses for SGD. Maybe I can replace "solver" with "iterative_scheme" or something like that; it could take 'SGD' or 'OLS' (or 'Ridge') as input. But the algorithm is a little different for the two approaches: with SGD you do only one iteration in each loop, while for least squares we do the whole optimization at each step. Maybe I can use a base_estimator and then check: if max_iter is in its parameters, I set it to 1; otherwise I do the whole optimization at each (outer) step, as in the sketch below.
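A sketch of that check, using the standard scikit-learn get_params/set_params API; the `prepare_base_estimator` name is hypothetical.

```python
from sklearn.base import clone

def prepare_base_estimator(base_estimator):
    """If the estimator exposes max_iter, force one inner iteration per
    outer loop (the SGD-style scheme); otherwise let it run its full
    optimization at each outer step (the IRLS-style scheme)."""
    est = clone(base_estimator)  # don't mutate the user's estimator
    if "max_iter" in est.get_params():
        est.set_params(max_iter=1)
    return est
```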
Yes, it is doable, but I really want to keep the SGD one-iteration-per-loop approach because it helps with stability. If you do the whole optimization at each iteration, there can be instability, because in the first (outer) iterations the weights are not robust yet. OK for Ridge, but I don't understand why you say that this would remove the need for sgd_kwargs?
I will check.
It wouldn't. I mean, if we go with supporting multiple base estimators, we probably should rename that to something that's independent from SGD.
New optimization scheme for RobustWeightedRegressor.
Use IRLS (iteratively reweighted least squares). Faster in low dimension than the SGD counterpart.
More extensive experiments would be needed to compare the IRLS solver to the SGD solver, but on simple examples IRLS seems to perform rather well.
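For context, a hypothetical usage sketch of the proposed option. `solver="IRLS"` follows the naming debated above and is an assumption about the PR's final API, not a released parameter; only `weighting="huber"` exists in the current RobustWeightedRegressor.

```python
import numpy as np
from sklearn_extra.robust import RobustWeightedRegressor

rng = np.random.RandomState(42)
X = rng.uniform(-1, 1, size=(100, 1))
y = X.ravel() + 0.1 * rng.randn(100)
y[:5] = 10.0  # a few gross outliers

# Hypothetical: solver="IRLS" is this PR's proposed option and may be
# renamed (e.g. to a base_estimator argument) per the review above.
reg = RobustWeightedRegressor(weighting="huber", solver="IRLS")
reg.fit(X, y)
print(reg.predict(X[:3]))
```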