This repository has been archived by the owner on Dec 6, 2023. It is now read-only.

DOC: sometimes the Lasso solution is the same as sklearn, sometimes not #186

Open

mathurinm opened this issue Dec 17, 2021 · 5 comments

@mathurinm
mathurinm commented Dec 17, 2021

Hi @mblondel @fabianp
I think this should be quick to answer: why is the solution sometimes equal to sklearn's, and sometimes not?

This is quick to reproduce; look at the 1st and 3rd results over 5 seeds:

import numpy as np
from numpy.linalg import norm
from lightning.regression import CDRegressor
from sklearn.linear_model import Lasso

np.random.seed(0)
X = np.random.randn(200, 500)
beta = np.ones(X.shape[1])
beta[20:] = 0
y = X @ beta + 0.3 * np.random.randn(X.shape[0])
alpha = norm(X.T @ y, ord=np.inf) / 10


def p_obj(X, y, alpha, w):
    return norm(y - X @ w) ** 2 / 2 + alpha * norm(w, ord=1)


for seed in range(5):
    print('-' * 80)
    clf = CDRegressor(C=0.5, alpha=alpha, penalty='l1',
                      tol=1e-30, random_state=seed)
    clf.fit(X, y)

    las = Lasso(fit_intercept=False, alpha=alpha/len(y), tol=1e-10).fit(X, y)
    print(norm(clf.coef_[0] - las.coef_))

    light_o = p_obj(X, y, alpha, clf.coef_[0])
    sklea_o = p_obj(X, y, alpha, las.coef_)

    print(light_o - sklea_o)

ping @QB3 @agramfort
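
Beyond comparing objective values, the Lasso KKT conditions give a solver-independent optimality check: at a minimizer w of 0.5 * ||y - Xw||^2 + alpha * ||w||_1, the correlation X.T @ (y - X @ w) must equal alpha * sign(w_j) on the support and have magnitude at most alpha elsewhere. A numpy-only sketch (kkt_violation is a hypothetical helper, not part of lightning or sklearn):

```python
import numpy as np


def kkt_violation(X, y, alpha, w):
    """Max violation of the Lasso KKT conditions at w.

    For 0.5 * ||y - X @ w||**2 + alpha * ||w||_1, optimality requires
    |X.T @ r|_j <= alpha with r = y - X @ w, and equality with the right
    sign on the support.  Returns 0 at an exact minimizer.
    """
    grad = X.T @ (y - X @ w)  # minus the gradient of the smooth term
    return np.where(
        w != 0,
        np.abs(grad - alpha * np.sign(w)),      # support: must hold exactly
        np.maximum(np.abs(grad) - alpha, 0.0),  # off-support: must be <= alpha
    ).max()
```

Whichever of clf.coef_[0] and las.coef_ has the smaller violation is closer to a true minimizer.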

@mathurinm
Author

Setting permute=False fixes the issue. There may be a bug, because permuting the feature order is not a heuristic that should prevent convergence (I may be missing what permute does).

@agramfort
Member

agramfort commented Dec 19, 2021 via email

@mblondel
Member

Thanks for the repro @mathurinm!

The permute option indeed just permutes the coordinates:
https://github.com/scikit-learn-contrib/lightning/blob/master/lightning/impl/primal_cd_fast.pyx#L1306

permute=True, shrinking=False works, so it seems to be the combination of permute=True and shrinking=True that is problematic.
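
For readers unfamiliar with the option: shrinking drops coordinates that look inactive from later coordinate-descent passes. A toy numpy sketch of cyclic CD for the Lasso with a deliberately crude shrinking rule (cd_lasso is illustrative only; "drop anything currently at zero" is NOT lightning's actual heuristic) shows the kind of rule that can stop short of the optimum when a dropped coordinate should later re-enter:

```python
import numpy as np


def cd_lasso(X, y, alpha, n_iter=200, shrink=False):
    """Toy cyclic CD for 0.5 * ||y - X @ w||**2 + alpha * ||w||_1.

    With shrink=True, coordinates left at zero after a full pass are
    removed from subsequent passes (a crude stand-in for shrinking,
    not lightning's rule -- dropped coordinates never re-enter).
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    lips = (X ** 2).sum(axis=0)   # per-coordinate Lipschitz constants
    active = np.arange(n_features)
    r = y - X @ w                 # residual, kept up to date
    for _ in range(n_iter):
        for j in active:
            if lips[j] == 0.0:
                continue
            old = w[j]
            z = old + X[:, j] @ r / lips[j]
            # soft-thresholding update for coordinate j
            w[j] = np.sign(z) * max(abs(z) - alpha / lips[j], 0.0)
            if w[j] != old:
                r -= (w[j] - old) * X[:, j]
        if shrink:
            active = active[w[active] != 0]  # zeroed coordinates are dropped
    return w
```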

@mathurinm
Author

I also thought the issue arose only when both were True, but in fact, for a small L1 penalty strength, permute=False, shrinking=True gives different results too:

import numpy as np
from numpy.linalg import norm
from lightning.regression import CDRegressor
from sklearn.linear_model import Lasso

np.random.seed(0)
X = np.random.randn(200, 500)
beta = np.ones(X.shape[1])
beta[20:] = 0
y = X @ beta + 0.3 * np.random.randn(X.shape[0])
alpha = norm(X.T @ y, ord=np.inf) / 100


def p_obj(X, y, alpha, w):
    return norm(y - X @ w) ** 2 / 2 + alpha * norm(w, ord=1)


for shrinking in (True, False):
    seed = 0
    print('-' * 80)
    print(f"With shrinking={shrinking} and permute=False")
    clf = CDRegressor(C=0.5, alpha=alpha, penalty='l1',
                      tol=1e-30, random_state=seed, permute=False,
                      shrinking=shrinking)
    clf.fit(X, y)

    las = Lasso(fit_intercept=False, alpha=alpha/len(y), max_iter=100_000,
                tol=1e-10).fit(X, y)
    print(f'distance between coeffs: {norm(clf.coef_[0] - las.coef_)}')

    light_o = p_obj(X, y, alpha, clf.coef_[0])
    sklea_o = p_obj(X, y, alpha, las.coef_)

    print(f"lightning obj - sklearn_obj : {light_o - sklea_o:.7f}")
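
As a side note, the duality gap gives each solution an absolute suboptimality certificate (zero exactly at the optimum), rather than only a relative comparison of the two objectives. A numpy-only sketch for the objective used in p_obj above (duality_gap is a hypothetical helper, not part of either library):

```python
import numpy as np


def duality_gap(X, y, alpha, w):
    """Lasso duality gap for 0.5 * ||y - X @ w||**2 + alpha * ||w||_1.

    The residual is rescaled onto the dual feasible set
    {theta : ||X.T @ theta||_inf <= alpha}; the resulting gap
    upper-bounds the suboptimality of w and is 0 at the optimum.
    """
    r = y - X @ w
    dual_norm = np.abs(X.T @ r).max()
    theta = r if dual_norm <= alpha else r * (alpha / dual_norm)
    primal = 0.5 * r @ r + alpha * np.abs(w).sum()
    dual = 0.5 * (y @ y) - 0.5 * (y - theta) @ (y - theta)
    return primal - dual
```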

@mblondel
Member

Thanks a lot for the investigation @mathurinm! So it seems that shrinking=True is 'unsafe'. Maybe the right thing to do would be to set it to False by default?
