Add R2 as performance statistic #483

Open
fweber144 opened this issue Nov 30, 2023 · 4 comments
Labels
enhancement Enhancements of existing features, but also new feature requests.

Comments

@fweber144
Collaborator

As suggested by @avehtari, it would be good to have $R^2$ as a performance statistic in projpred. This could be called stats = "R2" (and stat = "R2" for suggest_size()), for example. According to @avehtari, we should go for LOO-$R^2$.

There is also related code at

projpred/R/summary_funs.R

Lines 170 to 187 in bec6258

if (stat == "r2") {
  if (!is.null(mu.bs)) {
    y <- mu.bs
  } else {
    y <- d_test$y
  }
  eloo <- mu - y
  n <- length(y)
  rd <- bayesboot::rudirichlet(4000, n)
  vary <- (rowSums(sweep(rd, 2, y^2, FUN = "*")) -
             rowSums(sweep(rd, 2, y, FUN = "*"))^2) * (n / (n - 1))
  vareloo <- (rowSums(sweep(rd, 2, eloo^2, FUN = "*")) -
                rowSums(sweep(rd, 2, eloo, FUN = "*")^2)) * (n / (n - 1))
  looR2 <- 1 - vareloo / vary
  looR2[looR2 < -1] <- -1
  looR2[looR2 > 1] <- 1
  value <- median(looR2)
  value.se <- sd(looR2)
(Note that * (n / (n - 1)) can be omitted because it cancels out in the ratio vareloo / vary.) In those lines, bayesboot::rudirichlet() is used. According to @avehtari, the SE could also be calculated without a Dirichlet approach, using the formula from stan-dev/loo#205 (comment).
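
To make the cancellation concrete, here is a minimal base-R sketch of a Bayesian-bootstrap LOO-$R^2$. It is not projpred code: the function name loo_r2_draws() and its arguments are hypothetical, the Dirichlet weights are drawn by normalising standard-exponential variates instead of calling bayesboot::rudirichlet(), and the weighted variance is assumed to have the form E_w[x^2] - (E_w[x])^2 for both y and eloo.

```r
# Illustrative sketch only; `loo_r2_draws()` is a hypothetical helper, not part of projpred.
loo_r2_draws <- function(y, mu, n_draws = 4000, correct = FALSE) {
  n <- length(y)
  eloo <- mu - y
  # Dirichlet(1, ..., 1) weights, one row per draw (normalised exponential variates).
  g <- matrix(rexp(n_draws * n), nrow = n_draws, ncol = n)
  w <- g / rowSums(g)
  # Weighted variance per draw: E_w[x^2] - (E_w[x])^2 (assumed form).
  wvar <- function(x) drop(w %*% x^2) - drop(w %*% x)^2
  # Optional n / (n - 1) factor, applied to numerator and denominator alike.
  fac <- if (correct) n / (n - 1) else 1
  r2 <- 1 - (fac * wvar(eloo)) / (fac * wvar(y))
  # Same bounds as in the quoted lines.
  pmin(pmax(r2, -1), 1)
}

# Simulated data (made up for illustration).
set.seed(1)
y <- rnorm(50)
mu <- y + rnorm(50, sd = 0.5)

# Identical weights for both calls, so only the correction factor differs.
set.seed(2); r2_a <- loo_r2_draws(y, mu, correct = FALSE)
set.seed(2); r2_b <- loo_r2_draws(y, mu, correct = TRUE)

isTRUE(all.equal(r2_a, r2_b))              # the factor cancels in the ratio
c(estimate = median(r2_a), se = sd(r2_a))  # point estimate and bootstrap SE
```

With the same weights, both calls give the same draws up to floating-point error, which is why the factor can be dropped.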

fweber144 added the enhancement label Nov 30, 2023
@fweber144
Collaborator Author

@AlejandroCatalina: Is line looR2[looR2 < -1] <- -1 supposed to read looR2[looR2 < 0] <- 0?

@fweber144
Collaborator Author

@avehtari: The SE formula provided in stan-dev/loo#205 (comment) refers to LOO-$R^2$. I guess it cannot be applied directly to K-fold CV, no CV (i.e., test dataset = training dataset), or a hold-out test dataset. Do you know of similar formulas for those cases?

@avehtari
Collaborator

avehtari commented Dec 5, 2023

Is line looR2[looR2 < -1] <- -1 supposed to read looR2[looR2 < 0] <- 0?

The first one is intentional.
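
For readers comparing the two variants, the following small sketch shows what each clipping rule does to a set of draws. The values in r2_draws are made up for illustration, and the snippet does not attempt to state the reason behind the choice. It only relies on the fact that 1 - vareloo / vary is negative whenever the variance of the LOO errors exceeds the variance of y.

```r
# Hypothetical LOO-R2 draws, including values below 0 and below -1.
r2_draws <- c(-1.7, -0.4, -0.1, 0.2, 0.6)

# Bounds used in the quoted lines: clip to [-1, 1].
clip_m1 <- pmin(pmax(r2_draws, -1), 1)

# Alternative raised in the question: clip to [0, 1].
clip_0 <- pmin(pmax(r2_draws, 0), 1)

# Compare the draws and the resulting summaries side by side.
rbind(clip_m1, clip_0)
c(median_m1 = median(clip_m1), median_0 = median(clip_0))
c(sd_m1 = sd(clip_m1), sd_0 = sd(clip_0))
```

Clipping at -1 keeps moderately negative draws distinguishable, whereas clipping at 0 maps all of them to the same value.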

The SE formula provided in stan-dev/loo#205 (comment) refers to LOO-$R^2$. I guess it cannot be applied directly to K-fold CV

Can be used with K-fold-CV and pointwise evaluation.

no CV (i.e., test dataset = training dataset)

We have used Bayesian-R2 for that as it has some benefits in that case, but the same formula could be used, too.

or a hold-out test dataset

Can be used

@avehtari
Collaborator

Implemented by 0ed8391
