Forward search without candidates at specific model size #307

Open
fweber144 opened this issue May 2, 2022 · 2 comments
Labels
perhaps Consider implementing this, but this is not a must-have.

Comments

@fweber144
Collaborator

I'm currently working on the search_terms argument (fixing bugs and improving documentation). While doing so, I realized that there can be model sizes for which the forward search doesn't have any candidate models, for example:

options(mc.cores = parallel::detectCores(logical = FALSE))
# Example data shipped with projpred; keep only the first 41 observations
data("df_gaussian", package = "projpred")
df_gaussian <- df_gaussian[1:41, ]
dat <- data.frame(y = df_gaussian$y, df_gaussian$x)
# Reference model with five predictors
library(rstanarm)
rfit <- stan_glm(y ~ X1 + X2 + X3 + X4 + X5,
                 data = dat,
                 seed = 1140350788)
# Forward search whose only candidate combines the two terms X1 and X2
library(projpred)
vs <- varsel(rfit,
             nclusters = 3,
             nclusters_pred = 5,
             method = "forward",
             search_terms = c("X1 + X2"),
             seed = 46782345)

(tested with projpred 2.1.1). If you inspect the output of that varsel() call, you'll see that X1 + X2 is regarded as the solution term at model size 1:

print(vs)

gives


Family: gaussian 
Link function: identity 

Formula: y ~ X1 + X2 + X3 + X4 + X5
Observations: 41
Search method: forward, maximum number of terms 1
Number of clusters used for selection: 3
Number of clusters used for prediction: 5
Suggested Projection Size: NA

Selection Summary:
 size solution_terms   elpd  se  diff diff.se
    0           <NA> -101.6 2.9 -17.4     3.4
    1        X1 + X2  -93.9 2.8  -9.7     2.3

and plot(vs) behaves accordingly. Now my question (especially to @AlejandroCatalina) is whether this is intended or whether X1 + X2 should be regarded as the solution term at model size 2 because it consists of the 2 terms X1 and X2. The latter would probably require some larger changes because all functions downstream of search_forward() would have to be adapted to deal with "empty model sizes".
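To make the "empty model sizes" concrete, here is a minimal base-R sketch of the alternative counting. It is not projpred code: count_terms() is a hypothetical helper, and the assumption is that a search_terms element would be assigned the model size equal to the number of individual terms it contains.

```r
# Hypothetical helper (not part of projpred): number of individual terms
# contained in one element of search_terms.
count_terms <- function(search_term) {
  tt <- terms(as.formula(paste("~", search_term)))
  length(attr(tt, "term.labels"))
}

search_terms <- c("X1 + X2")

# Model size each candidate would get under the alternative counting
sizes <- vapply(search_terms, count_terms, integer(1))

# Sizes below the largest one for which no candidate exists
empty_sizes <- setdiff(seq_len(max(sizes)), sizes)

print(sizes)
print(empty_sizes)
```

Under this counting, X1 + X2 would land at size 2 and size 1 would have no candidate, which is the case that everything downstream of search_forward() would need to handle.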

@AlejandroCatalina
Collaborator

AlejandroCatalina commented May 2, 2022 via email

@fweber144
Collaborator Author

Thanks, yes, that helps. For now, I'll keep the current behavior. In a future release, we could consider switching to the alternative approach proposed above, which would require some larger changes.

@fweber144 fweber144 added the perhaps Consider implementing this, but this is not a must-have. label May 2, 2022