
Add survival benchmark tuning spaces #47

Draft · wants to merge 13 commits into main
Conversation

@jemus42 (Member) commented Jun 11, 2024

I started adding tuning spaces, but there are some issues I don't know how to address yet:

  1. Search spaces include learners that are only available from GitHub remotes (mlr3extralearners, CoxBoost, survivalmodels, and mlr3proba for survival in general).
  2. Some tuning spaces include .extra_trafo or trafo, which I didn't know how to correctly translate using tune_token (see the sketch after this list).
     Another example is p_int spaces, where I didn't quite understand whether I can safely use e.g. tune_token(1L, 10L) to get a discrete search space.
  3. The documentation helper (e.g. rd_info(lts("surv.penalized.sbb"))) errors in various ways, e.g.

```
✖ tuning_spaces_sbb.R:34: @section failed to evaluate inline markdown code.
Caused by error in `map_chr()`:
ℹ In index: 1.
Caused by error in `switch()`:
! EXPR must be a length 1 vector
```

...which I assume is related to 2., because this does not happen for "simple" tuning spaces that only consist of params I can easily specify with tune_token.
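For context, the kind of space I mean in 2. looks roughly like this in paradox (parameter names here are hypothetical, just to show the shape of the problem):

```r
library(paradox)

# A space that tunes a ratio and then transforms it into the actual
# learner parameter -- this is what I don't know how to express via tune_token.
search_space = ps(
  mtry.ratio = p_dbl(lower = 0, upper = 1),
  .extra_trafo = function(x, param_set) {
    # hypothetical trafo: derive an integer mtry from the tuned ratio
    x$mtry = max(1L, as.integer(round(x$mtry.ratio * 20)))
    x$mtry.ratio = NULL
    x
  }
)
```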

EDIT:

Oh, and

  1. In our benchmark we set some parameters to non-default values, and if the goal is to get learners with the same general config as in our publication, being able to e.g. set num.trees = 1000 for surv.ranger could be important. Now I'm not sure whether this is considered out of scope for the package, or, if not, how I can add fixed parameters to the tuning spaces.

@be-marc (Member) commented Jun 26, 2024

> Search spaces include learners that are only available from GitHub remotes (mlr3extralearners, CoxBoost, survivalmodels, and mlr3proba for survival in general).

We have to add them to Suggests and use mlr3misc::require_namespaces() and testthat::skip_if_not_installed() (see the sketch below). CRAN has this policy:

> Packages on which a CRAN package depends should be available from a mainstream repository: if any mentioned in ‘Suggests’ or ‘Enhances’ fields are not from such a repository, where to obtain them at a repository should be specified in an ‘Additional_repositories’ field of the DESCRIPTION file (as a comma-separated list of repository URLs) or for other means of access, described in the ‘Description’ field. A package listed in ‘Suggests’ or ‘Enhances’ should be used conditionally in examples or tests if it cannot straightforwardly be installed on the major R platforms. (‘Writing R Extensions’ recommends that they are always used conditionally.) Orphaned CRAN packages should not be strict requirements (in the ‘Depends’, ‘Imports’ or ‘LinkingTo’ fields, including indirectly). They are allowed in ‘Suggests’ if used conditionally, although this is discouraged.

Maybe you can figure out what to do exactly. @bblodfon, you have the same problem with mlr3viz; maybe you two can figure this out together.
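Roughly like this (a sketch; package names taken from your point 1, exact call sites up to you):

```r
# In package code that materializes a survival tuning space:
# fail early with an informative error if a suggested package is missing.
mlr3misc::require_namespaces(c("mlr3proba", "mlr3extralearners"))

# In tests/testthat/: skip instead of failing when the package is absent.
testthat::test_that("survival tuning spaces can be constructed", {
  testthat::skip_if_not_installed("mlr3proba")
  testthat::skip_if_not_installed("mlr3extralearners")
  # ... actual assertions ...
})
```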

@be-marc (Member) commented Jun 26, 2024

> Some tuning spaces include .extra_trafo or trafo, which I didn't know how to correctly translate using tune_token.

I have to change the TuningSpace class for this, but I think we can make it work.

> Another example is p_int spaces, where I didn't quite understand whether I can safely use e.g. tune_token(1L, 10L) to get a discrete search space.

I don't really understand that. The example gives you all integer values between 1 and 10.
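A quick check (assuming tune_token here corresponds to paradox's to_tune()):

```r
library(paradox)

param_set = ps(k = p_int(lower = 1, upper = 10))
param_set$set_values(k = to_tune(1L, 10L))

param_set$search_space()
# k remains a ParamInt with range [1, 10], so the drawn values are integers
```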

@be-marc (Member) commented Jun 26, 2024

> Use grid search due to small + finite search space

Ah you want to do a grid search with tune_token(1L, 10L)?

@be-marc (Member) commented Jun 26, 2024

> In our benchmark we set some parameters to non-default values, and if the goal is to get learners with the same general config as in our publication, being able to e.g. set num.trees = 1000 for surv.ranger could be important. Now I'm not sure whether this is considered out of scope for the package, or, if not, how I can add fixed parameters to the tuning spaces.

You can also set constants, as long as they overwrite the learner's default; a sketch is below. I will adapt the printer for this.
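Roughly like this (a sketch using classif.ranger from mlr3learners so it runs from CRAN; the surv.ranger case would look analogous):

```r
library(mlr3)
library(mlr3learners)
library(mlr3tuningspaces)

# attach the default tuning space to the learner ...
learner = lts(lrn("classif.ranger"))

# ... then pin num.trees to a constant that overwrites ranger's default of 500,
# next to the tune tokens set by the tuning space
learner$param_set$set_values(num.trees = 1000)

learner$param_set$values$num.trees
# 1000 -- fixed, not part of the search space
```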

@be-marc (Member) commented Jun 26, 2024

> The documentation helper (e.g. rd_info(lts("surv.penalized.sbb"))) errors in various ways

Can you merge main? This could also be the cause.

@bblodfon

> Maybe you can figure out what to do exactly. @bblodfon, you have the same problem with mlr3viz; maybe you two can figure this out together.

Yes, I was planning to try it out today and see whether CRAN likes it or not. I will report back here.

Note that I also tried to set up my own tuning spaces more than a year ago; the main problem was that I couldn't quite figure out how to do this when the learner is a graph learner, i.e. in the case of xgboost.aft + distrcompose for example (since you get extra prediction types, i.e. distr). But maybe it works now, Lukas, have you tried? I.e., if I use the tuning spaces to get the xgboost AFT learner, then add the distrcompose, is the learner good to go for benchmarking? (That would be amazing!)

@be-marc (Member) commented Jun 26, 2024

> Note that I also tried to set up my own tuning spaces more than a year ago; the main problem was that I couldn't quite figure out how to do this when the learner is a graph learner, i.e. in the case of xgboost.aft + distrcompose for example (since you get extra prediction types, i.e. distr). But maybe it works now, Lukas, have you tried? I.e., if I use the tuning spaces to get the xgboost AFT learner, then add the distrcompose, is the learner good to go for benchmarking? (That would be amazing!)

You add the tuning space to the learner and then build the pipeline with it (see the sketch below). Or does the distrcompose pipeop also have hyperparameters? TuningSpaces for pipelines are still an open issue (#11), but if you need it, I would implement this.
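Sketch of that order of operations (assuming mlr3proba's distrcompositor pipeline and the .sbb key naming from this PR; untested on my end):

```r
library(mlr3proba)          # GitHub / r-universe package
library(mlr3pipelines)
library(mlr3tuningspaces)
library(mlr3extralearners)

# 1. get the learner with the tuning space attached ...
learner = lts("surv.xgboost.aft.sbb")$get_learner()

# 2. ... then build the composition pipeline around it
graph_learner = as_learner(ppl(
  "distrcompositor",
  learner = learner,
  estimator = "kaplan",
  form = "aft"
))

# the tune tokens travel with the wrapped learner's param_set,
# so graph_learner can go straight into tuning/benchmarking
```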

@bblodfon commented Jun 26, 2024

Yes, the distrcompose has hyperparams, see example. Also crankcompose (very useful); a quick illustration is below.

So pretty much every composition pipeline in mlr3proba that someone would want to use on top of the learner they get from the tuning spaces (to change the prediction types) will need this, Marc!

Lukas used this many times in his benchmark script, for example, see here.
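To illustrate (the pipeop keys are the ones named above; the parameter names are from memory, treat them as approximate):

```r
library(mlr3proba)
library(mlr3pipelines)

# the distr compositor is itself parameterized ...
po("distrcompose")$param_set$ids()
# e.g. "form" (ph/aft/po) and "overwrite"

# ... and so is the crank compositor
po("crankcompose")$param_set$ids()
```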

@jemus42 (Member, Author) commented Jun 26, 2024

> > Search spaces include learners that are only available from GitHub remotes (mlr3extralearners, CoxBoost, survivalmodels, and mlr3proba for survival in general).

> We have to add them to Suggests and use mlr3misc::require_namespaces() and testthat::skip_if_not_installed(). CRAN has this policy:

> > if any mentioned in ‘Suggests’ or ‘Enhances’ fields are not from such a repository, where to obtain them at a repository should be specified in an ‘Additional_repositories’ field of the DESCRIPTION file (as a comma-separated list of repository URLs) or for other means of access, described in the ‘Description’ field. A package listed in ‘Suggests’ or ‘Enhances’ should be used conditionally in examples or tests if it cannot straightforwardly be installed on the major R platforms.

Sounds like adding

```
Additional_repositories:
    https://mlr-org.r-universe.dev
```

should be enough, as the r-universe repo also pulls in learner dependencies (CoxBoost, for example) 🤔
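I.e. the relevant DESCRIPTION fields would look roughly like this (package list taken from point 1; a sketch, not necessarily the final state of this PR):

```
Suggests:
    mlr3proba,
    mlr3extralearners,
    CoxBoost,
    survivalmodels
Additional_repositories:
    https://mlr-org.r-universe.dev
```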

@jemus42 (Member, Author) commented Jun 26, 2024

> The documentation helper (e.g. rd_info(lts("surv.penalized.sbb"))) errors in various ways

> Can you merge main? This could also be the cause.

Done, doesn't seem to have helped yet?

@jemus42 (Member, Author) commented Jun 26, 2024

> Use grid search due to small + finite search space

> Ah you want to do a grid search with tune_token(1L, 10L)?

Also yes, but the main thing was that I didn't quite understand from the docs whether tune_token(1L, 10L) would produce an integer search space or whether it would convert to p_dbl(1, 10).

@jemus42 (Member, Author) commented Jun 26, 2024

> I couldn't quite figure out how to do this when the learner is a graph learner, i.e. in the case of xgboost.aft + distrcompose for example (since you get extra prediction types, i.e. distr). But maybe it works now, Lukas, have you tried? I.e., if I use the tuning spaces to get the xgboost AFT learner, then add the distrcompose, is the learner good to go for benchmarking? (That would be amazing!)

I have not; I wanted to start really basic here with the bare minimum for our search spaces and then see how far we can go.
For this particular setup, though, I don't think I need the graphs.

@be-marc (Member) commented Jun 26, 2024

> Also yes, but the main thing was that I didn't quite understand from the docs whether tune_token(1L, 10L) would produce an integer search space or whether it would convert to p_dbl(1, 10).

No, if the token is set on a p_int, the drawn values remain integers.

I was curious whether you can do the grid search. It seems it works like this:

```r
library(paradox)

param_set = ps(
  x_int = p_int(lower = 1, upper = 10)
)

param_set$set_values(x_int = to_tune(p_fct(c(1, 2, 3, 4, 5))))

search_space = param_set$search_space()

design = generate_design_random(search_space, n = 1)

x = design$transpose()

x
# [[1]]
# [[1]]$x_int
# [1] 3

class(x[[1]]$x_int)
# [1] "numeric"
```

I think this was a new feature Martin introduced with the p_int notation.
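And the grid itself would presumably be this on top (a guess at the follow-up, using paradox's grid design):

```r
# enumerate the full, finite search space instead of sampling from it
design = generate_design_grid(search_space)
design$transpose()
# five configurations, one per factor level
```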

@jemus42 (Member, Author) commented Jul 3, 2024

Pulled upstream changes and tweaked rd_info a little:

a) It no longer assumes that logscale is set, which was NULL in some of my cases.
b) It (awkwardly) extracts the trafo function and prints it, rather than e.g. "Logscale", as a temporary workaround.

I assume this is something you'd maybe prefer to solve differently, and not as a drive-by change in this PR, but I still haven't managed to get all tuning spaces to print properly (.extra_trafo is missing).

I also added Remotes: and Additional_repositories fields to DESCRIPTION, which now looks close to what's in mlr3viz 🤞🏻

@bblodfon commented Jul 3, 2024

Yes, the Additional_repositories approach should work on CRAN; I did a check the other day with mlr3viz and the Windows hub.

So, what more is needed from the initial points 1-5?

@jemus42 (Member, Author) commented Jul 3, 2024

I realized I somehow had 4 points and labelled them 1-5.
Welp, there are two hard things in computing: cache invalidation, naming things, and off-by-one errors.

  1. Learners only available from GitHub remotes (mlr3extralearners, CoxBoost, survivalmodels, and mlr3proba)
  2. Some tuning spaces include .extra_trafo or trafo, and I'm not sure this is supported by TuningSpace yet.
  3. The documentation helper (e.g. rd_info(lts("surv.penalized.sbb"))) errors for trafo and expects logscale -> I fixed this, at least as a workaround.
  5. Constant values as tuning parameters: I have re-added them, but rd_info doesn't seem to like it.

@bblodfon

I added an example with the pipeline for XGB-AFT to also get the distr prediction type. This is related to #51; I added two ways to do it, maybe keep only one, what do you think?

Also:

  • I don't know if we want this, but currently we can't load the learner and apply the tuning space in one go for the survival models, e.g. lts(lrn("surv.xgboost.aft")) doesn't return the learner with the tuned parameters, as it associates with the surv.xgboost.aft.default id. Maybe check if the learner id starts with surv and apply the sbb suffix? (See the sketch after this list.)
  • lts("surv.aorsf.sbb")$get_learner() fails because of the .extra_trafo => I think that's (2) on Lukas's list.
