update vignette to not use holdout role #44

Merged: 15 commits, Apr 16, 2024
44 changes: 44 additions & 0 deletions .github/workflows/r-cmd-check-paradox.yml
@@ -0,0 +1,44 @@
# r cmd check workflow of the mlr3 ecosystem v0.1.0
# https://github.com/mlr-org/actions
on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

name: r-cmd-check-paradox

jobs:
  r-cmd-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: ubuntu-latest, r: 'devel'}
          - {os: ubuntu-latest, r: 'release'}

    steps:
      - uses: actions/checkout@v3

      - name: paradox
        run: 'echo -e "Remotes:\n mlr-org/paradox,\n mlr-org/mlr3learners,\n mlr-org/mlr3pipelines,\n mlr-org/mlr3oml" >> DESCRIPTION'

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check

      - uses: r-lib/actions/check-r-package@v2
19 changes: 11 additions & 8 deletions DESCRIPTION
@@ -1,11 +1,11 @@
Package: mcboost
Type: Package
Title: Multi-Calibration Boosting
Version: 0.4.2
Version: 0.4.3
Authors@R:
c(person(given = "Florian",
family = "Pfisterer",
role = c("cre", "aut"),
role = "aut",
email = "[email protected]",
comment = c(ORCID = "0000-0001-8867-762X")),
person(given = "Susanne",
@@ -18,18 +18,21 @@ Authors@R:
role = "ctb",
email = "[email protected]",
comment = c(ORCID = "0000-0001-7363-4299")),
person(given = "Carolin",
person(given = "Carolin",
family = "Becker",
role = "ctb"),
person(given = "Bernd",
family = "Bischl",
role = "ctb",
email = "[email protected]",
comment = c(ORCID = "0000-0001-6002-6980"))
comment = c(ORCID = "0000-0001-6002-6980")),
person(given = "Sebastian",
family = "Fischer",
role = c("ctb", "cre"),
email = "[email protected]")
)
Maintainer: Florian Pfisterer <[email protected]>
Description: Implements 'Multi-Calibration Boosting' (2018) <https://proceedings.mlr.press/v80/hebert-johnson18a.html> and
'Multi-Accuracy Boosting' (2019) <arXiv:1805.12317> for the multi-calibration of a machine learning model's prediction.
'Multi-Accuracy Boosting' (2019) <doi:10.48550/arXiv.1805.12317> for the multi-calibration of a machine learning model's prediction.
'MCBoost' updates predictions for sub-groups in an iterative fashion in order to mitigate biases like poor calibration or large accuracy differences across subgroups.
Multi-Calibration works best in scenarios where the underlying data & labels are unbiased, but resulting models are.
This is often the case, e.g. when an algorithm fits a majority population while ignoring or under-fitting minority populations.
@@ -66,9 +69,9 @@ Suggests:
covr,
testthat (>= 3.1.0)
Roxygen: list(markdown = TRUE, r6 = TRUE)
RoxygenNote: 7.2.1
RoxygenNote: 7.3.1
VignetteBuilder: knitr
Collate:
Collate:
'AuditorFitters.R'
'MCBoost.R'
'PipelineMCBoost.R'
7 changes: 5 additions & 2 deletions NEWS.md
@@ -1,7 +1,10 @@
# mcboost (development version)
# mcboost 0.4.3

* Compatibility with upcoming 'paradox' release.
* Change the vignette to not use the holdout task.

# mcboost 0.4.2
* Removed new functionality for survival tasks added in `0.4.0`.
* Removed new functionality for survival tasks added in `0.4.0`.
A dependency, `mlr3proba` was removed from CRAN for now.
The functionality will be added back when `mlr3proba` is re-introduced to CRAN.
Users who wish to use `mcboost` for `survival` are advised to use version `0.4.1` together with the GitHub version of `mlr3proba`.
26 changes: 13 additions & 13 deletions R/PipeOpMCBoost.R
@@ -65,19 +65,19 @@ PipeOpMCBoost = R6Class("PipeOpMCBoost",
    #' @param param_vals [`list`] \cr
    #'   List of hyperparameters for the `PipeOp`.
    initialize = function(id = "mcboost", param_vals = list()) {
      param_set = paradox::ParamSet$new(list(
        paradox::ParamInt$new("max_iter", lower = 0L, upper = Inf, default = 5L, tags = "train"),
        paradox::ParamDbl$new("alpha", lower = 0, upper = 1, default = 1e-4, tags = "train"),
        paradox::ParamDbl$new("eta", lower = 0, upper = 1, default = 1, tags = "train"),
        paradox::ParamLgl$new("partition", tags = "train", default = TRUE),
        paradox::ParamInt$new("num_buckets", lower = 1, upper = Inf, default = 2L, tags = "train"),
        paradox::ParamLgl$new("rebucket", default = FALSE, tags = "train"),
        paradox::ParamLgl$new("multiplicative", default = TRUE, tags = "train"),
        paradox::ParamUty$new("auditor_fitter", default = NULL, tags = "train"),
        paradox::ParamUty$new("subpops", default = NULL, tags = "train"),
        paradox::ParamUty$new("default_model_class", default = ConstantPredictor, tags = "train"),
        paradox::ParamUty$new("init_predictor", default = NULL, tags = "train")
      ))
      param_set = paradox::ps(
        max_iter = paradox::p_int(lower = 0L, upper = Inf, default = 5L, tags = "train"),
        alpha = paradox::p_dbl(lower = 0, upper = 1, default = 1e-4, tags = "train"),
        eta = paradox::p_dbl(lower = 0, upper = 1, default = 1, tags = "train"),
        partition = paradox::p_lgl(tags = "train", default = TRUE),
        num_buckets = paradox::p_int(lower = 1, upper = Inf, default = 2L, tags = "train"),
        rebucket = paradox::p_lgl(default = FALSE, tags = "train"),
        multiplicative = paradox::p_lgl(default = TRUE, tags = "train"),
        auditor_fitter = paradox::p_uty(default = NULL, tags = "train"),
        subpops = paradox::p_uty(default = NULL, tags = "train"),
        default_model_class = paradox::p_uty(default = ConstantPredictor, tags = "train"),
        init_predictor = paradox::p_uty(default = NULL, tags = "train")
      )
      super$initialize(id,
        param_set = param_set, param_vals = param_vals, packages = character(0),
        input = data.table(name = c("data", "prediction"), train = c("TaskClassif", "TaskClassif"), predict = c("TaskClassif", "TaskClassif")),
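
The hunk above swaps the `ParamSet$new(list(Param*$new(...)))` constructors for the `ps()` / `p_*()` helpers. The following is a minimal sketch of how the new form behaves, assuming an installed `paradox` version that exports `ps()`, `p_int()`, `p_dbl()` and `p_lgl()`. The three parameters are copied from the diff; the checks at the end are illustrative only and are not part of this PR.

```r
library(paradox)

# Parameter ids come from the argument names of ps(),
# instead of being passed as the first argument to Param*$new().
param_set = ps(
  max_iter = p_int(lower = 0L, upper = Inf, default = 5L, tags = "train"),
  alpha = p_dbl(lower = 0, upper = 1, default = 1e-4, tags = "train"),
  partition = p_lgl(default = TRUE, tags = "train")
)

# Inspect the ids and one of the declared defaults.
param_ids = param_set$ids()
default_max_iter = param_set$default$max_iter

# Set a value inside the declared bounds.
param_set$values = list(alpha = 0.1)

# Validate a candidate configuration without setting it:
# alpha = 2 lies outside [0, 1].
alpha_in_range = param_set$test(list(alpha = 2))
```

Because the ids are taken from the argument names, a misspelled or duplicated id surfaces when `ps()` is called rather than later during training.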
30 changes: 3 additions & 27 deletions cran-comments.md
@@ -1,30 +1,6 @@
## Reason for resubmission

Removed dependency on package mlr3proba that was removed from CRAN.
Apologies for not being able to upload a new version in time.

## R CMD check

Results in one NOTE:

CRAN repository db overrides:
X-CRAN-Comment: Archived on 2022-05-16 as requires archived package 'mlr3proba'.

The dependency on 'mlr3proba' has been removed in the updated version.


There is one NOTE that is only found on Windows (Server 2022, R-devel 64-bit):

```
* checking for detritus in the temp directory ... NOTE
Found the following files/directories:
'lastMiKTeXException'
```

As noted in R-hub issue #503, this could be due to a bug/crash in MiKTeX and can likely be ignored.

- WARNINGs or ERRORs

## R-HUB
0 errors | 0 warnings | 1 note

All checks show "Status: success"
New maintainer:
Sebastian Fischer <[email protected]>
9 changes: 7 additions & 2 deletions man/mcboost-package.Rd


9 changes: 5 additions & 4 deletions vignettes/mcboost_basics_extensions.Rmd
@@ -517,10 +517,11 @@ summary(data$ViolentCrimesPerPop)
```

We again split our task into **train** and **test**.
We do this in `mlr3` by simply setting some (here 500) row roles to `"holdout"`.
We do this in `mlr3` by creating a 2/3 - 1/3 split using `mlr3::partition()` and assigning the train ids to the row role `"use"`.

```{r}
tsk$set_row_roles(sample(tsk$row_roles$use, 500), "holdout")
split = partition(tsk)
tsk$set_row_roles(split$train, "use")
```
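
To see what `partition()` returns, here is a small sketch on a built-in task. It assumes `mlr3` is installed and uses the `sonar` task and the name `example_task`, neither of which appears in this vignette. It builds the two subsets with `$filter()` on clones, which is an alternative to the row-role approach used here, not a replacement for it.

```r
library(mlr3)

set.seed(1)

# Hypothetical example task; the vignette uses its own `tsk` object instead.
example_task = mlr3::tsk("sonar")

# partition() returns a list of row ids; `train` and `test` are the
# elements used in this vignette.
split = partition(example_task, ratio = 2 / 3)

# Build separate train and test tasks by filtering clones of the original,
# so the original task is left untouched.
train_task = example_task$clone()$filter(split$train)
test_task = example_task$clone()$filter(split$test)

# The two id sets do not overlap and together cover all rows.
n_overlap = length(intersect(split$train, split$test))
n_total = train_task$nrow + test_task$nrow
```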

### 6.1 Preprocessing
@@ -571,13 +572,13 @@ mc$multicalibrate(data, labels)

### 6.3 Evaluation on Test Data

We first create the test task by setting the `holdout` rows to `use`, and then
We first create the test task by assigning the test ids to the row role `"use"`, and then
use our preprocessing `pipe's` predict function to also impute missing values
for the validation data. Then we again extract features `X` and target `y`.

```{r}
test_task = tsk$clone()
test_task$row_roles$use = test_task$row_roles$holdout
test_task$row_roles$use = split$test
test_task = pipe$predict(list(test_task))[[1]]
test_data = test_task$data(cols = tsk$feature_names)
test_labels = test_task$data(cols = tsk$target_names)[[1]]