---
output: github_document
---
# mlr3filters
Package website: [release](https://mlr3filters.mlr-org.com/) | [dev](https://mlr3filters.mlr-org.com/dev/)
{mlr3filters} adds feature selection filters to [mlr3](https://mlr3.mlr-org.com).
The implemented filters can be used stand-alone, or as part of a machine learning pipeline in combination with
[mlr3pipelines](https://mlr3pipelines.mlr-org.com) and the [filter operator](https://mlr3pipelines.mlr-org.com/reference/mlr_pipeops_filter.html).
Wrapper methods for feature selection are implemented in [mlr3fselect](https://mlr3fselect.mlr-org.com).
Learners which support the extraction of feature importance scores can be combined with a filter from this package for embedded feature selection.
<!-- badges: start -->
[![r-cmd-check](https://github.com/mlr-org/mlr3filters/actions/workflows/r-cmd-check.yml/badge.svg)](https://github.com/mlr-org/mlr3filters/actions/workflows/r-cmd-check.yml)
[![CRAN Status](https://www.r-pkg.org/badges/version-ago/mlr3filters)](https://cran.r-project.org/package=mlr3filters)
[![StackOverflow](https://img.shields.io/badge/stackoverflow-mlr3-orange.svg)](https://stackoverflow.com/questions/tagged/mlr3)
[![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)
<!-- badges: end -->
## Installation
CRAN version
```{r eval = FALSE}
install.packages("mlr3filters")
```
Development version
```{r, eval = FALSE}
remotes::install_github("mlr-org/mlr3filters")
```
## Filters
### Filter Example
```{r}
set.seed(1)
library("mlr3")
library("mlr3filters")
task = tsk("sonar")
filter = flt("auc")
head(as.data.table(filter$calculate(task)))
```
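The computed scores can also be used to subset the task directly, for example keeping only the five highest-ranked features (the cutoff of five is an arbitrary choice for illustration):
```{r}
# `as.data.table(filter)` returns the scores in decreasing order,
# so the first rows correspond to the highest-ranked features
top5 = head(as.data.table(filter), 5)$feature
task$select(top5)
task$feature_names
```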
### Implemented Filters
```{r echo = FALSE, message=FALSE}
library("mlr3misc")
library("mlr3filters")
library("data.table")
link_cran = function(pkg) {
  mlr3misc::map(pkg, function(.x) {
    mlr3misc::map_chr(.x, function(.y) {
      if (unlist(.y) %in% getOption("defaultPackages")) {
        .y
      } else {
        sprintf("[%1$s](https://cran.r-project.org/package=%1$s)", .y)
      }
    })
  })
}
tab = as.data.table(mlr_filters)[, !c("params", "task_properties")]
tab[, task_types := sapply(task_types, function(x) if (is_scalar_na(x)) "Universal" else paste(capitalize(x), collapse = " & "))]
tab[, feature_types := sapply(feature_types, function(x) paste(capitalize(x), collapse = ", "))]
tab[, packages := sapply(packages, function(x) paste(link_cran(x), collapse = ", "))]
# manually change the task type for specific filters
learner_based = c("performance", "permutation", "importance", "selected_features")
tab[key %in% learner_based, task_types := "Universal"]
tab[key %in% learner_based, packages := ""]
setnames(tab,
  old = c("key", "task_types", "feature_types", "packages"),
  new = c("Name", "Task Types", "Feature Types", "Package")
)
knitr::kable(tab, format = "markdown")
```
### Variable Importance Filters
The following learners allow the extraction of variable importance and therefore are supported by `FilterImportance`:
```{r echo=FALSE, warning=FALSE}
library("mlr3learners")
tab = as.data.table(mlr_learners)
tab[sapply(properties, is.element, el = "importance"), key]
```
If your learner is not listed here but is capable of extracting variable importance from the fitted model, the reason is most likely that it is not yet integrated into the [mlr3learners](https://github.com/mlr-org/mlr3learners) package or the [mlr3extralearners](https://github.com/mlr-org/mlr3extralearners) extension.
Please open an issue so we can add your package.
Some learners need to have their variable importance measure "activated" during learner creation.
For example, to use the "impurity" measure of Random Forest via the {ranger} package:
```{r}
task = tsk("iris")
# activate the impurity-based importance measure when constructing the learner
learner = lrn("classif.ranger", seed = 42, importance = "impurity")
filter = flt("importance", learner = learner)
filter$calculate(task)
head(as.data.table(filter), 3)
```
### Performance Filter
`FilterPerformance` is a univariate filter method: it calls `resample()` with each predictor variable in the dataset separately and ranks the features by the resulting performance, as estimated by the supplied measure.
Any learner can be passed to this filter, with `classif.rpart` being the default.
Regression learners can also be passed if the task is of type `"regr"`.
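For example, a minimal sketch with a non-default resampling and measure (the argument names follow the constructor of `FilterPerformance`; the specific choices below are purely illustrative):
```{r}
task = tsk("sonar")
filter = flt("performance",
  learner = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc")
)
filter$calculate(task)
head(as.data.table(filter))
```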
### Filter-based Feature Selection
In many cases filtering is only one step in the modeling pipeline.
To select features based on filter values, one can use [`PipeOpFilter`](https://mlr3pipelines.mlr-org.com/reference/mlr_pipeops_filter.html) from [mlr3pipelines](https://github.com/mlr-org/mlr3pipelines).
```{r, results='hide'}
library(mlr3pipelines)
task = tsk("spam")
# the `filter.frac` parameter should be tuned in practice (see the tuning sketch below)
graph = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
po("learner", lrn("classif.rpart"))
learner = as_learner(graph)
rr = resample(task, learner, rsmp("holdout"))
```
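The fraction of features to keep is a hyperparameter of the resulting graph learner and can be tuned like any other. Below is a minimal sketch using [mlr3tuning](https://mlr3tuning.mlr-org.com); it assumes that `PipeOpFilter` inherits the id of the wrapped filter (`"auc"`), so the fraction is exposed as `auc.filter.frac`:
```{r, eval = FALSE}
library(mlr3tuning)
library(paradox)
# search space over the fraction of features kept by the AUC filter
search_space = ps(auc.filter.frac = p_dbl(lower = 0.1, upper = 1))
instance = tune(
  tuner = tnr("grid_search", resolution = 5),
  task = task,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  search_space = search_space
)
instance$result
```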