-
-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add wrapper methods #35
Conversation
As you are saying. Max features is really not a termination criterion. I guess that is bad design. It would only make sense like that for forward search. But it might be a relevant side constraint for many algorithms. Eg random search. So it would rather go into the control. Does that make sense? Especially if your code is now uglier when that is in the termination... M |
Yes, it makes sense. We should keep |
Well. I didn't do much 😀. Welcome. |
R/FeatureSelectionForward.R
Outdated
} | ||
|
||
super$initialize(id = "forward_selection", pe = pe, tm = tm, | ||
settings = list(max_features = checkmate::assert_numeric(max_features, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of settings
we should use param_set
here inheriting from paradox. Similar to how it is done in the Filter class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self$param_set = assert_param_set(param_set)
R/FeatureSelectionForward.R
Outdated
#' @family FeatureSelection | ||
#' @examples | ||
#' task = mlr3::mlr_tasks$get("pima") | ||
#' measures = mlr3::mlr_measures$mget(c("classif.acc")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#' measures = mlr3::mlr_measures$mget(c("classif.acc")) | |
#' measures = mlr3::mlr_measures$get(c("classif.acc")) |
R/FeatureSelectionForward.R
Outdated
public = list( | ||
initialize = function(pe, tm, max_features = NA) { | ||
if(is.na(max_features)) { | ||
max_features = length(pe$task$feature_names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hier sollte dann sowas stehen wie:
super$initialize(
id = id,
[...]
param_set = ParamSet$new(list([..])),
param_vals = param_vals
)
R/FeatureSelectionRandom.R
Outdated
public = list( | ||
initialize = function(pe, tm, max_features = NA, batch_size = 10) { | ||
super$initialize(id = "random_selection", pe = pe, tm = tm, | ||
settings = list(max_features = checkmate::assert_numeric(max_features, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paradox::ParamSet()
state = NULL, | ||
|
||
initialize = function(settings) { | ||
self$settings = checkmate::assert_list(settings, names = "unique") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paradox::ParamSet()
inherit = Terminator, | ||
public = list( | ||
initialize = function(max_evaluations) { | ||
super$initialize(settings = list(max_evaluations = checkmate::assert_count(max_evaluations, positive = TRUE, coerce = TRUE))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paradox::ParamSet()
R/TerminatorPerformanceStep.R
Outdated
inherit = Terminator, | ||
public = list( | ||
initialize = function(threshold) { | ||
super$initialize(settings = list(threshold = checkmate::assert_numeric(threshold))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paradox::ParamSet()
R/TerminatorRuntime.R
Outdated
inherit = Terminator, | ||
public = list( | ||
initialize = function(max_time, units) { | ||
super$initialize(settings = list(max_time = checkmate::assert_count(max_time, positive = TRUE, coerce = TRUE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paradox::ParamSet()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Marc!
Question
Why a new PR and not simply changes to the old one? This splits the discussion somehow.
ToDO
- As my comments indicate, we should use
paradox::ParamSet()
for things related to settings or hyperpars. There is still a lot missing in mlr3featsel also and for now onlyFilterVariance
has a ParamSet. - Please apply the Style guide
The old PR was started from a fork. I switched the developement to a branch. I will change the other things. Thank you for the review. |
I am writing a short summary about what is already done and what not because my time is limited for the rest of the month. I will be back on it next month. I think the general design can stay like this. The complicated cases like FeatureSelection
FeatureSelectionSequential
FeatureSelectionGenetic
Terminator
PerformanceEvaluator
Misc
If you are missing a feature from |
No worries, you already did a lot here!
No need to do everything at once. It is even better if we split everything up into small PRs. So you can focus on getting one method working and then continue with the next one.
This is not so crucial and can be skipped for now. Just add it in an issue.
No problem, just do a new PR for these
Never used it so far so cant really comment on it. Sadly there is also no example in the help page to quickly take a look. The summary post is cool -> Please always do it (edit) in the first post of the PR so that new people do not have to search for it. 👍 Hope you're having fun, we're on the right track 🚀 |
Last code example in https://mlr.mlr-org.com/articles/tutorial/feature_selection.html#select-a-feature-subset
Okay I am going to add just the missing main parts in this PR and the small details will be Issues/ PR after the merge
Yes 😄 |
Moved to https://github.com/mlr-org/mlr3fswrap. |
Moved from #30
Fixes #24
It is more complicated than I thought. Now
max_features
is implemented in allTerminator*
classes but it is only used in conjunction withFeatureSelectionForward
. It makes no sense to check for it if the user usesFeatureSelectionRandom
. In this case the maximum number of features just determines the design of the 0-1 encoding. They are not a stopping criterion. I implemented it like this. If FeatureSelectionForward is called, it activates the check for maximum features in theTerminator*
object.I am not happy with this implementation because it adds so many specific code lines to the
Terminator
class, which are just necessary forFeatureSelectionForward
. The otherFeatureSelection*
classes will not need this in order to work with theTerminator*
classes. Moreover, the user needs to provide thePerformanceEvaluation
object to theTerminator
object in order to check for the possible number of features, which is again just necessary forFeatureSelectionForward
.$get_result
returns the result in the same way asmlr::getFeatSelResult()
now. Just a list with the selected features and the performance.mlr
has a function calledanalyzeFeatSelResult
which would be helpful to analysis the result ofFeatureSelectionForward
.mlr
prints out a text. Maybe it would be better to implement it this time in a machine readable form like a list?It translates 0/1 to feature names which is needed in all
FeatureSelection*
classes. Therefore, I made it a private method in theFeatureSelection
class so that allFeatureSelection*
classes can use it.