Add multi-armed bandit sampler #155
Conversation
(force-pushed from b728e4e to 568c8c6)
(force-pushed from 568c8c6 to a1c1784)
states = (TrialState.COMPLETE, TrialState.PRUNED)
trials = study._get_trials(deepcopy=False, states=states, use_cache=True)

rewards_by_choice: defaultdict = defaultdict(float)
[QUESTION] This defaultdict treats a never-chosen arm as having 0 reward. Should I replace it with another idea?
(As far as I can think of: using `_n_startup_trials` like TPESampler, or letting the user set a default reward instead of 0.)
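For context, a minimal standalone sketch (not from the PR) of the behavior in question; the arm names are made up:

```python
from collections import defaultdict

# defaultdict(float) silently yields 0.0 for any key that was never updated,
# so an arm that has never been chosen looks like it has an accumulated reward of 0.
rewards_by_choice: defaultdict = defaultdict(float)
rewards_by_choice["arm_a"] += 1.5   # "arm_a" was tried and rewarded

print(rewards_by_choice["arm_a"])   # 1.5
print(rewards_by_choice["arm_b"])   # 0.0 -- never tried, treated as zero reward
```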
That is a very good point actually :)
I should have given the pseudocode of the epsilon-greedy algorithm:

- The control parameters of the algorithm are $\epsilon$, i.e. the probability of random sampling, `n_trials`, which we define as $T$ hereafter, and the number of choices $K$.
- Try every single arm $\epsilon T / K$ times.
- Choose the optimal arm (based on results up to $\epsilon T / K$ pulls per arm, or up to the latest trial) for each dimension.

So usually, we start from random initialization.
However, we do not have to strictly stick to this algorithm, meaning that it is totally acceptable not to follow the classic implementation.
Instead, we can do it in the UCB-policy fashion, where we try each arm once at initialization (see the sketch after this comment).
In this way, your issue will be resolved and we can still retain most of your implementation.
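A minimal, self-contained sketch of the suggested scheme (not the PR's implementation): epsilon-greedy with the once-per-arm initialization described above. The function name, the `reward_fn` callback, and the toy arms in the usage example are all illustrative assumptions.

```python
import random


def epsilon_greedy_with_init(arms, reward_fn, n_trials, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit that first pulls each arm once (UCB-style
    initialization), so every arm has at least one observed reward before
    exploitation starts. Assumes n_trials >= len(arms) and that higher
    reward is better."""
    rng = random.Random(seed)
    total_reward = {arm: 0.0 for arm in arms}
    count = {arm: 0 for arm in arms}

    for _ in range(n_trials):
        untried = [arm for arm in arms if count[arm] == 0]
        if untried:
            arm = untried[0]  # initialization phase: pull each arm once
        elif rng.random() < epsilon:
            arm = rng.choice(arms)  # exploration
        else:
            # Exploitation: count[arm] >= 1 for every arm after initialization,
            # so no division-by-zero guard is needed here.
            arm = max(arms, key=lambda a: total_reward[a] / count[a])
        reward = reward_fn(arm)
        total_reward[arm] += reward
        count[arm] += 1

    return max(arms, key=lambda a: total_reward[a] / count[a])


# Example usage with a toy, noisy reward function:
if __name__ == "__main__":
    best = epsilon_greedy_with_init(
        arms=["a", "b", "c"],
        reward_fn=lambda arm: {"a": 0.2, "b": 0.8, "c": 0.5}[arm] + random.random() * 0.1,
        n_trials=50,
    )
    print(best)
```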
Thanks for your suggestion!

> Instead, we can do it in the UCB-policy fashion, where we try each arm once at initialization.

This looks good to me, and I changed the initialization in 371556f
(random initialization seems difficult for Optuna because of its high objective flexibility 🙏).
Could you review this PR? (cf. #113 (comment))
@@ -0,0 +1,20 @@
import optuna
I confirmed that the example works!
Co-authored-by: Shuhei Watanabe <[email protected]>
@nabenabe0928 Thank you for the suggestion and comment! Could you check my revisions, please?
Hi, thank you for the prompt action!
@nabenabe0928 Renamed modules and directory in 0bc6cb7.
@ryota717
Thank you for the modification and sorry for the late response :(
I added some comments, but you can choose whether or not to take the suggestions!
Feel free to tell me your opinion and then we can promptly merge this PR!
@@ -0,0 +1,25 @@
---
author: Ryota Nishijima
title: MAB Epsilon-Greedy Sampler
[nit] Suggested change:
- title: MAB Epsilon-Greedy Sampler
+ title: A Sampler Based on Epsilon-Greedy Multi-Armed Bandit Algorithm
if study.direction == StudyDirection.MINIMIZE:
    return min(
        param_distribution.choices,
        key=lambda x: rewards_by_choice[x] / max(cnt_by_choice[x], 1),
[nit] Now, thanks to the last modification, we do not have to use the `max(..., 1)` guard here!
Suggested change:
- key=lambda x: rewards_by_choice[x] / max(cnt_by_choice[x], 1),
+ key=lambda x: rewards_by_choice[x] / cnt_by_choice[x],
else:
    return max(
        param_distribution.choices,
        key=lambda x: rewards_by_choice[x] / max(cnt_by_choice[x], 1),
[nit] Same here :)
Suggested change:
- key=lambda x: rewards_by_choice[x] / max(cnt_by_choice[x], 1),
+ key=lambda x: rewards_by_choice[x] / cnt_by_choice[x],
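For illustration, a minimal sketch of the selection after this suggestion is applied. The helper function `_best_choice` is hypothetical; only the variable names mirror the snippet above, and it assumes the once-per-arm initialization guarantees `cnt_by_choice[x] >= 1`.

```python
def _best_choice(choices, rewards_by_choice, cnt_by_choice, minimize):
    # Average reward per arm; no max(cnt, 1) guard is needed because every arm
    # is pulled once during initialization, so cnt_by_choice[x] >= 1.
    def mean_reward(x):
        return rewards_by_choice[x] / cnt_by_choice[x]

    return min(choices, key=mean_reward) if minimize else max(choices, key=mean_reward)
```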
I will merge this PR for now, so if you would like to incorporate my suggestions, please open another PR!
Contributor Agreements
Please read the contributor agreements and if you agree, please click the checkbox below.
Tip
Please follow the Quick TODO list to smoothly merge your PR.
Motivation
#113
Description of the changes
Add the multi-armed bandit sampler requested in #113.
TODO List towards PR Merge
Please remove this section if this PR is not an addition of a new package.
Otherwise, please check the following TODO list:
- Copy `./template/` to create your package
- Replace `<COPYRIGHT HOLDER>` in `LICENSE` of your package with your name
- Fill out `README.md` in your package
- `__init__.py`
- `README.md`
- `README.md`
Please Check Here
Please tell me if other options (like annealing epsilon, or needing `_n_startup_trials` like TPESampler, ...) are necessary.
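Since annealing epsilon is mentioned above, here is a minimal sketch of one common linear annealing schedule (not part of this PR; the function name, parameter names, and defaults are all illustrative):

```python
def annealed_epsilon(trial_number: int, epsilon_start: float = 1.0,
                     epsilon_end: float = 0.1, decay_trials: int = 100) -> float:
    """Linearly decay epsilon from epsilon_start to epsilon_end over the first
    decay_trials trials, then hold it constant at epsilon_end."""
    if trial_number >= decay_trials:
        return epsilon_end
    fraction = trial_number / decay_trials
    return epsilon_start + fraction * (epsilon_end - epsilon_start)
```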