
Add multi-armed bandit sampler #155

Merged
6 commits merged into optuna:main on Oct 8, 2024

Conversation

ryota717 (Contributor):

Contributor Agreements

Please read the contributor agreements and if you agree, please click the checkbox below.

  • I agree to the contributor agreements.

Tip

Please follow the Quick TODO list to smoothly merge your PR.

Motivation

#113

Description of the changes

This PR adds a multi-armed bandit sampler, as requested in #113.

TODO List towards PR Merge

Please remove this section if this PR is not an addition of a new package.
Otherwise, please check the following TODO list:

  • Copy ./template/ to create your package
  • Replace <COPYRIGHT HOLDER> in LICENSE of your package with your name
  • Fill out README.md in your package
  • Add import statements of your function or class names to be used in __init__.py
  • Apply the formatter based on the tips in README.md
  • Check whether your module works as intended based on the tips in README.md

Please Check Here

Please tell me if other options (such as an annealing epsilon, or _n_startup_trials like TPESampler, ...) are necessary.
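(For illustration only: an "annealing epsilon" option could look something like the sketch below. The function name, epsilon_init, and the decay schedule are invented for this example and are not part of the PR.)

import math

def annealed_epsilon(epsilon_init: float, trial_number: int) -> float:
    # Decay the exploration probability as more trials complete,
    # e.g. eps_t = eps_0 / sqrt(t + 1); the schedule itself is a free design choice.
    return epsilon_init / math.sqrt(trial_number + 1)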

ryota717 force-pushed the 113-add-bandit-sampler branch 3 times, most recently from b728e4e to 568c8c6 on September 25, 2024, 21:25
states = (TrialState.COMPLETE, TrialState.PRUNED)
trials = study._get_trials(deepcopy=False, states=states, use_cache=True)

rewards_by_choice: defaultdict = defaultdict(float)
ryota717 (Contributor Author):

[QUESTION] This defaultdict treats arms that have never been chosen as having a reward of 0. Should I replace this with some other approach?
(The alternatives I can think of are using _n_startup_trials like TPESampler does, or letting the user set a default reward instead of 0.)

nabenabe0928 (Contributor) commented Sep 30, 2024:

That is a very good point actually:)
I should have given the pseudocode of the $\epsilon$-greedy algorithm, but it usually works as follows:

  1. The control parameters of the algorithm are $\epsilon$ (i.e. the probability of random sampling), n_trials (which we define as $T$ hereafter), and the number of choices $K$.
  2. Try every single arm $\epsilon T / K$ times.
  3. Choose the empirically optimal arm for each dimension, based on the results observed so far (up to the $\epsilon T / K$ initial pulls, or up to the latest trial); see the sketch at the end of this comment.

So usually, we start from random initialization.
However, we do not have to stick strictly to this algorithm; it is totally acceptable not to follow the classic implementation.
Instead, we can do it in the UCB-policy fashion, where each arm is tried once at initialization.
In this way, your issue will be resolved and we can still retain most of your implementation.
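(A rough, self-contained Python sketch of the classic $\epsilon$-greedy procedure in steps 1-3 above; it is purely illustrative and not the code in this PR. pull(arm) stands for a hypothetical reward function.)

def epsilon_greedy_bandit(pull, choices, epsilon, n_trials, maximize=True):
    # Phase 1: try every arm eps * T / K times; Phase 2: always pick the empirical best.
    n_initial_per_arm = int(epsilon * n_trials / len(choices))
    rewards = {arm: 0.0 for arm in choices}
    counts = {arm: 0 for arm in choices}
    best = max if maximize else min
    history = []
    for t in range(n_trials):
        if t < n_initial_per_arm * len(choices):
            arm = choices[t % len(choices)]  # exploration: cycle through all arms
        else:
            # exploitation: empirically best arm given the results observed so far
            arm = best(choices, key=lambda a: rewards[a] / max(counts[a], 1))
        reward = pull(arm)
        rewards[arm] += reward
        counts[arm] += 1
        history.append((arm, reward))
    return history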

ryota717 (Contributor Author):

Thanks for your suggestion!

> Instead, we can do it in the UCB policy fashion where we try each arm once at the initialization.

This looks good to me, and I changed the initialization accordingly in 371556f.
(Random initialization seems difficult in Optuna because of the high flexibility of its objective functions 🙏)
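(For reference, a minimal sketch of the "try every arm once at initialization" behaviour discussed above. The names below are hypothetical; this is not the actual diff in 371556f.)

import random
from collections import defaultdict

def choose_arm(choices, past_results, epsilon, rng, maximize=True):
    # past_results: (arm, observed objective value) pairs from finished trials.
    rewards = defaultdict(float)
    counts = defaultdict(int)
    for arm, value in past_results:
        rewards[arm] += value
        counts[arm] += 1
    untried = [arm for arm in choices if counts[arm] == 0]
    if untried:
        return rng.choice(untried)  # initialization: every arm gets pulled once first
    if rng.random() < epsilon:
        return rng.choice(list(choices))  # epsilon-greedy exploration afterwards
    best = max if maximize else min
    # counts[arm] >= 1 is guaranteed here, so no max(counts[arm], 1) guard is needed.
    return best(choices, key=lambda arm: rewards[arm] / counts[arm])

# Example call with made-up values:
arm = choose_arm(("relu", "tanh"), [("relu", 0.42)], epsilon=0.1, rng=random.Random(0), maximize=False)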

y0z added the new-package and contribution-welcome labels on Sep 26, 2024
y0z (Member) commented Sep 26, 2024:

@nabenabe0928

Could you review this PR? (cf. #113 (comment))

@@ -0,0 +1,20 @@
import optuna
Contributor:

I confirmed that the example works!
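(For context: examples for OptunaHub sampler packages typically look something like the sketch below. The package path "samplers/mab_epsilon_greedy" and the class name MABEpsilonGreedySampler are assumptions for illustration, not the actual names in this PR.)

import optuna
import optunahub

# Load the sampler package from the OptunaHub registry (assumed package path).
module = optunahub.load_module(package="samplers/mab_epsilon_greedy")
sampler = module.MABEpsilonGreedySampler()  # assumed class name

def objective(trial: optuna.Trial) -> float:
    x = trial.suggest_categorical("x", [-1, 0, 1])
    return x**2

study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=20)
print(study.best_trial.params)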

package/samplers/multi_armed_bandit/multi_armed_bandit.py (outdated review comment, resolved)

ryota717 (Contributor Author) left a comment:

@nabenabe0928 Thank you for the suggestion and comments! Could you check my revisions, please?


nabenabe0928 (Contributor):

@ryota717

Hi, thank you for the prompt action!
I will look into the changes asap:)
But would it perhaps be better to rename the directory to something like mab_epsilon_greedy?

ryota717 (Contributor Author) commented Oct 2, 2024:

@nabenabe0928 I renamed the modules and the directory in 0bc6cb7.

nabenabe0928 (Contributor):

@ryota717
Thank you so much! I will check the rest asap 🙇

nabenabe0928 (Contributor) left a comment:

Thank you for the modification and sorry for the late response:(
I added some comments, but you can choose whether you take the suggestions or not!
Feel free to tell me your opinion and then we can promptly merge this PR!

@@ -0,0 +1,25 @@
---
author: Ryota Nishijima
title: MAB Epsilon-Greedy Sampler
Contributor:

[nit]

Suggested change
title: MAB Epsilon-Greedy Sampler
title: A Sampler Based on Epsilon-Greedy Multi-Armed Bandit Algorithm

if study.direction == StudyDirection.MINIMIZE:
return min(
param_distribution.choices,
key=lambda x: rewards_by_choice[x] / max(cnt_by_choice[x], 1),
Contributor:

[nit]
Now, thanks to the last modification (every choice is tried once at initialization, so cnt_by_choice[x] is always at least 1), we no longer need the max(..., 1) guard here!

Suggested change
key=lambda x: rewards_by_choice[x] / max(cnt_by_choice[x], 1),
key=lambda x: rewards_by_choice[x] / cnt_by_choice[x],

else:
return max(
param_distribution.choices,
key=lambda x: rewards_by_choice[x] / max(cnt_by_choice[x], 1),
Contributor:

[nit]
Same here:)

Suggested change
key=lambda x: rewards_by_choice[x] / max(cnt_by_choice[x], 1),
key=lambda x: rewards_by_choice[x] / cnt_by_choice[x],

nabenabe0928 (Contributor) left a comment:

I will merge this PR as is, so if you would like to incorporate my suggestions, please open another PR!

nabenabe0928 merged commit 9e353ea into optuna:main on Oct 8, 2024
4 checks passed