Skip to content

Latest commit

 

History

History
437 lines (393 loc) · 19.6 KB

README.md

File metadata and controls

437 lines (393 loc) · 19.6 KB

Implement of Sarsa MAX (Q-LEarning) and Sarsa Expected for Taxi V2 from Open AI Gym

Original Paper:

https://arxiv.org/pdf/cs/9905014.pdf

Open AI Gym:

https://gym.openai.com/
https://github.com/openai/gym

Open AI Gym considers it solved at an average reward of 9.7, but this seems high considering the shortest scenario is 6 actions and the longest is 18, giving a reward between 3 and 15 for a perfect plan.

Results

  • BAR = Best Average Result
  • EC = Epsilon Change
BAR EC Alpha Gamma Type
9.59 0.45 0.05 0.85 Sarsa expected
9.56 0.75 0.1 0.9 Sarsa expected
9.52 0.45 0.05 0.95 Sarsa expected
9.51 0.75 0.15 0.95 Sarsa expected
9.46 0.6 0.2 0.95 Sarsa expected
9.44 0.75 0.1 1.0 Sarsa max
9.44 0.6 0.1 0.9 Sarsa expected
9.41 0.9 0.2 1.0 Sarsa expected
9.4 0.75 0.1 0.95 Sarsa max
9.39 0.75 0.05 0.85 Sarsa max
9.39 0.9 0.15 1.0 Sarsa max
9.39 0.45 0.2 0.95 Sarsa max
9.39 0.75 0.2 0.9 Sarsa max
9.38 0.9 0.05 0.9 Sarsa max
9.37 0.75 0.1 1.0 Sarsa expected
9.37 0.6 0.1 0.9 Sarsa max
9.37 0.75 0.1 0.85 Sarsa expected
9.37 0.45 0.1 0.85 Sarsa max
9.36 0.6 0.05 0.95 Sarsa max
9.36 0.9 0.05 0.85 Sarsa expected
9.36 0.6 0.1 1.0 Sarsa max
9.36 0.9 0.15 0.9 Sarsa max
9.36 0.6 0.15 0.9 Sarsa max
9.34 0.75 0.05 0.95 Sarsa expected
9.34 0.6 0.05 0.9 Sarsa max
9.34 0.9 0.1 0.9 Sarsa expected
9.34 0.45 0.1 0.9 Sarsa expected
9.34 0.6 0.15 0.85 Sarsa expected
9.33 0.45 0.1 0.95 Sarsa expected
9.33 0.45 0.15 0.9 Sarsa max
9.32 0.6 0.05 1.0 Sarsa max
9.32 0.45 0.05 1.0 Sarsa expected
9.32 0.9 0.15 0.9 Sarsa expected
9.32 0.9 0.15 0.85 Sarsa max
9.31 0.6 0.05 0.85 Sarsa max
9.31 0.9 0.1 0.95 Sarsa max
9.31 0.9 0.1 0.95 Sarsa expected
9.31 0.6 0.1 0.85 Sarsa expected
9.31 0.6 0.15 0.85 Sarsa max
9.31 0.75 0.2 0.9 Sarsa expected
9.3 0.6 0.1 0.95 Sarsa max
9.3 0.45 0.15 0.95 Sarsa expected
9.29 0.6 0.05 0.95 Sarsa expected
9.29 0.6 0.05 0.9 Sarsa expected
9.29 0.75 0.05 0.85 Sarsa expected
9.29 0.9 0.15 0.95 Sarsa max
9.29 0.45 0.15 0.95 Sarsa max
9.29 0.45 0.15 0.9 Sarsa expected
9.29 0.45 0.2 1.0 Sarsa max
9.29 0.45 0.2 0.85 Sarsa expected
9.28 0.45 0.05 0.9 Sarsa expected
9.28 0.75 0.1 0.9 Sarsa max
9.28 0.9 0.1 0.85 Sarsa max
9.28 0.6 0.1 0.85 Sarsa max
9.28 0.75 0.15 1.0 Sarsa expected
9.28 0.45 0.15 0.85 Sarsa expected
9.28 0.9 0.2 0.95 Sarsa expected
9.28 0.75 0.2 0.95 Sarsa expected
9.28 0.6 0.2 0.85 Sarsa expected
9.28 0.45 0.2 0.85 Sarsa max
9.27 0.75 0.05 0.9 Sarsa max
9.27 0.45 0.05 0.9 Sarsa max
9.27 0.6 0.15 0.9 Sarsa expected
9.27 0.45 0.15 0.85 Sarsa max
9.27 0.75 0.2 1.0 Sarsa expected
9.27 0.6 0.2 1.0 Sarsa max
9.26 0.45 0.05 0.95 Sarsa max
9.26 0.9 0.1 0.9 Sarsa max
9.26 0.75 0.15 0.9 Sarsa max
9.26 0.6 0.2 0.9 Sarsa expected
9.26 0.6 0.2 0.85 Sarsa max
9.25 0.9 0.05 1.0 Sarsa expected
9.25 0.45 0.05 1.0 Sarsa max
9.25 0.75 0.05 0.9 Sarsa expected
9.25 0.45 0.1 0.85 Sarsa expected
9.25 0.6 0.15 0.95 Sarsa max
9.25 0.75 0.15 0.85 Sarsa expected
9.25 0.6 0.2 1.0 Sarsa expected
9.25 0.75 0.2 0.95 Sarsa max
9.25 0.6 0.2 0.9 Sarsa max
9.25 0.75 0.2 0.85 Sarsa expected
9.24 0.75 0.05 1.0 Sarsa expected
9.24 0.9 0.05 0.95 Sarsa expected
9.24 0.45 0.1 1.0 Sarsa expected
9.24 0.9 0.15 0.85 Sarsa expected
9.23 0.9 0.05 0.95 Sarsa max
9.23 0.9 0.05 0.85 Sarsa max
9.23 0.9 0.1 0.85 Sarsa expected
9.23 0.75 0.15 0.95 Sarsa max
9.23 0.9 0.2 0.85 Sarsa max
9.22 0.6 0.05 0.85 Sarsa expected
9.22 0.75 0.1 0.95 Sarsa expected
9.22 0.45 0.1 0.95 Sarsa max
9.22 0.45 0.2 1.0 Sarsa expected
9.22 0.45 0.2 0.9 Sarsa expected
9.22 0.9 0.2 0.85 Sarsa expected
9.22 0.75 0.2 0.85 Sarsa max
9.21 0.6 0.05 1.0 Sarsa expected
9.21 0.9 0.05 0.9 Sarsa expected
9.21 0.9 0.1 1.0 Sarsa max
9.21 0.6 0.1 1.0 Sarsa expected
9.21 0.45 0.1 1.0 Sarsa max
9.21 0.75 0.15 1.0 Sarsa max
9.21 0.45 0.15 1.0 Sarsa max
9.21 0.75 0.15 0.85 Sarsa max
9.2 0.75 0.05 1.0 Sarsa max
9.19 0.45 0.1 0.9 Sarsa max
9.19 0.9 0.15 1.0 Sarsa expected
9.19 0.6 0.15 1.0 Sarsa max
9.19 0.45 0.15 1.0 Sarsa expected
9.19 0.75 0.15 0.9 Sarsa expected
9.19 0.9 0.2 0.9 Sarsa max
9.19 0.9 0.2 0.9 Sarsa expected
9.18 0.9 0.1 1.0 Sarsa expected
9.18 0.75 0.2 1.0 Sarsa max
9.17 0.6 0.15 1.0 Sarsa expected
9.17 0.6 0.2 0.95 Sarsa max
9.16 0.9 0.05 1.0 Sarsa max
9.16 0.75 0.05 0.95 Sarsa max
9.16 0.6 0.1 0.95 Sarsa expected
9.15 0.45 0.05 0.85 Sarsa max
9.15 0.75 0.1 0.85 Sarsa max
9.15 0.9 0.2 0.95 Sarsa max
9.15 0.45 0.2 0.9 Sarsa max
9.14 0.9 0.2 1.0 Sarsa max
9.12 0.6 0.15 0.95 Sarsa expected
9.09 0.9 0.15 0.95 Sarsa expected
9.09 0.45 0.2 0.95 Sarsa expected

Sample Output

Epsilon change=0.6, Alpha=0.2, Gamma=0.85, Type=Sarsa max
Episode 20000/20000 || Best average reward 9.26  , current average reward 8.96

Total avg rewards= 5.668

Start
+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+


Iteration 15, Cumaltive reward 6
+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (Dropoff)

Final reward=6

Episode 100000/100000 || Best average reward 9.37  , current average reward 8.47

Total avg rewards= 8.44849
Top 5 configuration so far
Best average reward=9.59, Epsilon change=0.45, Alpha=0.05, Gamma=0.85, Type=Sarsa expected
Best average reward=9.56, Epsilon change=0.75, Alpha=0.1, Gamma=0.9, Type=Sarsa expected
Best average reward=9.52, Epsilon change=0.45, Alpha=0.05, Gamma=0.95, Type=Sarsa expected
Best average reward=9.51, Epsilon change=0.75, Alpha=0.15, Gamma=0.95, Type=Sarsa expected
Best average reward=9.46, Epsilon change=0.6, Alpha=0.2, Gamma=0.95, Type=Sarsa expected

Epsilon change=0.6, Alpha=0.2, Gamma=0.85, Type=Sarsa expected
Episode 20000/20000 || Best average reward 9.28  , current average reward 8.12

Total avg rewards= 5.5344

Start
+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+


Iteration 16, Cumaltive reward 5
+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (Dropoff)

Final reward=5

Episode 100000/100000 || Best average reward 9.48  , current average reward 8.53

Total avg rewards= 8.28445
Top 5 configuration so far
Best average reward=9.59, Epsilon change=0.45, Alpha=0.05, Gamma=0.85, Type=Sarsa expected
Best average reward=9.56, Epsilon change=0.75, Alpha=0.1, Gamma=0.9, Type=Sarsa expected
Best average reward=9.52, Epsilon change=0.45, Alpha=0.05, Gamma=0.95, Type=Sarsa expected
Best average reward=9.51, Epsilon change=0.75, Alpha=0.15, Gamma=0.95, Type=Sarsa expected
Best average reward=9.46, Epsilon change=0.6, Alpha=0.2, Gamma=0.95, Type=Sarsa expected

Epsilon change=0.45, Alpha=0.2, Gamma=0.85, Type=Sarsa max
Episode 20000/20000 || Best average reward 9.28  , current average reward 8.45

Total avg rewards= 5.72345

Start
+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+


Iteration 16, Cumaltive reward 5
+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (Dropoff)

Final reward=5

Episode 100000/100000 || Best average reward 9.59  , current average reward 8.67

Total avg rewards= 8.46225
Top 5 configuration so far
Best average reward=9.59, Epsilon change=0.45, Alpha=0.05, Gamma=0.85, Type=Sarsa expected
Best average reward=9.56, Epsilon change=0.75, Alpha=0.1, Gamma=0.9, Type=Sarsa expected
Best average reward=9.52, Epsilon change=0.45, Alpha=0.05, Gamma=0.95, Type=Sarsa expected
Best average reward=9.51, Epsilon change=0.75, Alpha=0.15, Gamma=0.95, Type=Sarsa expected
Best average reward=9.46, Epsilon change=0.6, Alpha=0.2, Gamma=0.95, Type=Sarsa expected

Epsilon change=0.45, Alpha=0.2, Gamma=0.85, Type=Sarsa expected
Episode 20000/20000 || Best average reward 9.29  , current average reward 9.12

Total avg rewards= 5.73365

Start
+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+


Iteration 12, Cumaltive reward 9
+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (Dropoff)

Final reward=9

Episode 100000/100000 || Best average reward 9.41  , current average reward 8.46

Total avg rewards= 8.44969
Top 5 configuration so far
Best average reward=9.59, Epsilon change=0.45, Alpha=0.05, Gamma=0.85, Type=Sarsa expected
Best average reward=9.56, Epsilon change=0.75, Alpha=0.1, Gamma=0.9, Type=Sarsa expected
Best average reward=9.52, Epsilon change=0.45, Alpha=0.05, Gamma=0.95, Type=Sarsa expected
Best average reward=9.51, Epsilon change=0.75, Alpha=0.15, Gamma=0.95, Type=Sarsa expected
Best average reward=9.46, Epsilon change=0.6, Alpha=0.2, Gamma=0.95, Type=Sarsa expected

Completed for
Alphas=[0.05, 0.1, 0.15, 0.2]
Gammas=[1.0, 0.95, 0.9, 0.85]
Epsilon change=[0.9, 0.75, 0.6, 0.45]
Type=['max', 'expected']

Results
Best average reward=9.59, Epsilon change=0.45, Alpha=0.05, Gamma=0.85, Type=Sarsa expected
Best average reward=9.56, Epsilon change=0.75, Alpha=0.1, Gamma=0.9, Type=Sarsa expected
Best average reward=9.52, Epsilon change=0.45, Alpha=0.05, Gamma=0.95, Type=Sarsa expected
Best average reward=9.51, Epsilon change=0.75, Alpha=0.15, Gamma=0.95, Type=Sarsa expected
Best average reward=9.46, Epsilon change=0.6, Alpha=0.2, Gamma=0.95, Type=Sarsa expected
Best average reward=9.44, Epsilon change=0.75, Alpha=0.1, Gamma=1.0, Type=Sarsa max
Best average reward=9.44, Epsilon change=0.6, Alpha=0.1, Gamma=0.9, Type=Sarsa expected
Best average reward=9.41, Epsilon change=0.9, Alpha=0.2, Gamma=1.0, Type=Sarsa expected
Best average reward=9.4, Epsilon change=0.75, Alpha=0.1, Gamma=0.95, Type=Sarsa max
Best average reward=9.39, Epsilon change=0.75, Alpha=0.05, Gamma=0.85, Type=Sarsa max
Best average reward=9.39, Epsilon change=0.9, Alpha=0.15, Gamma=1.0, Type=Sarsa max
Best average reward=9.39, Epsilon change=0.45, Alpha=0.2, Gamma=0.95, Type=Sarsa max
Best average reward=9.39, Epsilon change=0.75, Alpha=0.2, Gamma=0.9, Type=Sarsa max
Best average reward=9.38, Epsilon change=0.9, Alpha=0.05, Gamma=0.9, Type=Sarsa max
Best average reward=9.37, Epsilon change=0.75, Alpha=0.1, Gamma=1.0, Type=Sarsa expected
Best average reward=9.37, Epsilon change=0.6, Alpha=0.1, Gamma=0.9, Type=Sarsa max
Best average reward=9.37, Epsilon change=0.75, Alpha=0.1, Gamma=0.85, Type=Sarsa expected
Best average reward=9.37, Epsilon change=0.45, Alpha=0.1, Gamma=0.85, Type=Sarsa max
Best average reward=9.36, Epsilon change=0.6, Alpha=0.05, Gamma=0.95, Type=Sarsa max
Best average reward=9.36, Epsilon change=0.9, Alpha=0.05, Gamma=0.85, Type=Sarsa expected
Best average reward=9.36, Epsilon change=0.6, Alpha=0.1, Gamma=1.0, Type=Sarsa max
Best average reward=9.36, Epsilon change=0.9, Alpha=0.15, Gamma=0.9, Type=Sarsa max
Best average reward=9.36, Epsilon change=0.6, Alpha=0.15, Gamma=0.9, Type=Sarsa max
Best average reward=9.34, Epsilon change=0.75, Alpha=0.05, Gamma=0.95, Type=Sarsa expected
Best average reward=9.34, Epsilon change=0.6, Alpha=0.05, Gamma=0.9, Type=Sarsa max
Best average reward=9.34, Epsilon change=0.9, Alpha=0.1, Gamma=0.9, Type=Sarsa expected
Best average reward=9.34, Epsilon change=0.45, Alpha=0.1, Gamma=0.9, Type=Sarsa expected
Best average reward=9.34, Epsilon change=0.6, Alpha=0.15, Gamma=0.85, Type=Sarsa expected
Best average reward=9.33, Epsilon change=0.45, Alpha=0.1, Gamma=0.95, Type=Sarsa expected
Best average reward=9.33, Epsilon change=0.45, Alpha=0.15, Gamma=0.9, Type=Sarsa max
Best average reward=9.32, Epsilon change=0.6, Alpha=0.05, Gamma=1.0, Type=Sarsa max
Best average reward=9.32, Epsilon change=0.45, Alpha=0.05, Gamma=1.0, Type=Sarsa expected
Best average reward=9.32, Epsilon change=0.9, Alpha=0.15, Gamma=0.9, Type=Sarsa expected
Best average reward=9.32, Epsilon change=0.9, Alpha=0.15, Gamma=0.85, Type=Sarsa max
Best average reward=9.31, Epsilon change=0.6, Alpha=0.05, Gamma=0.85, Type=Sarsa max
Best average reward=9.31, Epsilon change=0.9, Alpha=0.1, Gamma=0.95, Type=Sarsa max
Best average reward=9.31, Epsilon change=0.9, Alpha=0.1, Gamma=0.95, Type=Sarsa expected
Best average reward=9.31, Epsilon change=0.6, Alpha=0.1, Gamma=0.85, Type=Sarsa expected
Best average reward=9.31, Epsilon change=0.6, Alpha=0.15, Gamma=0.85, Type=Sarsa max
Best average reward=9.31, Epsilon change=0.75, Alpha=0.2, Gamma=0.9, Type=Sarsa expected
Best average reward=9.3, Epsilon change=0.6, Alpha=0.1, Gamma=0.95, Type=Sarsa max
Best average reward=9.3, Epsilon change=0.45, Alpha=0.15, Gamma=0.95, Type=Sarsa expected
Best average reward=9.29, Epsilon change=0.6, Alpha=0.05, Gamma=0.95, Type=Sarsa expected
Best average reward=9.29, Epsilon change=0.6, Alpha=0.05, Gamma=0.9, Type=Sarsa expected
Best average reward=9.29, Epsilon change=0.75, Alpha=0.05, Gamma=0.85, Type=Sarsa expected
Best average reward=9.29, Epsilon change=0.9, Alpha=0.15, Gamma=0.95, Type=Sarsa max
Best average reward=9.29, Epsilon change=0.45, Alpha=0.15, Gamma=0.95, Type=Sarsa max
Best average reward=9.29, Epsilon change=0.45, Alpha=0.15, Gamma=0.9, Type=Sarsa expected
Best average reward=9.29, Epsilon change=0.45, Alpha=0.2, Gamma=1.0, Type=Sarsa max
Best average reward=9.29, Epsilon change=0.45, Alpha=0.2, Gamma=0.85, Type=Sarsa expected
Best average reward=9.28, Epsilon change=0.45, Alpha=0.05, Gamma=0.9, Type=Sarsa expected
Best average reward=9.28, Epsilon change=0.75, Alpha=0.1, Gamma=0.9, Type=Sarsa max
Best average reward=9.28, Epsilon change=0.9, Alpha=0.1, Gamma=0.85, Type=Sarsa max
Best average reward=9.28, Epsilon change=0.6, Alpha=0.1, Gamma=0.85, Type=Sarsa max
Best average reward=9.28, Epsilon change=0.75, Alpha=0.15, Gamma=1.0, Type=Sarsa expected
Best average reward=9.28, Epsilon change=0.45, Alpha=0.15, Gamma=0.85, Type=Sarsa expected
Best average reward=9.28, Epsilon change=0.9, Alpha=0.2, Gamma=0.95, Type=Sarsa expected
Best average reward=9.28, Epsilon change=0.75, Alpha=0.2, Gamma=0.95, Type=Sarsa expected
Best average reward=9.28, Epsilon change=0.6, Alpha=0.2, Gamma=0.85, Type=Sarsa expected
Best average reward=9.28, Epsilon change=0.45, Alpha=0.2, Gamma=0.85, Type=Sarsa max
Best average reward=9.27, Epsilon change=0.75, Alpha=0.05, Gamma=0.9, Type=Sarsa max
Best average reward=9.27, Epsilon change=0.45, Alpha=0.05, Gamma=0.9, Type=Sarsa max
Best average reward=9.27, Epsilon change=0.6, Alpha=0.15, Gamma=0.9, Type=Sarsa expected
Best average reward=9.27, Epsilon change=0.45, Alpha=0.15, Gamma=0.85, Type=Sarsa max
Best average reward=9.27, Epsilon change=0.75, Alpha=0.2, Gamma=1.0, Type=Sarsa expected
Best average reward=9.27, Epsilon change=0.6, Alpha=0.2, Gamma=1.0, Type=Sarsa max
Best average reward=9.26, Epsilon change=0.45, Alpha=0.05, Gamma=0.95, Type=Sarsa max
Best average reward=9.26, Epsilon change=0.9, Alpha=0.1, Gamma=0.9, Type=Sarsa max
Best average reward=9.26, Epsilon change=0.75, Alpha=0.15, Gamma=0.9, Type=Sarsa max
Best average reward=9.26, Epsilon change=0.6, Alpha=0.2, Gamma=0.9, Type=Sarsa expected
Best average reward=9.26, Epsilon change=0.6, Alpha=0.2, Gamma=0.85, Type=Sarsa max
Best average reward=9.25, Epsilon change=0.9, Alpha=0.05, Gamma=1.0, Type=Sarsa expected
Best average reward=9.25, Epsilon change=0.45, Alpha=0.05, Gamma=1.0, Type=Sarsa max
Best average reward=9.25, Epsilon change=0.75, Alpha=0.05, Gamma=0.9, Type=Sarsa expected
Best average reward=9.25, Epsilon change=0.45, Alpha=0.1, Gamma=0.85, Type=Sarsa expected
Best average reward=9.25, Epsilon change=0.6, Alpha=0.15, Gamma=0.95, Type=Sarsa max
Best average reward=9.25, Epsilon change=0.75, Alpha=0.15, Gamma=0.85, Type=Sarsa expected
Best average reward=9.25, Epsilon change=0.6, Alpha=0.2, Gamma=1.0, Type=Sarsa expected
Best average reward=9.25, Epsilon change=0.75, Alpha=0.2, Gamma=0.95, Type=Sarsa max
Best average reward=9.25, Epsilon change=0.6, Alpha=0.2, Gamma=0.9, Type=Sarsa max
Best average reward=9.25, Epsilon change=0.75, Alpha=0.2, Gamma=0.85, Type=Sarsa expected
Best average reward=9.24, Epsilon change=0.75, Alpha=0.05, Gamma=1.0, Type=Sarsa expected
Best average reward=9.24, Epsilon change=0.9, Alpha=0.05, Gamma=0.95, Type=Sarsa expected
Best average reward=9.24, Epsilon change=0.45, Alpha=0.1, Gamma=1.0, Type=Sarsa expected
Best average reward=9.24, Epsilon change=0.9, Alpha=0.15, Gamma=0.85, Type=Sarsa expected
Best average reward=9.23, Epsilon change=0.9, Alpha=0.05, Gamma=0.95, Type=Sarsa max
Best average reward=9.23, Epsilon change=0.9, Alpha=0.05, Gamma=0.85, Type=Sarsa max
Best average reward=9.23, Epsilon change=0.9, Alpha=0.1, Gamma=0.85, Type=Sarsa expected
Best average reward=9.23, Epsilon change=0.75, Alpha=0.15, Gamma=0.95, Type=Sarsa max
Best average reward=9.23, Epsilon change=0.9, Alpha=0.2, Gamma=0.85, Type=Sarsa max
Best average reward=9.22, Epsilon change=0.6, Alpha=0.05, Gamma=0.85, Type=Sarsa expected
Best average reward=9.22, Epsilon change=0.75, Alpha=0.1, Gamma=0.95, Type=Sarsa expected
Best average reward=9.22, Epsilon change=0.45, Alpha=0.1, Gamma=0.95, Type=Sarsa max
Best average reward=9.22, Epsilon change=0.45, Alpha=0.2, Gamma=1.0, Type=Sarsa expected
Best average reward=9.22, Epsilon change=0.45, Alpha=0.2, Gamma=0.9, Type=Sarsa expected
Best average reward=9.22, Epsilon change=0.9, Alpha=0.2, Gamma=0.85, Type=Sarsa expected
Best average reward=9.22, Epsilon change=0.75, Alpha=0.2, Gamma=0.85, Type=Sarsa max
Best average reward=9.21, Epsilon change=0.6, Alpha=0.05, Gamma=1.0, Type=Sarsa expected
Best average reward=9.21, Epsilon change=0.9, Alpha=0.05, Gamma=0.9, Type=Sarsa expected
Best average reward=9.21, Epsilon change=0.9, Alpha=0.1, Gamma=1.0, Type=Sarsa max
Best average reward=9.21, Epsilon change=0.6, Alpha=0.1, Gamma=1.0, Type=Sarsa expected
Best average reward=9.21, Epsilon change=0.45, Alpha=0.1, Gamma=1.0, Type=Sarsa max
Best average reward=9.21, Epsilon change=0.75, Alpha=0.15, Gamma=1.0, Type=Sarsa max
Best average reward=9.21, Epsilon change=0.45, Alpha=0.15, Gamma=1.0, Type=Sarsa max
Best average reward=9.21, Epsilon change=0.75, Alpha=0.15, Gamma=0.85, Type=Sarsa max
Best average reward=9.2, Epsilon change=0.75, Alpha=0.05, Gamma=1.0, Type=Sarsa max
Best average reward=9.19, Epsilon change=0.45, Alpha=0.1, Gamma=0.9, Type=Sarsa max
Best average reward=9.19, Epsilon change=0.9, Alpha=0.15, Gamma=1.0, Type=Sarsa expected
Best average reward=9.19, Epsilon change=0.6, Alpha=0.15, Gamma=1.0, Type=Sarsa max
Best average reward=9.19, Epsilon change=0.45, Alpha=0.15, Gamma=1.0, Type=Sarsa expected
Best average reward=9.19, Epsilon change=0.75, Alpha=0.15, Gamma=0.9, Type=Sarsa expected
Best average reward=9.19, Epsilon change=0.9, Alpha=0.2, Gamma=0.9, Type=Sarsa max
Best average reward=9.19, Epsilon change=0.9, Alpha=0.2, Gamma=0.9, Type=Sarsa expected
Best average reward=9.18, Epsilon change=0.9, Alpha=0.1, Gamma=1.0, Type=Sarsa expected
Best average reward=9.18, Epsilon change=0.75, Alpha=0.2, Gamma=1.0, Type=Sarsa max
Best average reward=9.17, Epsilon change=0.6, Alpha=0.15, Gamma=1.0, Type=Sarsa expected
Best average reward=9.17, Epsilon change=0.6, Alpha=0.2, Gamma=0.95, Type=Sarsa max
Best average reward=9.16, Epsilon change=0.9, Alpha=0.05, Gamma=1.0, Type=Sarsa max
Best average reward=9.16, Epsilon change=0.75, Alpha=0.05, Gamma=0.95, Type=Sarsa max
Best average reward=9.16, Epsilon change=0.6, Alpha=0.1, Gamma=0.95, Type=Sarsa expected
Best average reward=9.15, Epsilon change=0.45, Alpha=0.05, Gamma=0.85, Type=Sarsa max
Best average reward=9.15, Epsilon change=0.75, Alpha=0.1, Gamma=0.85, Type=Sarsa max
Best average reward=9.15, Epsilon change=0.9, Alpha=0.2, Gamma=0.95, Type=Sarsa max
Best average reward=9.15, Epsilon change=0.45, Alpha=0.2, Gamma=0.9, Type=Sarsa max
Best average reward=9.14, Epsilon change=0.9, Alpha=0.2, Gamma=1.0, Type=Sarsa max
Best average reward=9.12, Epsilon change=0.6, Alpha=0.15, Gamma=0.95, Type=Sarsa expected
Best average reward=9.09, Epsilon change=0.9, Alpha=0.15, Gamma=0.95, Type=Sarsa expected
Best average reward=9.09, Epsilon change=0.45, Alpha=0.2, Gamma=0.95, Type=Sarsa expected