Zack Beucler
- Use RL to train an agent to competitively complete a race on the first level of the GBA game 'Hot Wheels Stunt Track Challenge'
- The agent should be able to complete a lap
NOTE: `actual_playing.state` is a save file of other maps and challenges in the game.
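A minimal sketch of loading the game through stable-retro. The integration name `HotWheelsStuntTrackChallenge-gba` is an assumption for a custom integration; substitute whatever name your integration folder actually uses.

```python
import retro  # stable-retro

# Hypothetical integration name; the save state lives in the integration folder
# (stable-retro resolves "actual_playing" to actual_playing.state).
env = retro.make(
    game="HotWheelsStuntTrackChallenge-gba",
    state="actual_playing",
)
obs, info = env.reset()  # Gymnasium-style reset
```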
| Algorithm | Discrete | MultiDiscrete | MultiBinary |
|---|---|---|---|
| PPO | ✅ | ✅ | ✅ |
| A2C | ✅ | ✅ | ✅ |
| DQN | ✅ | ❌ | ❌ |
| HER | ✅ | ❌ | ❌ |
| QR-DQN | ✅ | ❌ | ❌ |
| RecurrentPPO | ✅ | ✅ | ✅ |
| TRPO | ✅ | ✅ | ✅ |
| Maskable PPO | ✅ | ✅ | ✅ |
| ARS | ✅ | ❌ | ❌ |
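The GBA controller comes through stable-retro as a MultiBinary space by default, which rules out the Discrete-only rows above (DQN, HER, QR-DQN, ARS). A sketch of flattening it at creation time with stable-retro's `use_restricted_actions` option (integration name still assumed):

```python
import retro

# Flatten button combos into a single Discrete action space so that
# Discrete-only algorithms (e.g. DQN) become usable.
env = retro.make(
    game="HotWheelsStuntTrackChallenge-gba",  # hypothetical integration name
    use_restricted_actions=retro.Actions.DISCRETE,
)
print(env.action_space)  # Discrete(n) rather than MultiBinary(...)
```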
- Speed reward:
  - +/- 0.1 if the mean speed $\bar{v} = \frac{1}{n}\sum_{t=1}^{n} v_t$ increases/decreases, where $n$ is the total time steps in the episode
  - In my mind, this should encourage the bot to make forward progress and score points (a wrapper sketch follows this list)
- Train 3 laps:
  - +10 for completing a lap
  - +0.1 or +0.01 for increasing speed
  - Bigger score reward
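A sketch of that shaping as a Gymnasium wrapper. The `speed` and `lap` info keys are assumptions; in practice they would be RAM variables exposed through the integration's `data.json`:

```python
import gymnasium as gym


class HotWheelsRewardWrapper(gym.Wrapper):
    """Adds the speed and lap bonuses described above to the env reward."""

    def __init__(self, env):
        super().__init__(env)
        self.speeds = []
        self.prev_mean_speed = 0.0
        self.prev_lap = 0

    def reset(self, **kwargs):
        self.speeds = []
        self.prev_mean_speed = 0.0
        self.prev_lap = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)

        # +/- 0.1 when the running mean speed over the episode rises/falls.
        self.speeds.append(info.get("speed", 0.0))  # assumed RAM variable
        mean_speed = sum(self.speeds) / len(self.speeds)
        if mean_speed > self.prev_mean_speed:
            reward += 0.1
        elif mean_speed < self.prev_mean_speed:
            reward -= 0.1
        self.prev_mean_speed = mean_speed

        # +10 for completing a lap.
        lap = info.get("lap", 0)  # assumed RAM variable
        if lap > self.prev_lap:
            reward += 10.0
        self.prev_lap = lap

        return obs, reward, terminated, truncated, info
```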
- Using PPO hyperparameters from the *Proximal Policy Optimization Algorithms* paper:
```python
learning_rate=2.5e-4,
n_steps=128,
n_epochs=3,
batch_size=32,
ent_coef=0.01,
vf_coef=1.0,
num_envs=8
```
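Wired into stable-baselines3, with `num_envs=8` realized as eight parallel environments. `make_env` is a hypothetical factory returning the wrapped environment from the sketches above, and the training budget is illustrative:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv

venv = SubprocVecEnv([make_env for _ in range(8)])  # num_envs=8

model = PPO(
    "CnnPolicy",  # pixel observations from the emulator
    venv,
    learning_rate=2.5e-4,
    n_steps=128,
    n_epochs=3,
    batch_size=32,
    ent_coef=0.01,
    vf_coef=1.0,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```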