Skip to content

Epsilon-greedy, softmax and LinUCB contextual bandit implementations [recommender systems]

License

Notifications You must be signed in to change notification settings

timnugent/bandit-algorithms

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Epsilon-Greedy, Softmax and LinUCB Contextual Bandit Implementations
--------------------------------------------------------------------
Tim Nugent 2015

Upper confidence bound (UCB) contextual bandit with disjoint linear models - see algorithm 1 here (also in docs directory):

https://github.com/rohitkumar07/AI-Paper-Reviews/blob/master/Contextual%20Bandits/Li10Contextual.pdf

Annealing softmax (Blotzmann Exploration) and annealing Epsilon-Greedy implementations:

http://www.cs.mcgill.ca/~vkules/bandits.pdf

Build
-----

Requires C++ build-essential package (e.g. sudo apt-get install build-essential) and Eigen headers, which are included. Just run 'make'.

The Python script to plot the results requires matplotlib (e.g. sudo apt-get install python-matplotlib).

Run
---

Run with 'make test'. This runs the binary and then calls the python scipt to plot the results.

Output
------

bin/cbandit > data/cbandit_out.csv
src/plot.py data/cbandit_out.csv
['1', '0', '1']
['2', '0', '1']
['3', '0.333333', '1']
['4', '0.25', '0.75']
['5', '0.4', '0.6']
['6', '0.5', '0.5']
['7', '0.428571', '0.428571']
['8', '0.375', '0.5']
['9', '0.333333', '0.444444']
['10', '0.4', '0.4']
...
['2995', '0.982638', '0.249082']
['2996', '0.982644', '0.248999']
['2997', '0.982649', '0.248916']
['2998', '0.982655', '0.248833']
['2999', '0.982661', '0.24875']
['3000', '0.982667', '0.248667']
Parsed  data/cbandit_out.csv
Written data/cb-vs-random.png
...
bin/smbandit > data/sm-eg-bandit_out.csv
src/plot2.py data/sm-eg-bandit_out.csv
['1', '0', '0', '0']
['2', '0', '0', '0']
['3', '0', '0.333333', '0']
['4', '0', '0.25', '0']
...
['2995', '0.976294', '0.976294', '0.211018']
['2996', '0.976302', '0.976302', '0.211282']
['2997', '0.97631', '0.97631', '0.211545']
['2998', '0.976318', '0.976318', '0.211474']
['2999', '0.976325', '0.976325', '0.211404']
['3000', '0.976333', '0.976333', '0.211333']
Parsed  data/sm-eg-bandit_out.csv
Written data/sm-vs-eg-vs-random.png


See the plot data/cb-vs-random.png. The results show how the contextual bandit compares with a random strategy at selecting the best of 4 arms, with 'best' being determined by the context (in this case a a vector of length 2 - see line 37-44 in main.cpp). The contextual bandit learns which contexts yield a reward by updating the A matrix for the best arm at each step, and the B vector for this arm if there is a non-zero reward (i.e. the selected arm is correct). As expected, the random strategy converges to about 25%, while the contextual bandit learns quickly and converges to ~98% over 3000 steps. Methods such as the contextual bandit are therefore well suited to personalised (this data is provided as the context) recommender systems where feedback is partial (i.e. user clicks). The second plot data/sm-vs-eg-vs-random.png shows the performance of annealing Epsilon-Greedy and Softmax bandits vs a random strategy, with 5 Bernoulli arms.

Bugs
----

[email protected]

About

Epsilon-greedy, softmax and LinUCB contextual bandit implementations [recommender systems]

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published