GitHub - timnugent/bandit-algorithms: Epsilon-greedy, softmax and LinUCB contextual bandit implementations [recommender systems]

timnugent / bandit-algorithms Public

Notifications You must be signed in to change notification settings
Fork 14
Star 48

Epsilon-greedy, softmax and LinUCB contextual bandit implementations [recommender systems]

48 stars 14 forks Branches Tags Activity

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
bin		bin
data		data
docs		docs
eigen-3.2.4		eigen-3.2.4
src		src
LICENSE		LICENSE
Makefile		Makefile
README.txt		README.txt

Repository files navigation

Epsilon-Greedy, Softmax and LinUCB Contextual Bandit Implementations
--------------------------------------------------------------------
Tim Nugent 2015

Upper confidence bound (UCB) contextual bandit with disjoint linear models - see algorithm 1 here (also in docs directory):

https://github.com/rohitkumar07/AI-Paper-Reviews/blob/master/Contextual%20Bandits/Li10Contextual.pdf

Annealing softmax (Blotzmann Exploration) and annealing Epsilon-Greedy implementations:

http://www.cs.mcgill.ca/~vkules/bandits.pdf

Build
-----

Requires C++ build-essential package (e.g. sudo apt-get install build-essential) and Eigen headers, which are included. Just run 'make'.

The Python script to plot the results requires matplotlib (e.g. sudo apt-get install python-matplotlib).

Run
---

Run with 'make test'. This runs the binary and then calls the python scipt to plot the results.

Output
------

bin/cbandit > data/cbandit_out.csv
src/plot.py data/cbandit_out.csv
['1', '0', '1']
['2', '0', '1']
['3', '0.333333', '1']
['4', '0.25', '0.75']
['5', '0.4', '0.6']
['6', '0.5', '0.5']
['7', '0.428571', '0.428571']
['8', '0.375', '0.5']
['9', '0.333333', '0.444444']
['10', '0.4', '0.4']
...
['2995', '0.982638', '0.249082']
['2996', '0.982644', '0.248999']
['2997', '0.982649', '0.248916']
['2998', '0.982655', '0.248833']
['2999', '0.982661', '0.24875']
['3000', '0.982667', '0.248667']
Parsed  data/cbandit_out.csv
Written data/cb-vs-random.png
...
bin/smbandit > data/sm-eg-bandit_out.csv
src/plot2.py data/sm-eg-bandit_out.csv
['1', '0', '0', '0']
['2', '0', '0', '0']
['3', '0', '0.333333', '0']
['4', '0', '0.25', '0']
...
['2995', '0.976294', '0.976294', '0.211018']
['2996', '0.976302', '0.976302', '0.211282']
['2997', '0.97631', '0.97631', '0.211545']
['2998', '0.976318', '0.976318', '0.211474']
['2999', '0.976325', '0.976325', '0.211404']
['3000', '0.976333', '0.976333', '0.211333']
Parsed  data/sm-eg-bandit_out.csv
Written data/sm-vs-eg-vs-random.png


See the plot data/cb-vs-random.png. The results show how the contextual bandit compares with a random strategy at selecting the best of 4 arms, with 'best' being determined by the context (in this case a a vector of length 2 - see line 37-44 in main.cpp). The contextual bandit learns which contexts yield a reward by updating the A matrix for the best arm at each step, and the B vector for this arm if there is a non-zero reward (i.e. the selected arm is correct). As expected, the random strategy converges to about 25%, while the contextual bandit learns quickly and converges to ~98% over 3000 steps. Methods such as the contextual bandit are therefore well suited to personalised (this data is provided as the context) recommender systems where feedback is partial (i.e. user clicks). The second plot data/sm-vs-eg-vs-random.png shows the performance of annealing Epsilon-Greedy and Softmax bandits vs a random strategy, with 5 Bernoulli arms.

Bugs
----

[email protected]