-
Notifications
You must be signed in to change notification settings - Fork 14
Epsilon-greedy, softmax and LinUCB contextual bandit implementations [recommender systems]
License
timnugent/bandit-algorithms
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Epsilon-Greedy, Softmax and LinUCB Contextual Bandit Implementations -------------------------------------------------------------------- Tim Nugent 2015 Upper confidence bound (UCB) contextual bandit with disjoint linear models - see algorithm 1 here (also in docs directory): https://github.com/rohitkumar07/AI-Paper-Reviews/blob/master/Contextual%20Bandits/Li10Contextual.pdf Annealing softmax (Blotzmann Exploration) and annealing Epsilon-Greedy implementations: http://www.cs.mcgill.ca/~vkules/bandits.pdf Build ----- Requires C++ build-essential package (e.g. sudo apt-get install build-essential) and Eigen headers, which are included. Just run 'make'. The Python script to plot the results requires matplotlib (e.g. sudo apt-get install python-matplotlib). Run --- Run with 'make test'. This runs the binary and then calls the python scipt to plot the results. Output ------ bin/cbandit > data/cbandit_out.csv src/plot.py data/cbandit_out.csv ['1', '0', '1'] ['2', '0', '1'] ['3', '0.333333', '1'] ['4', '0.25', '0.75'] ['5', '0.4', '0.6'] ['6', '0.5', '0.5'] ['7', '0.428571', '0.428571'] ['8', '0.375', '0.5'] ['9', '0.333333', '0.444444'] ['10', '0.4', '0.4'] ... ['2995', '0.982638', '0.249082'] ['2996', '0.982644', '0.248999'] ['2997', '0.982649', '0.248916'] ['2998', '0.982655', '0.248833'] ['2999', '0.982661', '0.24875'] ['3000', '0.982667', '0.248667'] Parsed data/cbandit_out.csv Written data/cb-vs-random.png ... bin/smbandit > data/sm-eg-bandit_out.csv src/plot2.py data/sm-eg-bandit_out.csv ['1', '0', '0', '0'] ['2', '0', '0', '0'] ['3', '0', '0.333333', '0'] ['4', '0', '0.25', '0'] ... ['2995', '0.976294', '0.976294', '0.211018'] ['2996', '0.976302', '0.976302', '0.211282'] ['2997', '0.97631', '0.97631', '0.211545'] ['2998', '0.976318', '0.976318', '0.211474'] ['2999', '0.976325', '0.976325', '0.211404'] ['3000', '0.976333', '0.976333', '0.211333'] Parsed data/sm-eg-bandit_out.csv Written data/sm-vs-eg-vs-random.png See the plot data/cb-vs-random.png. The results show how the contextual bandit compares with a random strategy at selecting the best of 4 arms, with 'best' being determined by the context (in this case a a vector of length 2 - see line 37-44 in main.cpp). The contextual bandit learns which contexts yield a reward by updating the A matrix for the best arm at each step, and the B vector for this arm if there is a non-zero reward (i.e. the selected arm is correct). As expected, the random strategy converges to about 25%, while the contextual bandit learns quickly and converges to ~98% over 3000 steps. Methods such as the contextual bandit are therefore well suited to personalised (this data is provided as the context) recommender systems where feedback is partial (i.e. user clicks). The second plot data/sm-vs-eg-vs-random.png shows the performance of annealing Epsilon-Greedy and Softmax bandits vs a random strategy, with 5 Bernoulli arms. Bugs ---- [email protected]
About
Epsilon-greedy, softmax and LinUCB contextual bandit implementations [recommender systems]
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published