Implements the Neural + LinUCB bandit algorithm.
Applies LinUCB on top of an encoding network. Because LinUCB is a linear method, the encoding network is used to capture the non-linear relationship between the context features and the expected rewards. The encoding network may or may not be pre-trained; if it is not, the agent can optionally train it using epsilon-greedy exploration.
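The idea of running LinUCB on encoded contexts can be illustrated with a minimal, library-free sketch. This is not the `NeuralLinUCBAgent` implementation or its API: `encode`, `LinUCBArm`, and `select_action` are hypothetical names, the encoder is a fixed hand-written function standing in for a trained encoding network, and the optional epsilon-greedy training phase is left out. It assumes one linear model per arm with an identity-initialized covariance matrix.

```python
import math


def encode(context):
    """Hypothetical fixed non-linear encoder (stand-in for the encoding network)."""
    x0, x1 = context
    return [math.tanh(x0 + x1), math.tanh(x0 - x1), 1.0]


class LinUCBArm:
    """Per-arm LinUCB statistics kept on the encoded features (illustrative only)."""

    def __init__(self, dim):
        # Inverse of A = I + sum(z z^T); starts as the identity matrix.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        # b = sum(reward * z)
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def ucb(self, z, alpha):
        """Estimated reward plus alpha times the confidence width."""
        theta = self._a_inv_times(self.b)          # ridge-regression weights
        a_inv_z = self._a_inv_times(z)
        mean = sum(t * zi for t, zi in zip(theta, z))
        width = math.sqrt(sum(zi * v for zi, v in zip(z, a_inv_z)))
        return mean + alpha * width

    def update(self, z, reward):
        """Sherman-Morrison rank-one update for A <- A + z z^T, b <- b + r z."""
        a_inv_z = self._a_inv_times(z)
        denom = 1.0 + sum(zi * v for zi, v in zip(z, a_inv_z))
        n = len(z)
        for i in range(n):
            for j in range(n):
                # a_inv is symmetric, so (A^-1 z)(z^T A^-1) = a_inv_z a_inv_z^T.
                self.a_inv[i][j] -= a_inv_z[i] * a_inv_z[j] / denom
        self.b = [bi + reward * zi for bi, zi in zip(self.b, z)]


def select_action(arms, context, alpha=1.0):
    """Pick the arm with the highest upper confidence bound on the encoding."""
    z = encode(context)
    scores = [arm.ucb(z, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


arms = [LinUCBArm(dim=3) for _ in range(2)]
context = (0.5, -0.2)
action = select_action(arms, context)
arms[action].update(encode(context), reward=1.0)
```

The linear model only ever sees the output of `encode`, which is why a non-linear reward function can still be handled: the non-linearity lives in the encoder, and LinUCB supplies the exploration bonus through the `alpha * width` term.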
Carlos Riquelme, George Tucker, Jasper Snoek,
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep
Networks for Thompson Sampling, ICLR 2018.
class NeuralLinUCBAgent: An agent implementing the LinUCB algorithm on top of a neural network.
class NeuralLinUCBVariableCollection: A collection of variables used by