Module: tf_agents.bandits.agents.neural_linucb_agent

Implements the Neural + LinUCB bandit algorithm.

Applies LinUCB on top of an encoding network. Because LinUCB is a linear method, the encoding network captures the non-linear relationship between the context features and the expected rewards. The encoding network may or may not be pre-trained; if it is not, the agent can optionally train it using epsilon-greedy exploration.
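To make the idea concrete, the following is a minimal NumPy sketch of LinUCB run on encoded contexts. It is not the TF-Agents implementation: the `encode` function, the `LinUCBOnEncoding` class, and its parameter names are hypothetical, and a single fixed ReLU layer stands in for a real encoding network. It assumes one independent linear model per action and an exploration weight `alpha`.

```python
import numpy as np


def encode(context, weights):
    """Hypothetical stand-in for an encoding network: one ReLU layer."""
    return np.maximum(context @ weights, 0.0)


class LinUCBOnEncoding:
    """Illustrative per-action LinUCB over encoded features (not the library class)."""

    def __init__(self, num_actions, encoding_dim, alpha=1.0):
        self.alpha = alpha
        # Per-action statistics: A_k starts as the identity, b_k as zeros.
        self.a = np.stack([np.eye(encoding_dim) for _ in range(num_actions)])
        self.b = np.zeros((num_actions, encoding_dim))

    def ucb_scores(self, z):
        """Return estimated reward plus confidence bonus for each action."""
        scores = np.empty(len(self.a))
        for k in range(len(self.a)):
            a_inv = np.linalg.inv(self.a[k])
            theta = a_inv @ self.b[k]  # ridge-regression weight estimate
            bonus = self.alpha * np.sqrt(z @ a_inv @ z)  # uncertainty term
            scores[k] = theta @ z + bonus
        return scores

    def act(self, z):
        """Pick the action with the highest upper confidence bound."""
        return int(np.argmax(self.ucb_scores(z)))

    def update(self, z, action, reward):
        """Fold one (encoding, action, reward) observation into the statistics."""
        self.a[action] += np.outer(z, z)
        self.b[action] += reward * z


# Usage: encode a context, choose an action, then update with the observed reward.
weights = np.eye(2)  # assumed, untrained encoder weights
bandit = LinUCBOnEncoding(num_actions=3, encoding_dim=2, alpha=1.0)
z = encode(np.array([1.0, -1.0]), weights)
action = bandit.act(z)
bandit.update(z, action, reward=1.0)
```

The linear model only ever sees the encoding `z`, so any non-linearity between raw context and reward has to be absorbed by the encoder. That is the division of labour the module description refers to.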

Reference:

Carlos Riquelme, George Tucker, Jasper Snoek, Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, ICLR 2018.

Classes

class NeuralLinUCBAgent: An agent implementing the LinUCB algorithm on top of a neural network.

class NeuralLinUCBVariableCollection: A collection of variables used by NeuralLinUCBAgent.