An agent that maintains linear estimates for rewards and their uncertainty.
LinUCB and Linear Thompson Sampling agents are subclasses of this agent.
class ExplorationPolicy: Possible exploration policies.
class LinearBanditAgent: An agent that maintains linear reward estimates and their uncertainties.
class LinearBanditVariableCollection: A collection of variables used by
update_a_and_b_with_forgetting(...): Update the covariance matrix a and the weighted sum of rewards b.
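
The forgetting update can be illustrated with a minimal pure-Python sketch. It assumes that "forgetting" means scaling the previous statistics by a factor gamma in [0, 1] before adding the new observations, so that the matrix becomes gamma * a + sum of x x^T and the reward vector becomes gamma * b + sum of r x. The function name `update_a_and_b_sketch`, its argument names, and the list-based representation are hypothetical and chosen for illustration; they are not the library's signature.

```python
def update_a_and_b_sketch(a_prev, b_prev, rewards, contexts, gamma):
    """Hypothetical helper: discounted update of a (d x d) and b (length d).

    Assumed update rule (not taken from the library source):
      a_new = gamma * a_prev + sum_i x_i x_i^T
      b_new = gamma * b_prev + sum_i r_i * x_i
    """
    d = len(b_prev)
    # Discount the previously accumulated statistics.
    a_new = [[gamma * a_prev[i][j] for j in range(d)] for i in range(d)]
    b_new = [gamma * b_prev[i] for i in range(d)]
    # Add the contribution of each (context, reward) pair in the batch.
    for x, r in zip(contexts, rewards):
        for i in range(d):
            b_new[i] += r * x[i]
            for j in range(d):
                a_new[i][j] += x[i] * x[j]
    return a_new, b_new


# One observation with context [1, 3] and reward 2, starting from a = I, b = 0.
a0 = [[1.0, 0.0], [0.0, 1.0]]
b0 = [0.0, 0.0]
a1, b1 = update_a_and_b_sketch(
    a0, b0, rewards=[2.0], contexts=[[1.0, 3.0]], gamma=0.5
)
print(a1)  # [[1.5, 3.0], [3.0, 9.5]]
print(b1)  # [2.0, 6.0]
```

With gamma = 1.0 the sketch reduces to the usual accumulation without forgetting; smaller values weight recent observations more heavily, which can help when rewards drift over time.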