Module: tf_agents.bandits.agents.linear_bandit_agent

An agent that maintains linear estimates for rewards and their uncertainty.

LinUCB and Linear Thompson Sampling agents are subclasses of this agent.


class ExplorationPolicy: Possible exploration policies.

class LinearBanditAgent: An agent that maintains linear reward estimates and their uncertainties.

class LinearBanditVariableCollection: A collection of variables used by LinearBanditAgent.


update_a_and_b_with_forgetting(...): Update the covariance matrix a and the weighted sum of rewards b.
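
As an illustration, an update with forgetting can be sketched in plain Python. This assumes the usual discounted form, in which the previous statistics are scaled by a forgetting factor `gamma` before the new observations are added: `a_new = gamma * a_prev + sum_i x_i x_i^T` and `b_new = gamma * b_prev + sum_i r_i x_i`. The helper name `update_a_and_b` and its argument names are hypothetical and chosen for this sketch only; they are not the library's signature, and the real function operates on tensors rather than lists.

```python
def update_a_and_b(a_prev, b_prev, rewards, contexts, gamma):
    """Return discounted (a, b) after a batch of observations.

    Hypothetical helper, assuming the update
        a_new = gamma * a_prev + sum_i x_i x_i^T
        b_new = gamma * b_prev + sum_i r_i * x_i
    where each x_i is a context vector and r_i is its observed reward.
    """
    dim = len(b_prev)
    # Discount the previous statistics by the forgetting factor.
    a_new = [[gamma * a_prev[i][j] for j in range(dim)] for i in range(dim)]
    b_new = [gamma * b_prev[i] for i in range(dim)]
    # Accumulate the outer products and the reward-weighted contexts.
    for x, r in zip(contexts, rewards):
        for i in range(dim):
            b_new[i] += r * x[i]
            for j in range(dim):
                a_new[i][j] += x[i] * x[j]
    return a_new, b_new


# Two observations in a 2-dimensional context space, with gamma = 0.5.
a, b = update_a_and_b(
    a_prev=[[1.0, 0.0], [0.0, 1.0]],
    b_prev=[0.0, 0.0],
    rewards=[1.0, 0.5],
    contexts=[[1.0, 0.0], [1.0, 2.0]],
    gamma=0.5,
)
print(a)  # [[2.5, 2.0], [2.0, 4.5]]
print(b)  # [1.5, 1.0]
```

With `gamma = 1.0` nothing is forgotten and the sketch reduces to plain accumulation; values below 1 down-weight older observations, which may help when the reward distribution drifts over time.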