Module: tf_agents.bandits.policies.falcon_reward_prediction_policy

Policy that samples actions based on the FALCON algorithm.

This policy implements an action sampling distribution based on the following paper: David Simchi-Levi and Yunzong Xu, "Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability", Mathematics of Operations Research, 2021. https://arxiv.org/pdf/2003.12699.pdf
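As a rough illustration of the sampling distribution described in the paper, the sketch below computes FALCON action probabilities for a single context in plain Python. It is not the TF-Agents implementation: the function name `falcon_action_probabilities` and the parameters `predicted_rewards` and `gamma` are assumptions made for this example, and the library's own policy may expose different arguments and add further scaling terms. Following the paper, each non-greedy action a receives probability 1 / (K + gamma * (best - reward[a])), where K is the number of actions, and the greedy action receives the remaining mass.

```python
from typing import List, Sequence


def falcon_action_probabilities(
    predicted_rewards: Sequence[float], gamma: float
) -> List[float]:
    """Sketch of the FALCON sampling distribution for one context.

    `predicted_rewards` and `gamma` are illustrative names chosen for this
    example; they are not part of the TF-Agents API.
    """
    num_actions = len(predicted_rewards)
    if num_actions == 0:
        raise ValueError("predicted_rewards must not be empty")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")

    # The greedy action is the one with the highest predicted reward.
    greedy = max(range(num_actions), key=lambda a: predicted_rewards[a])
    best = predicted_rewards[greedy]

    probs = [0.0] * num_actions
    for a in range(num_actions):
        if a != greedy:
            # Larger reward gaps (scaled by gamma) shrink the probability.
            gap = best - predicted_rewards[a]
            probs[a] = 1.0 / (num_actions + gamma * gap)

    # The greedy action takes whatever probability mass is left over.
    probs[greedy] = 1.0 - sum(probs)
    return probs
```

With `gamma = 0` the distribution is uniform over actions. As `gamma` grows, mass concentrates on the greedy action, so `gamma` controls the balance between exploration and exploitation in this sketch.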

Classes

class FalconRewardPredictionPolicy: Policy that samples actions based on the FALCON algorithm.

Functions

get_number_of_trainable_elements(...): Gets the total number of elements in the network's trainable variables.
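To make "total number of elements" concrete, the following minimal sketch counts elements from a list of variable shapes in plain Python. The helper `count_elements` and its `variable_shapes` argument are hypothetical and stand in for the shapes of a network's trainable variables; the signature of the library function is not shown here and may differ.

```python
import math
from typing import Iterable, Sequence


def count_elements(variable_shapes: Iterable[Sequence[int]]) -> int:
    """Sums the element counts of variables given only their shapes.

    Hypothetical helper for illustration; not part of TF-Agents.
    """
    # The element count of one variable is the product of its dimensions.
    # math.prod of an empty shape is 1, which matches a scalar variable.
    return sum(math.prod(shape) for shape in variable_shapes)


# Example: two dense layers (4 -> 8 -> 3), each with a kernel and a bias.
example_shapes = [(4, 8), (8,), (8, 3), (3,)]
total = count_elements(example_shapes)  # 32 + 8 + 24 + 3
```

Here `total` is 67: the two kernels contribute 32 and 24 elements, and the two biases contribute 8 and 3.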