Policy that samples actions based on the FALCON algorithm.
This policy implements an action sampling distribution based on the following paper: David Simchi-Levi and Yunzong Xu, "Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability", Mathematics of Operations Research, 2021. https://arxiv.org/pdf/2003.12699.pdf
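In the paper, FALCON samples from an inverse-gap-weighted distribution over actions. Given K actions, predicted rewards r̂(a), and the greedy action â = argmax r̂(a), each non-greedy action gets probability 1 / (K + γ(r̂(â) - r̂(a))), and â receives all remaining mass. The sketch below computes that distribution in pure Python and samples from it. It assumes the exploration parameter γ is supplied by the caller (the paper sets it per epoch, and the TF-Agents policy may derive it differently); the function names are hypothetical and are not the library's API.

```python
import random
from typing import List, Sequence


def falcon_action_probabilities(predicted_rewards: Sequence[float],
                                gamma: float) -> List[float]:
    """Inverse-gap-weighted action distribution (hypothetical helper).

    Assumes `gamma` >= 0 is given; it is not derived here.
    """
    k = len(predicted_rewards)
    if k == 0:
        raise ValueError("need at least one action")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    # Greedy action; ties resolve to the lowest index.
    best = max(range(k), key=lambda a: predicted_rewards[a])
    best_reward = predicted_rewards[best]
    probs = [0.0] * k
    for a, r in enumerate(predicted_rewards):
        if a != best:
            # Larger reward gap -> smaller probability; never exceeds 1/k.
            probs[a] = 1.0 / (k + gamma * (best_reward - r))
    # Greedy action takes the leftover mass, which is at least 1/k.
    probs[best] = 1.0 - sum(probs)
    return probs


def sample_action(predicted_rewards: Sequence[float], gamma: float,
                  rng: random.Random) -> int:
    """Draws one action index from the distribution above (hypothetical)."""
    probs = falcon_action_probabilities(predicted_rewards, gamma)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

With `predicted_rewards=[1.0, 0.5, 0.0]` and `gamma=2.0`, the non-greedy actions get 1/4 and 1/5, and the greedy action gets the remaining 0.55. Setting `gamma=0.0` yields uniform exploration, while a large `gamma` concentrates almost all mass on the greedy action.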
class FalconRewardPredictionPolicy: Policy that samples actions based on the FALCON algorithm.
get_number_of_trainable_elements(...): Gets the total number of elements in the network's trainable variables.
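The total described above is the sum, over all trainable variables, of the product of each variable's dimensions. The sketch below illustrates that arithmetic on plain shape tuples rather than a real network. The function name `count_trainable_elements` is hypothetical and does not reproduce the actual helper's signature, which is elided here.

```python
from math import prod
from typing import Iterable, Sequence


def count_trainable_elements(variable_shapes: Iterable[Sequence[int]]) -> int:
    """Sums the element counts of variables given by their shapes (hypothetical).

    A scalar variable has shape () and contributes prod(()) == 1 element.
    """
    return sum(prod(shape) for shape in variable_shapes)
```

For example, a dense layer mapping 4 inputs to 3 outputs has a kernel of shape `(4, 3)` and a bias of shape `(3,)`, giving 12 + 3 = 15 trainable elements.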