tf_agents.bandits.policies.policy_utilities.PolicyInfo

PolicyInfo(log_probability, predicted_rewards_mean, predicted_rewards_optimistic, predicted_rewards_sampled, bandit_policy_type)

log_probability

predicted_rewards_mean

predicted_rewards_optimistic

predicted_rewards_sampled

bandit_policy_type
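
A minimal sketch of how a record with these five fields might be built and read. It uses a hypothetical stand-in, `PolicyInfoSketch`, built with the standard library rather than the `tf_agents` class itself. It assumes the real class behaves like a named tuple whose unset fields default to an empty tuple, which should be checked against the installed `tf_agents` version. The reward values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the five fields listed above.
# This is NOT the tf_agents class; it only imitates its field layout.
FIELDS = (
    "log_probability",
    "predicted_rewards_mean",
    "predicted_rewards_optimistic",
    "predicted_rewards_sampled",
    "bandit_policy_type",
)

# Assumption: fields that are not set default to an empty tuple.
PolicyInfoSketch = namedtuple(
    "PolicyInfoSketch", FIELDS, defaults=((),) * len(FIELDS)
)

# Populate only the field a given policy would emit (illustrative values).
info = PolicyInfoSketch(predicted_rewards_mean=[0.2, 0.7, 0.1])

# Read a field by name and pick the arm with the highest predicted mean reward.
means = info.predicted_rewards_mean
best_arm = max(range(len(means)), key=means.__getitem__)

# Named tuples are immutable; _replace returns a modified copy.
updated = info._replace(log_probability=-0.35)
```

Leaving unused fields at an empty default keeps the record's structure fixed while letting each policy fill in only the quantities it actually computes.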