Module: tf_agents.bandits.policies.policy_utilities


Utilities for bandit policies.


class BanditPolicyType: Enumeration of bandit policy types.

class InfoFields: Strings which can be used in the policy info fields.

class PolicyInfo: Named tuple with the fields log_probability, predicted_rewards_mean, predicted_rewards_sampled, and bandit_policy_type.


bandit_policy_uniform_mask(...): Sets the bandit policy type tensor to BanditPolicyType.UNIFORM where the mask is set.

create_bandit_policy_type_tensor_spec(...): Creates a tensor spec for the bandit policy type.

has_bandit_policy_type(...): Checks whether the policy info has a bandit_policy_type field or tensor.

masked_argmax(...): Computes the argmax where the allowed elements are given by a mask.

set_bandit_policy_type(...): Sets the InfoFields.BANDIT_POLICY_TYPE on info to bandit_policy_type.
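
The two mask-based helpers above, masked_argmax and bandit_policy_uniform_mask, can be illustrated with a minimal plain-Python sketch. This is not the library's implementation, which operates on tensors. The names PolicyType, masked_argmax_sketch, and uniform_mask_sketch are hypothetical, the enum values are assumptions, and returning -1 for a row whose mask allows no element is an assumed convention here.

```python
import enum
import math


class PolicyType(enum.IntEnum):
    """Hypothetical stand-in for BanditPolicyType; the values are assumptions."""
    UNKNOWN = 0
    GREEDY = 1
    UNIFORM = 2


def masked_argmax_sketch(rows, masks):
    """Row-wise argmax restricted to entries whose mask is nonzero.

    Returns -1 for a row whose mask allows nothing (assumed convention).
    """
    result = []
    for row, mask in zip(rows, masks):
        if len(row) != len(mask):
            raise ValueError("row and mask must have the same length")
        best_index, best_value = -1, -math.inf
        for index, (value, allowed) in enumerate(zip(row, mask)):
            # Only allowed entries can become the current best.
            if allowed and (best_index == -1 or value > best_value):
                best_index, best_value = index, value
        result.append(best_index)
    return result


def uniform_mask_sketch(policy_types, mask):
    """Overwrite entries with UNIFORM wherever the mask is true."""
    return [PolicyType.UNIFORM if flag else current
            for current, flag in zip(policy_types, mask)]


rewards = [[0.9, 0.2, 0.5], [0.1, 0.7, 0.3], [0.4, 0.4, 0.4]]
allowed = [[0, 1, 1], [1, 1, 0], [0, 0, 0]]
print(masked_argmax_sketch(rewards, allowed))  # [2, 1, -1]

types = [PolicyType.GREEDY, PolicyType.GREEDY, PolicyType.UNKNOWN]
explored = [False, True, True]
print([int(t) for t in uniform_mask_sketch(types, explored)])  # [1, 2, 2]
```

In the first row, action 0 has the highest reward but is masked out, so the best allowed action is 2. The second call marks the entries whose action was chosen by uniform exploration and leaves the rest untouched.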