tf_agents.bandits.policies.constraints.QuantileConstraint

Class for representing a trainable quantile constraint.

Inherits From: NeuralConstraint, BaseConstraint

tf_agents.bandits.policies.constraints.QuantileConstraint(
    time_step_spec: tf_agents.typing.types.TimeStep,
    action_spec: tf_agents.typing.types.BoundedTensorSpec,
    constraint_network: tf_agents.typing.types.Network,
    quantile: float = 0.5,
    comparator_fn: tf_agents.typing.types.ComparatorFn = tf.greater,
    quantile_value: float = 0.0,
    name: Text = 'QuantileConstraint'
)

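This class implements a quantile constraint of the form

Q_tau(x) >= v

or

Q_tau(x) <= v

where Q_tau is the tau-quantile (`quantile`) estimated by `constraint_network`, v is the bound (`quantile_value`), and `comparator_fn` selects the direction of the inequality.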
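As a rough illustration of what the two inequality forms mean, the sketch below computes an empirical tau-quantile of a small sample with a nearest-rank rule and compares it against a bound. The helpers `empirical_quantile` and `constraint_holds` and the sample data are hypothetical and not part of TF-Agents; `operator.gt` and `operator.lt` stand in for `tf.greater` and `tf.less`. In the class itself the quantile is estimated by a trained network rather than read off a sample.

```python
import math
import operator


def empirical_quantile(samples, tau):
    """Nearest-rank quantile: smallest value with at least a tau fraction of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(tau * len(ordered)))
    return ordered[rank - 1]


def constraint_holds(samples, tau, comparator, bound):
    """Evaluates comparator(Q_tau(samples), bound)."""
    return comparator(empirical_quantile(samples, tau), bound)


# Hypothetical outcomes observed for a single action.
rewards = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

median = empirical_quantile(rewards, 0.5)
lower_ok = constraint_holds(rewards, 0.5, operator.gt, 3.0)  # Q_0.5 > 3.0
upper_ok = constraint_holds(rewards, 0.5, operator.lt, 3.0)  # Q_0.5 < 3.0
```

With these values the median is above the bound, so the "greater" form holds and the "less" form does not.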

Args
`time_step_spec`	A `TimeStep` spec of the expected time_steps.
`action_spec`	A nest of `BoundedTensorSpec` representing the actions.
`constraint_network`	An instance of `tf_agents.networks.Network` used to provide estimates of action feasibility. The input structure should be consistent with the `observation_spec`.
`quantile`	A float between 0 and 1, the quantile to regress.
`comparator_fn`	A comparator function, such as `tf.greater` or `tf.less`.
`quantile_value`	A float, the bound to enforce on the quantile.
`name`	Python str name of this constraint. All variables in this module will fall under that name. Defaults to the class name.

Attributes
`constraint_network`
`observation_spec`

Methods

`compute_loss`

compute_loss(
    observations: tf_agents.typing.types.NestedTensor,
    actions: tf_agents.typing.types.NestedTensor,
    rewards: tf_agents.typing.types.Tensor,
    weights: Optional[types.Float] = None,
    training: bool = False
) -> tf_agents.typing.types.Tensor

Computes loss for training the constraint network.

Args
`observations`	A batch of observations.
`actions`	A batch of actions.
`rewards`	A batch of rewards.
`weights`	Optional scalar or elementwise (per-batch-entry) importance weights. The output batch loss will be scaled by these weights, and the final scalar loss is the mean of these values.
`training`	Whether the loss is being used for training.

Returns
`loss`	A `Tensor` containing the loss for the training step.
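The page does not spell out the form of the loss. A common choice for regressing a quantile is the pinball loss, and the sketch below assumes that form to show how the `weights` argument described above could scale per-example losses before the mean is taken. The functions `pinball_loss` and `weighted_mean_loss` are hypothetical pure-Python stand-ins, not the library's implementation.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Per-example pinball loss: max(q * e, (q - 1) * e) with e = y_true - y_pred."""
    losses = []
    for target, pred in zip(y_true, y_pred):
        error = target - pred
        losses.append(max(quantile * error, (quantile - 1.0) * error))
    return losses


def weighted_mean_loss(per_example_losses, weights=None):
    """Scales each loss by its weight, then averages over the batch."""
    if weights is None:
        weights = [1.0] * len(per_example_losses)
    scaled = [w * loss for w, loss in zip(weights, per_example_losses)]
    return sum(scaled) / len(scaled)


# Hypothetical batch: observed rewards and predicted 0.9-quantiles.
rewards = [1.0, 3.0]
predictions = [2.0, 2.0]

per_example = pinball_loss(rewards, predictions, quantile=0.9)
unweighted = weighted_mean_loss(per_example)
reweighted = weighted_mean_loss(per_example, weights=[2.0, 0.0])
```

For a high quantile such as 0.9, under-prediction (second example) is penalized more heavily than over-prediction (first example), which is what pushes the estimate toward the upper tail.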

`initialize`

initialize()

Returns an op to initialize the constraint.

`__call__`

__call__(
    observation, actions=None
)

Returns the probability of input actions being feasible.
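One plausible reading of this method, assuming the constraint network outputs one predicted quantile per action, is that `comparator_fn` is applied elementwise against `quantile_value` and the boolean result is cast to a float, giving a feasibility of 1.0 or 0.0 for each action. The sketch below mimics that with plain Python lists; the function `feasibility` and the numbers are hypothetical, and `operator.gt` stands in for `tf.greater`.

```python
import operator


def feasibility(predicted_quantiles, comparator, quantile_value):
    """Returns 1.0 where comparator(prediction, quantile_value) holds, else 0.0."""
    return [
        [1.0 if comparator(q, quantile_value) else 0.0 for q in row]
        for row in predicted_quantiles
    ]


# Hypothetical predictions: batch of 2 observations, 3 actions each.
predicted = [
    [0.2, -0.1, 0.5],
    [-0.3, 0.0, 0.1],
]

mask = feasibility(predicted, operator.gt, 0.0)
```

Because the comparison is strict here, a prediction exactly equal to the bound counts as infeasible.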
