tf_agents.bandits.networks.global_and_arm_feature_network.create_feed_forward_common_tower_network


Creates a common tower network with feedforward towers.

The network produced by this function can be used either in GreedyRewardPredictionPolicy or in NeuralLinUCBPolicy. In the former case, the network must have output_dim=1; its common tower is then an instance of QNetwork, and the policy uses the network as a reward prediction network. In the latter case, the common tower is an encoding network whose output is consumed by a reward layer or by the LinUCB method, and output_dim sets the encoding dimension.

Args:
  observation_spec: A nested tensor spec containing the specs for both the global and the per-arm observations.
  global_layers: Iterable of ints specifying the layer sizes of the global tower.
  arm_layers: Iterable of ints specifying the layer sizes of the arm tower.
  common_layers: Iterable of ints specifying the layer sizes of the common tower.
  output_dim: The output dimension of the network. If 1, the common tower is a QNetwork; otherwise, it is an encoding network with the specified output dimension.
  global_preprocessing_combiner: Preprocessing combiner for the global features.
  arm_preprocessing_combiner: Preprocessing combiner for the per-arm features.
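To illustrate how these arguments fit together, here is a minimal pure-Python sketch. It assumes the architecture suggested by the function name and the argument descriptions: a global tower encodes each sample's global features once, an arm tower encodes each arm's features, the two encodings are concatenated per arm, and the common tower maps every concatenation to output_dim values (a single scalar per arm when output_dim=1). All names here (`CommonTowerSketch`, `Tower`, `dense`) are hypothetical and are not TF-Agents APIs. The sketch also assumes ReLU hidden layers, a linear final projection, and random weights, and it omits the preprocessing combiners. It is an illustration, not the library's implementation.

```python
import random
from typing import List, Optional, Sequence


def dense(x: List[float], w: List[List[float]], b: List[float], relu: bool) -> List[float]:
    # y[j] = sum_i x[i] * w[i][j] + b[j], optionally passed through ReLU.
    y = [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]
    return [max(0.0, v) for v in y] if relu else y


class Tower:
    """Hypothetical feed-forward tower: one ReLU dense layer per entry in `layers`,
    optionally followed by a linear projection to `head_dim` outputs."""

    def __init__(self, rng: random.Random, in_dim: int, layers: Sequence[int],
                 head_dim: Optional[int] = None):
        self.params = []
        dim = in_dim
        widths = [(units, True) for units in layers]
        if head_dim is not None:
            widths.append((head_dim, False))  # assumed linear final projection
        for units, relu in widths:
            w = [[rng.uniform(-0.5, 0.5) for _ in range(units)] for _ in range(dim)]
            self.params.append((w, [0.0] * units, relu))
            dim = units
        self.out_dim = dim

    def __call__(self, x: List[float]) -> List[float]:
        for w, b, relu in self.params:
            x = dense(x, w, b, relu)
        return x


class CommonTowerSketch:
    """Hypothetical global/arm/common tower network (not the TF-Agents class)."""

    def __init__(self, global_dim, arm_dim, global_layers, arm_layers,
                 common_layers, output_dim=1, seed=0):
        rng = random.Random(seed)
        self.global_tower = Tower(rng, global_dim, global_layers)
        self.arm_tower = Tower(rng, arm_dim, arm_layers)  # shared by all arms
        common_in = self.global_tower.out_dim + self.arm_tower.out_dim
        self.common_tower = Tower(rng, common_in, common_layers, head_dim=output_dim)
        self.output_dim = output_dim

    def __call__(self, global_obs, arm_obs):
        # global_obs: [batch][global_dim]; arm_obs: [batch][num_actions][arm_dim].
        outputs = []
        for g, arms in zip(global_obs, arm_obs):
            g_enc = self.global_tower(g)  # encoded once, reused for every arm
            row = []
            for a in arms:
                out = self.common_tower(g_enc + self.arm_tower(a))  # list concatenation
                # output_dim == 1: one scalar reward estimate per arm; else an encoding vector.
                row.append(out[0] if self.output_dim == 1 else out)
            outputs.append(row)
        return outputs


# Usage: a batch of 2 samples with 3 arms each; arms 0 and 2 of sample 0 are identical.
global_obs = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.5]]
arm_obs = [
    [[0.5, 0.1, 0.0], [0.2, 0.9, 0.3], [0.5, 0.1, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.3, 0.3, 0.3]],
]
reward_net = CommonTowerSketch(4, 3, (8, 4), (6, 4), (8,), output_dim=1)
encoding_net = CommonTowerSketch(4, 3, (8, 4), (6, 4), (8,), output_dim=5)
rewards = reward_net(global_obs, arm_obs)      # 2 x 3 scalars
encodings = encoding_net(global_obs, arm_obs)  # 2 x 3 x 5
```

With output_dim=1, each arm receives a single scalar, which matches the reward-prediction use. With a larger output_dim, each arm receives a vector, which matches the encoding use. Because the arm tower's weights are shared across arms, identical arm features produce identical outputs within a sample.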

Returns:
  A network that takes observations adhering to observation_spec and outputs reward estimates for every action.
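When output_dim=1, the network returns one reward estimate per action, and a greedy policy picks the action with the highest estimate. The following minimal sketch shows that selection step on a plain [batch][num_actions] list of estimates. `greedy_actions` is a hypothetical helper, not the code of GreedyRewardPredictionPolicy, and the lowest-index tie-breaking rule is an assumption of this sketch.

```python
from typing import List, Sequence


def greedy_actions(reward_estimates: Sequence[Sequence[float]]) -> List[int]:
    """For each batch row, return the index of the action with the highest estimate."""
    # max() returns the first maximal item it encounters, so ties go to the lowest index.
    return [max(range(len(row)), key=row.__getitem__) for row in reward_estimates]


estimates = [[0.2, 1.5, -0.3], [0.9, 0.1, 0.9]]
print(greedy_actions(estimates))  # prints [1, 0]
```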