Creates a common tower network with feedforward towers.

The network produced by this function can be used in either GreedyRewardPredictionPolicy or NeuralLinUCBPolicy. In the former case, the network must have output_dim=1; it will be an instance of QNetwork and is used in the policy as a reward prediction network. In the latter case, the network will be an encoding network whose output is consumed by a reward layer or a LinUCB method, and the specified output_dim will be the encoding dimension.

observation_spec: A nested tensor spec containing the specs for global as well as per-arm observations.
global_layers: Iterable of ints. Specifies the layers of the global tower.
arm_layers: Iterable of ints. Specifies the layers of the arm tower.
common_layers: Iterable of ints. Specifies the layers of the common tower.
output_dim: The output dimension of the network. If 1, the common tower will be a QNetwork. Otherwise, the common tower will be an encoding network with the specified output dimension.
global_preprocessing_combiner: Preprocessing combiner for global features.
arm_preprocessing_combiner: Preprocessing combiner for the arm features.
activation_fn: A Keras activation, specifying the activation function used in all layers. Defaults to relu.
name: The network name to use. Shows up in TensorBoard losses.

A network that takes observations adhering to observation_spec and outputs reward estimates for every action.
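
To make the data flow and output shapes concrete, the following is a minimal conceptual sketch in plain NumPy, not the TF-Agents implementation. It assumes the global tower output is tiled across arms, concatenated with the arm tower output, and passed through the common tower, and that a single-unit output is squeezed into one reward estimate per action. The helper names (make_tower, run_tower, common_tower_forward, build_and_run), the example dimensions, and the random weights are all hypothetical and chosen only for illustration.

```python
import numpy as np


def make_tower(input_dim, layer_sizes, rng):
    """Creates one (weights, bias) pair per int in layer_sizes."""
    params = []
    for units in layer_sizes:
        weights = rng.normal(scale=0.1, size=(input_dim, units))
        params.append((weights, np.zeros(units)))
        input_dim = units
    return params


def run_tower(x, params):
    """Applies dense layers with ReLU along the last axis of x."""
    for weights, bias in params:
        x = np.maximum(x @ weights + bias, 0.0)
    return x


def common_tower_forward(global_obs, arm_obs, global_params, arm_params,
                         common_params, head):
    g = run_tower(global_obs, global_params)   # [batch, g_out]
    a = run_tower(arm_obs, arm_params)         # [batch, num_actions, a_out]
    num_actions = a.shape[1]
    # Assumption: global features are repeated once per arm, then concatenated.
    g_tiled = np.repeat(g[:, None, :], num_actions, axis=1)
    joint = np.concatenate([g_tiled, a], axis=-1)
    hidden = run_tower(joint, common_params)
    head_w, head_b = head
    out = hidden @ head_w + head_b             # [batch, num_actions, output_dim]
    # Assumption: with output_dim == 1 the last axis is dropped, giving one
    # reward estimate per action; otherwise the per-arm encoding is returned.
    return out[..., 0] if out.shape[-1] == 1 else out


def build_and_run(output_dim, batch=4, num_actions=3, global_dim=5, arm_dim=2,
                  global_layers=(8, 4), arm_layers=(6, 4), common_layers=(8,),
                  seed=0):
    rng = np.random.default_rng(seed)
    global_params = make_tower(global_dim, global_layers, rng)
    arm_params = make_tower(arm_dim, arm_layers, rng)
    common_in = global_layers[-1] + arm_layers[-1]
    common_params = make_tower(common_in, common_layers, rng)
    head = make_tower(common_layers[-1], (output_dim,), rng)[0]
    global_obs = rng.normal(size=(batch, global_dim))
    arm_obs = rng.normal(size=(batch, num_actions, arm_dim))
    return common_tower_forward(global_obs, arm_obs, global_params,
                                arm_params, common_params, head)


rewards = build_and_run(output_dim=1)    # reward-prediction style output
encoding = build_and_run(output_dim=16)  # encoding style output
print(rewards.shape)   # (4, 3)
print(encoding.shape)  # (4, 3, 16)
```

In this sketch, output_dim=1 yields a [batch, num_actions] array, the shape a greedy reward-prediction policy would need to pick an arm, while any other output_dim yields a per-arm encoding of that width for a downstream reward layer or LinUCB step.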