Compute the bimodal integration between x and y.
nsl.lib.bimodal_integration( x, y, output_dims, integration_config, reuse=None, scope=None )
The inputs x and y usually come from two different types of sources; for example, x represents image embeddings and y represents text embeddings. This function integrates the bimodal inputs x and y as follows:
outputs = fc_layer(
When the integration_type is (elementwise) 'additive', this function is
equivalent to concatenating x and y and passing them into a two-layer perceptron.
When the integration_type is (elementwise) 'multiplicative', this function
is equivalent to multimodal low-rank bilinear Pooling (MLB) in
When the integration_type is 'tucker_decomp', this function is equivalent to
multimodal tensor-based Tucker decomposition (MUTAN) in arXiv:1705.06676.
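The additive and multiplicative cases can be illustrated with a minimal pure-Python sketch. It is not the library implementation. It assumes that each input is first projected to a shared hidden size by a bias-free fully connected layer, that tanh is the activation, and that a single output layer follows. The helper names (dense, integrate, random_matrix) and all sizes are hypothetical, and the 'tucker_decomp' case is left out. The last part checks numerically that the additive case matches a two-layer perceptron applied to the concatenation of x and y.

```python
import math
import random


def dense(rows, weights):
    """Bias-free fully connected layer: [batch, d_in] x [d_in, d_out]."""
    return [[sum(v * w for v, w in zip(row, col)) for col in zip(*weights)]
            for row in rows]


def integrate(x, y, w_x, w_y, w_out, integration_type="additive"):
    """Project x and y to a shared hidden size, combine, activate, project."""
    h_x, h_y = dense(x, w_x), dense(y, w_y)
    if integration_type == "additive":
        combined = [[a + b for a, b in zip(r, s)] for r, s in zip(h_x, h_y)]
    elif integration_type == "multiplicative":
        combined = [[a * b for a, b in zip(r, s)] for r, s in zip(h_x, h_y)]
    else:
        # 'tucker_decomp' is intentionally not covered by this sketch.
        raise ValueError("unsupported integration_type: %r" % integration_type)
    hidden = [[math.tanh(v) for v in row] for row in combined]
    return dense(hidden, w_out)


def random_matrix(rng, n_rows, n_cols):
    """Matrix of uniform random values, used as stand-in data and weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_cols)]
            for _ in range(n_rows)]


rng = random.Random(0)
batch, x_dim, y_dim, hidden_dim, output_dims = 2, 4, 3, 5, 2
x = random_matrix(rng, batch, x_dim)   # stand-in for image embeddings
y = random_matrix(rng, batch, y_dim)   # stand-in for text embeddings
w_x = random_matrix(rng, x_dim, hidden_dim)
w_y = random_matrix(rng, y_dim, hidden_dim)
w_out = random_matrix(rng, hidden_dim, output_dims)

additive = integrate(x, y, w_x, w_y, w_out, "additive")
multiplicative = integrate(x, y, w_x, w_y, w_out, "multiplicative")

# Additive case written as a two-layer perceptron on the concatenated input:
# stacking w_x on top of w_y gives one [x_dim + y_dim, hidden_dim] layer.
xy = [row_x + row_y for row_x, row_y in zip(x, y)]
hidden = [[math.tanh(v) for v in row] for row in dense(xy, w_x + w_y)]
mlp_on_concat = dense(hidden, w_out)

max_diff = max(abs(a - b)
               for row_a, row_b in zip(additive, mlp_on_concat)
               for a, b in zip(row_a, row_b))
print("max difference between the two additive forms:", max_diff)
```

The two additive forms agree up to floating-point rounding because multiplying the concatenated input by the stacked weight matrix is the same sum as adding the two separate projections.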
x: A tensor of at least rank 2 with a static value for the last dimension, e.g. [batch_size, depth] or [None, None, None, channels].
y: A tensor of the same type and shape as
x, except the size of the last dimension can be different.
output_dims: Integer or long, the number of output units.
integration_config: An IntegrationConfig that contains the following configs (or hyper-parameters) for computing the hidden integration of x and y: (a) integration_type: Type of integration function to apply. (b) hidden_dims: Integer or a list of integers, the number of hidden units in the fully-connected layer(s) before the output layer. (c) activation_fn: Activation function to be applied to.
reuse: Whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
scope: Optional scope for
The tensor variable representing the result of the series of operations.
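The shape requirements on x and y stated in the argument descriptions can be expressed as a small check on shape lists, where None marks a dimension that is not statically known. This is a sketch only: shapes_compatible is a hypothetical helper, not part of the library, and it assumes that y also needs a static last dimension and that two unknown leading dimensions count as matching.

```python
def shapes_compatible(x_shape, y_shape):
    """Check the documented shape contract between x and y.

    Both shapes must have rank >= 2, the same rank, a statically known
    last dimension, and identical leading dimensions. Only the size of
    the last dimension may differ.
    """
    if len(x_shape) < 2 or len(x_shape) != len(y_shape):
        return False
    if x_shape[-1] is None or y_shape[-1] is None:
        return False
    return tuple(x_shape[:-1]) == tuple(y_shape[:-1])


print(shapes_compatible([None, 128], [None, 64]))          # True
print(shapes_compatible([8, 7, 7, 512], [8, 7, 7, 300]))   # True
print(shapes_compatible([8, None], [8, 64]))               # False
print(shapes_compatible([8, 128], [4, 64]))                # False
```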