


Compute the bimodal integration between x and y.


The inputs x and y usually come from two different types of sources; e.g., x represents image embeddings and y represents text embeddings. This function integrates the bimodal inputs x and y as follows:

outputs = fc_layer(activation_fn(integration_type(fc_layer(x), fc_layer(y))))
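A minimal NumPy sketch of the formula above may help make the data flow concrete. It assumes plain dense layers (inputs @ W + b), ReLU as activation_fn, and randomly initialized weights; the helper names fc_layer, bimodal_integration, and init_params are hypothetical, only the 'additive' and 'multiplicative' integration types are covered, and this is not the library's implementation.

```python
import numpy as np


# Hypothetical helpers for illustration; not the library API.
def fc_layer(inputs, weights, bias):
    """Fully-connected layer applied over the last dimension."""
    return inputs @ weights + bias


def relu(z):
    return np.maximum(z, 0.0)


def bimodal_integration(x, y, params, integration_type="additive", activation_fn=relu):
    # Project each modality to a shared hidden width.
    hx = fc_layer(x, params["wx"], params["bx"])
    hy = fc_layer(y, params["wy"], params["by"])
    # Combine the two projections elementwise.
    if integration_type == "additive":
        hidden = hx + hy
    elif integration_type == "multiplicative":
        hidden = hx * hy
    else:
        raise ValueError(f"unsupported integration_type: {integration_type!r}")
    # Activation, then the output layer.
    return fc_layer(activation_fn(hidden), params["wo"], params["bo"])


def init_params(x_dim, y_dim, hidden_dims, output_dims, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "wx": rng.normal(size=(x_dim, hidden_dims)), "bx": np.zeros(hidden_dims),
        "wy": rng.normal(size=(y_dim, hidden_dims)), "by": np.zeros(hidden_dims),
        "wo": rng.normal(size=(hidden_dims, output_dims)), "bo": np.zeros(output_dims),
    }


# Usage: image embeddings of depth 8, text embeddings of depth 5.
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))
y = rng.normal(size=(4, 5))
params = init_params(8, 5, hidden_dims=16, output_dims=3)
out_add = bimodal_integration(x, y, params, "additive")
out_mul = bimodal_integration(x, y, params, "multiplicative")
print(out_add.shape, out_mul.shape)  # (4, 3) (4, 3)
```

Both integration types share the same two input projections and output layer; only the elementwise combination step differs.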

When integration_type is (elementwise) 'additive', this function is equivalent to concatenating x and y and passing the result through a two-layer perceptron. When integration_type is (elementwise) 'multiplicative', it is equivalent to multimodal low-rank bilinear pooling (MLB) in arXiv:1610.04325. When integration_type is 'tucker_decomp', it is equivalent to multimodal tensor-based Tucker decomposition (MUTAN) in arXiv:1705.06676.
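The equivalences stated above can be checked numerically. The NumPy sketch below uses arbitrary random weights (an assumption for illustration) to show that summing two projections equals a single dense layer on the concatenated input, and that a Tucker-style core tensor reduces to the elementwise (MLB-style) product when the core is superdiagonal. The Tucker step is a simplified illustration, not a full MUTAN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, x_dim, y_dim, hidden_dims = 4, 8, 5, 16
x = rng.normal(size=(batch, x_dim))
y = rng.normal(size=(batch, y_dim))
wx, bx = rng.normal(size=(x_dim, hidden_dims)), rng.normal(size=hidden_dims)
wy, by = rng.normal(size=(y_dim, hidden_dims)), rng.normal(size=hidden_dims)

# 'additive': two separate projections summed elementwise ...
additive_hidden = (x @ wx + bx) + (y @ wy + by)
# ... equals one dense layer on concat([x, y]) with stacked weights, i.e. the
# first layer of a two-layer perceptron over the concatenated input.
w_concat = np.vstack([wx, wy])  # shape (x_dim + y_dim, hidden_dims)
concat_hidden = np.concatenate([x, y], axis=-1) @ w_concat + (bx + by)
print(np.allclose(additive_hidden, concat_hidden))  # True

# Tucker-style integration: a core tensor mixes every pair of projected features,
# z[b, k] = sum_{i,j} hx[b, i] * hy[b, j] * core[i, j, k].
tx, ty, tz = 6, 7, 10
hx = x @ rng.normal(size=(x_dim, tx))
hy = y @ rng.normal(size=(y_dim, ty))
core = rng.normal(size=(tx, ty, tz))
tucker_hidden = np.einsum("bi,bj,ijk->bk", hx, hy, core)
print(tucker_hidden.shape)  # (4, 10)

# With a superdiagonal core (1 where i == j == k, else 0) the Tucker form reduces
# to the elementwise product used by 'multiplicative' (MLB-style) integration.
d = 6
hx_d, hy_d = rng.normal(size=(batch, d)), rng.normal(size=(batch, d))
diag_core = np.zeros((d, d, d))
diag_core[np.arange(d), np.arange(d), np.arange(d)] = 1.0
print(np.allclose(np.einsum("bi,bj,ijk->bk", hx_d, hy_d, diag_core), hx_d * hy_d))  # True
```

The last check also shows why the Tucker form is the more expressive of the two: the elementwise product is the special case in which each hidden feature interacts only with its counterpart.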

Args:
  • x: A tensor of at least rank 2 with a statically known last dimension, e.g., [batch_size, depth] or [None, None, None, channels].
  • y: A tensor of the same type and shape as x, except that the size of the last dimension may differ.
  • output_dims: Integer or long, the number of output units.
  • integration_config: An IntegrationConfig holding the following configs (or hyper-parameters) for computing the hidden integration of x and y: (a) integration_type: the integration function to apply, e.g. 'additive', 'multiplicative', or 'tucker_decomp'. (b) hidden_dims: an integer or a list of integers, the number of hidden units in the fully-connected layer(s) before the output layer. (c) activation_fn: the activation function applied to the integrated hidden representation.
  • reuse: Whether the layer and its variables should be reused. To reuse the layer, scope must be given.
  • scope: Optional scope for variable_scope.

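To illustrate the shape contract described in the arguments (at least rank 2, matching leading dimensions, possibly different last dimensions), here is a sketch built around a hypothetical IntegrationConfig dataclass that mirrors the three documented fields; it is not the library class, and hidden_dims is simplified to a single integer. Dense layers act only on the last axis, so a [batch, height, width, channels] input keeps its leading dimensions. Random weights stand in for trained variables, and the reuse and scope arguments have no equivalent here.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class IntegrationConfig:
    """Hypothetical stand-in mirroring the documented fields; not the library class."""
    integration_type: str = "multiplicative"
    hidden_dims: int = 16
    activation_fn: Callable[[np.ndarray], np.ndarray] = np.tanh


def check_bimodal_shapes(x, y):
    """Rank >= 2 and identical leading dims; last dims may differ."""
    if x.ndim < 2 or y.ndim < 2:
        raise ValueError("x and y must have rank >= 2")
    if x.shape[:-1] != y.shape[:-1]:
        raise ValueError(f"leading dims differ: {x.shape[:-1]} vs {y.shape[:-1]}")


def dense_last_axis(inputs, out_dims, rng):
    """Random dense layer applied over the last axis; leading axes are kept."""
    weights = rng.normal(size=(inputs.shape[-1], out_dims))
    return inputs @ weights


def integrate(x, y, output_dims, config, seed=0):
    check_bimodal_shapes(x, y)
    rng = np.random.default_rng(seed)
    hx = dense_last_axis(x, config.hidden_dims, rng)
    hy = dense_last_axis(y, config.hidden_dims, rng)
    if config.integration_type == "additive":
        hidden = hx + hy
    elif config.integration_type == "multiplicative":
        hidden = hx * hy
    else:
        raise ValueError(f"unsupported integration_type: {config.integration_type!r}")
    return dense_last_axis(config.activation_fn(hidden), output_dims, rng)


rng = np.random.default_rng(2)
x = rng.normal(size=(2, 3, 3, 8))  # e.g. [batch, height, width, channels]
y = rng.normal(size=(2, 3, 3, 5))  # same leading dims, different last dim
out = integrate(x, y, output_dims=4, config=IntegrationConfig())
print(out.shape)  # (2, 3, 3, 4)
```

Only the last dimension changes between input and output, which is why x and y need matching leading dimensions but may differ in depth.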
Returns:

A tensor representing the result of the series of operations above; its last dimension has size output_dims.