tf.contrib.kfac.fisher_blocks.FullyConnectedDiagonalFB

Class FullyConnectedDiagonalFB

Inherits From: FisherBlock

Defined in tensorflow/contrib/kfac/python/ops/fisher_blocks.py.

FisherBlock for fully-connected (dense) layers using a diagonal approximation.

Estimates the diagonal entries of the Fisher Information matrix for a fully connected layer. Unlike NaiveDiagonalFB, this block uses the low-variance "sum of squares" estimator.

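Let 'params' be a vector parameterizing a model and 'i' an arbitrary index into it. Writing v(x, y, params) for the gradient of the loss with respect to 'params' on an example (x, y), we are interested in Fisher(params)[i, i], which is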

Fisher(params)[i, i] = E[ v(x, y, params) v(x, y, params)^T ][i, i] = E[ v(x, y, params)[i] ^ 2 ]
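
The identity above can be checked numerically. The following is a minimal NumPy sketch, not part of the K-FAC API: the random rows of `v` are hypothetical stand-ins for sampled vectors v(x, y, params), and the expectation is replaced by a sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 1000 sampled vectors v(x, y, params), 5 parameters each.
v = rng.normal(size=(1000, 5))

# Sample estimate of E[v v^T], a full [5, 5] matrix.
full_fisher = v.T @ v / v.shape[0]

# Sample estimate of E[v[i]^2], computed without forming the full matrix.
diag_fisher = np.mean(v ** 2, axis=0)

# The diagonal of the full estimate matches the elementwise estimate.
assert np.allclose(np.diag(full_fisher), diag_fisher)
```

Only the second form is needed for a diagonal block, so storage grows with the number of parameters rather than with its square.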

Consider a fully connected layer in this model with (unshared) weight matrix 'w'. For an example 'x' that produces layer inputs 'a' and output preactivations 's',

v(x, y, w) = vec( a (d loss / d s)^T )

This FisherBlock tracks Fisher(params)[i, i] for all indices 'i' corresponding to the layer's parameters 'w'.
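
Because v(x, y, w) is a vectorized outer product, its elementwise square factors as the outer product of a^2 and (d loss / d s)^2. A batch estimate of the diagonal can therefore be formed from a single matrix product of squared tensors. The sketch below illustrates this in NumPy under stated assumptions: `a` and `g` are hypothetical random arrays standing in for layer inputs and preactivation gradients, and the code is an illustration of the identity, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, input_size, output_size = 8, 4, 3

a = rng.normal(size=(batch_size, input_size))   # layer inputs (hypothetical data)
g = rng.normal(size=(batch_size, output_size))  # d loss / d s (hypothetical data)

# Reference: form each example's outer product a g^T, square it, then average.
per_example = np.einsum("bi,bo->bio", a, g)
reference = np.mean(per_example ** 2, axis=0)

# Sum of squares: (a g^T)[i, o]^2 == a[i]^2 * g[o]^2, so one matmul suffices.
sum_of_squares = (a ** 2).T @ (g ** 2) / batch_size

assert np.allclose(reference, sum_of_squares)

# Squaring the batch-averaged gradient is a different quantity in general.
squared_mean = np.mean(per_example, axis=0) ** 2
```

The result has shape [input_size, output_size], one entry per weight in 'w', and never materializes the per-example gradients.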

Properties

num_registered_minibatches

Methods

__init__

__init__(
    layer_collection,
    has_bias=False
)

Creates a FullyConnectedDiagonalFB block.

Args:

  • layer_collection: The collection of all layers in the K-FAC approximate Fisher information matrix to which this FisherBlock belongs.
  • has_bias: Whether the component Kronecker factors have an additive bias. (Default: False)

instantiate_factors

instantiate_factors(
    grads_list,
    damping
)

multiply

multiply(vector)

Approximate damped Fisher-vector product.

Args:

  • vector: Tensor or 2-tuple of Tensors. If self._has_bias is False, a Tensor of shape [input_size, output_size] corresponding to the layer's weights. If self._has_bias is True, a 2-tuple of the former and a Tensor of shape [output_size] corresponding to the layer's bias.

Returns:

Tensor (or 2-tuple of Tensors) with the same shape as vector, corresponding to the Fisher-vector product.
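
For a diagonal approximation, a damped Fisher-vector product reduces to an elementwise multiplication. The following NumPy sketch shows one way this could work, assuming the damping is added to every diagonal entry and that a bias, when present, is stacked as an extra row beneath the weights. The helper `diagonal_multiply` and its arguments are hypothetical names for illustration, not the class's actual internals.

```python
import numpy as np

def diagonal_multiply(fisher_diag, vector, damping):
    """Hypothetical helper: elementwise (fisher_diag + damping) * vector.

    fisher_diag has shape [input_size, output_size] without a bias, or
    [input_size + 1, output_size] with one (bias entries in the last row).
    """
    if isinstance(vector, tuple):
        weights, bias = vector
        # Stack the bias as a final row so one multiply covers both.
        stacked = np.concatenate([weights, bias[None, :]], axis=0)
        out = (fisher_diag + damping) * stacked
        return out[:-1], out[-1]
    return (fisher_diag + damping) * vector

# Usage with a bias: input_size=2, output_size=2, so three rows in total.
fisher_diag = np.full((3, 2), 0.5)
weights, bias = np.ones((2, 2)), np.ones(2)
out_w, out_b = diagonal_multiply(fisher_diag, (weights, bias), damping=0.1)
```

The output mirrors the structure of the input: a 2-tuple in, a 2-tuple out, with each part keeping its shape.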

multiply_inverse

multiply_inverse(vector)

Approximate damped inverse Fisher-vector product.

Args:

  • vector: Tensor or 2-tuple of Tensors. If self._has_bias is False, a Tensor of shape [input_size, output_size] corresponding to the layer's weights. If self._has_bias is True, a 2-tuple of the former and a Tensor of shape [output_size] corresponding to the layer's bias.

Returns:

Tensor (or 2-tuple of Tensors) with the same shape as vector, corresponding to the inverse Fisher-vector product.
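
With a diagonal approximation, the damped inverse is an elementwise division, and the damping term keeps that division well defined even where a diagonal estimate is zero. The NumPy sketch below illustrates both points; it assumes the damping is added to each diagonal entry, and the arrays are hypothetical stand-ins rather than values produced by the library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical non-negative diagonal estimates for a [4, 3] weight matrix.
fisher_diag = rng.uniform(0.0, 1.0, size=(4, 3))
vector = rng.normal(size=(4, 3))
damping = 1e-3

# Damped product followed by damped inverse recovers the original vector.
product = (fisher_diag + damping) * vector
recovered = product / (fisher_diag + damping)
assert np.allclose(recovered, vector)

# With an all-zero diagonal, the damping alone keeps the result finite.
zero_diag = np.zeros((4, 3))
inverse_with_zero_diag = vector / (zero_diag + damping)
assert np.all(np.isfinite(inverse_with_zero_diag))
```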

register_additional_minibatch

register_additional_minibatch(
    inputs,
    outputs
)

Registers an additional minibatch to the FisherBlock.

Args:

  • inputs: Tensor of shape [batch_size, input_size]. Inputs to the matrix-multiply.
  • outputs: Tensor of shape [batch_size, output_size]. Layer preactivations.

tensors_to_compute_grads

tensors_to_compute_grads()

Returns the Tensors with respect to which the derivative of the loss is computed.