tf.contrib.kfac.fisher_blocks.ConvKFCBasicFB

Class ConvKFCBasicFB

Inherits From: KroneckerProductFB

Defined in tensorflow/contrib/kfac/python/ops/fisher_blocks.py.

FisherBlock for 2D convolutional layers using the basic KFC approx.

Estimates the Fisher Information matrix's block for a convolutional layer.

Consider a convolutional layer in this model with (unshared) filter matrix 'w'. For a minibatch that produces inputs 'a' and output preactivations 's', this FisherBlock estimates:

F(w) = #locations * kronecker(E[flat(a) flat(a)^T], E[flat(ds) flat(ds)^T])

where

  • ds = (d / ds) log p(y | x, w)
  • #locations = number of (x, y) locations where 'w' is applied.

The expectation is taken over all examples and locations, and flat() concatenates an array's leading dimensions.

See equation 23 in https://arxiv.org/abs/1602.01407 for details.
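The estimate above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper name `kfc_basic_block` is hypothetical, and it assumes the inputs have already been flattened so that each row of `a_patches` holds the input values seen by the filter at one (example, location) pair and each row of `ds` holds the matching preactivation gradient.

```python
import numpy as np


def kfc_basic_block(a_patches, ds, num_locations):
    """Hypothetical helper: #locations * kronecker(E[a a^T], E[ds ds^T]).

    a_patches: array of shape [batch_size * num_locations, d_in].
    ds:        array of shape [batch_size * num_locations, d_out].
    """
    n = a_patches.shape[0]
    # Expectations over all examples and locations, as uncentered second moments.
    input_cov = a_patches.T @ a_patches / n   # [d_in, d_in]
    output_cov = ds.T @ ds / n                # [d_out, d_out]
    return num_locations * np.kron(input_cov, output_cov)


# Random stand-in data; the sizes are arbitrary and chosen only for illustration.
rng = np.random.default_rng(0)
batch_size, num_locations = 4, 9
d_in, d_out = 8, 3
a_patches = rng.standard_normal((batch_size * num_locations, d_in))
ds = rng.standard_normal((batch_size * num_locations, d_out))

block = kfc_basic_block(a_patches, ds, num_locations)
print(block.shape)  # (24, 24)
```

The result is square with side d_in * d_out, one row and column per filter weight, and is symmetric because both factors are.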

Properties

num_registered_minibatches

Methods

__init__

__init__(
    layer_collection,
    params,
    strides,
    padding
)

Creates a ConvKFCBasicFB block.

Args:

  • layer_collection: The collection of all layers in the K-FAC approximate Fisher information matrix to which this FisherBlock belongs.
  • params: The parameters (Tensor or tuple of Tensors) of this layer. If kernel alone, a Tensor of shape [kernel_height, kernel_width, in_channels, out_channels]. If kernel and bias, a tuple of 2 elements: the kernel Tensor described above and a bias Tensor of shape [out_channels].
  • strides: The stride size in this layer (1-D Tensor of length 4).
  • padding: The padding in this layer (1-D Tensor of length 4).

full_fisher_block

full_fisher_block()

Explicitly constructs the full Fisher block.

Used for testing purposes. (In general, the result may be very large.)

Returns:

The full Fisher block.
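To give a sense of how large the result can be, the arithmetic can be sketched as below. The helper `full_fisher_block_size` is hypothetical, and the sketch assumes the full block is a dense square matrix with one row and column per layer parameter, stored as 4-byte floats.

```python
def full_fisher_block_size(kernel_height, kernel_width, in_channels,
                           out_channels, has_bias=False, bytes_per_element=4):
    """Hypothetical helper: parameter count, entry count, and bytes of a dense block."""
    n_params = kernel_height * kernel_width * in_channels * out_channels
    if has_bias:
        n_params += out_channels
    n_entries = n_params * n_params  # dense square matrix
    return n_params, n_entries, n_entries * bytes_per_element


# A 3x3 convolution with 64 input and 64 output channels, no bias.
n_params, n_entries, n_bytes = full_fisher_block_size(3, 3, 64, 64)
print(n_params)          # 36864
print(n_entries)         # 1358954496
print(n_bytes / 2**30)   # 5.0625 (GiB)
```

Under these assumptions a single mid-sized layer already needs about 5 GiB for the dense block, while its two Kronecker factors have only 576 * 576 and 64 * 64 entries.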

instantiate_factors

instantiate_factors(
    grads_list,
    damping
)

multiply

multiply(vector)

multiply_inverse

multiply_inverse(vector)
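This page does not describe how the inverse product is computed. As a general illustration only, a Kronecker-structured block can be inverted factor by factor, since the inverse of kronecker(A, G) is kronecker(A^-1, G^-1). The sketch below uses plain NumPy; the function name `kron_multiply_inverse` is hypothetical, damping is omitted, both factors are assumed invertible, and the vector is assumed to be laid out in row-major order over a [d_in, d_out] weight matrix.

```python
import numpy as np


def kron_multiply_inverse(input_cov, output_cov, vector):
    """Hypothetical helper: computes kron(A, G)^-1 @ vector without forming kron(A, G)."""
    d_in = input_cov.shape[0]
    d_out = output_cov.shape[0]
    v = vector.reshape(d_in, d_out)
    # Row-major identity: kron(A, G)^-1 vec(V) = vec(A^-1 V G^-T).
    left = np.linalg.solve(input_cov, v)               # A^-1 V
    result = np.linalg.solve(output_cov, left.T).T     # (A^-1 V) G^-T
    return result.reshape(-1)


# Random symmetric positive-definite factors, for illustration only.
rng = np.random.default_rng(0)
d_in, d_out = 5, 3
m_in = rng.standard_normal((d_in, d_in))
m_out = rng.standard_normal((d_out, d_out))
input_cov = m_in @ m_in.T + np.eye(d_in)
output_cov = m_out @ m_out.T + np.eye(d_out)
vector = rng.standard_normal(d_in * d_out)

fast = kron_multiply_inverse(input_cov, output_cov, vector)
# Reference result that explicitly builds the full Kronecker product.
slow = np.linalg.solve(np.kron(input_cov, output_cov), vector)
print(np.allclose(fast, slow))  # True
```

The factored route solves one d_in-sized and one d_out-sized system instead of a single (d_in * d_out)-sized one, which is the usual reason for keeping the block in Kronecker form.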

register_additional_minibatch

register_additional_minibatch(
    inputs,
    outputs
)

Registers an additional minibatch to the FisherBlock.

Args:

  • inputs: Tensor of shape [batch_size, height, width, input_size]. Inputs to the convolution.
  • outputs: Tensor of shape [batch_size, height, width, output_size]. Layer preactivations.

tensors_to_compute_grads

tensors_to_compute_grads()