tf.contrib.kfac.estimator.FisherEstimator

Class FisherEstimator

Defined in tensorflow/contrib/kfac/python/ops/estimator.py.

Fisher estimator class supporting various approximations of the Fisher.

This is an abstract base class which does not implement a strategy for placing covariance variables, covariance update ops and inverse update ops. The placement strategies are implemented in placement.py. See FisherEstimatorRoundRobin for an example of a concrete subclass with a round-robin placement strategy.

Properties

blocks

All registered FisherBlocks.

damping

factors

All registered FisherFactors.

name

variables

Methods

__init__

__init__(
    variables,
    cov_ema_decay,
    damping,
    layer_collection,
    exps=(-1,),
    estimation_mode='gradients',
    colocate_gradients_with_ops=True,
    name='FisherEstimator',
    compute_cholesky=False,
    compute_cholesky_inverse=False
)

Create a FisherEstimator object.

Args:

  • variables: A list of variables, or a callable that returns the variables, for which to estimate the Fisher. This must match the variables registered in layer_collection (if it is not None).
  • cov_ema_decay: The decay factor used when calculating the covariance estimate moving averages.
  • damping: float. The damping factor used to stabilize training against errors in the local approximation based on the Fisher information matrix, and to regularize the update direction by making it closer to the gradient. (Higher damping means the update looks more like a standard gradient update; see Tikhonov regularization.)
  • layer_collection: The layer collection object, which holds the Fisher blocks, Kronecker factors, and losses associated with the graph.
  • exps: List of floats or ints. These represent the different matrix powers of the approximate Fisher that the FisherEstimator will be able to multiply vectors by. If the user asks for a matrix power other than one of these (or 1, which is always supported), there will be a failure. (Default: (-1,))
  • estimation_mode: The type of estimator to use for the Fishers. Can be 'gradients', 'empirical', 'curvature_prop', or 'exact'. (Default: 'gradients'). 'gradients' is the basic estimation approach from the original K-FAC paper. 'empirical' computes the empirical Fisher information matrix (which uses the data's distribution for the targets, as opposed to the true Fisher, which uses the model's distribution) and requires that each registered loss have specified targets. 'curvature_prop' is a method which estimates the Fisher using self-products of random 1/-1 vectors times "half-factors" of the Fisher, as described here: https://arxiv.org/abs/1206.6464 . Finally, 'exact' is the obvious generalization of Curvature Propagation to compute the exact Fisher (modulo any additional diagonal or Kronecker approximations) by looping over one-hot vectors for each coordinate of the output instead of using 1/-1 vectors. Roughly speaking, it is more expensive to compute than the other three options by a factor equal to the output dimension.
  • colocate_gradients_with_ops: Whether we should request gradients be colocated with their respective ops. (Default: True)
  • name: A string. A name given to this estimator, which is added to the variable scope when constructing variables and ops. (Default: "FisherEstimator")
  • compute_cholesky: Bool. Whether or not the FisherEstimator will be able to multiply vectors by the Cholesky factor. (Default: False)
  • compute_cholesky_inverse: Bool. Whether or not the FisherEstimator will be able to multiply vectors by the Cholesky factor inverse. (Default: False)

Raises:

  • ValueError: If no losses have been registered with layer_collection.
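The contrast between 'curvature_prop' and 'exact' can be sketched in plain Python for a single block. The sketch assumes the block factors as F = B B^T for a small, made-up "half-factor" B; the helper names are hypothetical and this is not how KFAC itself builds its estimates. Averaging (B v)(B v)^T over random 1/-1 vectors v is unbiased because E[v v^T] = I, while summing over one-hot vectors recovers B B^T exactly, at the cost of one pass per output coordinate.

```python
import random

def matvec(B, v):
    # B v, with B given as a list of rows.
    return [sum(b * x for b, x in zip(row, v)) for row in B]

def add_outer(acc, u, scale=1.0):
    # acc += scale * u u^T, in place.
    for i, ui in enumerate(u):
        for j, uj in enumerate(u):
            acc[i][j] += scale * ui * uj

def fisher_exact(B):
    # 'exact'-style: sum (B e_i)(B e_i)^T over one-hot vectors e_i.
    n, k = len(B), len(B[0])
    F = [[0.0] * n for _ in range(n)]
    for i in range(k):
        e = [1.0 if j == i else 0.0 for j in range(k)]
        add_outer(F, matvec(B, e))
    return F

def fisher_curvature_prop(B, num_samples, rng):
    # 'curvature_prop'-style: average (B v)(B v)^T over random 1/-1 vectors v.
    n, k = len(B), len(B[0])
    F = [[0.0] * n for _ in range(n)]
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(k)]
        add_outer(F, matvec(B, v), scale=1.0 / num_samples)
    return F

B = [[1.0, 2.0], [0.5, -1.0]]  # hypothetical half-factor, so F = B B^T
print(fisher_exact(B))  # [[5.0, -1.5], [-1.5, 1.25]]
print(fisher_curvature_prop(B, 10000, random.Random(0)))  # close to the line above
```

With only two output coordinates the exact loop is cheap; its extra cost relative to the random-vector estimate grows with the output dimension.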

create_ops_and_vars_thunks

create_ops_and_vars_thunks(scope=None)

Create thunks that make the ops and vars on demand.

This function returns 4 lists of thunks: cov_variable_thunks, cov_update_thunks, inv_variable_thunks, and inv_update_thunks.

The length of each list is the number of factors and the i-th element of each list corresponds to the i-th factor (given by the "factors" property).

Note that the execution of these thunks must happen in a certain partial order. The i-th element of cov_variable_thunks must execute before the i-th element of cov_update_thunks (and also the i-th element of inv_update_thunks). Similarly, the i-th element of inv_variable_thunks must execute before the i-th element of inv_update_thunks.

TL;DR (oversimplified): Execute the thunks according to the order that they are returned.

Args:

  • scope: A string or None. If None, it will be set to the name of this estimator (given by the name property). All thunks will execute inside of a variable scope of the given name. (Default: None)

Returns:

  • cov_variable_thunks: A list of thunks that make the cov variables.
  • cov_update_thunks: A list of thunks that make the cov update ops.
  • inv_variable_thunks: A list of thunks that make the inv variables.
  • inv_update_thunks: A list of thunks that make the inv update ops.
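The partial order above can be checked with a toy stand-in for the four returned lists. Here each hypothetical thunk merely records (kind, factor index) instead of building variables or ops; running the lists in the order they are returned satisfies every "must run before" constraint for every factor.

```python
def make_thunks(num_factors, log):
    # Hypothetical stand-ins: each thunk just records (kind, factor index).
    def thunk(kind, i):
        return lambda: log.append((kind, i))
    kinds = ("cov_var", "cov_update", "inv_var", "inv_update")
    return tuple([thunk(kind, i) for i in range(num_factors)] for kind in kinds)

# For each factor i, the first kind must run before the second.
REQUIRED = [("cov_var", "cov_update"), ("cov_var", "inv_update"), ("inv_var", "inv_update")]

def satisfies_partial_order(log):
    position = {entry: n for n, entry in enumerate(log)}
    factors = {i for _, i in log}
    return all(position[(a, i)] < position[(b, i)] for i in factors for a, b in REQUIRED)

log = []
cov_var, cov_update, inv_var, inv_update = make_thunks(3, log)
for thunk_list in (cov_var, cov_update, inv_var, inv_update):
    for t in thunk_list:
        t()
print(satisfies_partial_order(log))  # True
```

Other interleavings (for example, all four thunks of factor 0 before any of factor 1) are also valid, since the constraints only relate thunks of the same factor.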

made_vars

made_vars()

make_vars_and_create_op_thunks

make_vars_and_create_op_thunks(scope=None)

Make vars and create op thunks with a specific placement strategy.

For each factor, all of that factor's cov variables and their associated update ops will be placed on a particular device. A new device is chosen for each factor by cycling through the list of devices in the cov_devices argument (supplied to a concrete subclass such as FisherEstimatorRoundRobin, not to this method). If cov_devices is None, then no explicit device placement occurs.

An analogous strategy is followed for inverse update ops, with the list of devices being given by the inv_devices argument.

Inverse variables, on the other hand, are not placed on any specific device (they will just use the current device placement context, whatever that happens to be). The idea is that the inverse variables belong where they will be accessed most often, which is the device that actually applies the preconditioner to the gradient. The user is responsible for setting the device context for this.

Args:

  • scope: A string or None. If None, it will be set to the name of this estimator (given by the name property). All variables will be created, and all thunks will execute, inside of a variable scope of the given name. (Default: None)

Returns:

  • cov_update_thunks: List of cov update thunks. Corresponds one-to-one with the list of factors given by the "factors" property.
  • inv_update_thunks: List of inv update thunks. Corresponds one-to-one with the list of factors given by the "factors" property.
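The round-robin rule described above reduces to cycling through a device list, one device per factor. A minimal sketch; assign_devices is a hypothetical helper (not part of this API), the device strings are just examples, and an absent or empty list yields None for every factor, standing in for "no explicit device placement".

```python
import itertools

def assign_devices(num_factors, devices=None):
    # One device per factor, cycling through `devices`.
    # None or an empty list means no explicit placement for any factor.
    if not devices:
        return [None] * num_factors
    cycle = itertools.cycle(devices)
    return [next(cycle) for _ in range(num_factors)]

print(assign_devices(5, ["/gpu:0", "/gpu:1"]))
# ['/gpu:0', '/gpu:1', '/gpu:0', '/gpu:1', '/gpu:0']
print(assign_devices(2))  # [None, None]
```

The same helper would serve for the inverse update ops, called with the inv_devices list instead.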

multiply

multiply(vecs_and_vars)

Multiplies the vectors by the corresponding (damped) blocks.

Args:

  • vecs_and_vars: List of (vector, variable) pairs.

Returns:

A list of (transformed vector, var) pairs in the same order as vecs_and_vars.
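To make the (vector, variable) bookkeeping concrete, here is a minimal sketch of what multiply returns, assuming each block is a small dense matrix damped as F + damping * I. That damping form is a simplification (KFAC applies damping to its Kronecker factors), and blocks, damped_matvec and multiply below are hypothetical stand-ins, not the estimator's internals.

```python
def damped_matvec(F, damping, v):
    # (F + damping * I) v for a dense block F given as a list of rows.
    return [sum(f * x for f, x in zip(row, v)) + damping * v[i] for i, row in enumerate(F)]

def multiply(blocks, damping, vecs_and_vars):
    # Returns (transformed vector, var) pairs in the same order as the input.
    return [(damped_matvec(blocks[var], damping, vec), var) for vec, var in vecs_and_vars]

blocks = {"w": [[2.0, 0.0], [0.0, 3.0]], "b": [[1.0]]}  # hypothetical blocks, keyed by variable
print(multiply(blocks, 0.5, [([1.0, 1.0], "w"), ([4.0], "b")]))
# [([2.5, 3.5], 'w'), ([6.0], 'b')]
```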

multiply_cholesky

multiply_cholesky(
    vecs_and_vars,
    transpose=False
)

Multiplies the vecs by the corresponding Cholesky factors.

Args:

  • vecs_and_vars: List of (vector, variable) pairs.
  • transpose: Bool. If True, the Cholesky factors are transposed before multiplying the vecs. (Default: False)

Returns:

A list of (transformed vector, var) pairs in the same order as vecs_and_vars.
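For a single small block, the effect of the transpose flag can be sketched in plain Python: factor a symmetric positive-definite block F as L L^T, then multiply a vector by L or by L^T. The helpers and the block F are hypothetical, and the dense computation only illustrates the arithmetic, not how KFAC computes or caches Cholesky factors.

```python
import math

def cholesky(F):
    # Lower-triangular L with F = L L^T (F must be symmetric positive definite).
    n = len(F)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(F[i][i] - s)
            else:
                L[i][j] = (F[i][j] - s) / L[j][j]
    return L

def matvec(M, v, transpose=False):
    # M v, or M^T v when transpose is True.
    n = len(M)
    if transpose:
        return [sum(M[k][i] * v[k] for k in range(n)) for i in range(n)]
    return [sum(M[i][k] * v[k] for k in range(n)) for i in range(n)]

F = [[4.0, 2.0], [2.0, 3.0]]  # hypothetical damped block
L = cholesky(F)               # [[2.0, 0.0], [1.0, sqrt(2)]]
v = [1.0, 1.0]
print(matvec(L, v))                  # L v
print(matvec(L, v, transpose=True))  # L^T v
```

Applying L after L^T reproduces F v, which is a quick sanity check that the factor is correct.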

multiply_cholesky_inverse

multiply_cholesky_inverse(
    vecs_and_vars,
    transpose=False
)

Multiplies the vecs by the inverses of the corresponding Cholesky factors.

Since F = L * L^T, we have L^-T * L^-1 = (L * L^T)^-1 = F^-1.

Thus, to sample from a Gaussian with covariance F^-1, we want to multiply a standard normal vector by L^-T (i.e., use transpose=True).

Args:

  • vecs_and_vars: List of (vector, variable) pairs.
  • transpose: Bool. If True, the Cholesky factor inverses are transposed before multiplying the vecs. (Default: False)

Returns:

A list of (transformed vector, var) pairs in the same order as vecs_and_vars.
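The sampling argument above can be checked numerically. This sketch (hypothetical helpers, a made-up 2x2 block) applies L^-T to standard normal vectors by back substitution and compares the empirical covariance of the results with F^-1; it also confirms that applying L^-1 and then L^-T undoes multiplication by F.

```python
import math
import random

def cholesky(F):
    # Lower-triangular L with F = L L^T.
    n = len(F)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(F[i][i] - s) if i == j else (F[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: returns L^-1 b.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_lower_transpose(L, b):
    # Back substitution on the upper-triangular L^T: returns L^-T b.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

F = [[4.0, 2.0], [2.0, 3.0]]  # hypothetical block; F^-1 = [[0.375, -0.25], [-0.25, 0.5]]
L = cholesky(F)
rng = random.Random(0)
samples = [solve_lower_transpose(L, [rng.gauss(0.0, 1.0) for _ in range(2)])
           for _ in range(20000)]
# Samples have zero mean, so the second moment estimates the covariance.
cov = [[sum(s[i] * s[j] for s in samples) / len(samples) for j in range(2)] for i in range(2)]
print(cov)  # approximately F^-1
```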

multiply_inverse

multiply_inverse(vecs_and_vars)

Multiplies the vecs by the corresponding (damped) inverses of the blocks.

Args:

  • vecs_and_vars: List of (vector, variable) pairs.

Returns:

A list of (transformed vector, var) pairs in the same order as vecs_and_vars.
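The damping argument of the constructor describes higher damping as making the update more like a standard gradient update. A minimal sketch for one 2x2 block, assuming the damped block is F + damping * I (a simplification of KFAC's per-factor damping); damped_inverse_2x2 and the other helpers are hypothetical. The cosine between the preconditioned step and the gradient grows toward 1 as damping increases.

```python
import math

def damped_inverse_2x2(F, damping):
    # (F + damping * I)^-1 via the closed-form 2x2 inverse.
    a, b = F[0][0] + damping, F[0][1]
    c, d = F[1][0], F[1][1] + damping
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def cosine(u, v):
    # Cosine of the angle between u and v.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

F = [[10.0, 0.0], [0.0, 0.1]]  # hypothetical ill-conditioned block
g = [1.0, 1.0]                  # stand-in for a gradient
for damping in (1e-3, 1.0, 1e3):
    step = matvec(damped_inverse_2x2(F, damping), g)
    print(damping, round(cosine(step, g), 4))
```

With small damping the step is dominated by the low-curvature direction; with large damping it is close to g / damping, i.e. a scaled gradient step.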

multiply_matpower

multiply_matpower(
    exp,
    vecs_and_vars
)

Multiplies the vecs by the corresponding matrix powers of the blocks.

Args:

  • exp: A float representing the power to raise the blocks to before multiplying them by the vectors. Must be 1 or one of the values passed as exps to the constructor.
  • vecs_and_vars: List of (vector, variable) pairs.

Returns:

A list of (transformed vector, var) pairs in the same order as vecs_and_vars.
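A minimal sketch of multiplying by a matrix power, for one symmetric positive-definite 2x2 block via its eigendecomposition (F^p = sum_k lambda_k^p u_k u_k^T). The function is hypothetical; it mirrors the constructor's rule that only 1 and the powers listed in exps are supported, but raising ValueError is an assumption, since the documentation only says "there will be a failure".

```python
import math

def multiply_matpower(F, exp, v, exps=(-1,)):
    # Hypothetical check: only 1 and the powers in `exps` are supported.
    # (The real estimator's error type is not documented here.)
    if exp != 1 and exp not in exps:
        raise ValueError("unsupported matrix power: %r" % (exp,))
    (a, b), (_, d) = F
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # Jacobi rotation that diagonalizes F
    c, s = math.cos(theta), math.sin(theta)
    eigvecs = ([c, s], [-s, c])
    eigvals = (a * c * c + 2 * b * s * c + d * s * s,
               a * s * s - 2 * b * s * c + d * c * c)
    # F^exp v = sum_k eigval_k^exp * (u_k . v) * u_k
    out = [0.0, 0.0]
    for lam, u in zip(eigvals, eigvecs):
        coeff = lam ** exp * (u[0] * v[0] + u[1] * v[1])
        out[0] += coeff * u[0]
        out[1] += coeff * u[1]
    return out

F = [[4.0, 2.0], [2.0, 3.0]]  # hypothetical damped block
print(multiply_matpower(F, 1, [1.0, 1.0]))   # approximately [6.0, 5.0], i.e. F v
print(multiply_matpower(F, -1, [6.0, 5.0]))  # approximately [1.0, 1.0]
```

Applying the -0.5 power twice (with -0.5 added to exps) gives the same result as the -1 power, as expected for a symmetric positive-definite block.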