tf.contrib.kfac.layer_collection.LayerCollection

Class LayerCollection

Defined in tensorflow/contrib/kfac/python/ops/layer_collection.py.

Registry of information about layers and losses.

Note that you need to create a new one of these for each MatrixEstimator or KfacOptimizer.

Attributes:

  • fisher_blocks: a LayersParamsDict (subclass of OrderedDict) mapping layer parameters (Tensors or tuples of Tensors) to FisherBlock instances.
  • fisher_factors: an OrderedDict mapping tuples to FisherFactor instances.
  • losses: a list of LossFunction objects. The loss to be optimized is their sum.
  • loss_colocation_ops: ops to colocate loss function evaluations with. These will typically be the inputs to the losses.

Properties

default_conv2d_approximation

default_conv2d_multi_approximation

default_embedding_approximation

default_embedding_multi_approximation

default_fully_connected_approximation

default_fully_connected_multi_approximation

default_generic_approximation

graph

linked_parameters

Groups of parameters with an optionally specified approximation.

Linked parameters can be added using define_linked_parameters. If an approximation is specified, then this approximation will be used when registering a layer with exactly these parameters, unless an approximation is specified when calling the registration function.

Returns:

A dict mapping tuples of parameters to an optional string.

losses

Tuple of LossFunction objects registered with this LayerCollection.

registered_variables

A tuple of all of the variables currently registered.

subgraph

towers_by_loss

Tuple across losses of LossFunction objects registered to each tower.

Methods

__init__

__init__(
    graph=None,
    name='LayerCollection'
)

as_default

as_default(
    *args,
    **kwds
)

Sets this LayerCollection as the default.

check_registration

check_registration(variables)

Checks that all variable uses have been registered properly.

Args:

  • variables: List of variables.

Raises:

  • ValueError: If any registered variables are not included in the list.
  • ValueError: If any variable in the list is not registered.
  • ValueError: If any variable in the list is registered with the wrong number of "uses" in the subgraph recorded (vs the number of times that variable is actually used in the subgraph).
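
The three error conditions above can be sketched in plain Python. This is an illustrative stand-in only, not the library's implementation: the dictionaries `registered_uses` and `actual_uses` and the use of string names for variables are assumptions made for the example.

```python
def check_registration(registered_uses, actual_uses, variables):
    """Hypothetical sketch of the three checks described above.

    registered_uses: dict mapping variable name -> uses recorded at registration.
    actual_uses: dict mapping variable name -> uses counted in the subgraph.
    variables: list of variable names the caller expects to be registered.
    """
    listed = set(variables)
    registered = set(registered_uses)

    # 1. Every registered variable must appear in the list.
    unlisted = registered - listed
    if unlisted:
        raise ValueError("Registered but not in the list: %s" % sorted(unlisted))

    for name in variables:
        # 2. Every listed variable must be registered.
        if name not in registered:
            raise ValueError("Not registered: %s" % name)
        # 3. The recorded number of uses must match the actual number.
        expected = actual_uses.get(name, 0)
        if registered_uses[name] != expected:
            raise ValueError(
                "%s registered with %d uses but used %d times in the subgraph"
                % (name, registered_uses[name], expected))


registered_uses = {"dense/w": 1, "rnn/w": 3}
actual_uses = {"dense/w": 1, "rnn/w": 3}
# Consistent registrations pass silently.
check_registration(registered_uses, actual_uses, ["dense/w", "rnn/w"])
```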

create_subgraph

create_subgraph()

define_linked_parameters

define_linked_parameters(
    params,
    approximation=None
)

Identify a set of parameters that should be grouped together.

During automatic graph scanning, any matches containing variables that have been identified as part of a linked group will be filtered out unless the match parameters are exactly equal to the ones specified in the linked group.

Args:

  • params: A variable, or a tuple or list of variables. The variables to be linked.
  • approximation: Optional string specifying the type of approximation to use for these variables. If unspecified, this layer collection's default approximation for the layer type will be used.

Raises:

  • ValueError: If the parameters were already registered in a layer or identified as part of an incompatible group.
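
The filtering rule described above can be sketched as follows. This is a minimal illustration, not the library's graph scanner: `filter_matches` is a hypothetical helper, variables are represented by strings, and parameter groups are compared as sets (an assumption that ordering does not matter).

```python
def filter_matches(matches, linked_groups):
    """Drop matches that touch a linked group without equalling it exactly."""
    groups = [frozenset(group) for group in linked_groups]
    linked_vars = frozenset().union(*groups)
    kept = []
    for params in matches:
        candidate = frozenset(params)
        # A match that shares variables with a linked group survives only
        # if its parameters are exactly those of some linked group.
        if candidate & linked_vars and candidate not in groups:
            continue
        kept.append(params)
    return kept


linked_groups = [("w", "b")]
matches = [("w",), ("w", "b"), ("v",)]
kept = filter_matches(matches, linked_groups)
# kept == [("w", "b"), ("v",)]
```

Here the match on `("w",)` alone is discarded because `w` belongs to the linked group `("w", "b")`, while the unrelated match on `("v",)` is unaffected.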

eval_losses

eval_losses()

Return evaluated losses (colocated with inputs to losses).

eval_losses_on_samples

eval_losses_on_samples()

Return losses evaluated on samples (colocated with inputs to losses).

get_blocks

get_blocks()

get_factors

get_factors()

make_or_get_factor

make_or_get_factor(
    cls,
    args
)

Insert cls(args) into self.fisher_factors if not already present.

Wraps constructor in tf.variable_scope() to ensure variables constructed in cls.__init__ are placed under this LayerCollection's scope.

Args:

  • cls: Class that implements FisherFactor.
  • args: Tuple of arguments to pass into cls's constructor. Must be hashable.

Returns:

Instance of cls found in self.fisher_factors.
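
The insert-if-absent behaviour can be sketched as a small memoized factory. This is a simplified illustration under stated assumptions: `FactorRegistry` and `DummyFactor` are hypothetical names, the cache key is assumed to be the pair `(cls, args)`, the arguments are assumed to be unpacked into the constructor, and the tf.variable_scope() wrapping is omitted.

```python
from collections import OrderedDict


class FactorRegistry(object):
    """Hypothetical stand-in for the factor cache of a LayerCollection."""

    def __init__(self):
        self.fisher_factors = OrderedDict()

    def make_or_get_factor(self, cls, args):
        # args becomes part of a dictionary key, so it must be hashable.
        try:
            hash(args)
        except TypeError:
            raise TypeError("args must be hashable, got %r" % (args,))
        key = (cls, args)
        if key not in self.fisher_factors:
            # Assumption: the tuple is unpacked into the constructor.
            self.fisher_factors[key] = cls(*args)
        return self.fisher_factors[key]


class DummyFactor(object):
    """Hypothetical factor class used only for this example."""

    def __init__(self, shape, decay):
        self.shape = shape
        self.decay = decay


registry = FactorRegistry()
a = registry.make_or_get_factor(DummyFactor, ((3, 3), 0.95))
b = registry.make_or_get_factor(DummyFactor, ((3, 3), 0.95))  # same object as a
c = registry.make_or_get_factor(DummyFactor, ((4, 4), 0.95))  # new object
```

Because identical `(cls, args)` pairs return the same instance, two layers that request the same factor share its statistics rather than duplicating them.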

register_block

register_block(
    layer_key,
    fisher_block,
    reuse=VARIABLE_SCOPE
)

Validates and registers the layer_key associated with the fisher_block.

Args:

  • layer_key: A variable or tuple of variables. The key to check for in existing registrations and to register if valid.
  • fisher_block: The associated FisherBlock.
  • reuse: Method to use for inserting new FisherBlocks. One of True, False, or VARIABLE_SCOPE.

Raises:

  • ValueError: If layer_key was already registered and reuse is False, if layer_key was registered with a different block type, or if layer_key shares any variables with but is not equal to a previously registered key.
  • KeyError: If reuse is True but layer_key was not previously registered.

Returns:

The FisherBlock registered under layer_key. If layer_key was already registered, this will be the previously registered FisherBlock.
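
The interaction between reuse and existing registrations can be sketched in plain Python. This is an approximation for illustration, not the library's code: `blocks` is a plain dict, variables are strings, `KronBlock` and `DiagBlock` are hypothetical block classes, and the `scope_reuse` argument stands in for the value that tf.get_variable_scope().reuse would supply.

```python
VARIABLE_SCOPE = "VARIABLE_SCOPE"


class KronBlock(object):
    """Hypothetical block type."""


class DiagBlock(object):
    """Hypothetical block type."""


def _as_tuple(key):
    return key if isinstance(key, tuple) else (key,)


def register_block(blocks, layer_key, fisher_block,
                   reuse=VARIABLE_SCOPE, scope_reuse=False):
    # Resolve the sentinel to a concrete boolean first.
    if reuse == VARIABLE_SCOPE:
        reuse = scope_reuse

    if reuse:
        existing = blocks[layer_key]  # KeyError if never registered
        if type(existing) is not type(fisher_block):
            raise ValueError("layer_key registered with a different block type")
        return existing

    if layer_key in blocks:
        raise ValueError("layer_key already registered and reuse is False")

    # Reject keys that overlap with, but do not equal, an existing key.
    new_vars = set(_as_tuple(layer_key))
    for other_key in blocks:
        if new_vars & set(_as_tuple(other_key)):
            raise ValueError("layer_key shares variables with %r" % (other_key,))

    blocks[layer_key] = fisher_block
    return fisher_block


blocks = {}
first = register_block(blocks, ("w", "b"), KronBlock(), reuse=False)
again = register_block(blocks, ("w", "b"), KronBlock(), reuse=True)  # returns first
```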

register_categorical_predictive_distribution

register_categorical_predictive_distribution(
    logits,
    seed=None,
    targets=None,
    name=None,
    reuse=VARIABLE_SCOPE
)

Registers a categorical predictive distribution.

Args:

  • logits: The logits of the distribution (i.e. its parameters).
  • seed: The seed for the RNG (for debugging) (Default: None)
  • targets: (OPTIONAL) The targets for the loss function. Only required if one wants to call total_loss() instead of total_sampled_loss(). total_loss() is required, for example, to estimate the "empirical Fisher" (instead of the true Fisher). (Default: None)
  • name: (OPTIONAL) str or None. Unique name for this loss function. If None, a new name is generated. (Default: None)
  • reuse: bool or str. If True, this adds logits as an additional mini-batch/tower of inputs to the loss-function/predictive distribution (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Default: "VARIABLE_SCOPE")
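
The distinction between a loss evaluated on supplied targets and one evaluated on sampled targets can be illustrated without TensorFlow. The sketch below assumes that a "sampled" loss means the negative log-likelihood at a target drawn from the model's own predictive distribution; the helper names are hypothetical and the example handles a single set of logits rather than a batch.

```python
import math
import random


def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_nll(logits, target):
    """Negative log-likelihood of class index `target` under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def sample_target(logits, rng):
    """Draw a class index from the categorical distribution softmax(logits)."""
    u = rng.random()
    probs = softmax(logits)
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point rounding


logits = [2.0, 0.5, -1.0]
rng = random.Random(0)  # fixed seed, as one might use when debugging

# Loss on a target sampled from the model (relates to the true Fisher).
sampled = sample_target(logits, rng)
loss_on_sample = categorical_nll(logits, sampled)

# Loss on an externally supplied target (relates to the empirical Fisher).
loss_on_target = categorical_nll(logits, 2)
```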

register_conv2d

register_conv2d(
    params,
    strides,
    padding,
    inputs,
    outputs,
    data_format=None,
    dilations=None,
    approx=None,
    reuse=VARIABLE_SCOPE
)

Registers a call to tf.nn.conv2d().

Args:

  • params: Tensor or 2-tuple of Tensors corresponding to weight and bias of this layer. Weight matrix should have shape [kernel_height, kernel_width, in_channels, out_channels]. Bias should have shape [out_channels].
  • strides: List of 4 ints. Strides for convolution kernel.
  • padding: string. See tf.nn.conv2d for valid values.
  • inputs: Tensor of shape [batch_size, height, width, in_channels]. Inputs to layer.
  • outputs: Tensor of shape [batch_size, height, width, out_channels]. Output produced by layer.
  • data_format: str or None. Format of data.
  • dilations: List of 4 ints. Dilations along each dimension.
  • approx: str or None. If not None must be one of "kron" or "diagonal". The Fisher approximation to use. If None the default value is used. (Default: None)
  • reuse: bool or str. If True, this adds inputs and outputs as an additional mini-batch/tower of data to use when estimating the Fisher block for this layer (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Default: "VARIABLE_SCOPE")

Raises:

  • ValueError: For improper value to approx.
  • KeyError: If reuse == True but no FisherBlock found for params.
  • ValueError: If reuse == True and FisherBlock found but of the wrong type.

register_conv2d_multi

register_conv2d_multi(
    params,
    strides,
    padding,
    inputs,
    outputs,
    num_uses=None,
    data_format=None,
    dilations=None,
    approx=None,
    reuse=VARIABLE_SCOPE
)

Registers convolutional layers with shared parameters.

Args:

  • params: Tensor or 2-tuple of Tensors corresponding to weight and bias of this layer. Weight matrix should have shape [kernel_height, kernel_width, in_channels, out_channels]. Bias should have shape [out_channels].
  • strides: 1-D Tensor of length 4. Strides for convolution kernel.
  • padding: string. See tf.nn.conv2d for valid values.
  • inputs: A list of Tensors, each of shape [batch_size, height, width, in_channels]. Inputs to layer. The list indexes each use in the graph (which might correspond to a "time-step" in an RNN). OR, can be a single Tensor of shape [num_uses * batch_size, height, width, in_channels], which is a reshaped version of a Tensor of shape [num_uses, batch_size, height, width, in_channels].
  • outputs: A list of Tensors, each of shape [batch_size, height, width, out_channels]. Output produced by layer. The list indexes each use in the graph (which might correspond to a "time-step" in an RNN). Needs to correspond with the order used in inputs. OR, can be a single Tensor, of shape [num_uses * batch_size, height, width, out_channels], which is a reshaped version of a Tensor of shape [num_uses, batch_size, height, width, out_channels].
  • num_uses: int or None. The number of uses/time-steps in the graph where the layer appears. Only needed if both inputs and outputs are given in the single Tensor format. (Default: None)
  • data_format: str or None. Format of data.
  • dilations: List of 4 ints. Dilations along each dimension.
  • approx: str or None. If not None must be "kron_indep". The Fisher approximation to use. If None the default value is used. (Default: None)
  • reuse: bool or str. If True, this adds inputs and outputs as an additional mini-batch/tower of data to use when estimating the Fisher block for this layer (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Note that the word "use" here has a completely different meaning from "use in the graph" as it pertains to the inputs, outputs, and num_uses arguments.) (Default: "VARIABLE_SCOPE")

Raises:

  • ValueError: For improper value to approx.
  • KeyError: If reuse == True but no FisherBlock found for params.
  • ValueError: If reuse == True and FisherBlock found but of the wrong type.

register_convolution

register_convolution(
    params,
    inputs,
    outputs,
    padding,
    strides=None,
    dilation_rate=None,
    data_format=None,
    approx=None,
    reuse=VARIABLE_SCOPE
)

Register a call to tf.nn.convolution().

Args:

  • params: Tensor or 2-tuple of Tensors corresponding to weight and bias of this layer. Weight matrix should have shape [..filter_spatial_size.., in_channels, out_channels]. Bias should have shape [out_channels].
  • inputs: Tensor of shape [batch_size, ..input_spatial_size.., in_channels]. Inputs to layer.
  • outputs: Tensor of shape [batch_size, ..output_spatial_size.., out_channels]. Output produced by layer.
  • padding: string. See tf.nn.conv2d for valid values.
  • strides: List of ints of length len(..input_spatial_size..). Strides for convolution kernel in spatial dimensions.
  • dilation_rate: List of ints of length len(..input_spatial_size..). Dilations along spatial dimension.
  • data_format: str or None. Format of data.
  • approx: str or None. If not None must be one of "kron" or "diagonal". The Fisher approximation to use. If None the default value is used. (Default: None)
  • reuse: bool or str. If True, this adds inputs and outputs as an additional mini-batch/tower of data to use when estimating the Fisher block for this layer (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Default: "VARIABLE_SCOPE")

Raises:

  • ValueError: For improper value to approx.
  • KeyError: If reuse == True but no FisherBlock found for params.
  • ValueError: If reuse == True and FisherBlock found but of the wrong type.

register_depthwise_conv2d

register_depthwise_conv2d(
    params,
    inputs,
    outputs,
    strides,
    padding,
    rate=None,
    data_format=None,
    approx=None,
    reuse=VARIABLE_SCOPE
)

Register a call to tf.nn.depthwise_conv2d().

Args:

  • params: 4-D Tensor of shape [filter_height, filter_width, in_channels, channel_multiplier]. Convolutional filter.
  • inputs: Tensor of shape [batch_size, input_height, input_width, in_channels]. Inputs to layer.
  • outputs: Tensor of shape [batch_size, output_height, output_width, in_channels * channel_multiplier]. Output produced by depthwise conv2d.
  • strides: List of ints of length 4. Strides along all dimensions.
  • padding: string. See tf.nn.conv2d for valid values.
  • rate: None or List of ints of length 2. Dilation rates in spatial dimensions.
  • data_format: str or None. Format of data.
  • approx: str or None. If not None must be "diagonal". The Fisher approximation to use. If None the default value is used. (Default: None)
  • reuse: bool or str. If True, this adds inputs and outputs as an additional mini-batch/tower of data to use when estimating the Fisher block for this layer (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Default: "VARIABLE_SCOPE")

Raises:

  • ValueError: For improper value to approx.
  • KeyError: If reuse == True but no FisherBlock found for params.
  • ValueError: If reuse == True and FisherBlock found but of the wrong type.

register_embedding

register_embedding(
    params,
    inputs,
    outputs,
    approx=None,
    reuse=VARIABLE_SCOPE
)

Registers an embedding layer.

Args:

  • params: Embedding matrix of shape [vocab_size, embedding_size].
  • inputs: Tensor of shape [batch_size, input_size] and dtype int32. Indices into embedding matrix.
  • outputs: Tensor of shape [batch_size, embedding_size]. Outputs produced by layer.
  • approx: str or None. If not None must be "kron". The Fisher approximation to use. If None the default value is used. (Default: None)
  • reuse: bool or str. If True, this adds inputs and outputs as an additional mini-batch/tower of data to use when estimating the Fisher block for this layer (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Default: "VARIABLE_SCOPE")

Raises:

  • ValueError: For improper value to approx.
  • KeyError: If reuse == True but no FisherBlock found for params.
  • ValueError: If reuse == True and FisherBlock found but of the wrong type.

register_embedding_multi

register_embedding_multi(
    params,
    inputs,
    outputs,
    num_uses=None,
    approx=None,
    reuse=VARIABLE_SCOPE
)

Registers embedding layers with shared parameters.

Args:

  • params: Embedding matrix of shape [vocab_size, embedding_size].
  • inputs: A list of Tensors, each of shape [batch_size, input_size] and dtype int32. Indices into embedding matrix. The list indexes each use in the graph (which might correspond to a "time-step" in an RNN). OR, can be a single Tensor of shape [num_uses * batch_size, input_size], which is a reshaped version of a Tensor of shape [num_uses, batch_size, input_size].
  • outputs: A list of Tensors, each of shape [batch_size, embedding_size]. Outputs produced by layer. The list indexes each use in the graph (which might correspond to a "time-step" in an RNN). Needs to correspond with the order used in inputs. OR, can be a single Tensor, of shape [num_uses * batch_size, embedding_size], which is a reshaped version of a Tensor of shape [num_uses, batch_size, embedding_size].
  • num_uses: int or None. The number of uses/time-steps in the graph where the layer appears. Only needed if both inputs and outputs are given in the single Tensor format. (Default: None)
  • approx: str or None. If not None must be "kron_indep". The Fisher approximation to use. If None the default value is used. (Default: None)
  • reuse: bool or str. If True, this adds inputs and outputs as an additional mini-batch/tower of data to use when estimating the Fisher block for this layer (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Note that the word "use" here has a completely different meaning from "use in the graph" as it pertains to the inputs, outputs, and num_uses arguments.) (Default: "VARIABLE_SCOPE")

Raises:

  • ValueError: For improper value to approx.
  • KeyError: If reuse == True but no FisherBlock found for params.
  • ValueError: If reuse == True and FisherBlock found but of the wrong type.

register_fully_connected

register_fully_connected(
    params,
    inputs,
    outputs,
    approx=None,
    reuse=VARIABLE_SCOPE
)

Registers a fully connected layer.

Args:

  • params: Tensor or 2-tuple of Tensors corresponding to weight and bias of this layer. Weight matrix should have shape [input_size, output_size]. Bias should have shape [output_size].
  • inputs: Tensor of shape [batch_size, input_size]. Inputs to layer.
  • outputs: Tensor of shape [batch_size, output_size]. Outputs produced by layer.
  • approx: str or None. If not None must be one of "kron" or "diagonal". The Fisher approximation to use. If None the default value is used. (Default: None)
  • reuse: bool or str. If True, this adds inputs and outputs as an additional mini-batch/tower of data to use when estimating the Fisher block for this layer (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Default: "VARIABLE_SCOPE")

Raises:

  • ValueError: For improper value to approx.
  • KeyError: If reuse == True but no FisherBlock found for params.
  • ValueError: If reuse == True and FisherBlock found but of the wrong type.

register_fully_connected_multi

register_fully_connected_multi(
    params,
    inputs,
    outputs,
    num_uses=None,
    approx=None,
    reuse=VARIABLE_SCOPE
)

Register fully connected layers with shared parameters.

This can handle general fully-connected layers with shared parameters, but has specialized approximations to deal with the case where there is a meaningful linear order to the shared instances (such as in an RNN).

Args:

  • params: Tensor or 2-tuple of Tensors corresponding to weight and bias of this layer. Weight matrix should have shape [input_size, output_size]. Bias should have shape [output_size].
  • inputs: A list of Tensors, each of shape [batch_size, input_size]. Inputs to layer. The list indexes each use in the graph (which might correspond to a "time-step" in an RNN). OR, can be a single Tensor of shape [num_uses * batch_size, input_size], which is a reshaped version of a Tensor of shape [num_uses, batch_size, input_size].
  • outputs: A list of Tensors, the same length as inputs, each of shape [batch_size, output_size]. Outputs produced by layer. The list indexes each use in the graph (which might correspond to a "time-step" in an RNN). Needs to correspond with the order used in inputs. OR, can be a single Tensor of shape [num_uses * batch_size, output_size], which is a reshaped version of a Tensor of shape [num_uses, batch_size, output_size].
  • num_uses: int or None. The number of uses/time-steps in the graph where the layer appears. Only needed if both inputs and outputs are given in the single Tensor format. (Default: None)
  • approx: str or None. If not None, must be one of "kron_indep", "kron_series_1" or "kron_series_2". The Fisher approximation to use. If None the default value is used. (Default: None)
  • reuse: bool or str. If True, this adds inputs and outputs as an additional mini-batch/tower of data to use when estimating the Fisher block for this layer (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Note that the word "use" here has a completely different meaning from "use in the graph" as it pertains to the inputs, outputs, and num_uses arguments.) (Default: "VARIABLE_SCOPE")

Raises:

  • ValueError: For improper value to approx.
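
The two accepted layouts for inputs and outputs (a list with one entry per use, or a single Tensor with uses folded into the batch dimension) are related by a row-major reshape. The sketch below shows that relationship with nested Python lists standing in for Tensors; `stack_uses` and `split_uses` are hypothetical helpers, not part of the library.

```python
def stack_uses(per_use):
    """Fold a list of [batch_size, size] arrays into [num_uses * batch_size, size]."""
    flat = []
    for rows in per_use:
        flat.extend(rows)
    return flat


def split_uses(flat, num_uses):
    """Recover the per-use list; num_uses is needed because it is not stored in flat."""
    if len(flat) % num_uses != 0:
        raise ValueError("leading dimension is not divisible by num_uses")
    batch_size = len(flat) // num_uses
    return [flat[i * batch_size:(i + 1) * batch_size] for i in range(num_uses)]


# Three uses (e.g. RNN time-steps), batch_size 2, input_size 2.
inputs_list = [
    [[0.0, 0.1], [1.0, 1.1]],  # use 0
    [[2.0, 2.1], [3.0, 3.1]],  # use 1
    [[4.0, 4.1], [5.0, 5.1]],  # use 2
]
flat_inputs = stack_uses(inputs_list)

# Row `use * batch_size + example` of the flat layout holds that example.
use, example, batch_size = 2, 1, 2
row = flat_inputs[use * batch_size + example]
```

This also shows why num_uses is only required in the single Tensor format: a flat leading dimension of 6 could equally be 3 uses of batch size 2 or 2 uses of batch size 3.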

register_generic

register_generic(
    params,
    batch_size,
    approx=None,
    reuse=VARIABLE_SCOPE
)

Registers a generic layer.

Args:

  • params: Tensor or tuple of Tensors corresponding to the parameters.
  • batch_size: 0-D Tensor. Size of the minibatch (for this tower).
  • approx: str or None. If not None, must be one of "full" or "diagonal". The Fisher approximation to use. If None the default value is used. (Default: None)
  • reuse: bool or str. If True, this adds batch_size to the total mini-batch size used when estimating the Fisher block for this layer (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Default: "VARIABLE_SCOPE")

Raises:

  • ValueError: For improper value to approx.
  • KeyError: If reuse == True but no FisherBlock found for params.
  • ValueError: If reuse == True and FisherBlock found but of the wrong type.

register_loss_function

register_loss_function(
    loss,
    colocation_op,
    base_name,
    name=None,
    reuse=VARIABLE_SCOPE
)

Registers a LossFunction object.

Args:

  • loss: The LossFunction object.
  • colocation_op: The op to colocate the loss function's computations with.
  • base_name: The name to derive a new unique name from if the name argument is None.
  • name: (OPTIONAL) str or None. Unique name for this loss function. If None, a new name is generated. (Default: None)
  • reuse: (OPTIONAL) bool or str. If True, adds loss as an additional tower for the existing loss function.

Raises:

  • ValueError: If reuse == True and name == None.
  • ValueError: If reuse == True and seed != None.
  • KeyError: If reuse == True and no existing LossFunction with name found.
  • KeyError: If reuse == False and existing LossFunction with name found.

register_multi_bernoulli_predictive_distribution

register_multi_bernoulli_predictive_distribution(
    logits,
    seed=None,
    targets=None,
    name=None,
    reuse=VARIABLE_SCOPE
)

Registers a multi-Bernoulli predictive distribution.

Args:

  • logits: The logits of the distribution (i.e. its parameters).
  • seed: The seed for the RNG (for debugging) (Default: None)
  • targets: (OPTIONAL) The targets for the loss function. Only required if one wants to call total_loss() instead of total_sampled_loss(). total_loss() is required, for example, to estimate the "empirical Fisher" (instead of the true Fisher). (Default: None)
  • name: (OPTIONAL) str or None. Unique name for this loss function. If None, a new name is generated. (Default: None)
  • reuse: bool or str. If True, this adds logits as an additional mini-batch/tower of inputs to the loss-function/predictive distribution (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Default: "VARIABLE_SCOPE")

register_normal_predictive_distribution

register_normal_predictive_distribution(
    mean,
    var=0.5,
    seed=None,
    targets=None,
    name=None,
    reuse=VARIABLE_SCOPE
)

Registers a normal predictive distribution.

Args:

  • mean: The mean vector defining the distribution.
  • var: The variance (must be a scalar). Note that the default value of 0.5 corresponds to a standard squared error loss (target - prediction)^2. If your squared error loss is of the form 0.5 * (target - prediction)^2, you should use var=1.0. (Default: 0.5)
  • seed: The seed for the RNG (for debugging) (Default: None)
  • targets: (OPTIONAL) The targets for the loss function. Only required if one wants to call total_loss() instead of total_sampled_loss(). total_loss() is required, for example, to estimate the "empirical Fisher" (instead of the true Fisher). (Default: None)
  • name: (OPTIONAL) str or None. Unique name for this loss function. If None, a new name is generated. (Default: None)
  • reuse: bool or str. If True, this adds mean and var as an additional mini-batch/tower of inputs to the loss-function/predictive distribution (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Default: "VARIABLE_SCOPE")
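
The correspondence between var and the scaling of the squared error follows from the negative log-density of a normal distribution, whose data-dependent term is (target - mean)^2 / (2 * var). The short check below is a scalar sketch with hypothetical helper names and is independent of the library.

```python
import math


def normal_nll(target, mean, var):
    """Negative log-density of a scalar normal distribution."""
    return 0.5 * math.log(2.0 * math.pi * var) + (target - mean) ** 2 / (2.0 * var)


def squared_error_term(target, mean, var):
    """The part of the negative log-density that depends on the prediction."""
    return (target - mean) ** 2 / (2.0 * var)


target, prediction = 3.0, 1.0

# var=0.5 gives (target - prediction)^2, the plain squared error.
plain = squared_error_term(target, prediction, var=0.5)

# var=1.0 gives 0.5 * (target - prediction)^2.
halved = squared_error_term(target, prediction, var=1.0)
```

The remaining term, 0.5 * log(2 * pi * var), does not depend on the prediction, so it shifts the loss value without affecting its gradient with respect to the mean.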

register_separable_conv2d

register_separable_conv2d(
    depthwise_params,
    pointwise_params,
    inputs,
    depthwise_outputs,
    pointwise_outputs,
    strides,
    padding,
    rate=None,
    data_format=None,
    approx=None,
    reuse=VARIABLE_SCOPE
)

Register a call to tf.nn.separable_conv2d().

Args:

  • depthwise_params: 4-D Tensor of shape [filter_height, filter_width, in_channels, channel_multiplier]. Filter for depthwise conv2d.
  • pointwise_params: 4-D Tensor of shape [1, 1, in_channels * channel_multiplier, out_channels]. Filter for pointwise conv2d.
  • inputs: Tensor of shape [batch_size, input_height, input_width, in_channels]. Inputs to layer.
  • depthwise_outputs: Tensor of shape [batch_size, output_height, output_width, in_channels * channel_multiplier]. Output produced by depthwise conv2d.
  • pointwise_outputs: Tensor of shape [batch_size, output_height, output_width, out_channels]. Output produced by pointwise conv2d.
  • strides: List of ints of length 4. Strides for depthwise conv2d kernel in all dimensions.
  • padding: string. See tf.nn.conv2d for valid values.
  • rate: None or List of ints of length 2. Dilation rate of depthwise conv2d kernel in spatial dimensions.
  • data_format: str or None. Format of data.
  • approx: str or None. If not None must be one of "kron" or "diagonal". The Fisher approximation to use. If None the default value is used. (Default: None)
  • reuse: bool or str. If True, this adds inputs and outputs as an additional mini-batch/tower of data to use when estimating the Fisher block for this layer (which must have already been registered). If "VARIABLE_SCOPE", use tf.get_variable_scope().reuse. (Default: "VARIABLE_SCOPE")

Raises:

  • ValueError: For improper value to approx.
  • KeyError: If reuse == True but no FisherBlock found for params.
  • ValueError: If reuse == True and FisherBlock found but of the wrong type.
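
The shape constraints that tie the arguments of a separable convolution together can be checked before registration. The helper below is a hypothetical pre-flight check, not something the library provides; it takes shapes as plain tuples and assumes a channels-last layout, as in the shapes listed above.

```python
def check_separable_conv2d_shapes(depthwise_params, pointwise_params, inputs,
                                  depthwise_outputs, pointwise_outputs):
    """Validate channel bookkeeping across the two stages; returns the
    number of channels produced by the depthwise stage."""
    _, _, in_channels, channel_multiplier = depthwise_params
    point_h, point_w, point_in, out_channels = pointwise_params
    depthwise_channels = in_channels * channel_multiplier

    if (point_h, point_w) != (1, 1):
        raise ValueError("pointwise filter must be 1x1")
    if point_in != depthwise_channels:
        raise ValueError("pointwise input channels must equal "
                         "in_channels * channel_multiplier")
    if inputs[-1] != in_channels:
        raise ValueError("inputs do not have in_channels channels")
    if depthwise_outputs[-1] != depthwise_channels:
        raise ValueError("depthwise_outputs have the wrong channel count")
    if pointwise_outputs[-1] != out_channels:
        raise ValueError("pointwise_outputs have the wrong channel count")
    # A 1x1 pointwise convolution leaves batch and spatial sizes unchanged.
    if tuple(depthwise_outputs[:3]) != tuple(pointwise_outputs[:3]):
        raise ValueError("depthwise and pointwise outputs differ spatially")
    return depthwise_channels


channels = check_separable_conv2d_shapes(
    depthwise_params=(3, 3, 8, 2),        # in_channels=8, channel_multiplier=2
    pointwise_params=(1, 1, 16, 32),      # 8 * 2 = 16 in, 32 out
    inputs=(4, 28, 28, 8),
    depthwise_outputs=(4, 28, 28, 16),
    pointwise_outputs=(4, 28, 28, 32),
)
# channels == 16
```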

set_default_conv2d_approximation

set_default_conv2d_approximation(value)

set_default_embedding_approximation

set_default_embedding_approximation(value)

set_default_fully_connected_approximation

set_default_fully_connected_approximation(value)

set_default_fully_connected_multi_approximation

set_default_fully_connected_multi_approximation(value)

set_default_generic_approximation

set_default_generic_approximation(value)

total_loss

total_loss()

total_sampled_loss

total_sampled_loss()