Simple convergence criterion based on lack of decrease in loss values.

Inherits From: `ConvergenceCriterion`

```
tfp.optimizer.convergence_criteria.LossNotDecreasing(
    atol=None, rtol=None, window_size=10, min_num_steps=20, name=None
)
```

This rule tracks an exponentially-weighted moving average of the decrease in loss values between successive steps, and stops when that average drops below a threshold.

```
decrease_in_loss[t] = loss[t-1] - loss[t]
average_decrease_in_loss[t] = (
    (window_size - 1) * average_decrease_in_loss[t - 1] +
    decrease_in_loss[t]) / window_size
has_converged = (average_decrease_in_loss < threshold)
```
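Written out as standalone Python, the recurrence above behaves as follows (a minimal sketch, not the library's implementation; how the average is seeded at the first step is an assumption of this sketch):

```python
def average_decreases(losses, window_size=10):
    """Exponentially-weighted moving average of per-step decreases in loss."""
    avg = None
    history = []
    for t in range(1, len(losses)):
        decrease = losses[t - 1] - losses[t]
        if avg is None:
            avg = decrease  # Seeding choice is an assumption of this sketch.
        else:
            avg = ((window_size - 1) * avg + decrease) / window_size
        history.append(avg)
    return history

# A loss curve that falls quickly, then flattens out:
losses = [10.0 / (1 + t) for t in range(50)]
avgs = average_decreases(losses)
```

As the loss plateaus, the average decrease decays toward zero at a rate governed by `decay = 1. - 1. / window_size`, so larger windows react more slowly to a flattening loss curve.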

The convergence threshold can be set directly as `atol`, or as a fraction
of the average loss decrease across the first `window_size` steps
of the optimization:
`threshold = rtol * average_decrease_in_loss[window_size]`.
If both `atol` and `rtol` are specified,
the maximum of the two thresholds is used (equivalently, the optimization
stops if either of the two conditions is met).

The state propagated across training steps is `state[t] = LossNotDecreasingState(loss[t], average_decrease_in_loss[t], average_decrease_in_loss[window_size])`.
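The stopping rule induced by `atol` and `rtol` can be sketched in plain Python. The `LossNotDecreasingState` below is a hypothetical stand-in mirroring the fields named above, not the library's actual class:

```python
import collections

# Hypothetical stand-in for the state tuple described above.
LossNotDecreasingState = collections.namedtuple(
    "LossNotDecreasingState",
    ["loss", "average_decrease", "initial_average_decrease"])

def check_converged(state, atol=None, rtol=None):
    """Converged when average_decrease < max(atol, rtol * initial average)."""
    thresholds = []
    if atol is not None:
        thresholds.append(atol)
    if rtol is not None:
        thresholds.append(rtol * state.initial_average_decrease)
    return state.average_decrease < max(thresholds)

state = LossNotDecreasingState(
    loss=2.0, average_decrease=1e-4, initial_average_decrease=0.5)
```

Taking the maximum of the two thresholds means either condition alone is enough to stop: with `atol=1e-6` and `rtol=0.01`, the effective threshold is `max(1e-6, 0.01 * 0.5) = 0.005`, which an average decrease of `1e-4` falls below.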

#### Args:

* `atol`: float `Tensor` absolute tolerance. Convergence is assumed whenever (an exponentially-weighted moving average of) the decrease in loss values from one step to the next is less than `atol`. If both `atol` and `rtol` are specified, then convergence is assumed if *either* of the criteria is met.
* `rtol`: float `Tensor` relative tolerance. Convergence is assumed whenever (an exponentially-weighted moving average of) the decrease in loss values from one step to the next is less than `rtol * average_initial_decrease_in_loss`, where `average_initial_decrease_in_loss` is the exponentially-weighted moving average of the decrease in loss over the first `window_size` steps of the optimization. If both `atol` and `rtol` are specified, then convergence is assumed if *either* of the criteria is met.
* `window_size`: int `Tensor` effective window size for the moving average decrease in loss. The moving average is computed as `moving_average[t] = decrease_in_loss[t] + decay * (moving_average[t-1] - decrease_in_loss[t])` where `decay = 1. - 1. / window_size`. Default value: `10`.
* `min_num_steps`: int `Tensor` minimum number of steps before convergence. The criterion will not return `has_converged=True` until `step >= min_num_steps`. This should generally be a larger value than `window_size`. Default value: `20`.
* `name`: optional Python `str` name prefixed to ops created by this class.

#### Attributes:

`atol`

`dtype`

`min_num_steps`

`name`

`rtol`

`window_size`

## Methods

`bootstrap`

```
bootstrap(
    loss, grads, parameters
)
```

Returns a structure of `Tensor`s for the rule's state at step 0.

The shape of the `Tensor`s specifying `loss`, `grads`, and `parameters` may
optionally be prefixed by one or more batch dimension(s).

#### Args:

* `loss`: float `Tensor` initial value of loss being optimized.
* `grads`: list of float `Tensor` gradients of `loss` wrt `parameters`.
* `parameters`: list of float `Tensor` initial values of parameters being optimized.

#### Returns:

* `initial_auxiliary_state`: (structure of) `Tensor`(s) representing the initial auxiliary state carried forward by this criterion.

`one_step`

```
one_step(
    step, loss, grads, parameters, auxiliary_state
)
```

Updates tracked quantities for a new step, and determines if converged.

The shape of the `Tensor`s specifying `loss`, `grads`, and `parameters` may
optionally be prefixed by one or more batch dimension(s). In this case,
the returned value `has_converged` will have shape equal to the broadcast
batch shape of whichever of those quantities is used by this convergence
criterion, and the quantities defining the convergence criterion
(`min_num_steps`, etc.).

#### Args:

* `step`: integer `Tensor` index of the current step, where `step >= 1` (on step `0`, `initial_state` should be called instead).
* `loss`: float `Tensor` value of loss at the current step.
* `grads`: list of float `Tensor` gradients of `loss` wrt `parameters`.
* `parameters`: list of float `Tensor` current values of parameters being optimized.
* `auxiliary_state`: the (structure of) `Tensor`(s) containing state carried forward from the previous step.

#### Returns:

* `has_converged`: boolean `Tensor` indicating whether the optimization has converged.
* `updated_auxiliary_state`: (structure of) `Tensor`(s) representing updated quantities tracked by the convergence criterion. This should match the structure of the value returned by `bootstrap`.
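The `bootstrap`/`one_step` contract can be exercised with a pure-Python stand-in that mirrors this criterion's logic. The class and hand-written driver loop below are illustrative sketches, not TFP code; in practice a criterion instance is typically handed to an optimizer loop rather than driven manually:

```python
class SimpleLossNotDecreasing:
    """Pure-Python stand-in with the same bootstrap/one_step contract."""

    def __init__(self, atol, window_size=10, min_num_steps=20):
        self.atol = atol
        self.window_size = window_size
        self.min_num_steps = min_num_steps

    def bootstrap(self, loss, grads, parameters):
        # Auxiliary state: (previous loss, running average decrease).
        # grads and parameters are unused by this loss-only criterion.
        return (loss, 0.0)

    def one_step(self, step, loss, grads, parameters, auxiliary_state):
        prev_loss, avg = auxiliary_state
        decrease = prev_loss - loss
        if step == 1:
            avg = decrease  # Seeding choice is an assumption of this sketch.
        else:
            avg = ((self.window_size - 1) * avg + decrease) / self.window_size
        has_converged = step >= self.min_num_steps and avg < self.atol
        return has_converged, (loss, avg)

# A loss that plateaus at zero after ten steps:
losses = [max(1.0 - 0.1 * t, 0.0) for t in range(60)]
criterion = SimpleLossNotDecreasing(atol=1e-3, window_size=5, min_num_steps=20)
state = criterion.bootstrap(losses[0], grads=None, parameters=None)
stop_step = None
for step in range(1, len(losses)):
    converged, state = criterion.one_step(
        step, losses[step], grads=None, parameters=None, auxiliary_state=state)
    if converged:
        stop_step = step
        break
```

Even though the loss stops decreasing at step 10, the moving average must first decay below `atol` and `min_num_steps` must elapse, so the loop runs well past the plateau before stopping.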