Applies the L-BFGS algorithm to minimize a differentiable function.
tfp.substrates.jax.optimizer.lbfgs_minimize(
    value_and_gradients_function,
    initial_position,
    previous_optimizer_results=None,
    num_correction_pairs=10,
    tolerance=1e-08,
    x_tolerance=0,
    f_relative_tolerance=0,
    initial_inverse_hessian_estimate=None,
    max_iterations=50,
    parallel_iterations=1,
    stopping_condition=None,
    max_line_search_iterations=50,
    f_absolute_tolerance=0,
    name=None
)
Performs unconstrained minimization of a differentiable function using the
L-BFGS scheme. See [Nocedal and Wright(2006)][1] for details of the algorithm.
Usage:
The following example demonstrates the L-BFGS optimizer attempting to find the
minimum of a simple high-dimensional quadratic objective function.

# A high-dimensional quadratic bowl.
ndims = 60
minimum = np.ones([ndims], dtype='float64')
scales = np.arange(ndims, dtype='float64') + 1.0

# The objective function and the gradient.
def quadratic_loss_and_gradient(x):
  return tfp.math.value_and_gradient(
      lambda x: tf.reduce_sum(
          scales * tf.math.squared_difference(x, minimum), axis=-1),
      x)

start = np.arange(ndims, 0, -1, dtype='float64')
optim_results = tfp.optimizer.lbfgs_minimize(
    quadratic_loss_and_gradient,
    initial_position=start,
    num_correction_pairs=10,
    tolerance=1e-8)

# Check that the search converged.
assert optim_results.converged
# Check that the argmin is close to the actual value.
np.testing.assert_allclose(optim_results.position, minimum)
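Because this is the JAX substrate, the required callable can also be built
directly with jax.value_and_grad, which returns exactly the (value, gradient)
pair that value_and_gradients_function must produce. A minimal sketch of the
same quadratic bowl (the dimension and array sizes here are illustrative):

```python
import jax
import jax.numpy as jnp

ndims = 5
minimum = jnp.ones([ndims])
scales = jnp.arange(ndims, dtype=jnp.float32) + 1.0

def loss(x):
  # The same quadratic bowl as above, written with jax.numpy.
  return jnp.sum(scales * (x - minimum) ** 2, axis=-1)

# jax.value_and_grad yields a callable returning (value, gradient),
# the contract expected of value_and_gradients_function.
quadratic_loss_and_gradient = jax.value_and_grad(loss)

value, grad = quadratic_loss_and_gradient(jnp.zeros([ndims]))
```

The resulting callable can then be passed as value_and_gradients_function to
tfp.substrates.jax.optimizer.lbfgs_minimize just as in the example above.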
References:
[1] Jorge Nocedal, Stephen Wright. Numerical Optimization. Springer Series
in Operations Research. pp 176-180. 2006
http://pages.mtu.edu/~struther/Courses/OLD/Sp2013/5630/Jorge_Nocedal_Numerical_optimization_267490.pdf
Args:

value_and_gradients_function: A Python callable that accepts a point as a
  real Tensor and returns a tuple of Tensors of real dtype containing the
  value of the function and its gradient at that point. The function to be
  minimized. The input is of shape [..., n], where n is the size of the
  domain of input points, and all others are batching dimensions. The first
  component of the return value is a real Tensor of matching shape [...].
  The second component (the gradient) is also of shape [..., n], like the
  input value to the function.
initial_position: Real Tensor of shape [..., n]. The starting point, or
  points when using batching dimensions, of the search procedure. At these
  points the function value and the gradient norm should be finite. Exactly
  one of initial_position and previous_optimizer_results can be non-None.
previous_optimizer_results: An LBfgsOptimizerResults namedtuple to
  initialize the optimizer state from, instead of an initial_position. This
  can be passed in from a previous return value to resume optimization with
  a different stopping_condition. Exactly one of initial_position and
  previous_optimizer_results can be non-None.
num_correction_pairs: Positive integer. Specifies the maximum number of
  (position_delta, gradient_delta) correction pairs to keep as an implicit
  approximation of the Hessian matrix.
tolerance: Scalar Tensor of real dtype. Specifies the gradient tolerance
  for the procedure. If the supremum norm of the gradient vector is below
  this number, the algorithm is stopped.
x_tolerance: Scalar Tensor of real dtype. If the absolute change in the
  position between one iteration and the next is smaller than this number,
  the algorithm is stopped.
f_relative_tolerance: Scalar Tensor of real dtype. If the relative change
  in the objective value between one iteration and the next is smaller than
  this value, the algorithm is stopped.
initial_inverse_hessian_estimate: None. Option currently not supported.
max_iterations: Scalar positive int32 Tensor. The maximum number of
  iterations for L-BFGS updates.
parallel_iterations: Positive integer. The number of iterations allowed to
  run in parallel.
stopping_condition: (Optional) A Python function that takes as input two
  Boolean tensors of shape [...] and returns a Boolean scalar tensor. The
  input tensors are converged and failed, indicating the current status of
  each respective batch member; the return value states whether the
  algorithm should stop. The default is tfp.optimizer.converged_all, which
  only stops when all batch members have either converged or failed. An
  alternative is tfp.optimizer.converged_any, which stops as soon as one
  batch member has converged, or when all have failed.
max_line_search_iterations: Python int. The maximum number of iterations
  for the hager_zhang line search algorithm.
f_absolute_tolerance: Scalar Tensor of real dtype. If the absolute change
  in the objective value between one iteration and the next is smaller than
  this value, the algorithm is stopped.
name: (Optional) Python str. The name prefixed to the ops created by this
  function. If not supplied, the default name 'minimize' is used.
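The two built-in stopping conditions described above differ only in how they
combine the per-batch-member converged and failed flags. A plain-NumPy sketch
of their semantics (the real implementations operate on Tensors):

```python
import numpy as np

def converged_all(converged, failed):
  # Default: stop only once every batch member has converged or failed.
  return np.all(converged | failed)

def converged_any(converged, failed):
  # Alternative: stop as soon as any member converges, or all have failed.
  return np.any(converged) | np.all(failed)

converged = np.array([False, True, False])
failed = np.array([True, False, False])

converged_all(converged, failed)  # third member still running -> False
converged_any(converged, failed)  # second member converged -> True
```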
Returns:

optimizer_results: A namedtuple containing the following items:
  converged: Scalar boolean tensor indicating whether the minimum was found
    within tolerance.
  failed: Scalar boolean tensor indicating whether a line search step failed
    to find a suitable step size satisfying Wolfe conditions. In the absence
    of any constraints on the number of objective evaluations permitted,
    this value will be the complement of converged. However, if there is a
    constraint and the search stopped due to available evaluations being
    exhausted, both failed and converged will be simultaneously False.
  num_objective_evaluations: The total number of objective evaluations
    performed.
  position: A tensor containing the last argument value found during the
    search. If the search converged, then this value is the argmin of the
    objective function.
  objective_value: A tensor containing the value of the objective function
    at the position. If the search converged, then this is the (local)
    minimum of the objective function.
  objective_gradient: A tensor containing the gradient of the objective
    function at the position. If the search converged, the max-norm of this
    tensor should be below the tolerance.
  position_deltas: A tensor encoding information about the latest changes in
    position during the algorithm execution.
  gradient_deltas: A tensor encoding information about the latest changes in
    objective_gradient during the algorithm execution.
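The position_deltas and gradient_deltas fields hold the correction pairs that
implicitly represent the inverse Hessian approximation. For illustration, a
NumPy sketch of the classic two-loop recursion ([1], Algorithm 7.4) that
turns such pairs into a search direction; the function name and list layout
here are illustrative, not part of the API:

```python
import numpy as np

def lbfgs_direction(grad, position_deltas, gradient_deltas):
  # Two-loop recursion: computes -H_k @ grad, where H_k is the inverse
  # Hessian approximation implied by the stored correction pairs
  # (oldest first in each list).
  q = np.array(grad, dtype=float)
  rhos = [1.0 / np.dot(y, s)
          for s, y in zip(position_deltas, gradient_deltas)]
  alphas = []
  # First loop: newest pair to oldest.
  for s, y, rho in reversed(list(zip(position_deltas, gradient_deltas, rhos))):
    a = rho * np.dot(s, q)
    alphas.append(a)
    q = q - a * y
  # Scale by gamma = s'y / y'y from the most recent pair (initial H_k^0).
  s, y = position_deltas[-1], gradient_deltas[-1]
  q = q * (np.dot(s, y) / np.dot(y, y))
  # Second loop: oldest pair to newest, reusing the stored alphas.
  for (s, y, rho), a in zip(zip(position_deltas, gradient_deltas, rhos),
                            reversed(alphas)):
    b = rho * np.dot(y, q)
    q = q + (a - b) * s
  return -q
```

With a single pair in which gradient_delta equals position_delta (i.e. the
implied Hessian is the identity), the recursion reduces to the
steepest-descent direction -grad.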