Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning even after many updates have been made. In the original version of Adadelta you do not have to set an initial learning rate; in this Keras version, an initial learning rate and decay factor can be set, as in most other Keras optimizers.
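Concretely, the "moving window" is an exponentially decaying average of squared gradients, paired with a matching average of squared updates. As a sketch following the notation of the paper referenced below (with decay factor `rho` and fuzz factor `epsilon`), each step computes:

```latex
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2 \\
\Delta x_t &= -\frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\, g_t \\
E[\Delta x^2]_t &= \rho\, E[\Delta x^2]_{t-1} + (1-\rho)\, (\Delta x_t)^2 \\
x_{t+1} &= x_t + \Delta x_t
\end{aligned}
```

The Keras version additionally scales the step by `lr` before applying it, which is why an initial learning rate can be set here at all.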
It is recommended to leave the parameters of this optimizer at their default values.
- `lr`: float >= 0. Initial learning rate, defaults to 1. It is recommended to leave it at the default value.
- `rho`: float >= 0. Adadelta decay factor, corresponding to the fraction of gradient to keep at each time step.
- `epsilon`: float >= 0. Fuzz factor. If `None`, defaults to `K.epsilon()`.
- `decay`: float >= 0. Initial learning rate decay.
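A minimal usage sketch (the model architecture and input shape below are illustrative assumptions, not part of the optimizer API):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adadelta

# Illustrative model: a single dense layer on 784-dimensional inputs.
model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])

# The default, recommended parameters, spelled out explicitly.
opt = Adadelta(lr=1.0, rho=0.95, epsilon=None, decay=0.0)
model.compile(optimizer=opt, loss='categorical_crossentropy')
```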
- [Adadelta - an adaptive learning rate method](http://arxiv.org/abs/1212.5701)
`__init__(lr=1.0, rho=0.95, epsilon=None, decay=0.0, **kwargs)`
`from_config(cls, config)`
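No description survives here for `from_config`; as a sketch, it is paired with the base optimizer's `get_config` and rebuilds an optimizer from its configuration dictionary:

```python
from keras.optimizers import Adadelta

opt = Adadelta(rho=0.9)
config = opt.get_config()             # dict of constructor arguments
clone = Adadelta.from_config(config)  # an equivalent, freshly built optimizer
```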
`get_gradients(loss, params)`

Returns gradients of `loss` with respect to `params`.

Arguments:

- `loss`: Loss tensor.
- `params`: List of variables.

Returns: List of gradient tensors.

Raises:

- `ValueError`: In case any gradient cannot be computed (e.g. if the gradient function is not implemented).
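A short symbolic sketch of calling `get_gradients`; the variable and loss below are illustrative assumptions:

```python
import keras.backend as K
from keras.optimizers import Adadelta

# Illustrative setup: a backend variable and a scalar loss defined on it.
w = K.variable([1.0, 2.0])
loss = K.sum(K.square(w))

opt = Adadelta()
grads = opt.get_gradients(loss, [w])  # one gradient tensor per entry in params
```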
`get_updates(loss, params)`

`get_weights()`

Returns the current value of the weights of the optimizer.

Returns: A list of numpy arrays.

`set_weights(weights)`

Sets the weights of the optimizer, from Numpy arrays. Should only be called after computing the gradients (otherwise the optimizer has no weights).

Arguments:

- `weights`: a list of Numpy arrays. The number of arrays and their shapes must match the weights of the optimizer (i.e. it should match the output of `get_weights`).

Raises:

- `ValueError`: in case of incompatible weight shapes.
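A sketch of saving and restoring optimizer state with `get_weights`/`set_weights`, assuming the optimizer has already been used for at least one update so that its weights exist:

```python
from keras.optimizers import Adadelta

opt = Adadelta()
# ... train a model compiled with `opt` so the optimizer builds its weights ...

state = opt.get_weights()  # list of numpy arrays holding the optimizer's state
opt.set_weights(state)     # arrays must match in number and shape,
                           # otherwise a ValueError is raised
```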