# tfm.optimization.CosineDecayWithOffset

A LearningRateSchedule that uses a cosine decay with optional warmup.

Inherits From: `base_lr_class`

See Loshchilov & Hutter, ICLR 2017, SGDR: Stochastic Gradient Descent with Warm Restarts.

For the idea of a linear warmup of the learning rate, see Goyal et al., Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour.

When we begin training a model, we often want an initial increase in our learning rate followed by a decay. If `warmup_target` is set (that is, not `None`), this schedule increases the learning rate linearly at each optimizer step, from `initial_learning_rate` to `warmup_target`, over `warmup_steps` steps. Afterwards, it applies a cosine decay that takes the learning rate from `warmup_target` down to `alpha * warmup_target` over `decay_steps` steps. If `warmup_target` is `None`, warmup is skipped and the decay takes the learning rate from `initial_learning_rate` to `alpha * initial_learning_rate`. The schedule requires a `step` value to compute the learning rate; you can pass a TensorFlow variable that you increment at each training step.
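As a compact summary, the whole schedule can be written piecewise. This is a sketch that assumes, as in the Keras `CosineDecay` implementation this class builds on, that the decay phase counts steps from the end of warmup. The symbols are introduced only for this summary: $\eta_0$ is `initial_learning_rate`, $\eta_{\text{target}}$ is `warmup_target`, $W$ is `warmup_steps`, $D$ is `decay_steps`, and $t$ is the optimizer step.

$$
\eta(t) =
\begin{cases}
\eta_0 + \dfrac{t}{W}\left(\eta_{\text{target}} - \eta_0\right), & 0 \le t < W, \\[6pt]
\eta_{\text{target}}\left[(1 - \alpha)\,\dfrac{1}{2}\left(1 + \cos\dfrac{\pi \min(t - W,\ D)}{D}\right) + \alpha\right], & t \ge W.
\end{cases}
$$

Without warmup, take $W = 0$ and $\eta_{\text{target}} = \eta_0$. The rate peaks at $\eta(W) = \eta_{\text{target}}$ and settles at $\alpha\,\eta_{\text{target}}$ once $t \ge W + D$.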

The schedule is a 1-arg callable that, when passed the current optimizer step, returns the learning rate for that step: a warmup value while warmup is in progress, and a decayed value afterwards. This is useful for changing the learning rate across different invocations of optimizer functions.

Our warmup is computed as:

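```python
def warmup_learning_rate(step):
    # Fraction of the warmup period completed so far.
    completed_fraction = step / warmup_steps
    total_delta = warmup_target - initial_learning_rate
    # Start from initial_learning_rate and move linearly toward warmup_target.
    return initial_learning_rate + completed_fraction * total_delta
```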

And our decay is computed as:

```python
from math import cos, pi

if warmup_target is None:
    initial_decay_lr = initial_learning_rate
else:
    initial_decay_lr = warmup_target

def decayed_learning_rate(step):
    # `step` counts from the end of warmup (if warmup is used).
    step = min(step, decay_steps)
    cosine_decay = 0.5 * (1 + cos(pi * step / decay_steps))
    decayed = (1 - alpha) * cosine_decay + alpha
    return initial_decay_lr * decayed
```
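The warmup and decay snippets can be combined into one runnable, pure-Python sketch of the whole schedule as a 1-arg callable. It does not use TensorFlow, and the class name `CosineDecaySketch` is hypothetical. It assumes, as the Keras `CosineDecay` implementation does, that after warmup the decay is evaluated at `step - warmup_steps`.

```python
import math


class CosineDecaySketch:
    """Hypothetical pure-Python cosine decay with optional linear warmup."""

    def __init__(self, initial_learning_rate, decay_steps, alpha=0.0,
                 warmup_target=None, warmup_steps=0):
        self.initial_learning_rate = initial_learning_rate
        self.decay_steps = decay_steps
        self.alpha = alpha
        self.warmup_target = warmup_target
        self.warmup_steps = warmup_steps

    def _decay(self, step, initial_decay_lr):
        # Clamp so the rate stays at its floor once decay_steps have elapsed.
        step = min(step, self.decay_steps)
        cosine_decay = 0.5 * (1 + math.cos(math.pi * step / self.decay_steps))
        decayed = (1 - self.alpha) * cosine_decay + self.alpha
        return initial_decay_lr * decayed

    def __call__(self, step):
        if self.warmup_target is None:
            # No warmup: decay directly from initial_learning_rate.
            return self._decay(step, self.initial_learning_rate)
        if step < self.warmup_steps:
            # Linear warmup from initial_learning_rate to warmup_target.
            fraction = step / self.warmup_steps
            delta = self.warmup_target - self.initial_learning_rate
            return self.initial_learning_rate + fraction * delta
        # Assumption (as in Keras CosineDecay): decay counts steps from the end of warmup.
        return self._decay(step - self.warmup_steps, self.warmup_target)


schedule = CosineDecaySketch(0.0, decay_steps=1000, alpha=0.1,
                             warmup_target=0.1, warmup_steps=1000)
for step in (0, 500, 1000, 1500, 2000, 3000):
    print(step, round(schedule(step), 4))
```

With these settings, the rate climbs linearly to 0.1 at step 1000, falls to 0.055 halfway through the decay (step 1500), and stays at `alpha * warmup_target = 0.01` from step 2000 on.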

Example usage without warmup:

```python
import tensorflow as tf

decay_steps = 1000
initial_learning_rate = 0.1
lr_decayed_fn = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate, decay_steps)
```

Example usage with warmup:

```python
import tensorflow as tf

decay_steps = 1000
# Use a float: the schedule's dtype follows initial_learning_rate.
initial_learning_rate = 0.0
warmup_steps = 1000
target_learning_rate = 0.1
lr_warmup_decayed_fn = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate, decay_steps, warmup_target=target_learning_rate,
    warmup_steps=warmup_steps
)
```

You can pass this schedule directly into a `tf.keras.optimizers.Optimizer` as the learning rate. The learning rate schedule is also serializable and deserializable using `tf.keras.optimizers.schedules.serialize` and `tf.keras.optimizers.schedules.deserialize`.
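To illustrate what passing the schedule as the learning rate amounts to, here is a toy gradient-descent loop in plain Python that looks up the learning rate from a 1-arg schedule at every step, much as an optimizer evaluates its schedule at its current iteration count. The helpers `cosine_schedule` and `minimize_quadratic` are hypothetical and are not part of TensorFlow or Keras; the schedule here has no warmup.

```python
import math


def cosine_schedule(initial_learning_rate, decay_steps, alpha=0.0):
    """Hypothetical helper: returns a 1-arg callable step -> learning rate."""
    def lr(step):
        step = min(step, decay_steps)
        cosine_decay = 0.5 * (1 + math.cos(math.pi * step / decay_steps))
        return initial_learning_rate * ((1 - alpha) * cosine_decay + alpha)
    return lr


def minimize_quadratic(schedule, x0=5.0, num_steps=200):
    """Toy gradient descent on f(x) = x**2 that queries the schedule each step."""
    x = x0
    for step in range(num_steps):
        grad = 2.0 * x              # derivative of x**2
        x -= schedule(step) * grad  # learning rate looked up for this step
    return x


x_final = minimize_quadratic(cosine_schedule(0.1, decay_steps=200))
print(x_final)
```

The optimizer code never changes; only the callable decides how large each step is, which is why the same schedule object can be reused across optimizers.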

Returns

A 1-arg callable learning rate schedule that takes the current optimizer step and outputs the decayed learning rate, a scalar `Tensor` of the same type as `initial_learning_rate`.
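This page does not spell out what the offset in `CosineDecayWithOffset` does. A plausible reading, and an assumption here, is that the class wraps the cosine schedule and shifts the step by a constant `offset` before delegating, so that `schedule(step) == base(step - offset)`. The plain-Python sketch below illustrates that behavior; `BaseCosineDecay` and `CosineDecayWithOffsetSketch` are hypothetical names, not the library's classes.

```python
import math


class BaseCosineDecay:
    """Minimal cosine schedule without warmup, standing in for the base class."""

    def __init__(self, initial_learning_rate, decay_steps, alpha=0.0):
        self.initial_learning_rate = initial_learning_rate
        self.decay_steps = decay_steps
        self.alpha = alpha

    def __call__(self, step):
        step = min(step, self.decay_steps)
        cosine_decay = 0.5 * (1 + math.cos(math.pi * step / self.decay_steps))
        return self.initial_learning_rate * ((1 - self.alpha) * cosine_decay + self.alpha)


class CosineDecayWithOffsetSketch(BaseCosineDecay):
    """Hypothetical offset wrapper (assumed): schedule(step) == base(step - offset)."""

    def __init__(self, offset=0, **kwargs):
        super().__init__(**kwargs)
        self.offset = offset

    def __call__(self, step):
        # Shift the step, then reuse the base schedule unchanged.
        # Steps before `offset` give a negative shifted step; not special-cased here.
        return super().__call__(step - self.offset)


base = BaseCosineDecay(initial_learning_rate=0.1, decay_steps=1000)
shifted = CosineDecayWithOffsetSketch(offset=500, initial_learning_rate=0.1,
                                      decay_steps=1000)
print(base(0), shifted(500))
```

Under this assumption, with `offset=500` the shifted schedule reaches the base schedule's step-0 value at step 500, which would let a cosine phase begin partway through training.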

## Child Classes

`class base_lr_class`

## Methods

### `from_config`

Instantiates a `LearningRateSchedule` from its config.

Args

- `config`: Output of `get_config()`.

Returns
A `LearningRateSchedule` instance.
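As an illustration of the `get_config()` / `from_config()` contract, the plain-Python sketch below stores a schedule's constructor arguments in a dict, round-trips it through JSON, and rebuilds an equivalent schedule. It assumes, as in the Keras base class, that `from_config` passes the config entries back to the constructor as keyword arguments; the class `SerializableCosineDecay` is hypothetical.

```python
import json
import math


class SerializableCosineDecay:
    """Hypothetical cosine schedule (no warmup) with a config round trip."""

    def __init__(self, initial_learning_rate, decay_steps, alpha=0.0, name=None):
        self.initial_learning_rate = initial_learning_rate
        self.decay_steps = decay_steps
        self.alpha = alpha
        self.name = name

    def __call__(self, step):
        step = min(step, self.decay_steps)
        cosine_decay = 0.5 * (1 + math.cos(math.pi * step / self.decay_steps))
        return self.initial_learning_rate * ((1 - self.alpha) * cosine_decay + self.alpha)

    def get_config(self):
        # Only plain values, so the config can be stored as JSON.
        return {
            "initial_learning_rate": self.initial_learning_rate,
            "decay_steps": self.decay_steps,
            "alpha": self.alpha,
            "name": self.name,
        }

    @classmethod
    def from_config(cls, config):
        # Rebuild the schedule from the output of get_config().
        return cls(**config)


original = SerializableCosineDecay(0.1, decay_steps=1000, alpha=0.1)
stored = json.dumps(original.get_config())
restored = SerializableCosineDecay.from_config(json.loads(stored))
print(restored(250) == original(250))
```

Keeping the config to plain numbers and strings is what makes it safe to write to disk and read back without the original Python object.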

### `__call__`

Returns the learning rate for the given optimizer `step`.

Last updated 2024-02-02 UTC.