
TensorFlow Addons Optimizers: LazyAdam


Overview

This notebook demonstrates how to use the LazyAdam optimizer from the TensorFlow Addons package.

LazyAdam

LazyAdam is a variant of the Adam optimizer that handles sparse updates more efficiently. The original Adam algorithm maintains two moving-average accumulators for each trainable variable; the accumulators are updated at every step. This class provides lazier handling of gradient updates for sparse variables. It only updates moving-average accumulators for sparse variable indices that appear in the current batch, rather than updating the accumulators for all indices. Compared with the original Adam optimizer, it can provide large improvements in model training throughput for some applications. However, it provides slightly different semantics than the original Adam algorithm, and may lead to different empirical results.
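
For intuition about where the lazy updates matter, consider a model with a tf.keras.layers.Embedding layer: its gradients are sparse (only the rows for the indices that appear in the batch are non-zero), so LazyAdam only touches the corresponding accumulator rows. The sketch below uses a hypothetical model that is not part of this tutorial; it simply shows that LazyAdam drops in wherever tf.keras.optimizers.Adam would go.

import tensorflow as tf
import tensorflow_addons as tfa

# Hypothetical model: the embedding layer produces sparse (IndexedSlices) gradients
sparse_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=50000, output_dim=16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# LazyAdam is a drop-in replacement for Adam; only the accumulator rows for the
# embedding indices seen in each batch are updated.
sparse_model.compile(
    optimizer=tfa.optimizers.LazyAdam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy'])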

Setup

# Install the TensorFlow Addons package
!pip install -q --no-deps tensorflow-addons~=0.6

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_datasets as tfds
import numpy as np
from matplotlib import pyplot as plt

# Hyperparameters
batch_size = 64
epochs = 10

Build the Model

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])
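
To confirm the architecture, you can print the layer names, output shapes, and parameter counts:

model.summary()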

Prepare the Data

# Load the MNIST dataset as NumPy arrays
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten the 28x28 images to 784-dimensional vectors and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255
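
The labels are left as integer class indices (0 through 9), which is why the model is compiled below with SparseCategoricalCrossentropy rather than a one-hot loss. A quick shape check (expected values shown in the comments):

print(x_train.shape, y_train.shape)  # (60000, 784) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 784) (10000,)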

Train and Evaluate

Simply replace a typical Keras optimizer with the new TFA optimizer:

# Compile the model
model.compile(
    optimizer=tfa.optimizers.LazyAdam(0.001),  # Use the TFA LazyAdam optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])

# Train the network
history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs)
Train on 60000 samples
Epoch 1/10
60000/60000 [==============================] - 3s 53us/sample - loss: 0.3347 - accuracy: 0.9027
Epoch 2/10
60000/60000 [==============================] - 2s 37us/sample - loss: 0.1484 - accuracy: 0.9558
Epoch 3/10
60000/60000 [==============================] - 2s 38us/sample - loss: 0.1101 - accuracy: 0.9664
Epoch 4/10
60000/60000 [==============================] - 2s 38us/sample - loss: 0.0858 - accuracy: 0.9744
Epoch 5/10
60000/60000 [==============================] - 2s 38us/sample - loss: 0.0698 - accuracy: 0.9789
Epoch 6/10
60000/60000 [==============================] - 2s 37us/sample - loss: 0.0586 - accuracy: 0.9816
Epoch 7/10
60000/60000 [==============================] - 2s 38us/sample - loss: 0.0506 - accuracy: 0.9841
Epoch 8/10
60000/60000 [==============================] - 2s 39us/sample - loss: 0.0412 - accuracy: 0.9872
Epoch 9/10
60000/60000 [==============================] - 2s 39us/sample - loss: 0.0352 - accuracy: 0.9887
Epoch 10/10
60000/60000 [==============================] - 2s 40us/sample - loss: 0.0299 - accuracy: 0.9904
# Evaluate the network
print('Evaluate on test data:')
results = model.evaluate(x_test, y_test, batch_size=128, verbose=2)
print('Test loss = {0}, Test acc: {1}'.format(results[0], results[1]))
Evaluate on test data:
10000/1 - 0s - loss: 0.0506 - accuracy: 0.9729
Test loss = 0.10084548294534906, Test acc: 0.9728999733924866