Model Averaging


Overview

This notebook demonstrates how to use the Moving Average optimizer along with the Model Average Checkpoint callback from the TensorFlow Addons package.

Moving Averaging

The advantage of moving averaging is that the averaged weights are less prone to rampant loss shifts or irregular data representation in the latest batch. It gives a smoothed and more general picture of the model's training up to a given point.
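
To make the smoothing effect concrete, here is a minimal pure-Python sketch of an exponential moving average over a single weight value. The function name `ema_update`, the decay of 0.9, and the toy weight trajectory are assumptions chosen for illustration; the optimizer in TensorFlow Addons tracks averages of the model variables internally, and its exact update rule may differ.

```python
def ema_update(average, value, decay=0.9):
    """Return the exponential moving average after seeing one new value."""
    return decay * average + (1.0 - decay) * value

# Hypothetical weight trajectory: the last step jumps sharply,
# as it might after an unrepresentative batch.
weights = [1.0, 1.1, 0.9, 1.0, 3.0]

average = weights[0]
history = [average]
for w in weights[1:]:
    average = ema_update(average, w)
    history.append(average)

# Compare how far the raw weight and the averaged weight move on the last step.
last_raw_jump = weights[-1] - weights[-2]
last_avg_jump = history[-1] - history[-2]
print("raw jump:", last_raw_jump, "averaged jump:", last_avg_jump)
```

With a decay close to 1, a single outlier step shifts the average by only a small fraction of the raw change, which is the behavior described above.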

Stochastic Weight Averaging

Stochastic Weight Averaging (SWA) converges to wider optima. By doing so, it resembles geometric ensembling. SWA is a simple method to improve model performance: it wraps another optimizer and averages the weights taken from different points along the inner optimizer's trajectory.
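
The averaging step can be sketched in a few lines of pure Python as an equal-weight running mean over weights sampled along the trajectory. The helper `swa_update`, the toy trajectory, and the values of `start_averaging` and `average_period` below are illustrative assumptions, not the library's implementation.

```python
def swa_update(average, value, num_averaged):
    """Fold one more sampled weight into an equal-weight running mean."""
    return (average * num_averaged + value) / (num_averaged + 1)

# Hypothetical weight values visited by the inner optimizer, one per step.
trajectory = [0.8, 1.2, 0.9, 1.1, 1.0]
start_averaging = 1  # assumption: skip the first step
average_period = 2   # assumption: sample every second step after that

average, num_averaged = 0.0, 0
for step, w in enumerate(trajectory):
    if step >= start_averaging and (step - start_averaging) % average_period == 0:
        average = swa_update(average, w, num_averaged)
        num_averaged += 1

print("averaged over", num_averaged, "samples:", average)
```

Unlike an exponential moving average, every sampled point contributes equally here, so early and late samples carry the same weight in the final average.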

Model Average Checkpoint

callbacks.ModelCheckpoint doesn't give you the option to save moving average weights in the middle of training, which is why the model averaging optimizers require a custom callback. Using the update_weights parameter, AverageModelCheckpoint allows you to:

  1. Assign the moving average weights to the model, and save them (update_weights=True).
  2. Keep the old non-averaged weights in the model, while the saved model uses the average weights (update_weights=False).
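
The difference between the two options can be modeled with a small pure-Python stand-in. The function `save_with_average` and the toy weight lists are hypothetical and only mirror the behavior described in the list above; they do not call TensorFlow Addons.

```python
def save_with_average(model_weights, average_weights, update_weights):
    """Return (saved_weights, model_weights_after_save) for one checkpoint."""
    saved = list(average_weights)  # the checkpoint holds the averaged weights
    if update_weights:
        model_after = list(average_weights)  # model continues from the averages
    else:
        model_after = list(model_weights)    # model keeps its non-averaged weights
    return saved, model_after

raw, avg = [0.5, 2.0], [0.4, 1.8]
saved_true, model_true = save_with_average(raw, avg, update_weights=True)
saved_false, model_false = save_with_average(raw, avg, update_weights=False)
print(saved_true, model_true)
print(saved_false, model_false)
```

In both cases the saved weights are the averages; the parameter only decides which weights the model carries forward into the next training step.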

Setup

pip install -U tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa
import numpy as np
import os

Build Model

def create_model(opt):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),                         
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=opt,
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])

    return model

Prepare Dataset

#Load Fashion MNIST dataset
train, test = tf.keras.datasets.fashion_mnist.load_data()

#Scale training pixels to [0, 1] and cast labels to int32
images, labels = train
images = images/255.0
labels = labels.astype(np.int32)

#Shuffle and batch the training set
fmnist_train_ds = tf.data.Dataset.from_tensor_slices((images, labels))
fmnist_train_ds = fmnist_train_ds.shuffle(5000).batch(32)

#Test split, used as loaded (no scaling is applied here)
test_images, test_labels = test
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 [==============================] - 0s 0us/step

We will be comparing three optimizers here:

  • Unwrapped SGD
  • SGD with Moving Average
  • SGD with Stochastic Weight Averaging

We will see how each one performs with the same model.

#Optimizers: plain SGD, plus the same SGD wrapped by each averaging optimizer
sgd = tf.keras.optimizers.SGD(0.01)
moving_avg_sgd = tfa.optimizers.MovingAverage(sgd)
stocastic_avg_sgd = tfa.optimizers.SWA(sgd)

Both the MovingAverage and SWA optimizers use the AverageModelCheckpoint callback.

#Callback 
checkpoint_path = "./training/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_dir,
                                                 save_weights_only=True,
                                                 verbose=1)
avg_callback = tfa.callbacks.AverageModelCheckpoint(filepath=checkpoint_dir, 
                                                    update_weights=True)
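
Note that both callbacks above receive checkpoint_dir rather than the per-epoch template checkpoint_path. The short sketch below, which uses only the standard library, shows what each string resolves to; the epoch numbers are arbitrary examples.

```python
import os

checkpoint_path = "./training/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# The directory contains no {epoch} placeholder, so it is the same for every epoch.
print(checkpoint_dir)

# Formatting the template with an epoch number yields a distinct path per epoch.
per_epoch_paths = [checkpoint_path.format(epoch=e) for e in (1, 2, 3)]
print(per_epoch_paths)
```

Because the directory string is constant, each save targets the same location, which matches the "saving model to ./training" lines in the training logs. If separate per-epoch checkpoints are wanted, passing checkpoint_path as filepath may be the more suitable choice; that is a suggestion, not what this notebook does.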

Train Model

Vanilla SGD Optimizer

#Build Model
model = create_model(sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[cp_callback])
Epoch 1/5
1857/1875 [============================>.] - ETA: 0s - loss: 0.7636 - accuracy: 0.7437
Epoch 1: saving model to ./training
1875/1875 [==============================] - 4s 2ms/step - loss: 0.7619 - accuracy: 0.7442
Epoch 2/5
1853/1875 [============================>.] - ETA: 0s - loss: 0.5005 - accuracy: 0.8250
Epoch 2: saving model to ./training
1875/1875 [==============================] - 3s 2ms/step - loss: 0.5010 - accuracy: 0.8248
Epoch 3/5
1870/1875 [============================>.] - ETA: 0s - loss: 0.4564 - accuracy: 0.8395
Epoch 3: saving model to ./training
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4566 - accuracy: 0.8394
Epoch 4/5
1851/1875 [============================>.] - ETA: 0s - loss: 0.4302 - accuracy: 0.8497
Epoch 4: saving model to ./training
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4305 - accuracy: 0.8496
Epoch 5/5
1872/1875 [============================>.] - ETA: 0s - loss: 0.4126 - accuracy: 0.8556
Epoch 5: saving model to ./training
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4127 - accuracy: 0.8556
<keras.callbacks.History at 0x7f0cb00d0880>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 1s - loss: 76.6787 - accuracy: 0.8127 - 531ms/epoch - 2ms/step
Loss : 76.67872619628906
Accuracy : 0.8126999735832214
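
The test loss reported here (76.68) is far above the final training loss (0.4127), even though the accuracy is in a similar range. One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is that the training images were divided by 255.0 while test_images were left in their raw 0-255 range. The sketch below shows the same scaling on a toy pure-Python "image"; the helper `scale_pixels` is hypothetical, and in the notebook the equivalent step would presumably be `test_images = test_images / 255.0` before calling evaluate.

```python
def scale_pixels(image, max_value=255.0):
    """Scale raw 0-255 pixel values into the [0, 1] range."""
    return [[pixel / max_value for pixel in row] for row in image]

# Toy 2x2 image with uint8-style values, standing in for one test image.
raw_image = [[0, 51], [102, 255]]
scaled_image = scale_pixels(raw_image)

# After scaling, every value lies in the range the model saw during training.
smallest = min(min(row) for row in scaled_image)
largest = max(max(row) for row in scaled_image)
print(scaled_image, smallest, largest)
```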

Moving Average SGD

#Build Model
model = create_model(moving_avg_sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5
1852/1875 [============================>.] - ETA: 0s - loss: 0.7636 - accuracy: 0.7501
INFO:tensorflow:Assets written to: ./training/assets
1875/1875 [==============================] - 5s 2ms/step - loss: 0.7615 - accuracy: 0.7508
Epoch 2/5
1851/1875 [============================>.] - ETA: 0s - loss: 0.5023 - accuracy: 0.8256
INFO:tensorflow:Assets written to: ./training/assets
1875/1875 [==============================] - 4s 2ms/step - loss: 0.5020 - accuracy: 0.8258
Epoch 3/5
1868/1875 [============================>.] - ETA: 0s - loss: 0.4579 - accuracy: 0.8403
INFO:tensorflow:Assets written to: ./training/assets
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4582 - accuracy: 0.8402
Epoch 4/5
1872/1875 [============================>.] - ETA: 0s - loss: 0.4330 - accuracy: 0.8487
INFO:tensorflow:Assets written to: ./training/assets
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4328 - accuracy: 0.8487
Epoch 5/5
1853/1875 [============================>.] - ETA: 0s - loss: 0.4128 - accuracy: 0.8553
INFO:tensorflow:Assets written to: ./training/assets
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4133 - accuracy: 0.8550
<keras.callbacks.History at 0x7f0c543f2550>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 1s - loss: 76.6787 - accuracy: 0.8127 - 532ms/epoch - 2ms/step
Loss : 76.67872619628906
Accuracy : 0.8126999735832214

Stochastic Weight Averaging SGD

#Build Model
model = create_model(stocastic_avg_sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5
1869/1875 [============================>.] - ETA: 0s - loss: 0.7733 - accuracy: 0.7381
INFO:tensorflow:Assets written to: ./training/assets
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7724 - accuracy: 0.7385
Epoch 2/5
1863/1875 [============================>.] - ETA: 0s - loss: 0.5625 - accuracy: 0.8074
INFO:tensorflow:Assets written to: ./training/assets
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5623 - accuracy: 0.8075
Epoch 3/5
1866/1875 [============================>.] - ETA: 0s - loss: 0.5308 - accuracy: 0.8172
INFO:tensorflow:Assets written to: ./training/assets
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5306 - accuracy: 0.8171
Epoch 4/5
1872/1875 [============================>.] - ETA: 0s - loss: 0.5166 - accuracy: 0.8216
INFO:tensorflow:Assets written to: ./training/assets
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5167 - accuracy: 0.8215
Epoch 5/5
1863/1875 [============================>.] - ETA: 0s - loss: 0.5070 - accuracy: 0.8239
INFO:tensorflow:Assets written to: ./training/assets
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5066 - accuracy: 0.8241
<keras.callbacks.History at 0x7f0c34402cd0>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 1s - loss: 76.6787 - accuracy: 0.8127 - 504ms/epoch - 2ms/step
Loss : 76.67872619628906
Accuracy : 0.8126999735832214