Model averaging

View on TensorFlow.org  Run in Google Colab  View source on GitHub  Download notebook

Overview

This notebook demonstrates how to use a moving average optimizer together with model average checkpointing from the TensorFlow Addons package.

Moving averaging

The advantage of moving averaging is that it is less vulnerable to sharp loss swings or an irregular data representation in the latest batch. It gives a smoother, more general picture of how the model is training up to a given point.
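The core update can be sketched in a few lines of NumPy: the averaged weights track an exponential moving average of the raw weights. The decay value below is an illustrative choice, not the tfa.optimizers.MovingAverage default.

```python
import numpy as np

def ema_update(avg_weights, new_weights, decay=0.9):
    """One exponential-moving-average step: avg <- decay*avg + (1-decay)*new."""
    return decay * avg_weights + (1.0 - decay) * new_weights

# Toy example: the average moves only slightly toward a noisy new value,
# which is what dampens sharp per-batch swings.
avg = np.array([1.0, 1.0])
new = np.array([2.0, 0.0])
avg = ema_update(avg, new, decay=0.9)
print(avg)  # [1.1 0.9]
```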

Stochastic averaging

Stochastic Weight Averaging (SWA) converges to wider optima, with an effect similar to geometric ensembling. Used as a wrapper around another optimizer, SWA is a simple way to improve model performance by averaging the results from different points along the inner optimizer's trajectory.

Model average checkpoint

callbacks.ModelCheckpoint does not provide an option to save the moving average weights during training, which is why moving average optimizers need a custom callback. Using the update_weights parameter, AverageModelCheckpoint lets you either:

  1. Assign the moving average weights to the model, and save them.
  2. Keep the old non-averaged weights, while the saved model uses the averaged weights.
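A toy illustration of the two behaviors, with plain Python dictionaries standing in for real Keras weights (the function and weight names here are made up for the sketch; only the semantics mirror the callback):

```python
# Toy stand-ins for the model's live weights and the optimizer's averaged copy.
averaged_weights = {"dense/kernel": 0.5}
model_weights = {"dense/kernel": 0.8}

def save_checkpoint(model_weights, averaged_weights, update_weights):
    """Mimics the update_weights semantics: the saved model always uses the
    averaged weights; with update_weights=True the live model is also
    overwritten with them, otherwise it keeps its non-averaged weights."""
    if update_weights:
        model_weights.update(averaged_weights)  # case 1: assign, then save
    return dict(averaged_weights)               # what gets written to disk

saved = save_checkpoint(model_weights, averaged_weights, update_weights=False)
print(saved["dense/kernel"], model_weights["dense/kernel"])  # 0.5 0.8
```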

Setup

pip install -q -U tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa
import numpy as np
import os

Build the model

def create_model(opt):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),                         
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=opt,
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])

    return model

Prepare the dataset

#Load Fashion MNIST dataset
train, test = tf.keras.datasets.fashion_mnist.load_data()

images, labels = train
images = images/255.0
labels = labels.astype(np.int32)

fmnist_train_ds = tf.data.Dataset.from_tensor_slices((images, labels))
fmnist_train_ds = fmnist_train_ds.shuffle(5000).batch(32)

test_images, test_labels = test
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
8192/5148 [===============================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step

Here, we will compare three optimizers:

  • Plain SGD, without a wrapper
  • SGD with moving averaging
  • SGD with stochastic weight averaging

and see how they perform with the same model.

#Optimizers 
sgd = tf.keras.optimizers.SGD(0.01)
moving_avg_sgd = tfa.optimizers.MovingAverage(sgd)
stocastic_avg_sgd = tfa.optimizers.SWA(sgd)

Both the MovingAverage and the stochastic average (SWA) optimizers use AverageModelCheckpoint.

#Callback 
checkpoint_path = "./training/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_dir,
                                                 save_weights_only=True,
                                                 verbose=1)
avg_callback = tfa.callbacks.AverageModelCheckpoint(filepath=checkpoint_dir, 
                                                    update_weights=True)

Train the model

Vanilla SGD optimizer

#Build Model
model = create_model(sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[cp_callback])
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 1.0748 - accuracy: 0.6571

Epoch 00001: saving model to ./training
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.5133 - accuracy: 0.8224

Epoch 00002: saving model to ./training
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4605 - accuracy: 0.8380

Epoch 00003: saving model to ./training
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4315 - accuracy: 0.8469

Epoch 00004: saving model to ./training
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4078 - accuracy: 0.8563

Epoch 00005: saving model to ./training
<tensorflow.python.keras.callbacks.History at 0x7fb839e9fd68>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 79.6013 - accuracy: 0.8019
Loss : 79.60128021240234
Accuracy : 0.8019000291824341

Moving average SGD

#Build Model
model = create_model(moving_avg_sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5
1875/1875 [==============================] - 5s 2ms/step - loss: 1.1034 - accuracy: 0.6502
INFO:tensorflow:Assets written to: ./training/assets
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.5254 - accuracy: 0.8154
INFO:tensorflow:Assets written to: ./training/assets
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4718 - accuracy: 0.8335
INFO:tensorflow:Assets written to: ./training/assets
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4436 - accuracy: 0.8423
INFO:tensorflow:Assets written to: ./training/assets
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4221 - accuracy: 0.8531
INFO:tensorflow:Assets written to: ./training/assets
<tensorflow.python.keras.callbacks.History at 0x7fb839f0d630>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 79.6013 - accuracy: 0.8019
Loss : 79.60128021240234
Accuracy : 0.8019000291824341

Stochastic weight average SGD

#Build Model
model = create_model(stocastic_avg_sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5
1875/1875 [==============================] - 6s 3ms/step - loss: 1.1160 - accuracy: 0.6463
INFO:tensorflow:Assets written to: ./training/assets
Epoch 2/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6035 - accuracy: 0.7968
INFO:tensorflow:Assets written to: ./training/assets
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5594 - accuracy: 0.8102
INFO:tensorflow:Assets written to: ./training/assets
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5365 - accuracy: 0.8170
INFO:tensorflow:Assets written to: ./training/assets
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5239 - accuracy: 0.8199
INFO:tensorflow:Assets written to: ./training/assets
<tensorflow.python.keras.callbacks.History at 0x7fb7ac51dac8>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 79.6013 - accuracy: 0.8019
Loss : 79.60128021240234
Accuracy : 0.8019000291824341