Writing a training loop from scratch

Setup

 import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
 

Introduction

Keras provides default training and evaluation loops, fit() and evaluate(). Their usage is covered in the guide Training & evaluation with the built-in methods.

If you want to customize the learning algorithm of your model while still leveraging the convenience of fit() (for instance, to train a GAN using fit()), you can subclass the Model class and implement your own train_step() method, which is called repeatedly during fit(). This is covered in the guide Customizing what happens in fit().

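For reference, here is a minimal sketch of that approach (the CustomModel name and the assumption of (inputs, targets) batches are illustrative only; see the guide linked above for the full treatment):

class CustomModel(keras.Model):
    def train_step(self, data):
        # Unpack the data. Its structure depends on what you pass to fit().
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss configured in compile()
            loss = self.compiled_loss(y, y_pred)
        # Compute gradients and update the weights
        grads = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        # Update the metrics configured in compile()
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
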
Now, if you want very low-level control over training & evaluation, you should write your own training & evaluation loops from scratch. This is what this guide is about.

Using the GradientTape: a first end-to-end example

Calling a model inside a GradientTape scope enables you to retrieve the gradients of the trainable weights of the layer with respect to a loss value. Using an optimizer instance, you can use these gradients to update these variables (which you can retrieve using model.trainable_weights).

Let's consider a simple MNIST model:

 inputs = keras.Input(shape=(784,), name="digits")
x1 = layers.Dense(64, activation="relu")(inputs)
x2 = layers.Dense(64, activation="relu")(x1)
outputs = layers.Dense(10, name="predictions")(x2)
model = keras.Model(inputs=inputs, outputs=outputs)
 

Let's train it using mini-batch gradient with a custom training loop.

First, we're going to need an optimizer, a loss function, and a dataset:

 # Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
 

Here's our training loop:

  • We open a for loop that iterates over epochs
  • For each epoch, we open a for loop that iterates over the dataset, in batches
  • For each batch, we open a GradientTape() scope
  • Inside this scope, we call the model (forward pass) and compute the loss
  • Outside the scope, we retrieve the gradients of the weights of the model with regard to the loss
  • Finally, we use the optimizer to update the weights of the model based on the gradients

 epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables autodifferentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %s samples" % ((step + 1) * 64))
 

Start of epoch 0
Training loss (for one batch) at step 0: 92.5677
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.9201
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 0.7029
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.0511
Seen so far: 38464 samples
Training loss (for one batch) at step 800: 0.5134
Seen so far: 51264 samples

Start of epoch 1
Training loss (for one batch) at step 0: 0.4872
Seen so far: 64 samples
Training loss (for one batch) at step 200: 0.5805
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 0.6873
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.5880
Seen so far: 38464 samples
Training loss (for one batch) at step 800: 0.3768
Seen so far: 51264 samples

Low-level handling of metrics

Let's add metrics monitoring to this basic loop.

You can readily reuse the built-in metrics (or custom ones you wrote) in such training loops written from scratch. Here's the flow, with a short standalone sketch shown after the list:

  • Instantiate the metric at the start of the loop
  • Call metric.update_state() after each batch
  • Call metric.result() when you need to display the current value of the metric
  • Call metric.reset_states() when you need to clear the state of the metric (typically at the end of an epoch)

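Here is a small standalone sketch of that lifecycle (the accuracy metric and the tiny dummy batch below are illustrative only):

acc_metric = keras.metrics.SparseCategoricalAccuracy()  # 1. instantiate

# 2. update after each batch (dummy labels and predictions for illustration)
y_true = np.array([1, 2])
y_pred = np.array([[0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
acc_metric.update_state(y_true, y_pred)

print("Current accuracy: %.4f" % float(acc_metric.result()))  # 3. query the current value
acc_metric.reset_states()  # 4. clear the state, typically at the end of an epoch
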
Let's use this knowledge to compute SparseCategoricalAccuracy on validation data at the end of each epoch:

 # Get model
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

# Reserve 10,000 samples for validation (split before building the
# training dataset so the two sets do not overlap).
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)
 

Here's our training & evaluation loop:

 import time

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    start_time = time.time()

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Update training metric.
        train_acc_metric.update_state(y_batch_train, logits)

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %d samples" % ((step + 1) * 64))

    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))

    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()

    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        val_logits = model(x_batch_val, training=False)
        # Update val metrics
        val_acc_metric.update_state(y_batch_val, val_logits)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print("Validation acc: %.4f" % (float(val_acc),))
    print("Time taken: %.2fs" % (time.time() - start_time))
 

Start of epoch 0
Training loss (for one batch) at step 0: 86.8484
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.5473
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.1590
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.9719
Seen so far: 38464 samples
Training loss (for one batch) at step 800: 0.9732
Seen so far: 51264 samples
Training acc over epoch: 0.6902
Validation acc: 0.8232
Time taken: 5.97s

Start of epoch 1
Training loss (for one batch) at step 0: 0.7543
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.0089
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 0.3905
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.4995
Seen so far: 38464 samples
Training loss (for one batch) at step 800: 0.4404
Seen so far: 51264 samples
Training acc over epoch: 0.8263
Validation acc: 0.8719
Time taken: 5.76s

Speeding-up your training step with tf.function

The default runtime in TensorFlow 2.0 is eager execution. As such, our training loop above executes eagerly.

This is great for debugging, but graph compilation has a definite performance advantage. Describing your computation as a static graph enables the framework to apply global performance optimizations. This is impossible when the framework is constrained to greedily execute one operation after another, with no knowledge of what comes next.

You can compile into a static graph any function that takes tensors as input. Just add a @tf.function decorator on it, like this:

 @tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value

 

Let's do the same with the evaluation step:

 @tf.function
def test_step(x, y):
    val_logits = model(x, training=False)
    val_acc_metric.update_state(y, val_logits)

 

Now, let's re-run our training loop with this compiled training step:

 import time

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    start_time = time.time()

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        loss_value = train_step(x_batch_train, y_batch_train)

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %d samples" % ((step + 1) * 64))

    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))

    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()

    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        test_step(x_batch_val, y_batch_val)

    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print("Validation acc: %.4f" % (float(val_acc),))
    print("Time taken: %.2fs" % (time.time() - start_time))
 

Start of epoch 0
Training loss (for one batch) at step 0: 0.7263
Seen so far: 64 samples
Training loss (for one batch) at step 200: 0.3066
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 0.8204
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.2876
Seen so far: 38464 samples
Training loss (for one batch) at step 800: 0.2506
Seen so far: 51264 samples
Training acc over epoch: 0.8668
Validation acc: 0.8946
Time taken: 1.44s

Start of epoch 1
Training loss (for one batch) at step 0: 0.2911
Seen so far: 64 samples
Training loss (for one batch) at step 200: 0.2892
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 0.6538
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.2441
Seen so far: 38464 samples
Training loss (for one batch) at step 800: 0.5973
Seen so far: 51264 samples
Training acc over epoch: 0.8846
Validation acc: 0.9057
Time taken: 1.11s

Much faster, isn't it?

Low-level handling of losses tracked by the model

Layers & models recursively track any losses created during the forward pass by layers that call self.add_loss(value). The resulting list of scalar loss values is available via the property model.losses at the end of the forward pass.

If you want to be using these loss components, you should sum them and add them to the main loss in your training step.

Consider this layer, which creates an activity regularization loss:

 class ActivityRegularizationLayer(layers.Layer):
    def call(self, inputs):
        self.add_loss(1e-2 * tf.reduce_sum(inputs))
        return inputs

 

Let's build a really simple model that uses it:

 inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu")(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(10, name="predictions")(x)

model = keras.Model(inputs=inputs, outputs=outputs)
 

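As a quick check (a minimal sketch, not part of the original example), you can call this model on a dummy batch and inspect model.losses, which is repopulated on every forward pass:

dummy_batch = tf.zeros((2, 784))
_ = model(dummy_batch, training=True)
print(model.losses)  # a list with one scalar tensor, created by ActivityRegularizationLayer
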
Here's what our training step should look like now:

 @tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
        # Add any extra losses created during the forward pass.
        loss_value += sum(model.losses)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value

 

Summary

Now you know everything there is to know about using built-in training loops and writing your own from scratch.

To conclude, here's a simple end-to-end example that ties together everything you've learned in this guide: a DCGAN trained on MNIST digits.

End-to-end example: a GAN training loop from scratch

You may be familiar with Generative Adversarial Networks (GANs). GANs can generate new images that look almost real, by learning the latent distribution of a training dataset of images (the "latent space" of the images).

A GAN is made of two parts: a "generator" model that maps points in the latent space to points in image space, and a "discriminator" model, a classifier that can tell the difference between real images (from the training dataset) and fake images (the output of the generator network).

A GAN training loop looks like this:

1) Train the discriminator.
  • Sample a batch of random points in the latent space.
  • Turn the points into fake images via the "generator" model.
  • Get a batch of real images and combine them with the generated images.
  • Train the "discriminator" model to classify generated vs. real images.

2) Train the generator.
  • Sample random points in the latent space.
  • Turn the points into fake images via the "generator" network.
  • Train the "generator" model to "fool" the discriminator into classifying the fake images as real (the discriminator's weights are not updated in this step).

For a more detailed overview of how GANs work, see Deep Learning with Python.

Let's implement this training loop. First, create the discriminator meant to classify fake vs. real digits:

 discriminator = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(64, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2D(128, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.GlobalMaxPooling2D(),
        layers.Dense(1),
    ],
    name="discriminator",
)
discriminator.summary()
 
Model: "discriminator"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 14, 14, 64)        640       
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 7, 7, 128)         73856     
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 7, 7, 128)         0         
_________________________________________________________________
global_max_pooling2d (Global (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 129       
=================================================================
Total params: 74,625
Trainable params: 74,625
Non-trainable params: 0
_________________________________________________________________

Then let's create a generator network, which turns latent vectors into outputs of shape (28, 28, 1) (representing MNIST digits):

 latent_dim = 128

generator = keras.Sequential(
    [
        keras.Input(shape=(latent_dim,)),
        # We want to generate 128 coefficients to reshape into a 7x7x128 map
        layers.Dense(7 * 7 * 128),
        layers.LeakyReLU(alpha=0.2),
        layers.Reshape((7, 7, 128)),
        layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2D(1, (7, 7), padding="same", activation="sigmoid"),
    ],
    name="generator",
)
 

Here's the key bit: the training loop. As you can see, it is quite straightforward. The training step function only takes 17 lines.

 # Instantiate one optimizer for the discriminator and another for the generator.
d_optimizer = keras.optimizers.Adam(learning_rate=0.0003)
g_optimizer = keras.optimizers.Adam(learning_rate=0.0004)

# Instantiate a loss function.
loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)


@tf.function
def train_step(real_images):
    # Sample random points in the latent space
    random_latent_vectors = tf.random.normal(shape=(batch_size, latent_dim))
    # Decode them to fake images
    generated_images = generator(random_latent_vectors)
    # Combine them with real images
    combined_images = tf.concat([generated_images, real_images], axis=0)

    # Assemble labels discriminating real from fake images
    labels = tf.concat(
        [tf.ones((batch_size, 1)), tf.zeros((real_images.shape[0], 1))], axis=0
    )
    # Add random noise to the labels - important trick!
    labels += 0.05 * tf.random.uniform(labels.shape)

    # Train the discriminator
    with tf.GradientTape() as tape:
        predictions = discriminator(combined_images)
        d_loss = loss_fn(labels, predictions)
    grads = tape.gradient(d_loss, discriminator.trainable_weights)
    d_optimizer.apply_gradients(zip(grads, discriminator.trainable_weights))

    # Sample random points in the latent space
    random_latent_vectors = tf.random.normal(shape=(batch_size, latent_dim))
    # Assemble labels that say "all real images"
    misleading_labels = tf.zeros((batch_size, 1))

    # Train the generator (note that we should *not* update the weights
    # of the discriminator)!
    with tf.GradientTape() as tape:
        predictions = discriminator(generator(random_latent_vectors))
        g_loss = loss_fn(misleading_labels, predictions)
    grads = tape.gradient(g_loss, generator.trainable_weights)
    g_optimizer.apply_gradients(zip(grads, generator.trainable_weights))
    return d_loss, g_loss, generated_images

 

Let's train our GAN, by repeatedly calling train_step on batches of images.

Since our discriminator and our generator are convnets, you're going to want to run this code on a GPU.

 import os

# Prepare the dataset. We use both the training & test MNIST digits.
batch_size = 64
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
all_digits = np.concatenate([x_train, x_test])
all_digits = all_digits.astype("float32") / 255.0
all_digits = np.reshape(all_digits, (-1, 28, 28, 1))
dataset = tf.data.Dataset.from_tensor_slices(all_digits)
dataset = dataset.shuffle(buffer_size=1024).batch(batch_size)

epochs = 1  # In practice you need at least 20 epochs to generate nice digits.
save_dir = "./"

for epoch in range(epochs):
    print("\nStart epoch", epoch)

    for step, real_images in enumerate(dataset):
        # Train the discriminator & generator on one batch of real images.
        d_loss, g_loss, generated_images = train_step(real_images)

        # Logging.
        if step % 200 == 0:
            # Print metrics
            print("discriminator loss at step %d: %.2f" % (step, d_loss))
            print("adversarial loss at step %d: %.2f" % (step, g_loss))

            # Save one generated image
            img = tf.keras.preprocessing.image.array_to_img(
                generated_images[0] * 255.0, scale=False
            )
            img.save(os.path.join(save_dir, "generated_img" + str(step) + ".png"))

        # To limit execution time we stop after 10 steps.
        # Remove the lines below to actually train the model!
        if step > 10:
            break
 

Start epoch 0
discriminator loss at step 0: 0.68
adversarial loss at step 0: 0.67

That's it! You'll get nice-looking fake MNIST digits after just ~30s of training on the Colab GPU.