This page was translated by the Cloud Translation API.

Use TPUs


Before you run this Colab notebook, make sure your hardware accelerator is a TPU. Check your notebook settings under Runtime > Change runtime type > Hardware accelerator > TPU.

Setup

import tensorflow as tf

import os
import tensorflow_datasets as tfds


TPU initialization

TPUs are typically Cloud TPU workers. These are separate from the local process that runs the user's Python program, so you need to do some initialization work to connect to the remote cluster and initialize the TPUs. Note that the tpu argument to tf.distribute.cluster_resolver.TPUClusterResolver is a special address just for Colab. If you are running your code on Google Compute Engine (GCE), pass in the name of your Cloud TPU instead.

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
# This is the TPU initialization code that has to be at the beginning.
tf.tpu.experimental.initialize_tpu_system(resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))

INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Initializing the TPU system: grpc://10.240.1.10:8470
INFO:tensorflow:Finished initializing TPU system.
All devices:  [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:0', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:1', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:2', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:3', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:4', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:5', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:6', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:7', device_type='TPU')]

Manual device placement

After the TPU is initialized, you can use manual device placement to place the computation on a single TPU device:

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

with tf.device('/TPU:0'):
  c = tf.matmul(a, b)

print("c device: ", c.device)
print(c)

c device:  /job:worker/replica:0/task:0/device:TPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

Distribution strategies

Usually you run your model on multiple TPUs in a data-parallel way. TensorFlow offers several distribution strategies for distributing a model across multiple TPUs (or other accelerators). You can swap in a different distribution strategy, and the model will run on whichever (TPU) devices that strategy targets. Check the distribution strategy guide for more information.
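In data parallelism, every replica holds the same model and receives a different slice of the global batch, and the per-replica gradients are then combined. The sketch below is plain Python with no TensorFlow. It assumes a toy one-parameter linear model with a mean-squared-error loss and equal-sized shards, and `grad_mse` and `data_parallel_grad` are hypothetical helper names. It shows that averaging the per-shard gradients reproduces the full-batch gradient.

```python
import math

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def data_parallel_grad(w, xs, ys, num_replicas):
    """Split the global batch into equal shards, compute one gradient per
    shard (one per 'replica'), then average them (the all-reduce step)."""
    assert len(xs) % num_replicas == 0, "this sketch assumes equal-sized shards"
    shard = len(xs) // num_replicas
    per_replica = [
        grad_mse(w, xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
        for r in range(num_replicas)
    ]
    return sum(per_replica) / num_replicas

# Hypothetical data: y = 3x + 1, current weight w = 0.5.
xs = [float(i) for i in range(16)]
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5

full = grad_mse(w, xs, ys)
parallel = data_parallel_grad(w, xs, ys, num_replicas=8)
print(math.isclose(full, parallel))
```

The equality depends on the shards being the same size. That is one reason the tutorial below chooses batch sizes that divide evenly across the 8 TPU cores.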

To demonstrate this, create a tf.distribute.TPUStrategy object:

strategy = tf.distribute.TPUStrategy(resolver)

INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)

To replicate a computation so it runs on all TPU cores, pass it to the strategy.run API. The example below has every core receive the same inputs (a, b) and perform the matrix multiplication independently. The result holds the values from all the replicas.

@tf.function
def matmul_fn(x, y):
  z = tf.matmul(x, y)
  return z

z = strategy.run(matmul_fn, args=(a, b))
print(z)

PerReplica:{
  0: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  1: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  2: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  3: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  4: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  5: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  6: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  7: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
}
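Here is a rough mental model of the PerReplica output above: strategy.run hands the same function and the same inputs to every replica, so every entry is identical. The plain-Python sketch below is not how TPUStrategy is implemented, and `run_on_replicas` is a hypothetical helper. It calls the function once per replica id and collects the results. It also recomputes the matrix product by hand: for example, 1·1 + 2·3 + 3·5 = 22.

```python
def matmul(x, y):
    """Plain-Python matrix multiply of nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

def run_on_replicas(fn, args, num_replicas):
    """Toy stand-in for strategy.run: call fn once per replica with the same
    arguments and collect the results keyed by replica id."""
    return {replica_id: fn(*args) for replica_id in range(num_replicas)}

a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

z = run_on_replicas(matmul, (a, b), num_replicas=8)
print(z[0])  # [[22.0, 28.0], [49.0, 64.0]]
```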

Classification on TPUs

With the basic concepts covered, consider a more concrete example. This section demonstrates how to use the tf.distribute.TPUStrategy distribution strategy to train a Keras model on a Cloud TPU.

Define a Keras model

Start with the definition of a Sequential Keras model for image classification on the MNIST dataset. The Keras code is no different from what you would write when training on CPUs or GPUs. Note that the Keras model must be created inside strategy.scope so that its variables are created on each TPU device. The other parts of the code do not need to be inside the strategy scope.

def create_model():
  return tf.keras.Sequential(
      [tf.keras.layers.Conv2D(256, 3, activation='relu', input_shape=(28, 28, 1)),
       tf.keras.layers.Conv2D(256, 3, activation='relu'),
       tf.keras.layers.Flatten(),
       tf.keras.layers.Dense(256, activation='relu'),
       tf.keras.layers.Dense(128, activation='relu'),
       tf.keras.layers.Dense(10)])
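It helps to see how large this model is. The plain-Python sketch below works through the layer shapes and parameter counts of create_model by hand. It assumes stride-1 'valid' padding (the Conv2D default) and biases on every layer. The helper names `conv2d_valid` and `dense` are hypothetical.

```python
def conv2d_valid(h, w, in_ch, filters, k):
    """Output shape and parameter count of a stride-1, 'valid'-padding Conv2D."""
    out_shape = (h - k + 1, w - k + 1, filters)
    params = k * k * in_ch * filters + filters  # kernel weights + biases
    return out_shape, params

def dense(in_units, units):
    """Parameter count of a Dense layer (weights + biases)."""
    return in_units * units + units

shape, p1 = conv2d_valid(28, 28, 1, 256, 3)   # -> (26, 26, 256)
shape, p2 = conv2d_valid(*shape, 256, 3)      # -> (24, 24, 256)
flat = shape[0] * shape[1] * shape[2]         # Flatten -> 147456 features
p3 = dense(flat, 256)
p4 = dense(256, 128)
p5 = dense(128, 10)

total = p1 + p2 + p3 + p4 + p5
print(shape, flat, total)  # (24, 24, 256) 147456 38375818
```

About 98% of the roughly 38 million parameters sit in the first Dense layer, because it connects to all 147,456 flattened features.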

Load the dataset

Efficient use of the tf.data.Dataset API is critical when using a Cloud TPU, because a Cloud TPU cannot be used effectively unless you feed it data quickly enough. You can learn more about dataset performance in the Input pipeline performance guide.

For all but the simplest experiments (those using tf.data.Dataset.from_tensor_slices or other in-graph data), you need to store all data files read by the dataset in Google Cloud Storage (GCS) buckets.

For most use cases, it is recommended to convert your data into the TFRecord format and read it with tf.data.TFRecordDataset. Check the TFRecord and tf.Example tutorial for details on how to do this. This is not a hard requirement, and you can use other dataset readers, such as tf.data.FixedLengthRecordDataset or tf.data.TextLineDataset.
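A TFRecord file is a sequence of length-prefixed records, each protected by masked CRC32C checksums. The plain-Python sketch below writes and reads that framing only to show what the format contains. In practice you would use tf.io.TFRecordWriter and tf.data.TFRecordDataset rather than rolling your own. The payloads here are arbitrary bytes, whereas real files usually hold serialized tf.train.Example protos. The helper names are hypothetical.

```python
import io
import struct

def _make_crc32c_table():
    # Lookup table for the reflected CRC-32C (Castagnoli) polynomial.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table

_CRC32C_TABLE = _make_crc32c_table()

def crc32c(data):
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def masked_crc(data):
    # TFRecord stores a rotated-and-offset CRC instead of the raw value.
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF

def write_record(stream, payload):
    # Layout: uint64 length | uint32 masked CRC of length | payload | uint32 masked CRC of payload
    header = struct.pack("<Q", len(payload))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc(header)))
    stream.write(payload)
    stream.write(struct.pack("<I", masked_crc(payload)))

def read_records(stream):
    while True:
        header = stream.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        (header_crc,) = struct.unpack("<I", stream.read(4))
        if header_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        payload = stream.read(length)
        (payload_crc,) = struct.unpack("<I", stream.read(4))
        if payload_crc != masked_crc(payload):
            raise ValueError("corrupt payload")
        yield payload

buf = io.BytesIO()
for payload in [b"first record", b"", b"third"]:
    write_record(buf, payload)
buf.seek(0)
print(list(read_records(buf)))  # [b'first record', b'', b'third']
```

Each record carries 16 bytes of framing overhead. Reading is strictly sequential, which matches the streaming access pattern tf.data uses over GCS.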

You can load an entire small dataset into memory using tf.data.Dataset.cache.

Regardless of the data format used, it is strongly recommended that you use large files, on the order of 100MB. This is especially important in this networked setting, because the overhead of opening a file is significantly higher.
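To follow the roughly-100MB-per-file guideline, you first need to decide how many files to split a dataset into. The plain-Python sketch below makes that calculation. The 5 GiB dataset size, the `gs://my-bucket` path, and the `-NNNNN-of-NNNNN` naming scheme are all hypothetical, and the target is expressed in MiB as an approximation of "100MB".

```python
import math

TARGET_SHARD_BYTES = 100 * 1024 * 1024  # roughly 100MB per file

def num_shards(total_bytes, target_bytes=TARGET_SHARD_BYTES):
    """Number of files needed so that each holds roughly target_bytes."""
    return max(1, math.ceil(total_bytes / target_bytes))

def shard_names(prefix, count):
    """Hypothetical naming scheme: prefix-00000-of-00052.tfrecord, ..."""
    return ["{}-{:05d}-of-{:05d}.tfrecord".format(prefix, i, count)
            for i in range(count)]

total = 5 * 1024 ** 3  # a hypothetical 5 GiB dataset
count = num_shards(total)
print(count)                                          # 52
print(shard_names("gs://my-bucket/train", count)[:2])
```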

As shown in the code below, use the tensorflow_datasets module to get a copy of the MNIST training and test data. Note that try_gcs is set so that the copy available in a public GCS bucket is used. Without it, the TPU cannot access the downloaded data.

def get_dataset(batch_size, is_training=True):
  split = 'train' if is_training else 'test'
  dataset, info = tfds.load(name='mnist', split=split, with_info=True,
                            as_supervised=True, try_gcs=True)

  # Normalize the input data.
  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.0
    return image, label

  dataset = dataset.map(scale)

  # Only shuffle and repeat the dataset in training. The advantage of having an
  # infinite dataset for training is to avoid the potential last partial batch
  # in each epoch, so that you don't need to think about scaling the gradients
  # based on the actual batch size.
  if is_training:
    dataset = dataset.shuffle(10000)
    dataset = dataset.repeat()

  dataset = dataset.batch(batch_size)

  return dataset
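The code comment about repeating the training set says that an infinite dataset avoids a partial last batch. The plain-Python sketch below illustrates this with a hypothetical 1,000-example dataset and a batch size of 256. Batching a finite list leaves a short final batch, while batching a repeated stream yields only full batches. The helper names are hypothetical stand-ins for Dataset.batch and Dataset.repeat.

```python
from itertools import islice

def batches(stream, batch_size):
    """Group an iterable into lists of batch_size (the last one may be partial)."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def repeat(examples):
    """Infinite stream, like Dataset.repeat() with no count."""
    while True:
        yield from examples

examples = list(range(1000))  # hypothetical 1000-example dataset
batch_size = 256

finite = list(batches(examples, batch_size))
print([len(b) for b in finite])                    # [256, 256, 256, 232]

infinite = batches(repeat(examples), batch_size)
print([len(b) for b in islice(infinite, 5)])       # [256, 256, 256, 256, 256]
```

With this tutorial's batch size of 200, both 60000 and 10000 divide evenly (300 and 50 batches). A batch size such as 256 would leave a partial batch of 60000 % 256 = 96 training examples.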

Train the model using Keras high-level APIs

You can train your model with the Keras fit and compile APIs. Nothing in this step is TPU-specific: you write the code as if you were using multiple GPUs and a MirroredStrategy instead of the TPUStrategy. You can learn more in the Distributed training with Keras tutorial.

with strategy.scope():
  model = create_model()
  model.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['sparse_categorical_accuracy'])

batch_size = 200
steps_per_epoch = 60000 // batch_size
validation_steps = 10000 // batch_size

train_dataset = get_dataset(batch_size, is_training=True)
test_dataset = get_dataset(batch_size, is_training=False)

model.fit(train_dataset,
          epochs=5,
          steps_per_epoch=steps_per_epoch,
          validation_data=test_dataset, 
          validation_steps=validation_steps)

Epoch 1/5
300/300 [==============================] - 18s 32ms/step - loss: 0.1433 - sparse_categorical_accuracy: 0.9564 - val_loss: 0.0452 - val_sparse_categorical_accuracy: 0.9859
Epoch 2/5
300/300 [==============================] - 6s 21ms/step - loss: 0.0335 - sparse_categorical_accuracy: 0.9898 - val_loss: 0.0318 - val_sparse_categorical_accuracy: 0.9899
Epoch 3/5
300/300 [==============================] - 6s 21ms/step - loss: 0.0199 - sparse_categorical_accuracy: 0.9935 - val_loss: 0.0397 - val_sparse_categorical_accuracy: 0.9866
Epoch 4/5
300/300 [==============================] - 6s 21ms/step - loss: 0.0109 - sparse_categorical_accuracy: 0.9964 - val_loss: 0.0436 - val_sparse_categorical_accuracy: 0.9892
Epoch 5/5
300/300 [==============================] - 6s 21ms/step - loss: 0.0103 - sparse_categorical_accuracy: 0.9963 - val_loss: 0.0481 - val_sparse_categorical_accuracy: 0.9881
<keras.callbacks.History at 0x7f0d485602e8>

To reduce Python overhead and maximize the performance of your TPU, pass the steps_per_execution argument to Model.compile. In this example, it increases throughput by about 50%:

with strategy.scope():
  model = create_model()
  model.compile(optimizer='adam',
                # Anything between 2 and `steps_per_epoch` could help here.
                steps_per_execution=50,
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['sparse_categorical_accuracy'])

model.fit(train_dataset,
          epochs=5,
          steps_per_epoch=steps_per_epoch,
          validation_data=test_dataset,
          validation_steps=validation_steps)

Epoch 1/5
300/300 [==============================] - 12s 41ms/step - loss: 0.1515 - sparse_categorical_accuracy: 0.9537 - val_loss: 0.0416 - val_sparse_categorical_accuracy: 0.9863
Epoch 2/5
300/300 [==============================] - 3s 10ms/step - loss: 0.0366 - sparse_categorical_accuracy: 0.9891 - val_loss: 0.0410 - val_sparse_categorical_accuracy: 0.9875
Epoch 3/5
300/300 [==============================] - 3s 10ms/step - loss: 0.0191 - sparse_categorical_accuracy: 0.9938 - val_loss: 0.0432 - val_sparse_categorical_accuracy: 0.9865
Epoch 4/5
300/300 [==============================] - 3s 10ms/step - loss: 0.0141 - sparse_categorical_accuracy: 0.9951 - val_loss: 0.0447 - val_sparse_categorical_accuracy: 0.9875
Epoch 5/5
300/300 [==============================] - 3s 11ms/step - loss: 0.0093 - sparse_categorical_accuracy: 0.9968 - val_loss: 0.0426 - val_sparse_categorical_accuracy: 0.9884
<keras.callbacks.History at 0x7f0d0463cd68>
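One way to see why steps_per_execution helps is to count how often the Python host has to launch work on the TPU. The plain-Python sketch below is a simplified counting model, not a description of Keras internals. It assumes each launch runs up to steps_per_execution steps, and `dispatches_per_epoch` is a hypothetical helper.

```python
def dispatches_per_epoch(steps_per_epoch, steps_per_execution):
    """Host-to-device launches per epoch if each launch runs up to
    steps_per_execution training steps (simplified model)."""
    full, rest = divmod(steps_per_epoch, steps_per_execution)
    return full + (1 if rest else 0)

steps_per_epoch = 60000 // 200  # 300, as in the tutorial
for spe in (1, 2, 50, 300):
    print(spe, dispatches_per_epoch(steps_per_epoch, spe))
# 1 300
# 2 150
# 50 6
# 300 1
```

Under this model, going from 1 to 50 cuts the per-epoch launches from 300 to 6. That is consistent with the code comment that any value between 2 and steps_per_epoch could help.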

Train the model using a custom training loop

You can also create and train your model using the tf.function and tf.distribute APIs directly. The strategy.experimental_distribute_datasets_from_function API distributes the dataset, given a dataset function. As the deprecation warning in the output shows, this API has been renamed to distribute_datasets_from_function. Note that in the example below, the batch size passed into the dataset is the per-replica batch size instead of the global batch size. To learn more, check out the Custom training with tf.distribute.Strategy tutorial.
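The batch-size and loss arithmetic in the training step fits together as follows. Each replica reads a per-replica batch of 200 // 8 = 25 examples. tf.nn.compute_average_loss sums that replica's per-example losses and divides by the global batch size, so summing across replicas (as gradient aggregation does) gives the global mean. The plain-Python sketch below uses hypothetical per-example loss values and a hand-written stand-in for compute_average_loss. It also shows why the loop logs `loss * strategy.num_replicas_in_sync`: that product is each replica's own mean loss.

```python
import math

global_batch_size = 200
num_replicas = 8  # the TPU in this tutorial reports 8 cores
per_replica_batch_size = global_batch_size // num_replicas  # 25

def compute_average_loss(per_example_losses, global_batch_size):
    """Same idea as tf.nn.compute_average_loss (without sample weights):
    sum this replica's losses and divide by the *global* batch size."""
    return sum(per_example_losses) / global_batch_size

# Hypothetical per-example losses for one global batch, split across replicas.
losses = [0.01 * i for i in range(global_batch_size)]
shards = [losses[r * per_replica_batch_size:(r + 1) * per_replica_batch_size]
          for r in range(num_replicas)]
replica_losses = [compute_average_loss(s, global_batch_size) for s in shards]

# Summing the replica losses recovers the mean over the whole global batch.
print(math.isclose(sum(replica_losses), sum(losses) / global_batch_size))  # True

# Scaling by num_replicas gives each replica's own mean loss.
print(math.isclose(replica_losses[0] * num_replicas,
                   sum(shards[0]) / per_replica_batch_size))  # True
```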

First, create the model, datasets, and tf.functions:

# Create the model, optimizer and metrics inside the strategy scope, so that the
# variables can be mirrored on each device.
with strategy.scope():
  model = create_model()
  optimizer = tf.keras.optimizers.Adam()
  training_loss = tf.keras.metrics.Mean('training_loss', dtype=tf.float32)
  training_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
      'training_accuracy', dtype=tf.float32)

# Calculate per replica batch size, and distribute the datasets on each TPU
# worker.
per_replica_batch_size = batch_size // strategy.num_replicas_in_sync

train_dataset = strategy.experimental_distribute_datasets_from_function(
    lambda _: get_dataset(per_replica_batch_size, is_training=True))

@tf.function
def train_step(iterator):
  """The step function for one training step."""

  def step_fn(inputs):
    """The computation to run on each TPU device."""
    images, labels = inputs
    with tf.GradientTape() as tape:
      logits = model(images, training=True)
      loss = tf.keras.losses.sparse_categorical_crossentropy(
          labels, logits, from_logits=True)
      loss = tf.nn.compute_average_loss(loss, global_batch_size=batch_size)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(list(zip(grads, model.trainable_variables)))
    training_loss.update_state(loss * strategy.num_replicas_in_sync)
    training_accuracy.update_state(labels, logits)

  strategy.run(step_fn, args=(next(iterator),))

WARNING:tensorflow:From <ipython-input-1-5625c2a14441>:15: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function

Then run the training loop:

steps_per_eval = 10000 // batch_size

train_iterator = iter(train_dataset)
for epoch in range(5):
  print('Epoch: {}/5'.format(epoch))

  for step in range(steps_per_epoch):
    train_step(train_iterator)
  print('Current step: {}, training loss: {}, accuracy: {}%'.format(
      optimizer.iterations.numpy(),
      round(float(training_loss.result()), 4),
      round(float(training_accuracy.result()) * 100, 2)))
  training_loss.reset_states()
  training_accuracy.reset_states()

Epoch: 0/5
Current step: 300, training loss: 0.1339, accuracy: 95.79%
Epoch: 1/5
Current step: 600, training loss: 0.0333, accuracy: 98.91%
Epoch: 2/5
Current step: 900, training loss: 0.0176, accuracy: 99.43%
Epoch: 3/5
Current step: 1200, training loss: 0.0126, accuracy: 99.61%
Epoch: 4/5
Current step: 1500, training loss: 0.0122, accuracy: 99.61%

Improving performance with multiple steps inside `tf.function`

You can improve performance by running multiple steps within a tf.function. To do this, wrap the strategy.run call in a tf.range loop inside the tf.function. AutoGraph then converts the loop into a tf.while_loop on the TPU worker.

Despite the improved performance, this method has tradeoffs compared with running a single step inside a tf.function. Running multiple steps in a tf.function is less flexible: you cannot run things eagerly or run arbitrary Python code within the steps.

@tf.function
def train_multiple_steps(iterator, steps):
  """Runs `steps` training steps inside a single tf.function call."""

  def step_fn(inputs):
    """The computation to run on each TPU device."""
    images, labels = inputs
    with tf.GradientTape() as tape:
      logits = model(images, training=True)
      loss = tf.keras.losses.sparse_categorical_crossentropy(
          labels, logits, from_logits=True)
      loss = tf.nn.compute_average_loss(loss, global_batch_size=batch_size)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(list(zip(grads, model.trainable_variables)))
    training_loss.update_state(loss * strategy.num_replicas_in_sync)
    training_accuracy.update_state(labels, logits)

  for _ in tf.range(steps):
    strategy.run(step_fn, args=(next(iterator),))

# Convert `steps_per_epoch` to `tf.Tensor` so the `tf.function` won't get 
# retraced if the value changes.
train_multiple_steps(train_iterator, tf.convert_to_tensor(steps_per_epoch))

print('Current step: {}, training loss: {}, accuracy: {}%'.format(
      optimizer.iterations.numpy(),
      round(float(training_loss.result()), 4),
      round(float(training_accuracy.result()) * 100, 2)))

Current step: 1800, training loss: 0.0081, accuracy: 99.74%
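The code comment converts steps_per_epoch to a tf.Tensor so that the tf.function is not retraced when the value changes. tf.function keys its traces on the values of Python scalar arguments but on the dtype and shape of tensor arguments. The plain-Python sketch below is a toy model of that caching rule only, not TensorFlow's implementation, and `ToyTracedFunction` and `ToyTensor` are hypothetical classes.

```python
class ToyTensor:
    """Stand-in for a scalar tf.Tensor: carries a value, dtype and shape."""
    def __init__(self, value, dtype="int32", shape=()):
        self.value, self.dtype, self.shape = value, dtype, shape


class ToyTracedFunction:
    """Toy trace cache: Python scalars are keyed by value,
    tensor-like arguments by (dtype, shape)."""
    def __init__(self, fn):
        self.fn = fn
        self.traced_signatures = set()

    @staticmethod
    def _key(arg):
        if isinstance(arg, ToyTensor):
            return ("tensor", arg.dtype, arg.shape)
        return ("python", arg)

    def __call__(self, *args):
        # A signature not seen before stands for a new trace (graph build).
        self.traced_signatures.add(tuple(self._key(a) for a in args))
        values = [a.value if isinstance(a, ToyTensor) else a for a in args]
        return self.fn(*values)


run_steps = ToyTracedFunction(lambda steps: steps)
for steps in (300, 200, 100):
    run_steps(steps)                 # a new Python value each time
print(len(run_steps.traced_signatures))  # 3

run_steps = ToyTracedFunction(lambda steps: steps)
for steps in (300, 200, 100):
    run_steps(ToyTensor(steps))      # same dtype and shape each time
print(len(run_steps.traced_signatures))  # 1
```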


Next steps

Google Cloud TPU documentation: How to set up and run a Google Cloud TPU.
Google Cloud TPU Colab notebooks: End-to-end training examples.
Google Cloud TPU performance guide: Further enhance Cloud TPU performance by adjusting Cloud TPU configuration parameters for your application.
Distributed training with TensorFlow: How to use distribution strategies, including tf.distribute.TPUStrategy, with examples showing best practices.