GPU を使用する

コレクションでコンテンツを整理 必要に応じて、コンテンツの保存と分類を行います。

TensorFlow.org で表示 Google Colab で実行 GitHub でソースを表示 ノートブックをダウンロード

TensorFlow のコードとtf.kerasモデルは、コードを変更することなく単一の GPU で透過的に実行されます。

注意: tf.config.list_physical_devices('GPU')を使用して、TensorFlow が GPU を使用していることを確認してください。

単一または複数のマシンで複数の GPU を実行する最も簡単な方法は、分散ストラテジーの使用です。

このガイドは、これらのアプローチを試し、TensorFlow の GPU 使用方法を細かく制御する必要があることに気づいたユーザーを対象としています。シングル GPU とマルチ GPU のシナリオにおいてパフォーマンスをデバッグする方法については、「TensorFlow GPU パフォーマンスを最適化する」ガイドをご覧ください。

セットアップ

最新リリースの TensorFlow GPU がインストールされていることを確認します。

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2022-08-09 06:13:50.756704: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-08-09 06:13:51.429770: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvrtc.so.11.1: cannot open shared object file: No such file or directory
2022-08-09 06:13:51.429990: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvrtc.so.11.1: cannot open shared object file: No such file or directory
2022-08-09 06:13:51.430001: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Num GPUs Available:  4

概要

TensorFlow は、CPU や GPU など、さまざまなタイプのデバイスでの計算の実行をサポートしています。それらは、例えば文字列の識別子で表されています。

  • "/device:CPU:0": マシンの CPU。
  • "/GPU:0": TensorFlow が認識するマシンの最初の GPU の簡略表記。
  • "/job:localhost/replica:0/task:0/device:GPU:1": TensorFlow が認識するマシンの 2 番目の GPU の完全修飾名。

TensorFlow 演算に CPU と GPU 実装の両方を持つ場合、演算がデバイスに割り当てられるときに、デフォルトでは GPU デバイスが優先されます。たとえば、tf.matmul は CPU と GPU カーネルの両方を持ちます。デバイス CPU:0GPU:0 を持つシステム上では、それを他のデバイス上で実行することを明示的に要求しない限りは、GPU:0 デバイスが tf.matmul の実行に選択されます。

TensorFlow 演算に対応する GPU 実装がない場合は、演算は CPU デバイスにフォールバックします。たとえば、tf.cast には CPU カーネルしかないため、デバイス CPU:0GPU:0 のあるシステムでは、CPU:0 デバイスが tf.cast の実行に選択され、GPU:0 デバイスでの実行をリクエストされても無視されます。

デバイスの配置をログに記録する

演算と tensor がどのデバイスに割り当てられたかを確認するには、tf.debugging.set_log_device_placement(True)をプログラムの最初のステートメントとして置きます。デバイス配置ログを有効にすると、テンソルの割り当てや演算が出力されます。

tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

上記のコードの出力は、GPU:0MatMul演算が実行されたことを示します。

手動でデバイスを配置する

自動的に選択されたものの代わりに、自分が選択したデバイスで特定の演算を実行する場合は、デバイスコンテキストの作成にwith tf.deviceを使用すると、そのコンテキスト内のすべての演算は同じ指定されたデバイス上で実行されます。

tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

c = tf.matmul(a, b)
print(c)
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

これでabCPU:0に割り当てられたことが分かります。MatMul演算に対してデバイスが明示的に指定されていないため、TensorFlow ランタイムは演算と利用可能なデバイスに基づいてデバイスを選択し(この例ではGPU:0)、必要に応じて自動的にデバイス間でテンソルをコピーします。

GPU のメモリ増加を制限する

デフォルトでは、TensorFlow は(CUDA_VISIBLE_DEVICESに従い)プロセスが認識する全 GPU の ほぼ全てのGPU メモリをマップします。これはメモリの断片化を減らしてデバイス上のかなり貴重な GPU メモリリソースをより効率的に使用するために行われます。TensorFlow を特定の GPU セットに制限するには、tf.config.set_visible_devicesメソッドを使用します。

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only use the first GPU
  try:
    tf.config.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)
Visible devices cannot be modified after being initialized

場合によりますが、プロセスが使用可能なメモリのサブセットのみを割り当てるか、またはプロセスに必要とされるメモリ使用量のみを増やすことが望ましいです。TensorFlow は、これを制御する 2 つのメソッドを提供します。

最初のオプションは、tf.config.experimental.set_memory_growth を呼び出してメモリ増大を有効にすることです。これはランタイムの割り当てに必要な GPU メモリだけを割り当てようと試みます。非常に小さいメモリの割り当てから始め、プログラムが実行されてより多くの GPU メモリが必要になってくるにつれて、TensorFlow プロセスに割り当てられる GPU メモリ領域を拡張します。メモリの断面化につながる可能性があるため、メモリは解放されません。特定の GPU のメモリ増大を有効にするためには、テンソルの割り当てや演算の実行の前に次のコードを使用してください。

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)
Physical devices cannot be modified after being initialized

このオプションを有効にするもう 1 つの方法は、環境変数TF_FORCE_GPU_ALLOW_GROWTHtrueに設定するというものです。この構成はプラットフォーム固有です。

2 番目のメソッドは、tf.config.set_logical_device_configurartionで仮想 GPU デバイスを構成し、GPU に割り当てられるメモリ総量を固定することです。

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
Virtual devices cannot be modified after being initialized

これは TensorFlow プロセスに利用可能な GPU メモリの量を正確に抑制する場合に有用です。これは GPU がワークステーション GUI などの他のアプリケーションと共有されている場合のローカル開発では一般的な方法です。

マルチ GPU システムで単一 GPU を使用する

システムに 2 つ以上の GPU が搭載されている場合、デフォルトでは最小の ID を持つ GPU が選択されます。異なる GPU 上で実行する場合には、その選択を明示的に指定する必要があります。

tf.debugging.set_log_device_placement(True)

try:
  # Specify an invalid GPU device
  with tf.device('/device:GPU:2'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
except RuntimeError as e:
  print(e)
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:2

指定したデバイスが存在しない場合には、RuntimeError: .../device:GPU:2 unknown deviceが表示されます。

指定されたデバイスが存在しない場合に TensorFlow に既に演算の実行をサポートしているデバイスを自動的に選択させたければ、tf.config.set_soft_device_placement(True)を呼び出すことができます。

tf.config.set_soft_device_placement(True)
tf.debugging.set_log_device_placement(True)

# Creates some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

マルチ GPU を使用する

マルチ GPU 向けの開発は、追加リソースでのモデルのスケーリングを可能にします。単一 GPU システム上で開発している場合は、仮想デバイスで複数の GPU をシミュレートできます。これにより、追加のリソースがなくてもマルチ GPU セットアップの簡単なテストが可能になります。

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Create 2 virtual GPUs with 1GB memory each
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
Virtual devices cannot be modified after being initialized

ランタイムで利用可能な複数の論理 GPU を取得したら、tf.distribute.Strategy または手動配置でマルチ GPU を利用できます。

tf.distribute.Strategyを使用する

マルチ GPU を使用するベストプラクティスは、tf.distribute.Strategyを使用することです。次に単純な例を示します。

tf.debugging.set_log_device_placement(True)
gpus = tf.config.list_logical_devices('GPU')
strategy = tf.distribute.MirroredStrategy(gpus)
with strategy.scope():
  inputs = tf.keras.layers.Input(shape=(1,))
  predictions = tf.keras.layers.Dense(1)(inputs)
  model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
  model.compile(loss='mse',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op RandomUniform in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Sub in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Mul in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AddV2 in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
a: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
b: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
product_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:2
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:2
a: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
b: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:2
product_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:2
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:0
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:1
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:1
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:1
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:1
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:1
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:2
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:2
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:2
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:3
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:3
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:3
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:3
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:3
input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
shape: (_DeviceArg): /job:localhost/replica:0/task:0/device:CPU:0
RandomUniform: (RandomUniform): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
x: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
y: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
Sub: (Sub): /job:localhost/replica:0/task:0/device:GPU:0
z_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
x: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
y: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
Mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:0
z_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
x: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
y: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
AddV2: (AddV2): /job:localhost/replica:0/task:0/device:GPU:0
z_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:0
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
value: (_Arg): /job:localhost/replica:0/taExecuting op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
sk:0/device:GPU:0
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
value_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
Identity: (Identity): /job:localhost/replica:0/task:0/device:GPU:1
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:1
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:1
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:1
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:1
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
Identity: (Identity): /job:localhost/replica:0/task:0/device:GPU:2
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:2
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:2
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:2
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:2
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
Identity: (Identity): /job:localhost/replica:0/task:0/device:GPU:3
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:3
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:3
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:3
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:3
NoOp: (NoOp): /job:localhost/replica:0/task:0/device:GPU:0
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
Identity: (Identity): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
dims: (_DeviceArg): /job:localhost/replica:0/task:0/device:CPU:0
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
Fill: (Fill): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:0
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
value_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:1
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:1
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:1
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:2
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:2
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:2
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:3
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:3
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:3
iExecuting op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AddN in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AddN in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AddN in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AddN in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0

このプログラムは各 GPU でモデルのコピーを実行し、それらの間で入力データを分割します。これは、「データの並列処理」としても知られています。

分散ストラテジーの詳細については、こちらのガイドをご覧ください。

手動で配置する

tf.distribute.Strategyは、内部的には複数のデバイスに渡って計算を複製することによって動作しています。各 GPU 上でモデルを構築することにより、複製を手動で実装することができます。例を示します。

tf.debugging.set_log_device_placement(True)

gpus = tf.config.list_logical_devices('GPU')
if gpus:
  # Replicate your computation on multiple GPUs
  c = []
  for gpu in gpus:
    with tf.device(gpu.name):
      a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
      b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
      c.append(tf.matmul(a, b))

  with tf.device('/CPU:0'):
    matmul_sum = tf.add_n(c)

  print(matmul_sum)
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:2
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:3
nput: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:0
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
value_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
Identity: (Identity): /job:localhost/replica:0/task:0/device:GPU:1
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:1
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:1
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:1
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:1
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
Identity: (Identity): /job:localhost/replica:0/task:0/device:GPU:2
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:2
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:2
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:2
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:2
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
Identity: (Identity): /job:localhost/replica:0/task:0/device:GPU:3
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:3
resource_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:3
VarHandleOp: (VarHandleOp): /job:localhost/replica:0/task:0/device:GPU:3
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
value: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:3
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
value_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:1
value_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:1
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:2
ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:2
value_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:2
resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:3
ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:3
value_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:3
inputs_0: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
inputs_1: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
inputs_2: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
inputs_3: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
AddN: (AddN): /job:localhost/replica:0/task:0/device:CPU:0
sum_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
Identity: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
a: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
b: (_Arg): /job:localhost/replica:0/task:0/device:GPU:1
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:1
product_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:1
a: (_ArExecuting op MatMul in device /job:localhost/replica:0/task:0/device:GPU:3
Executing op AddN in device /job:localhost/replica:0/task:0/device:CPU:0
tf.Tensor(
[[ 88. 112.]
 [196. 256.]], shape=(2, 2), dtype=float32)