Use a GPU

TensorFlow code and tf.keras models run transparently on a single GPU with no code changes required.

The simplest way to run on multiple GPUs, on one or many machines, is to use Distribution Strategies.

This guide is for users who have tried these approaches and found that they need fine-grained control over how TensorFlow uses the GPU. To learn how to debug performance issues in single-GPU and multi-GPU scenarios, see the Optimize TensorFlow GPU Performance guide.

Setup

Ensure you have the latest TensorFlow gpu release installed.

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  1

Overview

TensorFlow supports running computations on a variety of device types, including CPU and GPU. Devices are represented by string identifiers, for example:

"/device:CPU:0" : The CPU of your machine.
"/GPU:0" : Short-hand notation for the first GPU of your machine that is visible to TensorFlow.
"/job:localhost/replica:0/task:0/device:GPU:1" : Fully qualified name of the second GPU of your machine that is visible to TensorFlow.

If a TensorFlow operation has both CPU and GPU implementations, the GPU device is prioritized by default when the operation is assigned. For example, tf.matmul has both CPU and GPU kernels, and on a system with devices CPU:0 and GPU:0, the GPU:0 device is selected to run tf.matmul unless you explicitly request that it run on another device.

If a TensorFlow operation has no corresponding GPU implementation, the operation falls back to the CPU device. For example, since tf.cast only has a CPU kernel, on a system with devices CPU:0 and GPU:0, the CPU:0 device is selected to run tf.cast, even if it is requested to run on the GPU:0 device.
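
The device name strings listed in this overview can be taken apart programmatically. The following is a minimal sketch, assuming a TensorFlow 2.x install where tf.DeviceSpec.from_string is available; it only parses the strings and does not require the named devices to exist.

```python
import tensorflow as tf

# Parse the fully qualified name into its components.
full = tf.DeviceSpec.from_string("/job:localhost/replica:0/task:0/device:GPU:1")
print("job:", full.job)
print("device type:", full.device_type)
print("device index:", full.device_index)

# The short-hand form carries only the device type and index;
# the job, replica and task fields are left unset.
short = tf.DeviceSpec.from_string("/GPU:0")
print("short form:", short.device_type, short.device_index, short.job)
```

Parsing a name this way can be handy when filtering log lines or grouping devices by type, since it avoids hand-written string splitting.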

Logging device placement

To find out which devices your operations and tensors are assigned to, put tf.debugging.set_log_device_placement(True) as the first statement of your program. Enabling device placement logging causes any tensor allocations or operations to be printed.

tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

The code above prints an indication that the MatMul op was executed on GPU:0.
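
As a complement to placement logging, eager tensors expose a device attribute that can be inspected directly. This is a small sketch of that check, not part of the original guide; the printed device depends on the hardware visible to TensorFlow (GPU:0 if a GPU is available, otherwise CPU:0).

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

# Each eager tensor records the device that holds its memory.
print("a lives on:", a.device)
print("c lives on:", c.device)
```

Checking the attribute is quieter than enabling logging for the whole program, which makes it convenient for spot checks on a single tensor.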

Manual device placement

If you would like a particular operation to run on a device of your choice instead of the one selected automatically, you can use with tf.device to create a device context. All operations within that context run on the same designated device.

tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Run on the GPU
c = tf.matmul(a, b)
print(c)

Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

You will see that a and b are now assigned to CPU:0. Since a device was not explicitly specified for the MatMul operation, the TensorFlow runtime chooses one based on the operation and the available devices (GPU:0 in this example) and automatically copies tensors between devices if required.
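
To keep the multiplication itself off the GPU, the op can be placed inside the device context as well. The sketch below is a variation on the example above under that assumption; it pins MatMul to CPU:0 and confirms the result's placement through the tensor's device attribute.

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Pin the operation (not just its inputs) to the CPU.
with tf.device('/CPU:0'):
    c = tf.matmul(a, b)

# The result is held by the device that executed the op.
print("c lives on:", c.device)
```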

Limiting GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation. To limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method.

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only use the first GPU
  try:
    tf.config.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)

1 Physical GPUs, 1 Logical GPU
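
The same call can hide every GPU, which is one way to force a CPU-only run for debugging. This is a minimal sketch, assuming it executes before any tensor or op has initialized the GPUs; passing an empty list is the only change from the example above.

```python
import tensorflow as tf

# Must run before the GPUs are initialized, otherwise a RuntimeError is raised.
tf.config.set_visible_devices([], 'GPU')

visible_gpus = tf.config.get_visible_devices('GPU')
logical_gpus = tf.config.list_logical_devices('GPU')
print(len(visible_gpus), "visible GPUs,", len(logical_gpus), "logical GPUs")
```

The physical GPUs are still reported by tf.config.list_physical_devices('GPU'); they are simply not exposed to the runtime.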

In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two methods to control this.

The first option is to turn on memory growth by calling tf.config.experimental.set_memory_growth, which attempts to allocate only as much GPU memory as is needed for the runtime allocations. It starts out allocating very little memory, and as the program runs and needs more GPU memory, the GPU memory region is extended for the TensorFlow process. Memory is not released, since releasing it can lead to memory fragmentation. To turn on memory growth for a specific GPU, use the following code before allocating any tensors or executing any ops.

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

Physical devices cannot be modified after being initialized

Another way to enable this option is to set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform specific.
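
The variable is usually exported in the shell that launches the program, but it can also be set from Python. The sketch below assumes the assignment happens before TensorFlow initializes the GPUs; setting it ahead of the import is the conservative choice.

```python
import os

# Set the variable before TensorFlow is imported so that it is already
# in place when the GPUs are initialized.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf  # noqa: E402 (deliberately imported after the env setup)

print("TF_FORCE_GPU_ALLOW_GROWTH =", os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
print("Physical GPUs:", len(tf.config.list_physical_devices('GPU')))
```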

The second method is to configure a virtual GPU device with tf.config.set_logical_device_configuration and set a hard limit on the total memory to allocate on the GPU.

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Virtual devices cannot be modified after being initialized

This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process. It is common practice for local development when the GPU is shared with other applications, such as a workstation GUI.

Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly:

tf.debugging.set_log_device_placement(True)

try:
  # Specify an invalid GPU device
  with tf.device('/device:GPU:2'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
except RuntimeError as e:
  print(e)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0

If the device you have specified does not exist, you will get a RuntimeError: .../device:GPU:2 unknown device.

If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, you can call tf.config.set_soft_device_placement(True).

tf.config.set_soft_device_placement(True)
tf.debugging.set_log_device_placement(True)

# Creates some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
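
To see soft placement act on an explicit request, the op can be wrapped in a device context. This is a sketch under the assumption that GPU:0 may or may not exist on the machine running it; with soft placement enabled, the MatMul is expected to fall back to an available device rather than fail. The input tensors are created outside the context so that only the op placement is affected.

```python
import tensorflow as tf

tf.config.set_soft_device_placement(True)
print("Soft placement enabled:", tf.config.get_soft_device_placement())

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Request GPU:0 explicitly; if it is absent, soft placement lets
# TensorFlow choose a supported device instead of raising an error.
with tf.device('/GPU:0'):
    c = tf.matmul(a, b)

print("MatMul result lives on:", c.device)
```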

Using multiple GPUs

Developing for multiple GPUs allows a model to scale with the additional resources. If you are developing on a system with a single GPU, you can simulate multiple GPUs with virtual devices. This makes it easy to test multi-GPU setups without requiring additional resources.

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Create 2 virtual GPUs with 1GB memory each
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Virtual devices cannot be modified after being initialized
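
On a machine with no GPU at all, the same API can split the CPU into several logical devices, which is enough to exercise multi-device code paths. This is a minimal sketch going slightly beyond the GPU case shown above; it assumes the call is made before the runtime has been initialized.

```python
import tensorflow as tf

cpus = tf.config.list_physical_devices('CPU')

# Split the single physical CPU into two logical devices.
# Must run before the runtime is initialized.
tf.config.set_logical_device_configuration(
    cpus[0],
    [tf.config.LogicalDeviceConfiguration(),
     tf.config.LogicalDeviceConfiguration()])

logical_cpus = tf.config.list_logical_devices('CPU')
print(len(cpus), "Physical CPU,", len(logical_cpus), "Logical CPUs")
```

No memory_limit is given here because the limit applies to GPU memory; the two logical CPUs share the host memory.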

Once multiple logical GPUs are available to the runtime, you can use them with tf.distribute.Strategy or with manual placement.

With tf.distribute.Strategy

The best practice for using multiple GPUs is to use tf.distribute.Strategy. Here is a simple example:

tf.debugging.set_log_device_placement(True)
gpus = tf.config.list_logical_devices('GPU')
strategy = tf.distribute.MirroredStrategy(gpus)
with strategy.scope():
  inputs = tf.keras.layers.Input(shape=(1,))
  predictions = tf.keras.layers.Dense(1)(inputs)
  model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
  model.compile(loss='mse',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op RandomUniform in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Sub in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Mul in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AddV2 in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0

This program runs a copy of your model on each GPU and splits the input data between them, an approach also known as "data parallelism".
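
Because each replica receives a share of every batch, the batch size is often scaled by the number of replicas. The following is a small sketch of that arithmetic; the per-replica batch size of 64 is a hypothetical value chosen for illustration, and on a machine without GPUs the strategy falls back to a single replica.

```python
import tensorflow as tf

# With no arguments, MirroredStrategy uses all GPUs visible to TensorFlow.
strategy = tf.distribute.MirroredStrategy()

# Hypothetical per-replica batch size, for illustration only.
per_replica_batch_size = 64
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

print("Replicas in sync:", strategy.num_replicas_in_sync)
print("Global batch size:", global_batch_size)
```

Datasets passed to a model built under strategy.scope() are typically batched with the global batch size, so each replica ends up processing the per-replica amount.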

For more information about distribution strategies, see the Distribution Strategies guide.

Manual placement

tf.distribute.Strategy works under the hood by replicating computation across devices. You can implement replication manually by constructing your model on each GPU. For example:

tf.debugging.set_log_device_placement(True)

gpus = tf.config.list_logical_devices('GPU')
if gpus:
  # Replicate your computation on multiple GPUs
  c = []
  for gpu in gpus:
    with tf.device(gpu.name):
      a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
      b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
      c.append(tf.matmul(a, b))

  with tf.device('/CPU:0'):
    matmul_sum = tf.add_n(c)

  print(matmul_sum)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
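
A quick way to check manual replication is to compare the summed result against the single-device product scaled by the number of devices. This sketch follows the example above with one added assumption: when no GPU is present, it falls back to the logical CPU devices so that it still runs.

```python
import tensorflow as tf

# Use the logical GPUs if there are any; otherwise fall back to the CPU.
devices = (tf.config.list_logical_devices('GPU')
           or tf.config.list_logical_devices('CPU'))

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Run the same computation once per device.
partials = []
for device in devices:
    with tf.device(device.name):
        partials.append(tf.matmul(a, b))

# Gather and sum the per-device results on the CPU.
with tf.device('/CPU:0'):
    total = tf.add_n(partials)

# Every replica computes the same product, so the sum is n times one product.
expected = tf.matmul(a, b) * float(len(devices))
print("Devices used:", len(devices))
print("Sum matches expectation:", bool(tf.reduce_all(total == expected)))
```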
