NumPy API on TensorFlow

Overview

TensorFlow implements a subset of the NumPy API, available as tf.experimental.numpy. This allows running NumPy code, accelerated by TensorFlow, while also giving access to all of TensorFlow's APIs.

Setup

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp
import timeit

print("Using TensorFlow version %s" % tf.__version__)

Using TensorFlow version 2.6.0

Enabling NumPy behavior

To use tnp as NumPy, enable NumPy behavior for TensorFlow:

tnp.experimental_enable_numpy_behavior()

This call enables type promotion in TensorFlow and also changes type inference, when converting literals to tensors, to follow the NumPy standard more strictly.

TensorFlow NumPy ND array

An instance of tf.experimental.numpy.ndarray, called an ND array, represents a multidimensional dense array of a given dtype placed on a certain device. It is an alias of tf.Tensor. Check out the ND array class for useful methods such as ndarray.T, ndarray.reshape, ndarray.ravel and others.

First create an ND array object, and then invoke different methods.

# Create an ND array and check out different attributes.
ones = tnp.ones([5, 3], dtype=tnp.float32)
print("Created ND array with shape = %s, rank = %s, "
      "dtype = %s on device = %s\n" % (
          ones.shape, ones.ndim, ones.dtype, ones.device))

# `ndarray` is just an alias to `tf.Tensor`.
print("Is `ones` an instance of tf.Tensor: %s\n" % isinstance(ones, tf.Tensor))

# Try commonly used member functions.
print("ndarray.T has shape %s" % str(ones.T.shape))
print("ndarray.reshape(-1) has shape %s" % ones.reshape(-1).shape)

Created ND array with shape = (5, 3), rank = 2, dtype = <dtype: 'float32'> on device = /job:localhost/replica:0/task:0/device:GPU:0

Is `ones` an instance of tf.Tensor: True

ndarray.T has shape (3, 5)
ndarray.reshape(-1) has shape (15,)

Type promotion

TensorFlow NumPy APIs have well-defined semantics for converting literals to ND arrays, as well as for performing type promotion on ND array inputs. See np.result_type for more details.

TensorFlow APIs leave tf.Tensor inputs unchanged and do not perform type promotion on them, while TensorFlow NumPy APIs promote all inputs according to NumPy type promotion rules. In the next example, you will perform type promotion. First, run addition on ND array inputs of different types and note the output types. None of these type promotions would be allowed by TensorFlow APIs.

print("Type promotion for operations")
values = [tnp.asarray(1, dtype=d) for d in
          (tnp.int32, tnp.int64, tnp.float32, tnp.float64)]
for i, v1 in enumerate(values):
  for v2 in values[i + 1:]:
    print("%s + %s => %s" % 
          (v1.dtype.name, v2.dtype.name, (v1 + v2).dtype.name))

Type promotion for operations
int32 + int64 => int64
int32 + float32 => float64
int32 + float64 => float64
int64 + float32 => float64
int64 + float64 => float64
float32 + float64 => float64
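The promotions printed above follow NumPy's own rules, which can be checked independently with np.result_type. The sketch below uses plain NumPy only; the `pairs` and `promoted` names are illustrative and not part of the original guide.

```python
import numpy as np

# Pairs of dtypes mirroring the combinations printed above.
pairs = [
    (np.int32, np.int64),
    (np.int32, np.float32),
    (np.int64, np.float32),
    (np.float32, np.float64),
]

# np.result_type returns the dtype NumPy would promote each pair to.
promoted = {
    (a.__name__, b.__name__): np.result_type(a, b).name for a, b in pairs
}

for (left, right), result in promoted.items():
    print("%s + %s => %s" % (left, right, result))
```

Note that int32 combined with float32 promotes to float64, because float32 cannot represent every int32 value exactly.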

Finally, convert literals to ND arrays using tnp.asarray and note the resulting type.

print("Type inference during array creation")
print("tnp.asarray(1).dtype == tnp.%s" % tnp.asarray(1).dtype.name)
print("tnp.asarray(1.).dtype == tnp.%s\n" % tnp.asarray(1.).dtype.name)

Type inference during array creation
tnp.asarray(1).dtype == tnp.int64
tnp.asarray(1.).dtype == tnp.float64

When converting literals to ND arrays, NumPy prefers wide types such as tnp.int64 and tnp.float64. In contrast, tf.convert_to_tensor prefers tf.int32 and tf.float32 when converting constants to tf.Tensor. TensorFlow NumPy APIs adhere to the NumPy behavior for integers. For floats, the prefer_float32 argument of experimental_enable_numpy_behavior lets you control whether to prefer tf.float32 over tf.float64 (it defaults to False). For example:

tnp.experimental_enable_numpy_behavior(prefer_float32=True)
print("When prefer_float32 is True:")
print("tnp.asarray(1.).dtype == tnp.%s" % tnp.asarray(1.).dtype.name)
print("tnp.add(1., 2.).dtype == tnp.%s" % tnp.add(1., 2.).dtype.name)

tnp.experimental_enable_numpy_behavior(prefer_float32=False)
print("When prefer_float32 is False:")
print("tnp.asarray(1.).dtype == tnp.%s" % tnp.asarray(1.).dtype.name)
print("tnp.add(1., 2.).dtype == tnp.%s" % tnp.add(1., 2.).dtype.name)

When prefer_float32 is True:
tnp.asarray(1.).dtype == tnp.float32
tnp.add(1., 2.).dtype == tnp.float32
When prefer_float32 is False:
tnp.asarray(1.).dtype == tnp.float64
tnp.add(1., 2.).dtype == tnp.float64

Broadcasting

Similar to TensorFlow, NumPy defines rich semantics for "broadcasting" values. You can check out the NumPy broadcasting guide for more information and compare this with TensorFlow broadcasting semantics.

x = tnp.ones([2, 3])
y = tnp.ones([3])
z = tnp.ones([1, 2, 1])
print("Broadcasting shapes %s, %s and %s gives shape %s" % (
    x.shape, y.shape, z.shape, (x + y + z).shape))

Broadcasting shapes (2, 3), (3,) and (1, 2, 1) gives shape (1, 2, 3)
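The resulting shape can also be predicted without allocating any arrays. The following sketch uses plain NumPy's np.broadcast_shapes (available since NumPy 1.20) on the same three shapes; the variable names are illustrative.

```python
import numpy as np

# Shapes are aligned from the trailing dimension; size-1 axes stretch.
result_shape = np.broadcast_shapes((2, 3), (3,), (1, 2, 1))
print(result_shape)

# Shapes whose trailing dimensions disagree (3 vs 2) cannot be broadcast.
try:
    np.broadcast_shapes((2, 3), (2,))
    incompatible = False
except ValueError:
    incompatible = True
print("Incompatible shapes rejected:", incompatible)
```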

Indexing

NumPy defines very sophisticated indexing rules. See the NumPy Indexing guide. Note the use of ND arrays as indices below.

x = tnp.arange(24).reshape(2, 3, 4)

print("Basic indexing")
print(x[1, tnp.newaxis, 1:3, ...], "\n")

print("Boolean indexing")
print(x[:, (True, False, True)], "\n")

print("Advanced indexing")
print(x[1, (0, 0, 1), tnp.asarray([0, 1, 1])])

Basic indexing
tf.Tensor(
[[[16 17 18 19]
  [20 21 22 23]]], shape=(1, 2, 4), dtype=int64) 

Boolean indexing
tf.Tensor(
[[[ 0  1  2  3]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [20 21 22 23]]], shape=(2, 2, 4), dtype=int64) 

Advanced indexing
tf.Tensor([12 13 17], shape=(3,), dtype=int64)

# Mutation is currently not supported
try:
  tnp.arange(6)[1] = -1
except TypeError:
  print("Currently, TensorFlow NumPy does not support mutation.")

Currently, TensorFlow NumPy does not support mutation.
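Because in-place assignment is not available, an update has to produce a new array. One possible out-of-place sketch, assuming tnp.where is an acceptable substitute for your use case, selects between a replacement value and the original elements; the names `original`, `mask` and `updated` are illustrative.

```python
import tensorflow.experimental.numpy as tnp

original = tnp.arange(6, dtype=tnp.int64)

# Boolean mask marking the position that would have been assigned to.
mask = tnp.equal(tnp.arange(6, dtype=tnp.int64), 1)

# Build a new array: -1 where the mask is set, the old value elsewhere.
updated = tnp.where(mask, -1, original)

print(original.numpy())
print(updated.numpy())
```

The original array is left untouched, which is consistent with ND arrays being immutable tf.Tensor objects.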

Example model

Next, you can see how to create a model and run inference on it. This simple model applies a relu layer followed by a linear projection. Later sections show how to compute gradients for this model using TensorFlow's GradientTape.

class Model(object):
  """Model with a dense and a linear layer."""

  def __init__(self):
    self.weights = None

  def predict(self, inputs):
    if self.weights is None:
      size = inputs.shape[1]
      # Note that type `tnp.float32` is used for performance.
      stddev = tnp.sqrt(size).astype(tnp.float32)
      w1 = tnp.random.randn(size, 64).astype(tnp.float32) / stddev
      bias = tnp.random.randn(64).astype(tnp.float32)
      w2 = tnp.random.randn(64, 2).astype(tnp.float32) / 8
      self.weights = (w1, bias, w2)
    else:
      w1, bias, w2 = self.weights
    y = tnp.matmul(inputs, w1) + bias
    y = tnp.maximum(y, 0)  # Relu
    return tnp.matmul(y, w2)  # Linear projection

model = Model()
# Create input data and compute predictions.
print(model.predict(tnp.ones([2, 32], dtype=tnp.float32)))

tf.Tensor(
[[-1.7706785  1.1137733]
 [-1.7706785  1.1137733]], shape=(2, 2), dtype=float32)

TensorFlow NumPy and NumPy

TensorFlow NumPy implements a subset of the full NumPy spec. While more symbols will be added over time, there are systematic features that will not be supported in the near future. These include NumPy C API support, Swig integration, Fortran storage order, views and stride_tricks, and some dtypes (such as np.recarray and np.object). For more details, please see the TensorFlow NumPy API Documentation.
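To make two of these unsupported features concrete, the sketch below shows, in plain NumPy, what "views" and "Fortran storage order" refer to. It only illustrates the NumPy side; the variable names are illustrative.

```python
import numpy as np

base = np.arange(6)
view = base[1:4]  # basic slicing returns a view onto the same buffer

# Writing through the view changes the base array as well.
view[0] = -1
shares = np.shares_memory(base, view)
print(base, shares)

# Fortran (column-major) storage order is a layout choice in NumPy.
fortran = np.asfortranarray(np.ones((2, 3)))
print(fortran.flags.f_contiguous, fortran.flags.c_contiguous)
```

Code that relies on either behavior, for example mutating a slice to update its parent array, would need to be restructured before porting.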

NumPy interoperability

TensorFlow ND arrays can interoperate with NumPy functions. These objects implement the __array__ interface. NumPy uses this interface to convert function arguments to np.ndarray values before processing them.

Similarly, TensorFlow NumPy functions can accept inputs of different types, including np.ndarray. These inputs are converted to an ND array by calling tnp.asarray on them.

Converting an ND array to and from np.ndarray may trigger actual data copies. Please see the section on buffer copies for more details.

# ND array passed into NumPy function.
np_sum = np.sum(tnp.ones([2, 3]))
print("sum = %s. Class: %s" % (float(np_sum), np_sum.__class__))

# `np.ndarray` passed into TensorFlow NumPy function.
tnp_sum = tnp.sum(np.ones([2, 3]))
print("sum = %s. Class: %s" % (float(tnp_sum), tnp_sum.__class__))

sum = 6.0. Class: <class 'numpy.float64'>
sum = 6.0. Class: <class 'tensorflow.python.framework.ops.EagerTensor'>
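The __array__ interface is a general NumPy protocol rather than something specific to TensorFlow. The toy class below, which is hypothetical and not part of TensorFlow or NumPy, sketches how any object exposing __array__ becomes usable as an argument to NumPy functions.

```python
import numpy as np


class Readings:
    """Hypothetical container that NumPy can convert via `__array__`."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook whenever it needs an np.ndarray.
        return np.array(self._values, dtype=dtype)


readings = Readings([1.0, 2.0, 3.0])
as_array = np.asarray(readings)
total = np.sum(readings)
print(type(as_array).__name__, float(total))
```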

# It is easy to plot ND arrays, given the __array__ interface.
labels = 15 + 2 * tnp.random.randn(1, 1000)
_ = plt.hist(labels)

[Figure: histogram of the sampled labels]

Buffer copies

Intermixing TensorFlow NumPy with NumPy code may trigger data copies. This is because TensorFlow NumPy has stricter requirements on memory alignment than NumPy.

When an np.ndarray is passed to TensorFlow NumPy, it checks the alignment requirements and triggers a copy if needed. When passing an ND array CPU buffer to NumPy, the buffer generally satisfies the alignment requirements and NumPy does not need to create a copy.

ND arrays can refer to buffers placed on devices other than local CPU memory. In such cases, invoking a NumPy function triggers copies across the network or device as needed.

Given this, intermixing with NumPy API calls should generally be done with caution, and the user should watch out for the overhead of copying data. Interleaving TensorFlow NumPy calls with TensorFlow calls is generally safe and avoids copying data. See the section on TensorFlow interoperability for more details.
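As a rough illustration of what an alignment requirement means, the sketch below checks whether a NumPy buffer starts on a given byte boundary. The 64-byte value and the helper `is_aligned` are assumptions made for this example; the guide does not state the exact alignment TensorFlow requires, and this is not how TensorFlow itself performs the check.

```python
import numpy as np

# Assumed boundary for illustration only, not a documented TensorFlow constant.
ASSUMED_ALIGNMENT = 64


def is_aligned(array, alignment=ASSUMED_ALIGNMENT):
    """Returns True if the array's first byte lies on an `alignment` boundary."""
    return array.ctypes.data % alignment == 0


# Over-allocate raw bytes so an aligned starting offset can be chosen by hand.
raw = np.zeros(1024 + ASSUMED_ALIGNMENT, dtype=np.uint8)
start = (-raw.ctypes.data) % ASSUMED_ALIGNMENT

aligned = raw[start:start + 1024].view(np.float32)
shifted = raw[start + 4:start + 1024].view(np.float32)  # starts 4 bytes later

print(is_aligned(aligned), is_aligned(shifted))
```

A buffer like `shifted` is a valid NumPy array, but a consumer with stricter alignment rules would have to copy it first.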

Operator precedence

TensorFlow NumPy defines an __array_priority__ higher than NumPy's. This means that for operators involving both an ND array and an np.ndarray, the former takes precedence: the np.ndarray input is converted to an ND array and the TensorFlow NumPy implementation of the operator is invoked.

x = tnp.ones([2]) + np.ones([2])
print("x = %s\nclass = %s" % (x, x.__class__))

x = tf.Tensor([2. 2.], shape=(2,), dtype=float64)
class = <class 'tensorflow.python.framework.ops.EagerTensor'>
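The same mechanism can be observed with plain NumPy and a small wrapper class. The `Wrapped` class below is hypothetical and exists only to show how a higher __array_priority__ makes np.ndarray defer to the other operand, even when the np.ndarray is on the left-hand side.

```python
import numpy as np


class Wrapped:
    """Hypothetical array wrapper that outranks np.ndarray in operators."""

    __array_priority__ = 1000  # np.ndarray uses a priority of 0.0

    def __init__(self, values):
        self.values = np.asarray(values, dtype=np.float64)

    def __add__(self, other):
        return Wrapped(self.values + np.asarray(other))

    def __radd__(self, other):
        # Reached because np.ndarray.__add__ returns NotImplemented
        # when the right operand has a higher __array_priority__.
        return Wrapped(np.asarray(other) + self.values)


result = np.ones([2]) + Wrapped([1.0, 2.0])
print(type(result).__name__, result.values)
```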

TF NumPy and TensorFlow

TensorFlow NumPy is built on top of TensorFlow and hence interoperates seamlessly with TensorFlow.

`tf.Tensor` and ND array

An ND array is an alias of tf.Tensor, so the two can be intermixed without triggering actual data copies.

x = tf.constant([1, 2])
print(x)

# `asarray` and `convert_to_tensor` here are no-ops.
tnp_x = tnp.asarray(x)
print(tnp_x)
print(tf.convert_to_tensor(tnp_x))

# Note that tf.Tensor.numpy() will continue to return `np.ndarray`.
print(x.numpy(), x.numpy().__class__)

tf.Tensor([1 2], shape=(2,), dtype=int32)
tf.Tensor([1 2], shape=(2,), dtype=int32)
tf.Tensor([1 2], shape=(2,), dtype=int32)
[1 2] <class 'numpy.ndarray'>

TensorFlow interoperability

An ND array can be passed to TensorFlow APIs, since an ND array is just an alias of tf.Tensor. As mentioned earlier, such interoperation does not copy data, even for data placed on accelerators or remote devices.

Conversely, tf.Tensor objects can be passed to tf.experimental.numpy APIs without copying data.

# ND array passed into TensorFlow function.
tf_sum = tf.reduce_sum(tnp.ones([2, 3], tnp.float32))
print("Output = %s" % tf_sum)

# `tf.Tensor` passed into TensorFlow NumPy function.
tnp_sum = tnp.sum(tf.ones([2, 3]))
print("Output = %s" % tnp_sum)

Output = tf.Tensor(6.0, shape=(), dtype=float32)
Output = tf.Tensor(6.0, shape=(), dtype=float32)

Gradients and Jacobians: tf.GradientTape

TensorFlow's GradientTape can be used for backpropagation through TensorFlow and TensorFlow NumPy code.

Use the model created in the Example model section, and compute gradients and jacobians.

def create_batch(batch_size=32):
  """Creates a batch of input and labels."""
  return (tnp.random.randn(batch_size, 32).astype(tnp.float32),
          tnp.random.randn(batch_size, 2).astype(tnp.float32))

def compute_gradients(model, inputs, labels):
  """Computes gradients of squared loss between model prediction and labels."""
  with tf.GradientTape() as tape:
    assert model.weights is not None
    # Note that `model.weights` need to be explicitly watched since they
    # are not tf.Variables.
    tape.watch(model.weights)
    # Compute prediction and loss
    prediction = model.predict(inputs)
    loss = tnp.sum(tnp.square(prediction - labels))
  # This call computes the gradient through the computation above.
  return tape.gradient(loss, model.weights)

inputs, labels = create_batch()
gradients = compute_gradients(model, inputs, labels)

# Inspect the shapes of returned gradients to verify they match the
# parameter shapes.
print("Parameter shapes:", [w.shape for w in model.weights])
print("Gradient shapes:", [g.shape for g in gradients])
# Verify that gradients are of type ND array.
assert isinstance(gradients[0], tnp.ndarray)

Parameter shapes: [TensorShape([32, 64]), TensorShape([64]), TensorShape([64, 2])]
Gradient shapes: [TensorShape([32, 64]), TensorShape([64]), TensorShape([64, 2])]

# Computes a batch of jacobians. Each row is the jacobian of an element in the
# batch of outputs w.r.t. the corresponding input batch element.
def prediction_batch_jacobian(inputs):
  with tf.GradientTape() as tape:
    tape.watch(inputs)
    prediction = model.predict(inputs)
  return prediction, tape.batch_jacobian(prediction, inputs)

inp_batch = tnp.ones([16, 32], tnp.float32)
output, batch_jacobian = prediction_batch_jacobian(inp_batch)
# Note how the batch jacobian shape relates to the input and output shapes.
print("Output shape: %s, input shape: %s" % (output.shape, inp_batch.shape))
print("Batch jacobian shape:", batch_jacobian.shape)

Output shape: (16, 2), input shape: (16, 32)
Batch jacobian shape: (16, 2, 32)

Trace compilation: tf.function

TensorFlow's tf.function works by "trace compiling" the code and then optimizing these traces for much faster performance. See the Introduction to Graphs and Functions.

tf.function can be used to optimize TensorFlow NumPy code as well. Here is a simple example to demonstrate the speedups. Note that the body of the tf.function code includes calls to TensorFlow NumPy APIs.

inputs, labels = create_batch(512)
print("Eager performance")
compute_gradients(model, inputs, labels)
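# timeit returns the total seconds for 10 calls; multiplying by 100 gives
# the average time per call in milliseconds.
print(timeit.timeit(lambda: compute_gradients(model, inputs, labels),
                    number=10) * 100, "ms")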

print("\ntf.function compiled performance")
compiled_compute_gradients = tf.function(compute_gradients)
compiled_compute_gradients(model, inputs, labels)  # warmup
print(timeit.timeit(lambda: compiled_compute_gradients(model, inputs, labels),
                    number=10) * 100, "ms")

Eager performance
1.291419400013183 ms

tf.function compiled performance
0.5561202000080812 ms
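Part of the speedup comes from the Python function body being traced once and then reused as a graph. The sketch below makes that visible with a Python side effect that only runs during tracing; `scaled_square` and `trace_log` are illustrative names, not part of the guide's model code.

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

trace_log = []  # Python list, appended to only while the function is traced


@tf.function
def scaled_square(x):
    trace_log.append("traced")  # Python side effect: skipped on graph reuse
    return 2.0 * tnp.square(x)


# Two calls with the same input shape and dtype share a single trace.
first = scaled_square(tnp.asarray([1.0, 2.0], dtype=tnp.float32))
second = scaled_square(tnp.asarray([3.0, 4.0], dtype=tnp.float32))

print("Number of traces:", len(trace_log))
print(first.numpy(), second.numpy())
```

Calling the function with a different input shape or dtype would typically trigger an additional trace.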

Vectorization: tf.vectorized_map

TensorFlow has built-in support for vectorizing parallel loops, which allows speedups of one to two orders of magnitude. These speedups are accessible via the tf.vectorized_map API and apply to TensorFlow NumPy code as well.

It is sometimes useful to compute the gradient of each output in a batch with respect to the corresponding input batch element. Such a computation can be done efficiently using tf.vectorized_map, as shown below.

@tf.function
def vectorized_per_example_gradients(inputs, labels):
  def single_example_gradient(arg):
    inp, label = arg
    return compute_gradients(model,
                             tnp.expand_dims(inp, 0),
                             tnp.expand_dims(label, 0))
  # Note that a call to `tf.vectorized_map` semantically maps
  # `single_example_gradient` over each row of `inputs` and `labels`.
  # The interface is similar to `tf.map_fn`.
  # The underlying machinery vectorizes away this map loop which gives
  # nice speedups.
  return tf.vectorized_map(single_example_gradient, (inputs, labels))

batch_size = 128
inputs, labels = create_batch(batch_size)

per_example_gradients = vectorized_per_example_gradients(inputs, labels)
for w, p in zip(model.weights, per_example_gradients):
  print("Weight shape: %s, batch size: %s, per example gradient shape: %s " % (
      w.shape, batch_size, p.shape))

Weight shape: (32, 64), batch size: 128, per example gradient shape: (128, 32, 64) 
Weight shape: (64,), batch size: 128, per example gradient shape: (128, 64) 
Weight shape: (64, 2), batch size: 128, per example gradient shape: (128, 64, 2)

# Benchmark the vectorized computation above and compare with
# unvectorized sequential computation using `tf.map_fn`.
@tf.function
def unvectorized_per_example_gradients(inputs, labels):
  def single_example_gradient(arg):
    inp, label = arg
    return compute_gradients(model,
                             tnp.expand_dims(inp, 0),
                             tnp.expand_dims(label, 0))

  return tf.map_fn(single_example_gradient, (inputs, labels),
                   fn_output_signature=(tf.float32, tf.float32, tf.float32))

print("Running vectorized computation")
print(timeit.timeit(lambda: vectorized_per_example_gradients(inputs, labels),
                    number=10) * 100, "ms")

print("\nRunning unvectorized computation")
per_example_gradients = unvectorized_per_example_gradients(inputs, labels)
print(timeit.timeit(lambda: unvectorized_per_example_gradients(inputs, labels),
                    number=10) * 100, "ms")

Running vectorized computation
0.5265710999992734 ms

Running unvectorized computation
40.35122630002661 ms

Device placement

TensorFlow NumPy can place operations on CPUs, GPUs, TPUs and remote devices. It uses standard TensorFlow mechanisms for device placement. Below is a simple example showing how to list all devices and then place some computation on a particular device.

TensorFlow also has APIs for replicating computation across devices and performing collective reductions, which are not covered here.

Listing devices

tf.config.list_logical_devices and tf.config.list_physical_devices can be used to find out which devices to use.

print("All logical devices:", tf.config.list_logical_devices())
print("All physical devices:", tf.config.list_physical_devices())

# Try to get the GPU device. If unavailable, fallback to CPU.
try:
  device = tf.config.list_logical_devices(device_type="GPU")[0]
except IndexError:
  device = "/device:CPU:0"

All logical devices: [LogicalDevice(name='/device:CPU:0', device_type='CPU'), LogicalDevice(name='/device:GPU:0', device_type='GPU')]
All physical devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Placing operations: `tf.device`

Operations can be placed on a device by calling them within a tf.device scope.

print("Using device: %s" % str(device))
# Run operations in the `tf.device` scope.
# If a GPU is available, these operations execute on the GPU and outputs are
# placed on the GPU memory.
with tf.device(device):
  prediction = model.predict(create_batch(5)[0])

print("prediction is placed on %s" % prediction.device)

Using device: LogicalDevice(name='/device:GPU:0', device_type='GPU')
prediction is placed on /job:localhost/replica:0/task:0/device:GPU:0

Copying ND arrays across devices: `tnp.copy`

A call to tnp.copy, placed in a certain device scope, copies the data to that device, unless the data is already on that device.

with tf.device("/device:CPU:0"):
  prediction_cpu = tnp.copy(prediction)
print(prediction.device)
print(prediction_cpu.device)

/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:CPU:0

Performance comparisons

TensorFlow NumPy uses highly optimized TensorFlow kernels that can be dispatched on CPUs, GPUs and TPUs. TensorFlow also performs many compiler optimizations, such as operation fusion, which translate into performance and memory improvements. See TensorFlow graph optimization with Grappler to learn more.

However, TensorFlow has higher overheads for dispatching operations than NumPy. For workloads composed of small operations (less than about 10 microseconds), these overheads can dominate the runtime and NumPy could provide better performance. For other cases, TensorFlow should generally provide better performance.

Run the benchmark below to compare NumPy and TensorFlow NumPy performance for different input sizes.

def benchmark(f, inputs, number=30, force_gpu_sync=False):
  """Utility to benchmark `f` on each value in `inputs`."""
  times = []
  for inp in inputs:
    def _g():
      if force_gpu_sync:
        one = tnp.asarray(1)
      f(inp)
      if force_gpu_sync:
        with tf.device("CPU:0"):
          tnp.copy(one)  # Force a sync for GPU case

    _g()  # warmup
    t = timeit.timeit(_g, number=number)
    times.append(t * 1000. / number)
  return times


def plot(np_times, tnp_times, compiled_tnp_times, has_gpu, tnp_times_gpu):
  """Plot the different runtimes."""
  plt.xlabel("size")
  plt.ylabel("time (ms)")
  plt.title("Sigmoid benchmark: TF NumPy vs NumPy")
  plt.plot(sizes, np_times, label="NumPy")
  plt.plot(sizes, tnp_times, label="TF NumPy (CPU)")
  plt.plot(sizes, compiled_tnp_times, label="Compiled TF NumPy (CPU)")
  if has_gpu:
    plt.plot(sizes, tnp_times_gpu, label="TF NumPy (GPU)")
  plt.legend()

# Define a simple implementation of `sigmoid`, and benchmark it using
# NumPy and TensorFlow NumPy for different input sizes.

def np_sigmoid(y):
  return 1. / (1. + np.exp(-y))

def tnp_sigmoid(y):
  return 1. / (1. + tnp.exp(-y))

@tf.function
def compiled_tnp_sigmoid(y):
  return tnp_sigmoid(y)

sizes = (2 ** 0, 2 ** 5, 2 ** 10, 2 ** 15, 2 ** 20)
np_inputs = [np.random.randn(size).astype(np.float32) for size in sizes]
np_times = benchmark(np_sigmoid, np_inputs)

with tf.device("/device:CPU:0"):
  tnp_inputs = [tnp.random.randn(size).astype(np.float32) for size in sizes]
  tnp_times = benchmark(tnp_sigmoid, tnp_inputs)
  compiled_tnp_times = benchmark(compiled_tnp_sigmoid, tnp_inputs)

has_gpu = len(tf.config.list_logical_devices("GPU"))
if has_gpu:
  with tf.device("/device:GPU:0"):
    tnp_inputs = [tnp.random.randn(size).astype(np.float32) for size in sizes]
    tnp_times_gpu = benchmark(compiled_tnp_sigmoid, tnp_inputs, 100, True)
else:
  tnp_times_gpu = None
plot(np_times, tnp_times, compiled_tnp_times, has_gpu, tnp_times_gpu)

[Figure: Sigmoid benchmark, TF NumPy vs NumPy; x-axis: size, y-axis: time (ms)]