
Load NumPy data with tf.data



This tutorial provides an example of loading data from NumPy arrays into a tf.data.Dataset.

This example loads the MNIST dataset from a .npz file; however, where the NumPy arrays come from is not important.

Setup

import numpy as np
import tensorflow as tf

Load from a .npz file

DATA_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'

# Download the archive (cached locally after the first run) and read the arrays
path = tf.keras.utils.get_file('mnist.npz', DATA_URL)
with np.load(path) as data:
  train_examples = data['x_train']
  train_labels = data['y_train']
  test_examples = data['x_test']
  test_labels = data['y_test']
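
As a quick sanity check, you can print the shapes of the loaded arrays; MNIST ships 60,000 training and 10,000 test images, each 28x28 pixels:

# Confirm what was loaded from the archive
print(train_examples.shape, train_labels.shape)  # (60000, 28, 28) (60000,)
print(test_examples.shape, test_labels.shape)    # (10000, 28, 28) (10000,)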

Load NumPy arrays with tf.data.Dataset

Assume you have an array of examples and a corresponding array of labels. Pass the two arrays as a tuple into tf.data.Dataset.from_tensor_slices to create a tf.data.Dataset.

train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))
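
Dataset.from_tensor_slices slices the input arrays along their first axis, so each element of these datasets is a single (example, label) pair. A quick way to confirm this is to print element_spec:

# Each element pairs one 28x28 image with its scalar label;
# the dtypes mirror the NumPy arrays (uint8 for this MNIST file).
print(train_dataset.element_spec)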

Use the datasets

Shuffle and batch the datasets

BATCH_SIZE = 64
SHUFFLE_BUFFER_SIZE = 100

train_dataset = train_dataset.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)
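
Pulling a single batch out of the pipeline shows the effect of batching; with BATCH_SIZE = 64, each element is now a stack of 64 images and their 64 labels:

# Inspect the shape of one batch
for images, labels in train_dataset.take(1):
  print(images.shape, labels.shape)  # (64, 28, 28) (64,)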

Build and train a model

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),  # 28x28 image -> 784-vector
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)                       # one logit per digit class
])

model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['sparse_categorical_accuracy'])
model.fit(train_dataset, epochs=10)
Epoch 1/10
938/938 [==============================] - 3s 2ms/step - loss: 3.3900 - sparse_categorical_accuracy: 0.8785
Epoch 2/10
938/938 [==============================] - 2s 2ms/step - loss: 0.5183 - sparse_categorical_accuracy: 0.9294
Epoch 3/10
938/938 [==============================] - 2s 2ms/step - loss: 0.3926 - sparse_categorical_accuracy: 0.9462
Epoch 4/10
938/938 [==============================] - 2s 2ms/step - loss: 0.3341 - sparse_categorical_accuracy: 0.9547
Epoch 5/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2878 - sparse_categorical_accuracy: 0.9617
Epoch 6/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2594 - sparse_categorical_accuracy: 0.9651
Epoch 7/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2408 - sparse_categorical_accuracy: 0.9686
Epoch 8/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2219 - sparse_categorical_accuracy: 0.9713
Epoch 9/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2070 - sparse_categorical_accuracy: 0.9730
Epoch 10/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2054 - sparse_categorical_accuracy: 0.9749
<keras.callbacks.History at 0x7f168eaeeeb0>
model.evaluate(test_dataset)
157/157 [==============================] - 1s 2ms/step - loss: 0.6534 - sparse_categorical_accuracy: 0.9590
[0.6534402966499329, 0.9589999914169312]
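
Since the model ends in a plain Dense(10) layer, it outputs logits (which is why the loss was built with from_logits=True). A minimal inference sketch takes the argmax over the logits for one batch of test images:

# Predicted class = index of the largest logit
for images, labels in test_dataset.take(1):
  logits = model(images, training=False)
  predictions = tf.argmax(logits, axis=-1)
  print(predictions[:10].numpy())
  print(labels[:10].numpy())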