Load NumPy data

This tutorial provides an example of loading data from NumPy arrays into a tf.data.Dataset.

This example loads the MNIST dataset from a .npz file. However, the same approach works regardless of where the NumPy arrays come from.

Setup

import numpy as np
import tensorflow as tf

Load from .npz file

DATA_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'

path = tf.keras.utils.get_file('mnist.npz', DATA_URL)
with np.load(path) as data:
  train_examples = data['x_train']
  train_labels = data['y_train']
  test_examples = data['x_test']
  test_labels = data['y_test']
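The .npz round trip used above can be sketched without the download. The example below uses small synthetic arrays as stand-ins for the MNIST data: np.savez writes named arrays to a single archive, and np.load (used as a context manager, as above) reads them back by key.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-ins for the MNIST arrays downloaded above.
examples = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)
labels = np.random.randint(0, 10, size=(100,), dtype=np.uint8)

# Save both arrays to one .npz archive under the same keys the
# tutorial uses ('x_train', 'y_train').
path = os.path.join(tempfile.mkdtemp(), 'toy_mnist.npz')
np.savez(path, x_train=examples, y_train=labels)

# Load them back; np.load returns a mapping of key -> array.
with np.load(path) as data:
  loaded_examples = data['x_train']
  loaded_labels = data['y_train']

print(loaded_examples.shape)  # (100, 28, 28)
```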

Load NumPy arrays with tf.data.Dataset

Assuming you have an array of examples and a corresponding array of labels, pass the two arrays as a tuple into tf.data.Dataset.from_tensor_slices to create a tf.data.Dataset.

train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))
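To see what from_tensor_slices does, here is a minimal sketch with tiny synthetic arrays (not the MNIST data): the dataset is sliced along the first axis, so each element is one (example, label) pair.

```python
import numpy as np
import tensorflow as tf

# Four "examples" of three features each, with four matching labels.
examples = np.arange(12, dtype=np.float32).reshape(4, 3)
labels = np.array([0, 1, 0, 1])

# Slicing along the first axis yields 4 elements, each a pair with
# shapes (3,) and ().
dataset = tf.data.Dataset.from_tensor_slices((examples, labels))

for x, y in dataset.take(2):
  print(x.numpy(), y.numpy())
```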

Use the datasets

Shuffle and batch the datasets

BATCH_SIZE = 64
SHUFFLE_BUFFER_SIZE = 100

train_dataset = train_dataset.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)
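The effect of shuffle().batch() on element shapes can be sketched with synthetic data: shuffle(buffer_size) samples uniformly from a buffer of buffer_size elements (so a buffer at least as large as the dataset gives a full shuffle), and batch(n) stacks n consecutive elements, leaving a smaller final batch when the dataset size is not a multiple of n.

```python
import numpy as np
import tensorflow as tf

# Ten dummy 28x28 "images" with dummy labels.
examples = np.zeros((10, 28, 28), dtype=np.float32)
labels = np.zeros((10,), dtype=np.int64)

dataset = tf.data.Dataset.from_tensor_slices((examples, labels))
# Shuffle changes element order but not batch sizes; batch(4) on 10
# elements yields batches of 4, 4, and a final partial batch of 2.
dataset = dataset.shuffle(buffer_size=10).batch(4)

for x, y in dataset:
  print(x.shape, y.shape)
```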

Build and train a model

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])

model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['sparse_categorical_accuracy'])
model.fit(train_dataset, epochs=10)
Train for 938 steps
Epoch 1/10
938/938 [==============================] - 3s 3ms/step - loss: 3.0967 - sparse_categorical_accuracy: 0.8770
Epoch 2/10
938/938 [==============================] - 2s 2ms/step - loss: 0.5423 - sparse_categorical_accuracy: 0.9262
Epoch 3/10
938/938 [==============================] - 2s 2ms/step - loss: 0.4003 - sparse_categorical_accuracy: 0.9434
Epoch 4/10
938/938 [==============================] - 2s 2ms/step - loss: 0.3266 - sparse_categorical_accuracy: 0.9541
Epoch 5/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2834 - sparse_categorical_accuracy: 0.9603
Epoch 6/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2601 - sparse_categorical_accuracy: 0.9643
Epoch 7/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2479 - sparse_categorical_accuracy: 0.9670
Epoch 8/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2248 - sparse_categorical_accuracy: 0.9705
Epoch 9/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2166 - sparse_categorical_accuracy: 0.9725
Epoch 10/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2077 - sparse_categorical_accuracy: 0.9741

<tensorflow.python.keras.callbacks.History at 0x7fda6d1d6208>
model.evaluate(test_dataset)
157/157 [==============================] - 0s 2ms/step - loss: 0.6159 - sparse_categorical_accuracy: 0.9561

[0.6159260598473444, 0.9561]
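Because the final Dense layer has no activation and the loss was built with from_logits=True, the model outputs raw logits rather than probabilities. The sketch below (an untrained copy of the model and random input, for illustration only) shows how to convert logits to class probabilities with a softmax and pick the predicted digit per image.

```python
import numpy as np
import tensorflow as tf

# Same architecture as the tutorial's model, untrained here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])

# Random stand-ins for a batch of 5 images.
images = np.random.rand(5, 28, 28).astype(np.float32)

logits = model(images)                  # shape (5, 10), unbounded reals
probs = tf.nn.softmax(logits, axis=-1)  # each row sums to 1
predictions = tf.argmax(probs, axis=-1) # predicted digit per image

print(predictions.numpy().shape)  # (5,)
```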