Load NumPy data

This tutorial provides an example of loading data from NumPy arrays into a `tf.data.Dataset`.

This example loads the MNIST dataset from a `.npz` file, but the same approach works for NumPy arrays from any source.

Setup

import numpy as np
import tensorflow as tf

Load from `.npz` file

DATA_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'

path = tf.keras.utils.get_file('mnist.npz', DATA_URL)
with np.load(path) as data:
  train_examples = data['x_train']
  train_labels = data['y_train']
  test_examples = data['x_test']
  test_labels = data['y_test']
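
A `.npz` file is an archive of named NumPy arrays, which is why the loader above indexes `data` by key. The sketch below writes a small archive with `np.savez` and reads it back. The array contents, the temporary directory, and the file name `tiny.npz` are made up for illustration; only the key names mirror the MNIST file.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in data: 5 "images" of 28x28 pixels and 5 integer labels.
x_train = np.zeros((5, 28, 28), dtype=np.uint8)
y_train = np.arange(5, dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp_dir:
  npz_path = os.path.join(tmp_dir, 'tiny.npz')
  # Keyword names become the keys of the archive.
  np.savez(npz_path, x_train=x_train, y_train=y_train)

  # Indexing the opened archive reads each array into memory,
  # so the arrays remain usable after the file is closed.
  with np.load(npz_path) as data:
    keys = sorted(data.files)
    loaded_examples = data['x_train']
    loaded_labels = data['y_train']

print(keys)
print(loaded_examples.shape, loaded_labels.shape)
```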

Load NumPy arrays with `tf.data.Dataset`

If you have an array of examples and a corresponding array of labels, pass the two arrays as a tuple into `tf.data.Dataset.from_tensor_slices` to create a `tf.data.Dataset`. Both arrays must have the same length along the first axis.

train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))
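
`from_tensor_slices` slices its inputs along the first axis, so each dataset element is a single (example, label) pair. A minimal sketch with small synthetic arrays (the shapes and values here are illustrative, not MNIST) makes that visible through `element_spec`:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-ins: 6 examples of shape (28, 28) and 6 integer labels.
examples = np.zeros((6, 28, 28), dtype=np.uint8)
labels = np.arange(6, dtype=np.int64)

dataset = tf.data.Dataset.from_tensor_slices((examples, labels))

# The leading axis (6) is consumed: each element is one pair.
example_spec, label_spec = dataset.element_spec
print(example_spec.shape, label_spec.shape)

# The number of elements equals the length of the first axis.
num_elements = int(dataset.cardinality().numpy())
first_example, first_label = next(iter(dataset))
print(num_elements, int(first_label.numpy()))
```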

Use the datasets

Shuffle and batch the datasets

BATCH_SIZE = 64
SHUFFLE_BUFFER_SIZE = 100

# Shuffle only the training data; the test data is batched in its original order.
train_dataset = train_dataset.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)
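
By default, `batch` keeps a final partial batch, so the number of steps per epoch is the element count divided by the batch size, rounded up. Assuming the standard MNIST split of 60,000 training and 10,000 test examples, the arithmetic can be sketched as:

```python
import math

BATCH_SIZE = 64
NUM_TRAIN = 60_000  # assumed size of the MNIST training split
NUM_TEST = 10_000   # assumed size of the MNIST test split

train_steps = math.ceil(NUM_TRAIN / BATCH_SIZE)
test_steps = math.ceil(NUM_TEST / BATCH_SIZE)
# Size of the last (partial) training batch.
last_train_batch = NUM_TRAIN - (train_steps - 1) * BATCH_SIZE

print(train_steps, test_steps, last_train_batch)  # 938 157 32
```

These counts match the `938/938` and `157/157` progress markers in the training and evaluation logs.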

Build and train a model

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),  # 28x28 image -> 784-element vector
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)  # one logit per digit class; no softmax applied
])

model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['sparse_categorical_accuracy'])

model.fit(train_dataset, epochs=10)

Epoch 1/10
938/938 [==============================] - 3s 2ms/step - loss: 3.3491 - sparse_categorical_accuracy: 0.8784
Epoch 2/10
938/938 [==============================] - 2s 2ms/step - loss: 0.5607 - sparse_categorical_accuracy: 0.9211
Epoch 3/10
938/938 [==============================] - 2s 2ms/step - loss: 0.3840 - sparse_categorical_accuracy: 0.9426
Epoch 4/10
938/938 [==============================] - 2s 2ms/step - loss: 0.3282 - sparse_categorical_accuracy: 0.9523
Epoch 5/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2916 - sparse_categorical_accuracy: 0.9584
Epoch 6/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2494 - sparse_categorical_accuracy: 0.9633
Epoch 7/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2233 - sparse_categorical_accuracy: 0.9677
Epoch 8/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2103 - sparse_categorical_accuracy: 0.9717
Epoch 9/10
938/938 [==============================] - 2s 2ms/step - loss: 0.1976 - sparse_categorical_accuracy: 0.9718
Epoch 10/10
938/938 [==============================] - 2s 2ms/step - loss: 0.1790 - sparse_categorical_accuracy: 0.9739
<keras.src.callbacks.History at 0x7f6994f47f40>

model.evaluate(test_dataset)

157/157 [==============================] - 0s 2ms/step - loss: 0.5942 - sparse_categorical_accuracy: 0.9525
[0.5942174196243286, 0.9524999856948853]
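
The final `Dense(10)` layer has no activation and the loss is built with `from_logits=True`, so the model outputs raw logits rather than probabilities. One way to turn logits into class probabilities is a softmax. The sketch below applies it with NumPy to a made-up logits array; the values are hypothetical, not actual model output.

```python
import numpy as np

def softmax(logits):
  """Row-wise softmax, shifted by the row max for numerical stability."""
  shifted = logits - logits.max(axis=-1, keepdims=True)
  exp = np.exp(shifted)
  return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits for 2 examples over 10 classes.
logits = np.array([
    [0.1, 2.0, -1.0, 0.3, 0.0, 0.5, -0.2, 4.0, 0.2, 1.0],
    [3.0, 0.0, 0.1, -2.0, 0.4, 0.0, 0.3, 0.2, 0.1, 0.0],
])

probabilities = softmax(logits)
predicted_classes = probabilities.argmax(axis=-1)
print(predicted_classes)  # [7 0]
```

With the trained model, the logits array could come from `model.predict(test_dataset)` instead of the hand-written values.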