Represents a potentially large set of elements.

The tf.data.Dataset API supports writing descriptive and efficient input pipelines. Dataset usage follows a common pattern:

  1. Create a source dataset from your input data.
  2. Apply dataset transformations to preprocess the data.
  3. Iterate over the dataset and process the elements.

Iteration happens in a streaming fashion, so the full dataset does not need to fit into memory.
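The streaming behavior of the three-step pattern can be sketched in plain Python with generators. This is only an analogy for intuition, not the tf.data implementation; the helper names source and mapped are hypothetical:

```python
def source(values):
    # Step 1: a "source dataset" yielding elements one at a time.
    for v in values:
        yield v

def mapped(dataset, fn):
    # Step 2: a transformation that preprocesses each element lazily.
    for element in dataset:
        yield fn(element)

# Step 3: iterate. Elements are produced on demand, so the full
# sequence never needs to be materialized in memory at once.
pipeline = mapped(source(range(3)), lambda x: x * 2)
results = [element for element in pipeline]
print(results)  # [0, 2, 4]
```

Because each stage is a generator, memory use stays proportional to a single element rather than to the whole dataset.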

Source Datasets:

The simplest way to create a dataset is from a Python list:

>>> dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
>>> for element in dataset:
...   print(element)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)

To process lines from files, use tf.data.TextLineDataset:

>>> dataset = tf.data.TextLineDataset(["file1.txt", "file2.txt"])
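For intuition, lazily streaming lines from several files can be sketched in plain Python. The iter_lines helper below is hypothetical, not part of the API, and omits everything the real dataset handles (compression, remote filesystems, and so on):

```python
import os
import tempfile

def iter_lines(filenames):
    # Yield lines from each file in turn, without loading any file
    # fully into memory. Hypothetical helper for illustration only.
    for name in filenames:
        with open(name) as f:
            for line in f:
                yield line.rstrip("\n")

# Two small temporary files stand in for "file1.txt" and "file2.txt".
paths = []
for text in ("a\nb\n", "c\n"):
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    paths.append(path)

lines = list(iter_lines(paths))
print(lines)  # ['a', 'b', 'c']

for path in paths:
    os.remove(path)
```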

To process records written in the TFRecord format, use tf.data.TFRecordDataset:

>>> dataset = tf.data.TFRecordDataset(["file1.tfrecords", "file2.tfrecords"])
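A TFRecord file is a sequence of length-prefixed records. A much-simplified sketch of that framing in plain Python (omitting the CRC-32C checksums real TFRecord files carry, and not using the actual library code):

```python
import io
import struct

def write_records(stream, records):
    # Simplified framing: an 8-byte little-endian length followed by
    # the payload. Real TFRecord files additionally store CRC-32C
    # checksums of both the length and the data, omitted here.
    for data in records:
        stream.write(struct.pack("<Q", len(data)))
        stream.write(data)

def read_records(stream):
    # Stream records back out one at a time.
    while True:
        header = stream.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        yield stream.read(length)

buf = io.BytesIO()
write_records(buf, [b"hello", b"world"])
buf.seek(0)
records = list(read_records(buf))
print(records)  # [b'hello', b'world']
```

Because records are length-prefixed, a reader can stream them sequentially without an index, which is what makes the format a good fit for streaming input pipelines.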

To create a dataset of all files matching a pattern, use tf.data.Dataset.list_files:

>>> dataset = tf.data.Dataset.list_files("/path/*.txt")  # doctest: +SKIP
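The pattern expansion can be pictured with the standard glob module. A hypothetical sketch, using a throwaway directory in place of /path (note that list_files also shuffles the matched filenames by default, which this sketch does not do):

```python
import glob
import os
import tempfile

# A temporary directory stands in for "/path".
tmpdir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.log"):
    open(os.path.join(tmpdir, name), "w").close()

# Expand the "*.txt" pattern; only the two .txt files match.
matched = sorted(os.path.basename(p)
                 for p in glob.glob(os.path.join(tmpdir, "*.txt")))
print(matched)  # ['a.txt', 'b.txt']
```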

See tf.data.FixedLengthRecordDataset and tf.data.Dataset.from_generator for more ways to create datasets.


Transformations:

Once you have a dataset, you can apply transformations to prepare the data for your model:

>>> dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
>>> dataset = dataset.map(lambda x: x*2)
>>> list(dataset.as_numpy_iterator())
[2, 4, 6]
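Transformations chain, with each stage lazily consuming the elements of the previous one. As a plain-Python analogy (not the tf.data implementation), stages mirroring Dataset.filter and Dataset.batch might be sketched as:

```python
def filtered(elements, predicate):
    # Lazily keep only elements matching predicate, mirroring
    # Dataset.filter. Analogy only; names are hypothetical.
    for x in elements:
        if predicate(x):
            yield x

def batched(elements, batch_size):
    # Group consecutive elements into lists of up to batch_size,
    # mirroring Dataset.batch (the last batch may be smaller).
    batch = []
    for x in elements:
        batch.append(x)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Keep the even numbers in 0..9, then batch them in pairs.
result = list(batched(filtered(range(10), lambda x: x % 2 == 0), 2))
print(result)  # [[0, 2], [4, 6], [8]]
```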

Common Terms:

Element: A single output from calling next() on a dataset iterator. Elements may be nested structures containing multiple components. For example, the element (1, (3, "apple")) has one tuple nested in another tuple. The components are 1, 3, and "apple".