
tfio.experimental.IODataset

View source on GitHub

IODataset

Inherits From: IODataset


Args
variant_tensor A DT_VARIANT tensor that represents the dataset.

Attributes
element_spec The type specification of an element of this dataset.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset.element_spec
TensorSpec(shape=(), dtype=tf.int32, name=None)

Methods

apply

Applies a transformation function to this dataset.

apply enables chaining of custom Dataset transformations, which are represented as functions that take one Dataset argument and return a transformed Dataset.

dataset = tf.data.Dataset.range(100)
def dataset_fn(ds):
  return ds.filter(lambda x: x < 5)
dataset = dataset.apply(dataset_fn)
list(dataset.as_numpy_iterator())
[0, 1, 2, 3, 4]

Args
transformation_func A function that takes one Dataset argument and returns a Dataset.

Returns
Dataset The Dataset returned by applying transformation_func to this dataset.

as_numpy_iterator

Returns an iterator which converts all elements of the dataset to numpy.

Use as_numpy_iterator to inspect the content of your dataset. To see element shapes and types, print dataset elements directly instead of using as_numpy_iterator.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
for element in dataset:
  print(element)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)

This method requires that you are running in eager mode and the dataset's element_spec contains only TensorSpec components.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
for element in dataset.as_numpy_iterator():
  print(element)
1
2
3
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
print(list(dataset.as_numpy_iterator()))
[1, 2, 3]

as_numpy_iterator() will preserve the nested structure of dataset elements.

dataset = tf.data.Dataset.from_tensor_slices({'a': ([1, 2], [3, 4]),
                                              'b': [5, 6]})
list(dataset.as_numpy_iterator()) == [{'a': (1, 3), 'b': 5},
                                      {'a': (2, 4), 'b': 6}]
True

Returns
An iterable over the elements of the dataset, with their tensors converted to numpy arrays.

Raises
TypeError if an element contains a non-Tensor value.
RuntimeError if eager execution is not enabled.

batch

Combines consecutive elements of this dataset into batches.

dataset = tf.data.Dataset.range(8)
dataset = dataset.batch(3)
list(dataset.as_numpy_iterator())
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]
dataset = tf.data.Dataset.range(8)
dataset = dataset.batch(3, drop_remainder=True)
list(dataset.as_numpy_iterator())
[array([0, 1, 2]), array([3, 4, 5])]

The components of the resulting element will have an additional outer dimension, which will be batch_size (or N % batch_size for the last element if batch_size does not divide the number of input elements N evenly and drop_remainder is False). If your program depends on the batches having the same outer dimension, you should set the drop_remainder argument to True to prevent the smaller batch from being produced.

Args
batch_size A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
drop_remainder (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than batch_size elements; the default behavior is not to drop the smaller batch.

Returns
Dataset A Dataset.

cache

Caches the elements in this dataset.

The first time the dataset is iterated over, its elements will be cached either in the specified file or in memory. Subsequent iterations will use the cached data.

dataset = tf.data.Dataset.range(5)
dataset = dataset.map(lambda x: x**2)
dataset = dataset.cache()
# The first time reading through the data will generate the data using
# `range` and `map`.
list(dataset.as_numpy_iterator())
[0, 1, 4, 9, 16]
# Subsequent iterations read from the cache.
list(dataset.as_numpy_iterator())
[0, 1, 4, 9, 16]

When caching to a file, the cached data will persist across runs. Even the first iteration through the data will read from the cache file. Changing the input pipeline before the call to .cache() will have no effect until the cache file is removed or the filename is changed.

dataset = tf.data.Dataset.range(5)
dataset = dataset.cache("/path/to/file")  # doctest: +SKIP
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[0, 1, 2, 3, 4]
dataset = tf.data.Dataset.range(10)
dataset = dataset.cache("/path/to/file")  # Same file! # doctest: +SKIP
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[0, 1, 2, 3, 4]

Args
filename A tf.string scalar tf.Tensor, representing the name of a directory on the filesystem to use for caching elements in this Dataset. If a filename is not provided, the dataset will be cached in memory.

Returns
Dataset A Dataset.

cardinality

Returns the cardinality of the dataset, if known.

cardinality may return tf.data.INFINITE_CARDINALITY if the dataset contains an infinite number of elements or tf.data.UNKNOWN_CARDINALITY if the analysis fails to determine the number of elements in the dataset (e.g. when the dataset source is a file).

dataset = tf.data.Dataset.range(42)
print(dataset.cardinality().numpy())
42
dataset = dataset.repeat()
cardinality = dataset.cardinality()
print((cardinality == tf.data.INFINITE_CARDINALITY).numpy())
True
dataset = dataset.filter(lambda x: True)
cardinality = dataset.cardinality()
print((cardinality == tf.data.UNKNOWN_CARDINALITY).numpy())
True

Returns
A scalar tf.int64 Tensor representing the cardinality of the dataset. If the cardinality is infinite or unknown, cardinality returns the named constants tf.data.INFINITE_CARDINALITY and tf.data.UNKNOWN_CARDINALITY respectively.

concatenate

Creates a Dataset by concatenating the given dataset with this dataset.

a = tf.data.Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
b = tf.data.Dataset.range(4, 8)  # ==> [ 4, 5, 6, 7 ]
ds = a.concatenate(b)
list(ds.as_numpy_iterator())
[1, 2, 3, 4, 5, 6, 7]
# The input dataset and dataset to be concatenated should have the same
# nested structures and output types.
c = tf.data.Dataset.zip((a, b))
a.concatenate(c)
Traceback (most recent call last):
TypeError: Two datasets to concatenate have different types
<dtype: 'int64'> and (tf.int64, tf.int64)
d = tf.data.Dataset.from_tensor_slices(["a", "b", "c"])
a.concatenate(d)
Traceback (most recent call last):
TypeError: Two datasets to concatenate have different types
<dtype: 'int64'> and <dtype: 'string'>

Args
dataset Dataset to be concatenated.

Returns
Dataset A Dataset.

enumerate

Enumerates the elements of this dataset.

It is similar to Python's enumerate.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset = dataset.enumerate(start=5)
for element in dataset.as_numpy_iterator():
  print(element)
(5, 1)
(6, 2)
(7, 3)
# The nested structure of the input dataset determines the structure of
# elements in the resulting dataset.
dataset = tf.data.Dataset.from_tensor_slices([(7, 8), (9, 10)])
dataset = dataset.enumerate()
for element in dataset.as_numpy_iterator():
  print(element)
(0, array([7, 8], dtype=int32))
(1, array([ 9, 10], dtype=int32))

Args
start A tf.int64 scalar tf.Tensor, representing the start value for enumeration.

Returns
Dataset A Dataset.

filter

Filters this dataset according to predicate.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset = dataset.filter(lambda x: x < 3)
list(dataset.as_numpy_iterator())
[1, 2]
# `tf.math.equal(x, y)` is required for equality comparison
def filter_fn(x):
  return tf.math.equal(x, 1)
dataset = dataset.filter(filter_fn)
list(dataset.as_numpy_iterator())
[1]

Args
predicate A function mapping a dataset element to a boolean.

Returns
Dataset The Dataset containing the elements of this dataset for which predicate is True.

flat_map

Maps map_func across this dataset and flattens the result.

Use flat_map if you want to make sure that the order of your dataset stays the same. For example, to flatten a dataset of batches into a dataset of their elements:

dataset = tf.data.Dataset.from_tensor_slices(
               [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dataset = dataset.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))
list(dataset.as_numpy_iterator())
[1, 2, 3, 4, 5, 6, 7, 8, 9]

tf.data.Dataset.interleave() is a generalization of flat_map, since flat_map produces the same output as tf.data.Dataset.interleave(cycle_length=1).

Args
map_func A function mapping a dataset element to a dataset.

Returns
Dataset A Dataset.

from_audio

View source

Creates an IODataset from an audio file.

The following audio file formats are supported:

  • WAV
  • FLAC
  • Vorbis
  • MP3

Args
filename A string, the filename of an audio file.
name A name prefix for the IOTensor (optional).

Returns
An IODataset.
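
For example, a minimal sketch in eager mode (the filename sample.wav below is a placeholder, not a file shipped with the docs):

import tensorflow_io as tfio

audio = tfio.experimental.IODataset.from_audio("sample.wav")
# Iterate over the decoded samples in fixed-size chunks.
for samples in audio.batch(1024):
  print(samples.shape)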

from_avro

View source

Creates an IODataset from an Avro file's dataset object.

Args
filename A string, the filename of an Avro file.
schema A string, the schema of the Avro file.
columns A list of column names within the Avro file.
name A name prefix for the IOTensor (optional).

Returns
An IODataset.
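
As a sketch (the file data.avro, the record schema, and the column names below are hypothetical):

import tensorflow_io as tfio

# A minimal Avro record schema with two named fields.
schema = ('{"type": "record", "name": "example", "fields": ['
          '{"name": "id", "type": "long"},'
          '{"name": "value", "type": "double"}]}')
avro = tfio.experimental.IODataset.from_avro(
    "data.avro", schema, columns=["id", "value"])
for element in avro.take(2):
  print(element)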

from_ffmpeg

View source

Creates an IODataset from a media file using FFmpeg.

Args
filename A string, the filename of a media file.
stream A string, the stream index (e.g., "v:0"). Note that video, audio, and subtitle streams are indexed separately, each starting at 0.
name A name prefix for the IOTensor (optional).

Returns
An IODataset.
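
For example, a sketch that reads the first video stream of a media file (input.mp4 is a placeholder):

import tensorflow_io as tfio

# "v:0" selects the first video stream; "a:0" would select the first audio stream.
video = tfio.experimental.IODataset.from_ffmpeg("input.mp4", "v:0")
for frame in video.take(1):
  print(frame.shape)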

from_generator

Creates a Dataset whose elements are generated by generator.

The generator argument must be a callable object that returns an object that supports the iter() protocol (e.g. a generator function). The elements generated by generator must be compatible with the given output_types and (optional) output_shapes arguments.

import itertools

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

dataset = tf.data.Dataset.from_generator(
     gen,
     (tf.int64, tf.int64),
     (tf.TensorShape([]), tf.TensorShape([None])))

list(dataset.take(3).as_numpy_iterator())
[(1, array([1])), (2, array([1, 1])), (3, array([1, 1, 1]))]

Args
generator A callable object that returns an object that supports the iter() protocol. If args is not specified, generator must take no arguments; otherwise it must take as many arguments as there are values in args.
output_types A nested structure of tf.DType objects corresponding to each component of an element yielded by generator.
output_shapes (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by generator.
args (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to generator as NumPy-array arguments.

Returns
Dataset A Dataset.

from_hdf5

View source

Creates an IODataset from an HDF5 file's dataset object.

Args
filename A string, the filename of an HDF5 file.
dataset A string, the dataset name within the HDF5 file.
spec A tf.TensorSpec or a dtype (e.g., tf.int64) of the dataset. In graph mode, spec is needed. In eager mode, spec is probed automatically.
name A name prefix for the IOTensor (optional).

Returns
An IODataset.
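
For example, a sketch in eager mode, where spec is probed automatically (the file data.h5 and the dataset name /features are placeholders):

import tensorflow as tf
import tensorflow_io as tfio

# In graph mode, pass spec explicitly (e.g. spec=tf.float64 or a tf.TensorSpec).
features = tfio.experimental.IODataset.from_hdf5("data.h5", dataset="/features")
for row in features.take(2):
  print(row)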

from_json

View source

Creates an IODataset from a JSON file.

Args
filename A string, the filename of a JSON file.
columns A list of column names. By default (None) all columns will be read.
mode A string, the mode (records or None) used to open the JSON file.
name A name prefix for the IOTensor (optional).

Returns
An IODataset.
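
For example, a sketch that reads two columns from a records-oriented JSON file (data.json and the column names are placeholders):

import tensorflow_io as tfio

records = tfio.experimental.IODataset.from_json(
    "data.json", columns=["age", "income"], mode="records")
for element in records.take(2):
  print(element)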

from_kafka

View source

Creates an IODataset from a Kafka server with an offset range.

Args
topic A tf.string tensor containing topic subscription.
partition A tf.int64 tensor containing the partition, by default 0.
start A tf.int64 tensor containing the start offset, by default 0.
stop A tf.int64 tensor containing the end offset, by default -1.
servers An optional list of bootstrap servers, by default localhost:9092.
configuration An optional tf.string tensor containing configurations in [Key=Value] format. There are three types of configurations. Global configuration: refer to 'Global configuration properties' in the librdkafka doc; examples include ["enable.auto.commit=false", "heartbeat.interval.ms=2000"]. Topic configuration: refer to 'Topic configuration properties' in the librdkafka doc; note that all topic configurations should be prefixed with configuration.topic.; examples include ["conf.topic.auto.offset.reset=earliest"].
name A name prefix for the IODataset (optional).

Returns
An IODataset.
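
For example, a sketch that assumes a broker on the default localhost:9092 with a topic named "test":

import tensorflow_io as tfio

# Read offsets 0 through 99 of partition 0 of the "test" topic.
messages = tfio.experimental.IODataset.from_kafka(
    "test", partition=0, start=0, stop=100)
for message in messages.take(5):
  print(message)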

from_kinesis

View source

Creates an IODataset from a Kinesis stream.

Args
stream A string, the stream name.
shard A string, the shard of the Kinesis stream.
name A name prefix for the IODataset (optional).

Returns
An IODataset.

from_libsvm

View source

Creates an IODataset from a LibSVM file.

Args
filename A tf.string tensor containing one or more filenames.
num_features The number of features.
dtype (Optional.) The type of the output feature tensor. Defaults to tf.float32.
label_dtype (Optional.) The type of the output label tensor. Defaults to tf.int64.
compression_type (Optional.) A tf.string scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP".
name A name prefix for the IOTensor (optional).

Returns
An IODataset.
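
For example, a sketch assuming a LibSVM file train.libsvm whose rows have 10 features:

import tensorflow_io as tfio

data = tfio.experimental.IODataset.from_libsvm("train.libsvm", num_features=10)
for element in data.take(2):
  print(element)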

from_lmdb

View source

Creates an IODataset from an LMDB file.

Args
filename A string, the filename of an LMDB file.
name A name prefix for the IOTensor (optional).

Returns
An IODataset.

from_mnist

View source

Creates an IODataset from MNIST images and/or labels files.

Args
images A string, the filename of the MNIST images file.
labels A string, the filename of the MNIST labels file.
name A name prefix for the IODataset (optional).

Returns
An IODataset.
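
For example, a sketch assuming the standard MNIST files are available locally and that each element pairs an image with its label when both files are given:

import tensorflow_io as tfio

mnist = tfio.experimental.IODataset.from_mnist(
    "train-images-idx3-ubyte", "train-labels-idx1-ubyte")
for image, label in mnist.take(1):
  print(image.shape, label.numpy())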

from_numpy

View source

Creates an IODataset from Numpy arrays.

from_numpy allows the user to create a Dataset from a dict, tuple, or individual numpy array_like element. The Dataset created through from_numpy has the same dtypes as the input array_like elements. Its shapes match the input shapes, except that the first dimension of each shape is set to None. The reason is that the first dimension of the iterated output is the batch dimension, and the total number of elements may not be evenly divisible by it.

For example:

import numpy as np
import tensorflow as tf
import tensorflow_io as tfio
a = (np.asarray([[0., 1.], [2., 3.], [4., 5.], [6., 7.], [8., 9.]]),
     np.asarray([[10, 11], [12, 13], [14, 15], [16, 17], [18, 19]]))
d = tfio.experimental.IODataset.from_numpy(a).batch(2)
for x, y in d:
  print((x.numpy(), y.numpy()))
# numbers of elements = [2, 2, 1] <= (5 / 2)
#
# ([[0., 1.], [2., 3.]], [[10, 11], [12, 13]]) # <= batch index 0
# ([[4., 5.], [6., 7.]], [[14, 15], [16, 17]]) # <= batch index 1
# ([[8., 9.]],           [[18, 19]])           # <= batch index 2

Args
a A dict, tuple, or array_like numpy array if the input type is array_like; a dict or tuple of numpy arrays if the input type is dict or tuple.
name A name prefix for the IOTensor (optional).

Returns
An IODataset with the same dtypes as the array_like input a.

from_numpy_file

View source

Creates an IODataset from a Numpy file.

from_numpy_file allows the user to create a Dataset from a numpy file (npy or npz). The Dataset created through from_numpy_file has the same dtypes as the elements in the numpy file. Its shapes match the shapes of those elements, except that the first dimension of each shape is set to None. The reason is that the first dimension of the iterated output is the batch dimension, and the total number of elements may not be evenly divisible by it. If the numpy file consists of unnamed elements, a tuple of numpy arrays is returned; otherwise a dict is returned for named elements.

Args
filename The filename of a numpy file (npy or npz).
spec A tuple of tf.TensorSpec or dtype, or a dict of name:tf.TensorSpec or name:dtype pairs, to specify the dtypes in each element of the numpy file. In eager mode spec is probed automatically. In graph mode spec must be provided. If a tuple is provided, the numpy file is assumed to consist of arr_0, arr_1, .... If a dict is provided, the numpy file should consist of named elements.
name A name prefix for the IOTensor (optional).

Returns
An IODataset with the same dtypes as the array_like elements in the numpy file (npy or npz).
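
For example, a sketch that writes a named npz file with NumPy and reads it back (the file name and array names are arbitrary):

import numpy as np
import tensorflow_io as tfio

np.savez("data.npz", features=np.random.rand(8, 3), labels=np.arange(8))
d = tfio.experimental.IODataset.from_numpy_file("data.npz")
# Named npz elements come back as a dict keyed by the saved array names.
for element in d.batch(4):
  print(element["features"].shape, element["labels"])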



<h3 id="from_parquet"><code>from_parquet</code></h3>

<a target="_blank" href="https://github.com/tensorflow/io/blob/v0.15.0/tensorflow_io/core/python/ops/io_dataset.py#L258-L275">View source</a>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@classmethod</code>
<code>from_parquet(
    filename, columns=None, **kwargs
)
</code></pre>

Creates an `IODataset` from a Parquet file.


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`filename`
</td>
<td>
A string, the filename of a Parquet file.
</td>
</tr><tr>
<td>
`columns`
</td>
<td>
A list of column names. By default (None)
all columns will be read.
</td>
</tr><tr>
<td>
`name`
</td>
<td>
A name prefix for the IOTensor (optional).
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>
<tr class="alt">
<td colspan="2">
A `IODataset`.
</td>
</tr>

</table>
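
For example, a sketch that reads two columns from a Parquet file (data.parquet and the column names are placeholders):

import tensorflow_io as tfio

parquet = tfio.experimental.IODataset.from_parquet(
    "data.parquet", columns=["id", "value"])
for element in parquet.take(2):
  print(element)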



<h3 id="from_pcap"><code>from_pcap</code></h3>

<a target="_blank" href="https://github.com/tensorflow/io/blob/v0.15.0/tensorflow_io/core/python/ops/io_dataset.py#L295-L308">View source</a>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@classmethod</code>
<code>from_pcap(
    filename, **kwargs
)
</code></pre>

Creates an `IODataset` from a pcap file.


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`filename`
</td>
<td>
A string, the filename of a pcap file.
</td>
</tr><tr>
<td>
`name`
</td>
<td>
A name prefix for the IOTensor (optional).
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>
<tr class="alt">
<td colspan="2">
A `IODataset`.
</td>
</tr>

</table>
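
For example, a sketch assuming a capture file capture.pcap is available:

import tensorflow_io as tfio

packets = tfio.experimental.IODataset.from_pcap("capture.pcap")
for element in packets.take(3):
  print(element)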



<h3 id="from_prometheus"><code>from_prometheus</code></h3>

<a target="_blank" href="https://github.com/tensorflow/io/blob/v0.15.0/tensorflow_io/core/python/experimental/io_dataset_ops.py#L195-L221">View source</a>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@classmethod</code>
<code>from_prometheus(
    query, length, offset=None, endpoint=None, spec=None
)
</code></pre>

Creates an `GraphIODataset` from a prometheus endpoint.


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`query`
</td>
<td>
A string, the query string for prometheus.
</td>
</tr><tr>
<td>
`length`
</td>
<td>
An integer, the length of the query (in seconds).
</td>
</tr><tr>
<td>
`offset`
</td>
<td>
An integer, the a millisecond-precision timestamp, by default
the time when graph runs.
</td>
</tr><tr>
<td>
`endpoint`
</td>
<td>
A string, the server address of prometheus, by default
`http://localhost:9090`.
</td>
</tr><tr>
<td>
`spec`
</td>
<td>
A structured tf.TensorSpec of the dataset.
The format should be {"job": {"instance": {"name": tf.TensorSpec} } }.
In graph mode, spec is needed. In eager mode,
spec is probed automatically.
</td>
</tr><tr>
<td>
`name`
</td>
<td>
A name prefix for the IODataset (optional).
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>
<tr class="alt">
<td colspan="2">
A `IODataset`.
</td>
</tr>

</table>
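
For example, a sketch that queries the built-in "up" metric over the last 60 seconds from a Prometheus server on the default endpoint:

import tensorflow_io as tfio

prometheus = tfio.experimental.IODataset.from_prometheus(
    "up", 60, endpoint="http://localhost:9090")
for element in prometheus.take(1):
  print(element)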



<h3 id="from_sql"><code>from_sql</code></h3>

<a target="_blank" href="https://github.com/tensorflow/io/blob/v0.15.0/tensorflow_io/core/python/experimental/io_dataset_ops.py#L223-L238">View source</a>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@classmethod</code>
<code>from_sql(
    query, endpoint=None, spec=None
)
</code></pre>

Creates an `GraphIODataset` from a postgresql server endpoint.


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`query`
</td>
<td>
A string, the sql query string.
</td>
</tr><tr>
<td>
`endpoint`
</td>
<td>
A string, the server address of postgresql server.
</td>
</tr><tr>
<td>
`spec`
</td>
<td>
A structured (tuple) tf.TensorSpec of the dataset.
In graph mode, spec is needed. In eager mode,
spec is probed automatically.
</td>
</tr><tr>
<td>
`name`
</td>
<td>
A name prefix for the IODataset (optional).
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>
<tr class="alt">
<td colspan="2">
A `IODataset`.
</td>
</tr>

</table>
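
For example, a sketch assuming a local PostgreSQL server; the connection string, query, and table below are placeholders:

import tensorflow_io as tfio

endpoint = "postgresql://user:password@localhost?port=5432&dbname=mydb"
rows = tfio.experimental.IODataset.from_sql(
    "SELECT id, value FROM samples;", endpoint=endpoint)
for row in rows.take(5):
  print(row)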



<h3 id="from_tensor_slices"><code>from_tensor_slices</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@staticmethod</code>
<code>from_tensor_slices(
    tensors
)
</code></pre>

Creates a `Dataset` whose elements are slices of the given tensors.

The given tensors are sliced along their first dimension. This operation
preserves the structure of the input tensors, removing the first dimension
of each tensor and using it as the dataset dimension. All input tensors
must have the same size in their first dimensions.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Slicing a 1D tensor produces scalar tensor elements.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[1, 2, 3]</code>
</pre>


<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Slicing a 2D tensor produces 1D tensor elements.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[array([1, 2], dtype=int32), array([3, 4], dtype=int32)]</code>
</pre>


<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Slicing a tuple of 1D tensors produces tuple elements containing</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># scalar tensors.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensor_slices(([1, 2], [3, 4], [5, 6]))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[(1, 3, 5), (2, 4, 6)]</code>
</pre>


<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Dictionary structure is also preserved.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensor_slices({&quot;a&quot;: [1, 2], &quot;b&quot;: [3, 4]})</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator()) == [{&#x27;a&#x27;: 1, &#x27;b&#x27;: 3},</code>
<code class="devsite-terminal" data-terminal-prefix="...">                                      {&#x27;a&#x27;: 2, &#x27;b&#x27;: 4}]</code>
<code class="no-select nocode">True</code>
</pre>


<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Two tensors can be combined into one Dataset object.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">features = tf.constant([[1, 3], [2, 1], [3, 3]]) # ==&gt; 3x2 tensor</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">labels = tf.constant([&#x27;A&#x27;, &#x27;B&#x27;, &#x27;A&#x27;]) # ==&gt; 3x1 tensor</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = Dataset.from_tensor_slices((features, labels))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Both the features and the labels tensors can be converted</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># to a Dataset object separately and combined after.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">features_dataset = Dataset.from_tensor_slices(features)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">labels_dataset = Dataset.from_tensor_slices(labels)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = Dataset.zip((features_dataset, labels_dataset))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># A batched feature and label set can be converted to a Dataset</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># in similar fashion.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">batched_features = tf.constant([[[1, 3], [2, 3]],</code>
<code class="devsite-terminal" data-terminal-prefix="...">                                [[2, 1], [1, 2]],</code>
<code class="devsite-terminal" data-terminal-prefix="...">                                [[3, 3], [3, 2]]], shape=(3, 2, 2))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">batched_labels = tf.constant([[&#x27;A&#x27;, &#x27;A&#x27;],</code>
<code class="devsite-terminal" data-terminal-prefix="...">                              [&#x27;B&#x27;, &#x27;B&#x27;],</code>
<code class="devsite-terminal" data-terminal-prefix="...">                              [&#x27;A&#x27;, &#x27;B&#x27;]], shape=(3, 2, 1))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = Dataset.from_tensor_slices((batched_features, batched_labels))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">for element in dataset.as_numpy_iterator():</code>
<code class="devsite-terminal" data-terminal-prefix="...">  print(element)</code>
<code class="no-select nocode">(array([[1, 3],</code>
<code class="no-select nocode">       [2, 3]], dtype=int32), array([[b&#x27;A&#x27;],</code>
<code class="no-select nocode">       [b&#x27;A&#x27;]], dtype=object))</code>
<code class="no-select nocode">(array([[2, 1],</code>
<code class="no-select nocode">       [1, 2]], dtype=int32), array([[b&#x27;B&#x27;],</code>
<code class="no-select nocode">       [b&#x27;B&#x27;]], dtype=object))</code>
<code class="no-select nocode">(array([[3, 3],</code>
<code class="no-select nocode">       [3, 2]], dtype=int32), array([[b&#x27;A&#x27;],</code>
<code class="no-select nocode">       [b&#x27;B&#x27;]], dtype=object))</code>
</pre>


Note that if `tensors` contains a NumPy array, and eager execution is not
enabled, the values will be embedded in the graph as one or more
`tf.constant` operations. For large datasets (> 1 GB), this can waste
memory and run into byte limits of graph serialization. If `tensors`
contains one or more large NumPy arrays, consider the alternative described
in [this guide](
https://tensorflow.org/guide/data#consuming_numpy_arrays).

<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`tensors`
</td>
<td>
A dataset element, with each component having the same size in
the first dimension.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>

<tr>
<td>
`Dataset`
</td>
<td>
A `Dataset`.
</td>
</tr>
</table>



<h3 id="from_tensors"><code>from_tensors</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@staticmethod</code>
<code>from_tensors(
    tensors
)
</code></pre>

Creates a `Dataset` with a single element, comprising the given tensors.

`from_tensors` produces a dataset containing only a single element. To slice
the input tensor into multiple elements, use `from_tensor_slices` instead.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensors([1, 2, 3])</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[array([1, 2, 3], dtype=int32)]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensors(([1, 2, 3], &#x27;A&#x27;))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[(array([1, 2, 3], dtype=int32), b&#x27;A&#x27;)]</code>
</pre>


<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># You can use `from_tensors` to produce a dataset which repeats</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># the same example many times.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">example = tf.constant([1,2,3])</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensors(example).repeat(2)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[array([1, 2, 3], dtype=int32), array([1, 2, 3], dtype=int32)]</code>
</pre>


Note that if `tensors` contains a NumPy array, and eager execution is not
enabled, the values will be embedded in the graph as one or more
`tf.constant` operations. For large datasets (> 1 GB), this can waste
memory and run into byte limits of graph serialization. If `tensors`
contains one or more large NumPy arrays, consider the alternative described
in [this
guide](https://tensorflow.org/guide/data#consuming_numpy_arrays).

<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`tensors`
</td>
<td>
A dataset element.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>

<tr>
<td>
`Dataset`
</td>
<td>
A `Dataset`.
</td>
</tr>
</table>



<h3 id="from_tiff"><code>from_tiff</code></h3>

<a target="_blank" href="https://github.com/tensorflow/io/blob/v0.15.0/tensorflow_io/core/python/experimental/io_dataset_ops.py#L88-L100">View source</a>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@classmethod</code>
<code>from_tiff(
    filename, **kwargs
)
</code></pre>

Creates an `IODataset` from a TIFF file.


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`filename`
</td>
<td>
A string, the filename of a TIFF file.
</td>
</tr><tr>
<td>
`name`
</td>
<td>
A name prefix for the IOTensor (optional).
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>
<tr class="alt">
<td colspan="2">
A `IODataset`.
</td>
</tr>

</table>
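
For example, a sketch assuming a (possibly multi-page) TIFF file image.tif:

import tensorflow_io as tfio

pages = tfio.experimental.IODataset.from_tiff("image.tif")
for page in pages:
  print(page.shape)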



<h3 id="from_video"><code>from_video</code></h3>

<a target="_blank" href="https://github.com/tensorflow/io/blob/v0.15.0/tensorflow_io/core/python/experimental/io_dataset_ops.py#L240-L251">View source</a>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@classmethod</code>
<code>from_video(
    filename
)
</code></pre>

Creates an `GraphIODataset` from a video file.


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`filename`
</td>
<td>
A string, the sql query string.
</td>
</tr><tr>
<td>
`name`
</td>
<td>
A name prefix for the IODataset (optional).
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>
<tr class="alt">
<td colspan="2">
A `IODataset`.
</td>
</tr>

</table>
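
For example, a sketch that iterates over decoded frames of a video file (input.mp4 is a placeholder):

import tensorflow_io as tfio

frames = tfio.experimental.IODataset.from_video("input.mp4")
for frame in frames.take(1):
  print(frame.shape)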



<h3 id="graph"><code>graph</code></h3>

<a target="_blank" href="https://github.com/tensorflow/io/blob/v0.15.0/tensorflow_io/core/python/ops/io_dataset.py#L67-L79">View source</a>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@classmethod</code>
<code>graph(
    dtype
)
</code></pre>

Obtain a GraphIODataset to be used in graph mode.


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`dtype`
</td>
<td>
Data type of the GraphIODataset.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>
<tr class="alt">
<td colspan="2">
A class of `GraphIODataset`.
</td>
</tr>

</table>



<h3 id="interleave"><code>interleave</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>interleave(
    map_func, cycle_length=None, block_length=None, num_parallel_calls=None,
    deterministic=None
)
</code></pre>

Maps `map_func` across this dataset, and interleaves the results.

For example, you can use `Dataset.interleave()` to process many input files
concurrently:

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Preprocess 4 files concurrently, and interleave blocks of 16 records</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># from each file.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">filenames = [&quot;/var/data/file1.txt&quot;, &quot;/var/data/file2.txt&quot;,</code>
<code class="devsite-terminal" data-terminal-prefix="...">             &quot;/var/data/file3.txt&quot;, &quot;/var/data/file4.txt&quot;]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensor_slices(filenames)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">def parse_fn(filename):</code>
<code class="devsite-terminal" data-terminal-prefix="...">  return tf.data.Dataset.range(10)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.interleave(lambda x:</code>
<code class="devsite-terminal" data-terminal-prefix="...">    tf.data.TextLineDataset(x).map(parse_fn, num_parallel_calls=1),</code>
<code class="devsite-terminal" data-terminal-prefix="...">    cycle_length=4, block_length=16)</code>
</pre>


The `cycle_length` and `block_length` arguments control the order in which
elements are produced. `cycle_length` controls the number of input elements
that are processed concurrently. If you set `cycle_length` to 1, this
transformation will handle one input element at a time, and will produce
identical results to `tf.data.Dataset.flat_map`. In general,
this transformation will apply `map_func` to `cycle_length` input elements,
open iterators on the returned `Dataset` objects, and cycle through them
producing `block_length` consecutive elements from each iterator, and
consuming the next input element each time it reaches the end of an
iterator.

#### For example:

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = Dataset.range(1, 6)  # ==&gt; [ 1, 2, 3, 4, 5 ]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># NOTE: New lines indicate &quot;block&quot; boundaries.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.interleave(</code>
<code class="devsite-terminal" data-terminal-prefix="...">    lambda x: Dataset.from_tensors(x).repeat(6),</code>
<code class="devsite-terminal" data-terminal-prefix="...">    cycle_length=2, block_length=4)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[1, 1, 1, 1,</code>
<code class="no-select nocode"> 2, 2, 2, 2,</code>
<code class="no-select nocode"> 1, 1,</code>
<code class="no-select nocode"> 2, 2,</code>
<code class="no-select nocode"> 3, 3, 3, 3,</code>
<code class="no-select nocode"> 4, 4, 4, 4,</code>
<code class="no-select nocode"> 3, 3,</code>
<code class="no-select nocode"> 4, 4,</code>
<code class="no-select nocode"> 5, 5, 5, 5,</code>
<code class="no-select nocode"> 5, 5]</code>
</pre>


Note: The order of elements yielded by this transformation is
deterministic, as long as `map_func` is a pure function and
`deterministic=True`. If `map_func` contains any stateful operations, the
order in which that state is accessed is undefined.

Performance can often be improved by setting `num_parallel_calls` so that
`interleave` will use multiple threads to fetch elements. If determinism
isn't required, it can also improve performance to set
`deterministic=False`.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">filenames = [&quot;/var/data/file1.txt&quot;, &quot;/var/data/file2.txt&quot;,</code>
<code class="devsite-terminal" data-terminal-prefix="...">             &quot;/var/data/file3.txt&quot;, &quot;/var/data/file4.txt&quot;]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensor_slices(filenames)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.interleave(lambda x: tf.data.TFRecordDataset(x),</code>
<code class="devsite-terminal" data-terminal-prefix="...">    cycle_length=4, num_parallel_calls=tf.data.experimental.AUTOTUNE,</code>
<code class="devsite-terminal" data-terminal-prefix="...">    deterministic=False)</code>
</pre>


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`map_func`
</td>
<td>
A function mapping a dataset element to a dataset.
</td>
</tr><tr>
<td>
`cycle_length`
</td>
<td>
(Optional.) The number of input elements that will be
processed concurrently. If not set, the tf.data runtime decides what it
should be based on available CPU. If `num_parallel_calls` is set to
`tf.data.experimental.AUTOTUNE`, the `cycle_length` argument identifies
the maximum degree of parallelism.
</td>
</tr><tr>
<td>
`block_length`
</td>
<td>
(Optional.) The number of consecutive elements to produce
from each input element before cycling to another input element. If not
set, defaults to 1.
</td>
</tr><tr>
<td>
`num_parallel_calls`
</td>
<td>
(Optional.) If specified, the implementation creates a
threadpool, which is used to fetch inputs from cycle elements
asynchronously and in parallel. The default behavior is to fetch inputs
from cycle elements synchronously with no parallelism. If the value
`tf.data.experimental.AUTOTUNE` is used, then the number of parallel
calls is set dynamically based on available CPU.
</td>
</tr><tr>
<td>
`deterministic`
</td>
<td>
(Optional.) A boolean controlling whether determinism
should be traded for performance by allowing elements to be produced out
of order.  If `deterministic` is `None`, the
`tf.data.Options.experimental_deterministic` dataset option (`True` by
default) is used to decide whether to produce elements
deterministically.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>

<tr>
<td>
`Dataset`
</td>
<td>
A `Dataset`.
</td>
</tr>
</table>



<h3 id="list_files"><code>list_files</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@staticmethod</code>
<code>list_files(
    file_pattern, shuffle=None, seed=None
)
</code></pre>

A dataset of all files matching one or more glob patterns.

The `file_pattern` argument should be a small number of glob patterns.
If your filenames have already been globbed, use
`Dataset.from_tensor_slices(filenames)` instead, as re-globbing every
filename with `list_files` may result in poor performance with remote
storage systems.

Note: The default behavior of this method is to return filenames in
a non-deterministic random shuffled order. Pass a `seed` or `shuffle=False`
to get results in a deterministic order.

#### Example:

If we had the following files on our filesystem:

  - /path/to/dir/a.txt
  - /path/to/dir/b.py
  - /path/to/dir/c.py

If we pass "/path/to/dir/*.py" as the directory, the dataset
would produce:

  - /path/to/dir/b.py
  - /path/to/dir/c.py



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`file_pattern`
</td>
<td>
A string, a list of strings, or a `tf.Tensor` of string type
(scalar or vector), representing the filename glob (i.e. shell wildcard)
pattern(s) that will be matched.
</td>
</tr><tr>
<td>
`shuffle`
</td>
<td>
(Optional.) If `True`, the file names will be shuffled randomly.
Defaults to `True`.
</td>
</tr><tr>
<td>
`seed`
</td>
<td>
(Optional.) A `tf.int64` scalar `tf.Tensor`, representing the random
seed that will be used to create the distribution. See
`tf.random.set_seed` for behavior.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>

<tr>
<td>
`Dataset`
</td>
<td>
A `Dataset` of strings corresponding to file names.
</td>
</tr>
</table>
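
For instance, a short sketch that lists the .py files from the example above in a deterministic order:

dataset = tf.data.Dataset.list_files("/path/to/dir/*.py", shuffle=False)
for f in dataset:
  print(f.numpy())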



<h3 id="map"><code>map</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>map(
    map_func, num_parallel_calls=None, deterministic=None
)
</code></pre>

Maps `map_func` across the elements of this dataset.

This transformation applies `map_func` to each element of this dataset, and
returns a new dataset containing the transformed elements, in the same
order as they appeared in the input. `map_func` can be used to change both
the values and the structure of a dataset's elements. For example, adding 1
to each element, or projecting a subset of element components.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = Dataset.range(1, 6)  # ==&gt; [ 1, 2, 3, 4, 5 ]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.map(lambda x: x + 1)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[2, 3, 4, 5, 6]</code>
</pre>


The input signature of `map_func` is determined by the structure of each
element in this dataset.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = Dataset.range(5)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># `map_func` takes a single argument of type `tf.Tensor` with the same</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># shape and dtype.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">result = dataset.map(lambda x: x + 1)</code>
</pre>


<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Each element is a tuple containing two `tf.Tensor` objects.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">elements = [(1, &quot;foo&quot;), (2, &quot;bar&quot;), (3, &quot;baz&quot;)]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_generator(</code>
<code class="devsite-terminal" data-terminal-prefix="...">    lambda: elements, (tf.int32, tf.string))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># `map_func` takes two arguments of type `tf.Tensor`. This function</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># projects out just the first component.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">result = dataset.map(lambda x_int, y_str: x_int)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(result.as_numpy_iterator())</code>
<code class="no-select nocode">[1, 2, 3]</code>
</pre>


<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Each element is a dictionary mapping strings to `tf.Tensor` objects.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">elements =  ([{&quot;a&quot;: 1, &quot;b&quot;: &quot;foo&quot;},</code>
<code class="devsite-terminal" data-terminal-prefix="...">              {&quot;a&quot;: 2, &quot;b&quot;: &quot;bar&quot;},</code>
<code class="devsite-terminal" data-terminal-prefix="...">              {&quot;a&quot;: 3, &quot;b&quot;: &quot;baz&quot;}])</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_generator(</code>
<code class="devsite-terminal" data-terminal-prefix="...">    lambda: elements, {&quot;a&quot;: tf.int32, &quot;b&quot;: tf.string})</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># `map_func` takes a single argument of type `dict` with the same keys</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># as the elements.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">result = dataset.map(lambda d: str(d[&quot;a&quot;]) + d[&quot;b&quot;])</code>
</pre>


The value or values returned by `map_func` determine the structure of each
element in the returned dataset.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.range(3)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># `map_func` returns two `tf.Tensor` objects.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">def g(x):</code>
<code class="devsite-terminal" data-terminal-prefix="...">  return tf.constant(37.0), tf.constant([&quot;Foo&quot;, &quot;Bar&quot;, &quot;Baz&quot;])</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">result = dataset.map(g)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">result.element_spec</code>
<code class="no-select nocode">(TensorSpec(shape=(), dtype=tf.float32, name=None), TensorSpec(shape=(3,), dtype=tf.string, name=None))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Python primitives, lists, and NumPy arrays are implicitly converted to</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># `tf.Tensor`.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">def h(x):</code>
<code class="devsite-terminal" data-terminal-prefix="...">  return 37.0, [&quot;Foo&quot;, &quot;Bar&quot;], np.array([1.0, 2.0], dtype=np.float64)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">result = dataset.map(h)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">result.element_spec</code>
<code class="no-select nocode">(TensorSpec(shape=(), dtype=tf.float32, name=None), TensorSpec(shape=(2,), dtype=tf.string, name=None), TensorSpec(shape=(2,), dtype=tf.float64, name=None))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># `map_func` can return nested structures.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">def i(x):</code>
<code class="devsite-terminal" data-terminal-prefix="...">  return (37.0, [42, 16]), &quot;foo&quot;</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">result = dataset.map(i)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">result.element_spec</code>
<code class="no-select nocode">((TensorSpec(shape=(), dtype=tf.float32, name=None),</code>
<code class="no-select nocode">  TensorSpec(shape=(2,), dtype=tf.int32, name=None)),</code>
<code class="no-select nocode"> TensorSpec(shape=(), dtype=tf.string, name=None))</code>
</pre>


`map_func` can accept as arguments and return any type of dataset element.

Note that irrespective of the context in which `map_func` is defined (eager
vs. graph), tf.data traces the function and executes it as a graph. To use
Python code inside of the function you have a few options:

1) Rely on AutoGraph to convert Python code into an equivalent graph
computation. The downside of this approach is that AutoGraph can convert
some but not all Python code.

2) Use `tf.py_function`, which allows you to write arbitrary Python code but
will generally result in worse performance than 1). For example:

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">d = tf.data.Dataset.from_tensor_slices([&#x27;hello&#x27;, &#x27;world&#x27;])</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># transform a string tensor to upper case string using a Python function</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">def upper_case_fn(t: tf.Tensor):</code>
<code class="devsite-terminal" data-terminal-prefix="...">  return t.numpy().decode(&#x27;utf-8&#x27;).upper()</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">d = d.map(lambda x: tf.py_function(func=upper_case_fn,</code>
<code class="devsite-terminal" data-terminal-prefix="...">          inp=[x], Tout=tf.string))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(d.as_numpy_iterator())</code>
<code class="no-select nocode">[b&#x27;HELLO&#x27;, b&#x27;WORLD&#x27;]</code>
</pre>


3) Use `tf.numpy_function`, which also allows you to write arbitrary
Python code. Note that `tf.py_function` accepts `tf.Tensor` whereas
`tf.numpy_function` accepts numpy arrays and returns only numpy arrays.
For example:

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">d = tf.data.Dataset.from_tensor_slices([&#x27;hello&#x27;, &#x27;world&#x27;])</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">def upper_case_fn(t: np.ndarray):</code>
<code class="devsite-terminal" data-terminal-prefix="...">  return t.decode(&#x27;utf-8&#x27;).upper()</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">d = d.map(lambda x: tf.numpy_function(func=upper_case_fn,</code>
<code class="devsite-terminal" data-terminal-prefix="...">          inp=[x], Tout=tf.string))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(d.as_numpy_iterator())</code>
<code class="no-select nocode">[b&#x27;HELLO&#x27;, b&#x27;WORLD&#x27;]</code>
</pre>


Note that the use of `tf.numpy_function` and `tf.py_function`
in general precludes the possibility of executing user-defined
transformations in parallel (because of Python GIL).

Performance can often be improved by setting `num_parallel_calls` so that
`map` will use multiple threads to process elements. If deterministic order
isn't required, it can also improve performance to set
`deterministic=False`.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = Dataset.range(1, 6)  # ==&gt; [ 1, 2, 3, 4, 5 ]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.map(lambda x: x + 1,</code>
<code class="devsite-terminal" data-terminal-prefix="...">    num_parallel_calls=tf.data.experimental.AUTOTUNE,</code>
<code class="devsite-terminal" data-terminal-prefix="...">    deterministic=False)</code>
</pre>


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`map_func`
</td>
<td>
A function mapping a dataset element to another dataset element.
</td>
</tr><tr>
<td>
`num_parallel_calls`
</td>
<td>
(Optional.) A `tf.int32` scalar `tf.Tensor`,
representing the number elements to process asynchronously in parallel.
If not specified, elements will be processed sequentially. If the value
`tf.data.experimental.AUTOTUNE` is used, then the number of parallel
calls is set dynamically based on available CPU.
</td>
</tr><tr>
<td>
`deterministic`
</td>
<td>
(Optional.) A boolean controlling whether determinism
should be traded for performance by allowing elements to be produced out
of order.  If `deterministic` is `None`, the
`tf.data.Options.experimental_deterministic` dataset option (`True` by
default) is used to decide whether to produce elements
deterministically.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>

<tr>
<td>
`Dataset`
</td>
<td>
A `Dataset`.
</td>
</tr>
</table>



<h3 id="options"><code>options</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>options()
</code></pre>

Returns the options for this dataset and its inputs.


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>
<tr class="alt">
<td colspan="2">
A `tf.data.Options` object representing the dataset options.
</td>
</tr>

</table>



<h3 id="padded_batch"><code>padded_batch</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>padded_batch(
    batch_size, padded_shapes=None, padding_values=None, drop_remainder=False
)
</code></pre>

Combines consecutive elements of this dataset into padded batches.

This transformation combines multiple consecutive elements of the input
dataset into a single element.

Like `tf.data.Dataset.batch`, the components of the resulting element will
have an additional outer dimension, which will be `batch_size` (or
`N % batch_size` for the last element if `batch_size` does not divide the
number of input elements `N` evenly and `drop_remainder` is `False`). If
your program depends on the batches having the same outer dimension, you
should set the `drop_remainder` argument to `True` to prevent the smaller
batch from being produced.

Unlike `tf.data.Dataset.batch`, the input elements to be batched may have
different shapes, and this transformation will pad each component to the
respective shape in `padded_shapes`. The `padded_shapes` argument
determines the resulting shape for each dimension of each component in an
output element:

* If the dimension is a constant, the component will be padded out to that
  length in that dimension.
* If the dimension is unknown, the component will be padded out to the
  maximum length of all elements in that dimension.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">A = (tf.data.Dataset</code>
<code class="devsite-terminal" data-terminal-prefix="...">     .range(1, 5, output_type=tf.int32)</code>
<code class="devsite-terminal" data-terminal-prefix="...">     .map(lambda x: tf.fill([x], x)))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Pad to the smallest per-batch size that fits all elements.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">B = A.padded_batch(2)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">for element in B.as_numpy_iterator():</code>
<code class="devsite-terminal" data-terminal-prefix="...">  print(element)</code>
<code class="no-select nocode">[[1 0]</code>
<code class="no-select nocode"> [2 2]]</code>
<code class="no-select nocode">[[3 3 3 0]</code>
<code class="no-select nocode"> [4 4 4 4]]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Pad to a fixed size.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">C = A.padded_batch(2, padded_shapes=5)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">for element in C.as_numpy_iterator():</code>
<code class="devsite-terminal" data-terminal-prefix="...">  print(element)</code>
<code class="no-select nocode">[[1 0 0 0 0]</code>
<code class="no-select nocode"> [2 2 0 0 0]]</code>
<code class="no-select nocode">[[3 3 3 0 0]</code>
<code class="no-select nocode"> [4 4 4 4 0]]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Pad with a custom value.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">D = A.padded_batch(2, padded_shapes=5, padding_values=-1)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">for element in D.as_numpy_iterator():</code>
<code class="devsite-terminal" data-terminal-prefix="...">  print(element)</code>
<code class="no-select nocode">[[ 1 -1 -1 -1 -1]</code>
<code class="no-select nocode"> [ 2  2 -1 -1 -1]]</code>
<code class="no-select nocode">[[ 3  3  3 -1 -1]</code>
<code class="no-select nocode"> [ 4  4  4  4 -1]]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Components of nested elements can be padded independently.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">elements = [([1, 2, 3], [10]),</code>
<code class="devsite-terminal" data-terminal-prefix="...">            ([4, 5], [11, 12])]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_generator(</code>
<code class="devsite-terminal" data-terminal-prefix="...">    lambda: iter(elements), (tf.int32, tf.int32))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Pad the first component of the tuple to length 4, and the second</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># component to the smallest size that fits.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.padded_batch(2,</code>
<code class="devsite-terminal" data-terminal-prefix="...">    padded_shapes=([4], [None]),</code>
<code class="devsite-terminal" data-terminal-prefix="...">    padding_values=(-1, 100))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[(array([[ 1,  2,  3, -1], [ 4,  5, -1, -1]], dtype=int32),</code>
<code class="no-select nocode">  array([[ 10, 100], [ 11,  12]], dtype=int32))]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Pad with a single value and multiple components.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">E = tf.data.Dataset.zip((A, A)).padded_batch(2, padding_values=-1)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">for element in E.as_numpy_iterator():</code>
<code class="devsite-terminal" data-terminal-prefix="...">  print(element)</code>
<code class="no-select nocode">(array([[ 1, -1],</code>
<code class="no-select nocode">       [ 2,  2]], dtype=int32), array([[ 1, -1],</code>
<code class="no-select nocode">       [ 2,  2]], dtype=int32))</code>
<code class="no-select nocode">(array([[ 3,  3,  3, -1],</code>
<code class="no-select nocode">       [ 4,  4,  4,  4]], dtype=int32), array([[ 3,  3,  3, -1],</code>
<code class="no-select nocode">       [ 4,  4,  4,  4]], dtype=int32))</code>
</pre>


See also `tf.data.experimental.dense_to_sparse_batch`, which combines
elements that may have different shapes into a `tf.sparse.SparseTensor`.

<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`batch_size`
</td>
<td>
A `tf.int64` scalar `tf.Tensor`, representing the number of
consecutive elements of this dataset to combine in a single batch.
</td>
</tr><tr>
<td>
`padded_shapes`
</td>
<td>
(Optional.) A nested structure of `tf.TensorShape` or
`tf.int64` vector tensor-like objects representing the shape to which
the respective component of each input element should be padded prior
to batching. Any unknown dimensions will be padded to the maximum size
of that dimension in each batch. If unset, all dimensions of all
components are padded to the maximum size in the batch. `padded_shapes`
must be set if any component has an unknown rank.
</td>
</tr><tr>
<td>
`padding_values`
</td>
<td>
(Optional.) A nested structure of scalar-shaped
`tf.Tensor`, representing the padding values to use for the respective
components. None represents that the nested structure should be padded
with default values.  Defaults are `0` for numeric types and the empty
string for string types. The `padding_values` should have the
same structure as the input dataset. If `padding_values` is a single
element and the input dataset has multiple components, then the same
`padding_values` will be used to pad every component of the dataset.
If `padding_values` is a scalar, then its value will be broadcasted
to match the shape of each component.
</td>
</tr><tr>
<td>
`drop_remainder`
</td>
<td>
(Optional.) A `tf.bool` scalar `tf.Tensor`, representing
whether the last batch should be dropped in the case it has fewer than
`batch_size` elements; the default behavior is not to drop the smaller
batch.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>

<tr>
<td>
`Dataset`
</td>
<td>
A `Dataset`.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Raises</th></tr>

<tr>
<td>
`ValueError`
</td>
<td>
If a component has an unknown rank, and  the `padded_shapes`
argument is not set.
</td>
</tr>
</table>



<h3 id="prefetch"><code>prefetch</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>prefetch(
    buffer_size
)
</code></pre>

Creates a `Dataset` that prefetches elements from this dataset.

Most dataset input pipelines should end with a call to `prefetch`. This
allows later elements to be prepared while the current element is being
processed. This often improves latency and throughput, at the cost of
using additional memory to store prefetched elements.

Note: Like other `Dataset` methods, prefetch operates on the
elements of the input dataset. It has no concept of examples vs. batches.
`examples.prefetch(2)` will prefetch two elements (2 examples),
while `examples.batch(20).prefetch(2)` will prefetch 2 elements
(2 batches, of 20 examples each).

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.range(3)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.prefetch(2)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[0, 1, 2]</code>
</pre>


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`buffer_size`
</td>
<td>
A `tf.int64` scalar `tf.Tensor`, representing the maximum
number of elements that will be buffered when prefetching.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>

<tr>
<td>
`Dataset`
</td>
<td>
A `Dataset`.
</td>
</tr>
</table>
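
As a usage note, `buffer_size` can also be set to
`tf.data.experimental.AUTOTUNE` so that the prefetch buffer size is tuned
automatically. A minimal sketch of a typical end-of-pipeline call:

<pre class="devsite-click-to-copy prettyprint lang-py">
<code>dataset = tf.data.Dataset.range(100)
dataset = dataset.batch(10)
# Prefetch batches while the current batch is being consumed.
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
</code></pre>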



<h3 id="range"><code>range</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@staticmethod</code>
<code>range(
    *args, **kwargs
)
</code></pre>

Creates a `Dataset` of a step-separated range of values.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(5).as_numpy_iterator())</code>
<code class="no-select nocode">[0, 1, 2, 3, 4]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(2, 5).as_numpy_iterator())</code>
<code class="no-select nocode">[2, 3, 4]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(1, 5, 2).as_numpy_iterator())</code>
<code class="no-select nocode">[1, 3]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(1, 5, -2).as_numpy_iterator())</code>
<code class="no-select nocode">[]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(5, 1).as_numpy_iterator())</code>
<code class="no-select nocode">[]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(5, 1, -2).as_numpy_iterator())</code>
<code class="no-select nocode">[5, 3]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(2, 5, output_type=tf.int32).as_numpy_iterator())</code>
<code class="no-select nocode">[2, 3, 4]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(1, 5, 2, output_type=tf.float32).as_numpy_iterator())</code>
<code class="no-select nocode">[1.0, 3.0]</code>
</pre>


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`*args`
</td>
<td>
follows the same semantics as Python's `range`.
len(args) == 1 -> start = 0, stop = args[0], step = 1.
len(args) == 2 -> start = args[0], stop = args[1], step = 1.
len(args) == 3 -> start = args[0], stop = args[1], step = args[2].
</td>
</tr><tr>
<td>
`**kwargs`
</td>
<td>

- output_type: The dtype of the elements in the resulting range. (Optional, default: `tf.int64`).
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>

<tr>
<td>
`Dataset`
</td>
<td>
A `RangeDataset`.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Raises</th></tr>

<tr>
<td>
`ValueError`
</td>
<td>
if len(args) == 0.
</td>
</tr>
</table>



<h3 id="reduce"><code>reduce</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>reduce(
    initial_state, reduce_func
)
</code></pre>

Reduces the input dataset to a single element.

The transformation calls `reduce_func` successively on every element of
the input dataset until the dataset is exhausted, aggregating information in
its internal state. The `initial_state` argument is used for the initial
state and the final state is returned as the result.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">tf.data.Dataset.range(5).reduce(np.int64(0), lambda x, _: x + 1).numpy()</code>
<code class="no-select nocode">5</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">tf.data.Dataset.range(5).reduce(np.int64(0), lambda x, y: x + y).numpy()</code>
<code class="no-select nocode">10</code>
</pre>


<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`initial_state`
</td>
<td>
An element representing the initial state of the
transformation.
</td>
</tr><tr>
<td>
`reduce_func`
</td>
<td>
A function that maps `(old_state, input_element)` to
`new_state`. It must take two arguments and return a new state. The
structure of `new_state` must match the structure of
`initial_state`.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>
<tr class="alt">
<td colspan="2">
A dataset element corresponding to the final state of the transformation.
</td>
</tr>

</table>
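
A minimal sketch of `reduce` with structured state; the `(count, sum)` pairing
is purely illustrative. Note how the state returned by `reduce_func` keeps the
same structure as `initial_state`.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code># Track (count, sum) over the dataset in a single pass.
count, total = tf.data.Dataset.range(1, 6).reduce(
    (np.int64(0), np.int64(0)),
    lambda state, x: (state[0] + 1, state[1] + x))
print(count.numpy(), total.numpy())  # 5 15
</code></pre>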



<h3 id="repeat"><code>repeat</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>repeat(
    count=None
)
</code></pre>

Repeats this dataset so each original value is seen `count` times.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.repeat(3)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[1, 2, 3, 1, 2, 3, 1, 2, 3]</code>
</pre>


Note: If this dataset is a function of global state (e.g. a random number
generator), then different repetitions may produce different elements.

<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Args</th></tr>

<tr>
<td>
`count`
</td>
<td>
(Optional.) A `tf.int64` scalar `tf.Tensor`, representing the
number of times the dataset should be repeated. The default behavior (if
`count` is `None` or `-1`) is for the dataset to be repeated indefinitely.
</td>
</tr>
</table>



<!-- Tabular view -->
 <table class="responsive fixed orange">
<colgroup><col width="214px"><col></colgroup>
<tr><th colspan="2">Returns</th></tr>

<tr>
<td>
`Dataset`
</td>
<td>
A `Dataset`.
</td>
</tr>
</table>



<h3 id="shard"><code>shard</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>shard(
    num_shards, index
)
</code></pre>

Creates a `Dataset` that includes only 1/`num_shards` of this dataset.

`shard` is deterministic. The Dataset produced by `A.shard(n, i)` will
contain all elements of A whose index mod n = i.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">A = tf.data.Dataset.range(10)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">B = A.shard(num_shards=3, index=0)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(B.as_numpy_iterator())</code>
<code class="no-select nocode">[0, 3, 6, 9]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">C = A.shard(num_shards=3, index=1)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(C.as_numpy_iterator())</code>
<code class="no-select nocode">[1, 4, 7]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">D = A.shard(num_shards=3, index=2)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(D.as_numpy_iterator())</code>
<code class="no-select nocode">[2, 5, 8]</code>
</pre>


This dataset operator is very useful when running distributed training, as
it allows each worker to read a unique subset.

When reading a single input file, you can shard elements as follows:

```python
d = tf.data.TFRecordDataset(input_file)
d = d.shard(num_workers, worker_index)
d = d.repeat(num_epochs)
d = d.shuffle(shuffle_buffer_size)
d = d.map(parser_fn, num_parallel_calls=num_map_threads)
```

Important caveats:

  • Be sure to shard before you use any randomizing operator (such as shuffle).
  • Generally it is best if the shard operator is used early in the dataset pipeline. For example, when reading from a set of TFRecord files, shard before converting the dataset to input samples. This avoids reading every file on every worker. The following is an example of an efficient sharding strategy within a complete pipeline:

```python
d = Dataset.list_files(pattern)
d = d.shard(num_workers, worker_index)
d = d.repeat(num_epochs)
d = d.shuffle(shuffle_buffer_size)
d = d.interleave(tf.data.TFRecordDataset,
                 cycle_length=num_readers, block_length=1)
d = d.map(parser_fn, num_parallel_calls=num_map_threads)
```

Args
num_shards A tf.int64 scalar tf.Tensor, representing the number of shards operating in parallel.
index A tf.int64 scalar tf.Tensor, representing the worker index.

Returns
Dataset A Dataset.

Raises
InvalidArgumentError if num_shards or index are illegal values.

shuffle

Randomly shuffles the elements of this dataset.

This dataset fills a buffer with buffer_size elements, then randomly samples elements from this buffer, replacing the selected elements with new elements. For perfect shuffling, a buffer size greater than or equal to the full size of the dataset is required.

For instance, if your dataset contains 10,000 elements but buffer_size is set to 1,000, then shuffle will initially select a random element from only the first 1,000 elements in the buffer. Once an element is selected, its space in the buffer is replaced by the next (i.e. 1,001-st) element, maintaining the 1,000 element buffer.

reshuffle_each_iteration controls whether the shuffle order should be different for each epoch. In TF 1.X, the idiomatic way to create epochs was through the repeat transformation:

dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=True)
dataset = dataset.repeat(2)
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 0, 2, 1, 2, 0]
dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=False)
dataset = dataset.repeat(2)
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 0, 2, 1, 0, 2]

In TF 2.0, tf.data.Dataset objects are Python iterables which makes it possible to also create epochs through Python iteration:

dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=True)
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 0, 2]
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 2, 0]
dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=False)
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 0, 2]
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 0, 2]

Args
buffer_size A tf.int64 scalar tf.Tensor, representing the number of elements from this dataset from which the new dataset will sample.
seed (Optional.) A tf.int64 scalar tf.Tensor, representing the random seed that will be used to create the distribution. See tf.random.set_seed for behavior.
reshuffle_each_iteration (Optional.) A boolean, which if true indicates that the dataset should be pseudorandomly reshuffled each time it is iterated over. (Defaults to True.)

Returns
Dataset A Dataset.

skip

Creates a Dataset that skips count elements from this dataset.

dataset = tf.data.Dataset.range(10)
dataset = dataset.skip(7)
list(dataset.as_numpy_iterator())
[7, 8, 9]

Args
count A tf.int64 scalar tf.Tensor, representing the number of elements of this dataset that should be skipped to form the new dataset. If count is greater than the size of this dataset, the new dataset will contain no elements. If count is -1, skips the entire dataset.

Returns
Dataset A Dataset.

stream

View source

Obtains a non-repeatable StreamIODataset.

Returns
A class of StreamIODataset.

take

Creates a Dataset with at most count elements from this dataset.

dataset = tf.data.Dataset.range(10)
dataset = dataset.take(3)
list(dataset.as_numpy_iterator())
[0, 1, 2]

Args
count A tf.int64 scalar tf.Tensor, representing the number of elements of this dataset that should be taken to form the new dataset. If count is -1, or if count is greater than the size of this dataset, the new dataset will contain all elements of this dataset.

Returns
Dataset A Dataset.

to_file

View source

Write dataset to a file.

Args
dataset A dataset whose content will be written to the file.
filename A string, the filename of the file to write to.
name A name prefix for the IODataset (optional).

Returns
The number of records written.
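
A minimal sketch, assuming a dataset of scalar strings and an illustrative
output path; the exact on-disk record format depends on the tensorflow-io
version.

```python
import tensorflow as tf
import tensorflow_io as tfio

# Write each string element to a local file and report how many records
# were written. The path below is illustrative only.
ds = tf.data.Dataset.from_tensor_slices(["hello", "world"])
written = tfio.experimental.IODataset.to_file(ds, "/tmp/lines.txt")
print(written)  # expected: 2
```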

unbatch

Splits elements of a dataset into multiple elements.

For example, if elements of the dataset are shaped [B, a0, a1, ...], where B may vary for each input element, then for each element in the dataset, the unbatched dataset will contain B consecutive elements of shape [a0, a1, ...].

elements = [ [1, 2, 3], [1, 2], [1, 2, 3, 4] ]
dataset = tf.data.Dataset.from_generator(lambda: elements, tf.int64)
dataset = dataset.unbatch()
list(dataset.as_numpy_iterator())
[1, 2, 3, 1, 2, 1, 2, 3, 4]

Returns
A Dataset.

window

Combines (nests of) input elements into a dataset of (nests of) windows.

A "window" is a finite dataset of flat elements of size size (or possibly fewer if there are not enough input elements to fill the window and drop_remainder evaluates to False).

The shift argument determines the number of input elements by which the window moves on each iteration. If windows and elements are both numbered starting at 0, the first element in window k will be element k * shift of the input dataset. In particular, the first element of the first window will always be the first element of the input dataset.

The stride argument determines the spacing between the input elements placed in each window, while the shift argument determines how far the window advances between consecutive windows.

For example:

dataset = tf.data.Dataset.range(7).window(2)
for window in dataset:
  print(list(window.as_numpy_iterator()))
[0, 1]
[2, 3]
[4, 5]
[6]
dataset = tf.data.Dataset.range(7).window(3, 2, 1, True)
for window in dataset:
  print(list(window.as_numpy_iterator()))
[0, 1, 2]
[2, 3, 4]
[4, 5, 6]
dataset = tf.data.Dataset.range(7).window(3, 1, 2, True)
for window in dataset:
  print(list(window.as_numpy_iterator()))
[0, 2, 4]
[1, 3, 5]
[2, 4, 6]

Note that when the window transformation is applied to a dataset of nested elements, it produces a dataset of nested windows.

nested = ([1, 2, 3, 4], [5, 6, 7, 8])
dataset = tf.data.Dataset.from_tensor_slices(nested).window(2)
for window in dataset:
  def to_numpy(ds):
    return list(ds.as_numpy_iterator())
  print(tuple(to_numpy(component) for component in window))
([1, 2], [5, 6])
([3, 4], [7, 8])
dataset = tf.data.Dataset.from_tensor_slices({'a': [1, 2, 3, 4]})
dataset = dataset.window(2)
for window in dataset:
  def to_numpy(ds):
    return list(ds.as_numpy_iterator())
  print({'a': to_numpy(window['a'])})
{'a': [1, 2]}
{'a': [3, 4]}

Args
size A tf.int64 scalar tf.Tensor, representing the number of elements of the input dataset to combine into a window. Must be positive.
shift (Optional.) A tf.int64 scalar tf.Tensor, representing the number of input elements by which the window moves in each iteration. Defaults to size. Must be positive.
stride (Optional.) A tf.int64 scalar tf.Tensor, representing the stride of the input elements in the sliding window. Must be positive. The default value of 1 means "retain every input element".
drop_remainder (Optional.) A tf.bool scalar tf.Tensor, representing whether the last window should be dropped if its size is smaller than size.

Returns
Dataset A Dataset of (nests of) windows -- finite datasets of flat elements created from the (nests of) input elements.

with_options

Returns a new tf.data.Dataset with the given options set.

The options are "global" in the sense they apply to the entire dataset. If options are set multiple times, they are merged as long as different options do not use different non-default values.

ds = tf.data.Dataset.range(5)
ds = ds.interleave(lambda x: tf.data.Dataset.range(5),
                   cycle_length=3,
                   num_parallel_calls=3)
options = tf.data.Options()
# This will make the interleave order non-deterministic.
options.experimental_deterministic = False
ds = ds.with_options(options)

Args
options A tf.data.Options that identifies the options to use.

Returns
Dataset A Dataset with the given options.

Raises
ValueError when an option is set more than once to a non-default value.

zip

Creates a Dataset by zipping together the given datasets.

This method has similar semantics to the built-in zip() function in Python, with the main difference being that the datasets argument can be an arbitrary nested structure of Dataset objects.

# The nested structure of the `datasets` argument determines the
# structure of elements in the resulting dataset.
a = tf.data.Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
b = tf.data.Dataset.range(4, 7)  # ==> [ 4, 5, 6 ]
ds = tf.data.Dataset.zip((a, b))
list(ds.as_numpy_iterator())
[(1, 4), (2, 5), (3, 6)]
ds = tf.data.Dataset.zip((b, a))
list(ds.as_numpy_iterator())
[(4, 1), (5, 2), (6, 3)]

# The `datasets` argument may contain an arbitrary number of datasets.
c = tf.data.Dataset.range(7, 13).batch(2)  # ==> [ [7, 8],
                                           #       [9, 10],
                                           #       [11, 12] ]
ds = tf.data.Dataset.zip((a, b, c))
for element in ds.as_numpy_iterator():
  print(element)
(1, 4, array([7, 8]))
(2, 5, array([ 9, 10]))
(3, 6, array([11, 12]))

# The number of elements in the resulting dataset is the same as
# the size of the smallest dataset in `datasets`.
d = tf.data.Dataset.range(13, 15)  # ==> [ 13, 14 ]
ds = tf.data.Dataset.zip((a, d))
list(ds.as_numpy_iterator())
[(1, 13), (2, 14)]

Args
datasets A nested structure of datasets.

Returns
Dataset A Dataset.

__bool__

__iter__

Creates an iterator for elements of this dataset.

The returned iterator implements the Python Iterator protocol.

Returns
A tf.data.Iterator for the elements of this dataset.

Raises
RuntimeError If not inside of tf.function and not executing eagerly.
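
A minimal sketch of iterating explicitly in eager mode; the printed values are
the expected outputs.

```python
dataset = tf.data.Dataset.range(3)
it = iter(dataset)          # calls __iter__
print(next(it).numpy())     # 0
print(next(it).numpy())     # 1
```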

__len__

Returns the length of the dataset if it is known and finite.

This method requires that you are running in eager mode, and that the length of the dataset is known and non-infinite. When the length may be unknown or infinite, or if you are running in graph mode, use tf.data.Dataset.cardinality instead.

Returns
An integer representing the length of the dataset.

Raises
RuntimeError If the dataset length is unknown or infinite, or if eager execution is not enabled.
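
A minimal sketch contrasting len() with cardinality(); the repeat() call is
only there to produce an infinite dataset.

```python
dataset = tf.data.Dataset.range(42)
print(len(dataset))  # 42

# len() would raise for an infinite dataset; cardinality() reports -1
# (tf.data.experimental.INFINITE_CARDINALITY) instead.
print(dataset.repeat().cardinality().numpy())  # -1
```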

__nonzero__