
tfio.experimental.IODataset

View source on GitHub

IODataset

Inherits From: IODataset

Used in the notebooks

Used in the tutorials

Args
variant_tensor A DT_VARIANT tensor that represents the dataset.

Attributes
element_spec The type specification of an element of this dataset.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset.element_spec
TensorSpec(shape=(), dtype=tf.int32, name=None)

Methods

apply


Applies a transformation function to this dataset.

`apply` enables chaining of custom `Dataset` transformations, which are represented as functions that take one `Dataset` argument and return a transformed `Dataset`.

dataset = tf.data.Dataset.range(100)
def dataset_fn(ds):
  return ds.filter(lambda x: x < 5)
dataset = dataset.apply(dataset_fn)
list(dataset.as_numpy_iterator())
[0, 1, 2, 3, 4]

Args
transformation_func A function that takes one `Dataset` argument and returns a `Dataset`.

Returns
Dataset The `Dataset` returned by applying `transformation_func` to this dataset.

as_numpy_iterator

Returns an iterator which converts all elements of the dataset to numpy.

Use `as_numpy_iterator` to inspect the content of your dataset. To see element shapes and types, print dataset elements directly instead of using `as_numpy_iterator`.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
for element in dataset:
  print(element)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)

This method requires that you are running in eager mode and the dataset's `element_spec` contains only `TensorSpec` components.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
for element in dataset.as_numpy_iterator():
  print(element)
1
2
3
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
print(list(dataset.as_numpy_iterator()))
[1, 2, 3]

`as_numpy_iterator()` will preserve the nested structure of dataset elements.

dataset = tf.data.Dataset.from_tensor_slices({'a': ([1, 2], [3, 4]),
                                              'b': [5, 6]})
list(dataset.as_numpy_iterator()) == [{'a': (1, 3), 'b': 5},
                                      {'a': (2, 4), 'b': 6}]
True

Returns
An iterable over the elements of the dataset, with their tensors converted to numpy arrays.

Raises
TypeError If an element contains a non-`Tensor` value.
RuntimeError If eager execution is not enabled.

batch

Combines consecutive elements of this dataset into batches.

dataset = tf.data.Dataset.range(8)
dataset = dataset.batch(3)
list(dataset.as_numpy_iterator())
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]
dataset = tf.data.Dataset.range(8)
dataset = dataset.batch(3, drop_remainder=True)
list(dataset.as_numpy_iterator())
[array([0, 1, 2]), array([3, 4, 5])]

The components of the resulting element will have an additional outer dimension, which will be `batch_size` (or `N % batch_size` for the last element if `batch_size` does not divide the number of input elements `N` evenly and `drop_remainder` is `False`). If your program depends on the batches having the same outer dimension, you should set the `drop_remainder` argument to `True` to prevent the smaller batch from being produced.

Args
batch_size A `tf.int64` scalar `tf.Tensor`, representing the number of consecutive elements of this dataset to combine in a single batch.
drop_remainder (Optional.) A `tf.bool` scalar `tf.Tensor`, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.

Returns
Dataset A `Dataset`.

cache

Caches the elements in this dataset.

The first time the dataset is iterated over, its elements will be cached either in the specified file or in memory. Subsequent iterations will use the cached data.

dataset = tf.data.Dataset.range(5)
dataset = dataset.map(lambda x: x**2)
dataset = dataset.cache()
# The first time reading through the data will generate the data using
# `range` and `map`.
list(dataset.as_numpy_iterator())
[0, 1, 4, 9, 16]
# Subsequent iterations read from the cache.
list(dataset.as_numpy_iterator())
[0, 1, 4, 9, 16]

When caching to a file, the cached data will persist across runs. Even the first iteration through the data will read from the cache file. Changing the input pipeline before the call to `.cache()` will have no effect until the cache file is deleted or the filename is changed.

dataset = tf.data.Dataset.range(5)
dataset = dataset.cache("/path/to/file")  # doctest: +SKIP
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[0, 1, 2, 3, 4]
dataset = tf.data.Dataset.range(10)
dataset = dataset.cache("/path/to/file")  # Same file! # doctest: +SKIP
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[0, 1, 2, 3, 4]

Args
filename A `tf.string` scalar `tf.Tensor`, representing the name of a directory on the filesystem to use for caching elements in this Dataset. If a filename is not provided, the dataset will be cached in memory.

Returns
Dataset A `Dataset`.

cardinality

Returns the cardinality of the dataset, if known.

`cardinality` may return `tf.data.INFINITE_CARDINALITY` if the dataset contains an infinite number of elements, or `tf.data.UNKNOWN_CARDINALITY` if the analysis fails to determine the number of elements in the dataset (e.g. when the dataset source is a file).

dataset = tf.data.Dataset.range(42)
print(dataset.cardinality().numpy())
42
dataset = dataset.repeat()
cardinality = dataset.cardinality()
print((cardinality == tf.data.INFINITE_CARDINALITY).numpy())
True
dataset = dataset.filter(lambda x: True)
cardinality = dataset.cardinality()
print((cardinality == tf.data.UNKNOWN_CARDINALITY).numpy())
True

Returns
A scalar `tf.int64` `Tensor` representing the cardinality of the dataset. If the cardinality is infinite or unknown, `cardinality` returns the named constant `tf.data.INFINITE_CARDINALITY` or `tf.data.UNKNOWN_CARDINALITY`.

concatenate

Creates a `Dataset` by concatenating the given dataset with this dataset.

a = tf.data.Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
b = tf.data.Dataset.range(4, 8)  # ==> [ 4, 5, 6, 7 ]
ds = a.concatenate(b)
list(ds.as_numpy_iterator())
[1, 2, 3, 4, 5, 6, 7]
# The input dataset and dataset to be concatenated should have the same
# nested structures and output types.
c = tf.data.Dataset.zip((a, b))
a.concatenate(c)
Traceback (most recent call last):
TypeError: Two datasets to concatenate have different types
<dtype: 'int64'> and (tf.int64, tf.int64)
d = tf.data.Dataset.from_tensor_slices(["a", "b", "c"])
a.concatenate(d)
Traceback (most recent call last):
TypeError: Two datasets to concatenate have different types
<dtype: 'int64'> and <dtype: 'string'>

Args
dataset The `Dataset` to be concatenated.

Returns
Dataset A `Dataset`.

enumerate

Enumerates the elements of this dataset.

It is similar to python's `enumerate`.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset = dataset.enumerate(start=5)
for element in dataset.as_numpy_iterator():
  print(element)
(5, 1)
(6, 2)
(7, 3)
# The nested structure of the input dataset determines the structure of
# elements in the resulting dataset.
dataset = tf.data.Dataset.from_tensor_slices([(7, 8), (9, 10)])
dataset = dataset.enumerate()
for element in dataset.as_numpy_iterator():
  print(element)
(0, array([7, 8], dtype=int32))
(1, array([ 9, 10], dtype=int32))

Args
start A `tf.int64` scalar `tf.Tensor`, representing the start value for enumeration.

Returns
Dataset A `Dataset`.

filter

Filters this dataset according to `predicate`.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset = dataset.filter(lambda x: x < 3)
list(dataset.as_numpy_iterator())
[1, 2]
# `tf.math.equal(x, y)` is required for equality comparison
def filter_fn(x):
  return tf.math.equal(x, 1)
dataset = dataset.filter(filter_fn)
list(dataset.as_numpy_iterator())
[1]

Args
predicate A function mapping a dataset element to a boolean.

Returns
Dataset The `Dataset` containing the elements of this dataset for which `predicate` is `True`.

flat_map

Maps `map_func` across this dataset and flattens the result.

Use `flat_map` if you want to make sure that the order of your dataset stays the same. For example, to flatten a dataset of batches into a dataset of their elements:

dataset = tf.data.Dataset.from_tensor_slices(
               [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dataset = dataset.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))
list(dataset.as_numpy_iterator())
[1, 2, 3, 4, 5, 6, 7, 8, 9]

`tf.data.Dataset.interleave()` is a generalization of `flat_map`, since `flat_map` produces the same output as `tf.data.Dataset.interleave(cycle_length=1)`.

Args
map_func A function mapping a dataset element to a dataset.

Returns
Dataset A `Dataset`.

from_audio

View source

Creates an `IODataset` from an audio file.

The following audio file formats are supported:

  • WAV
  • Flac
  • Vorbis
  • MP3

Args
filename A string, the filename of an audio file.
name A name prefix for the IOTensor (optional).

Returns
A `IODataset`.
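
A minimal usage sketch (the WAV path here is hypothetical):

import tensorflow_io as tfio

# Stream samples from a hypothetical WAV file, in batches of 1024.
audio = tfio.experimental.IODataset.from_audio("sample.wav")
for batch in audio.batch(1024).take(1):
  print(batch.shape, batch.dtype)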

from_avro

View source

Creates an `IODataset` from a dataset object in an avro file.

Args
filename A string, the filename of an avro file.
schema A string, the schema of the avro file.
columns A list of column names within the avro file.
name A name prefix for the IOTensor (optional).

Returns
A `IODataset`.

from_ffmpeg

View source

Creates an `IODataset` from a media file by FFmpeg.

Args
filename A string, the filename of a media file.
stream A string, the stream index (e.g., "v:0"). Note that video, audio, and subtitle indices each start at 0.
name A name prefix for the IOTensor (optional).

Returns
A `IODataset`.
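
A minimal usage sketch (the media path is hypothetical):

import tensorflow_io as tfio

# Decode the first video stream ("v:0") of a hypothetical media file.
video = tfio.experimental.IODataset.from_ffmpeg("sample.mp4", "v:0")
for frame in video.take(1):
  print(frame.shape)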

from_generator

Creates a `Dataset` whose elements are generated by `generator`.

The `generator` argument must be a callable object that returns an object that supports the `iter()` protocol (e.g. a generator function). The elements generated by `generator` must be compatible with the given `output_types` and (optional) `output_shapes` arguments.

import itertools

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

dataset = tf.data.Dataset.from_generator(
     gen,
     (tf.int64, tf.int64),
     (tf.TensorShape([]), tf.TensorShape([None])))

list(dataset.take(3).as_numpy_iterator())
[(1, array([1])), (2, array([1, 1])), (3, array([1, 1, 1]))]

Args
generator A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
output_types A nested structure of `tf.DType` objects corresponding to each component of an element yielded by `generator`.
output_shapes (Optional.) A nested structure of `tf.TensorShape` objects corresponding to each component of an element yielded by `generator`.
args (Optional.) A tuple of `tf.Tensor` objects that will be evaluated and passed to `generator` as NumPy-array arguments.

Returns
Dataset A `Dataset`.

from_hdf5

View source

Creates an `IODataset` from a dataset object in an hdf5 file.

Args
filename A string, the filename of an hdf5 file.
dataset A string, the name of the dataset within the hdf5 file.
spec A tf.TensorSpec or dtype (e.g., tf.int64) of the dataset. In graph mode, spec is needed. In eager mode, spec is probed automatically.
name A name prefix for the IOTensor (optional).

Returns
A `IODataset`.
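
A minimal usage sketch (the file and dataset names are hypothetical):

import tensorflow_io as tfio

# Stream the hypothetical "/features" dataset out of an hdf5 file.
# In eager mode the spec is probed automatically.
dataset = tfio.experimental.IODataset.from_hdf5("data.h5", "/features")
for element in dataset.take(1):
  print(element)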

from_json

View source

Creates an `IODataset` from a json file.

Args
filename A string, the filename of a json file.
columns A list of column names. By default (None) all columns will be read.
mode A string, the mode (records or None) to open the json file.
name A name prefix for the IOTensor (optional).

Returns
A `IODataset`.
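
A minimal usage sketch (the json path and column names are hypothetical):

import tensorflow_io as tfio

# Read two hypothetical columns from a records-style json file.
dataset = tfio.experimental.IODataset.from_json(
    "data.json", columns=["feature", "label"], mode="records")
for element in dataset.take(1):
  print(element)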

from_kafka

View source

Creates an `IODataset` from a kafka server with an offset range.

Args
topic A `tf.string` tensor containing the topic subscription.
partition A `tf.int64` tensor containing the partition, by default 0.
start A `tf.int64` tensor containing the start offset, by default 0.
stop A `tf.int64` tensor containing the end offset, by default -1.
servers An optional list of bootstrap servers, by default `localhost:9092`.
configuration An optional `tf.string` tensor containing configurations in [Key=Value] format. Configuration types include: Global configuration, see 'Global configuration properties' in the librdkafka doc, e.g. ["enable.auto.commit=false", "heartbeat.interval.ms=2000"]; Topic configuration, see 'Topic configuration properties' in the librdkafka doc, noting that all topic configurations should be prefixed with `configuration.topic.`, e.g. ["conf.topic.auto.offset.reset=earliest"].
name A name prefix for the IODataset (optional).

Returns
A `IODataset`.
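
A minimal usage sketch (the topic name and configuration values are hypothetical):

import tensorflow_io as tfio

# Read offsets 0 through 999 of partition 0 of a hypothetical topic
# from a local kafka broker.
dataset = tfio.experimental.IODataset.from_kafka(
    "my-topic", partition=0, start=0, stop=1000,
    servers=["localhost:9092"],
    configuration=["session.timeout.ms=7000"])
for message in dataset.take(1):
  print(message)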

from_kinesis

View source

Creates an `IODataset` from a Kinesis stream.

Args
stream A string, the stream name.
shard A string, the shard of the kinesis stream.
name A name prefix for the IODataset (optional).

Returns
A `IODataset`.

from_libsvm

View source

Creates an `IODataset` from a libsvm file.

Args
filename A `tf.string` tensor containing one or more filenames.
num_features The number of features.
dtype (Optional.) The type of the output feature tensor. Defaults to tf.float32.
label_dtype (Optional.) The type of the output label tensor. Defaults to tf.int64.
compression_type (Optional.) A `tf.string` scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP".
name A name prefix for the IOTensor (optional).

Returns
A `IODataset`.
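
A minimal usage sketch (the libsvm path and feature count are hypothetical):

import tensorflow_io as tfio

dataset = tfio.experimental.IODataset.from_libsvm(
    "train.libsvm", num_features=10)
for element in dataset.take(1):
  print(element)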

from_lmdb

View source

Creates an `IODataset` from an lmdb file.

Args
filename A string, the filename of an lmdb file.
name A name prefix for the IOTensor (optional).

Returns
A `IODataset`.

from_mnist

View source

Creates an `IODataset` from MNIST image and/or label files.

Args
images A string, the filename of the MNIST image file.
labels A string, the filename of the MNIST label file.
name A name prefix for the IODataset (optional).

Returns
A `IODataset`.
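
A minimal usage sketch (the file paths are hypothetical; when both files are given, each element is assumed to be an (image, label) pair):

import tensorflow_io as tfio

dataset = tfio.experimental.IODataset.from_mnist(
    images="train-images-idx3-ubyte", labels="train-labels-idx1-ubyte")
for image, label in dataset.take(1):
  print(image.shape, label)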

from_numpy

View source


Creates an `IODataset` from a Numpy array.

`from_numpy` allows the user to create a `Dataset` from a dict, tuple, or individual element of numpy array_like. The `Dataset` created through `from_numpy` has the same dtypes as the input elements of the array_like. The shape of the `Dataset` is similar to that of the array_like input elements, except that the first dimension of the shape is set to None, because the first dimension of the iterated output may not divide the total number of elements evenly.

For example:

import numpy as np
import tensorflow as tf
import tensorflow_io as tfio
a = (np.asarray([[0., 1.], [2., 3.], [4., 5.], [6., 7.], [8., 9.]]),
     np.asarray([[10, 11], [12, 13], [14, 15], [16, 17], [18, 19]]))
d = tfio.experimental.IODataset.from_numpy(a).batch(2)
for i in d:
  print(i.numpy())
# numbers of elements = [2, 2, 1] <= (5 / 2)
#
# ([[0., 1.], [2., 3.]], [[10, 11], [12, 13]]) # <= batch index 0
# ([[4., 5.], [6., 7.]], [[14, 15], [16, 17]]) # <= batch index 1
# ([[8., 9.]],           [[18, 19]])           # <= batch index 2

Args
a A numpy array_like (if the input type is array_like); a dict or tuple of numpy arrays (if the input type is dict or tuple).
name A name prefix for the IOTensor (optional).

Returns
A `IODataset` with the same dtypes as the array_like specified in `a`.

from_numpy_file

View source

Creates an `IODataset` from a Numpy file.

`from_numpy_file` allows the user to create a `Dataset` from a numpy file (npy or npz). The `Dataset` created through `from_numpy_file` has the same dtypes as the elements in the numpy file. The shape of the `Dataset` is similar to that of the elements in the numpy file, except that the first dimension of the shape is set to None, because the first dimension of the iterated output may not divide the total number of elements evenly. If the numpy file consists of unnamed elements, a tuple of numpy arrays is returned, otherwise a dict of named elements is returned.

Args:
  filename: filename of numpy file (npy or npz).
  spec: A tuple of tf.TensorSpec or dtype, or a dict of
    `name:tf.TensorSpec` or `name:dtype` pairs to specify the dtypes
    in each element of the numpy file. In eager mode spec is automatically
    probed. In graph mode spec must be provided. If a tuple is provided for
    spec then it is assumed that the numpy file consists of `arr_0`, `arr_1`...
    If a dict is provided then the numpy file should consist of named
    elements.
  name: A name prefix for the IOTensor (optional).

Returns
A `IODataset` with the same dtypes as of the array_like in the numpy file (npy or npz).
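
A minimal usage sketch (the npz path is hypothetical):

import numpy as np
import tensorflow_io as tfio

np.savez("data.npz", features=np.arange(10.0), labels=np.arange(10))
# Named npz elements are expected back as named (dict) elements.
dataset = tfio.experimental.IODataset.from_numpy_file("data.npz")
for element in dataset.take(1):
  print(element)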



from_parquet

View source

@classmethod
from_parquet(
    filename, columns=None, **kwargs
)

Creates an `IODataset` from a Parquet file.

Args
filename A string, the filename of a Parquet file.
columns A list of column names. By default (None) all columns will be read.
name A name prefix for the IOTensor (optional).

Returns
A `IODataset`.
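
A minimal usage sketch (the Parquet path and column names are hypothetical):

import tensorflow_io as tfio

dataset = tfio.experimental.IODataset.from_parquet(
    "example.parquet", columns=["feature", "label"])
for element in dataset.take(1):
  print(element)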



from_pcap

View source

@classmethod
from_pcap(
    filename, **kwargs
)

Creates an `IODataset` from a pcap file.

Args
filename A string, the filename of a pcap file.
name A name prefix for the IOTensor (optional).

Returns
A `IODataset`.



from_prometheus

View source

@classmethod
from_prometheus(
    query, length, offset=None, endpoint=None, spec=None
)

Creates a `GraphIODataset` from a prometheus endpoint.

Args
query A string, the query string for prometheus.
length An integer, the length of the query (in seconds).
offset An integer, a millisecond-precision timestamp, by default the time when the graph runs.
endpoint A string, the server address of prometheus, by default `http://localhost:9090`.
spec A structured tf.TensorSpec of the dataset. The format should be {"job": {"instance": {"name": tf.TensorSpec} } }. In graph mode, spec is needed. In eager mode, spec is probed automatically.
name A name prefix for the IODataset (optional).

Returns
A `IODataset`.
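
A minimal usage sketch (assumes a local prometheus server scraping a coredns metric, as in the tensorflow-io prometheus tutorial):

import tensorflow_io as tfio

# Query the last 10 seconds of a metric; each element is assumed to be a
# (timestamp, value) pair, with values structured per the spec format above.
dataset = tfio.experimental.IODataset.from_prometheus(
    "coredns_dns_request_count_total", 10, endpoint="http://localhost:9090")
for timestamp, value in dataset.take(1):
  print(timestamp, value)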



from_sql

View source

@classmethod
from_sql(
    query, endpoint=None, spec=None
)

Creates a `GraphIODataset` from a postgresql server endpoint.

Args
query A string, the sql query string.
endpoint A string, the server address of the postgresql server.
spec A structured (tuple) tf.TensorSpec of the dataset. In graph mode, spec is needed. In eager mode, spec is probed automatically.
name A name prefix for the IODataset (optional).

Returns
A `IODataset`.
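
A minimal usage sketch (the endpoint format and query are hypothetical):

import tensorflow_io as tfio

dataset = tfio.experimental.IODataset.from_sql(
    query="SELECT co2, temperature FROM observations;",
    endpoint="postgresql://user@localhost?port=5432&dbname=test")
for element in dataset.take(1):
  print(element)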



from_tensor_slices

@staticmethod
from_tensor_slices(
    tensors
)

Creates a `Dataset` whose elements are slices of the given tensors.

The given tensors are sliced along their first dimension. This operation preserves the structure of the input tensors, removing the first dimension of each tensor and using it as the dataset dimension. All input tensors must have the same size in their first dimensions.

# Slicing a 1D tensor produces scalar tensor elements.
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
list(dataset.as_numpy_iterator())
[1, 2, 3]

# Slicing a 2D tensor produces 1D tensor elements.
dataset = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])
list(dataset.as_numpy_iterator())
[array([1, 2], dtype=int32), array([3, 4], dtype=int32)]

# Slicing a tuple of 1D tensors produces tuple elements containing
# scalar tensors.
dataset = tf.data.Dataset.from_tensor_slices(([1, 2], [3, 4], [5, 6]))
list(dataset.as_numpy_iterator())
[(1, 3, 5), (2, 4, 6)]

# Dictionary structure is also preserved.
dataset = tf.data.Dataset.from_tensor_slices({"a": [1, 2], "b": [3, 4]})
list(dataset.as_numpy_iterator()) == [{'a': 1, 'b': 3},
                                      {'a': 2, 'b': 4}]
True

# Two tensors can be combined into one Dataset object.
features = tf.constant([[1, 3], [2, 1], [3, 3]]) # ==> 3x2 tensor
labels = tf.constant(['A', 'B', 'A']) # ==> 3x1 tensor
dataset = Dataset.from_tensor_slices((features, labels))
# Both the features and the labels tensors can be converted
# to a Dataset object separately and combined after.
features_dataset = Dataset.from_tensor_slices(features)
labels_dataset = Dataset.from_tensor_slices(labels)
dataset = Dataset.zip((features_dataset, labels_dataset))
# A batched feature and label set can be converted to a Dataset
# in similar fashion.
batched_features = tf.constant([[[1, 3], [2, 3]],
                                [[2, 1], [1, 2]],
                                [[3, 3], [3, 2]]], shape=(3, 2, 2))
batched_labels = tf.constant([['A', 'A'],
                              ['B', 'B'],
                              ['A', 'B']], shape=(3, 2, 1))
dataset = Dataset.from_tensor_slices((batched_features, batched_labels))
for element in dataset.as_numpy_iterator():
  print(element)
(array([[1, 3],
       [2, 3]], dtype=int32), array([[b'A'],
       [b'A']], dtype=object))
(array([[2, 1],
       [1, 2]], dtype=int32), array([[b'B'],
       [b'B']], dtype=object))
(array([[3, 3],
       [3, 2]], dtype=int32), array([[b'A'],
       [b'B']], dtype=object))

Note that if `tensors` contains a NumPy array, and eager execution is not enabled, the values will be embedded in the graph as one or more `tf.constant` operations. For large datasets (> 1 GB), this can waste memory and run into byte limits of graph serialization. If `tensors` contains one or more large NumPy arrays, consider the alternative described in this guide: https://tensorflow.org/guide/data#consuming_numpy_arrays

Args
tensors A dataset element, with each component having the same size in the first dimension.

Returns
Dataset A `Dataset`.



from_tensors

@staticmethod
from_tensors(
    tensors
)

Creates a `Dataset` with a single element, comprising the given tensors.

`from_tensors` produces a dataset containing only a single element. To slice the input tensor into multiple elements, use `from_tensor_slices` instead.

dataset = tf.data.Dataset.from_tensors([1, 2, 3])
list(dataset.as_numpy_iterator())
[array([1, 2, 3], dtype=int32)]
dataset = tf.data.Dataset.from_tensors(([1, 2, 3], 'A'))
list(dataset.as_numpy_iterator())
[(array([1, 2, 3], dtype=int32), b'A')]

# You can use `from_tensors` to produce a dataset which repeats
# the same example many times.
example = tf.constant([1,2,3])
dataset = tf.data.Dataset.from_tensors(example).repeat(2)
list(dataset.as_numpy_iterator())
[array([1, 2, 3], dtype=int32), array([1, 2, 3], dtype=int32)]

Note that if `tensors` contains a NumPy array, and eager execution is not enabled, the values will be embedded in the graph as one or more `tf.constant` operations. For large datasets (> 1 GB), this can waste memory and run into byte limits of graph serialization. If `tensors` contains one or more large NumPy arrays, consider the alternative described in this guide: https://tensorflow.org/guide/data#consuming_numpy_arrays

Args
tensors A dataset element.

Returns
Dataset A `Dataset`.



from_tiff

View source

@classmethod
from_tiff(
    filename, **kwargs
)

Creates an `IODataset` from a TIFF file.

Args
filename A string, the filename of a TIFF file.
name A name prefix for the IOTensor (optional).

Returns
A `IODataset`.
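
A minimal usage sketch (the TIFF path is hypothetical):

import tensorflow_io as tfio

# Iterate over the decoded contents of a hypothetical TIFF file.
dataset = tfio.experimental.IODataset.from_tiff("image.tiff")
for page in dataset:
  print(page.shape)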



from_video

View source

@classmethod
from_video(
    filename
)

Creates a `GraphIODataset` from a video file.

Args
filename A string, the filename of a video file.
name A name prefix for the IODataset (optional).

Returns
A `IODataset`.
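
A minimal usage sketch (the video path is hypothetical):

import tensorflow_io as tfio

# Iterate over decoded frames of a hypothetical video file.
dataset = tfio.experimental.IODataset.from_video("sample.mp4")
for frame in dataset.take(1):
  print(frame.shape)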



graph

View source

@classmethod
graph(
    dtype
)

Obtain a GraphIODataset to be used in graph mode.

Args
dtype Data type of the GraphIODataset.

Returns
A class of `GraphIODataset`.
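
A minimal sketch of the intended pattern (the audio path is hypothetical, and the dtype is assumed to match the file):

import tensorflow as tf
import tensorflow_io as tfio

# In graph mode the dtype cannot be probed, so pin it up front and then
# construct the dataset from the returned class.
dataset = tfio.experimental.IODataset.graph(tf.int16).from_audio("sample.wav")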



interleave

interleave(
    map_func, cycle_length=None, block_length=None, num_parallel_calls=None,
    deterministic=None
)

Maps `map_func` across this dataset, and interleaves the results.

For example, you can use `Dataset.interleave()` to process many input files concurrently:

# Preprocess 4 files concurrently, and interleave blocks of 16 records
# from each file.
filenames = ["/var/data/file1.txt", "/var/data/file2.txt",
             "/var/data/file3.txt", "/var/data/file4.txt"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)
def parse_fn(filename):
  return tf.data.Dataset.range(10)
dataset = dataset.interleave(lambda x:
    tf.data.TextLineDataset(x).map(parse_fn, num_parallel_calls=1),
    cycle_length=4, block_length=16)

The `cycle_length` and `block_length` arguments control the order in which elements are produced. `cycle_length` controls the number of input elements that are processed concurrently. If you set `cycle_length` to 1, this transformation will handle one input element at a time, and will produce identical results to `tf.data.Dataset.flat_map`. In general, this transformation will apply `map_func` to `cycle_length` input elements, open iterators on the returned `Dataset` objects, and cycle through them producing `block_length` consecutive elements from each iterator, and consuming the next input element each time it reaches the end of an iterator.

For example:

dataset = Dataset.range(1, 6)  # ==> [ 1, 2, 3, 4, 5 ]
# NOTE: New lines indicate "block" boundaries.
dataset = dataset.interleave(
    lambda x: Dataset.from_tensors(x).repeat(6),
    cycle_length=2, block_length=4)
list(dataset.as_numpy_iterator())
[1, 1, 1, 1,
 2, 2, 2, 2,
 1, 1,
 2, 2,
 3, 3, 3, 3,
 4, 4, 4, 4,
 3, 3,
 4, 4,
 5, 5, 5, 5,
 5, 5]

Note: The order of elements yielded by this transformation is deterministic, as long as `map_func` is a pure function and `deterministic=True`. If `map_func` contains any stateful operations, the order in which that state is accessed is undefined.

Performance can often be improved by setting `num_parallel_calls` so that `interleave` will use multiple threads to fetch elements. If determinism isn't required, it can also improve performance to set `deterministic=False`.

filenames = ["/var/data/file1.txt", "/var/data/file2.txt",
             "/var/data/file3.txt", "/var/data/file4.txt"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.interleave(lambda x: tf.data.TFRecordDataset(x),
    cycle_length=4, num_parallel_calls=tf.data.experimental.AUTOTUNE,
    deterministic=False)

Args
map_func A function mapping a dataset element to a dataset.
cycle_length (Optional.) The number of input elements that will be processed concurrently. If not set, the tf.data runtime decides what it should be based on available CPU. If `num_parallel_calls` is set to `tf.data.experimental.AUTOTUNE`, the `cycle_length` argument identifies the maximum degree of parallelism.
block_length (Optional.) The number of consecutive elements to produce from each input element before cycling to another input element. If not set, defaults to 1.
num_parallel_calls (Optional.) If specified, the implementation creates a threadpool, which is used to fetch inputs from cycle elements asynchronously and in parallel. The default behavior is to fetch inputs from cycle elements synchronously with no parallelism. If the value `tf.data.experimental.AUTOTUNE` is used, then the number of parallel calls is set dynamically based on available CPU.
deterministic (Optional.) A boolean controlling whether determinism should be traded for performance by allowing elements to be produced out of order. If `deterministic` is `None`, the `tf.data.Options.experimental_deterministic` dataset option (`True` by default) is used to decide whether to produce elements deterministically.

Returns
Dataset A `Dataset`.



list_files

@staticmethod
list_files(
    file_pattern, shuffle=None, seed=None
)

A dataset of all files matching one or more glob patterns.

The `file_pattern` argument should be a small number of glob patterns. If your filenames have already been globbed, use `Dataset.from_tensor_slices(filenames)` instead, as re-globbing every filename with `list_files` may result in poor performance with remote storage systems.

Note: The default behavior of this method is to return filenames in a non-deterministic random shuffled order. Pass a `seed` or `shuffle=False` to get results in a deterministic order.

Example:

If we had the following files on our filesystem:

  - /path/to/dir/a.txt
  - /path/to/dir/b.py
  - /path/to/dir/c.py

If we pass "/path/to/dir/*.py" as the directory, the dataset would produce:

  - /path/to/dir/b.py
  - /path/to/dir/c.py

Args
file_pattern A string, a list of strings, or a `tf.Tensor` of string type (scalar or vector), representing the filename glob (i.e. shell wildcard) pattern(s) that will be matched.
shuffle (Optional.) If `True`, the file names will be shuffled randomly. Defaults to `True`.
seed (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the random seed that will be used to create the distribution. See `tf.random.set_seed` for behavior.

Returns
Dataset A `Dataset` of strings corresponding to file names.
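
A minimal sketch of the example above, using the hypothetical paths listed:

dataset = tf.data.Dataset.list_files("/path/to/dir/*.py", shuffle=False)
for filename in dataset:
  print(filename.numpy())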



map

map(
    map_func, num_parallel_calls=None, deterministic=None
)

Maps `map_func` across the elements of this dataset.

This transformation applies `map_func` to each element of this dataset, and returns a new dataset containing the transformed elements, in the same order as they appeared in the input. `map_func` can be used to change both the values and the structure of a dataset's elements. For example, adding 1 to each element, or projecting a subset of element components.

dataset = Dataset.range(1, 6)  # ==> [ 1, 2, 3, 4, 5 ]
dataset = dataset.map(lambda x: x + 1)
list(dataset.as_numpy_iterator())
[2, 3, 4, 5, 6]

The input signature of `map_func` is determined by the structure of each element in this dataset.

dataset = Dataset.range(5)
# `map_func` takes a single argument of type `tf.Tensor` with the same
# shape and dtype.
result = dataset.map(lambda x: x + 1)

# Each element is a tuple containing two `tf.Tensor` objects.
elements = [(1, "foo"), (2, "bar"), (3, "baz")]
dataset = tf.data.Dataset.from_generator(
    lambda: elements, (tf.int32, tf.string))
# `map_func` takes two arguments of type `tf.Tensor`. This function
# projects out just the first component.
result = dataset.map(lambda x_int, y_str: x_int)
list(result.as_numpy_iterator())
[1, 2, 3]

# Each element is a dictionary mapping strings to `tf.Tensor` objects.
elements =  ([{"a": 1, "b": "foo"},
              {"a": 2, "b": "bar"},
              {"a": 3, "b": "baz"}])
dataset = tf.data.Dataset.from_generator(
    lambda: elements, {"a": tf.int32, "b": tf.string})
# `map_func` takes a single argument of type `dict` with the same keys
# as the elements.
result = dataset.map(lambda d: str(d["a"]) + d["b"])

The value or values returned by `map_func` determine the structure of each element in the returned dataset.

dataset = tf.data.Dataset.range(3)
# `map_func` returns two `tf.Tensor` objects.
def g(x):
  return tf.constant(37.0), tf.constant(["Foo", "Bar", "Baz"])
result = dataset.map(g)
result.element_spec
(TensorSpec(shape=(), dtype=tf.float32, name=None), TensorSpec(shape=(3,), dtype=tf.string, name=None))
# Python primitives, lists, and NumPy arrays are implicitly converted to
# `tf.Tensor`.
def h(x):
  return 37.0, ["Foo", "Bar"], np.array([1.0, 2.0], dtype=np.float64)
result = dataset.map(h)
result.element_spec
(TensorSpec(shape=(), dtype=tf.float32, name=None), TensorSpec(shape=(2,), dtype=tf.string, name=None), TensorSpec(shape=(2,), dtype=tf.float64, name=None))
# `map_func` can return nested structures.
def i(x):
  return (37.0, [42, 16]), "foo"
result = dataset.map(i)
result.element_spec
((TensorSpec(shape=(), dtype=tf.float32, name=None),
  TensorSpec(shape=(2,), dtype=tf.int32, name=None)),
 TensorSpec(shape=(), dtype=tf.string, name=None))

`map_func` can accept as arguments and return any type of dataset element.

Note that irrespective of the context in which `map_func` is defined (eager vs. graph), tf.data traces the function and executes it as a graph. To use Python code inside of the function you have a few options:

1) Rely on AutoGraph to convert Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some but not all Python code.

2) Use `tf.py_function`, which allows you to write arbitrary Python code but will generally result in worse performance than 1). For example:

d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])
# transform a string tensor to upper case string using a Python function
def upper_case_fn(t: tf.Tensor):
  return t.numpy().decode('utf-8').upper()
d = d.map(lambda x: tf.py_function(func=upper_case_fn,
          inp=[x], Tout=tf.string))
list(d.as_numpy_iterator())
[b'HELLO', b'WORLD']

3) Use `tf.numpy_function`, which also allows you to write arbitrary Python code. Note that `tf.py_function` accepts `tf.Tensor` whereas `tf.numpy_function` accepts numpy arrays and returns only numpy arrays. For example:

d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])
def upper_case_fn(t: np.ndarray):
  return t.decode('utf-8').upper()
d = d.map(lambda x: tf.numpy_function(func=upper_case_fn,
          inp=[x], Tout=tf.string))
list(d.as_numpy_iterator())
[b'HELLO', b'WORLD']

Note that the use of `tf.numpy_function` and `tf.py_function` in general precludes the possibility of executing user-defined transformations in parallel (because of Python GIL).

Performance can often be improved by setting `num_parallel_calls` so that `map` will use multiple threads to process elements. If deterministic order isn't required, it can also improve performance to set `deterministic=False`.

dataset = Dataset.range(1, 6)  # ==> [ 1, 2, 3, 4, 5 ]
dataset = dataset.map(lambda x: x + 1,
    num_parallel_calls=tf.data.experimental.AUTOTUNE,
    deterministic=False)

Args
map_func A function mapping a dataset element to another dataset element.
num_parallel_calls (Optional.) A `tf.int32` scalar `tf.Tensor`, representing the number of elements to process asynchronously in parallel. If not specified, elements will be processed sequentially. If the value `tf.data.experimental.AUTOTUNE` is used, then the number of parallel calls is set dynamically based on available CPU.
deterministic (Optional.) A boolean controlling whether determinism should be traded for performance by allowing elements to be produced out of order. If `deterministic` is `None`, the `tf.data.Options.experimental_deterministic` dataset option (`True` by default) is used to decide whether to produce elements deterministically.

Returns
Dataset A `Dataset`.



options

options()

Returns the options for this dataset and its inputs.

Returns
A `tf.data.Options` object representing the dataset options.
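
A minimal sketch (assumes the `experimental_deterministic` option available in this TF release):

dataset = tf.data.Dataset.range(3)
options = dataset.options()
print(options.experimental_deterministic)
True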



<h3 id="padded_batch"><code>padded_batch</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>padded_batch(
    batch_size, padded_shapes=None, padding_values=None, drop_remainder=False
)
</code></pre>

Combines consecutive elements of this dataset into padded batches.

This transformation combines multiple consecutive elements of the input
dataset into a single element.

Like `tf.data.Dataset.batch`, the components of the resulting element will
have an additional outer dimension, which will be `batch_size` (or
`N % batch_size` for the last element if `batch_size` does not divide the
number of input elements `N` evenly and `drop_remainder` is `False`). If
your program depends on the batches having the same outer dimension, you
should set the `drop_remainder` argument to `True` to prevent the smaller
batch from being produced.

Unlike `tf.data.Dataset.batch`, the input elements to be batched may have
different shapes, and this transformation will pad each component to the
respective shape in `padded_shapes`. The `padded_shapes` argument
determines the resulting shape for each dimension of each component in an
output element:

* If the dimension is a constant, the component will be padded out to that
  length in that dimension.
* If the dimension is unknown, the component will be padded out to the
  maximum length of all elements in that dimension.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">A = (tf.data.Dataset</code>
<code class="devsite-terminal" data-terminal-prefix="...">     .range(1, 5, output_type=tf.int32)</code>
<code class="devsite-terminal" data-terminal-prefix="...">     .map(lambda x: tf.fill([x], x)))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Pad to the smallest per-batch size that fits all elements.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">B = A.padded_batch(2)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">for element in B.as_numpy_iterator():</code>
<code class="devsite-terminal" data-terminal-prefix="...">  print(element)</code>
<code class="no-select nocode">[[1 0]</code>
<code class="no-select nocode"> [2 2]]</code>
<code class="no-select nocode">[[3 3 3 0]</code>
<code class="no-select nocode"> [4 4 4 4]]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Pad to a fixed size.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">C = A.padded_batch(2, padded_shapes=5)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">for element in C.as_numpy_iterator():</code>
<code class="devsite-terminal" data-terminal-prefix="...">  print(element)</code>
<code class="no-select nocode">[[1 0 0 0 0]</code>
<code class="no-select nocode"> [2 2 0 0 0]]</code>
<code class="no-select nocode">[[3 3 3 0 0]</code>
<code class="no-select nocode"> [4 4 4 4 0]]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Pad with a custom value.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">D = A.padded_batch(2, padded_shapes=5, padding_values=-1)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">for element in D.as_numpy_iterator():</code>
<code class="devsite-terminal" data-terminal-prefix="...">  print(element)</code>
<code class="no-select nocode">[[ 1 -1 -1 -1 -1]</code>
<code class="no-select nocode"> [ 2  2 -1 -1 -1]]</code>
<code class="no-select nocode">[[ 3  3  3 -1 -1]</code>
<code class="no-select nocode"> [ 4  4  4  4 -1]]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Components of nested elements can be padded independently.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">elements = [([1, 2, 3], [10]),</code>
<code class="devsite-terminal" data-terminal-prefix="...">            ([4, 5], [11, 12])]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_generator(</code>
<code class="devsite-terminal" data-terminal-prefix="...">    lambda: iter(elements), (tf.int32, tf.int32))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Pad the first component of the tuple to length 4, and the second</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># component to the smallest size that fits.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.padded_batch(2,</code>
<code class="devsite-terminal" data-terminal-prefix="...">    padded_shapes=([4], [None]),</code>
<code class="devsite-terminal" data-terminal-prefix="...">    padding_values=(-1, 100))</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[(array([[ 1,  2,  3, -1], [ 4,  5, -1, -1]], dtype=int32),</code>
<code class="no-select nocode">  array([[ 10, 100], [ 11,  12]], dtype=int32))]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;"># Pad with a single value and multiple components.</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">E = tf.data.Dataset.zip((A, A)).padded_batch(2, padding_values=-1)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">for element in E.as_numpy_iterator():</code>
<code class="devsite-terminal" data-terminal-prefix="...">  print(element)</code>
<code class="no-select nocode">(array([[ 1, -1],</code>
<code class="no-select nocode">       [ 2,  2]], dtype=int32), array([[ 1, -1],</code>
<code class="no-select nocode">       [ 2,  2]], dtype=int32))</code>
<code class="no-select nocode">(array([[ 3,  3,  3, -1],</code>
<code class="no-select nocode">       [ 4,  4,  4,  4]], dtype=int32), array([[ 3,  3,  3, -1],</code>
<code class="no-select nocode">       [ 4,  4,  4,  4]], dtype=int32))</code>
</pre>


See also `tf.data.experimental.dense_to_sparse_batch`, which combines
elements that may have different shapes into a `tf.sparse.SparseTensor`.

Args
`batch_size` A `tf.int64` scalar `tf.Tensor`, representing the number of consecutive elements of this dataset to combine in a single batch.
`padded_shapes` (Optional.) A nested structure of `tf.TensorShape` or `tf.int64` vector tensor-like objects representing the shape to which the respective component of each input element should be padded prior to batching. Any unknown dimensions will be padded to the maximum size of that dimension in each batch. If unset, all dimensions of all components are padded to the maximum size in the batch. `padded_shapes` must be set if any component has an unknown rank.
`padding_values` (Optional.) A nested structure of scalar-shaped `tf.Tensor`, representing the padding values to use for the respective components. `None` represents that the nested structure should be padded with default values. Defaults are `0` for numeric types and the empty string for string types. The `padding_values` should have the same structure as the input dataset. If `padding_values` is a single element and the input dataset has multiple components, then the same `padding_values` will be used to pad every component of the dataset. If `padding_values` is a scalar, its value will be broadcast to match the shape of each component.
`drop_remainder` (Optional.) A `tf.bool` scalar `tf.Tensor`, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.



Returns
`Dataset` A `Dataset`.



Raises
`ValueError` If a component has an unknown rank, and the `padded_shapes` argument is not set.



<h3 id="prefetch"><code>prefetch</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>prefetch(
    buffer_size
)
</code></pre>

Creates a `Dataset` that prefetches elements from this dataset.

Most dataset input pipelines should end with a call to `prefetch`. This
allows later elements to be prepared while the current element is being
processed. This often improves latency and throughput, at the cost of
using additional memory to store prefetched elements.

Note: Like other `Dataset` methods, prefetch operates on the
elements of the input dataset. It has no concept of examples vs. batches.
`examples.prefetch(2)` will prefetch two elements (2 examples),
while `examples.batch(20).prefetch(2)` will prefetch 2 elements
(2 batches, of 20 examples each).

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.range(3)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.prefetch(2)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[0, 1, 2]</code>
</pre>
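
Because `prefetch` overlaps the work of producing elements with the work of consuming them, it usually belongs at the very end of the pipeline. A minimal sketch of that placement (the `parse_fn` preprocessing step is a hypothetical stand-in):

import tensorflow as tf

def parse_fn(x):
  # Hypothetical per-element preprocessing.
  return x * 2

dataset = (tf.data.Dataset.range(100)
           .map(parse_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .batch(20)
           .prefetch(2))  # keep up to 2 batches ready while training consumes one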


Args
`buffer_size` A `tf.int64` scalar `tf.Tensor`, representing the maximum number of elements that will be buffered when prefetching.



Returns
`Dataset` A `Dataset`.



<h3 id="range"><code>range</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>@staticmethod</code>
<code>range(
    *args, **kwargs
)
</code></pre>

Creates a `Dataset` of a step-separated range of values.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(5).as_numpy_iterator())</code>
<code class="no-select nocode">[0, 1, 2, 3, 4]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(2, 5).as_numpy_iterator())</code>
<code class="no-select nocode">[2, 3, 4]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(1, 5, 2).as_numpy_iterator())</code>
<code class="no-select nocode">[1, 3]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(1, 5, -2).as_numpy_iterator())</code>
<code class="no-select nocode">[]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(5, 1).as_numpy_iterator())</code>
<code class="no-select nocode">[]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(5, 1, -2).as_numpy_iterator())</code>
<code class="no-select nocode">[5, 3]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(2, 5, output_type=tf.int32).as_numpy_iterator())</code>
<code class="no-select nocode">[2, 3, 4]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(Dataset.range(1, 5, 2, output_type=tf.float32).as_numpy_iterator())</code>
<code class="no-select nocode">[1.0, 3.0]</code>
</pre>


Args
`*args` Follows the same semantics as Python's built-in range (xrange in Python 2):
len(args) == 1 -> start = 0, stop = args[0], step = 1.
len(args) == 2 -> start = args[0], stop = args[1], step = 1.
len(args) == 3 -> start = args[0], stop = args[1], step = args[2].
`**kwargs` output_type: The expected dtype of the elements. (Optional, default: `tf.int64`.)



Returns
`Dataset` A `RangeDataset`.



Raises
`ValueError` If len(args) == 0.



<h3 id="reduce"><code>reduce</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>reduce(
    initial_state, reduce_func
)
</code></pre>

Reduces the input dataset to a single element.

The transformation calls `reduce_func` successively on every element of
the input dataset until the dataset is exhausted, aggregating information in
its internal state. The `initial_state` argument is used for the initial
state and the final state is returned as the result.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">tf.data.Dataset.range(5).reduce(np.int64(0), lambda x, _: x + 1).numpy()</code>
<code class="no-select nocode">5</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">tf.data.Dataset.range(5).reduce(np.int64(0), lambda x, y: x + y).numpy()</code>
<code class="no-select nocode">10</code>
</pre>
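
The state threaded through `reduce` may itself be a structure, as long as `reduce_func` returns the same structure. A short sketch that computes an element count and a running sum in a single pass:

import numpy as np
import tensorflow as tf

# The state is a (count, total) tuple; each step must return the same structure.
count, total = tf.data.Dataset.range(1, 5).reduce(
    (np.int64(0), np.int64(0)),
    lambda state, x: (state[0] + 1, state[1] + x))
print(count.numpy(), total.numpy())
4 10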


Args
`initial_state` An element representing the initial state of the transformation.
`reduce_func` A function that maps `(old_state, input_element)` to `new_state`. It must take two arguments and return a new element. The structure of `new_state` must match the structure of `initial_state`.



Returns
A dataset element corresponding to the final state of the transformation.



<h3 id="repeat"><code>repeat</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>repeat(
    count=None
)
</code></pre>

Repeats this dataset so each original value is seen `count` times.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">dataset = dataset.repeat(3)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(dataset.as_numpy_iterator())</code>
<code class="no-select nocode">[1, 2, 3, 1, 2, 3, 1, 2, 3]</code>
</pre>


Note: If this dataset is a function of global state (e.g. a random number
generator), then different repetitions may produce different elements.
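
For example, here is a sketch of a dataset whose elements come from an unseeded random op; each repetition re-executes the op, so the two passes through the data will generally differ:

import tensorflow as tf

dataset = tf.data.Dataset.range(2).map(
    lambda _: tf.random.uniform([]))  # unseeded: reads global random state
dataset = dataset.repeat(2)
# The four values are random; the second pair generally differs from the first.
list(dataset.as_numpy_iterator())  # doctest: +SKIP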

Args
`count` (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the number of times the dataset should be repeated. The default behavior (if `count` is `None` or `-1`) is for the dataset to be repeated indefinitely.



Returns
`Dataset` A `Dataset`.



<h3 id="shard"><code>shard</code></h3>

<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
<code>shard(
    num_shards, index
)
</code></pre>

Creates a `Dataset` that includes only 1/`num_shards` of this dataset.

`shard` is deterministic. The Dataset produced by `A.shard(n, i)` will
contain all elements of A whose index mod n = i.

<pre class="devsite-click-to-copy prettyprint lang-py">
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">A = tf.data.Dataset.range(10)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">B = A.shard(num_shards=3, index=0)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(B.as_numpy_iterator())</code>
<code class="no-select nocode">[0, 3, 6, 9]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">C = A.shard(num_shards=3, index=1)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(C.as_numpy_iterator())</code>
<code class="no-select nocode">[1, 4, 7]</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">D = A.shard(num_shards=3, index=2)</code>
<code class="devsite-terminal" data-terminal-prefix="&gt;&gt;&gt;">list(D.as_numpy_iterator())</code>
<code class="no-select nocode">[2, 5, 8]</code>
</pre>


This dataset operator is very useful when running distributed training, as
it allows each worker to read a unique subset.

When reading a single input file, you can shard elements as follows:

d = tf.data.TFRecordDataset(input_file)
d = d.shard(num_workers, worker_index)
d = d.repeat(num_epochs)
d = d.shuffle(shuffle_buffer_size)
d = d.map(parser_fn, num_parallel_calls=num_map_threads)

Important caveats:

- Be sure to shard before you use any randomizing operator (such as shuffle).
- Generally it is best if the shard operator is used early in the dataset pipeline. For example, when reading from a set of TFRecord files, shard before converting the dataset to input samples. This avoids reading every file on every worker. The following is an example of an efficient sharding strategy within a complete pipeline:

d = Dataset.list_files(pattern)
d = d.shard(num_workers, worker_index)
d = d.repeat(num_epochs)
d = d.shuffle(shuffle_buffer_size)
d = d.interleave(tf.data.TFRecordDataset,
                 cycle_length=num_readers, block_length=1)
d = d.map(parser_fn, num_parallel_calls=num_map_threads)

Args
`num_shards` A `tf.int64` scalar `tf.Tensor`, representing the number of shards operating in parallel.
`index` A `tf.int64` scalar `tf.Tensor`, representing the worker index.

Returns
`Dataset` A `Dataset`.

Raises
`InvalidArgumentError` If `num_shards` or `index` are illegal values.

shuffle

Randomly shuffles the elements of this dataset.

This dataset fills a buffer with `buffer_size` elements, then randomly samples elements from this buffer, replacing the selected elements with new elements. For perfect shuffling, a buffer size greater than or equal to the full size of the dataset is required.

For instance, if your dataset contains 10,000 elements but `buffer_size` is set to 1,000, then `shuffle` will initially select a random element from only the first 1,000 elements in the buffer. Once an element is selected, its space in the buffer is replaced by the next (i.e. 1,001st) element, maintaining the 1,000 element buffer.

`reshuffle_each_iteration` controls whether the shuffle order should be different for each epoch. In TF 1.X, the idiomatic way to create epochs was through the `repeat` transformation:

dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=True)
dataset = dataset.repeat(2)  # doctest: +SKIP
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 0, 2, 1, 2, 0]
dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=False)
dataset = dataset.repeat(2)  # doctest: +SKIP
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 0, 2, 1, 0, 2]

In TF 2.0, `tf.data.Dataset` objects are Python iterables, which makes it possible to also create epochs through Python iteration:

dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=True)
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 0, 2]
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 2, 0]
dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=False)
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 0, 2]
list(dataset.as_numpy_iterator())  # doctest: +SKIP
[1, 0, 2]

Args
`buffer_size` A `tf.int64` scalar `tf.Tensor`, representing the number of elements from this dataset from which the new dataset will sample.
`seed` (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the random seed that will be used to create the distribution. See `tf.random.set_seed` for behavior.
`reshuffle_each_iteration` (Optional.) A boolean, which if true indicates that the dataset should be pseudorandomly reshuffled each time it is iterated over. (Defaults to `True`.)

Returns
`Dataset` A `Dataset`.

skip

Creates a `Dataset` that skips `count` elements from this dataset.

dataset = tf.data.Dataset.range(10)
dataset = dataset.skip(7)
list(dataset.as_numpy_iterator())
[7, 8, 9]

Args
`count` A `tf.int64` scalar `tf.Tensor`, representing the number of elements of this dataset that should be skipped to form the new dataset. If `count` is greater than the size of this dataset, the new dataset will contain no elements. If `count` is -1, skips the entire dataset.

Returns
`Dataset` A `Dataset`.

stream

View source

Obtain a non-repeatable StreamIODataset to use.
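
The class returned by `stream()` exposes constructors for datasets that can be iterated only once. A minimal sketch of the intended shape of the API; the `from_kafka` constructor and its arguments below are illustrative assumptions, so consult the `StreamIODataset` page for the constructors that actually exist:

import tensorflow_io as tfio

# Hypothetical: build a one-shot streaming dataset from a Kafka topic.
dataset = tfio.experimental.IODataset.stream().from_kafka("topic")
for message in dataset:
  print(message)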

Returns
A class of StreamIODataset.

take

Creates a `Dataset` with at most `count` elements from this dataset.

dataset = tf.data.Dataset.range(10)
dataset = dataset.take(3)
list(dataset.as_numpy_iterator())
[0, 1, 2]

Args
`count` A `tf.int64` scalar `tf.Tensor`, representing the number of elements of this dataset that should be taken to form the new dataset. If `count` is -1, or if `count` is greater than the size of this dataset, the new dataset will contain all elements of this dataset.

Returns
`Dataset` A `Dataset`.

to_file

View source

Write a dataset to a file.
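
A minimal sketch, assuming a dataset of scalar strings. Whether a record separator is appended for you is an assumption not settled here, so the sketch adds newlines explicitly:

import tensorflow as tf
import tensorflow_io as tfio

lines = tf.data.Dataset.from_tensor_slices(["a", "b", "c"])
lines = lines.map(lambda s: s + "\n")  # assumption: elements are written verbatim
written = tfio.experimental.IODataset.to_file(lines, "/tmp/example.txt")
# `written` is the number of records written, 3 in this sketch.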

Args
`dataset` The dataset whose content will be written.
`filename` A string, the filename of the file to write to.
`name` (Optional.) A name prefix for the IODataset.

Returns
The number of records written.

unbatch

Splits elements of a dataset into multiple elements.

For example, if elements of the dataset are shaped [B, a0, a1, ...], where B may vary for each input element, then for each element in the dataset, the unbatched dataset will contain B consecutive elements of shape [a0, a1, ...].

elements = [ [1, 2, 3], [1, 2], [1, 2, 3, 4] ]
dataset = tf.data.Dataset.from_generator(lambda: elements, tf.int64)
dataset = dataset.unbatch()
list(dataset.as_numpy_iterator())
[1, 2, 3, 1, 2, 1, 2, 3, 4]

Returns
A `Dataset`.

window

Combines (nests of) input elements into a dataset of (nests of) windows.

A "window" is a finite dataset of flat elements of size `size` (or possibly fewer if there are not enough input elements to fill the window and `drop_remainder` evaluates to `False`).

The `shift` argument determines the number of input elements by which the window moves on each iteration. If windows and elements are both numbered starting at 0, the first element of window k will be element k * shift of the input dataset. In particular, the first element of the first window will always be the first element of the input dataset.

The `stride` argument determines the stride of the input elements, and the `shift` argument determines the shift of the window.

For example:

dataset = tf.data.Dataset.range(7).window(2)
for window in dataset:
  print(list(window.as_numpy_iterator()))
[0, 1]
[2, 3]
[4, 5]
[6]
dataset = tf.data.Dataset.range(7).window(3, 2, 1, True)
for window in dataset:
  print(list(window.as_numpy_iterator()))
[0, 1, 2]
[2, 3, 4]
[4, 5, 6]
dataset = tf.data.Dataset.range(7).window(3, 1, 2, True)
for window in dataset:
  print(list(window.as_numpy_iterator()))
[0, 2, 4]
[1, 3, 5]
[2, 4, 6]
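
Each window is itself a small `Dataset`, so to turn windows back into dense tensors you would typically `flat_map` over them. A short sketch:

dataset = tf.data.Dataset.range(7).window(3, shift=2, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(3))
list(dataset.as_numpy_iterator())
[array([0, 1, 2]), array([2, 3, 4]), array([4, 5, 6])]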

Note that when the `window` transformation is applied to a dataset of nested elements, it produces a dataset of nested windows.

nested = ([1, 2, 3, 4], [5, 6, 7, 8])
dataset = tf.data.Dataset.from_tensor_slices(nested).window(2)
for window in dataset:
  def to_numpy(ds):
    return list(ds.as_numpy_iterator())
  print(tuple(to_numpy(component) for component in window))
([1, 2], [5, 6])
([3, 4], [7, 8])
dataset = tf.data.Dataset.from_tensor_slices({'a': [1, 2, 3, 4]})
dataset = dataset.window(2)
for window in dataset:
  def to_numpy(ds):
    return list(ds.as_numpy_iterator())
  print({'a': to_numpy(window['a'])})
{'a': [1, 2]}
{'a': [3, 4]}

Args
`size` A `tf.int64` scalar `tf.Tensor`, representing the number of elements of the input dataset to combine into a window. Must be positive.
`shift` (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the number of input elements by which the window moves in each iteration. Defaults to `size`. Must be positive.
`stride` (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the stride of the input elements in the sliding window. Must be positive. The default value of 1 means "retain every input element".
`drop_remainder` (Optional.) A `tf.bool` scalar `tf.Tensor`, representing whether the last window should be dropped if its size is smaller than `size`.

Returns
`Dataset` A `Dataset` of (nests of) windows -- finite datasets of flat elements created from the (nests of) input elements.

with_options

Returns a new `tf.data.Dataset` with the given options set.

The options are "global" in the sense they apply to the entire dataset. If options are set multiple times, they are merged as long as different options do not use different non-default values.

ds = tf.data.Dataset.range(5)
ds = ds.interleave(lambda x: tf.data.Dataset.range(5),
                   cycle_length=3,
                   num_parallel_calls=3)
options = tf.data.Options()
# This will make the interleave order non-deterministic.
options.experimental_deterministic = False
ds = ds.with_options(options)

Args
`options` A `tf.data.Options` that identifies the options to use.

Returns
`Dataset` A `Dataset` with the given options.

Raises
`ValueError` When an option is set more than once to a non-default value.

zip

Creates a `Dataset` by zipping together the given datasets.

This method has similar semantics to the built-in zip() function in Python, with the main difference being that the `datasets` argument can be an arbitrary nested structure of `Dataset` objects.

# The nested structure of the `datasets` argument determines the
# structure of elements in the resulting dataset.
a = tf.data.Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
b = tf.data.Dataset.range(4, 7)  # ==> [ 4, 5, 6 ]
ds = tf.data.Dataset.zip((a, b))
list(ds.as_numpy_iterator())
[(1, 4), (2, 5), (3, 6)]
ds = tf.data.Dataset.zip((b, a))
list(ds.as_numpy_iterator())
[(4, 1), (5, 2), (6, 3)]

# The `datasets` argument may contain an arbitrary number of datasets.
c = tf.data.Dataset.range(7, 13).batch(2)  # ==> [ [7, 8],
                                           #       [9, 10],
                                           #       [11, 12] ]
ds = tf.data.Dataset.zip((a, b, c))
for element in ds.as_numpy_iterator():
  print(element)
(1, 4, array([7, 8]))
(2, 5, array([ 9, 10]))
(3, 6, array([11, 12]))

# The number of elements in the resulting dataset is the same as
# the size of the smallest dataset in `datasets`.
d = tf.data.Dataset.range(13, 15)  # ==> [ 13, 14 ]
ds = tf.data.Dataset.zip((a, d))
list(ds.as_numpy_iterator())
[(1, 13), (2, 14)]
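
Since `datasets` may be any nested structure, the components can also be zipped into a dictionary. A short sketch, reusing `a` and `d` from above:

ds = tf.data.Dataset.zip({'x': a, 'y': d})
list(ds.as_numpy_iterator())
[{'x': 1, 'y': 13}, {'x': 2, 'y': 14}]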

Args
`datasets` A nested structure of datasets.

Returns
`Dataset` A `Dataset`.

__bool__

__iter__

Creates an iterator for elements of this dataset.

The returned iterator implements the Python iterator protocol.
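
For example, in eager mode:

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
iterator = iter(dataset)
print(next(iterator).numpy())
1
print(next(iterator).numpy())
2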

Returns
A `tf.data.Iterator` for the elements of this dataset.

Raises
`RuntimeError` If not inside of `tf.function` and not executing eagerly.

__len__

Returns the length of the dataset if it is known and finite.

This method requires that you are running in eager mode, and that the length of the dataset is known and non-infinite. When the length may be unknown or infinite, or if you are running in graph mode, use `tf.data.Dataset.cardinality` instead.
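
For example, for a dataset with known, finite cardinality:

dataset = tf.data.Dataset.range(42)
len(dataset)
42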

Returns
An integer representing the length of the dataset.

Raises
`RuntimeError` If the dataset length is unknown or infinite, or if eager execution is not enabled.

__nonzero__