Mira conferencias magistrales, sesiones de productos, talleres y más de Google I / O Ver lista de reproducción


A Dataset of fixed-length records from one or more binary files.

Inherits From: Dataset

The tf.data.FixedLengthRecordDataset reads fixed length records from binary files and creates a dataset where each record becomes an element of the dataset. The binary files can have a fixed length header and a fixed length footer, which will both be skipped.

For example, suppose we have 2 files "fixed_length0.bin" and "fixed_length1.bin" with the following content:

with open('/tmp/fixed_length0.bin', 'wb') as f:
with open('/tmp/fixed_length1.bin', 'wb') as f:

We can construct a FixedLengthRecordDataset from them as follows:

dataset1 = tf.data.FixedLengthRecordDataset(
    filenames=['/tmp/fixed_length0.bin', '/tmp/fixed_length1.bin'],
    record_bytes=2, header_bytes=6, footer_bytes=6)

The elements of the dataset are:

for element in dataset1.as_numpy_iterator():

filenames A tf.string tensor or tf.data.Dataset containing one or more filenames.
record_bytes A tf.int64 scalar representing the number of bytes in each record.
header_bytes (Optional.) A tf.int64 scalar representing the number of bytes to skip at the start of a file.
footer_bytes (Optional.) A tf.int64 scalar representing the number of bytes to ignore at the end of a file.
buffer_size (Optional.) A tf.int64 scalar representing the number of bytes to buffer when reading.
compression_type (Optional.) A tf.string scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP".
num_parallel_reads (Optional.) A tf.int64 scalar representing the number of files to read in parallel. If greater than one, the records of files read in parallel are outputted in an interleaved order. If your input pipeline is I/O bottlenecked, consider setting this parameter to a value greater than one to parallelize the I/O. If None, files will be read sequentially.

element_spec The type specification of an element of this dataset.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
TensorSpec(shape=(), dtype=tf.int32, name=None)

For more information, read this guide.



View source

Applies a transformation function to this dataset.

apply enables chaining of custom Dataset transformations, which are represented as functions that take one Dataset argument and return a transformed Dataset.

dataset = tf.data.Dataset.range(100)
def dataset_fn(ds):
  return ds.filter(lambda x: x < 5)
dataset = dataset.apply(dataset_fn)
[0, 1, 2, 3, 4]

transformation_func A function that takes one Dataset argument and returns a Dataset.

Dataset The Dataset returned by applying transformation_func to this dataset.


View source

Returns an iterator which converts all elements of the dataset to numpy.

Use as_numpy_iterator to inspect the content of your dataset. To see element shapes and types, print dataset elements directly instead of using as_numpy_iterator.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
for element in dataset:
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)

This method requires that you are running in eager mode and the dataset's element_spec contains only TensorSpec components.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
for element in dataset.as_numpy_iterator():
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
[1, 2, 3]

as_numpy_iterator() will preserve the nested structure of dataset elements.

dataset = tf.data.Dataset.from_tensor_slices({'a': ([1, 2], [3, 4]),
                                              'b': [5, 6]})
list(dataset.as_numpy_iterator()) == [{'a': (1, 3), 'b': 5},
                                      {'a': (2, 4), 'b': 6}]

An iterable over the elements of the dataset, with their tensors converted to numpy arrays.

TypeError if an element contains a non-Tensor value.
RuntimeError if eager execution is not enabled.


View source

Combines consecutive elements of this dataset into batches.

dataset = tf.data.Dataset.range(8)
dataset = dataset.batch(3)
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]
dataset = tf.data.Dataset.range(8)
dataset = dataset.batch(3, drop_remainder=True)
[array([0, 1, 2]), array([3, 4, 5])]

The components of the resulting element will have an additional outer dimension, which will be batch_size (or N % batch_size for the last element if batch_size does not divide the number of input elements N evenly and drop_remainder is False). If your program depends on the batches having the same outer dimension, you should set the drop_remainder argument to True to prevent the smaller batch from being produced.

batch_size A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
drop_remainder (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than batch_size elements; the default behavior is not to drop the smaller batch.
num_parallel_calls (Optional.) A tf.int64 scalar tf.Tensor, representing the number of batches to compute asynchronously in parallel. If not specified, batches will be computed sequentially. If the value tf.data.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available resources.
deterministic (Optional.) When num_parallel_calls is specified, if this boolean is specified (True or False), it controls the order in which the transformation produces elements. If set to False, the transformation is allowed to yield elements out of order to trade determinism for performance. If not specified, the tf.data.Options.experimental_deterministic option (True by default) controls the behavior.

Dataset A Dataset.


View source

Caches the elements in this dataset.

The first time the dataset is iterated over, its elements will be cached either in the specified file or in memory. Subsequent iterations will use the cached data.

dataset = tf.data.Dataset.range(5)