tf.data.FixedLengthRecordDataset

A Dataset of fixed-length records from one or more binary files.

Inherits From: Dataset

The tf.data.FixedLengthRecordDataset reads fixed-length records from one or more binary files and creates a dataset in which each record becomes an element. The files may have a fixed-length header and a fixed-length footer, both of which are skipped.

For example, suppose we have two files, "fixed_length0.bin" and "fixed_length1.bin", with the following contents:

with open('/tmp/fixed_length0.bin', 'wb') as f:
  f.write(b'HEADER012345FOOTER')
with open('/tmp/fixed_length1.bin', 'wb') as f:
  f.write(b'HEADER6789abFOOTER')

We can construct a FixedLengthRecordDataset from them as follows:

dataset1 = tf.data.FixedLengthRecordDataset(
    filenames=['/tmp/fixed_length0.bin', '/tmp/fixed_length1.bin'],
    record_bytes=2, header_bytes=6, footer_bytes=6)

The elements of the dataset are:

for element in dataset1.as_numpy_iterator():
  print(element)
b'01'
b'23'
b'45'
b'67'
b'89'
b'ab'
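The byte arithmetic above can be mirrored in plain Python. This is only an illustrative sketch of the record-extraction logic (tf.data implements this natively in its input pipeline, not like this): slice past the header, drop the footer, and split the remaining payload into record_bytes-sized chunks.

```python
# Illustrative sketch of fixed-length record extraction in plain Python.
# (Not how tf.data is implemented; it only mirrors the byte offsets.)

def extract_records(data: bytes, record_bytes: int,
                    header_bytes: int = 0, footer_bytes: int = 0):
    """Yield fixed-length records, skipping the header and footer."""
    payload = data[header_bytes:len(data) - footer_bytes]
    for i in range(0, len(payload), record_bytes):
        yield payload[i:i + record_bytes]

records = []
for contents in (b'HEADER012345FOOTER', b'HEADER6789abFOOTER'):
    records.extend(extract_records(contents, record_bytes=2,
                                   header_bytes=6, footer_bytes=6))
print(records)  # [b'01', b'23', b'45', b'67', b'89', b'ab']
```

Note that records from the two inputs appear in file order, matching the sequential (non-parallel) read behavior shown above.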

Args

filenames A tf.string tensor or tf.data.Dataset containing one or more filenames.
record_bytes A tf.int64 scalar representing the number of bytes in each record.
header_bytes (Optional.) A tf.int64 scalar representing the number of bytes to skip at the start of a file.
footer_bytes (Optional.) A tf.int64 scalar representing the number of bytes to ignore at the end of a file.
buffer_size (Optional.) A tf.int64 scalar representing the number of bytes to buffer when reading.
compression_type (Optional.) A tf.string scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP".
num_parallel_reads (Optional.) A tf.int64 scalar representing the number of files to read in parallel. If greater than one, the records of files read in parallel are output in an interleaved order. If your input pipeline is I/O-bottlenecked, consider setting this parameter to a value greater than one to parallelize the I/O. If None, files will be read sequentially.
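One point worth illustrating for compression_type: when a compression such as "GZIP" is used, the header, footer, and record offsets apply to the decompressed byte stream, not to the compressed file on disk. The following plain-Python sketch (using the standard gzip module rather than tf.data) mirrors that behavior:

```python
import gzip

# Write the example payload gzip-compressed (illustrative only).
raw = b'HEADER012345FOOTER'
compressed = gzip.compress(raw)

# With compression_type="GZIP", the file is decompressed first, and the
# header/footer/record offsets are applied to the decompressed bytes.
decompressed = gzip.decompress(compressed)
payload = decompressed[6:len(decompressed) - 6]  # skip header and footer
records = [payload[i:i + 2] for i in range(0, len(payload), 2)]
print(records)  # [b'01', b'23', b'45']
```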

Attributes

element_spec The type specification of an element of this dataset.

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset.element_spec
TensorSpec(shape=(), dtype=tf.int32, name=None)

For more information, read the tf.data guide.

Methods

apply

Applies a transformation function to this dataset.

apply enables chaining of custom Dataset transformations, which are represented as functions that take one Dataset argument and return a transformed Dataset.

dataset = tf.data.Dataset.range(100)
def dataset_fn(ds):
  return ds.filter(lambda x: x < 5)
dataset = dataset.apply(dataset_fn)
list(dataset.as_numpy_iterator())
[0, 1, 2, 3, 4]
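The value of apply is that a reusable transformation function can participate in method chaining. The same pattern can be sketched outside TensorFlow with a hypothetical Pipeline class (for illustration only; it is not part of tf.data):

```python
# Illustrative analogue of Dataset.apply: a hypothetical Pipeline class
# whose apply() hands the whole pipeline to a transformation function
# and returns whatever that function returns, so chaining continues.

class Pipeline:
    def __init__(self, items):
        self.items = list(items)

    def filter(self, predicate):
        return Pipeline(x for x in self.items if predicate(x))

    def apply(self, transformation_func):
        # The function takes one Pipeline and returns a transformed Pipeline.
        return transformation_func(self)

def keep_small(p):
    return p.filter(lambda x: x < 5)

result = Pipeline(range(100)).apply(keep_small).items
print(result)  # [0, 1, 2, 3, 4]
```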

Args
transformation_func A function that takes one Dataset argument and returns a Dataset.

Returns
Dataset The Dataset returned by applying transformation_func to this dataset.