
RecordInput asynchronously reads and randomly yields TFRecords.

A RecordInput op continuously reads batches of records asynchronously into a buffer of fixed capacity. It can also asynchronously yield random records from this buffer.

It will not start yielding until at least buffer_size / 2 elements have been placed into the buffer so that sufficient randomization can take place.
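
The half-full rule above can be illustrated with a small pure-Python sketch. This is a toy model, not TensorFlow's implementation: the class name HalfFullShuffleBuffer and its methods are hypothetical, and the real op fills the buffer from background reader threads rather than through explicit put() calls.

```python
import random


class HalfFullShuffleBuffer:
    """Toy model of a bounded buffer that yields random records.

    Hypothetical illustration only; not the RecordInput kernel.
    """

    def __init__(self, buffer_size, seed=0):
        self.buffer_size = buffer_size
        self.items = []
        self.started = False  # becomes True once the buffer was half full
        self.rng = random.Random(seed)

    def put(self, record):
        if len(self.items) >= self.buffer_size:
            raise OverflowError("buffer is full")
        self.items.append(record)

    def can_yield(self):
        # Yielding starts only after at least buffer_size / 2 records arrived.
        if not self.started and len(self.items) >= self.buffer_size / 2:
            self.started = True
        return self.started and len(self.items) > 0

    def take(self):
        if not self.can_yield():
            raise RuntimeError("no record can be yielded yet")
        index = self.rng.randrange(len(self.items))
        # Swap the chosen record to the end so it can be popped cheaply.
        self.items[index], self.items[-1] = self.items[-1], self.items[index]
        return self.items.pop()
```

With buffer_size=4, the sketch refuses to yield after one put() and starts yielding after the second, which mirrors the threshold described above.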

The order in which the files are read is shifted each epoch by a shift amount (controlled by shift_ratio), so that the data is presented in a different order every epoch.
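
As a rough illustration of the per-epoch shift, the sketch below rotates a file list so that each epoch starts at a later file. The function epoch_file_order is hypothetical, and the formula is an assumption made for illustration (the start index advances by shift_ratio times the number of files each epoch, wrapping around); the actual kernel may compute the order differently.

```python
def epoch_file_order(files, shift_ratio, epoch):
    """Return the file list rotated for the given epoch (illustrative only)."""
    if not files:
        return []
    # Assumption: the start file moves forward by this many files per epoch.
    shift = int(shift_ratio * len(files))
    start = (shift * epoch) % len(files)
    return files[start:] + files[:start]


files = ["a.tfrecord", "b.tfrecord", "c.tfrecord", "d.tfrecord"]
for epoch in range(3):
    print(epoch, epoch_file_order(files, shift_ratio=0.25, epoch=epoch))
```

Under this assumption, a shift_ratio of 0.25 with four files moves the start forward by one file per epoch.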

file_pattern File path to the dataset, possibly containing wildcards. All matching files will be iterated over each epoch.
batch_size How many records to return at a time.
buffer_size The maximum number of records the buffer will contain.
parallelism How many reader threads to use for reading from files.
shift_ratio What fraction of the total number of files to move the start file forward by each epoch.
seed The random number seed used by the generator that randomizes records.
name Optional name for the operation.
batches None by default, creating a single batch op. Otherwise specifies how many batches to create, which are returned as a list when get_yield_op() is called. An example use case is to split processing between devices on one computer.
compression_type The type of compression for the file. Currently ZLIB and GZIP are supported. Defaults to None (no compression).

ValueError If one of the arguments is invalid.




get_yield_op() adds a node that yields a group of records every time it is executed. If the RecordInput batches parameter is not None, it yields a list of record batches, each with the specified batch_size.
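
To make the shape of the result concrete, here is a pure-Python sketch of how one group of records might be split into a list of batches. The helper split_into_batches is hypothetical, and the round-robin assignment is an assumption made for illustration; the source only states that a list of batches of size batch_size is returned.

```python
def split_into_batches(records, batches, batch_size):
    """Split a flat group of records into `batches` lists of `batch_size`.

    Illustrative only; the round-robin assignment is an assumption.
    """
    if len(records) != batches * batch_size:
        raise ValueError("expected exactly batches * batch_size records")
    batch_list = [[] for _ in range(batches)]
    for index, record in enumerate(records):
        # Record i goes to batch i mod batches.
        batch_list[index % batches].append(record)
    return batch_list


records = ["r0", "r1", "r2", "r3", "r4", "r5"]
print(split_into_batches(records, batches=2, batch_size=3))
```

Each inner list could then be handed to a different device, which matches the use case mentioned for the batches argument.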