RecordInput asynchronously reads and randomly yields TFRecords.
A RecordInput Op will continuously read a batch of records asynchronously into a buffer of some fixed capacity. It can also asynchronously yield random records from this buffer.
It will not start yielding until at least buffer_size / 2 elements have been placed into the buffer so that sufficient randomization can take place.
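The half-full rule above can be pictured with a small pure-Python model. This is only a sketch: the class name `HalfFullShuffleBuffer` and its methods are hypothetical, it is single-threaded, and it is not how the RecordInput Op is implemented internally. It only shows the idea of holding back output until at least buffer_size / 2 records are available to choose from.

```python
import random


class HalfFullShuffleBuffer:
    """Toy model (hypothetical) of a fixed-capacity buffer that yields random records."""

    def __init__(self, buffer_size, seed=0):
        self.buffer_size = buffer_size
        self._items = []
        self._started = False
        self._rng = random.Random(seed)

    def put(self, record):
        # Refuse records beyond the fixed capacity.
        if len(self._items) >= self.buffer_size:
            raise OverflowError("buffer is full")
        self._items.append(record)
        # Yielding may start once at least buffer_size / 2 records are buffered.
        if len(self._items) * 2 >= self.buffer_size:
            self._started = True

    def can_yield(self):
        # True once the threshold has been reached and a record is available.
        return self._started and bool(self._items)

    def take(self):
        if not self.can_yield():
            raise RuntimeError("buffer has not reached half capacity yet")
        # Swap a randomly chosen record to the end, then pop it.
        i = self._rng.randrange(len(self._items))
        self._items[i], self._items[-1] = self._items[-1], self._items[i]
        return self._items.pop()
```

With buffer_size=8, the first three calls to `put` leave `can_yield()` false; the fourth makes it true, and `take()` then returns one of the buffered records at random.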
The order in which the files are read is shifted each epoch (see shift_ratio) so that the data is presented in a different order every epoch.
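One way to picture the per-epoch shift is as a rotation of the file list. The sketch below is an illustrative model only: the helper name `epoch_file_order` is hypothetical, it assumes shift_ratio is a fraction between 0 and 1, and it assumes the shift accumulates as a simple left rotation per epoch. The actual Op may compute the shift differently.

```python
def epoch_file_order(files, shift_ratio, epoch):
    """Return `files` rotated left by shift_ratio * len(files) positions per epoch.

    Hypothetical model for illustration; not RecordInput's real logic.
    """
    if not files:
        return []
    # Number of positions to move the start file forward, wrapped to the list length.
    shift = int(shift_ratio * len(files)) * epoch % len(files)
    return files[shift:] + files[:shift]
```

Under these assumptions, four files with shift_ratio=0.25 start one file later each epoch and return to the original order after four epochs.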
__init__( file_pattern, batch_size=1, buffer_size=1, parallelism=1, shift_ratio=0, seed=0, name=None, batches=None, compression_type=None )
Constructs a RecordInput Op.
file_pattern: File path to the dataset, possibly containing wildcards. All matching files will be iterated over each epoch.
batch_size: How many records to return at a time.
buffer_size: The maximum number of records the buffer will contain.
parallelism: How many reader threads to use for reading from files.
shift_ratio: What percentage of the total number of files to move the start file forward by each epoch.
seed: The random number seed used by the generator that randomizes records.
name: Optional name for the operation.
batches: None by default, creating a single batch op. Otherwise specifies how many batches to create, which are returned as a list when get_yield_op() is called. An example use case is to split processing between devices on one computer.
compression_type: The type of compression for the file. Currently ZLIB and GZIP are supported. Defaults to None.
ValueError: If one of the arguments is invalid.
Adds a node that yields a group of records every time it is executed.
If the batches parameter is not None, it yields a list of record batches with the specified batch_size.