tf.contrib.data.parallel_interleave(
    map_func,
    cycle_length,
    block_length=1,
    sloppy=False,
    buffer_output_elements=None,
    prefetch_input_elements=None
)
A parallel version of the tf.data.Dataset.interleave transformation.

parallel_interleave() maps map_func across its input to produce nested
datasets, and outputs their elements interleaved. Unlike
tf.data.Dataset.interleave, it gets elements from cycle_length nested
datasets in parallel, which increases throughput, especially in the
presence of stragglers. Furthermore, the sloppy argument can be used to
improve performance by relaxing the requirement that outputs are produced
in a deterministic order, which allows the implementation to skip over nested
datasets whose elements are not readily available when requested.
import tensorflow as tf  # TensorFlow 1.x; tf.contrib was removed in TensorFlow 2.0

# Preprocess 4 files concurrently.
filenames = tf.data.Dataset.list_files("/path/to/data/train*.tfrecords")
dataset = filenames.apply(
    tf.contrib.data.parallel_interleave(
        lambda filename: tf.data.TFRecordDataset(filename),
        cycle_length=4))
WARNING: If sloppy is True, the order of produced elements is not deterministic.
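The trade-off behind sloppy can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions, not how TensorFlow implements it: the function drain, the (ready_at_tick, value) representation, and the tick-based clock are all hypothetical constructs invented for illustration. Each source stands in for a nested dataset whose elements become available at different times.

```python
def drain(sources, sloppy=False):
    """Toy model of interleaving; this is not TensorFlow code.

    Each source is a list of (ready_at_tick, value) pairs, standing in for
    a nested dataset whose elements become available at different times.
    Returns (values in output order, number of ticks spent waiting).
    """
    queues = [list(s) for s in sources]
    clock = 0      # advances by one tick per loop iteration
    stalls = 0     # ticks on which nothing could be produced
    current = 0    # index of the source whose turn it is
    output = []
    while any(queues):
        n = len(queues)
        # Non-empty sources in cycle order, starting from the current one.
        order = [(current + k) % n for k in range(n) if queues[(current + k) % n]]
        if sloppy:
            # Sloppy: take the first source in the cycle that is ready now.
            candidates = [j for j in order if queues[j][0][0] <= clock]
        else:
            # Deterministic: only the source whose turn it is may produce.
            candidates = [j for j in order[:1] if queues[j][0][0] <= clock]
        if candidates:
            pick = candidates[0]
            output.append(queues[pick].pop(0)[1])
            current = (pick + 1) % n
        else:
            stalls += 1
        clock += 1
    return output, stalls


slow = [(3, "s0"), (4, "s1"), (5, "s2")]   # a straggler
fast = [(0, "f0"), (0, "f1"), (0, "f2")]

print(drain([slow, fast], sloppy=False))
print(drain([slow, fast], sloppy=True))
```

In this model the deterministic run keeps the round-robin order but stalls for three ticks waiting on the straggler, while the sloppy run never stalls and instead emits all of the fast source's elements first. Real runs with sloppy=True give no such fixed order.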
map_func: A function mapping a nested structure of tensors to a Dataset.
cycle_length: The number of input Datasets to interleave from in parallel.
block_length: The number of consecutive elements to pull from an input Dataset before advancing to the next input Dataset.
sloppy: If false, elements are produced in deterministic order. Otherwise, the implementation is allowed, for the sake of expediency, to produce elements in a non-deterministic order.
buffer_output_elements: The number of elements each iterator being interleaved should buffer (similar to the .prefetch() transformation for each interleaved iterator).
prefetch_input_elements: The number of input elements to transform to iterators before they are needed for interleaving.
Returns a Dataset transformation function, which can be passed to tf.data.Dataset.apply.
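To make the roles of cycle_length and block_length concrete, the following plain-Python sketch models the element order when sloppy is False. It assumes the deterministic order is a round-robin over cycle_length open slots, with an exhausted slot refilled from the next input element. The helper interleave_order and the sample inputs are hypothetical and use no TensorFlow calls.

```python
from itertools import islice


def interleave_order(inputs, map_func, cycle_length, block_length=1):
    """Pure-Python model of deterministic interleaving; not TensorFlow code."""
    if cycle_length < 1 or block_length < 1:
        raise ValueError("cycle_length and block_length must be >= 1")
    pending = iter(inputs)
    slots = [None] * cycle_length   # one open nested iterator per slot
    inputs_left = True
    output = []
    while inputs_left or any(s is not None for s in slots):
        for i in range(cycle_length):
            # Refill an empty slot from the next input element, if any remain.
            if slots[i] is None and inputs_left:
                try:
                    slots[i] = iter(map_func(next(pending)))
                except StopIteration:
                    inputs_left = False
            if slots[i] is None:
                continue
            # Pull up to block_length consecutive elements from this slot.
            block = list(islice(slots[i], block_length))
            output.extend(block)
            if len(block) < block_length:
                slots[i] = None     # nested iterator is exhausted
    return output


# Hypothetical stand-ins for file names and the records each file yields.
files = ["a", "b", "c"]
records = lambda name: [name + str(k) for k in range(3)]

print(interleave_order(files, records, cycle_length=2, block_length=2))
# ['a0', 'a1', 'b0', 'b1', 'a2', 'b2', 'c0', 'c1', 'c2']
```

With cycle_length=1 the same helper returns each input's elements back to back, which is why a larger cycle_length is what mixes elements from several inputs.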