Configures input reading pipeline.

options tf.data.Options, dataset options to use. Note that when shuffle_files is True and no seed is defined, deterministic will be set to False internally, unless it is defined here.
try_autocache If True (default) and the dataset satisfies the right conditions (dataset small enough, files not shuffled, ...), the dataset will be cached during the first iteration (through ds = ds.cache()).
repeat_filenames If True, repeat the filenames iterator. This will result in an infinite dataset. Repeat is called after the shuffle of the filenames.
add_tfds_id If True, the examples dict in tf.data.Dataset will have an additional key 'tfds_id': tf.Tensor(shape=(), dtype=tf.string) containing the example's unique identifier (e.g. 'train.tfrecord-000045-of-001024__123'). Note: IDs might change in future versions of TFDS.
shuffle_seed tf.int64, seed forwarded to tf.data.Dataset.shuffle during file shuffling (which happens when tfds.load(..., shuffle_files=True)).
shuffle_reshuffle_each_iteration bool, forwarded to tf.data.Dataset.shuffle during file shuffling (which happens when tfds.load(..., shuffle_files=True)).
interleave_cycle_length int, forwarded to tf.data.Dataset.interleave.
interleave_block_length int, forwarded to tf.data.Dataset.interleave.
input_context tf.distribute.InputContext, if set, each worker will read a different set of files. For more info, see the distribute_datasets_from_function documentation. Note: * Each worker will always read the same subset of files; shuffle_files only shuffles files within each worker. * If info.splits[split].num_shards < input_context.num_input_pipelines, an error will be raised, as some workers would otherwise be empty.
experimental_interleave_sort_fn Function with signature List[FileDict] -> List[FileDict], which takes the list of dict(file: str, take: int, skip: int) and returns the modified version to read. This can be used to sort/shuffle the shards to read in a custom order, instead of relying on shuffle_files=True.
skip_prefetch If False (default), a ds.prefetch() op is added at the end. Might be set to True for performance optimization in some cases (e.g. if you're already calling ds.prefetch() at the end of your pipeline).
num_parallel_calls_for_decode The number of parallel calls for decoding records. By default, tf.data.AUTOTUNE is used.
num_parallel_calls_for_interleave_files The number of parallel calls for interleaving files. By default, tf.data.AUTOTUNE is used.
enable_ordering_guard When True (default), an exception is raised if shuffling or interleaving is used on an ordered dataset.
assert_cardinality When True (default), an exception is raised if, at the end of an epoch, the number of read examples does not match the expected number from the dataset metadata. A power user would typically set this to False if input files have been tampered with and missing or extra records are acceptable.
override_buffer_size Number of bytes to pass to file readers for buffering.
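To illustrate the experimental_interleave_sort_fn contract described above, here is a minimal pure-Python sketch. The function name seeded_shuffle_sort_fn and the sample shard names are assumptions for illustration; only the List[FileDict] -> List[FileDict] signature and the file/take/skip keys come from the description above.

```python
import random

def seeded_shuffle_sort_fn(file_instructions):
    """Hypothetical experimental_interleave_sort_fn: deterministically
    shuffles the shard read instructions with a fixed seed, instead of
    relying on shuffle_files=True."""
    # Each instruction is a dict like {"file": str, "take": int, "skip": int}.
    shuffled = list(file_instructions)
    random.Random(42).shuffle(shuffled)
    return shuffled

# Illustrative shard instructions (take=-1 means "read the whole shard").
instructions = [
    {"file": f"train.tfrecord-{i:05d}-of-00004", "take": -1, "skip": 0}
    for i in range(4)
]
reordered = seeded_shuffle_sort_fn(instructions)
# Same shards are read, in a different but reproducible order.
assert sorted(d["file"] for d in reordered) == sorted(d["file"] for d in instructions)
```

Because the seed is fixed, every call produces the same order, which keeps the pipeline deterministic while still decorrelating shard order.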

Default values:

add_tfds_id: False
assert_cardinality: True
enable_ordering_guard: True
experimental_interleave_sort_fn: None
input_context: None
interleave_block_length: 16
interleave_cycle_length: 'missing'
num_parallel_calls_for_decode: None
num_parallel_calls_for_interleave_files: None
options: None
override_buffer_size: None
repeat_filenames: False
shuffle_reshuffle_each_iteration: None
shuffle_seed: None
skip_prefetch: False
try_autocache: True
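Putting the attributes together, a usage sketch might look like the following. The dataset name "mnist" and the chosen option values are illustrative assumptions, not recommendations; the sketch assumes tensorflow_datasets is installed and the dataset has been prepared.

```python
import tensorflow_datasets as tfds

# Hypothetical configuration: seed the file-level shuffle for
# reproducibility, skip auto-caching, and attach per-example IDs.
read_config = tfds.ReadConfig(
    shuffle_seed=42,
    try_autocache=False,
    add_tfds_id=True,
)

# The config only takes effect when passed to the read call.
ds = tfds.load(
    "mnist",
    split="train",
    shuffle_files=True,  # shuffle_seed above is forwarded to this shuffle
    read_config=read_config,
)
```

With add_tfds_id=True, each example dict in ds carries a 'tfds_id' string tensor alongside its features, which is useful for debugging or deduplication.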