tfds.ReadConfig
Configures input reading pipeline.
tfds.ReadConfig(
    options: Optional[tf.data.Options] = None,
    try_autocache: bool = True,
    repeat_filenames: bool = False,
    add_tfds_id: bool = False,
    shuffle_seed: Optional[int] = None,
    shuffle_reshuffle_each_iteration: Optional[bool] = None,
    interleave_cycle_length: Union[Optional[int], _MISSING] = MISSING,
    interleave_block_length: Optional[int] = 16,
    input_context: Optional[tf.distribute.InputContext] = None,
    experimental_interleave_sort_fn: Optional[InterleaveSortFn] = None,
    skip_prefetch: bool = False,
    num_parallel_calls_for_decode: Optional[int] = None,
    num_parallel_calls_for_interleave_files: Optional[int] = None,
    enable_ordering_guard: bool = True,
    assert_cardinality: bool = True,
    override_buffer_size: Optional[int] = None
)
Attributes
options
    tf.data.Options, dataset options to use. Note that when shuffle_files is True and no seed is defined, deterministic will be set to False internally, unless it is defined here.
try_autocache
    If True (default) and the dataset satisfies the right conditions (dataset small enough, files not shuffled, ...), the dataset will be cached during the first iteration (through ds = ds.cache()).
repeat_filenames
    If True, repeat the filenames iterator, which results in an infinite dataset. Repeat is applied after the filenames are shuffled.
add_tfds_id
    If True, the example dict in the tf.data.Dataset will have an additional key 'tfds_id': tf.Tensor(shape=(), dtype=tf.string) containing the example's unique identifier (e.g. 'train.tfrecord-000045-of-001024__123').
    Note: IDs may change in future versions of TFDS.
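As a rough illustration, an id of that shape can be split into a shard name and an example index. This is a plain-Python sketch; the id format is an internal detail and may change between TFDS versions:

```python
# Split a tfds_id of the form '<shard>__<index>' into its parts.
# Assumes the current (unstable) id format shown above.
def parse_tfds_id(tfds_id: str):
    shard, index = tfds_id.rsplit('__', 1)
    return shard, int(index)

print(parse_tfds_id('train.tfrecord-000045-of-001024__123'))
# → ('train.tfrecord-000045-of-001024', 123)
```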
shuffle_seed
    tf.int64, seed forwarded to tf.data.Dataset.shuffle during file shuffling (which happens when tfds.load(..., shuffle_files=True)).
shuffle_reshuffle_each_iteration
    bool, forwarded to tf.data.Dataset.shuffle during file shuffling (which happens when tfds.load(..., shuffle_files=True)).
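How these two shuffle options interact can be modeled in plain Python (this is a conceptual sketch, not the TFDS implementation): the seed makes the file order reproducible, and reshuffle_each_iteration controls whether each epoch gets a fresh order.

```python
import random

def shuffled_files(filenames, seed, reshuffle_each_iteration, epoch):
    # Same seed + same epoch -> same order. Without reshuffling, the
    # epoch number is ignored, so every epoch sees the same order.
    key = (seed, epoch) if reshuffle_each_iteration else (seed, 0)
    rng = random.Random(hash(key))
    files = list(filenames)
    rng.shuffle(files)
    return files
```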
interleave_cycle_length
    int, forwarded to tf.data.Dataset.interleave.
interleave_block_length
    int, forwarded to tf.data.Dataset.interleave.
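The meaning of the two parameters can be seen in a plain-Python model of interleave (a simplified sketch of tf.data.Dataset.interleave's ordering, not its implementation): cycle_length sources are open concurrently, and block_length consecutive elements are pulled from each before moving to the next.

```python
def interleave(sources, cycle_length, block_length):
    """Round-robin block_length-sized chunks from cycle_length open sources."""
    iters = [iter(s) for s in sources[:cycle_length]]
    pending = list(sources[cycle_length:])
    out = []
    while iters:
        still_open = []
        for it in iters:
            # Take up to block_length elements from this source.
            block = [x for _, x in zip(range(block_length), it)]
            out.extend(block)
            if len(block) == block_length:
                still_open.append(it)  # source may have more elements
            elif pending:
                # Source exhausted: a waiting source takes its slot.
                still_open.append(iter(pending.pop(0)))
        iters = still_open
    return out

print(interleave([[1] * 4, [2] * 4, [3] * 4], cycle_length=2, block_length=2))
# → [1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3]
```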
input_context
    tf.distribute.InputContext; if set, each worker will read a different set of files. For more info, see the distribute_datasets_from_function documentation.
    Note:
    * Each worker will always read the same subset of files. shuffle_files only shuffles files within each worker.
    * If info.splits[split].num_shards < input_context.num_input_pipelines, an error is raised, as some workers would be empty.
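A plain-Python sketch of the idea (the exact shard-assignment strategy is up to TFDS; the round-robin split below is a hypothetical example): each worker deterministically takes a disjoint subset of shards, and an error is raised when there are more input pipelines than shards.

```python
def shards_for_worker(filenames, input_pipeline_id, num_input_pipelines):
    # Hypothetical assignment: worker i takes every num_input_pipelines-th
    # shard starting at index i, so subsets are disjoint and stable.
    if len(filenames) < num_input_pipelines:
        raise ValueError('More input pipelines than shards: some workers would be empty')
    return filenames[input_pipeline_id::num_input_pipelines]

files = ['train.tfrecord-%05d-of-00004' % i for i in range(4)]
print(shards_for_worker(files, 0, 2))  # shards 0 and 2
print(shards_for_worker(files, 1, 2))  # shards 1 and 3
```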
experimental_interleave_sort_fn
    Function with signature List[FileDict] -> List[FileDict], which takes the list of dict(file: str, take: int, skip: int) and returns a modified version specifying what to read. This can be used to sort/shuffle the shards to read in a custom order, instead of relying on shuffle_files=True.
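For instance, a hypothetical sort function matching the signature above, which reads the shards in reverse filename order:

```python
# Each FileDict looks like {'file': str, 'take': int, 'skip': int}.
def reverse_shards_sort_fn(file_dicts):
    # Return the shards sorted by filename, descending.
    return sorted(file_dicts, key=lambda d: d['file'], reverse=True)

# Usage sketch:
# read_config = tfds.ReadConfig(
#     experimental_interleave_sort_fn=reverse_shards_sort_fn)
```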
skip_prefetch
    If False (default), a ds.prefetch() op is added at the end. May be set to True as a performance optimization in some cases (e.g. if you are already calling ds.prefetch() at the end of your pipeline).
num_parallel_calls_for_decode
    The number of parallel calls for decoding records. Defaults to tf.data's AUTOTUNE.
num_parallel_calls_for_interleave_files
    The number of parallel calls for interleaving files. Defaults to tf.data's AUTOTUNE.
enable_ordering_guard
    When True (default), an exception is raised if shuffling or interleaving is used on an ordered dataset.
assert_cardinality
    When True (default), an exception is raised if, at the end of an epoch, the number of examples read does not match the expected number from the dataset metadata. A power user would typically set this to False if the input files have been tampered with and missing or extra records are acceptable.
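Conceptually (a plain-Python sketch, not the TFDS internals), the check amounts to:

```python
def check_cardinality(examples, expected_num_examples):
    # Count the examples actually read and compare with the metadata count.
    count = sum(1 for _ in examples)
    if count != expected_num_examples:
        raise ValueError(
            f'Expected {expected_num_examples} examples but read {count}. '
            'Set assert_cardinality=False to disable this check.')
    return count
```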
override_buffer_size
    Number of bytes to pass to file readers for buffering.
Class Variables

add_tfds_id: False
assert_cardinality: True
enable_ordering_guard: True
experimental_interleave_sort_fn: None
input_context: None
interleave_block_length: 16
interleave_cycle_length: 'missing'
num_parallel_calls_for_decode: None
num_parallel_calls_for_interleave_files: None
options: None
override_buffer_size: None
repeat_filenames: False
shuffle_reshuffle_each_iteration: None
shuffle_seed: None
skip_prefetch: False
try_autocache: True
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.