tfds.ReadConfig
Configures input reading pipeline.
tfds.ReadConfig(
    options: Optional[tf.data.Options] = None,
    try_autocache: bool = True,
    repeat_filenames: bool = False,
    add_tfds_id: bool = False,
    shuffle_seed: Optional[int] = None,
    shuffle_reshuffle_each_iteration: Optional[bool] = None,
    interleave_cycle_length: Union[Optional[int], _MISSING] = MISSING,
    interleave_block_length: Optional[int] = 16,
    input_context: Optional[tf.distribute.InputContext] = None,
    experimental_interleave_sort_fn: Optional[InterleaveSortFn] = None,
    skip_prefetch: bool = False,
    num_parallel_calls_for_decode: Optional[int] = None,
    num_parallel_calls_for_interleave_files: Optional[int] = None,
    enable_ordering_guard: bool = True,
    assert_cardinality: bool = True,
    override_buffer_size: Optional[int] = None
)
Attributes
options
    tf.data.Options, dataset options to use. Note that when shuffle_files is True and no seed is defined, deterministic will be set to False internally, unless it is defined here.
try_autocache
    If True (default) and the dataset satisfies the right conditions (dataset small enough, files not shuffled, ...), the dataset will be cached during the first iteration (through ds = ds.cache()).
repeat_filenames
    If True, repeat the filenames iterator, which results in an infinite dataset. Repeat is applied after the filenames are shuffled.
add_tfds_id
    If True, the example dict in the tf.data.Dataset will have an additional key 'tfds_id': tf.Tensor(shape=(), dtype=tf.string) containing the example's unique identifier (e.g. 'train.tfrecord-000045-of-001024__123').
    Note: IDs may change in future versions of TFDS.
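As a rough illustration, an id of that shape can be split into a shard name and an example index. This is a plain-Python sketch; the id format is an internal detail and may change between TFDS versions:

```python
# Split a tfds_id of the form '<shard>__<index>' into its parts.
# Assumes the current (unstable) id format shown above.
def parse_tfds_id(tfds_id: str):
    shard, index = tfds_id.rsplit('__', 1)
    return shard, int(index)

print(parse_tfds_id('train.tfrecord-000045-of-001024__123'))
# → ('train.tfrecord-000045-of-001024', 123)
```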
shuffle_seed
    tf.int64, seed forwarded to tf.data.Dataset.shuffle during file shuffling (which happens when tfds.load(..., shuffle_files=True)).
shuffle_reshuffle_each_iteration
    bool, forwarded to tf.data.Dataset.shuffle during file shuffling (which happens when tfds.load(..., shuffle_files=True)).
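How these two shuffle options interact can be modeled in plain Python (this is a conceptual sketch, not the TFDS implementation): the seed makes the file order reproducible, and reshuffle_each_iteration controls whether each epoch gets a fresh order.

```python
import random

def shuffled_files(filenames, seed, reshuffle_each_iteration, epoch):
    # Same seed + same epoch -> same order. Without reshuffling, the
    # epoch number is ignored, so every epoch sees the same order.
    key = (seed, epoch) if reshuffle_each_iteration else (seed, 0)
    rng = random.Random(hash(key))
    files = list(filenames)
    rng.shuffle(files)
    return files
```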
interleave_cycle_length
    int, forwarded to tf.data.Dataset.interleave.
interleave_block_length
    int, forwarded to tf.data.Dataset.interleave.
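The meaning of the two parameters can be seen in a plain-Python model of interleave (a simplified sketch of tf.data.Dataset.interleave's ordering, not its implementation): cycle_length sources are open concurrently, and block_length consecutive elements are pulled from each before moving to the next.

```python
def interleave(sources, cycle_length, block_length):
    """Round-robin block_length-sized chunks from cycle_length open sources."""
    iters = [iter(s) for s in sources[:cycle_length]]
    pending = list(sources[cycle_length:])
    out = []
    while iters:
        still_open = []
        for it in iters:
            # Take up to block_length elements from this source.
            block = [x for _, x in zip(range(block_length), it)]
            out.extend(block)
            if len(block) == block_length:
                still_open.append(it)  # source may have more elements
            elif pending:
                # Source exhausted: a waiting source takes its slot.
                still_open.append(iter(pending.pop(0)))
        iters = still_open
    return out

print(interleave([[1] * 4, [2] * 4, [3] * 4], cycle_length=2, block_length=2))
# → [1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3]
```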
input_context
    tf.distribute.InputContext; if set, each worker will read a different set of files. For more info, see the distribute_datasets_from_function documentation.
    Note:
    * Each worker will always read the same subset of files. shuffle_files only shuffles files within each worker.
    * If info.splits[split].num_shards < input_context.num_input_pipelines, an error is raised, as some workers would be empty.
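A plain-Python sketch of the idea (the exact shard-assignment strategy is up to TFDS; the round-robin split below is a hypothetical example): each worker deterministically takes a disjoint subset of shards, and an error is raised when there are more input pipelines than shards.

```python
def shards_for_worker(filenames, input_pipeline_id, num_input_pipelines):
    # Hypothetical assignment: worker i takes every num_input_pipelines-th
    # shard starting at index i, so subsets are disjoint and stable.
    if len(filenames) < num_input_pipelines:
        raise ValueError('More input pipelines than shards: some workers would be empty')
    return filenames[input_pipeline_id::num_input_pipelines]

files = ['train.tfrecord-%05d-of-00004' % i for i in range(4)]
print(shards_for_worker(files, 0, 2))  # shards 0 and 2
print(shards_for_worker(files, 1, 2))  # shards 1 and 3
```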
experimental_interleave_sort_fn
    Function with signature List[FileDict] -> List[FileDict], which takes the list of dict(file: str, take: int, skip: int) and returns a modified version specifying what to read. This can be used to sort/shuffle the shards to read in a custom order, instead of relying on shuffle_files=True.
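For instance, a hypothetical sort function matching the signature above, which reads the shards in reverse filename order:

```python
# Each FileDict looks like {'file': str, 'take': int, 'skip': int}.
def reverse_shards_sort_fn(file_dicts):
    # Return the shards sorted by filename, descending.
    return sorted(file_dicts, key=lambda d: d['file'], reverse=True)

# Usage sketch:
# read_config = tfds.ReadConfig(
#     experimental_interleave_sort_fn=reverse_shards_sort_fn)
```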
skip_prefetch
    If False (default), a ds.prefetch() op is added at the end. May be set to True as a performance optimization in some cases (e.g. if you are already calling ds.prefetch() at the end of your pipeline).
num_parallel_calls_for_decode
    The number of parallel calls for decoding records. Defaults to tf.data's AUTOTUNE.
num_parallel_calls_for_interleave_files
    The number of parallel calls for interleaving files. Defaults to tf.data's AUTOTUNE.
enable_ordering_guard
    When True (default), an exception is raised if shuffling or interleaving is used on an ordered dataset.
assert_cardinality
    When True (default), an exception is raised if, at the end of an epoch, the number of examples read does not match the expected number from the dataset metadata. A power user would typically set this to False if the input files have been tampered with and missing or extra records are acceptable.
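Conceptually (a plain-Python sketch, not the TFDS internals), the check amounts to:

```python
def check_cardinality(examples, expected_num_examples):
    # Count the examples actually read and compare with the metadata count.
    count = sum(1 for _ in examples)
    if count != expected_num_examples:
        raise ValueError(
            f'Expected {expected_num_examples} examples but read {count}. '
            'Set assert_cardinality=False to disable this check.')
    return count
```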
override_buffer_size
    Number of bytes to pass to file readers for buffering.
Class Variables

add_tfds_id: False
assert_cardinality: True
enable_ordering_guard: True
experimental_interleave_sort_fn: None
input_context: None
interleave_block_length: 16
interleave_cycle_length: 'missing'
num_parallel_calls_for_decode: None
num_parallel_calls_for_interleave_files: None
options: None
override_buffer_size: None
repeat_filenames: False
shuffle_reshuffle_each_iteration: None
shuffle_seed: None
skip_prefetch: False
try_autocache: True
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.