tfds.core.ShardedFileTemplate

Template to produce filenames for sharded datasets.

data_dir the directory that contains the files for the shards.
template template of the sharded files, e.g. '${SPLIT}/data.${FILEFORMAT}-${SHARD_INDEX}'.
dataset_name the name of the dataset.
split the split of the dataset.
filetype_suffix the filetype suffix to denote the type of file. For example, tfrecord.
regex Returns the regular expression for this template.

Can be used to test whether a filename matches this template.
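
As an illustration of the idea, here is a minimal sketch of how a template with named placeholders could be turned into a regular expression and used to test filenames. It is not the TFDS implementation: the helper `template_to_regex` and the per-placeholder patterns are assumptions made for this example.

```python
import re

# Hypothetical per-placeholder patterns; the real library may use different ones.
PLACEHOLDER_PATTERNS = {
    "DATASET": r"[a-zA-Z]\w*",
    "SPLIT": r"\w+",
    "FILEFORMAT": r"\w+",
    "SHARD_X_OF_Y": r"\d{5}-of-\d{5}",
}


def template_to_regex(template: str) -> re.Pattern:
    """Builds a regex with one named group per `{PLACEHOLDER}` in the template."""
    parts = []
    # Splitting on a capturing group leaves placeholder names at odd positions.
    for i, piece in enumerate(re.split(r"\{(\w+)\}", template)):
        if i % 2 == 0:
            parts.append(re.escape(piece))  # literal text between placeholders
        else:
            parts.append(f"(?P<{piece.lower()}>{PLACEHOLDER_PATTERNS[piece]})")
    return re.compile("".join(parts))


regex = template_to_regex("{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}")
matches = regex.fullmatch("mnist-train.tfrecord-00000-of-00005") is not None
rejects = regex.fullmatch("mnist-train.tfrecord") is None
```

Named groups make it possible to reuse the same expression for both validation and parsing.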

Methods

filepath_prefix

is_valid

Returns whether the given filename follows this template.

parse_filename_info

Parses the filename using this template.

Note that when the filename doesn't specify the dataset name, split, or filetype suffix, but this template does, then the value in the template will be used.

Arguments
filename the filename that should be parsed.

Returns
the FilenameInfo corresponding to the given file if it could be parsed. None otherwise.
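
The fallback behaviour described above can be sketched as follows. This is a hedged illustration, not the library's code: the pattern, the function `parse_with_defaults`, and the use of a plain dict in place of FilenameInfo are all assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern in which the dataset name is optional,
# so "train.tfrecord-00002-of-00010" is accepted as well.
PATTERN = re.compile(
    r"(?:(?P<dataset_name>\w+)-)?(?P<split>\w+)\.(?P<filetype_suffix>\w+)"
    r"-(?P<shard_index>\d{5})-of-(?P<num_shards>\d{5})"
)


def parse_with_defaults(filename: str, defaults: dict) -> Optional[dict]:
    """Parses `filename`; fields it omits are taken from `defaults`."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return None  # the filename does not follow the pattern
    parsed = {k: v for k, v in match.groupdict().items() if v is not None}
    # Values found in the filename win; the template's values fill the gaps.
    info = {**defaults, **parsed}
    info["shard_index"] = int(info["shard_index"])
    info["num_shards"] = int(info["num_shards"])
    return info


info = parse_with_defaults(
    "train.tfrecord-00002-of-00010", {"dataset_name": "mnist"}
)
```

Here the filename carries no dataset name, so `info["dataset_name"]` comes from the defaults, while the split and suffix come from the filename itself.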

relative_filepath

Returns the path (relative to the data dir) of the shard.

replace

Returns a copy of the ShardedFileTemplate with updated attributes.
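
A copy-with-updates method of this kind is commonly built on `dataclasses.replace`. The sketch below uses a hypothetical stand-in class, `TemplateSketch`, to show the pattern; it makes no claim about how ShardedFileTemplate implements it.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class TemplateSketch:
    """Hypothetical stand-in for a template object; not the TFDS class."""

    data_dir: str
    dataset_name: Optional[str] = None
    split: Optional[str] = None
    filetype_suffix: Optional[str] = None

    def replace(self, **kwargs) -> "TemplateSketch":
        # Returns a new instance; the original is left untouched.
        return dataclasses.replace(self, **kwargs)


base = TemplateSketch(
    data_dir="/data/mnist", dataset_name="mnist", filetype_suffix="tfrecord"
)
train = base.replace(split="train")
test = base.replace(split="test")
```

This makes it convenient to keep one base template per dataset and derive a template per split from it.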

sharded_filenames

sharded_filepath

Returns the filename (including full path if data_dir is set) for the given shard.
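
To make the expansion concrete, the following sketch fills in the default template '{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}' for one shard. The function `shard_path` and its parameters are hypothetical, and the five-digit zero padding of the shard counters is an assumption of this example.

```python
import posixpath
from typing import Optional

DEFAULT_TEMPLATE = "{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}"


def shard_path(
    data_dir: Optional[str],
    dataset: str,
    split: str,
    fileformat: str,
    shard_index: int,
    num_shards: int,
    template: str = DEFAULT_TEMPLATE,
) -> str:
    """Expands the template for one shard (illustrative helper only)."""
    filename = template.format(
        DATASET=dataset,
        SPLIT=split,
        FILEFORMAT=fileformat,
        # Assumed layout: zero-padded index and total, e.g. 00000-of-00005.
        SHARD_X_OF_Y=f"{shard_index:05d}-of-{num_shards:05d}",
    )
    # Without a data_dir only the bare filename is returned.
    return posixpath.join(data_dir, filename) if data_dir else filename


with_dir = shard_path("/data/mnist", "mnist", "train", "tfrecord", 0, 5)
without_dir = shard_path(None, "mnist", "train", "tfrecord", 4, 5)
```

With these inputs, `with_dir` is '/data/mnist/mnist-train.tfrecord-00000-of-00005' and `without_dir` is 'mnist-train.tfrecord-00004-of-00005'.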

sharded_filepaths

sharded_filepaths_pattern

Returns a pattern describing all the file paths captured by this template.

If num_shards is given, then it returns '/path/dataset_name-split.fileformat@num_shards'. If num_shards is not given, then it returns '/path/dataset_name-split.fileformat*'.

Args
num_shards optional specification of the number of shards.

Returns
the pattern describing all shards captured by this template.
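
The two pattern forms can be sketched as below. The helper `shards_pattern` is hypothetical and only mirrors the description above; the glob check with `fnmatch` is added for illustration.

```python
import fnmatch
from typing import Optional


def shards_pattern(prefix: str, num_shards: Optional[int] = None) -> str:
    """Returns 'prefix@N' when the shard count is known, else the glob 'prefix*'."""
    if num_shards is not None:
        return f"{prefix}@{num_shards}"
    return f"{prefix}*"


known = shards_pattern("/path/mnist-train.tfrecord", num_shards=5)
unknown = shards_pattern("/path/mnist-train.tfrecord")

# The '*' form behaves like an ordinary glob over the shard files.
is_match = fnmatch.fnmatchcase(
    "/path/mnist-train.tfrecord-00000-of-00005", unknown
)
```

The '@N' form states the exact shard count, whereas the '*' form matches any file that starts with the prefix.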

__eq__

Class Variables

dataset_name None
filetype_suffix None
split None
template '{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}'