tfds.core.SplitInfo

Wraps proto.SplitInfo with an additional property.

name Name of the split (e.g. train, test, ...)
shard_lengths List of length num_shards containing the number of examples stored in each file.
filename_template The template used to create sharded filenames.
num_examples Total number of examples (sum(shard_lengths))
num_shards Number of files (len(shard_lengths))
num_bytes Size of the files (in bytes)
statistics Additional statistics of the split.
file_instructions Returns the list of dict(filename, take, skip).

This allows for creating your own tf.data.Dataset using the low-level TFDS values.

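import tensorflow as tf

# `info` is assumed to be the tfds.core.DatasetInfo of an already prepared dataset.
file_instructions = info.splits['train[75%:]'].file_instructions
# Expose the per-file instructions as a dataset of (filename, take, skip) records.
instruction_ds = tf.data.Dataset.from_generator(
    lambda: file_instructions,
    output_types={
        'filename': tf.string,
        'take': tf.int64,
        'skip': tf.int64,
    },
)
# Read each shard, dropping the first `skip` records and keeping the next `take`.
ds = instruction_ds.interleave(
    lambda f: tf.data.TFRecordDataset(
        f['filename']).skip(f['skip']).take(f['take'])
)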

When skip=0 and take=-1, the full shard is read, so the skip() and take() calls can be omitted for that file.
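
The skip/take semantics described above can be illustrated without TensorFlow. The following is a minimal sketch in plain Python, not TFDS code: read_split, the shard names, and the integer "records" are all hypothetical, and in-memory lists stand in for the files on disk.

```python
from typing import Dict, List, Union

# One instruction, shaped like dict(filename, take, skip).
Instruction = Dict[str, Union[str, int]]


def read_split(shards: Dict[str, List[int]],
               instructions: List[Instruction]) -> List[int]:
    """Concatenates the records selected from each shard (illustrative only)."""
    selected: List[int] = []
    for ins in instructions:
        records = shards[ins["filename"]]
        skip, take = ins["skip"], ins["take"]
        if skip == 0 and take == -1:
            # Full shard: no slicing is needed.
            selected.extend(records)
        else:
            # take == -1 means "everything after the skipped records".
            end = None if take == -1 else skip + take
            selected.extend(records[skip:end])
    return selected


# Made-up shards holding integer "records".
shards = {"shard-0": [0, 1, 2, 3], "shard-1": [4, 5, 6, 7]}
instructions = [
    {"filename": "shard-0", "skip": 3, "take": -1},
    {"filename": "shard-1", "skip": 0, "take": -1},
]
result = read_split(shards, instructions)
```

Here the first instruction drops three records from shard-0 and keeps the rest, while the second reads shard-1 in full.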

filenames Returns the list of filenames.
filepaths All the paths for all the files that are part of this split.
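
The relationship between shard_lengths, num_shards, num_examples and the per-file instructions can be sketched in plain Python. This is an illustration under stated assumptions, not the TFDS implementation: instructions_for_range, the shard names and the shard sizes are hypothetical, the percentage is rounded down, and an explicit take is always emitted (TFDS may round differently or use take=-1).

```python
from typing import Dict, List, Union


def instructions_for_range(filenames: List[str], shard_lengths: List[int],
                           start: int, end: int) -> List[Dict[str, Union[str, int]]]:
    """Maps the half-open example range [start, end) onto per-shard skip/take."""
    instructions = []
    offset = 0  # Global index of the first example in the current shard.
    for filename, length in zip(filenames, shard_lengths):
        lo = max(start, offset)
        hi = min(end, offset + length)
        if lo < hi:  # The shard overlaps the requested range.
            instructions.append(
                {"filename": filename, "skip": lo - offset, "take": hi - lo})
        offset += length
    return instructions


# Made-up values for illustration.
filenames = ["shard-0", "shard-1", "shard-2"]
shard_lengths = [4, 4, 2]

num_shards = len(shard_lengths)    # Number of files.
num_examples = sum(shard_lengths)  # Total number of examples.

# Roughly what a 'train[75%:]' slice selects, assuming the start rounds down.
start = num_examples * 75 // 100
tail = instructions_for_range(filenames, shard_lengths, start, num_examples)
```

With these sizes the slice starts at example 7, so shard-0 is not read at all, only the last record of shard-1 is kept, and shard-2 is read in full.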

Methods

from_proto

Returns a SplitInfo class instance from a SplitInfo proto.

replace

Returns a copy of the SplitInfo with updated attributes.
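
The copy-with-updates behaviour can be illustrated with a small stand-in class. This sketch assumes nothing about how SplitInfo is implemented: MiniSplitInfo is hypothetical, and it uses the standard-library dataclasses.replace to show that the original object is left untouched.

```python
import dataclasses
from typing import List


@dataclasses.dataclass(frozen=True)
class MiniSplitInfo:
    """Hypothetical stand-in for SplitInfo; not the TFDS class."""
    name: str
    shard_lengths: List[int]

    @property
    def num_examples(self) -> int:
        # Total number of examples, derived from the per-shard counts.
        return sum(self.shard_lengths)

    def replace(self, **kwargs) -> "MiniSplitInfo":
        # Returns a new instance; fields not named in kwargs are carried over.
        return dataclasses.replace(self, **kwargs)


original = MiniSplitInfo(name="train", shard_lengths=[4, 4, 2])
updated = original.replace(name="validation")
```

After the call, updated carries the new name and the same shard_lengths, while original still reports the name train.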

to_proto

Class Variables

filename_template None