
tfds.core.SplitInfo

Wraps proto.SplitInfo with an additional property.

name Name of the split (e.g. train, test, ...)
shard_lengths List containing the number of examples stored in each shard file.
num_examples Total number of examples (sum(shard_lengths))
num_shards Number of files (len(shard_lengths))
num_bytes Total size of the files, in bytes
statistics Additional statistics of the split.
file_instructions Returns the list of dict(filename, take, skip).

This allows for creating your own tf.data.Dataset using the low-level TFDS values.

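import tensorflow as tf

# `info` is assumed to be the DatasetInfo of an already-prepared dataset
# (for example `builder.info`).
file_instructions = info.splits['train[75%:]'].file_instructions

# One element per shard file: the file to read, plus how many records
# to skip and how many to take from it.
instruction_ds = tf.data.Dataset.from_generator(
    lambda: file_instructions,
    output_types={
        'filename': tf.string,
        'take': tf.int64,
        'skip': tf.int64,
    },
)

# Read each shard as TFRecords and keep only the requested slice.
ds = instruction_ds.interleave(
    lambda f: tf.data.TFRecordDataset(
        f['filename']).skip(f['skip']).take(f['take'])
)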

When skip=0 and take=-1, the full shard is read, so the ds.skip and ds.take calls can be omitted.

filenames Returns the list of filenames.
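
The relationship between shard_lengths, num_examples, num_shards and the (filename, take, skip) entries described above can be illustrated with a small pure-Python sketch. This is not the TFDS implementation: the helper sketch_file_instructions, the shard-NNNNN filenames, and the rounding of the 75% boundary are all assumptions made for illustration. It also assumes take=-1 is used whenever the slice runs to the end of a shard, which matches how tf.data.Dataset.take(-1) returns all remaining elements.

```python
def sketch_file_instructions(shard_lengths, start, end, prefix="shard"):
    """Map the example range [start, end) onto per-shard skip/take values.

    Hypothetical helper for illustration only; TFDS computes this internally.
    """
    instructions = []
    offset = 0  # global index of the first example in the current shard
    for i, length in enumerate(shard_lengths):
        shard_start, shard_end = offset, offset + length
        offset = shard_end
        lo = max(start, shard_start)
        hi = min(end, shard_end)
        if lo >= hi:
            continue  # this shard contributes nothing to the slice
        skip = lo - shard_start
        # -1 means "read to the end of the shard" (assumed convention).
        take = -1 if hi == shard_end else hi - lo
        instructions.append(
            {"filename": f"{prefix}-{i:05d}", "skip": skip, "take": take}
        )
    return instructions


shard_lengths = [100, 100, 50]
num_examples = sum(shard_lengths)  # total examples across all shards
num_shards = len(shard_lengths)    # one entry per shard file

# Roughly "the last 25%" of the split; the exact rounding TFDS uses may differ.
start = num_examples * 75 // 100
instructions = sketch_file_instructions(shard_lengths, start, num_examples)
for entry in instructions:
    print(entry)
```

The first shard is absent from the result because the slice starts past its end, and the last shard has skip=0 and take=-1, which is the "full shard" case mentioned above.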

Methods

from_proto


replace


Returns a copy of the SplitInfo with updated attributes.
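
The copy-with-updates behaviour can be sketched with a stand-in class. SplitInfoSketch below is hypothetical and is not the real tfds.core.SplitInfo; it assumes replace accepts the attributes to change as keyword arguments, which this page does not confirm. The point is only that the original object is left unchanged.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class SplitInfoSketch:
    """Hypothetical stand-in; not the real tfds.core.SplitInfo."""

    name: str
    shard_lengths: tuple

    @property
    def num_examples(self):
        return sum(self.shard_lengths)

    def replace(self, **kwargs):
        # Build a new object with the given fields changed;
        # the original instance is not modified.
        return dataclasses.replace(self, **kwargs)


train = SplitInfoSketch(name="train", shard_lengths=(100, 100, 50))
renamed = train.replace(name="validation")

print(train.name, renamed.name, renamed.num_examples)
```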

to_proto
