tfds.core.SplitBase

Class SplitBase

Abstract base class for Split compositionality.

See the guide on splits for more information.

There are three parts to the composition:

1) The splits are composed (defined, merged, split, ...) before the .as_dataset() function is called. This is done with __add__ and __getitem__ (and, as in the example below, subsplit), which return a tree of SplitBase objects whose leaves are the NamedSplit objects.

split = tfds.Split.TRAIN + tfds.Split.TEST.subsplit(tfds.percent[:50])
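To make step 1 concrete, here is a minimal sketch of how merging and slicing can build a tree without touching any data. The classes SplitNode, NamedLeaf, MergedSplit, and PercentSlice and the describe() helper are hypothetical stand-ins, not the TFDS implementation; only the shape (inner nodes produced by __add__, named splits as leaves) follows the description above.

```python
from dataclasses import dataclass


class SplitNode:
    """Hypothetical base for every node in the split tree (the SplitBase role)."""

    def __add__(self, other: "SplitNode") -> "SplitNode":
        # Merging reads nothing: it only builds a bigger tree.
        return MergedSplit(self, other)


@dataclass(frozen=True)
class NamedLeaf(SplitNode):
    """Leaf of the tree, analogous to a NamedSplit such as 'train'."""
    name: str


@dataclass(frozen=True)
class MergedSplit(SplitNode):
    """Inner node created by `a + b`."""
    left: SplitNode
    right: SplitNode


@dataclass(frozen=True)
class PercentSlice(SplitNode):
    """Inner node for a [start, stop) percent slice of a child split."""
    child: SplitNode
    start: int
    stop: int


def describe(node: SplitNode) -> str:
    """Render the tree as text to show its structure."""
    if isinstance(node, NamedLeaf):
        return node.name
    if isinstance(node, MergedSplit):
        return f"({describe(node.left)} + {describe(node.right)})"
    if isinstance(node, PercentSlice):
        return f"{describe(node.child)}[{node.start}%:{node.stop}%]"
    raise TypeError(f"Unknown node: {node!r}")


train, test = NamedLeaf("train"), NamedLeaf("test")
split = train + PercentSlice(test, 0, 50)
print(describe(split))  # (train + test[0%:50%])
```

Nothing is resolved at this stage; the tree only records what was asked for, which is what lets the same split expression be resolved later against any dataset's actual splits.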

2) The SplitBase is forwarded to the .as_dataset() function to be resolved into actual read instructions. This is done by the .get_read_instruction() method, which takes the real dataset splits (name, number of shards, ...), parses the tree, and returns a SplitReadInstruction object.

read_instruction = split.get_read_instruction(self.info.splits)
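Step 2 can be pictured as a recursive walk over the tree. The sketch below is a simplification under stated assumptions: split_dict maps each split name to a plain example count (instead of a SplitInfo), a read instruction is a hypothetical list of (split_name, start, stop) example ranges rather than a SplitReadInstruction, and percent boundaries are rounded down to whole examples.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical read instruction: read examples [start, stop) of one split.
ReadRange = Tuple[str, int, int]


@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class Merge:
    left: object
    right: object


@dataclass(frozen=True)
class Percent:
    child: object
    start: int  # inclusive, in percent (0..100)
    stop: int   # exclusive, in percent (0..100)


def get_read_instruction(node, split_dict: Dict[str, int]) -> List[ReadRange]:
    """Walk the tree and return the example ranges to read.

    split_dict maps split name -> number of examples (a simplification
    of dict[split_name, SplitInfo]).
    """
    if isinstance(node, Leaf):
        if node.name not in split_dict:
            raise ValueError(f"Unknown split {node.name!r}")
        return [(node.name, 0, split_dict[node.name])]
    if isinstance(node, Merge):
        # Merging concatenates the instructions of both subtrees.
        return (get_read_instruction(node.left, split_dict)
                + get_read_instruction(node.right, split_dict))
    if isinstance(node, Percent):
        # Apply the percent slice to every range produced by the child.
        out = []
        for name, lo, hi in get_read_instruction(node.child, split_dict):
            n = hi - lo
            out.append((name, lo + n * node.start // 100, lo + n * node.stop // 100))
        return out
    raise TypeError(f"Unsupported node: {node!r}")


split = Merge(Leaf("train"), Percent(Leaf("test"), 0, 50))
print(get_read_instruction(split, {"train": 1000, "test": 200}))
# [('train', 0, 1000), ('test', 0, 100)]
```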

3) The SplitReadInstruction is then used in the tf.data.Dataset pipeline to define which files to read and which examples to skip within each file.
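Step 3 boils down to mapping an example range onto the shard files of a split. The sketch below assumes each shard's length is known and uses a hypothetical helper, files_to_read, that returns (shard_index, skip, take) triples; shards outside the range are never opened. In a real pipeline each triple might drive something like tf.data.TFRecordDataset(path).skip(skip).take(take), but the sketch stays in plain Python and only computes the triples.

```python
from typing import List, Tuple


def files_to_read(shard_lengths: List[int], start: int, stop: int) -> List[Tuple[int, int, int]]:
    """Map the example range [start, stop) onto shards.

    Returns (shard_index, skip, take) triples: open shard `shard_index`,
    skip its first `skip` examples, then read `take` examples.
    """
    result = []
    offset = 0  # global index of the first example in the current shard
    for i, length in enumerate(shard_lengths):
        shard_start, shard_end = offset, offset + length
        lo = max(start, shard_start)
        hi = min(stop, shard_end)
        if lo < hi:  # the range overlaps this shard
            result.append((i, lo - shard_start, hi - lo))
        offset = shard_end
    return result


print(files_to_read([100, 100, 100], 150, 260))
# [(1, 50, 50), (2, 0, 60)]
```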

Methods

__add__

__add__(other)

Merging: tfds.Split.TRAIN + tfds.Split.TEST.

__eq__

__eq__(other)

Equality: tfds.Split.TRAIN == 'train'.

__ne__

__ne__(other)

Inequality: tfds.Split.TRAIN != 'test'.
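The equality examples above imply that a named split compares equal to its plain string name. A minimal sketch of one way to get that behavior, using a hypothetical NamedSplit stand-in (not the TFDS class), also hashes like the string so split objects and names are interchangeable as dict keys:

```python
class NamedSplit:
    """Hypothetical stand-in: a split that compares equal to its name."""

    def __init__(self, name: str):
        self._name = name

    def __eq__(self, other: object) -> bool:
        if isinstance(other, NamedSplit):
            return self._name == other._name
        if isinstance(other, str):
            return self._name == other
        return NotImplemented  # let Python try the reflected comparison

    def __ne__(self, other: object) -> bool:
        # Python 3 derives this from __eq__ automatically; it is spelled
        # out only to mirror the documented method.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self) -> int:
        # Hash like the plain string so NamedSplit("train") and "train"
        # find each other in dicts and sets.
        return hash(self._name)

    def __repr__(self) -> str:
        return f"NamedSplit({self._name!r})"


TRAIN = NamedSplit("train")
print(TRAIN == "train", TRAIN != "test")  # True True
```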

get_read_instruction

get_read_instruction(split_dict)

Parse the descriptor tree and compile all read instructions together.

Args:

  • split_dict: dict, the dict[split_name, SplitInfo] of the dataset.

Returns:

  • split_read_instruction: SplitReadInstruction

subsplit

subsplit(
    arg=None,
    k=None,
    percent=None,
    weighted=None
)

Divides this split into subsplits.

There are three ways to define subsplits, corresponding to the three arguments: k (get k equal subsplits), percent (get a slice of the dataset with tfds.percent), and weighted (get subsplits with the proportions given by weighted).

Examples:

# 50% train, 50% test
train, test = split.subsplit(k=2)
# 50% train, 25% test, 25% validation
train, test, validation = split.subsplit(weighted=[2, 1, 1])
# Extract last 20%
subsplit = split.subsplit(tfds.percent[-20:])
train, test, valid = split.subsplit(k=3)  # 33%, 33%, 34%
s1, s2, s3, s4 = split.subsplit(weighted=[2, 2, 1, 1])  # 33%, 33%, 16%, 18%
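The percentages in the comments above (for example 16% and 18% for weighted=[2, 2, 1, 1]) are consistent with rounding each share down to a whole percent and letting the last subsplit absorb the remainder. A sketch of that arithmetic, under that assumption; the helper names are hypothetical:

```python
from typing import List


def percent_slices_for_k(k: int) -> List[slice]:
    """Split 0..100 into k slices of 100 // k percent; the last absorbs the rest."""
    if not 1 <= k <= 100:
        raise ValueError(f"k must be between 1 and 100, got {k}")
    step = 100 // k
    slices = [slice(i * step, (i + 1) * step) for i in range(k)]
    slices[-1] = slice(slices[-1].start, 100)
    return slices


def percent_slices_for_weights(weights: List[int]) -> List[slice]:
    """Normalize weights to whole percents (rounded down); the last absorbs the rest."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    slices, start = [], 0
    for w in weights:
        stop = start + 100 * w // total
        slices.append(slice(start, stop))
        start = stop
    slices[-1] = slice(slices[-1].start, 100)
    return slices


def sizes(slices: List[slice]) -> List[int]:
    """Width of each slice, in percent."""
    return [s.stop - s.start for s in slices]


print(sizes(percent_slices_for_k(3)))                   # [33, 33, 34]
print(sizes(percent_slices_for_weights([2, 2, 1, 1])))  # [33, 33, 16, 18]
print(sizes(percent_slices_for_weights([2, 1, 1])))     # [50, 25, 25]
```

Giving the remainder to the last slice guarantees the subsplits cover all of 0..100% without overlap, at the cost of the last subsplit being slightly larger.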

Args:

  • arg: If no kwargs are given, arg will be interpreted as one of k, percent, or weighted depending on its type. For example:
    split.subsplit(10)  # Equivalent to split.subsplit(k=10)
    split.subsplit(tfds.percent[:-20])  # percent=tfds.percent[:-20]
    split.subsplit([1, 1, 2])  # weighted=[1, 1, 2]
  • k: int, if set, subdivide the split into k equal parts.
  • percent: tfds.percent slice, return a single subsplit corresponding to a slice of the original split. For example: split.subsplit(tfds.percent[-20:]) # Last 20% of the dataset.
  • weighted: list[int], return a list of subsplits whose proportions match the weights normalized by their sum. For example: split.subsplit(weighted=[1, 1, 2]) # 25%, 25%, 50%.
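The type-based interpretation of arg described above can be sketched as a small dispatch function. This is an illustration under assumptions: resolve_subsplit_arg and the pct helper are hypothetical, and pct[...] stands in for tfds.percent[...] by simply returning the Python slice it is given.

```python
class _PercentHelper:
    """Hypothetical stand-in for tfds.percent: pct[:-20] yields slice(None, -20)."""

    def __getitem__(self, value):
        if not isinstance(value, slice):
            raise TypeError("Use slice syntax, e.g. pct[:50]")
        return value


pct = _PercentHelper()


def resolve_subsplit_arg(arg=None, k=None, percent=None, weighted=None):
    """Return ('k' | 'percent' | 'weighted', value), inferring the kind from arg's type."""
    given = [name for name, v in (("k", k), ("percent", percent), ("weighted", weighted))
             if v is not None]
    if arg is not None:
        if given:
            raise ValueError("Pass either arg or one keyword argument, not both")
        if isinstance(arg, int):
            return ("k", arg)
        if isinstance(arg, slice):
            return ("percent", arg)
        if isinstance(arg, list):
            return ("weighted", arg)
        raise TypeError(f"Cannot interpret {arg!r} as k, percent, or weighted")
    if len(given) != 1:
        raise ValueError("Exactly one of k, percent, or weighted must be set")
    name = given[0]
    return (name, {"k": k, "percent": percent, "weighted": weighted}[name])


print(resolve_subsplit_arg(10))             # ('k', 10)
print(resolve_subsplit_arg(pct[:-20]))      # ('percent', slice(None, -20, None))
print(resolve_subsplit_arg([1, 1, 2]))      # ('weighted', [1, 1, 2])
```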

Returns:

A subsplit or list of subsplits extracted from this split object.