
Abstract base class for Split compositionality.

See the guide on splits for more information.

The composition has three parts:

1) The splits are composed (defined, merged, split, ...) before calling the .as_dataset() function. This is done with __add__ and __getitem__, which return a tree of SplitBase objects whose leaves are NamedSplit objects.

split = tfds.Split.TRAIN + tfds.Split.TEST.subsplit(tfds.percent[:50])

2) The SplitBase is forwarded to the .as_dataset() function to be resolved into actual read instructions. This is done by the .get_read_instruction() method, which takes the real dataset splits (name, number of shards, ...) and parses the tree to return a SplitReadInstruction object.

read_instruction = split.get_read_instruction(self.info.splits)

3) The SplitReadInstruction is then used in the tf.data.Dataset pipeline to define which files to read and how to skip examples within a file.
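The three parts above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: the class names (`ToySplit`, `Named`, `Merged`, `Sliced`), the `resolve` method, and the `(name, start, stop)` tuples are hypothetical stand-ins for SplitBase, NamedSplit, get_read_instruction, and SplitReadInstruction, and the toy works on example counts rather than on shards or files.

```python
from dataclasses import dataclass


class ToySplit:
    """Hypothetical stand-in for SplitBase: composing returns tree nodes."""

    def __add__(self, other):
        return Merged(self, other)

    def subsplit(self, percent):
        return Sliced(self, percent)


@dataclass(frozen=True)
class Named(ToySplit):
    """Leaf of the tree (plays the role of a NamedSplit)."""
    name: str

    def resolve(self, split_sizes):
        # A leaf reads its whole split: (name, first example, end example).
        return [(self.name, 0, split_sizes[self.name])]


@dataclass(frozen=True)
class Merged(ToySplit):
    left: ToySplit
    right: ToySplit

    def resolve(self, split_sizes):
        # Merging concatenates the instructions of both subtrees.
        return self.left.resolve(split_sizes) + self.right.resolve(split_sizes)


@dataclass(frozen=True)
class Sliced(ToySplit):
    parent: ToySplit
    percent: slice

    def resolve(self, split_sizes):
        # Narrow every parent range to the requested percent window.
        lo, hi, _ = self.percent.indices(100)
        out = []
        for name, start, stop in self.parent.resolve(split_sizes):
            n = stop - start
            out.append((name, start + n * lo // 100, start + n * hi // 100))
        return out


# Part 1: compose a tree before any data is read.
split = Named("train") + Named("test").subsplit(slice(None, 50))
# Part 2: resolve the tree against the real split sizes (assumed here).
instructions = split.resolve({"train": 1000, "test": 200})
# Part 3: a reader would use these ranges to decide what to read and skip.
# instructions == [("train", 0, 1000), ("test", 0, 100)]
```

The point of the design is that the tree is purely descriptive until it is resolved, so the same composed split can be applied to any dataset that defines the named splits.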



__add__(other)


Merging: tfds.Split.TRAIN + tfds.Split.TEST.


__eq__(other)


Equality: tfds.Split.TRAIN == 'train'.


__ne__(other)


Inequality: tfds.Split.TRAIN != 'test'.
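Comparing a split object with a plain string, as in the equality and inequality examples above, can be sketched as follows. `ToyNamedSplit` is a hypothetical class written for illustration, not the library's implementation.

```python
class ToyNamedSplit:
    """Hypothetical split descriptor that compares equal to its name."""

    def __init__(self, name):
        self._name = name

    def __eq__(self, other):
        if isinstance(other, ToyNamedSplit):
            return self._name == other._name
        if isinstance(other, str):
            return self._name == other
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Defining __eq__ disables the default hash, so hash by name.
        return hash(self._name)


TRAIN = ToyNamedSplit("train")
is_train = TRAIN == "train"      # True
is_not_test = TRAIN != "test"    # True
```

Hashing by name keeps the object usable as a dictionary key alongside its string form.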


get_read_instruction(split_dict)


Parse the descriptor tree and compile all read instructions together.


Args:
  • split_dict: dict, the dict[split_name, SplitInfo] of the dataset.

Returns:
  • split_read_instruction: SplitReadInstruction


subsplit(arg=None, k=None, percent=None, weighted=None)

Divides this split into subsplits.

There are 3 ways to define subsplits, which correspond to the 3 arguments k (get k even subsplits), percent (get a slice of the dataset with tfds.percent), and weighted (get subsplits with proportions specified by weighted).


# 50% train, 50% test
train, test = split.subsplit(k=2)
# 50% train, 25% test, 25% validation
train, test, validation = split.subsplit(weighted=[2, 1, 1])
# Extract last 20%
subsplit = split.subsplit(tfds.percent[-20:])
# Proportions that do not divide 100 evenly are rounded; the last subsplit takes the remainder
train, test, valid = split.subsplit(k=3)  # 33%, 33%, 34%
s1, s2, s3, s4 = split.subsplit(weighted=[2, 2, 1, 1])  # 33%, 33%, 16%, 18%
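The percentages in the comments above are consistent with a simple rounding scheme: floor each share to a whole percent, then extend the last range to 100. The sketch below reproduces those numbers; the helper names `k_to_percent_ranges` and `weights_to_percent_ranges` are hypothetical, and the scheme is an assumption inferred from the documented values rather than a statement of how the library computes them.

```python
def k_to_percent_ranges(k):
    """Split 0..100 into k ranges of equal floored width (hypothetical helper)."""
    if not 0 < k <= 100:
        raise ValueError(f"k should be in (0, 100], got {k}")
    width = 100 // k
    ranges = [(i * width, (i + 1) * width) for i in range(k)]
    # The last range absorbs whatever the floor division left over.
    ranges[-1] = (ranges[-1][0], 100)
    return ranges


def weights_to_percent_ranges(weights):
    """Split 0..100 proportionally to integer weights (hypothetical helper)."""
    total = sum(weights)
    widths = [100 * w // total for w in weights]
    ranges = []
    start = 0
    for width in widths:
        ranges.append((start, start + width))
        start += width
    # The last range absorbs the rounding remainder.
    ranges[-1] = (ranges[-1][0], 100)
    return ranges


thirds = k_to_percent_ranges(3)
# [(0, 33), (33, 66), (66, 100)] -> 33%, 33%, 34%
sixths = weights_to_percent_ranges([2, 2, 1, 1])
# [(0, 33), (33, 66), (66, 82), (82, 100)] -> 33%, 33%, 16%, 18%
```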


Args:
  • arg: If no kwargs are given, arg is interpreted as one of k, percent, or weighted, depending on its type. For example:
split.subsplit(10)  # Equivalent to split.subsplit(k=10)
split.subsplit(tfds.percent[:-20])  # percent=tfds.percent[:-20]
split.subsplit([1, 1, 2])  # weighted=[1, 1, 2]
  • k: int. If set, subdivide the split into k equal parts.
  • percent: tfds.percent slice. Return a single subsplit corresponding to a slice of the original split. For example: split.subsplit(tfds.percent[-20:]) # Last 20% of the dataset.
  • weighted: list[int]. Return a list of subsplits whose proportions match the list values normalized by their sum. For example: split.subsplit(weighted=[1, 1, 2]) # 25%, 25%, 50%.


Returns:

A subsplit or list of subsplits extracted from this split object.
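The type-based interpretation of arg described above can be sketched in plain Python. Everything here is hypothetical: `interpret_arg` and the stand-in `percent` object are illustrative names, and the sketch assumes that indexing tfds.percent yields an ordinary Python slice.

```python
class _Percent:
    """Hypothetical stand-in for tfds.percent: indexing returns the slice."""

    def __getitem__(self, item):
        if not isinstance(item, slice):
            raise TypeError("percent only supports slice syntax, e.g. percent[:50]")
        return item


percent = _Percent()


def interpret_arg(arg):
    """Map a positional arg to the keyword it stands for, based on its type."""
    if isinstance(arg, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise TypeError("arg must be an int, a percent slice, or a list of ints")
    if isinstance(arg, int):
        return {"k": arg}
    if isinstance(arg, slice):
        return {"percent": arg}
    if isinstance(arg, (list, tuple)):
        return {"weighted": list(arg)}
    raise TypeError(f"unsupported arg type: {type(arg).__name__}")


as_k = interpret_arg(10)                   # {"k": 10}
as_percent = interpret_arg(percent[:-20])  # {"percent": slice(None, -20, None)}
as_weighted = interpret_arg([1, 1, 2])     # {"weighted": [1, 1, 2]}
```

Passing the keyword explicitly avoids any ambiguity about which of the three modes is meant.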