
tfds.core.SplitBase

Class SplitBase

Abstract base class for Split compositionality.

See the guide on splits for more information.

There are three parts to the composition:

1) The splits are composed (defined, merged, split, ...) before the .as_dataset() function is called. This is done with __add__ and __getitem__, which return a tree of SplitBase objects whose leaves are the NamedSplit objects:

  split = tfds.Split.TRAIN + tfds.Split.TEST.subsplit(tfds.percent[:50])

2) The SplitBase is forwarded to the .as_dataset() function, where it is resolved into actual read instructions. This is done by the .get_read_instruction() method, which takes the real dataset splits (name, number of shards, ...), parses the tree, and returns a SplitReadInstruction object:

read_instruction = split.get_read_instruction(self.info.splits)

3) The SplitReadInstruction is then used in the tf.data.Dataset pipeline to define which files to read and which examples to skip within each file.
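To make step 1 concrete, here is a minimal pure-Python sketch of a lazily composed split tree. It is not the tfds implementation: ToySplit, ToyNamed and ToyMerged are hypothetical names, and each leaf only records a split name plus an integer percent range. The point it illustrates is that "+" builds a tree node instead of reading data, and that the leaves can later be walked in order.

```python
from typing import List, Tuple


class ToySplit:
    """Hypothetical stand-in for SplitBase: one node of a split tree."""

    def __add__(self, other: "ToySplit") -> "ToySplit":
        # Composition is lazy: '+' only builds a new tree node, no data is read.
        return ToyMerged(self, other)

    def leaves(self) -> List[Tuple[str, int, int]]:
        """Return the leaves as (name, start_percent, stop_percent) tuples."""
        raise NotImplementedError


class ToyNamed(ToySplit):
    """Leaf node: a named split, optionally restricted to a percent range."""

    def __init__(self, name: str, start: int = 0, stop: int = 100):
        self.name, self.start, self.stop = name, start, stop

    def leaves(self):
        return [(self.name, self.start, self.stop)]


class ToyMerged(ToySplit):
    """Internal node produced by '+'."""

    def __init__(self, left: ToySplit, right: ToySplit):
        self.left, self.right = left, right

    def leaves(self):
        # Left-to-right traversal keeps the order in which splits were added.
        return self.left.leaves() + self.right.leaves()


# Rough analogue of: tfds.Split.TRAIN + tfds.Split.TEST.subsplit(tfds.percent[:50])
split = ToyNamed("train") + ToyNamed("test", 0, 50)
print(split.leaves())  # [('train', 0, 100), ('test', 0, 50)]
```

In this sketch the tree stays symbolic until something (steps 2 and 3) asks for its leaves, which mirrors why composition can happen before .as_dataset() knows the real split sizes.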

Methods

__add__

__add__(other)

Merging: tfds.Split.TRAIN + tfds.Split.TEST.

__eq__

__eq__(other)

Equality: tfds.Split.TRAIN == 'train'.

__ne__

__ne__(other)

Inequality: tfds.Split.TRAIN != 'test'.
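The equality behavior described above (a split object comparing equal to its plain string name) can be sketched with Python's rich-comparison protocol. ToyNamedSplit is a hypothetical class, and hashing like the underlying string is a design choice of this sketch, not something the tfds docs state.

```python
class ToyNamedSplit:
    """Hypothetical named split that compares equal to its string name."""

    def __init__(self, name: str):
        self._name = name

    def __str__(self) -> str:
        return self._name

    def __eq__(self, other):
        if isinstance(other, ToyNamedSplit):
            return self._name == other._name
        if isinstance(other, str):
            return self._name == other
        # Let Python try the reflected comparison / identity fallback.
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self) -> int:
        # Hash like the plain string so 'train' and TRAIN find the same dict entry.
        return hash(self._name)


TRAIN = ToyNamedSplit("train")
print(TRAIN == "train")  # True
print(TRAIN != "test")   # True
```

Because str.__eq__ returns NotImplemented for a non-string, Python falls back to the reflected ToyNamedSplit.__eq__, so "train" == TRAIN also holds.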

get_read_instruction

get_read_instruction(split_dict)

Parse the descriptor tree and compile all read instructions together.

Args:

  • split_dict: dict, the dict[split_name, SplitInfo] of the dataset.

Returns:

  • split_read_instruction: SplitReadInstruction
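A hedged sketch of what resolving the tree involves: each leaf's percent range is turned into an absolute example range using the real split sizes. Here the leaves are plain (name, start_percent, stop_percent) tuples, and a hypothetical mapping from split name to example count stands in for dict[split_name, SplitInfo]. Shards and files are ignored, and rounding down is an assumption of this sketch; the library's own rounding rule may differ.

```python
from typing import Dict, List, Tuple


def get_read_instruction(
    leaves: List[Tuple[str, int, int]],
    num_examples: Dict[str, int],
) -> List[Tuple[str, int, int]]:
    """Resolve (name, start_percent, stop_percent) leaves into example ranges.

    Returns (name, first_example, end_example) tuples; end_example is exclusive.
    """
    instructions = []
    for name, start_pct, stop_pct in leaves:
        if name not in num_examples:
            raise ValueError(f"Unknown split: {name!r}")
        n = num_examples[name]
        # Integer arithmetic: percent boundaries are rounded down to whole examples.
        instructions.append((name, n * start_pct // 100, n * stop_pct // 100))
    return instructions


print(get_read_instruction([("train", 0, 100), ("test", 0, 50)],
                           {"train": 1000, "test": 301}))
# [('train', 0, 1000), ('test', 0, 150)]
```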

subsplit

subsplit(
    arg=None,
    k=None,
    percent=None,
    weighted=None
)

Divides this split into subsplits.

There are 3 ways to define subsplits, which correspond to the 3 arguments k (get k even subsplits), percent (get a slice of the dataset with tfds.percent), and weighted (get subsplits with proportions specified by weighted).

Examples:

# 50% train, 50% test
train, test = split.subsplit(k=2)
# 50% train, 25% test, 25% validation
train, test, validation = split.subsplit(weighted=[2, 1, 1])
# Extract last 20%
subsplit = split.subsplit(tfds.percent[-20:])
train, test, valid = split.subsplit(k=3)  # 33%, 33%, 34%
s1, s2, s3, s4 = split.subsplit(weighted=[2, 2, 1, 1])  # 33%, 33%, 16%, 18%
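The percentages in the last two comments (34% and 18% rather than exact thirds and sixths) are consistent with boundaries kept at whole percents, where each share is rounded down and the last subsplit absorbs the remainder. The pure-Python sketch below reproduces those numbers under that assumption; weighted_percent_slices and even_percent_slices are hypothetical helpers, not tfds functions.

```python
from typing import List, Tuple


def weighted_percent_slices(weights: List[int]) -> List[Tuple[int, int]]:
    """Split [0, 100) into integer-percent ranges proportional to `weights`."""
    total = sum(weights)
    # Round each share down to a whole percent.
    sizes = [100 * w // total for w in weights]
    slices, start = [], 0
    for size in sizes:
        slices.append((start, start + size))
        start += size
    # The last range absorbs the rounding remainder so the ranges cover 0..100.
    slices[-1] = (slices[-1][0], 100)
    return slices


def even_percent_slices(k: int) -> List[Tuple[int, int]]:
    """k even subsplits are the special case of k equal weights."""
    return weighted_percent_slices([1] * k)


print(weighted_percent_slices([2, 1, 1]))  # [(0, 50), (50, 75), (75, 100)]
print([b - a for a, b in even_percent_slices(3)])  # [33, 33, 34]
print([b - a for a, b in weighted_percent_slices([2, 2, 1, 1])])  # [33, 33, 16, 18]
```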

Args:

  • arg: If no keyword arguments are given, arg is interpreted as one of k, percent, or weighted, depending on its type. For example:
      split.subsplit(10)  # Equivalent to split.subsplit(k=10)
      split.subsplit(tfds.percent[:-20])  # percent=tfds.percent[:-20]
      split.subsplit([1, 1, 2])  # weighted=[1, 1, 2]
  • k: int. If set, subdivides the split into k equal parts.
  • percent: tfds.percent slice. Returns a single subsplit corresponding to a slice of the original split. For example:
      split.subsplit(tfds.percent[-20:])  # Last 20% of the dataset
  • weighted: list[int]. Returns a list of subsplits whose proportions match the normalized weights. For example:
      split.subsplit(weighted=[1, 1, 2])  # 25%, 25%, 50%
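The type-based interpretation of arg can be sketched as a small dispatch function. This is an illustration under assumptions: resolve_subsplit_arg is a hypothetical name, a built-in Python slice stands in for a tfds.percent slice, and the error types and messages are choices of this sketch rather than the library's.

```python
from typing import Optional, Sequence, Union


def resolve_subsplit_arg(
    arg: Optional[Union[int, slice, Sequence[int]]] = None,
    k: Optional[int] = None,
    percent: Optional[slice] = None,
    weighted: Optional[Sequence[int]] = None,
) -> dict:
    """Normalize the arguments into a one-entry dict naming the chosen mode."""
    given = {
        name: value
        for name, value in (("k", k), ("percent", percent), ("weighted", weighted))
        if value is not None
    }
    if arg is not None:
        if given:
            raise ValueError("Pass either a positional arg or a keyword, not both.")
        # Infer the mode from the type of the positional argument.
        if isinstance(arg, int):
            given = {"k": arg}
        elif isinstance(arg, slice):
            given = {"percent": arg}
        elif isinstance(arg, (list, tuple)):
            given = {"weighted": list(arg)}
        else:
            raise TypeError(f"Cannot interpret subsplit argument: {arg!r}")
    if len(given) != 1:
        raise ValueError("Exactly one of k, percent or weighted must be set.")
    return given


print(resolve_subsplit_arg(10))                # {'k': 10}
print(resolve_subsplit_arg(slice(None, -20)))  # {'percent': slice(None, -20, None)}
print(resolve_subsplit_arg([1, 1, 2]))         # {'weighted': [1, 1, 2]}
```

Requiring exactly one mode keeps the three ways of subsplitting mutually exclusive, which matches how the arguments are described above.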

Returns:

A subsplit or list of subsplits extracted from this split object.