tfds.core.GeneratorBasedBuilder

Class GeneratorBasedBuilder

Inherits From: DatasetBuilder

Defined in core/dataset_builder.py.

Base class for datasets with data generation based on dict generators.

GeneratorBasedBuilder is a convenience class that abstracts away much of the data writing and reading of DatasetBuilder. It expects subclasses to implement generators of feature dictionaries across the dataset splits (_split_generators) and to specify a file type (_file_format_adapter). See the method docstrings for details.

Minimally, subclasses must override _split_generators and _file_format_adapter.

FileFormatAdapters are defined in tensorflow_datasets.core.file_format_adapter and specify constraints on the feature dictionaries yielded by example generators. See the class docstrings.
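
As a rough illustration of "generators of feature dictionaries across the dataset splits", the plain-Python sketch below yields one dictionary per example for each split. It does not import TFDS or subclass GeneratorBasedBuilder. ToyBuilder, the split names, the feature keys, and the simplified return type of _split_generators are all assumptions made for illustration; the real contract is the one given in the method docstrings.

```python
# Plain-Python stand-in. It does NOT import TFDS or subclass
# tfds.core.GeneratorBasedBuilder. Every name here is hypothetical.


class ToyBuilder:
    """Mimics the idea of per-split generators of feature dictionaries."""

    def _split_generators(self):
        # Assumption: map each split name to a generator of feature dicts.
        # The real method's arguments and return type may differ.
        return {
            "train": self._generate_examples(start=0, stop=4),
            "test": self._generate_examples(start=4, stop=6),
        }

    def _generate_examples(self, start, stop):
        for i in range(start, stop):
            # One feature dictionary per example.
            yield {"input": [i, i + 1], "label": i % 2}


splits = ToyBuilder()._split_generators()
train_examples = list(splits["train"])
test_examples = list(splits["test"])
```

Each generator is lazy, so examples are only produced when a split is iterated.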

__init__

__init__(**kwargs)

Builder constructor.

Args:

  • **kwargs: Constructor kwargs forwarded to DatasetBuilder

Properties

builder_config

tfds.core.BuilderConfig for this builder.

info

tfds.core.DatasetInfo for this builder.

Methods

as_dataset

as_dataset(
    split=None,
    batch_size=1,
    shuffle_files=None,
    as_supervised=False
)

Constructs a tf.data.Dataset.

Callers must pass arguments as keyword arguments.

Args:

  • split: tfds.core.SplitBase, which subset(s) of the data to read. If None (default), returns all splits in a dict <key: tfds.Split, value: tf.data.Dataset>.
  • batch_size: int, batch size. Variable-length features are 0-padded if batch_size > 1. Users who want custom behavior should use batch_size=1 and build their own pipeline with the tf.data API. If batch_size == -1, returns feature dictionaries of the whole dataset as tf.Tensors instead of a tf.data.Dataset.
  • shuffle_files: bool, whether to shuffle the input files. Defaults to True if split == tfds.Split.TRAIN and False otherwise.
  • as_supervised: bool, if True, the returned tf.data.Dataset will have a 2-tuple structure (input, label) according to builder.info.supervised_keys. If False, the default, the returned tf.data.Dataset will have a dictionary with all the features.

Returns:

tf.data.Dataset, or if split=None, dict<key: tfds.Split, value: tf.data.Dataset>.

If batch_size is -1, will return feature dictionaries containing the entire dataset in tf.Tensors instead of a tf.data.Dataset.
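
The argument semantics described above can be sketched in plain Python, without TFDS or TensorFlow. The helpers below (pad_batch, to_supervised, default_shuffle_files) are hypothetical names, not library functions. The sketch assumes that padding is appended at the end of each feature up to the longest item in the batch, and it uses the string "train" as a stand-in for tfds.Split.TRAIN.

```python
def pad_batch(examples, pad_value=0):
    """Right-pads variable-length lists to the longest one in the batch."""
    max_len = max(len(x) for x in examples)
    return [list(x) + [pad_value] * (max_len - len(x)) for x in examples]


def to_supervised(features, supervised_keys):
    """Turns a feature dict into an (input, label) 2-tuple."""
    input_key, label_key = supervised_keys
    return features[input_key], features[label_key]


def default_shuffle_files(split, shuffle_files=None):
    """Resolves shuffle_files=None to True only for the training split."""
    if shuffle_files is None:
        return split == "train"  # stand-in for tfds.Split.TRAIN
    return shuffle_files


# Three variable-length features batched together (batch_size > 1).
batch = pad_batch([[1, 2, 3], [4], [5, 6]])

# as_supervised=True keeps only the two keys named in supervised_keys.
pair = to_supervised(
    {"image": [0, 255], "label": 1, "id": 7},
    supervised_keys=("image", "label"),
)
```

With batch_size=1 no padding is needed, which is why the text recommends it for callers who build their own batching pipeline.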

download_and_prepare

download_and_prepare(
    download_dir=None,
    download_config=None
)

Downloads and prepares dataset for reading.

Args:

  • download_dir: str, directory where downloaded files are stored. Defaults to "~/tensorflow-datasets/downloads".
  • download_config: tfds.download.DownloadConfig, further configuration for downloading and preparing dataset.
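
A typical call order is download_and_prepare followed by as_dataset, both with keyword arguments. The sketch below shows that order against a hypothetical RecordingBuilder stub, so it runs without TFDS, disk, or network access. The stub, the helper prepare_and_read, the "train" split placeholder, and the download path are assumptions for illustration; only the two method signatures come from this page.

```python
def prepare_and_read(builder, split, download_dir=None):
    """Prepares the data, then reads one split as (input, label) pairs."""
    # Both calls use keyword arguments, matching the documented signatures.
    builder.download_and_prepare(download_dir=download_dir)
    return builder.as_dataset(split=split, batch_size=1, as_supervised=True)


class RecordingBuilder:
    """Hypothetical stub that records calls instead of doing real work."""

    def __init__(self):
        self.calls = []

    def download_and_prepare(self, download_dir=None, download_config=None):
        self.calls.append(("download_and_prepare", download_dir))

    def as_dataset(self, split=None, batch_size=1, shuffle_files=None,
                   as_supervised=False):
        self.calls.append(("as_dataset", split, batch_size, as_supervised))
        return [("x", 0)]  # placeholder for a tf.data.Dataset


stub = RecordingBuilder()
result = prepare_and_read(stub, split="train", download_dir="/tmp/downloads")
```

With a real builder, the same helper would be passed a tfds.core.SplitBase value for split rather than a string.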

Class Members

BUILDER_CONFIGS

IN_DEVELOPMENT

VERSION

builder_configs

name