tfds.load

tfds.load(
    name,
    split=None,
    data_dir=None,
    batch_size=1,
    download=True,
    as_supervised=False,
    with_info=False,
    builder_kwargs=None,
    download_and_prepare_kwargs=None,
    as_dataset_kwargs=None
)

Defined in core/registered.py.

Loads the named dataset into a tf.data.Dataset.

If split=None (the default), returns all splits for the dataset. Otherwise, returns the specified split.

load is a convenience method that fetches the tfds.core.DatasetBuilder by string name, optionally calls DatasetBuilder.download_and_prepare (if download=True), and then calls DatasetBuilder.as_dataset. This is roughly equivalent to:

builder = tfds.builder(name, data_dir=data_dir, **builder_kwargs)
if download:
  builder.download_and_prepare(**download_and_prepare_kwargs)
ds = builder.as_dataset(
    split=split, as_supervised=as_supervised, **as_dataset_kwargs)
if with_info:
  return ds, builder.info
return ds

If you'd like NumPy arrays instead of tf.data.Datasets or tf.Tensors, you can pass the return value to tfds.as_numpy.

Callers must pass arguments as keyword arguments.

Warning: calling this function might potentially trigger the download of hundreds of GiB to disk. Refer to the download argument.

Args:

  • name: str, the registered name of the DatasetBuilder (the snake case version of the class name). This can be either "dataset_name" or "dataset_name/config_name" for datasets with BuilderConfigs. As a convenience, this string may contain comma-separated keyword arguments for the builder. For example "foo_bar/a=True,b=3" would use the FooBar dataset passing the keyword arguments a=True and b=3 (for builders with configs, it would be "foo_bar/zoo/a=True,b=3" to use the "zoo" config and pass to the builder keyword arguments a=True and b=3).
  • split: tfds.Split or str, which split of the data to load. If None, will return a dict with all splits (typically tfds.Split.TRAIN and tfds.Split.TEST).
  • data_dir: str (optional), directory to read/write data. Defaults to "~/tensorflow_datasets".
  • batch_size: int, set to > 1 to get batches of examples. Note that variable length features will be 0-padded. If batch_size=-1, will return the full dataset as tf.Tensors.
  • download: bool (optional), whether to call tfds.core.DatasetBuilder.download_and_prepare before calling tf.DatasetBuilder.as_dataset. If False, data is expected to be in data_dir. If True and the data is already in data_dir, download_and_prepare is a no-op.
  • as_supervised: bool, if True, the returned tf.data.Dataset will have a 2-tuple structure (input, label) according to builder.info.supervised_keys. If False, the default, the returned tf.data.Dataset will have a dictionary with all the features.
  • with_info: bool, if True, tfds.load will return the tuple (tf.data.Dataset, tfds.core.DatasetInfo) containing the info associated with the builder.
  • builder_kwargs: dict (optional), keyword arguments to be passed to the tfds.core.DatasetBuilder constructor. data_dir will be passed through by default.
  • download_and_prepare_kwargs: dict (optional) keyword arguments passed to tfds.core.DatasetBuilder.download_and_prepare if download=True. Allow to control where to download and extract the cached data. If not set, cache_dir and manual_dir will automatically be deduced from data_dir.
  • as_dataset_kwargs: dict (optional), keyword arguments passed to tfds.core.DatasetBuilder.as_dataset. split will be passed through by default.

Returns:

  • ds: tf.data.Dataset, the dataset requested, or if split is None, a dict<key: tfds.Split, value: tfds.data.Dataset>. If batch_size=-1, these will be full datasets as tf.Tensors.
  • ds_info: tfds.core.DatasetInfo, if with_info is True, then tfds.load will return a tuple (ds, ds_info) containing dataset information (version, features, splits, num_examples,...). Note that the ds_info object documents the entire dataset, regardless of the split requested. Split-specific information is available in ds_info.splits.