tfds.load

Loads the named dataset into a tf.data.Dataset.

tfds.load(
    name,
    split=None,
    data_dir=None,
    batch_size=None,
    in_memory=None,
    download=True,
    as_supervised=False,
    decoders=None,
    with_info=False,
    builder_kwargs=None,
    download_and_prepare_kwargs=None,
    as_dataset_kwargs=None,
    try_gcs=False
)

If split=None (the default), returns all splits for the dataset. Otherwise, returns the specified split.
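
For example (a minimal sketch; "mnist" is used purely as an illustrative registered dataset name):

import tensorflow_datasets as tfds

# A specific split returns a single tf.data.Dataset.
train_ds = tfds.load("mnist", split="train")

# split=None returns one dataset per split, keyed by split
# (typically train and test).
all_splits = tfds.load("mnist")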

load is a convenience method that fetches the tfds.core.DatasetBuilder by string name, optionally calls DatasetBuilder.download_and_prepare (if download=True), and then calls DatasetBuilder.as_dataset. This is roughly equivalent to:

builder = tfds.builder(name, data_dir=data_dir, **builder_kwargs)
if download:
  builder.download_and_prepare(**download_and_prepare_kwargs)
ds = builder.as_dataset(
    split=split, as_supervised=as_supervised, **as_dataset_kwargs)
if with_info:
  return ds, builder.info
return ds

If you'd like NumPy arrays instead of tf.data.Datasets or tf.Tensors, you can pass the return value to tfds.as_numpy.
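
For instance (a sketch, continuing with the illustrative "mnist" dataset):

import tensorflow_datasets as tfds

ds = tfds.load("mnist", split="train")
# tfds.as_numpy converts the tf.data.Dataset into an iterable that
# yields NumPy arrays (here, dicts of arrays, since as_supervised=False).
for example in tfds.as_numpy(ds):
  image, label = example["image"], example["label"]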

Callers must pass arguments as keyword arguments.

Warning: calling this function can trigger the download of hundreds of GiB to disk. Refer to the download argument.

Args:

  • name: str, the registered name of the DatasetBuilder (the snake case version of the class name). This can be either "dataset_name" or "dataset_name/config_name" for datasets with BuilderConfigs. As a convenience, this string may contain comma-separated keyword arguments for the builder. For example "foo_bar/a=True,b=3" would use the FooBar dataset passing the keyword arguments a=True and b=3 (for builders with configs, it would be "foo_bar/zoo/a=True,b=3" to use the "zoo" config and pass to the builder keyword arguments a=True and b=3).
  • split: tfds.Split or str, which split of the data to load. If None, will return a dict with all splits (typically tfds.Split.TRAIN and tfds.Split.TEST).
  • data_dir: str (optional), directory to read/write data. Defaults to "~/tensorflow_datasets", where datasets are stored.
  • batch_size: int, if set, add a batch dimension to examples. Note that variable length features will be 0-padded. If batch_size=-1, will return the full dataset as tf.Tensors.
  • in_memory: bool, if True, loads the dataset in memory which increases iteration speeds. Note that if True and the dataset has unknown dimensions, the features will be padded to the maximum size across the dataset.
  • download: bool (optional), whether to call tfds.core.DatasetBuilder.download_and_prepare before calling tfds.core.DatasetBuilder.as_dataset. If False, data is expected to be in data_dir. If True and the data is already in data_dir, download_and_prepare is a no-op.
  • as_supervised: bool, if True, the returned tf.data.Dataset will have a 2-tuple structure (input, label) according to builder.info.supervised_keys. If False (the default), the returned tf.data.Dataset will have a dictionary with all the features. See the sketch after this list.
  • decoders: Nested dict of Decoder objects which allow customizing the decoding. The structure should match the feature structure, but only customized feature keys need to be present. See the guide for more info.
  • with_info: bool, if True, tfds.load will return the tuple (tf.data.Dataset, tfds.core.DatasetInfo) containing the info associated with the builder.
  • builder_kwargs: dict (optional), keyword arguments to be passed to the tfds.core.DatasetBuilder constructor. data_dir will be passed through by default.
  • download_and_prepare_kwargs: dict (optional), keyword arguments passed to tfds.core.DatasetBuilder.download_and_prepare if download=True. Allows control over where the data is downloaded, extracted, and cached. If not set, cache_dir and manual_dir will automatically be deduced from data_dir.
  • as_dataset_kwargs: dict (optional), keyword arguments passed to tfds.core.DatasetBuilder.as_dataset. split will be passed through by default. Example: {'shuffle_files': True}. Note that shuffle_files is False by default unless split == tfds.Split.TRAIN.
  • try_gcs: bool, if True, tfds.load will check whether the dataset exists on the public GCS bucket before building it locally.

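To illustrate several of these arguments together (a sketch; the "mnist" dataset and its feature keys are illustrative only):

import tensorflow_datasets as tfds

# as_supervised=True yields (input, label) tuples per
# builder.info.supervised_keys; with_info=True also returns the
# tfds.core.DatasetInfo for the whole dataset.
ds, info = tfds.load(
    "mnist",
    split="train",
    as_supervised=True,
    with_info=True,
)

print(info.features)                      # feature spec for the dataset
print(info.splits["train"].num_examples)  # split-specific information
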
Returns:

  • ds: tf.data.Dataset, the dataset requested, or if split is None, a dict<key: tfds.Split, value: tf.data.Dataset>. If batch_size=-1, these will be full datasets as tf.Tensors.
  • ds_info: tfds.core.DatasetInfo, if with_info is True, then tfds.load will return a tuple (ds, ds_info) containing dataset information (version, features, splits, num_examples,...). Note that the ds_info object documents the entire dataset, regardless of the split requested. Split-specific information is available in ds_info.splits.