Loads the named dataset into a tf.data.Dataset.


Defined in core/registered.py.

If split=None (the default), returns all splits for the dataset. Otherwise, returns the specified split.

load is a convenience method that fetches the tfds.core.DatasetBuilder by string name, optionally calls DatasetBuilder.download_and_prepare (if download=True), and then calls DatasetBuilder.as_dataset. This is roughly equivalent to:

builder = tfds.builder(name, data_dir=data_dir, **builder_kwargs)
if download:
  builder.download_and_prepare(**download_and_prepare_kwargs)
ds = builder.as_dataset(
    split=split, as_supervised=as_supervised, **as_dataset_kwargs)
if with_info:
  return ds, builder.info
return ds

If you'd like NumPy arrays instead of tf.data.Datasets or tf.Tensors, you can pass the return value to tfds.as_numpy.
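As a rough illustration of what that conversion amounts to (this is an illustrative sketch, not tfds's actual implementation, which handles tf.data.Dataset objects lazily), converting one dataset element means walking its nested structure and turning each leaf into a NumPy array:

```python
import numpy as np

def to_numpy(element):
    """Recursively convert a (possibly nested) structure of array-like
    values into NumPy arrays. Illustrative sketch only: the real
    tfds.as_numpy also iterates tf.data.Dataset objects lazily."""
    if isinstance(element, dict):
        return {key: to_numpy(value) for key, value in element.items()}
    if isinstance(element, tuple):
        return tuple(to_numpy(value) for value in element)
    return np.asarray(element)

# A fake element shaped like a tfds feature dictionary.
element = {"image": [[0, 255], [128, 64]], "label": 7}
converted = to_numpy(element)
```

After conversion, `converted["image"]` is a plain np.ndarray rather than a tensor, so it can be fed to any NumPy-based code.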

Callers must pass arguments as keyword arguments.
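In Python, this calling convention corresponds to keyword-only parameters, declared with a bare `*` in the signature. The stub below is hypothetical (it does not reproduce tfds.load's behavior), but it shows how such an API rejects positional calls:

```python
def load(*, name, split=None, data_dir=None, download=True):
    """Hypothetical stub mirroring tfds.load's keyword-only calling
    convention; the bare * makes every parameter keyword-only."""
    return {"name": name, "split": split}

# Keyword arguments work:
result = load(name="mnist", split="train")

# Positional arguments raise a TypeError:
try:
    load("mnist")
except TypeError:
    positional_rejected = True
```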

Warning: calling this function may trigger the download of hundreds of GiB to disk. Refer to the download argument.


Args:

  • name: str, the registered name of the DatasetBuilder (the snake case version of the class name). This can be either "dataset_name" or "dataset_name/config_name" for datasets with BuilderConfigs. As a convenience, this string may contain comma-separated keyword arguments for the builder. For example "foo_bar/a=True,b=3" would use the FooBar dataset passing the keyword arguments a=True and b=3 (for builders with configs, it would be "foo_bar/zoo/a=True,b=3" to use the "zoo" config and pass to the builder keyword arguments a=True and b=3).
  • split: tfds.Split or str, which split of the data to load. If None, will return a dict with all splits (typically tfds.Split.TRAIN and tfds.Split.TEST).
  • data_dir: str (optional), directory to read/write data. Defaults to "~/tensorflow_datasets".
  • batch_size: int, set to > 1 to get batches of examples. Note that variable length features will be 0-padded. If batch_size=-1, will return the full dataset as tf.Tensors.
  • download: bool (optional), whether to call tfds.core.DatasetBuilder.download_and_prepare before calling tfds.core.DatasetBuilder.as_dataset. If False, data is expected to be in data_dir. If True and the data is already in data_dir, download_and_prepare is a no-op.
  • as_supervised: bool, if True, the returned tf.data.Dataset will have a 2-tuple structure (input, label) according to builder.info.supervised_keys. If False, the default, the returned tf.data.Dataset will have a dictionary with all the features.
  • with_info: bool, if True, tfds.load will return the tuple (tf.data.Dataset, tfds.core.DatasetInfo), the latter containing the info associated with the builder.
  • builder_kwargs: dict (optional), keyword arguments to be passed to the tfds.core.DatasetBuilder constructor. data_dir will be passed through by default.
  • download_and_prepare_kwargs: dict (optional), keyword arguments passed to tfds.core.DatasetBuilder.download_and_prepare if download=True. Allows control over where the data is downloaded, extracted, and cached. If not set, cache_dir and manual_dir will automatically be deduced from data_dir.
  • as_dataset_kwargs: dict (optional), keyword arguments passed to tfds.core.DatasetBuilder.as_dataset. split will be passed through by default. Example: {'shuffle_files': True}. Note that shuffle_files is False by default unless split == tfds.Split.TRAIN.
  • try_gcs: bool, if True, tfds.load will see if the dataset exists on the public GCS bucket before building it locally.
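The name-string format described under the name argument can be illustrated with a small parser. This is a hypothetical sketch of the convention, not tfds's actual parsing code, and assumes only the simple "dataset/config/key=value,..." shapes shown above:

```python
import ast

def parse_name(name):
    """Split a dataset name like "foo_bar/zoo/a=True,b=3" into the
    dataset name, an optional config name, and builder keyword
    arguments. Illustrative sketch only."""
    parts = name.split("/")
    dataset = parts[0]
    config = None
    kwargs = {}
    for part in parts[1:]:
        if "=" in part:  # a trailing "a=True,b=3" kwargs segment
            for pair in part.split(","):
                key, _, value = pair.partition("=")
                kwargs[key] = ast.literal_eval(value)
        else:  # a BuilderConfig name such as "zoo"
            config = part
    return dataset, config, kwargs

dataset, config, kwargs = parse_name("foo_bar/zoo/a=True,b=3")
```

Here ast.literal_eval turns the textual values "True" and "3" into the Python values True and 3, matching the keyword arguments the builder would receive.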


Returns:

  • ds: tf.data.Dataset, the dataset requested, or if split is None, a dict<key: tfds.Split, value: tf.data.Dataset>. If batch_size=-1, these will be full datasets as tf.Tensors.
  • ds_info: tfds.core.DatasetInfo, if with_info is True, then tfds.load will return a tuple (ds, ds_info) containing dataset information (version, features, splits, num_examples,...). Note that the ds_info object documents the entire dataset, regardless of the split requested. Split-specific information is available in ds_info.splits.