tfds.load

Loads the named dataset into a tf.data.Dataset.

tfds.load is a convenience method that:

  1. Fetches the tfds.core.DatasetBuilder by name:

    builder = tfds.builder(name, data_dir=data_dir, **builder_kwargs)
    
  2. Generates the data (when download=True):

    builder.download_and_prepare(**download_and_prepare_kwargs)
    
  3. Loads the tf.data.Dataset object:

    ds = builder.as_dataset(
        split=split,
        as_supervised=as_supervised,
        shuffle_files=shuffle_files,
        read_config=read_config,
        decoders=decoders,
        **as_dataset_kwargs,
    )
    

See: https://www.tensorflow.org/datasets/overview#load_a_dataset for more examples.
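For instance, the three steps above collapse into a single call. A minimal sketch, assuming the publicly available 'mnist' dataset purely for illustration:

    import tensorflow_datasets as tfds

    # One call fetches the builder, downloads/prepares the data,
    # and returns the tf.data.Dataset.
    ds = tfds.load('mnist', split='train', shuffle_files=True)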

If you'd like NumPy arrays instead of tf.data.Datasets or tf.Tensors, you can pass the return value to tfds.as_numpy.
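A minimal sketch of that conversion, again assuming 'mnist' (the 'image' and 'label' feature keys are specific to that dataset):

    import tensorflow_datasets as tfds

    ds = tfds.load('mnist', split='train')
    for example in tfds.as_numpy(ds):
      # Each example is now a dict of NumPy arrays rather than tf.Tensors.
      image, label = example['image'], example['label']
      break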

Args

name str, the registered name of the DatasetBuilder (the snake case version of the class name). The config and version can also be specified in the name as follows: 'dataset_name[/config_name][:version]'. For example, 'movielens/25m-ratings' (for the latest version of '25m-ratings'), 'movielens:0.1.0' (for the default config and version 0.1.0), or 'movielens/25m-ratings:0.1.0'. Note that only the latest version can be generated, but old versions can be read if they are present on disk. For convenience, the name parameter can contain comma-separated keyword arguments for the builder. For example, 'foo_bar/a=True,b=3' would use the FooBar dataset, passing the keyword arguments a=True and b=3 (for builders with configs, 'foo_bar/zoo/a=True,b=3' would use the 'zoo' config and pass the keyword arguments a=True and b=3 to the builder). See the usage sketch after this argument list.
split Which split of the data to load (e.g. 'train', 'test', ['train', 'test'], 'train[80%:]', ...). See our split API guide. If None, all splits are returned as a Dict[Split, tf.data.Dataset].
data_dir directory to read/write data. Defaults to the value of the environment variable TFDS_DATA_DIR, if set, otherwise falls back to '~/tensorflow_datasets'.
batch_size int, if set, add a batch dimension to examples. Note that variable length features will be 0-padded. If batch_size=-1, will return the full dataset as tf.Tensors.
shuffle_files bool, whether to shuffle the input files. Defaults to False.
download bool (optional), whether to call tfds.core.DatasetBuilder.download_and_prepare before calling tfds.core.DatasetBuilder.as_dataset. If False, data is expected to be in data_dir. If True and the data is already in data_dir, download_and_prepare is a no-op.
as_supervised bool, if True, the returned tf.data.Dataset will have a 2-tuple structure (input, label) according to builder.info.supervised_keys. If False (the default), the returned tf.data.Dataset will have a dictionary with all the features.
decoders Nested dict of Decoder objects which allow customizing the decoding. The structure should match the feature structure, but only customized feature keys need to be present. See the guide for more info.
read_config tfds.ReadConfig, additional options to configure the input pipeline (e.g. seed, number of parallel reads, ...).
with_info bool, if True, tfds.load will return the tuple (tf.data.Dataset, tfds.core.DatasetInfo), the latter containing the info associated with the builder.
builder_kwargs dict (optional), keyword arguments to be passed to the tfds.core.DatasetBuilder constructor. data_dir will be passed through by default.
download_and_prepare_kwargs dict (optional), keyword arguments passed to tfds.core.DatasetBuilder.download_and_prepare if download=True. Allows control over where the data is downloaded and extracted. If not set, cache_dir and manual_dir will automatically be deduced from data_dir.
as_dataset_kwargs dict (optional), keyword arguments passed to tfds.core.DatasetBuilder.as_dataset.
try_gcs bool, if True, tfds.load will see if the dataset exists on the public GCS bucket before building it locally. This is equivalent to passing data_dir='gs://tfds-data/datasets'. Warning: try_gcs is different from builder_kwargs.download_config.try_download_gcs. try_gcs (default: False) overrides data_dir to be the public GCS bucket. try_download_gcs (default: True) allows downloading from GCS while keeping a data_dir different from the public GCS bucket. So, to fully bypass GCS, use try_gcs=False and download_and_prepare_kwargs={'download_config': tfds.core.download.DownloadConfig(try_download_gcs=False)}.
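The sketch below exercises several of the arguments described above. It is illustrative only; 'mnist' and the 'movielens' name forms are taken from the descriptions above:

    import tensorflow_datasets as tfds

    # Config embedded in the name (the 'dataset_name[/config_name][:version]' form).
    ratings = tfds.load('movielens/25m-ratings')

    # Slicing API, supervised 2-tuples, and shuffled input files.
    train = tfds.load(
        'mnist',
        split='train[80%:]',
        as_supervised=True,   # yields (input, label) tuples
        shuffle_files=True,
    )

    # batch_size=-1 returns the full split as tf.Tensors.
    full_test = tfds.load('mnist', split='test', batch_size=-1)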

Returns

ds tf.data.Dataset, the dataset requested, or if split is None, a dict<key: tfds.Split, value: tf.data.Dataset>. If batch_size=-1, these will be full datasets as tf.Tensors.
ds_info tfds.core.DatasetInfo, if with_info is True, then tfds.load will return a tuple (ds, ds_info) containing dataset information (version, features, splits, num_examples,...). Note that the ds_info object documents the entire dataset, regardless of the split requested. Split-specific information is available in ds_info.splits.
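A minimal sketch of the with_info=True return shape, again assuming 'mnist':

    import tensorflow_datasets as tfds

    ds, ds_info = tfds.load('mnist', split='train', with_info=True)
    print(ds_info.features)                      # documents the entire dataset
    print(ds_info.splits['train'].num_examples)  # split-specific information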