tfds.folder_dataset.ImageFolder

Generic image classification dataset created from a manually prepared directory.

Inherits From: DatasetBuilder

Main aliases

tfds.ImageFolder

tfds.folder_dataset.ImageFolder(
    root_dir: str,
    *,
    shape: Optional[type_utils.Shape] = None,
    dtype: Optional[tf.DType] = None
)

ImageFolder creates a tf.data.Dataset reading the original image files.

The data directory should have the following structure:

path/to/image_dir/
  split_name/  # Ex: 'train'
    label1/  # Ex: 'airplane' or '0015'
      xxx.png
      xxy.png
      xxz.png
    label2/
      xxx.png
      xxy.png
      xxz.png
  split_name/  # Ex: 'test'
    ...
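
The layout above maps each first-level directory to a split and each second-level directory to a label. The following is a minimal sketch of that mapping using only the standard library. It is not the library's implementation: the helper name `scan_image_folder` is hypothetical, and it counts every regular file as one example, whereas `ImageFolder` itself may filter by image type.

```python
import pathlib
import tempfile
from collections import Counter


def scan_image_folder(root_dir):
    """Return {split_name: Counter({label: num_files})} for a split/label/file tree."""
    root = pathlib.Path(root_dir)
    summary = {}
    for split_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts = Counter()
        for label_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            # Assumption: every regular file directly under a label directory is one example.
            counts[label_dir.name] = sum(1 for f in label_dir.iterdir() if f.is_file())
        summary[split_dir.name] = counts
    return summary


# Demo on a throwaway tree; the empty files only stand in for real images.
with tempfile.TemporaryDirectory() as tmp:
    for split, label, name in [
        ("train", "airplane", "xxx.png"),
        ("train", "airplane", "xxy.png"),
        ("train", "0015", "xxx.png"),
        ("test", "airplane", "xxz.png"),
    ]:
        path = pathlib.Path(tmp, split, label, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo_summary = scan_image_folder(tmp)

print({split: dict(counts) for split, counts in demo_summary.items()})
```

A quick scan like this can help confirm that a directory follows the expected structure before handing it to the builder.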

To use it:

import tensorflow_datasets as tfds

builder = tfds.ImageFolder('path/to/image_dir/')
print(builder.info)  # num examples, labels... are automatically calculated
ds = builder.as_dataset(split='train', shuffle_files=True)
tfds.show_examples(ds, builder.info)

Args
`root_dir`	Path to the directory containing the images.
`shape`	Image shape forwarded to `tfds.features.Image`.
`dtype`	Image dtype forwarded to `tfds.features.Image`.

Attributes
`builder_config`	`tfds.core.BuilderConfig` for this builder.
`canonical_version`
`data_dir`	Returns the directory where this version + config is stored. Note that this is different from `data_dir_root`. For example, if `data_dir_root` is `/data/tfds`, then `data_dir` would be `/data/tfds/my_dataset/my_config/1.2.3`.
`data_dir_root`	Returns the root directory where all TFDS datasets are stored. Note that this is different from `data_dir`, which includes the dataset name, config, and version. For example, if `data_dir` is `/data/tfds/my_dataset/my_config/1.2.3`, then `data_dir_root` is `/data/tfds`.
`data_path`	Returns the path where this version + config is stored.
`info`	`tfds.core.DatasetInfo` for this builder.
`release_notes`
`supported_versions`
`version`
`versions`	Versions (canonical + available), in preference order.
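
The relationship between `data_dir_root` and `data_dir` described above can be sketched as simple path composition. The helper `compose_data_dir` is hypothetical and exists only for illustration. Omitting the config component when there is no config is an assumption, not something stated on this page.

```python
import posixpath


def compose_data_dir(data_dir_root, dataset_name, config_name, version):
    """Join the root with the dataset name, optional config name, and version."""
    parts = [data_dir_root, dataset_name]
    if config_name:  # Assumption: builders without a config skip this component.
        parts.append(config_name)
    parts.append(version)
    return posixpath.join(*parts)


data_dir = compose_data_dir("/data/tfds", "my_dataset", "my_config", "1.2.3")
print(data_dir)  # /data/tfds/my_dataset/my_config/1.2.3
```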

Methods

`as_data_source`

as_data_source(
    split: Optional[Tree[splits_lib.SplitArg]] = None,
    *,
    decoders: Optional[TreeDict[decode.partial_decode.DecoderArg]] = None
) -> ListOrTreeOrElem[Sequence[Any]]

Constructs an ArrayRecordDataSource.

Args
`split`	Which split of the data to load (e.g. `'train'`, `'test'`, `['train', 'test']`, `'train[80%:]'`,...). See our split API guide. If `None`, will return all splits in a `Dict[Split, Sequence]`.
`decoders`	Nested dict of `Decoder` objects that customize the decoding. The structure should match the feature structure, but only customized feature keys need to be present. See the guide for more info.

Returns
`Sequence` if `split` is given; otherwise `dict<key: tfds.Split, value: Sequence>`.

Raises
`NotImplementedError`	if the data was not generated using ArrayRecords.

`as_dataset`

as_dataset(
    split: Optional[Tree[splits_lib.SplitArg]] = None,
    *,
    batch_size: Optional[int] = None,
    shuffle_files: bool = False,
    decoders: Optional[TreeDict[decode.partial_decode.DecoderArg]] = None,
    read_config: Optional[read_config_lib.ReadConfig] = None,
    as_supervised: bool = False
)

Constructs a tf.data.Dataset.

Callers must pass arguments as keyword arguments.

The output types vary depending on the parameters. Examples:

import tensorflow as tf
import tensorflow_datasets as tfds

builder = tfds.builder('imdb_reviews')
builder.download_and_prepare()

# Default parameters: Returns the dict of tf.data.Dataset
ds_all_dict = builder.as_dataset()
assert isinstance(ds_all_dict, dict)
print(ds_all_dict.keys())  # ==> ['test', 'train', 'unsupervised']

assert isinstance(ds_all_dict['test'], tf.data.Dataset)
# Each dataset (test, train, unsup.) consists of dictionaries
# {'label': <tf.Tensor: .. dtype=int64, numpy=1>,
#  'text': <tf.Tensor: .. dtype=string, numpy=b"I've watched the movie ..">}
# {'label': <tf.Tensor: .. dtype=int64, numpy=1>,
#  'text': <tf.Tensor: .. dtype=string, numpy=b'If you love Japanese ..'>}

# With as_supervised: tf.data.Dataset only contains (feature, label) tuples
ds_all_supervised = builder.as_dataset(as_supervised=True)
assert isinstance(ds_all_supervised, dict)
print(ds_all_supervised.keys())  # ==> ['test', 'train', 'unsupervised']

assert isinstance(ds_all_supervised['test'], tf.data.Dataset)
# Each dataset (test, train, unsup.) consists of tuples (text, label)
# (<tf.Tensor: ... dtype=string, numpy=b"I've watched the movie ..">,
#  <tf.Tensor: ... dtype=int64, numpy=1>)
# (<tf.Tensor: ... dtype=string, numpy=b"If you love Japanese ..">,
#  <tf.Tensor: ... dtype=int64, numpy=1>)

# Same as above plus requesting a particular split
ds_test_supervised = builder.as_dataset(as_supervised=True, split='test')
assert isinstance(ds_test_supervised, tf.data.Dataset)
# The dataset consists of tuples (text, label)
# (<tf.Tensor: ... dtype=string, numpy=b"I've watched the movie ..">,
#  <tf.Tensor: ... dtype=int64, numpy=1>)
# (<tf.Tensor: ... dtype=string, numpy=b"If you love Japanese ..">,
#  <tf.Tensor: ... dtype=int64, numpy=1>)

Args
`split`	Which split of the data to load (e.g. `'train'`, `'test'`, `['train', 'test']`, `'train[80%:]'`,...). See our split API guide. If `None`, will return all splits in a `Dict[Split, tf.data.Dataset]`.
`batch_size`	`int`, batch size. Note that variable-length features will be 0-padded if `batch_size` is set. Users that want more custom behavior should use `batch_size=None` and use the `tf.data` API to construct a custom pipeline. If `batch_size == -1`, will return feature dictionaries of the whole dataset with `tf.Tensor`s instead of a `tf.data.Dataset`.
`shuffle_files`	`bool`, whether to shuffle the input files. Defaults to `False`.
`decoders`	Nested dict of `Decoder` objects that customize the decoding. The structure should match the feature structure, but only customized feature keys need to be present. See the guide for more info.
`read_config`	`tfds.ReadConfig`, additional options to configure the input pipeline (e.g. seed, num parallel reads,...).
`as_supervised`	`bool`, if `True`, the returned `tf.data.Dataset` will have a 2-tuple structure `(input, label)` according to `builder.info.supervised_keys`. If `False`, the default, the returned `tf.data.Dataset` will have a dictionary with all the features.
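
Two of the behaviours described above, the `(input, label)` tuple produced by `as_supervised` and the 0-padding applied when `batch_size` is set, can be pictured with plain Python data. This is only a conceptual sketch: the helpers `to_supervised` and `pad_batch` are hypothetical and operate on dicts and lists, not on `tf.data.Dataset` objects, and the real pipeline does this work inside TensorFlow.

```python
def to_supervised(example, supervised_keys):
    """Pick the (input, label) pair out of a feature dict."""
    input_key, label_key = supervised_keys
    return example[input_key], example[label_key]


def pad_batch(sequences, pad_value=0):
    """Right-pad variable-length sequences to the longest one in the batch."""
    longest = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (longest - len(seq)) for seq in sequences]


# Feature dict shaped like the 'imdb_reviews' examples shown earlier.
example = {"text": b"I've watched the movie ..", "label": 1}
pair = to_supervised(example, ("text", "label"))
print(pair)  # (b"I've watched the movie ..", 1)

padded = pad_batch([[1, 2, 3], [4]])
print(padded)  # [[1, 2, 3], [4, 0, 0]]
```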

Returns
`tf.data.Dataset`, or if `split=None`, `dict<key: tfds.Split, value: tf.data.Dataset>`. If `batch_size` is -1, will return feature dictionaries containing the entire dataset in `tf.Tensor`s instead of a `tf.data.Dataset`.

`dataset_info_from_configs`

dataset_info_from_configs(
    **kwargs
)

Returns the DatasetInfo using given kwargs and config files.

Subclasses should call this method and supply any information missing from the config files as kwargs, which are passed directly to the `tfds.core.DatasetInfo` object.

If information is present in both the passed arguments and the config files, the config files take precedence.

Args
`**kwargs`	Keyword arguments passed directly to `DatasetInfo`.
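
The precedence rule stated above (config files win over passed arguments) behaves like a dictionary merge in which config values are applied last. The sketch below illustrates only that rule. The helper `merge_info_kwargs` and the sample values are hypothetical and do not reflect how the library reads its config files.

```python
def merge_info_kwargs(kwargs, config_values):
    """Merge two sources of DatasetInfo fields; config values override kwargs."""
    merged = dict(kwargs)
    merged.update(config_values)  # Applied last, so config files prevail.
    return merged


merged_info = merge_info_kwargs(
    {"description": "from kwargs", "homepage": "https://example.org"},
    {"description": "from config files"},
)
print(merged_info)
```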

`download_and_prepare`

download_and_prepare(
    **kwargs
)

Downloads and prepares dataset for reading.

Args
`download_dir`	Directory where downloaded files are stored. Defaults to "~/tensorflow_datasets/downloads".
`download_config`	`tfds.download.DownloadConfig`, further configuration for downloading and preparing dataset.
`file_format`	optional `str` or `file_adapters.FileFormat`, format of the record files in which the dataset will be written.

Raises
`IOError`	if there is not enough disk space available.
`RuntimeError`	when the config cannot be found.

`get_default_builder_config`

get_default_builder_config() -> Optional[BuilderConfig]

Returns the default builder config if there is one.

Note that for dataset builders that cannot use the cls.BUILDER_CONFIGS, we need a method that uses the instance to get BUILDER_CONFIGS and DEFAULT_BUILDER_CONFIG_NAME.

Returns
The default builder config, if there is one.

`get_metadata`

@classmethod
get_metadata() -> dataset_metadata.DatasetMetadata

Returns metadata (README, CITATIONS, ...) specified in config files.

The config files are read from the same package where the DatasetBuilder is defined, so this metadata may be wrong for legacy builders.

`get_reference`

get_reference(
    namespace: Optional[str] = None
) -> naming.DatasetReference

Returns a reference to the dataset produced by this dataset builder.

Includes the config if specified, the version, and the data_dir that should contain this dataset.

Args
`namespace`	If this dataset is a community dataset, and therefore has a namespace, the namespace must be provided so that it can be set in the reference. Note that a dataset builder is not aware that it is part of a namespace.

Returns
A reference to this instantiated builder.

`is_prepared`

is_prepared() -> bool

Returns whether this dataset is already downloaded and prepared.

Class Variables
BUILDER_CONFIGS	`[]`
DEFAULT_BUILDER_CONFIG_NAME	`None`
MANUAL_DOWNLOAD_INSTRUCTIONS	`None`
MAX_SIMULTANEOUS_DOWNLOADS	`None`
RELEASE_NOTES	`{}`
SUPPORTED_VERSIONS	`[]`
VERSION	Instance of `tfds.core.Version`
builder_config_cls	`None`
builder_configs	`{}`
code_path	Instance of `etils.epath.gpath.PosixGPath`
default_builder_config	`None`
name	`'image_folder'`
pkg_dir_path	`None`
url_infos	`None`
