tfds.folder_dataset.ImageFolder

Generic image classification dataset created from a manually prepared directory.

Inherits From: DatasetBuilder

ImageFolder creates a tf.data.Dataset reading the original image files.

The data directory should have the following structure:

path/to/image_dir/
  split_name/  # Ex: 'train'
    label1/  # Ex: 'airplane' or '0015'
      xxx.png
      xxy.png
      xxz.png
    label2/
      xxx.png
      xxy.png
      xxz.png
  split_name/  # Ex: 'test'
    ...
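To make the expected layout concrete, the following is a minimal sketch that builds such a tree with placeholder files and then counts files per split and label, using only the Python standard library. The helper names `make_image_dir` and `count_examples` and the label names are hypothetical and not part of TFDS; the files are empty, so the tree shows the layout only and cannot be decoded as images.

```python
import tempfile
from pathlib import Path


def make_image_dir(root, layout):
    """Create root/<split>/<label>/<file> for every entry in `layout`.

    `layout` maps split name -> label name -> list of file names.
    The files are empty placeholders, not decodable images.
    """
    root = Path(root)
    for split, labels in layout.items():
        for label, filenames in labels.items():
            label_dir = root / split / label
            label_dir.mkdir(parents=True, exist_ok=True)
            for filename in filenames:
                (label_dir / filename).touch()
    return root


def count_examples(root):
    """Return {split: {label: number of files}} for the tree under `root`."""
    counts = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[split_dir.name] = {
            label_dir.name: sum(1 for f in label_dir.iterdir() if f.is_file())
            for label_dir in sorted(split_dir.iterdir())
            if label_dir.is_dir()
        }
    return counts


# Hypothetical splits, labels and file names, chosen for illustration.
layout = {
    "train": {
        "airplane": ["xxx.png", "xxy.png", "xxz.png"],
        "bird": ["xxx.png", "xxy.png"],
    },
    "test": {"airplane": ["xxx.png"], "bird": ["xxx.png"]},
}

with tempfile.TemporaryDirectory() as tmp:
    image_dir = make_image_dir(Path(tmp) / "image_dir", layout)
    counts = count_examples(image_dir)

print(counts["train"])  # {'airplane': 3, 'bird': 2}
```

Each first-level directory is read as a split and each second-level directory as a label, which is why the label names come from the folder names rather than from the image files.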

To use it:

import tensorflow_datasets as tfds

builder = tfds.ImageFolder('path/to/image_dir/')
print(builder.info)  # num examples, labels... are automatically calculated
ds = builder.as_dataset(split='train', shuffle_files=True)
tfds.show_examples(ds, builder.info)

Args
root_dir Path to the directory containing the images.
shape Image shape forwarded to tfds.features.Image.
dtype Image dtype forwarded to tfds.features.Image.

Attributes
builder_config tfds.core.BuilderConfig for this builder.
canonical_version

data_dir

info tfds.core.DatasetInfo for this builder.
supported_versions

version

versions Versions (canonical + available), in preference order.

Methods

as_dataset

Constructs a tf.data.Dataset.

Callers must pass arguments as keyword arguments.

The output types vary depending on the parameters. Examples:

import tensorflow as tf
import tensorflow_datasets as tfds

builder = tfds.builder('imdb_reviews')
builder.download_and_prepare()

# Default parameters: returns a dict of tf.data.Dataset, one per split
ds_all_dict = builder.as_dataset()
assert isinstance(ds_all_dict, dict)
print(ds_all_dict.keys())  # ==> ['test', 'train', 'unsupervised']

assert isinstance(ds_all_dict['test'], tf.data.Dataset)
# Each dataset (test, train, unsup.) consists of dictionaries
# {'label': <tf.Tensor: .. dtype=int64, numpy=1>,
#  'text': <tf.Tensor: .. dtype=string, numpy=b"I've watched the movie ..">}
# {'label': <tf.Tensor: .. dtype=int64, numpy=1>,
#  'text': <tf.Tensor: .. dtype=string, numpy=b'If you love Japanese ..'>}

# With as_supervised: tf.data.Dataset only contains (feature, label) tuples
ds_all_supervised = builder.as_dataset(as_supervised=True)
assert isinstance(ds_all_supervised, dict)
print(ds_all_supervised.keys())  # ==> ['test', 'train', 'unsupervised']

assert isinstance(ds_all_supervised['test'], tf.data.Dataset)
# Each dataset (test, train, unsup.) consists of tuples (text, label)
# (<tf.Tensor: ... dtype=string, numpy=b"I've watched the movie ..">,
#  <tf.Tensor: ... dtype=int64, numpy=1>)
# (<tf.Tensor: ... dtype=string, numpy=b"If you love Japanese ..">,
#  <tf.Tensor: ... dtype=int64, numpy=1>)

# Same as above plus requesting a particular split
ds_test_supervised = builder.as_dataset(as_supervised=True, split='test')
assert isinstance(ds_test_supervised, tf.data.Dataset)
# The dataset consists of tuples (text, label)
# (<tf.Tensor: ... dtype=string, numpy=b"I've watched the movie ..">,
#  <tf.Tensor: ... dtype=int64, numpy=1>)
# (<tf.Tensor: ... dtype=string, numpy=b"If you love Japanese ..">,
#  <tf.Tensor: ... dtype=int64, numpy=1>)

Args
split Which split of the data to load (e.g. 'train', 'test', ['train', 'test'], 'train[80%:]',...). See our split API guide. If None, will return all splits in a Dict[Split, tf.data.Dataset].
batch_size int, batch size. Note that variable-length features will be 0-padded if batch_size is set. Users that want more custom behavior should use batch_size=None and use the tf.data API to construct a custom pipeline. If batch_size == -1, will return feature dictionaries of the whole dataset with tf.Tensors instead of a tf.data.Dataset.
shuffle_files bool, whether to shuffle the input files. Defaults to False.
decoders Nested dict of Decoder objects that customize the decoding. The structure should match the feature structure, but only customized feature keys need to be present. See the guide for more info.
read_config tfds.ReadConfig, additional options to configure the input pipeline (e.g. seed, number of parallel reads, ...).
as_supervised bool, if True, the returned tf.data.Dataset will have a 2-tuple structure (input, label) according to builder.info.supervised_keys. If False, the default, the returned tf.data.Dataset will have a dictionary with all the features.
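The effect of as_supervised can be pictured as a projection of each feature dictionary onto two of its keys. The sketch below is a plain-Python illustration of that idea, not the TFDS implementation. The helper `to_supervised`, the example values, and the assumption that supervised_keys is ('image', 'label') are all hypothetical; check builder.info.supervised_keys for the actual pair.

```python
def to_supervised(example, supervised_keys):
    """Project a feature dict onto an (input, label) 2-tuple."""
    input_key, label_key = supervised_keys
    return example[input_key], example[label_key]


# Hypothetical feature dict; real examples hold tensors, not strings.
example = {
    "image": "<decoded pixels>",
    "image/filename": "path/to/image_dir/train/airplane/xxx.png",
    "label": 0,
}

# Assumed supervised_keys for an image classification builder.
pair = to_supervised(example, ("image", "label"))
print(pair)  # ('<decoded pixels>', 0)
```

Features outside the two supervised keys, such as the file name in this sketch, are dropped from the tuple form, so keep as_supervised=False when they are needed.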

Returns
tf.data.Dataset, or if split=None, dict<key: tfds.Split, value: tf.data.Dataset>.

If batch_size is -1, will return feature dictionaries containing the entire dataset in tf.Tensors instead of a tf.data.Dataset.
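The batch_size behaviour described above (0-padding of variable-length features, and -1 meaning the whole dataset at once) can be illustrated with plain Python lists. This is a conceptual sketch only, assuming that padding extends each sequence to the longest one in its batch; `pad_batch` and `batch` are hypothetical helpers, and TFDS itself operates on tensors through the tf.data API.

```python
def pad_batch(sequences, pad_value=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max((len(s) for s in sequences), default=0)
    return [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]


def batch(sequences, batch_size):
    """Group sequences into padded batches.

    batch_size == -1 returns one padded block holding every sequence;
    any positive value returns a list of padded batches.
    """
    if batch_size == -1:
        return pad_batch(sequences)
    return [
        pad_batch(sequences[i:i + batch_size])
        for i in range(0, len(sequences), batch_size)
    ]


# Hypothetical variable-length feature values.
sequences = [[1, 2, 3], [4], [5, 6]]

print(batch(sequences, 2))   # [[[1, 2, 3], [4, 0, 0]], [[5, 6]]]
print(batch(sequences, -1))  # [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
```

The padding zeros are indistinguishable from real zeros in the data, which is one reason the documentation suggests batch_size=None plus a custom tf.data pipeline when more control is needed.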

download_and_prepare

Downloads and prepares dataset for reading.

Args
download_dir str, directory where downloaded files are stored. Defaults to "~/tensorflow_datasets/downloads".
download_config tfds.download.DownloadConfig, further configuration for downloading and preparing dataset.

Raises
IOError if there is not enough disk space available.
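A caller that wants to fail early can run a similar free-space check before calling download_and_prepare. The sketch below uses only the standard library. `ensure_free_space` is a hypothetical helper and the byte counts are placeholders; how TFDS itself estimates the required size is not covered here.

```python
import shutil


def ensure_free_space(directory, required_bytes):
    """Raise IOError if `directory` has fewer than `required_bytes` free.

    Returns the number of free bytes when the check passes.
    """
    free = shutil.disk_usage(directory).free
    if free < required_bytes:
        raise IOError(
            f"Not enough disk space in {directory!r}: "
            f"{free} bytes free, {required_bytes} bytes required."
        )
    return free


# Placeholder requirement of 0 bytes, so this call always passes.
free_bytes = ensure_free_space(".", 0)
print(free_bytes >= 0)  # True
```

IOError is an alias of OSError in Python 3, so an `except OSError` clause catches both this check and the error raised by the library.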

Class Variables
BUILDER_CONFIGS

MANUAL_DOWNLOAD_INSTRUCTIONS None
RELEASE_NOTES

SUPPORTED_VERSIONS

VERSION tfds.core.Version
builder_configs

code_path

name 'image_folder'
url_infos None