tfds.core.DatasetInfo

Class DatasetInfo

Defined in core/dataset_info.py.

Information about a dataset.

DatasetInfo documents a dataset, including its name, version, and features. See the constructor arguments and properties for a full list.

__init__

__init__(
    builder,
    description=None,
    features=None,
    supervised_keys=None,
    urls=None,
    citation=None
)

Constructs DatasetInfo.

Args:

  • builder: DatasetBuilder, dataset builder for this info.
  • description: str, description of this dataset.
  • features: tfds.features.FeaturesDict, information on the feature dict of the tf.data.Dataset object returned by the builder.as_dataset() method.
  • supervised_keys: tuple, the input feature and the label to use for supervised learning, if applicable to the dataset.
  • urls: list(str), optional, the homepage(s) for this dataset.
  • citation: str, optional, the citation to use for this dataset.
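To make the supervised_keys argument concrete, the sketch below shows in plain Python how an (input, label) pair of feature names can pick two entries out of an example dict. The helper name to_supervised and the sample feature names are hypothetical; this illustrates the idea only and is not TFDS code.

```python
def to_supervised(example, supervised_keys):
    """Return (input, label) picked from `example` by a 2-tuple of keys."""
    if supervised_keys is None:
        raise ValueError("this dataset does not define supervised_keys")
    input_key, label_key = supervised_keys
    return example[input_key], example[label_key]


# Hypothetical example with three features; only two are used for training.
example = {"image": [[0, 255], [255, 0]], "label": 3, "file_name": "a.png"}
features, label = to_supervised(example, ("image", "label"))
print(label)
```

Features that are not named in supervised_keys (here file_name) are simply left out of the pair.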

Properties

as_json

as_proto

citation

description

download_checksums

features

full_name

Full canonical name: (<dataset_name>/<config_name>/<version>).

initialized

Whether DatasetInfo has been fully initialized.

name

size_in_bytes

splits

supervised_keys

urls

version

Methods

compute_dynamic_properties

compute_dynamic_properties()

initialize_from_bucket

initialize_from_bucket()

Initialize DatasetInfo from GCS bucket info files.

read_from_directory

read_from_directory(dataset_info_dir)

Update DatasetInfo from the JSON file in dataset_info_dir.

This function updates all the dynamically generated fields of the DatasetInfo (num_examples, hash, time of creation, ...).

This will overwrite all previous metadata.

Args:

  • dataset_info_dir: str, the directory containing the metadata file. This should be the root directory of a specific dataset version.
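The overwrite behaviour can be pictured with a small standard-library sketch. The class InfoSketch, the file name dataset_info.json, and the field values are assumptions made for illustration; the actual file layout and fields are determined by TFDS.

```python
import json
import os
import tempfile


class InfoSketch:
    """Hypothetical stand-in for DatasetInfo that keeps metadata in a dict."""

    FILE_NAME = "dataset_info.json"  # assumed file name

    def __init__(self):
        self.metadata = {}

    def read_from_directory(self, dataset_info_dir):
        path = os.path.join(dataset_info_dir, self.FILE_NAME)
        with open(path) as f:
            # Whatever was stored before is discarded in favour of the file.
            self.metadata = json.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Simulate a metadata file left behind by an earlier preparation step.
    with open(os.path.join(tmp_dir, InfoSketch.FILE_NAME), "w") as f:
        json.dump({"num_examples": 60000, "size_in_bytes": 1024}, f)

    info = InfoSketch()
    info.metadata["num_examples"] = 0  # stale value set before reading
    info.read_from_directory(tmp_dir)

print(info.metadata)
```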

update_splits_if_different

update_splits_if_different(split_dict)

Overwrite the splits if they are different from the current ones.

  • If splits are not already defined, or are different (e.g. a different number of shards), the new split dict is used. This will trigger stats computation during download_and_prepare.
  • If splits are already defined in DatasetInfo and are similar (same names and shards), the restored splits, which contain the statistics (restored from GCS or from file), are kept.

Args:

  • split_dict: tfds.core.SplitDict, the new split dict.
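The decision described by the two rules above can be sketched in plain Python. The function name choose_splits and the representation of a split dict as a mapping from split name to a dict with a "num_shards" entry are hypothetical; this is a sketch of the comparison under those assumptions, not the TFDS implementation.

```python
def choose_splits(current_splits, new_splits):
    """Return the split dict to keep, following the two rules above."""

    def layout(splits):
        # Compare names and shard counts only; statistics are ignored.
        return {name: split["num_shards"] for name, split in splits.items()}

    if not current_splits or layout(current_splits) != layout(new_splits):
        return new_splits  # statistics will have to be recomputed
    return current_splits  # keep the restored statistics


restored = {"train": {"num_shards": 10, "num_examples": 60000}}
same_layout = {"train": {"num_shards": 10}}
resharded = {"train": {"num_shards": 20}}

kept = choose_splits(restored, same_layout)
replaced = choose_splits(restored, resharded)
first_time = choose_splits({}, same_layout)
```

Here kept is the restored dict with its statistics, while replaced and first_time are the new dicts, whose statistics would still need to be computed.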

write_to_directory

write_to_directory(dataset_info_dir)

Write DatasetInfo as JSON to dataset_info_dir.