tfds.core.DatasetInfo

Class DatasetInfo

Information about a dataset.

DatasetInfo documents a dataset, including its name, version, and features. See the constructor arguments and properties for a full list.

__init__

__init__(
    builder,
    description=None,
    features=None,
    supervised_keys=None,
    urls=None,
    citation=None,
    metadata=None,
    redistribution_info=None
)

Constructs DatasetInfo.

Args:

  • builder: DatasetBuilder, dataset builder for this info.
  • description: str, description of this dataset.
  • features: tfds.features.FeaturesDict, information on the features of the tf.data.Dataset object returned by the builder.as_dataset() method.
  • supervised_keys: tuple of (input_key, target_key), Specifies the input feature and the label for supervised learning, if applicable for the dataset. The keys correspond to the feature names to select in info.features. When calling tfds.core.DatasetBuilder.as_dataset() with as_supervised=True, the tf.data.Dataset object will yield the (input, target) defined here.
  • urls: list(str), optional, the homepage(s) for this dataset.
  • citation: str, optional, the citation to use for this dataset.
  • metadata: tfds.core.Metadata, additional object which will be stored/restored with the dataset. This allows for storing additional information with the dataset.
  • redistribution_info: dict, optional, information needed for redistribution, as specified in dataset_info_pb2.RedistributionInfo. The content of the license subfield will automatically be written to a LICENSE file stored with the dataset.
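
The effect of supervised_keys can be pictured with a small plain-Python model. This is only a sketch of the selection described above, not TFDS code: the helper to_supervised and the feature names "image", "label", and "id" are hypothetical and chosen for illustration.

```python
# Plain-Python model of what as_supervised=True does with supervised_keys.
# to_supervised is a hypothetical helper, not part of TFDS.
def to_supervised(example, supervised_keys):
    """Return the (input, target) pair named by supervised_keys."""
    input_key, target_key = supervised_keys
    return example[input_key], example[target_key]


# Hypothetical example dict keyed by feature name.
example = {"image": [[0, 255], [255, 0]], "label": 3, "id": "a1"}

# Only the two features named in supervised_keys are yielded.
pair = to_supervised(example, ("image", "label"))
print(pair)
```

Features that are not named in supervised_keys (here "id") are simply left out of the yielded pair.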

Properties

as_json

as_proto

citation

description

features

full_name

Full canonical name: (<dataset_name>/<config_name>/<version>).
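
As a rough illustration, the canonical name can be thought of as the dataset name, an optional config name, and the version joined by slashes. The sketch below is a standalone model under that assumption, not the library's code; build_full_name and the sample names are hypothetical.

```python
# Standalone model of a slash-separated canonical name.
# build_full_name is a hypothetical helper, not part of TFDS.
def build_full_name(name, version, config_name=None):
    parts = [name]
    if config_name:  # the config segment is skipped when there is no config
        parts.append(config_name)
    parts.append(str(version))
    return "/".join(parts)


without_config = build_full_name("my_dataset", "1.0.0")
with_config = build_full_name("my_dataset", "1.0.0", config_name="small")
print(without_config)
print(with_config)
```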

initialized

Whether DatasetInfo has been fully initialized.

metadata

name

redistribution_info

size_in_bytes

splits

supervised_keys

urls

version

Methods

compute_dynamic_properties

compute_dynamic_properties()

initialize_from_bucket

initialize_from_bucket()

Initialize DatasetInfo from GCS bucket info files.

read_from_directory

read_from_directory(dataset_info_dir)

Update DatasetInfo from the JSON file in dataset_info_dir.

This function updates all the dynamically generated fields (num_examples, hash, time of creation, ...) of the DatasetInfo.

This will overwrite all previous metadata.

Args:

  • dataset_info_dir: str, the directory containing the metadata file. This should be the root directory of a specific dataset version.

update_splits_if_different

update_splits_if_different(split_dict)

Overwrite the splits if they are different from the current ones.

  • If splits aren't already defined, or are different (e.g. a different number of shards), the new split dict is used. This will trigger stats computation during download_and_prepare.
  • If splits are already defined in DatasetInfo and are similar (same names and shards), the restored splits, which contain the statistics (restored from GCS or from file), are kept.
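
The two cases above can be sketched with plain dictionaries. This is a simplified model under stated assumptions, not the TFDS implementation: splits are represented as a hypothetical mapping from split name to a dict with a "num_shards" entry, and "similar" is taken to mean identical names and shard counts.

```python
# Simplified model of the decision described above; not the TFDS implementation.
def choose_splits(current, new):
    """Keep `current` when it matches `new` in names and shard counts."""

    def layout(splits):
        return {name: split["num_shards"] for name, split in splits.items()}

    if current and layout(current) == layout(new):
        return current  # restored splits still carry their statistics
    return new  # undefined or different: use the new split dict


# Hypothetical split dicts; "num_examples" stands in for restored statistics.
restored = {
    "train": {"num_shards": 10, "num_examples": 60000},
    "test": {"num_shards": 1, "num_examples": 10000},
}
same_layout = {"train": {"num_shards": 10}, "test": {"num_shards": 1}}
resharded = {"train": {"num_shards": 20}, "test": {"num_shards": 1}}

kept = choose_splits(restored, same_layout)    # statistics are preserved
replaced = choose_splits(restored, resharded)  # shard count changed
fresh = choose_splits({}, same_layout)         # nothing defined yet
```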

Args:

  • split_dict: tfds.core.SplitDict, the new split.

write_to_directory

write_to_directory(dataset_info_dir)

Write DatasetInfo as JSON to dataset_info_dir.
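
The write/read round trip can be modelled with the standard library alone. The sketch below is a toy stand-in under stated assumptions, not the TFDS implementation: the info object is a plain dict, the file name dataset_info.json is assumed, and write_info and read_info are hypothetical helpers.

```python
import json
import os
import tempfile

INFO_FILENAME = "dataset_info.json"  # assumed file name for this sketch


def write_info(info, dataset_info_dir):
    """Serialize the info dict as JSON inside dataset_info_dir."""
    path = os.path.join(dataset_info_dir, INFO_FILENAME)
    with open(path, "w") as f:
        json.dump(info, f, indent=2, sort_keys=True)


def read_info(info, dataset_info_dir):
    """Overwrite every field of `info` with the values stored on disk."""
    path = os.path.join(dataset_info_dir, INFO_FILENAME)
    with open(path) as f:
        restored = json.load(f)
    info.clear()
    info.update(restored)
    return info


with tempfile.TemporaryDirectory() as tmp_dir:
    write_info({"name": "toy", "version": "1.0.0", "num_examples": 100}, tmp_dir)
    # A stale in-memory copy is fully replaced by what was written.
    info = read_info({"name": "toy", "num_examples": 0}, tmp_dir)

print(info)
```

Clearing the dict before updating it mirrors the note that reading overwrites all previous metadata.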