tfds.core.DatasetInfo

View source on GitHub

Information about a dataset.

tfds.core.DatasetInfo(
    builder, description=None, features=None, supervised_keys=None, homepage=None,
    urls=None, citation=None, metadata=None, redistribution_info=None
)

DatasetInfo documents a dataset, including its name, version, and features. See the constructor arguments and properties for a full list.

Args:

  • builder: DatasetBuilder, dataset builder for this info.
  • description: str, description of this dataset.
  • features: tfds.features.FeaturesDict, Information on the feature dict of the tf.data.Dataset() object from the builder.as_dataset() method.
  • supervised_keys: tuple of (input_key, target_key), Specifies the input feature and the label for supervised learning, if applicable for the dataset. The keys correspond to the feature names to select in info.features. When calling tfds.core.DatasetBuilder.as_dataset() with as_supervised=True, the tf.data.Dataset object will yield the (input, target) defined here.
  • homepage: str, optional, the homepage for this dataset.
  • urls: DEPRECATED, use homepage instead.
  • citation: str, optional, the citation to use for this dataset.
  • metadata: tfds.core.Metadata, additional object which will be stored/restored with the dataset. This allows for storing additional information with the dataset.
  • redistribution_info: dict, optional, information needed for redistribution, as specified in dataset_info_pb2.RedistributionInfo. The content of the license subfield will automatically be written to a LICENSE file stored with the dataset.

Attributes:

  • as_json
  • as_proto
  • citation
  • data_dir
  • dataset_size: Generated dataset files size, in bytes.
  • description
  • download_size: Downloaded files size, in bytes.
  • features
  • full_name: Full canonical name: (&lt;dataset_name&gt;/&lt;config_name&gt;/&lt;version&gt;).
  • homepage
  • initialized: Whether DatasetInfo has been fully initialized.
  • metadata
  • name
  • redistribution_info
  • splits
  • supervised_keys
  • version

Methods

compute_dynamic_properties

View source

compute_dynamic_properties()

initialize_from_bucket

View source

initialize_from_bucket()

Initialize DatasetInfo from GCS bucket info files.

read_from_directory

View source

read_from_directory(
    dataset_info_dir
)

Update DatasetInfo from the JSON file in dataset_info_dir.

This function updates all the dynamically generated fields (num_examples, hash, time of creation,...) of the DatasetInfo.

This will overwrite all previous metadata.

Args:

  • dataset_info_dir: str The directory containing the metadata file. This should be the root directory of a specific dataset version.

update_splits_if_different

View source

update_splits_if_different(
    split_dict
)

Overwrite the splits if they are different from the current ones.

  • If splits aren't already defined, or are different (e.g. a different number of shards), the new split dict is used. This will trigger stats computation during download_and_prepare.
  • If splits are already defined in DatasetInfo and are similar (same names and shards), the restored splits are kept, since they contain the statistics (restored from GCS or a file).

Args:

  • split_dict: the new splits to compare against the current ones.
write_to_directory

View source

write_to_directory(
    dataset_info_dir
)

Write DatasetInfo as JSON to dataset_info_dir.