tfds.features.FeatureConnector

Class FeatureConnector

Defined in core/features/feature.py.

Abstract base class for feature types.

This class provides an interface between the way the information is stored on disk, and the way it is presented to the user.

Here is a diagram of how the FeatureConnector methods fit into data generation/reading:

generator => encode_example() => tf_example => decode_example() => data dict

The connector can either get raw or dictionary values as input, depending on the connector type.
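To make the encode/decode contract concrete, here is a minimal pure-Python sketch (an illustration only, not part of the tfds API; the class name `ToyTensorConnector` is invented). It mimics how a connector flattens data for serialization and restores the original shape on read:

```python
import numpy as np

class ToyTensorConnector:
    """Toy stand-in for a FeatureConnector (illustration, no TensorFlow).

    Stores a fixed-shape array as a flat list (as tf.train.Example would)
    and restores the shape on read, mirroring the
    encode_example()/decode_example() round trip.
    """

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype

    def encode_example(self, example_data):
        # Flatten to a 1-D list: tf.train.Example stores flat values only.
        return np.asarray(example_data, dtype=self.dtype).ravel().tolist()

    def decode_example(self, tfexample_data):
        # Restore the original shape lost during serialization.
        return np.asarray(tfexample_data, dtype=self.dtype).reshape(self.shape)

conn = ToyTensorConnector(shape=(2, 3), dtype=np.int64)
flat = conn.encode_example([[1, 2, 3], [4, 5, 6]])   # flattened for storage
restored = conn.decode_example(flat)                 # shape (2, 3) restored
```

The real connectors do the same thing with `tf.Tensor` values on the decode side; only the flatten-then-reshape responsibility is illustrated here.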

Properties

dtype

Return the dtype (or dict of dtype) of this FeatureConnector.

serialized_keys

List of the flattened feature keys after serialization.

shape

Return the shape (or dict of shape) of this FeatureConnector.

Methods

decode_example

decode_example(tfexample_data)

Decode the feature dict to TF compatible input.

Args:

  • tfexample_data: Data or dictionary of data, as read by the tf-example reader. It corresponds to the tf.Tensor (or dict of tf.Tensor) extracted from the tf.train.Example, matching the info defined in get_serialized_info().

Returns:

  • tensor_data: Tensor or dictionary of tensors, as output by the tf.data.Dataset object

encode_example

encode_example(example_data)

Encode the feature dict into tf-example compatible input.

The input example_data can be anything that the user passed at data generation. For example:

For features:

features={
    'image': tfds.features.Image(),
    'custom_feature': tfds.features.CustomFeature(),
}

At data generation (in _generate_examples), if the user yields:

yield {
    'image': 'path/to/img.png',
    'custom_feature': [123, 'str', lambda x: x+1]
}

Then:

  • Image.encode_example will be called with 'path/to/img.png' as input
  • CustomFeature.encode_example will be called with [123, 'str', lambda x: x+1] as input

Args:

  • example_data: Value or dictionary of values to convert into tf-example compatible data.

Returns:

  • tfexample_data: Data or dictionary of data to write as tf-example. Data can be a list or a numpy array. Note that numpy arrays are flattened, so it is the feature connector's responsibility to reshape them in decode_example(). Also note that tf.train.Example only supports int64, float32 and string, so the data returned here should be integer, float or string; the user-facing type can be restored in decode_example().
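The type restriction above is why connectors often store one type and return another. A minimal sketch (illustration only; `ToyBoolConnector` is an invented name, not a tfds class) of a boolean feature that must round-trip through an integer:

```python
class ToyBoolConnector:
    """Hypothetical connector for a boolean feature (illustration only).

    tf.train.Example has no bool type, so the value is stored as an
    integer and the Python bool is restored in decode_example().
    """

    def encode_example(self, example_data):
        # Coerce to one of the supported storage types (here: integer).
        return int(bool(example_data))

    def decode_example(self, tfexample_data):
        # Restore the user-facing type from the stored integer.
        return bool(tfexample_data)

conn = ToyBoolConnector()
stored = conn.encode_example(True)    # integer, safe to serialize
value = conn.decode_example(stored)   # Python bool again
```

The same pattern applies to any user type outside int64/float32/string: encode to a supported type, decode back.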

get_serialized_info

get_serialized_info()

Return the tf-example features for the adapter, as stored on disk.

This function indicates how this feature is encoded on file internally. Datasets built by a DatasetBuilder are written on disk as tf.train.Example protos.

Ex:

return {
    'image': tf.VarLenFeature(tf.uint8),
    'height': tf.FixedLenFeature((), tf.int32),
    'width': tf.FixedLenFeature((), tf.int32),
}

FeatureConnectors that are not containers should return the feature proto directly:

return tf.FixedLenFeature((64, 64), tf.uint8)

If not defined, the returned values are automatically deduced from the get_tensor_info function.

Returns:

  • features: Either a dict of feature proto objects, or a feature proto object

get_tensor_info

get_tensor_info()

Return the tf.Tensor dtype/shape of the feature.

This returns the tensor dtype/shape, as produced by the tf.data.Dataset object returned by .as_dataset.

Ex:

return {
    'image': tfds.features.TensorInfo(shape=(None,), dtype=tf.uint8),
    'height': tfds.features.TensorInfo(shape=(), dtype=tf.int32),
    'width': tfds.features.TensorInfo(shape=(), dtype=tf.int32),
}

FeatureConnectors that are not containers should return the tfds.features.TensorInfo directly:

return tfds.features.TensorInfo(shape=(256, 256), dtype=tf.uint8)

Returns:

  • tensor_info: Either a dict of tfds.features.TensorInfo objects, or a tfds.features.TensorInfo object

load_metadata

load_metadata(
    data_dir,
    feature_name
)

Restore the feature metadata from disk.

If a dataset is re-loaded and generated files exist on disk, this function restores the feature metadata from the saved files.

Args:

  • data_dir: str, path to the dataset folder from which to restore the info (ex: ~/datasets/cifar10/1.2.0/)
  • feature_name: str, the name of the feature (from the FeatureDict key)

save_metadata

save_metadata(
    data_dir,
    feature_name
)

Save the feature metadata on disk.

This function is called after the data has been generated (by _download_and_prepare) to save the feature connector info with the generated dataset.

Some datasets/features dynamically compute info during _download_and_prepare. For instance:

  • Labels are loaded from the downloaded data
  • Vocabulary is created from the downloaded data
  • ImageLabelFolder computes the image dtype/shape from the manual_dir

After the info has been added to the feature, this function allows saving that additional info so it can be restored the next time the data is loaded.

By default, this function does not save anything, but subclasses can override it.

Args:

  • data_dir: str, path to the dataset folder to which the info is saved (ex: ~/datasets/cifar10/1.2.0/)
  • feature_name: str, the name of the feature (from the FeatureDict key)
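The save/load pair above can be sketched in plain Python (an illustration only; `ToyVocabConnector` and the JSON file format are assumptions, not the tfds implementation). A text-like connector builds a vocabulary during generation, saves it next to the dataset, and a fresh connector restores it on re-load:

```python
import json
import os
import tempfile

class ToyVocabConnector:
    """Hypothetical text connector persisting a vocabulary built at
    generation time (illustration only, not the tfds API)."""

    def __init__(self):
        self.vocab = []

    def _metadata_path(self, data_dir, feature_name):
        # One metadata file per feature, keyed by the FeatureDict name.
        return os.path.join(data_dir, '%s.vocab.json' % feature_name)

    def save_metadata(self, data_dir, feature_name):
        # Persist info computed during generation next to the dataset files.
        with open(self._metadata_path(data_dir, feature_name), 'w') as f:
            json.dump(self.vocab, f)

    def load_metadata(self, data_dir, feature_name):
        # Restore the same info when the dataset is re-loaded.
        with open(self._metadata_path(data_dir, feature_name)) as f:
            self.vocab = json.load(f)

data_dir = tempfile.mkdtemp()
writer = ToyVocabConnector()
writer.vocab = ['<pad>', 'hello', 'world']   # built from downloaded data
writer.save_metadata(data_dir, 'text')

reader = ToyVocabConnector()                 # fresh connector on re-load
reader.load_metadata(data_dir, 'text')
```

The key design point mirrored here is that load_metadata must fully reconstruct whatever save_metadata wrote, since the original generation-time state is gone on re-load.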