Module: tfds.features

API defining dataset features (image, text, scalar,...).

tfds.features.FeatureConnector separates the data that tensorflow/datasets builders return from how that data is encoded to, and decoded from, files on disk.
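To make the encode/decode split concrete, here is a minimal sketch in plain Python. The class `Int64Connector` is hypothetical. Its method names echo the `encode_example` / `decode_example` methods of `tfds.features.FeatureConnector`, but the fixed-width `struct` byte format is an illustrative assumption, not the format TFDS actually writes to disk.

```python
import struct


class Int64Connector:
    """Hypothetical connector: the user-facing value is a Python int,
    while the stored representation is 8 little-endian bytes."""

    def encode_example(self, value: int) -> bytes:
        # Python int -> bytes that would be written to file.
        return struct.pack("<q", value)

    def decode_example(self, data: bytes) -> int:
        # Bytes read back from file -> Python int returned to the user.
        (value,) = struct.unpack("<q", data)
        return value


connector = Int64Connector()
stored = connector.encode_example(43)
print(len(stored), connector.decode_example(stored))  # 8 43
```

Code that consumes the dataset only ever sees `43`; the byte layout can change without affecting it, which is the point of the abstraction.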

Using tfds.features.FeatureConnector in tfds.core.GeneratorBasedBuilder

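To implement a new dataset:

  • In tfds.core.DatasetInfo: Define the features structure:

tfds.core.DatasetInfo(
    features=tfds.features.FeaturesDict({
        'input': tfds.features.Image(shape=(28, 28, 1)),
        'target': tfds.features.ClassLabel(names=['no', 'yes']),
        'extra_data': {
            'label_id': tf.int64,
            'language': tf.string,
        },
    }),
)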
  • In tfds.core.GeneratorBasedBuilder._generate_examples: Examples should be yielded so that they match the structure defined in tfds.core.DatasetInfo. Values are encoded automatically.

yield {
    'input': '/path/to/img0.png',  # An `np.array` or file object is also accepted
    'target': 'yes',  # Converted to int id 1
    'extra_data': {
        'label_id': 43,
        'language': 'en',
    },
}
  • After generation, tfds.load returns a dataset whose elements follow the same structure:

ds = tfds.load(...)
ds.element_spec == {
    'input': tf.TensorSpec(shape=(28, 28, 1), dtype=tf.uint8),
    'target': tf.TensorSpec(shape=(), dtype=tf.int64),
    'extra_data': {
        'label_id': tf.TensorSpec(shape=(), dtype=tf.int64),
        'language': tf.TensorSpec(shape=(), dtype=tf.string),
    },
}
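The walkthrough above relies on each feature encoding its own value while the nested dict structure is preserved. The following plain-Python sketch mimics that idea. `ToyClassLabel`, `ToyPassthrough`, and `encode_nested` are hypothetical stand-ins, not TFDS classes, and the `'input'` image is left out because decoding a PNG would need an image library.

```python
from typing import Any, Dict, List


class ToyClassLabel:
    """Hypothetical stand-in for tfds.features.ClassLabel: name <-> int id."""

    def __init__(self, names: List[str]):
        self.names = list(names)
        self._name_to_id = {name: i for i, name in enumerate(self.names)}

    def encode_example(self, value: str) -> int:
        return self._name_to_id[value]

    def decode_example(self, value: int) -> str:
        return self.names[value]


class ToyPassthrough:
    """Hypothetical stand-in for plain dtype leaves such as tf.int64 / tf.string."""

    def encode_example(self, value: Any) -> Any:
        return value


def encode_nested(spec: Dict[str, Any], example: Dict[str, Any]) -> Dict[str, Any]:
    """Walk the example alongside the feature spec and encode every leaf."""
    if set(spec) != set(example):
        raise ValueError(f"Keys {sorted(example)} do not match spec {sorted(spec)}")
    encoded = {}
    for key, feature in spec.items():
        if isinstance(feature, dict):
            # Nested dicts are recursed into, mirroring the nested structure.
            encoded[key] = encode_nested(feature, example[key])
        else:
            encoded[key] = feature.encode_example(example[key])
    return encoded


spec = {
    'target': ToyClassLabel(names=['no', 'yes']),
    'extra_data': {
        'label_id': ToyPassthrough(),
        'language': ToyPassthrough(),
    },
}
encoded = encode_nested(spec, {
    'target': 'yes',
    'extra_data': {'label_id': 43, 'language': 'en'},
})
print(encoded)  # {'target': 1, 'extra_data': {'label_id': 43, 'language': 'en'}}
```

The `'yes'` label becomes id `1` because ids follow the order of `names`, matching the comment in the generator example.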

Create your own tfds.features.FeatureConnector

To create your own feature connector, you need to inherit from tfds.features.FeatureConnector and implement the abstract methods.

  • If your feature is a single tensor, it's best to inherit from tfds.features.Tensor and call super() where needed. See the tfds.features.BBoxFeature source code for an example.

  • If your feature is a container of multiple tensors, it's best to inherit from tfds.features.FeaturesDict and use super() to encode the sub-connectors automatically.
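The two subclassing patterns above can be sketched without TensorFlow. Everything below is hypothetical: `ToyTensor` and `ToyFeaturesDict` stand in for `tfds.features.Tensor` and `tfds.features.FeaturesDict`, and only `encode_example` is modeled. The point is the shape of the code: a single-tensor feature validates and then defers to its parent, while a container feature declares sub-connectors and lets `super()` encode them.

```python
from typing import Any, Dict, Sequence


class ToyTensor:
    """Hypothetical stand-in for tfds.features.Tensor (1-D only): checks the
    length against the declared shape and returns the values as floats."""

    def __init__(self, shape: Sequence[int]):
        self.shape = tuple(shape)

    def encode_example(self, value: Sequence[float]) -> list:
        values = list(value)
        if len(self.shape) != 1 or len(values) != self.shape[0]:
            raise ValueError(f"Expected shape {self.shape}, got {len(values)} values")
        return [float(v) for v in values]


class ToyBBoxFeature(ToyTensor):
    """Single-tensor feature: validate first, then reuse the parent via super()."""

    def __init__(self):
        super().__init__(shape=(4,))

    def encode_example(self, bbox: Sequence[float]) -> list:
        ymin, xmin, ymax, xmax = bbox
        if not (0.0 <= ymin <= ymax <= 1.0 and 0.0 <= xmin <= xmax <= 1.0):
            raise ValueError(f"Not a valid normalized box: {bbox}")
        return super().encode_example(bbox)


class ToyFeaturesDict:
    """Hypothetical container: each key is encoded by its own sub-connector."""

    def __init__(self, features: Dict[str, Any]):
        self.features = features

    def encode_example(self, example: Dict[str, Any]) -> Dict[str, Any]:
        return {key: feature.encode_example(example[key])
                for key, feature in self.features.items()}


class ToyScoredBox(ToyFeaturesDict):
    """Multi-tensor feature: declare sub-connectors, let super() encode them."""

    def __init__(self):
        super().__init__({'bbox': ToyBBoxFeature(), 'score': ToyTensor(shape=(1,))})

    def encode_example(self, example: Dict[str, Any]) -> Dict[str, Any]:
        # Accept a bare float for 'score' and wrap it to match shape (1,).
        example = dict(example, score=[example['score']])
        return super().encode_example(example)


feature = ToyScoredBox()
print(feature.encode_example({'bbox': (0.1, 0.2, 0.5, 0.6), 'score': 0.9}))
# {'bbox': [0.1, 0.2, 0.5, 0.6], 'score': [0.9]}
```

The real TFDS base classes also handle dtypes, shapes, serialization, and decoding, none of which this sketch attempts.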


Classes

class Audio: FeatureConnector for audio, encoded as a raw integer waveform.

class BBox: Named tuple BBox(ymin, xmin, ymax, xmax) holding the coordinates of a bounding box.

class BBoxFeature: FeatureConnector for a normalized bounding box.

class ClassLabel: FeatureConnector for integer class labels.

class FeatureConnector: Abstract base class for feature types.

class FeaturesDict: Composite FeatureConnector; each feature in the dict has its own connector.

class Image: FeatureConnector for images.

class Sequence: Composite FeatureConnector for a dict where each value is a list.

class Tensor: FeatureConnector for generic data of arbitrary shape and type.

class TensorInfo: Structure containing info on the tf.Tensor shape/dtype.

class Text: FeatureConnector for text, encoding to integers with a TextEncoder.

class Video: FeatureConnector for videos, encoding frames individually on disk.
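BBoxFeature expects a normalized box, meaning coordinates divided by the image height and width so that they fall in [0, 1], in the (ymin, xmin, ymax, xmax) order of BBox. The sketch below assumes pixel coordinates as input and defines a local `BBox` named tuple with the same field order rather than importing it from tfds; `to_normalized_bbox` is a hypothetical helper, not part of tfds.features.

```python
import collections

# Local mirror of the (ymin, xmin, ymax, xmax) field order; not imported from tfds.
BBox = collections.namedtuple('BBox', ['ymin', 'xmin', 'ymax', 'xmax'])


def to_normalized_bbox(ymin_px, xmin_px, ymax_px, xmax_px, height, width):
    """Hypothetical helper: divide pixel coordinates by the image size."""
    if height <= 0 or width <= 0:
        raise ValueError("Image height and width must be positive")
    return BBox(
        ymin=ymin_px / height,
        xmin=xmin_px / width,
        ymax=ymax_px / height,
        xmax=xmax_px / width,
    )


# A box inside a 28x28 image, like the 'input' feature in the example above.
box = to_normalized_bbox(7, 14, 21, 28, height=28, width=28)
print(box)  # BBox(ymin=0.25, xmin=0.5, ymax=0.75, xmax=1.0)
```

Note that y comes before x in each pair, which is easy to get wrong when converting from (x, y) pixel conventions.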