tfds.features.Text

Class Text

FeatureConnector for text, encoding to integers with a TextEncoder.

Inherits From: Tensor

__init__

__init__(
    encoder=None,
    encoder_config=None
)

Constructs a Text FeatureConnector.

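Args:

  • encoder: tfds.features.text.TextEncoder, an encoder that can convert text to integers. If None, the text will be UTF-8 byte-encoded.
  • encoder_config: tfds.features.text.TextEncoderConfig, needed if restoring from a file with load_metadata.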

Properties

dtype

Return the dtype (or dict of dtype) of this FeatureConnector.

encoder

shape

Return the shape (or dict of shape) of this FeatureConnector.

vocab_size

Methods

decode_example

decode_example(tfexample_data)

Decode the feature dict to TF-compatible input.

Args:

  • tfexample_data: Data or dictionary of data, as read by the tf-example reader. It corresponds to the tf.Tensor() (or dict of tf.Tensor()) extracted from the tf.train.Example, matching the info defined in get_serialized_info().

Returns:

  • tensor_data: Tensor or dictionary of tensors, as output by the tf.data.Dataset object.

encode_example

encode_example(example_data)

get_serialized_info

get_serialized_info()

Return the shape/dtype of features after encoding (for the adapter).

The FileAdapter then uses this information to write data to disk.

This function indicates how this feature is encoded internally in the files. Data produced by a DatasetBuilder is written to disk as tf.train.Example protos.

Ex:

return {
    'image': tfds.features.TensorInfo(shape=(None,), dtype=tf.uint8),
    'height': tfds.features.TensorInfo(shape=(), dtype=tf.int32),
    'width': tfds.features.TensorInfo(shape=(), dtype=tf.int32),
}

FeatureConnectors that are not containers should return the feature proto directly:

return tfds.features.TensorInfo(shape=(64, 64), dtype=tf.uint8)

If not defined, the returned values are automatically deduced from the get_tensor_info function.

Returns:

  • features: Either a dict of feature proto objects, or a single feature proto object.

get_tensor_info

get_tensor_info()

See base class for details.
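The relationship described above, where get_serialized_info falls back to get_tensor_info when a connector does not override it, can be modeled in plain Python. This is a toy sketch, not TFDS code: ToyTensorInfo, ToyFeatureConnector and ToyText are hypothetical stand-ins, dtypes are plain strings, and the shapes chosen for ToyText assume that an encoded Text feature is a variable-length int64 vector while an unencoded one is a scalar string.

```python
from typing import NamedTuple, Optional, Tuple


class ToyTensorInfo(NamedTuple):
    """Hypothetical stand-in for tfds.features.TensorInfo (dtype as a string)."""
    shape: Tuple[Optional[int], ...]
    dtype: str


class ToyFeatureConnector:
    def get_tensor_info(self):
        raise NotImplementedError

    def get_serialized_info(self):
        # Default: the on-disk format is deduced from get_tensor_info().
        return self.get_tensor_info()


class ToyText(ToyFeatureConnector):
    def __init__(self, encoder=None):
        self.encoder = encoder

    def get_tensor_info(self):
        # Assumption: with an encoder, text becomes a variable-length int vector;
        # without one, it stays a scalar string.
        if self.encoder is not None:
            return ToyTensorInfo(shape=(None,), dtype="int64")
        return ToyTensorInfo(shape=(), dtype="string")


print(ToyText().get_serialized_info())
print(ToyText(encoder=object()).get_serialized_info())
```

Because ToyText does not override get_serialized_info, what it writes to disk always matches what get_tensor_info reports.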

ints2str

ints2str(int_values)

Conversion list[int] => decoded string.

load_metadata

load_metadata(
    data_dir,
    feature_name
)

maybe_build_from_corpus

maybe_build_from_corpus(
    corpus_generator,
    **kwargs
)

Calls SubwordTextEncoder.build_from_corpus if encoder_cls is SubwordTextEncoder.

If self.encoder is None and self._encoder_cls is SubwordTextEncoder, the method sets self.encoder to the encoder returned by SubwordTextEncoder.build_from_corpus().

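Args:

  • corpus_generator: generator yielding str, from which subwords will be constructed.
  • **kwargs: keyword arguments forwarded to SubwordTextEncoder.build_from_corpus.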

maybe_set_encoder

maybe_set_encoder(new_encoder)

Set encoder, but no-op if encoder is already set.
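The two "maybe_" methods share one rule: an encoder, once set, is never replaced. The toy model below illustrates that rule only. ToyWordEncoder and ToyTextFeature are hypothetical; ToyWordEncoder.build_from_corpus collects whole words in first-seen order and merely stands in for SubwordTextEncoder.build_from_corpus, which learns subwords instead.

```python
class ToyWordEncoder:
    """Hypothetical word-level encoder standing in for SubwordTextEncoder."""

    def __init__(self, vocab):
        self.vocab = list(vocab)

    @classmethod
    def build_from_corpus(cls, corpus_generator, max_vocab=None):
        # Collect whole words in first-seen order (not real subword learning).
        seen = {}
        for text in corpus_generator:
            for tok in text.split():
                seen.setdefault(tok, None)
        vocab = list(seen)
        if max_vocab is not None:
            vocab = vocab[:max_vocab]
        return cls(vocab)


class ToyTextFeature:
    def __init__(self, encoder=None, encoder_cls=None):
        self.encoder = encoder
        self._encoder_cls = encoder_cls

    def maybe_set_encoder(self, new_encoder):
        # Only the first encoder sticks; later calls are no-ops.
        if self.encoder is None:
            self.encoder = new_encoder

    def maybe_build_from_corpus(self, corpus_generator, **kwargs):
        # Build only when no encoder exists yet and the configured class is the
        # corpus-trained kind; extra kwargs are forwarded to the builder.
        if self.encoder is not None or self._encoder_cls is not ToyWordEncoder:
            return
        self.encoder = ToyWordEncoder.build_from_corpus(corpus_generator, **kwargs)


feature = ToyTextFeature(encoder_cls=ToyWordEncoder)
feature.maybe_build_from_corpus(iter(["the cat", "the dog"]), max_vocab=10)
print(feature.encoder.vocab)  # ['the', 'cat', 'dog']
feature.maybe_set_encoder(ToyWordEncoder(["ignored"]))
print(feature.encoder.vocab)  # still ['the', 'cat', 'dog']
```

The max_vocab keyword is a hypothetical example of a forwarded kwarg; it is not a documented parameter of the real builder.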

save_metadata

save_metadata(
    data_dir,
    feature_name
)
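save_metadata and load_metadata both take a data_dir and a feature_name, which suggests per-feature files inside the dataset directory. The sketch below persists a vocabulary that way. The file name pattern "<feature_name>.tokens.txt", the one-token-per-line format, and the helper names are all assumptions for illustration, not the TFDS on-disk layout; tokens are assumed not to contain newlines.

```python
import os
import tempfile


def _vocab_path(data_dir, feature_name):
    # Hypothetical naming scheme: one file per feature inside data_dir.
    return os.path.join(data_dir, f"{feature_name}.tokens.txt")


def save_vocab(vocab, data_dir, feature_name):
    with open(_vocab_path(data_dir, feature_name), "w", encoding="utf-8") as f:
        for token in vocab:
            f.write(token + "\n")


def load_vocab(data_dir, feature_name):
    path = _vocab_path(data_dir, feature_name)
    if not os.path.exists(path):
        return None  # nothing was saved for this feature
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as data_dir:
    save_vocab(["the", "cat"], data_dir, "text")
    print(load_vocab(data_dir, "text"))  # ['the', 'cat']
```

Keying the file on feature_name lets several Text features in one dataset keep separate vocabularies in the same data_dir.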

str2ints

str2ints(str_value)

Conversion string => encoded list[int].
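str2ints and ints2str are inverse conversions that delegate to the feature's encoder. The toy round trip below uses a hypothetical word-level ToyVocabEncoder (id 0 reserved for padding and skipped on decode, an assumption of this sketch) inside a hypothetical ToyTextCodec. The sketch raises ValueError when no encoder is set; whether the real methods behave the same way is an assumption here.

```python
class ToyVocabEncoder:
    """Hypothetical word-level encoder; id 0 is reserved for padding."""

    def __init__(self, vocab):
        self._id_to_tok = list(vocab)
        self._tok_to_id = {tok: i + 1 for i, tok in enumerate(self._id_to_tok)}

    def encode(self, text):
        # Unknown words raise KeyError in this sketch.
        return [self._tok_to_id[tok] for tok in text.split()]

    def decode(self, ids):
        # Skip padding ids, map the rest back to tokens.
        return " ".join(self._id_to_tok[i - 1] for i in ids if i != 0)


class ToyTextCodec:
    def __init__(self, encoder=None):
        self.encoder = encoder

    def str2ints(self, str_value):
        if self.encoder is None:
            raise ValueError("str2ints needs an encoder")
        return self.encoder.encode(str_value)

    def ints2str(self, int_values):
        if self.encoder is None:
            raise ValueError("ints2str needs an encoder")
        return self.encoder.decode(int_values)


text = ToyTextCodec(encoder=ToyVocabEncoder(["hello", "world"]))
ids = text.str2ints("hello world hello")
print(ids)                 # [1, 2, 1]
print(text.ints2str(ids))  # hello world hello
```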