tfds.features.Text

Class Text

FeatureConnector for text, encoding to integers with a TextEncoder.

Inherits From: FeatureConnector

Defined in core/features/text_feature.py.

__init__

__init__(
    encoder=None,
    encoder_config=None
)

Constructs a Text FeatureConnector.

Args:

Properties

dtype

Return the dtype (or dict of dtype) of this FeatureConnector.

encoder

serialized_keys

List of the flattened feature keys after serialization.

shape

Return the shape (or dict of shape) of this FeatureConnector.

vocab_size

Methods

decode_example

decode_example(tfexample_data)

encode_example

encode_example(example_data)

get_serialized_info

get_serialized_info()

Return the tf-example features for the adapter, as stored on disk.

This function indicates how this feature is encoded on disk internally. Examples produced by a DatasetBuilder are written to disk as tf.train.Example protos.

Ex:

return {
    'image': tf.VarLenFeature(tf.uint8),
    'height': tf.FixedLenFeature((), tf.int32),
    'width': tf.FixedLenFeature((), tf.int32),
}

FeatureConnectors that are not containers should return the feature proto directly:

return tf.FixedLenFeature((64, 64), tf.uint8)

If not defined, the returned values are automatically deduced from the get_tensor_info function.

Returns:

  • features: Either a dict of feature proto objects, or a single feature proto object.

get_tensor_info

get_tensor_info()

ints2str

ints2str(int_values)

Conversion list[int] => decoded string.

load_metadata

load_metadata(
    data_dir,
    feature_name
)

maybe_build_from_corpus

maybe_build_from_corpus(
    corpus_generator,
    **kwargs
)

Calls SubwordTextEncoder.build_from_corpus if encoder_cls is SubwordTextEncoder; otherwise does nothing.
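
The conditional behavior described above can be illustrated with a minimal pure-Python sketch. ToyVocabEncoder and ToyFeature are hypothetical stand-ins invented for this example; they are not part of tfds, and the whitespace-token vocabulary is only an assumption used in place of real subword building.

```python
class ToyVocabEncoder:
    """Hypothetical corpus-built encoder; a stand-in for SubwordTextEncoder."""

    def __init__(self, vocab):
        self.vocab = vocab

    @classmethod
    def build_from_corpus(cls, corpus_generator):
        # Collect whitespace-separated tokens in first-seen order.
        vocab = []
        for line in corpus_generator:
            for token in line.split():
                if token not in vocab:
                    vocab.append(token)
        return cls(vocab)


class ToyFeature:
    """Hypothetical feature; only mirrors the method name documented above."""

    def __init__(self, encoder_cls=None):
        self._encoder_cls = encoder_cls
        self.encoder = None

    def maybe_build_from_corpus(self, corpus_generator, **kwargs):
        # Only a corpus-built encoder class triggers a build; otherwise no-op.
        if self._encoder_cls is not ToyVocabEncoder:
            return
        self.encoder = ToyVocabEncoder.build_from_corpus(corpus_generator, **kwargs)


def corpus():
    # Hypothetical two-line corpus.
    yield "the cat"
    yield "the dog"


built = ToyFeature(encoder_cls=ToyVocabEncoder)
built.maybe_build_from_corpus(corpus())

skipped = ToyFeature(encoder_cls=None)
skipped.maybe_build_from_corpus(corpus())
```

Here `built.encoder` holds a vocabulary derived from the corpus, while `skipped.encoder` stays unset because its encoder class is not the corpus-built one.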

maybe_set_encoder

maybe_set_encoder(new_encoder)

Set encoder, but no-op if encoder is already set.
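
The "set once" semantics can be sketched as follows. ToyFeature is a hypothetical class written for illustration under the assumption that an unset encoder is represented by None; it is not the library's implementation.

```python
class ToyFeature:
    """Hypothetical feature that holds at most one encoder."""

    def __init__(self, encoder=None):
        self._encoder = encoder

    @property
    def encoder(self):
        return self._encoder

    def maybe_set_encoder(self, new_encoder):
        # Keep the existing encoder if one is already set.
        if self._encoder is None:
            self._encoder = new_encoder


feature = ToyFeature()
feature.maybe_set_encoder("first-encoder")   # sets the encoder
feature.maybe_set_encoder("second-encoder")  # ignored: already set
```

After both calls, `feature.encoder` is still the first value, which is why the method is safe to call from code that does not know whether an encoder was supplied at construction time.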

save_metadata

save_metadata(
    data_dir,
    feature_name
)

str2ints

str2ints(str_value)

Conversion string => encoded list[int].
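
As a rough illustration of how str2ints and ints2str relate, the sketch below round-trips a string through a toy byte-level encoder. ToyByteEncoder and ToyTextFeature are hypothetical names, and the choice to shift byte values by 1 so that 0 can serve as padding is an assumption made for this example, not a statement about the encoder tfds uses.

```python
class ToyByteEncoder:
    """Hypothetical stand-in for a TextEncoder; not the TFDS implementation."""

    def encode(self, s):
        # Shift each UTF-8 byte by 1 so that id 0 stays free for padding.
        return [b + 1 for b in s.encode("utf-8")]

    def decode(self, ids):
        # Drop padding zeros, undo the shift, and decode the bytes.
        return bytes(i - 1 for i in ids if i > 0).decode("utf-8")


class ToyTextFeature:
    """Hypothetical feature exposing the same helper names as Text."""

    def __init__(self, encoder):
        self._encoder = encoder

    def str2ints(self, str_value):
        # string => encoded list[int]
        return self._encoder.encode(str_value)

    def ints2str(self, int_values):
        # list[int] => decoded string
        return self._encoder.decode(int_values)


feature = ToyTextFeature(ToyByteEncoder())
ids = feature.str2ints("hi")
text = feature.ints2str(ids + [0, 0])  # trailing padding is ignored
```

Both helpers simply delegate to the encoder, so ints2str(str2ints(s)) returns s for any encoder whose decode inverts its encode.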