Common Signatures for Text

This page describes common signatures that should be implemented by modules for tasks that accept text inputs.

Text feature vector

A text feature vector module creates a dense vector representation from text features. It accepts a batch of strings of shape [batch_size] and maps them to a float32 tensor of shape [batch_size, N]. This is often called a text embedding of dimension N.
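
The shape contract described above can be illustrated with a toy stand-in. This is only a sketch of the signature, not how any real module computes embeddings: the function name `toy_embed`, the dimension `N = 8`, and the hash-based values are all assumptions made for illustration, and it returns plain Python lists rather than a float32 tensor.

```python
import hashlib
import struct
from typing import List, Sequence

N = 8  # hypothetical embedding dimension, chosen for illustration


def toy_embed(batch: Sequence[str], dim: int = N) -> List[List[float]]:
    """Map a batch of strings to a [batch_size, dim] nested list of floats.

    Stand-in for the signature only: values come from a hash, so the
    vectors carry no semantic meaning, unlike those of a trained module.
    """
    rows = []
    for text in batch:
        row = []
        for i in range(dim):
            digest = hashlib.sha256(f"{i}:{text}".encode("utf-8")).digest()
            # Read the first 4 bytes as an unsigned int and scale to [-1, 1].
            (value,) = struct.unpack(">I", digest[:4])
            row.append(value / 0xFFFFFFFF * 2.0 - 1.0)
        rows.append(row)
    return rows


batch = ["A long sentence.", "single-word"]  # example inputs (assumed)
vectors = toy_embed(batch)
shape = (len(vectors), len(vectors[0]))
print(shape)  # (2, 8): one row per input string, N values per row
```

The point of the sketch is that the output always has one row per input string and a fixed number of columns, regardless of how long each string is.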

Basic usage

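  import tensorflow_hub as hub
  # The result is a float32 tensor with one row of N values per input string.
  embed = hub.Module("path/to/module")
  representations = embed([
      "A long sentence.",
  ])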

Feature column usage

    import tensorflow as tf
    import tensorflow_hub as hub

    # features, labels and hidden_units are assumed to be defined by the caller.
    feature_columns = [
      hub.text_embedding_column("comment", "path/to/module", trainable=False),
    ]
    input_fn = tf.estimator.inputs.numpy_input_fn(features, labels, shuffle=True)
    estimator = tf.estimator.DNNClassifier(hidden_units, feature_columns)
    estimator.train(input_fn, max_steps=100)


Modules are pre-trained on different domains and/or tasks, so not every text feature vector module is suitable for your problem. For example, some modules may have been trained on a single language.