Apply to speak at TensorFlow World. Deadline April 23rd. Propose talk

Common Signatures for Text

This page describes common signatures that should be implemented by modules for tasks that accept text inputs.

Text feature vector

A text feature vector module creates a dense vector representation from text features. It accepts a batch of strings of shape [batch_size] and maps them to a float32 tensor of shape [batch_size, N]. This is often called text embedding in dimension N.

Basic usage

  embed = hub.Module("path/to/module")
  representations = embed([
      "A long sentence.",
      "single-word",
      "http://example.com"])

Feature column usage

    feature_columns = [
      hub.text_embedding_column("comment", "path/to/module", trainable=False),
    ]
    input_fn = tf.estimator.inputs.numpy_input_fn(features, labels, shuffle=True)
    estimator = tf.estimator.DNNClassifier(hidden_units, feature_columns)
    estimator.train(input_fn, max_steps=100)

Notes

Modules have been pre-trained on different domains and/or tasks, and therefore not every text feature vector module would be suitable for your problem. E.g.: some modules could have been trained on a single language.