
tft.bag_of_words

tft.bag_of_words(
    tokens,
    ngram_range,
    separator,
    name=None
)

Computes a bag of "words" based on the specified ngram configuration.

A light wrapper around tft.ngrams: it first computes the ngrams of each row, then converts that ngram representation (list semantics) into a bag of words (set semantics). Each output row holds the set of unique ngrams present in the corresponding input record.

See tft.ngrams for more information.

Args:

  • tokens: A two-dimensional SparseTensor of dtype tf.string containing the tokens used to construct the bag of words.
  • ngram_range: A pair giving the inclusive range of ngram sizes to compute; for example, (1, 2) yields unigrams and bigrams.
  • separator: A string inserted between tokens when ngrams are constructed.
  • name: (Optional) A name for this operation.

Returns:

A SparseTensor containing the unique set of ngrams from each row of the input. Note: the original order of the ngrams may not be preserved.
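
The difference between list semantics and set semantics can be illustrated with a minimal pure-Python sketch. This is not the library implementation: it works on plain lists of strings rather than a SparseTensor, the helper names ngrams_per_row and bag_of_words_per_row are hypothetical, and the sorted output order is an artifact of the sketch (the real op makes no ordering guarantee).

```python
def ngrams_per_row(rows, ngram_range, separator):
    """Hypothetical helper: all ngrams of each row, duplicates kept (list semantics)."""
    min_n, max_n = ngram_range  # inclusive range of ngram sizes
    result = []
    for tokens in rows:
        row_ngrams = []
        for start in range(len(tokens)):
            for n in range(min_n, max_n + 1):
                # Only keep ngrams that fit entirely inside the row.
                if start + n <= len(tokens):
                    row_ngrams.append(separator.join(tokens[start:start + n]))
        result.append(row_ngrams)
    return result


def bag_of_words_per_row(rows, ngram_range, separator):
    """Hypothetical helper: unique ngrams of each row (set semantics)."""
    # Sorting is only for a deterministic result in this sketch.
    return [sorted(set(row)) for row in ngrams_per_row(rows, ngram_range, separator)]


rows = [["a", "b", "a", "b"], ["c"]]
ngram_lists = ngrams_per_row(rows, (1, 2), " ")
bags = bag_of_words_per_row(rows, (1, 2), " ")
print(ngram_lists[0])  # duplicates such as "a" and "a b" appear twice
print(bags[0])         # each distinct ngram appears once
```

In the first row the repeated tokens produce repeated ngrams in the list form, while the bag keeps a single copy of each.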