tft.ngrams

Create a SparseTensor of n-grams.

Given a SparseTensor of tokens, returns a SparseTensor containing the ngrams that can be constructed from each row.

separator is inserted between adjacent tokens within each ngram, so " " would be an appropriate choice if the tokens are words, while "" would be an appropriate choice if they are characters.
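To make the effect of separator concrete, here is a minimal pure-Python sketch. It does not call tft.ngrams; join_adjacent is a hypothetical helper written only for this illustration, and it joins runs of a single fixed size.

```python
def join_adjacent(tokens, n, separator):
    """Join every run of n adjacent tokens with the given separator.

    Hypothetical helper for illustration; not part of tft.
    """
    return [separator.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Word tokens: a space keeps the words readable.
word_bigrams = join_adjacent(["Two", "was", "a", "rat"], 2, " ")
# Character tokens: an empty separator rebuilds contiguous substrings.
char_bigrams = join_adjacent(list("rat"), 2, "")

print(word_bigrams)  # ['Two was', 'was a', 'a rat']
print(char_bigrams)  # ['ra', 'at']
```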

Example:

>>> tokens = tf.SparseTensor(
...         indices=[[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [1, 3]],
...         values=['One', 'was', 'Johnny', 'Two', 'was', 'a', 'rat'],
...         dense_shape=[2, 4])
>>> print(tft.ngrams(tokens, ngram_range=(1, 3), separator=' '))
SparseTensor(indices=tf.Tensor(
    [[0 0] [0 1] [0 2] [0 3] [0 4] [0 5]
     [1 0] [1 1] [1 2] [1 3] [1 4] [1 5] [1 6] [1 7] [1 8]],
     shape=(15, 2), dtype=int64),
  values=tf.Tensor(
    [b'One' b'One was' b'One was Johnny' b'was' b'was Johnny' b'Johnny' b'Two'
     b'Two was' b'Two was a' b'was' b'was a' b'was a rat' b'a' b'a rat'
     b'rat'], shape=(15,), dtype=string),
  dense_shape=tf.Tensor([2 9], shape=(2,), dtype=int64))
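The ordering and shape of the output above can be reproduced with a short pure-Python sketch. This is an assumption-laden reference model inferred from the printed example, not the library's implementation: row_ngrams is a hypothetical name, rows are plain lists rather than a SparseTensor, and the ordering rule (by start position, then by size) is read off the example output.

```python
def row_ngrams(row, ngram_range, separator):
    """Return the ngrams of one row, ordered by start position, then size.

    Hypothetical reference model; the real op works on SparseTensors.
    """
    smallest, largest = ngram_range  # both ends are inclusive
    out = []
    for start in range(len(row)):
        for size in range(smallest, largest + 1):
            if start + size <= len(row):
                out.append(separator.join(row[start:start + size]))
    return out


rows = [["One", "was", "Johnny"], ["Two", "was", "a", "rat"]]
result = [row_ngrams(row, (1, 3), " ") for row in rows]

# The dense shape is (number of rows, longest row of ngrams).
dense_shape = [len(result), max(len(r) for r in result)]

print(result[0])    # ['One', 'One was', 'One was Johnny', 'was', 'was Johnny', 'Johnny']
print(dense_shape)  # [2, 9]
```

The first row yields 6 ngrams and the second yields 9, which matches the 15 values and the dense_shape of [2 9] in the example.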

Args:

tokens: A two-dimensional SparseTensor of dtype tf.string containing tokens that will be used to construct ngrams.
ngram_range: A pair with the range (inclusive) of ngram sizes to return.
separator: A string that will be inserted between tokens when ngrams are constructed.
name: (Optional) A name for this operation.

Returns:

A SparseTensor containing all ngrams from each row of the input. Note: if an ngram appears multiple times in the input row, it will be present the same number of times in the output. For unique ngrams, see tft.bag_of_words.
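The note about repeated ngrams can be illustrated with a pure-Python sketch. It assumes the same start-then-size ordering as the example output and uses a plain list in place of a SparseTensor row; the set at the end only models the idea of unique ngrams and makes no claim about how tft.bag_of_words orders or stores its result.

```python
from collections import Counter

row = ["a", "b", "a", "b"]

# Unigrams and bigrams of the row, keeping every occurrence.
ngrams = [
    " ".join(row[start:start + size])
    for start in range(len(row))
    for size in (1, 2)
    if start + size <= len(row)
]
counts = Counter(ngrams)

# De-duplicated view, analogous in spirit to a bag of words.
unique_ngrams = set(ngrams)

print(ngrams)         # ['a', 'a b', 'b', 'b a', 'a', 'a b', 'b']
print(counts["a b"])  # 2
print(len(unique_ngrams))  # 4
```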

Raises:

ValueError: if tokens is not 2D.
ValueError: if ngram_range[0] < 1 or ngram_range[1] < ngram_range[0].
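The ngram_range condition can be checked up front in calling code. The sketch below mirrors only the documented condition; check_ngram_range is a hypothetical helper, and its error message is invented for illustration rather than copied from the library.

```python
def check_ngram_range(ngram_range):
    """Validate a (smallest, largest) pair against the documented condition.

    Hypothetical helper; the message text is not the library's.
    """
    smallest, largest = ngram_range
    if smallest < 1 or largest < smallest:
        raise ValueError(f"Invalid ngram_range: {ngram_range}")
    return smallest, largest


# (0, 2) starts below 1; (3, 2) has its upper bound below its lower bound.
rejected = []
for candidate in [(0, 2), (3, 2)]:
    try:
        check_ngram_range(candidate)
    except ValueError:
        rejected.append(candidate)

print(check_ngram_range((1, 3)))  # (1, 3)
print(rejected)                   # [(0, 2), (3, 2)]
```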