Create a tensor of n-grams based on `data`.
tf.strings.ngrams(
data,
ngram_width,
separator=' ',
pad_values=None,
padding_width=None,
preserve_short_sequences=False,
name=None
)
Creates a tensor of n-grams based on `data`. The n-grams are created by joining windows of `ngram_width` adjacent strings from the inner axis of `data` using `separator`.
The input data can be padded on both the start and end of the sequence, if desired, using the `pad_values` argument. If set, `pad_values` should be either a tuple of two strings or a single string. With a tuple, the 0th element pads the left side of the sequence and the 1st element pads the right side; a single string pads both sides. The `padding_width` arg controls how many padding values are added to each side; it defaults to `ngram_width-1`.
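The padding rules above can be sketched in plain Python for a single sequence. This is an illustrative model only, not the TensorFlow implementation: `padded_ngrams` is a hypothetical helper, it handles one flat list of strings, and it does no argument validation.

```python
def padded_ngrams(tokens, ngram_width, separator=" ",
                  pad_values=None, padding_width=None):
    """Pure-Python sketch of padded n-gram windowing (hypothetical helper)."""
    if pad_values is None:
        # No padding configured: nothing is added to either side.
        left, right, width = "", "", 0
    else:
        # A single string pads both sides; a tuple gives (left, right).
        left, right = ((pad_values, pad_values)
                       if isinstance(pad_values, str) else pad_values)
        # Default padding width is ngram_width - 1, as described above.
        width = ngram_width - 1 if padding_width is None else padding_width
    padded = [left] * width + list(tokens) + [right] * width
    # Slide a window of ngram_width over the padded sequence.
    return [separator.join(padded[i:i + ngram_width])
            for i in range(len(padded) - ngram_width + 1)]


print(padded_ngrams(["A", "B", "C", "D"], 2, pad_values=("LP", "RP")))
# ['LP A', 'A B', 'B C', 'C D', 'D RP']
```

With `ngram_width=2` the default padding width is 1, so one pad value is added on each side before the windows are formed.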
If this op is configured to not have padding, or if it is configured to add padding with `padding_width` set to less than `ngram_width-1`, it is possible that a sequence, or a sequence plus padding, is smaller than the ngram width. In that case, no ngrams will be generated for that sequence. This can be prevented by setting `preserve_short_sequences`, which will cause the op to always generate at least one ngram per non-empty sequence.
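The short-sequence case can be illustrated with a small plain-Python model. It is a sketch under a stated assumption: the text above only guarantees at least one ngram per non-empty sequence, and the model assumes that this single ngram is the whole sequence joined by the separator. `ngrams_or_short` is a hypothetical helper and covers only the unpadded case.

```python
def ngrams_or_short(tokens, ngram_width, separator=" ",
                    preserve_short_sequences=False):
    """Sketch: unpadded n-grams, optionally keeping too-short sequences."""
    tokens = list(tokens)
    grams = [separator.join(tokens[i:i + ngram_width])
             for i in range(len(tokens) - ngram_width + 1)]
    if not grams and tokens and preserve_short_sequences:
        # Assumption: the preserved n-gram is the entire sequence joined.
        grams = [separator.join(tokens)]
    return grams


print(ngrams_or_short(["A", "B"], 3))
# []
print(ngrams_or_short(["A", "B"], 3, preserve_short_sequences=True))
# ['A B']
```

An empty sequence still yields no ngrams in this model, matching the "non-empty" qualifier above.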
Examples:
tf.strings.ngrams(["A", "B", "C", "D"], 2).numpy()
array([b'A B', b'B C', b'C D'], dtype=object)
tf.strings.ngrams(["TF", "and", "keras"], 1).numpy()
array([b'TF', b'and', b'keras'], dtype=object)
Returns | |
---|---|
A `RaggedTensor` of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`. |
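As a quick arithmetic check of the `NUM_NGRAMS` formula, the sketch below evaluates it for a few sizes. `num_ngrams` is a hypothetical helper, and flooring the result at zero is an assumption drawn from the statement that no ngrams are generated for sequences shorter than the ngram width.

```python
def num_ngrams(seq_len, ngram_width, padding_width=0):
    """NUM_NGRAMS = S - ngram_width + 1 + 2 * padding_width."""
    # Assumption: a negative count means "no ngrams", so floor at zero.
    return max(0, seq_len - ngram_width + 1 + 2 * padding_width)


print(num_ngrams(4, 2))                   # 3, as in the ["A", "B", "C", "D"] example
print(num_ngrams(3, 1))                   # 3, as in the ["TF", "and", "keras"] example
print(num_ngrams(4, 2, padding_width=1))  # 5
print(num_ngrams(2, 3))                   # 0
```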
Raises | |
---|---|
`TypeError` | if `pad_values` is set to an invalid type. |
`ValueError` | if `pad_values`, `padding_width`, or `ngram_width` is set to an invalid value. |