text.ngrams
Create a tensor of n-grams based on the input, data.
text.ngrams(
    data,
    width,
    axis=-1,
    reduction_type=None,
    string_separator=' ',
    name=None
)
Creates a tensor of n-grams based on data. Each n-gram is formed by combining a window of width adjacent elements of data, taken along the axis given by axis, using the reduction given by reduction_type. This op is intended to cover basic use cases; more complex combinations can be created with the sliding_window op.
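As a rough mental model of the windowing described above, the plain-Python sketch below treats one row of data as a list, takes every run of width adjacent elements, and combines each run with a reduction. It is an illustration under stated assumptions, not the TensorFlow Text implementation: sliding_windows and reduce_windows are hypothetical helper names, and the last reduction (a maximum) stands in for the kind of custom combination the text suggests building with sliding_window.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def sliding_windows(values: Sequence[T], width: int) -> List[List[T]]:
    """Return every run of `width` adjacent elements (hypothetical helper)."""
    # A sequence of length n has n - width + 1 full windows; if width > n,
    # the range below is empty and no windows are produced.
    return [list(values[i:i + width]) for i in range(len(values) - width + 1)]


def reduce_windows(values: Sequence[T], width: int,
                   combine: Callable[[List[T]], R]) -> List[R]:
    """Apply `combine` to each window, mimicking the n-gram reduction step."""
    return [combine(window) for window in sliding_windows(values, width)]


tokens = ["the", "quick", "brown", "fox"]
counts = [1, 4, 2, 8]

# STRING_JOIN-style reduction with a single-space separator.
bigrams = reduce_windows(tokens, 2, " ".join)
# SUM- and MEAN-style reductions over numeric data.
sums = reduce_windows(counts, 3, sum)
means = reduce_windows(counts, 2, lambda w: sum(w) / len(w))
# A custom reduction built from the same windows.
maxima = reduce_windows(counts, 2, max)

print(bigrams)  # ['the quick', 'quick brown', 'brown fox']
print(sums)     # [7, 14]
print(means)    # [2.5, 3.0, 5.0]
print(maxima)   # [4, 4, 8]
```

Separating "make windows" from "combine each window" is what lets an arbitrary reduction be plugged in.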
import tensorflow as tf
import tensorflow_text as text

input_data = tf.ragged.constant([["e", "f", "g"], ["dd", "ee"]])
text.ngrams(
    input_data,
    width=2,
    axis=-1,
    reduction_type=text.Reduction.STRING_JOIN,
    string_separator="|")
# <tf.RaggedTensor [[b'e|f', b'f|g'], [b'dd|ee']]>
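The ragged example can be reproduced row by row in plain Python, which makes two points visible: each row is windowed independently, so rows yield different numbers of n-grams, and a row shorter than width yields no n-grams at all. This is a hypothetical model (string_join_ngrams is not a TensorFlow Text function), and the width=3 result is what the model predicts, assuming the op behaves as the width argument description states.

```python
from typing import List, Sequence


def string_join_ngrams(rows: Sequence[Sequence[str]], width: int,
                       separator: str = " ") -> List[List[bytes]]:
    """Model STRING_JOIN n-grams over a ragged batch (hypothetical helper).

    Each row is windowed independently along its last axis, so rows of
    different lengths produce different numbers of n-grams.
    """
    result = []
    for row in rows:
        grams = [separator.join(row[i:i + width])
                 for i in range(len(row) - width + 1)]
        # TensorFlow string tensors hold bytes, hence the b'...' values
        # in the example output above.
        result.append([g.encode("utf-8") for g in grams])
    return result


input_data = [["e", "f", "g"], ["dd", "ee"]]
print(string_join_ngrams(input_data, width=2, separator="|"))
# [[b'e|f', b'f|g'], [b'dd|ee']]
print(string_join_ngrams(input_data, width=3, separator="|"))
# [[b'e|f|g'], []]
```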
Args
  data: The data to reduce.
  width: The width of the n-gram window. If there is not enough data to fill the n-gram window, the resulting n-gram will be empty.
  axis: The axis along which to create n-grams. For string join reductions, only axis -1 is supported; for other reductions, any positive or negative axis can be used. Should be a constant.
  reduction_type: A member of the Reduction enum. Should be a constant. Currently supports:
    Reduction.SUM: Add values in the window.
    Reduction.MEAN: Average values in the window.
    Reduction.STRING_JOIN: Join strings in the window. Note that axis must be -1 here.
  string_separator: The separator string used for Reduction.STRING_JOIN. Ignored otherwise. Must be a string constant, not a Tensor.
  name: The op name.
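For non-string reductions, the choice of axis changes which elements are combined. The sketch below models a SUM reduction over a small dense 2-D Python list along either axis; sum_ngrams_2d is a hypothetical helper, and plain lists stand in for a tf.Tensor.

```python
from typing import List

Matrix = List[List[int]]


def sum_ngrams_2d(matrix: Matrix, width: int, axis: int = -1) -> Matrix:
    """SUM-style n-grams over a dense 2-D list (hypothetical helper)."""
    if axis not in (-2, -1, 0, 1):
        raise ValueError("axis must be -2, -1, 0 or 1 for 2-D input")
    axis %= 2  # normalize negative axes: -1 -> 1, -2 -> 0
    if axis == 1:
        # Window along each row: add `width` horizontally adjacent values.
        return [[sum(row[i:i + width]) for i in range(len(row) - width + 1)]
                for row in matrix]
    # Window down the columns: add `width` vertically adjacent rows element-wise.
    return [[sum(column) for column in zip(*matrix[i:i + width])]
            for i in range(len(matrix) - width + 1)]


data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(sum_ngrams_2d(data, width=2, axis=-1))  # [[3, 5], [9, 11], [15, 17]]
print(sum_ngrams_2d(data, width=2, axis=0))   # [[5, 7, 9], [11, 13, 15]]
print(sum_ngrams_2d(data, width=4, axis=0))   # []
```

Along axis 0 the result keeps three columns but loses a row; along axis -1 it keeps three rows, each one element shorter.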
Returns
  A tensor of n-grams. If the input is a tf.Tensor, the output will also be a tf.Tensor; if the input is a tf.RaggedTensor, the output will be a tf.RaggedTensor.
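Because a dense input gives a dense output, the result shape can be worked out ahead of time. The helper below is a hypothetical sketch that assumes one n-gram per full window of width adjacent elements along axis, and zero n-grams when that axis is shorter than width; for a tf.RaggedTensor the same count would apply to each row separately.

```python
from typing import Sequence, Tuple


def ngrams_output_shape(shape: Sequence[int], width: int,
                        axis: int = -1) -> Tuple[int, ...]:
    """Predict the dense output shape of the n-gram op (hypothetical helper).

    Assumes one n-gram per full window along `axis` and none when that
    axis is shorter than `width`; all other dimensions are unchanged.
    """
    dims = list(shape)
    axis %= len(dims)  # normalize negative axes
    dims[axis] = max(dims[axis] - width + 1, 0)
    return tuple(dims)


print(ngrams_output_shape((2, 5), width=3))             # (2, 3)
print(ngrams_output_shape((4, 6, 8), width=2, axis=0))  # (3, 6, 8)
print(ngrams_output_shape((2, 2), width=3))             # (2, 0)
```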
Raises
  InvalidArgumentError: If reduction_type is either None or not a Reduction, or if reduction_type is STRING_JOIN and axis is not -1.
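The two documented error conditions amount to a simple argument check. The sketch below mirrors them with local stand-ins: this Reduction enum, its member values, and the InvalidArgumentError class are assumptions for illustration, not the TensorFlow Text definitions, and validate_ngrams_args is a hypothetical name.

```python
import enum


class Reduction(enum.Enum):
    """Local stand-in for the Reduction enum; member values are assumed."""
    SUM = "sum"
    MEAN = "mean"
    STRING_JOIN = "string_join"


class InvalidArgumentError(ValueError):
    """Local stand-in for TensorFlow's InvalidArgumentError."""


def validate_ngrams_args(reduction_type, axis: int) -> None:
    """Raise if the arguments hit either documented error condition."""
    # isinstance() is False for None, so this also rejects a missing reduction.
    if not isinstance(reduction_type, Reduction):
        raise InvalidArgumentError("reduction_type must be a Reduction member")
    # String joins are only supported along the last axis.
    if reduction_type is Reduction.STRING_JOIN and axis != -1:
        raise InvalidArgumentError("STRING_JOIN requires axis=-1")


validate_ngrams_args(Reduction.SUM, axis=0)           # accepted
validate_ngrams_args(Reduction.STRING_JOIN, axis=-1)  # accepted
for bad_args in [(None, -1), (Reduction.STRING_JOIN, 0)]:
    try:
        validate_ngrams_args(*bad_args)
    except InvalidArgumentError as err:
        print(err)
```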
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2024-12-20 UTC.