Find the sentence fragments in a given text. (deprecated)
text.sentence_fragments(
token_word,
token_starts,
token_ends,
token_properties,
input_encoding='UTF-8',
errors='replace',
replacement_char=65533,
replace_control_characters=False
)
A sentence fragment is a potential next sentence determined using deterministic heuristics based on punctuation, capitalization, and similar text attributes.
Args | |
---|---|
token_word
|
A Tensor (w/ rank=2) or a RaggedTensor (w/ ragged_rank=1) containing the token strings. |
token_starts
|
A Tensor (w/ rank=2) or a RaggedTensor (w/ ragged_rank=1) containing offsets where the token starts. |
token_ends
|
A Tensor (w/ rank=2) or a RaggedTensor (w/ ragged_rank=1) containing offsets where the token ends. |
token_properties
|
A Tensor (w/ rank=2) or a RaggedTensor (w/ ragged_rank=1)
containing a bitmask.
The values of the bitmask are:
|
input_encoding
|
String name for the unicode encoding that should be used to decode each string. |
errors
|
Specifies the response when an input string can't be converted
using the indicated encoding. One of:
|
replacement_char
|
The replacement codepoint to be used in place of invalid
substrings in input when errors='replace', and in place of C0 control
characters in input when replace_control_characters=True.
|
replace_control_characters
|
Whether to replace the C0 control characters
(U+0000 - U+001F) with the replacement_char.
|
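The interaction between errors, replacement_char, and replace_control_characters can be illustrated with a small pure-Python sketch. It does not call the library and is not its implementation; the helper name `decode_with_replacement` is hypothetical. It only mirrors the behaviour described in the table above: invalid UTF-8 becomes the replacement codepoint (65533 is U+FFFD), and C0 control characters are replaced only when requested.

```python
from typing import List


def decode_with_replacement(raw: bytes,
                            replacement_char: int = 65533,
                            replace_control_characters: bool = False) -> List[int]:
    """Hypothetical helper: decode UTF-8 bytes to codepoints as errors='replace' would."""
    # Python's 'replace' handler substitutes U+FFFD (65533) for undecodable bytes.
    decoded = raw.decode("utf-8", errors="replace")
    codepoints = []
    for ch in decoded:
        cp = ord(ch)
        if cp == 0xFFFD:
            # Simplification: a genuine U+FFFD in the input is treated the same
            # way as one produced by a decoding error.
            cp = replacement_char
        elif replace_control_characters and cp <= 0x1F:
            # C0 control characters span U+0000 to U+001F.
            cp = replacement_char
        codepoints.append(cp)
    return codepoints


invalid = decode_with_replacement(b"Hi\xff")
kept = decode_with_replacement(b"a\tb")
replaced = decode_with_replacement(b"a\tb", replacement_char=63,
                                   replace_control_characters=True)
```

With the defaults, a tab character passes through unchanged; it is replaced only when replace_control_characters is enabled.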
Returns | |
---|---|
A RaggedTensor of fragment_start, fragment_end, fragment_properties
and terminal_punc_token.
|
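As a rough illustration of the input layout, the sketch below builds parallel per-document lists of token strings, start offsets, and end offsets using only the standard library. The helper `tokenize_with_offsets` and its regex tokenizer are hypothetical, and two things are assumptions not stated above: that end offsets are exclusive, and that character offsets are acceptable (they coincide with byte offsets only for ASCII text).

```python
import re


def tokenize_with_offsets(docs):
    """Hypothetical helper: return parallel (words, starts, ends) nested lists."""
    words, starts, ends = [], [], []
    for doc in docs:
        # Split into runs of word characters and single punctuation marks.
        matches = list(re.finditer(r"\w+|[^\w\s]", doc))
        words.append([m.group() for m in matches])
        starts.append([m.start() for m in matches])
        ends.append([m.end() for m in matches])  # exclusive end (assumption)
    return words, starts, ends


words, starts, ends = tokenize_with_offsets(["Hello there. Bye!"])
```

Nested lists like these have one row per document and one entry per token, which matches the rank-2 / ragged_rank-1 shape the arguments expect; they could be wrapped with `tf.ragged.constant` before being passed in, alongside a token_properties bitmask of the same shape.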