View source on GitHub |
Normalizes each UTF-8 string in the input tensor using the specified rule.
text.normalize_utf8_with_offsets_map(
input, normalization_form='NFKC', name=None
)
Returns normalized strings and an offset map used by another operation to map post-normalized string offsets to pre-normalized string offsets.
See http://unicode.org/reports/tr15/
Examples:
# input: <string>[num_strings]
normalize_utf8_with_offsets_map(["株式会社", "KADOKAWA"])
# output: <string>[num_strings], <variant>[num_strings]
NormalizeUTF8WithOffsetsMap(output=<tf.Tensor: shape=(2,), dtype=string,
numpy=
array([b'\xe6\xa0\xaa\xe5\xbc\x8f\xe4\xbc\x9a\xe7\xa4\xbe', b'KADOKAWA'],
dtype=object)>, offsets_map=<tf.Tensor: shape=(2,), dtype=variant,
numpy=<unprintable>>)