text.normalize_utf8

Normalizes each UTF-8 string in the input tensor using the specified rule.

Used in the notebooks

Used in the tutorials

See http://unicode.org/reports/tr15/

Examples:

# input: <string>[num_strings]
normalize_utf8(["株式会社", "KADOKAWA"])
# output: <string>[num_strings]
<tf.Tensor: shape=(2,), dtype=string, numpy=
array([b'\xe6\xa0\xaa\xe5\xbc\x8f\xe4\xbc\x9a\xe7\xa4\xbe', b'KADOKAWA'],
      dtype=object)>

input A Tensor or RaggedTensor of type string. (Must be UTF-8.)
normalization_form One of the following string values ('NFC', 'NFKC', 'NFD', 'NFKD'). Default is 'NFKC'.
name The name for this op (optional).

A Tensor or RaggedTensor of type string, with normalized contents.