Decodes each string in `input` into a sequence of Unicode code points.
The character codepoints for all strings are returned using a single vector `char_values`, with strings expanded to characters in row-major order.
The `row_splits` tensor indicates where the codepoints for each input string begin and end within the `char_values` tensor. In particular, the values for the `i`th string (in row-major order) are stored in the slice `[row_splits[i]:row_splits[i+1]]`. Thus:
- `char_values[row_splits[i]+j]` is the Unicode codepoint for the `j`th character in the `i`th string (in row-major order).
- `row_splits[i+1] - row_splits[i]` is the number of characters in the `i`th string (in row-major order).
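The layout described above can be sketched in plain Java, without TensorFlow. This is only an illustration of the `char_values` / `row_splits` indexing rules: the class `RaggedCodePoints` and its methods are hypothetical names, and the JDK's `String.codePoints()` stands in for the op's decoder.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that mirrors the char_values / row_splits layout.
final class RaggedCodePoints {
    final int[] charValues;  // code points of all strings, concatenated
    final long[] rowSplits;  // string i occupies [rowSplits[i], rowSplits[i+1])

    private RaggedCodePoints(int[] charValues, long[] rowSplits) {
        this.charValues = charValues;
        this.rowSplits = rowSplits;
    }

    static RaggedCodePoints decode(List<String> strings) {
        long[] splits = new long[strings.size() + 1];  // splits[0] stays 0
        int[] values = new int[0];
        for (int i = 0; i < strings.size(); i++) {
            int[] cps = strings.get(i).codePoints().toArray();
            int oldLen = values.length;
            values = Arrays.copyOf(values, oldLen + cps.length);
            System.arraycopy(cps, 0, values, oldLen, cps.length);
            splits[i + 1] = values.length;  // end of string i, start of string i+1
        }
        return new RaggedCodePoints(values, splits);
    }

    // Code point of the j-th character of the i-th string.
    int codePointAt(int i, int j) {
        return charValues[(int) rowSplits[i] + j];
    }

    // Number of characters in the i-th string.
    long length(int i) {
        return rowSplits[i + 1] - rowSplits[i];
    }
}
```

For the three strings `"hi"`, `""` and `"\u00e9\uD83D\uDE00"`, this sketch yields `charValues = [104, 105, 233, 128512]` and `rowSplits = [0, 2, 2, 4]`; the empty string shows up as two equal consecutive splits.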
Nested Classes
class | UnicodeDecode.Options | Optional attributes for UnicodeDecode |
Public Methods
Output<Integer> | charValues() | A 1D int32 Tensor containing the decoded codepoints. |
static <T extends Number> UnicodeDecode<T> | create(Scope scope, Operand<String> input, String inputEncoding, Class<T> Tsplits, Options... options) | Factory method to create a class wrapping a new UnicodeDecode operation. |
static UnicodeDecode<Long> | create(Scope scope, Operand<String> input, String inputEncoding, Options... options) | Factory method to create a class wrapping a new UnicodeDecode operation using default output types. |
static UnicodeDecode.Options | errors(String errors) | |
static UnicodeDecode.Options | replaceControlCharacters(Boolean replaceControlCharacters) | |
static UnicodeDecode.Options | replacementChar(Long replacementChar) | |
Output<T> | rowSplits() | A 1D tensor of type `T` (the `Tsplits` type; `Long` for the default factory) containing the row splits. |
Public Methods
public static UnicodeDecode<T> create(Scope scope, Operand<String> input, String inputEncoding, Class<T> Tsplits, Options... options)
Factory method to create a class wrapping a new UnicodeDecode operation.
Parameters
scope | current scope |
---|---|
input | The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values. |
inputEncoding | Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`. |
options | carries optional attribute values |
Returns
- a new instance of UnicodeDecode
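Because `input` can have any shape while the output is a flat vector, it may help to see what row-major flattening means for a 2D input. The following is a plain-Java sketch under that reading of the text above; `RowMajorFlatten` is a hypothetical name and no TensorFlow API is involved.

```java
// Hypothetical helper: flattens a 2D array of strings in row-major order,
// i.e. all of row 0, then all of row 1, and so on.
final class RowMajorFlatten {
    static String[] flatten(String[][] input) {
        int total = 0;
        for (String[] row : input) {
            total += row.length;
        }
        String[] flat = new String[total];
        int next = 0;
        for (String[] row : input) {
            for (String s : row) {
                flat[next++] = s;
            }
        }
        return flat;
    }
}
```

For a rectangular input with `cols` columns, the string at `[r][c]` lands at flat index `r * cols + c`, which is the index `i` used with `row_splits` in the description at the top of this page.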
public static UnicodeDecode<Long> create(Scope scope, Operand<String> input, String inputEncoding, Options... options)
Factory method to create a class wrapping a new UnicodeDecode operation using default output types.
Parameters
scope | current scope |
---|---|
input | The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values. |
inputEncoding | Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`. |
options | carries optional attribute values |
Returns
- a new instance of UnicodeDecode
public static UnicodeDecode.Options errors(String errors)
Parameters
errors | Error handling policy when invalid formatting is found in the input. A value of 'strict' will cause the operation to produce an InvalidArgument error on any invalid input formatting. A value of 'replace' (the default) will cause the operation to replace any invalid formatting in the input with the `replacement_char` codepoint. A value of 'ignore' will cause the operation to skip any invalid formatting in the input and produce no corresponding output character. |
---|
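The three policies have close analogues in the JDK's own `CharsetDecoder`, which can serve as a rough mental model. The sketch below is an analogy only: it uses `java.nio.charset`, not the ICU-based decoder behind this op, so the two may differ in details such as how many replacement characters a longer invalid sequence produces. `ErrorPolicySketch` and `decodeUtf8` are hypothetical names, and the mapping of 'strict' to an `IllegalArgumentException` is an assumption made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that maps the documented policy names onto JDK actions.
final class ErrorPolicySketch {
    static int[] decodeUtf8(byte[] bytes, String errors) {
        CodingErrorAction action;
        switch (errors) {
            case "strict":  action = CodingErrorAction.REPORT;  break; // fail on bad input
            case "replace": action = CodingErrorAction.REPLACE; break; // substitute U+FFFD
            case "ignore":  action = CodingErrorAction.IGNORE;  break; // drop bad input
            default: throw new IllegalArgumentException("unknown policy: " + errors);
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            // The JDK decoder's default replacement string is "\uFFFD".
            return decoder.decode(ByteBuffer.wrap(bytes)).toString().codePoints().toArray();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("invalid input formatting", e);
        }
    }
}
```

With the bytes `'a'`, `0xFF`, `'b'` (where `0xFF` is never valid in UTF-8), 'replace' gives the code points 97, 65533, 98, 'ignore' gives 97, 98, and 'strict' throws.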
public static UnicodeDecode.Options replaceControlCharacters(Boolean replaceControlCharacters)
Parameters
replaceControlCharacters | Whether to replace the C0 control characters (00-1F) with the `replacement_char`. Default is false. |
---|
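As an illustration of what this option describes, the plain-Java sketch below replaces every code point in the C0 range 00-1F with a replacement code point. It is not the op's implementation: `ControlCharSketch` and `replaceC0` are hypothetical names, and treating the whole range uniformly (tab and newline included) is an assumption based only on the wording above.

```java
// Hypothetical helper: substitutes C0 control characters (0x00-0x1F).
final class ControlCharSketch {
    static int[] replaceC0(int[] codePoints, int replacementChar) {
        int[] out = codePoints.clone();  // leave the caller's array untouched
        for (int i = 0; i < out.length; i++) {
            if (out[i] >= 0x00 && out[i] <= 0x1F) {
                out[i] = replacementChar;
            }
        }
        return out;
    }
}
```

Applied to the code points of `"a\tb\n"` (97, 9, 98, 10) with replacement 0xFFFD, the sketch returns 97, 65533, 98, 65533.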
public static UnicodeDecode.Options replacementChar(Long replacementChar)
Parameters
replacementChar | The replacement character codepoint to be used in place of any invalid formatting in the input when `errors='replace'`. Any valid Unicode codepoint may be used. The default value is the Unicode replacement character, U+FFFD (decimal 65533). |
---|