Convert raw bytes from input tensor into numeric tensors.

Every component of the input tensor is interpreted as a sequence of bytes. These bytes are then decoded as numbers in the format specified by out_type.

>>> tf.io.decode_raw(tf.constant("1"), tf.uint8)
<tf.Tensor: shape=(1,), dtype=uint8, numpy=array([49], dtype=uint8)>
>>> tf.io.decode_raw(tf.constant("1,2"), tf.uint8)
<tf.Tensor: shape=(3,), dtype=uint8, numpy=array([49, 44, 50], dtype=uint8)>
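The uint8 values in these outputs are simply the ASCII codes of the input characters. A minimal standard-library sketch (plain Python, not TensorFlow; the names `raw` and `as_uint8` are illustrative) shows the same numbers:

```python
# Illustrative only: this mimics the uint8 case in plain Python,
# it is not how the TensorFlow op is implemented.
raw = "1,2".encode("ascii")  # the bytes the string is stored as
as_uint8 = list(raw)         # each byte read as an unsigned 8-bit integer

print(as_uint8)  # [49, 44, 50]
```

The comma is decoded like any other byte (44); no parsing of the text as a number takes place.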

Note that the rank of the output tensor is always one more than the input one:

>>> tf.io.decode_raw(tf.constant(["1","2"]), tf.uint8).shape
TensorShape([2, 1])
>>> tf.io.decode_raw(tf.constant([["1"],["2"]]), tf.uint8).shape
TensorShape([2, 1, 1])
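One way to think about the shape rule is that a new trailing dimension is appended, holding the numbers decoded from each string. The helper below is a hypothetical sketch of that rule, not part of TensorFlow; it assumes every string has the same byte length and that this length is a multiple of the item size.

```python
def decoded_shape(input_shape, n_bytes, itemsize):
    """Hypothetical helper: output shape for strings of n_bytes bytes each.

    itemsize is the size in bytes of one output value (1 for uint8, 2 for uint16).
    """
    # Assumption of this sketch: reject lengths that do not split evenly.
    if n_bytes % itemsize != 0:
        raise ValueError("byte length must be a multiple of the item size")
    return tuple(input_shape) + (n_bytes // itemsize,)

print(decoded_shape((), 1, 1))      # (1,)
print(decoded_shape((2,), 1, 1))    # (2, 1)
print(decoded_shape((2, 1), 1, 1))  # (2, 1, 1)
print(decoded_shape((), 4, 2))      # (2,)
```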

This is because each byte in the input is converted to a new value on the output (if the output type is uint8 or int8; otherwise chunks of input bytes get converted to a new value):

>>> tf.io.decode_raw(tf.constant("123"), tf.uint8)
<tf.Tensor: shape=(3,), dtype=uint8, numpy=array([49, 50, 51], dtype=uint8)>
>>> tf.io.decode_raw(tf.constant("1234"), tf.uint8)
<tf.Tensor: shape=(4,), dtype=uint8, numpy=array([49, 50, 51, 52], ...
>>> # chunked output
>>> tf.io.decode_raw(tf.constant("12"), tf.uint16)
<tf.Tensor: shape=(1,), dtype=uint16, numpy=array([12849], dtype=uint16)>
>>> tf.io.decode_raw(tf.constant("1234"), tf.uint16)
<tf.Tensor: shape=(2,), dtype=uint16, numpy=array([12849, 13363], ...
>>> # int64 output
>>> tf.io.decode_raw(tf.constant("12345678"), tf.int64)
<tf.Tensor: ... numpy=array([4050765991979987505])>
>>> tf.io.decode_raw(tf.constant("1234567887654321"), tf.int64)
<tf.Tensor: ... numpy=array([4050765991979987505, 3544952156018063160])>
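The chunked values can be reproduced with Python's `struct` module, which may help when checking where a number such as 12849 comes from. This is a sketch of the byte arithmetic only; `decode_chunks` is a hypothetical helper, and the format characters (`H` for uint16, `q` for int64) belong to `struct`, not to TensorFlow.

```python
import struct

def decode_chunks(raw, fmt_char, little_endian=True):
    """Hypothetical helper: read raw as consecutive fixed-size numbers."""
    prefix = "<" if little_endian else ">"
    itemsize = struct.calcsize(prefix + fmt_char)
    count, remainder = divmod(len(raw), itemsize)
    # Assumption of this sketch: the input must split into whole chunks.
    if remainder:
        raise ValueError("length is not a multiple of the item size")
    return list(struct.unpack(f"{prefix}{count}{fmt_char}", raw))

print(decode_chunks(b"12", "H"))        # [12849], i.e. 0x3231
print(decode_chunks(b"1234", "H"))      # [12849, 13363]
print(decode_chunks(b"12345678", "q"))  # [4050765991979987505]
```

With the default little-endian order, the first byte of each chunk is the least significant one, so `"12"` (bytes 0x31, 0x32) becomes 0x3231 = 12849.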

The operation allows specifying endianness via the little_endian parameter.

>>> tf.io.decode_raw(tf.constant("\x0a\x0b"), tf.int16)
<tf.Tensor: shape=(1,), dtype=int16, numpy=array([2826], dtype=int16)>
>>> hex(2826)
'0xb0a'
>>> tf.io.decode_raw(tf.constant("\x0a\x0b"), tf.int16, little_endian=False)
<tf.Tensor: shape=(1,), dtype=int16, numpy=array([2571], dtype=int16)>
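The two results differ only in which byte is treated as the most significant. A short standard-library sketch (not TensorFlow code; variable names are illustrative) reproduces both values with `int.from_bytes`:

```python
raw = b"\x0a\x0b"

# Little-endian: the first byte (0x0a) is the least significant -> 0x0b0a.
little = int.from_bytes(raw, byteorder="little", signed=True)
# Big-endian: the first byte (0x0a) is the most significant -> 0x0a0b.
big = int.from_bytes(raw, byteorder="big", signed=True)

print(little, hex(little))  # 2826 0xb0a
print(big, hex(big))        # 2571 0xa0b
```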

If the elements of input_bytes are of different length, you must specify fixed_length:

>>> tf.io.decode_raw(tf.constant([["1"],["23"]]), tf.uint8, fixed_length=4)
<tf.Tensor: shape=(2, 1, 4), dtype=uint8, numpy=
array([[[49,  0,  0,  0]],
       [[50, 51,  0,  0]]], dtype=uint8)>
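The output above suggests that shorter elements are padded on the right with zero bytes up to fixed_length. The sketch below imitates that in plain Python; `pad_to_length` is a hypothetical helper, and the zero padding is inferred from the example output rather than taken from the implementation.

```python
def pad_to_length(raw, fixed_length):
    """Hypothetical helper: right-pad raw with zero bytes up to fixed_length."""
    return raw.ljust(fixed_length, b"\x00")

rows = [list(pad_to_length(s, 4)) for s in (b"1", b"23")]
print(rows)  # [[49, 0, 0, 0], [50, 51, 0, 0]]
```

Because every row now has the same number of bytes, the results fit into one rectangular tensor.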

If the fixed_length value is larger than the size of the out_type dtype, multiple values are generated:

>>> tf.io.decode_raw(tf.constant(["1212"]), tf.uint16, fixed_length=4)
<tf.Tensor: shape=(1, 2), dtype=uint16, numpy=array([[12849, 12849]], ...

If the input value is larger than fixed_length, it is truncated:

>>> x = ''.join([chr(1), chr(2), chr(3), chr(4)])
>>> tf.io.decode_raw(x, tf.uint16, fixed_length=2)
<tf.Tensor: shape=(1,), dtype=uint16, numpy=array([513], dtype=uint16)>

If little_endian and fixed_length are specified, truncation to the fixed length occurs before endianness conversion:

>>> x = ''.join([chr(1), chr(2), chr(3), chr(4)])
>>> tf.io.decode_raw(x, tf.uint16, fixed_length=2, little_endian=False)
<tf.Tensor: shape=(1,), dtype=uint16, numpy=array([258], dtype=uint16)>
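The order of operations can be checked by hand: cut the input to fixed_length first, then interpret the remaining bytes in the requested byte order. A minimal standard-library sketch (illustrative names, not TensorFlow code):

```python
raw = bytes([1, 2, 3, 4])
fixed_length = 2

# Step 1: truncate to fixed_length; bytes 3 and 4 are dropped.
truncated = raw[:fixed_length]

# Step 2: interpret what is left in the requested byte order.
little = int.from_bytes(truncated, byteorder="little")  # 0x0201
big = int.from_bytes(truncated, byteorder="big")        # 0x0102

print(little)  # 513
print(big)     # 258
```

Both results use only the first two bytes, so the dropped bytes never influence the value regardless of endianness.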

If input values all have the same length, then specifying fixed_length equal to the size of the strings should not change output:

>>> x = ["12345678", "87654321"]
>>> tf.io.decode_raw(x, tf.int16)
<tf.Tensor: shape=(2, 4), dtype=int16, numpy=
array([[12849, 13363, 13877, 14391],
       [14136, 13622, 13108, 12594]], dtype=int16)>
>>> tf.io.decode_raw(x, tf.int16, fixed_length=len(x[0]))
<tf.Tensor: shape=(2, 4), dtype=int16, numpy=
array([[12849, 13363, 13877, 14391],
       [14136, 13622, 13108, 12594]], dtype=int16)>

input_bytes Each element of the input Tensor is converted to an array of bytes.

Currently, this must be a tensor of strings (bytes), although semantically the operation should support any input.

out_type DType of the output. Acceptable types are half, float, double, int32, uint16, uint8, int16, int8, int64.
little_endian Whether the input_bytes data is in little-endian format. Data will be converted into host byte order if necessary.