text.ByteSplitter

Splits a string tensor into bytes.

Inherits From: SplitterWithOffsets, Splitter

Methods

split

Splits a string tensor into bytes.

The strings are split into bytes. Thus, some Unicode characters may be split into multiple bytes.

Example:

ByteSplitter().split("hello")
<tf.Tensor: shape=(5,), dtype=uint8, numpy=array([104, 101, 108, 108, 111],
dtype=uint8)>
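
The example above uses only ASCII characters, so each character maps to one byte. The plain-Python sketch below illustrates the multi-byte case. It is not the ByteSplitter implementation, and it assumes the string is UTF-8 encoded, which is how TensorFlow stores Python `str` values in string tensors.

```python
# Plain-Python illustration of byte-level splitting; not the library's code.
# "\u00e9" is the single code point 'é', which UTF-8 encodes as two bytes.
text = "h\u00e9llo"

# Encode to UTF-8 and read each byte as an integer in [0, 255].
byte_values = list(text.encode("utf-8"))

print(len(text))         # 5 code points
print(len(byte_values))  # 6 bytes
print(byte_values)       # [104, 195, 169, 108, 108, 111]
```

The pair 195, 169 is the two-byte encoding of 'é', so a byte-level split yields six elements for a five-character string.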

Args
input: A RaggedTensor or Tensor of strings with any shape.

Returns
A RaggedTensor of bytes. The returned shape is the shape of the input tensor with an added ragged dimension for the bytes that make up each string.
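
To make the returned shape concrete, here is a minimal plain-Python sketch that uses nested lists in place of tensors. The helper `split_bytes` is hypothetical and only mimics the shape behavior described above; it assumes UTF-8 encoded strings.

```python
def split_bytes(strings):
    """Hypothetical helper: map every string in a nested list to its bytes."""
    if isinstance(strings, str):
        # Innermost level: one string becomes a variable-length list of bytes.
        return list(strings.encode("utf-8"))
    # Outer levels keep the structure of the input unchanged.
    return [split_bytes(s) for s in strings]


# A 2 x 2 batch of strings of different lengths (including an empty string).
batch = [["hello", "hi"], ["a", ""]]
result = split_bytes(batch)

print(result)
# [[[104, 101, 108, 108, 111], [104, 105]], [[97], []]]
```

The outer 2 x 2 structure is preserved, and the new innermost dimension has lengths 5, 2, 1, and 0, which is why the result is ragged.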

split_with_offsets

Splits a string tensor into bytes and returns the start and end offset of each byte.

The strings are split into bytes. Thus, some Unicode characters may be split into multiple bytes.

Example:

splitter = ByteSplitter()
bytes, starts, ends = splitter.split_with_offsets("hello")
print(bytes.numpy(), starts.numpy(), ends.numpy())
[104 101 108 108 111] [0 1 2 3 4] [1 2 3 4 5]

Args
input: A RaggedTensor or Tensor of strings with any shape.

Returns
A tuple (bytes, start_offsets, end_offsets) where:

  • bytes: A RaggedTensor of bytes. The returned shape is the shape of the input tensor with an added ragged dimension for the bytes that make up each string.
  • start_offsets: A RaggedTensor of the bytes' starting byte offset.
  • end_offsets: A RaggedTensor of the bytes' ending byte offset (exclusive).
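
The relationship between the bytes and their offsets can be sketched in plain Python. The helper `byte_offsets` is hypothetical and is not part of the library; it assumes a single UTF-8 encoded string and reproduces the values shown in the example above.

```python
def byte_offsets(text):
    """Hypothetical helper: return (bytes, starts, ends) for one string."""
    data = text.encode("utf-8")
    values = list(data)
    # Byte i starts at offset i and ends just before offset i + 1.
    starts = list(range(len(data)))
    ends = [start + 1 for start in starts]
    return values, starts, ends


values, starts, ends = byte_offsets("hello")
print(values, starts, ends)
# [104, 101, 108, 108, 111] [0, 1, 2, 3, 4] [1, 2, 3, 4, 5]

# Each (start, end) pair slices exactly one byte out of the encoded string.
data = "hello".encode("utf-8")
pieces = [data[s:e] for s, e in zip(starts, ends)]
print(pieces)
# [b'h', b'e', b'l', b'l', b'o']
```

Because every element is a single byte, each end offset is its start offset plus one.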