tff.analytics.heavy_hitters.iblt.IbltDecoder

Decodes the strings and counts stored in an IBLT data structure.

Args
iblt Tensor representing the IBLT computed by the IbltEncoder.
capacity Number of distinct strings that we expect to be inserted.
string_max_bytes Maximum length of a string in bytes that can be inserted.
encoding The character encoding of the string data to decode. For non-character binary data or strings with unknown encoding, specify CharacterEncoding.UNKNOWN. Defaults to CharacterEncoding.UTF8.
seed Integer seed for hash functions. Defaults to 0.
repetitions Number of repetitions in the IBLT data structure (must be >= 3). Defaults to 3.
hash_family A str specifying the hash family used to construct the IBLT. Options include coupled or random; the default is chosen based on capacity.
hash_family_params An optional dict of parameters that the hash family hasher expects. Defaults are chosen based on capacity.
field_size The field size for all values in the IBLT. Defaults to 2**31 - 1.

Methods

decode_string_from_chunks


Computes a string from a sequence of ints, each encoding 'chunk_length' bytes.

Inverse of IbltEncoder.compute_iblt.

Args
chunks A tf.Tensor of num_chunks integers.

Returns
A tf.Tensor with the string encoded in the chunks.
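
To make the idea of chunked string encoding concrete, here is a minimal pure-Python sketch of a round trip between a string and a list of integer chunks. It assumes each chunk packs CHUNK_LENGTH bytes in big-endian order and that the string is zero-padded to a whole number of chunks. The helper names, the chunk length of 4, and the byte order are illustrative assumptions and may not match the library's actual layout.

```python
CHUNK_LENGTH = 4  # bytes per integer chunk (assumed for illustration)


def encode_string_to_chunks(text: str, num_chunks: int) -> list:
    """Packs the UTF-8 bytes of `text` into `num_chunks` integers."""
    data = text.encode("utf-8")
    max_bytes = num_chunks * CHUNK_LENGTH
    if len(data) > max_bytes:
        raise ValueError(f"string exceeds {max_bytes} bytes")
    # Zero-pad so the data splits into a whole number of chunks.
    padded = data.ljust(max_bytes, b"\x00")
    return [
        int.from_bytes(padded[i:i + CHUNK_LENGTH], "big")
        for i in range(0, max_bytes, CHUNK_LENGTH)
    ]


def decode_string_from_chunks(chunks: list) -> str:
    """Inverse of `encode_string_to_chunks`: unpacks chunks, strips padding."""
    data = b"".join(c.to_bytes(CHUNK_LENGTH, "big") for c in chunks)
    return data.rstrip(b"\x00").decode("utf-8")


chunks = encode_string_to_chunks("heavy", num_chunks=2)
restored = decode_string_from_chunks(chunks)
```

In this sketch, num_chunks * CHUNK_LENGTH plays the role that string_max_bytes plays for the decoder: it bounds the longest string that can be stored.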

get_freq_estimates


Decodes key-value pairs from an IBLT.

Note that this method only works for UTF-8 strings and only when TensorFlow is running in eager mode.

Returns
A dictionary mapping each decoded key to its frequency.
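
As a rough illustration of the shape of such a dictionary, the sketch below builds one from decoded byte strings and their counts. The helper name to_freq_dict and the sample data are hypothetical; how the library itself assembles the dictionary is not stated here. The UTF-8 decoding step also suggests why the method is limited to UTF-8 strings.

```python
def to_freq_dict(out_strings, out_counts):
    """Maps each decoded UTF-8 string to its count (hypothetical helper)."""
    return {
        raw.decode("utf-8"): int(count)
        for raw, count in zip(out_strings, out_counts)
    }


# Sample decoded output, made up for illustration.
freqs = to_freq_dict([b"apple", b"banana"], [3, 1])
```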

get_freq_estimates_tf


Decodes key-value pairs from an IBLT.

Returns
(out_strings, out_counts, num_not_decoded), where out_strings is a tf.Tensor containing all the decoded strings, out_counts is a tf.Tensor containing the count of each string, and num_not_decoded is a tf.Tensor with the number of items not decoded in the IBLT.
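
For readers unfamiliar with how an IBLT is decoded, the following is a toy pure-Python sketch of the general "peeling" technique, returning a triple shaped like the one above. It is not the library's implementation: the ToyIblt class, the SHA-256 based hashing, and the table sizing are assumptions made for illustration, and it omits field_size modular arithmetic, string chunking, and the coupled and random hash families.

```python
import hashlib


def _hash(data: bytes, salt: str, modulus: int) -> int:
    """Deterministic hash of `data` into [0, modulus)."""
    digest = hashlib.sha256(salt.encode() + b"|" + data).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class ToyIblt:
    """Toy IBLT: one sub-table per repetition, cells of [count, key_sum, check_sum]."""

    def __init__(self, capacity: int, repetitions: int = 3, seed: int = 0):
        self.repetitions = repetitions
        self.seed = seed
        # Sub-table width is an arbitrary, generous choice for this sketch.
        self.width = 2 * capacity // repetitions + 1
        self.cells = [
            [[0, 0, 0] for _ in range(self.width)] for _ in range(repetitions)
        ]

    def _positions(self, key: bytes) -> list:
        return [
            _hash(key, f"{self.seed}:{rep}", self.width)
            for rep in range(self.repetitions)
        ]

    def _check(self, key: bytes) -> int:
        return _hash(key, f"{self.seed}:check", 2**61 - 1)

    def _update(self, key: bytes, count: int) -> None:
        key_int = int.from_bytes(key, "big")
        check = self._check(key)
        for rep, pos in enumerate(self._positions(key)):
            cell = self.cells[rep][pos]
            cell[0] += count
            cell[1] += count * key_int
            cell[2] += count * check

    def insert(self, text: str, count: int = 1) -> None:
        self._update(text.encode("utf-8"), count)

    def decode(self):
        """Peels cells holding a single key until no such cell remains."""
        out_strings, out_counts = [], []
        progress = True
        while progress:
            progress = False
            for rep in range(self.repetitions):
                for pos in range(self.width):
                    count, key_sum, check_sum = self.cells[rep][pos]
                    if count <= 0 or key_sum < 0 or key_sum % count:
                        continue
                    key_int = key_sum // count
                    key = key_int.to_bytes((key_int.bit_length() + 7) // 8, "big")
                    # Accept the cell only if the checksum and position agree.
                    if check_sum != count * self._check(key):
                        continue
                    if self._positions(key)[rep] != pos:
                        continue
                    out_strings.append(key)
                    out_counts.append(count)
                    self._update(key, -count)  # remove the key everywhere
                    progress = True
        # Each key adds its count once per sub-table, so one sub-table suffices.
        num_not_decoded = sum(cell[0] for cell in self.cells[0])
        return out_strings, out_counts, num_not_decoded


sketch = ToyIblt(capacity=50)
for word, n in [("apple", 3), ("banana", 2), ("cherry", 1)]:
    sketch.insert(word, n)
strings, counts, num_not_decoded = sketch.decode()
```

In this toy version, a nonzero num_not_decoded means the peeling loop stalled with entries still in the table, which becomes more likely as the number of distinct strings grows relative to the table size. Here it is the total count left in the table, which may differ from how the library defines the value.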