tf.quantization.quantize

Quantize the 'input' tensor of type float to 'output' tensor of type 'T'.

[min_range, max_range] are scalar floats that specify the range for the 'input' data. The 'mode' attribute controls exactly which calculations are used to convert the float values to their quantized equivalents. The 'round_mode' attribute controls which rounding tie-breaking algorithm is used when rounding float values to their quantized equivalents.

In 'MIN_COMBINED' mode, each value of the tensor will undergo the following:

out[i] = (in[i] - min_range) * range(T) / (max_range - min_range)
if T == qint8: out[i] -= (range(T) + 1) / 2.0

here range(T) = numeric_limits<T>::max() - numeric_limits<T>::min()

MIN_COMBINED Mode Example

Assume the input is type float and has a possible range of [0.0, 6.0] and the output type is quint8 ([0, 255]). The min_range and max_range values should be specified as 0.0 and 6.0. Quantizing from float to quint8 will multiply each value of the input by 255/6 and cast to quint8.

If the output type was qint8 ([-128, 127]), the operation will additionally subtract each value by 128 prior to casting, so that the range of values aligns with the range of qint8.

If the mode is 'MIN_FIRST', then this approach is used:

num_discrete_values = 1 << (# of bits in T)
range_adjust = num_discrete_values / (num_discrete_values - 1)
range = (range_max - range_min) * r