tensorflow::ops::QuantizeAndDequantize

#include <array_ops.h>

Quantizes then dequantizes a tensor.

Summary

This op simulates the precision loss from the quantized forward pass by:

  1. Quantizing the tensor to fixed-point numbers, which should match the quantization method the target uses at inference time.
  2. Dequantizing it back to floating-point numbers for the ops that follow, most likely matmul.

There are different ways to quantize. This version does not use the full range of the output type: it omits the lowest possible value for symmetry (for example, the output range is -127 to 127, not -128 to 127, for signed 8-bit quantization), so that 0.0 maps to 0.

To perform this op, we first find the range of values in our tensor. The range we use is always centered on 0, so we find m such that

  1. m = max(abs(input_min), abs(input_max)) if range_given is true,
  2. m = max(abs(min_elem(input)), abs(max_elem(input))) otherwise.

Our input tensor range is then [-m, m].
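The two cases above can be sketched in plain C++. This is a minimal illustration, not the op's kernel: the helper name SymmetricBound and the use of std::vector<float> in place of a tensor are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of the TensorFlow API): returns m such that
// the input range used for quantization is [-m, m].
float SymmetricBound(const std::vector<float>& input, bool range_given,
                     float input_min, float input_max) {
  if (range_given) {
    // Case 1: the caller supplied the range.
    return std::max(std::fabs(input_min), std::fabs(input_max));
  }
  // Case 2: derive the range from the smallest and largest elements.
  assert(!input.empty());
  const auto mm = std::minmax_element(input.begin(), input.end());
  return std::max(std::fabs(*mm.first), std::fabs(*mm.second));
}
```

For the vector {-1, -0.5, 0, 0.3} with range_given false, this yields m = 1.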

Next, we choose our fixed-point quantization buckets, [min_fixed, max_fixed]. If signed_input is true, this is

[min_fixed, max_fixed] = [-((1 << (num_bits - 1)) - 1), (1 << (num_bits - 1)) - 1].

Otherwise, if signed_input is false, the fixed-point range is

[min_fixed, max_fixed] = [0, (1 << num_bits) - 1].
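As a rough sketch, the signed and unsigned fixed-point ranges can be computed as follows. The function name FixedPointRange and the bound on num_bits are assumptions for this example, and the parentheses are written out because in C++ the subtraction binds more tightly than the shift.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of the TensorFlow API): returns
// {min_fixed, max_fixed} for the given signedness and bit width.
std::pair<int64_t, int64_t> FixedPointRange(bool signed_input, int num_bits) {
  // Bound chosen for this sketch so the shifts stay within int64_t.
  assert(num_bits >= 2 && num_bits <= 62);
  if (signed_input) {
    // Symmetric range: the lowest representable value is left unused.
    const int64_t max_fixed = (int64_t{1} << (num_bits - 1)) - 1;
    return std::pair<int64_t, int64_t>(-max_fixed, max_fixed);
  }
  const int64_t max_fixed = (int64_t{1} << num_bits) - 1;
  return std::pair<int64_t, int64_t>(int64_t{0}, max_fixed);
}
```

With num_bits = 8 this gives [-127, 127] for signed input and [0, 255] for unsigned input, so max_fixed - min_fixed is 254 and 255 respectively.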

From this we compute our scaling factor, s:

s = (max_fixed - min_fixed) / (2 * m).

Now we can quantize and dequantize the elements of our tensor. An element e is transformed into e':

e' = (e * s).round_to_nearest() / s.
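The scaling and rounding steps can be sketched like this. The function names are hypothetical, and the tie-breaking rule is an assumption: the text only says "round to nearest", and this sketch rounds ties upward with floor(x + 0.5) because that reproduces the value -63 given for -0.5 in the num_bits = 8, m = 1 example. The actual kernel may round differently.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: s = (max_fixed - min_fixed) / (2 * m).
float ScaleFactor(float min_fixed, float max_fixed, float m) {
  assert(m > 0.0f);  // an all-zero range has no meaningful scale
  return (max_fixed - min_fixed) / (2.0f * m);
}

// Hypothetical helper: e' = round_to_nearest(e * s) / s.
// Assumption: ties are rounded upward (floor(x + 0.5)).
float QuantizeDequantize(float e, float s) {
  return std::floor(e * s + 0.5f) / s;
}
```

For example, with min_fixed = -127, max_fixed = 127, and m = 1, ScaleFactor returns 127, and QuantizeDequantize(0.3f, 127.0f) returns 38.0/127.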

Note that we have a different number of buckets in the signed vs. unsigned cases. For example, if num_bits == 8, we get 254 buckets in the signed case vs. 255 in the unsigned case.

For example, suppose num_bits = 8 and m = 1. Then

[min_fixed, max_fixed] = [-127, 127], and s = (127 + 127) / 2 = 127.

Given the vector {-1, -0.5, 0, 0.3}, this is quantized to {-127, -63, 0, 38}, and dequantized to {-1, -63.0/127, 0, 38.0/127}.
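The worked example can be checked with a small end-to-end sketch for the signed case with the range computed from the data. Everything named here (QuantizeResult, QuantizeAndDequantizeSketch) is hypothetical and stands in for the op only for illustration. Ties are assumed to round upward via floor(x + 0.5), which is what yields -63 rather than -64 for -0.5.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical result type: the intermediate fixed-point codes and the
// dequantized floating-point values.
struct QuantizeResult {
  std::vector<int64_t> quantized;
  std::vector<float> dequantized;
};

// Sketch of the signed case with range_given == false.
QuantizeResult QuantizeAndDequantizeSketch(const std::vector<float>& input,
                                           int num_bits) {
  assert(!input.empty() && num_bits >= 2 && num_bits <= 16);
  // m is the largest magnitude in the input, which equals
  // max(abs(min_elem(input)), abs(max_elem(input))).
  float m = 0.0f;
  for (float e : input) m = std::max(m, std::fabs(e));
  assert(m > 0.0f);
  const int64_t max_fixed = (int64_t{1} << (num_bits - 1)) - 1;
  const int64_t min_fixed = -max_fixed;
  const float s = static_cast<float>(max_fixed - min_fixed) / (2.0f * m);
  QuantizeResult result;
  for (float e : input) {
    const float q = std::floor(e * s + 0.5f);  // assumed tie rule: round up
    result.quantized.push_back(static_cast<int64_t>(q));
    result.dequantized.push_back(q / s);
  }
  return result;
}
```

Calling QuantizeAndDequantizeSketch({-1.0f, -0.5f, 0.0f, 0.3f}, 8) produces the quantized values {-127, -63, 0, 38} under the stated rounding assumption.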

Arguments:

  • scope: A Scope object
  • input: Tensor to quantize and then dequantize.

Optional attributes (see Attrs):

  • signed_input: Whether the quantization is signed or unsigned.
  • num_bits: The bit width of the quantization.
  • range_given: Whether the range is given or should be computed from the tensor.
  • input_min: If range_given is true, the minimum of the range.
  • input_max: If range_given is true, the maximum of the range.

Returns:

  • Output: The output tensor.

Constructors and Destructors

QuantizeAndDequantize(const ::tensorflow::Scope & scope, ::tensorflow::Input input)
QuantizeAndDequantize(const ::tensorflow::Scope & scope, ::tensorflow::Input input, const QuantizeAndDequantize::Attrs & attrs)

Public attributes

::tensorflow::Output output

Public functions

::tensorflow::Node * node() const
operator::tensorflow::Input() const
operator::tensorflow::Output() const

Public static functions

Attrs InputMax(float x)
Attrs InputMin(float x)
Attrs NumBits(int64 x)
Attrs RangeGiven(bool x)
Attrs SignedInput(bool x)

Structs

tensorflow::ops::QuantizeAndDequantize::Attrs

Optional attribute setters for QuantizeAndDequantize.

Public attributes

output

::tensorflow::Output output

Public functions

QuantizeAndDequantize

QuantizeAndDequantize(
  const ::tensorflow::Scope & scope,
  ::tensorflow::Input input
)

QuantizeAndDequantize

QuantizeAndDequantize(
  const ::tensorflow::Scope & scope,
  ::tensorflow::Input input,
  const QuantizeAndDequantize::Attrs & attrs
)

node

::tensorflow::Node * node() const

operator::tensorflow::Input

operator::tensorflow::Input() const

operator::tensorflow::Output

operator::tensorflow::Output() const

Public static functions

InputMax

Attrs InputMax(
  float x
)

InputMin

Attrs InputMin(
  float x
)

NumBits

Attrs NumBits(
  int64 x
)

RangeGiven

Attrs RangeGiven(
  bool x
)

SignedInput

Attrs SignedInput(
  bool x
)