tf.raw_ops.QuantizedMatMulWithBiasAndReluAndRequantize


Performs a quantized matrix multiplication of a by the matrix b, fused with bias add, relu, and requantize.

The inputs must be two-dimensional matrices and a 1D bias vector. The inner dimension of a (after being transposed if transpose_a is non-zero) must match the outer dimension of b (after being transposed if transpose_b is non-zero). The bias values are then added to the matrix multiplication result with broadcasting; the bias size must match the inner dimension of b. Next, a relu activation makes the result non-negative. Finally, a requantize operation produces the final uint8 result.
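The sequence described above can be sketched in plain Python on float values. This is a simplified reference under stated assumptions, not the kernel's arithmetic: the function name `fused_matmul_bias_relu_requantize` is hypothetical, the inputs are ordinary floats rather than quantized tensors, and the final step uses a simple affine mapping of `[min_out, max_out]` onto 0..255.

```python
def fused_matmul_bias_relu_requantize(a, b, bias, min_out, max_out,
                                      transpose_a=False, transpose_b=False):
    """Float reference: matmul -> bias add -> ReLU -> requantize to 0..255.

    Hypothetical helper for illustration; it does not reproduce the exact
    rounding of the real op.
    """
    if transpose_a:
        a = [list(col) for col in zip(*a)]
    if transpose_b:
        b = [list(col) for col in zip(*b)]

    inner = len(a[0])            # inner dimension of a
    if inner != len(b):          # outer dimension of b
        raise ValueError("inner dimension of a must match outer dimension of b")
    n_cols = len(b[0])           # inner dimension of b
    if len(bias) != n_cols:
        raise ValueError("bias size must match inner dimension of b")

    # Assumed affine mapping of the float range onto the uint8 range.
    scale = 255.0 / (max_out - min_out)
    out = []
    for row in a:
        out_row = []
        for j in range(n_cols):
            acc = sum(row[k] * b[k][j] for k in range(inner))
            activated = max(0.0, acc + bias[j])        # bias add, then ReLU
            q = round((activated - min_out) * scale)   # float -> quantized step
            out_row.append(min(255, max(0, q)))        # saturate to 0..255
        out.append(out_row)
    return out


a = [[1.0, -2.0], [0.5, 3.0]]
b = [[2.0, 0.0], [1.0, -1.0]]
bias = [0.5, -1.0]
result = fused_matmul_bias_relu_requantize(a, b, bias, min_out=0.0, max_out=8.5)
print(result)  # [[15, 30], [135, 0]]
```

Before requantization the float result is `[[0.5, 1.0], [4.5, 0.0]]`; the negative entry is clamped by the relu step, and the chosen range of 0.0 to 8.5 gives exactly 30 quantized steps per unit.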

Args:

a: A `Tensor`. Must be one of the following types: `qint8`, `quint8`, `qint32`, `qint16`, `quint16`. A matrix to be multiplied. Must be a two-dimensional tensor of type `quint8`.
b: A `Tensor`. Must be one of the following types: `qint8`, `quint8`, `qint32`, `qint16`, `quint16`. A matrix to be multiplied. Must be a two-dimensional tensor of type `qint8`.
bias: A `Tensor`. Must be one of the following types: `float32`, `qint32`. A 1D bias tensor whose size matches the inner dimension of b (after being transposed if transpose_b is non-zero).
min_a: A `Tensor` of type `float32`. The float value that the lowest quantized a value represents.
max_a: A `Tensor` of type `float32`. The float value that the highest quantized a value represents.
min_b: A `Tensor` of type `float32`. The float value that the lowest quantized b value represents.
max_b: A `Tensor` of type `float32`. The float value that the highest quantized b value represents.
min_freezed_output: A `Tensor` of type `float32`. The float value that the lowest quantized output value represents after requantize.
max_freezed_output: A `Tensor` of type `float32`. The float value that the highest quantized output value represents after requantize.
Toutput: An optional `tf.DType` from: `tf.qint8`, `tf.quint8`, `tf.qint32`, `tf.qint16`, `tf.quint16`. Defaults to `tf.quint8`.
transpose_a: An optional `bool`. Defaults to `False`. If true, a is transposed before multiplication.
transpose_b: An optional `bool`. Defaults to `False`. If true, b is transposed before multiplication.
input_quant_mode: An optional `string` from: `"MIN_FIRST"`, `"SCALED"`. Defaults to `"MIN_FIRST"`. Input data quantization mode.
name: A name for the operation (optional).
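To illustrate how a min/max pair such as (min_a, max_a) or (min_b, max_b) ties quantized integers to float values, here is a minimal sketch. It assumes a plain affine mapping onto the 8-bit range, unsigned for a quint8-style input and signed for a qint8-style input. The helper name `quantize` and its `signed` flag are hypothetical, and the actual MIN_FIRST and SCALED modes apply their own scaling and rounding rules that this sketch does not reproduce.

```python
def quantize(values, min_val, max_val, signed=False):
    """Map floats in [min_val, max_val] onto 8-bit quantized integers.

    Hypothetical helper: min_val is the float represented by the lowest
    quantized value, max_val the float represented by the highest.
    """
    q_lo, q_hi = (-128, 127) if signed else (0, 255)
    scale = (q_hi - q_lo) / (max_val - min_val)
    quantized = []
    for v in values:
        q = round((v - min_val) * scale) + q_lo
        quantized.append(min(q_hi, max(q_lo, q)))  # saturate out-of-range values
    return quantized


# quint8-style range, as required for a
a_q = quantize([0.0, 1.2, 25.5, 30.0], min_val=0.0, max_val=25.5)
# qint8-style range, as required for b
b_q = quantize([-1.0, 1.0], min_val=-1.0, max_val=1.0, signed=True)
print(a_q)  # [0, 12, 255, 255]
print(b_q)  # [-128, 127]
```

The value 30.0 lies outside the declared range and saturates at 255, which is why the min/max pair should cover the data being quantized.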

Returns: A tuple of Tensor objects (out, min_out, max_out).

out: A `Tensor` of type `Toutput`.
min_out: A `Tensor` of type `float32`.
max_out: A `Tensor` of type `float32`.
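As a rough illustration of reading the returned triple, the sketch below converts quantized output values back to floats. It assumes that min_out and max_out are the float values represented by the lowest and highest quantized output values (mirroring the descriptions of min_a and max_a), that the output type is the default quint8, and that the mapping is a plain affine one. The helper name `dequantize_quint8` is hypothetical.

```python
def dequantize_quint8(out, min_out, max_out):
    """Convert quint8-style values (0..255) back to approximate floats.

    Hypothetical helper: assumes 0 maps to min_out and 255 maps to max_out.
    """
    step = (max_out - min_out) / 255.0
    return [[min_out + q * step for q in row] for row in out]


out = [[15, 30], [135, 0]]
approx = dequantize_quint8(out, min_out=0.0, max_out=8.5)
print(approx)  # approximately [[0.5, 1.0], [4.5, 0.0]]
```

The recovered values are only accurate to within one quantization step, here 8.5 / 255.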