tf.experimental.tensorrt.Converter

An offline converter for TF-TRT transformation for TF 2.0 SavedModels.

Currently this is not available on Windows platform.

There are several ways to run the conversion:

  1. FP32/FP16 precision

    params = tf.experimental.tensorrt.ConversionParams(
        precision_mode='FP16')
    converter = tf.experimental.tensorrt.Converter(
        input_saved_model_dir="my_dir", conversion_params=params)
    converter.convert()
    converter.save(output_saved_model_dir)
    

    In this case, no TRT engines will be built or saved in the converted SavedModel. But if input data is available during conversion, we can still build and save the TRT engines to reduce the cost during inference (see option 2 below).

  2. FP32/FP16 precision with pre-built engines

    params = tf.experimental.tensorrt.ConversionParams(
        precision_mode='FP16',
        # Set this to a large enough number so it can cache all the engines.
        maximum_cached_engines=16)
    converter = tf.experimental.tensorrt.Converter(
        input_saved_model_dir="my_dir", conversion_params=params)
    converter.convert()
    
    # Define a generator function that yields input data, and use it to execute
    # the graph to build TRT engines.
    # With TensorRT 5.1, different engines will be built (and saved later) for
    # different input shapes to the TRTEngineOp.
    def my_input_fn():
      for _ in range(num_runs):
        inp1, inp2 = ...
        yield inp1, inp2
    
    converter.build(input_fn=my_input_fn)  # Generate corresponding TRT engines
    converter.save(output_saved_model_dir)  # Generated engines will be saved.
    

    In this way, one engine will be built/saved for each unique input shapes of the TRTEngineOp. This is good for applications that cannot afford building engines during inference but have access to input data that is similar to the one used in production (for example, that has the same input shapes). Also, the generated TRT engines is platform dep