Signatures in TensorFlow Lite


TensorFlow Lite supports carrying a TensorFlow model's input/output specifications over to the converted TensorFlow Lite model. These input/output specifications are called "signatures". Signatures can be specified when building a SavedModel or creating concrete functions.

Signatures in TensorFlow Lite provide the following features:

  • They specify the inputs and outputs of the converted TensorFlow Lite model by respecting the TensorFlow model's signatures.
  • They allow a single TensorFlow Lite model to support multiple entry points.

Each signature is composed of three pieces:

  • Inputs: Map from an input name in the signature to an input tensor.
  • Outputs: Map from an output name in the signature to an output tensor.
  • Signature key: Name that identifies an entry point of the graph.
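As a plain-Python sketch (not a TFLite API call), the three pieces line up with the structure that `Interpreter.get_signature_list()` returns for a converted model; the names here mirror the `encode` example used later in this guide:

```python
# Plain-Python sketch of one signature's three pieces; this mirrors the
# structure that Interpreter.get_signature_list() returns for the "encode"
# example used later in this guide.
signatures = {
    "encode": {                         # signature key: entry-point name
        "inputs": ["x"],                # input names in the signature
        "outputs": ["encoded_result"],  # output names in the signature
    }
}

# Each entry point can be looked up by its key:
assert signatures["encode"]["inputs"] == ["x"]
assert signatures["encode"]["outputs"] == ["encoded_result"]
```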

Setup

pip uninstall -y tensorflow keras
pip install tf-nightly
import tensorflow as tf

Example model

Let's say we have a TensorFlow model with two tasks, e.g., encoding and decoding:

class Model(tf.Module):

  @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
  def encode(self, x):
    result = tf.strings.as_string(x)
    return {
         "encoded_result": result
    }

  @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
  def decode(self, x):
    result = tf.strings.to_number(x)
    return {
         "decoded_result": result
    }

In terms of signatures, the above TensorFlow model can be summarized as follows:

  • Signature

    • Key: encode
    • Inputs: {"x"}
    • Outputs: {"encoded_result"}
  • Signature

    • Key: decode
    • Inputs: {"x"}
    • Outputs: {"decoded_result"}

Convert a model with Signatures

The TensorFlow Lite converter APIs carry the above signature information into the converted TensorFlow Lite model.

This conversion functionality is available in all converter APIs starting from TensorFlow version 2.7.0. See the example usages below.

From SavedModel

model = Model()

# Save the model
SAVED_MODEL_PATH = 'content/saved_models/coding'

tf.saved_model.save(
    model, SAVED_MODEL_PATH,
    signatures={
      'encode': model.encode.get_concrete_function(),
      'decode': model.decode.get_concrete_function()
    })

# Convert the saved model using TFLiteConverter
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_PATH)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS  # enable TensorFlow ops.
]
tflite_model = converter.convert()

# Print the signatures from the converted model
interpreter = tf.lite.Interpreter(model_content=tflite_model)
signatures = interpreter.get_signature_list()
print(signatures)
{'decode': {'inputs': ['x'], 'outputs': ['decoded_result']}, 'encode': {'inputs': ['x'], 'outputs': ['encoded_result']} }

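The converted model is a `bytes` object, so a common follow-up (a sketch, not part of the original flow) is to persist it to disk and load it back by path. A trivial Keras model is converted here so the snippet stands alone; in the flow above you could write out `tflite_model` directly.

```python
import tensorflow as tf

# Convert a trivial model so this snippet stands alone; in the flow above
# you would write out the `tflite_model` bytes directly.
tiny = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                            tf.keras.layers.Dense(1)])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(tiny).convert()

# Persist the flatbuffer...
with open('tiny_model.tflite', 'wb') as f:
    f.write(tflite_bytes)

# ...and later load it by path instead of by content.
interpreter = tf.lite.Interpreter(model_path='tiny_model.tflite')
print(interpreter.get_signature_list())
```

Loading by `model_path` is what you would typically do on-device, where the model ships as a file rather than an in-memory `bytes` object.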
From Keras Model

# Generate a Keras model.
keras_model = tf.keras.Sequential(
    [
        tf.keras.layers.Dense(2, input_dim=4, activation='relu', name='x'),
        tf.keras.layers.Dense(1, activation='relu', name='output'),
    ]
)

# Convert the Keras model using TFLiteConverter.
# Keras model converter API uses the default signature automatically.
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
tflite_model = converter.convert()

# Print the signatures from the converted model
interpreter = tf.lite.Interpreter(model_content=tflite_model)

signatures = interpreter.get_signature_list()
print(signatures)
{'serving_default': {'inputs': ['x_input'], 'outputs': ['output']} }
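To see the default signature in action, the `serving_default` key can be used with a signature runner. This is a self-contained sketch that rebuilds an equivalent tiny model; it looks the input name up dynamically because the exact name (`x_input` in the run above) can vary across TensorFlow versions.

```python
import numpy as np
import tensorflow as tf

# Rebuild and convert an equivalent tiny model so this cell stands alone.
keras_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2, activation='relu', name='x'),
    tf.keras.layers.Dense(1, activation='relu', name='output'),
])
tflite_model = tf.lite.TFLiteConverter.from_keras_model(keras_model).convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
runner = interpreter.get_signature_runner('serving_default')

# Look the input name up rather than hard-coding it; the reported name
# can differ between TensorFlow versions.
input_name = interpreter.get_signature_list()['serving_default']['inputs'][0]
result = runner(**{input_name: np.ones((1, 4), dtype=np.float32)})
print(result)  # a dict keyed by output name
```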

From Concrete Functions

model = Model()

# Convert the concrete functions using TFLiteConverter
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.encode.get_concrete_function(),
     model.decode.get_concrete_function()], model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS  # enable TensorFlow ops.
]
tflite_model = converter.convert()

# Print the signatures from the converted model
interpreter = tf.lite.Interpreter(model_content=tflite_model)
signatures = interpreter.get_signature_list()
print(signatures)
{'decode': {'inputs': ['x'], 'outputs': ['decoded_result']}, 'encode': {'inputs': ['x'], 'outputs': ['encoded_result']} }

Run Signatures

The TensorFlow Lite inference APIs support signature-based execution:

  • Accessing the input/output tensors through the input and output names specified by the signature.
  • Running each entry point of the graph separately, identified by the signature key.
  • Support for the SavedModel's initialization procedure.

Java, C++ and Python language bindings are currently available. See the examples in the sections below.

Java

try (Interpreter interpreter = new Interpreter(file_of_tensorflowlite_model)) {
  // Run encoding signature.
  Map<String, Object> inputs = new HashMap<>();
  inputs.put("x", input);
  Map<String, Object> outputs = new HashMap<>();
  outputs.put("encoded_result", encoded_result);
  interpreter.runSignature(inputs, outputs, "encode");

  // Run decoding signature (reuse the maps; redeclaring them in the same
  // scope would not compile).
  inputs = new HashMap<>();
  inputs.put("x", encoded_result);
  outputs = new HashMap<>();
  outputs.put("decoded_result", decoded_result);
  interpreter.runSignature(inputs, outputs, "decode");
}

C++

SignatureRunner* encode_runner =
    interpreter->GetSignatureRunner("encode");
encode_runner->ResizeInputTensor("x", {100});
encode_runner->AllocateTensors();

TfLiteTensor* input_tensor = encode_runner->input_tensor("x");
float* input = input_tensor->data.f;
// Fill `input`.

encode_runner->Invoke();

const TfLiteTensor* output_tensor = encode_runner->output_tensor(
    "encoded_result");
float* output = output_tensor->data.f;
// Access `output`.

Python

# Load the TFLite model in TFLite Interpreter
interpreter = tf.lite.Interpreter(model_content=tflite_model)

# Print the signatures from the converted model
signatures = interpreter.get_signature_list()
print('Signature:', signatures)

# encode and decode are callable with input as arguments.
encode = interpreter.get_signature_runner('encode')
decode = interpreter.get_signature_runner('decode')

# 'encoded' and 'decoded' are dictionaries with all outputs from the inference.
input = tf.constant([1, 2, 3], dtype=tf.float32)
print('Input:', input)
encoded = encode(x=input)
print('Encoded result:', encoded)
decoded = decode(x=encoded['encoded_result'])
print('Decoded result:', decoded)
Signature: {'decode': {'inputs': ['x'], 'outputs': ['decoded_result']}, 'encode': {'inputs': ['x'], 'outputs': ['encoded_result']} }
Input: tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32)
Encoded result: {'encoded_result': array([b'1.000000', b'2.000000', b'3.000000'], dtype=object)}
Decoded result: {'decoded_result': array([1., 2., 3.], dtype=float32)}

Known limitations

  • As the TFLite interpreter does not guarantee thread safety, signature runners from the same interpreter must not be executed concurrently.
  • Support for C/iOS/Swift is not available yet.
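Given the first limitation, one workable pattern (a sketch, not an official recipe) is to create a separate `Interpreter` over the same model bytes for each thread, so no signature runner is ever shared across threads:

```python
import threading

import numpy as np
import tensorflow as tf

# Convert a tiny model so the sketch stands alone.
tiny = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                            tf.keras.layers.Dense(1)])
model_bytes = tf.lite.TFLiteConverter.from_keras_model(tiny).convert()

results = {}

def worker(name):
    # One Interpreter per thread: signature runners from a single
    # interpreter must not be invoked concurrently.
    interp = tf.lite.Interpreter(model_content=model_bytes)
    runner = interp.get_signature_runner('serving_default')
    input_name = interp.get_signature_list()['serving_default']['inputs'][0]
    out = runner(**{input_name: np.ones((1, 4), dtype=np.float32)})
    results[name] = list(out.values())[0]

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1]
```

Each interpreter carries its own tensor memory, so this trades memory for safe parallelism.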

Updates

  • Version 2.7
    • The multiple-signature feature is implemented.
    • All v2 converter APIs generate signature-enabled TensorFlow Lite models.
  • Version 2.5
    • The signature feature is available through the from_saved_model converter API.