The SavedModel format of TensorFlow 2 is the recommended way to share pre-trained models and model pieces on TensorFlow Hub. It replaces the older hub.Module format for TensorFlow 1 and comes with a new set of APIs.
This page explains how to reuse TF2 SavedModels in a TensorFlow 2
program with the low-level
hub.load() API and its
hub.KerasLayer is combined with other
to build a Keras model or the
model_fn of a TF2 Estimator.)
These APIs can also load the older hub.Modules for TF1, within limits,
see the migration guide.
Users of TensorFlow 1 can update to TF 1.15 and then use the same APIs. Older versions of TF1 do not work.
Using SavedModels from TF Hub
Using a SavedModel in Keras
Keras is TensorFlow's high-level API
for building deep learning models by composing Keras Layer objects.
tensorflow_hub library provides the class
hub.KerasLayer that gets
initialized with the URL (or filesystem path) of a SavedModel and then
provides the computation from the SavedModel, including its pre-trained
Here is an example of using a pre-trained text embedding:
import tensorflow as tf import tensorflow_hub as hub hub_url = "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1" embed = hub.KerasLayer(hub_url) embeddings = embed(["A long sentence.", "single-word", "http://example.com"]) print(embeddings.shape, embeddings.dtype)
From this, a text classifier can be built in the usual Keras way:
model = tf.keras.Sequential([ embed, tf.keras.layers.Dense(16, activation="relu"), tf.keras.layers.Dense(1, activation="sigmoid"), ])
The Text classification colab is a complete example how to train and evaluate such a classifier.
The model weights in a
hub.KerasLayer are set to non-trainable by default.
See the section on fine-tuning below for how to change that. Weights are
shared between all applications of the same layer object, as usual in Keras.
Using a SavedModel in an Estimator
Behind the scenes: SavedModel downloading and caching
Using a SavedModel from TensorFlow Hub
(or other HTTPS servers that implement its hosting protocol)
downloads it to the local filesystem if not already present.
The environment variable
TFHUB_CACHE_DIR can be set to override the default
temporary location for caching the downloaded and uncompressed SavedModels.
Using a SavedModel in low-level TensorFlow
hub.load(handle) downloads and decompresses a SavedModel
handle is already a filesystem path) and then returns the result
of loading it with TensorFlow's built-in function
hub.load() can handle any valid SavedModel (unlike its
hub.Module for TF1).
Advanced topic: what to expect from the SavedModel after loading
Depending on the contents of the SavedModel, the result of
obj = hub.load(...) can be invoked in various ways (as explained in
much greater detail in TensorFlow's SavedModel
The serving signatures of the SavedModel (if any) are represented as a dictionary of concrete functions and can be called like
tensors_out = obj.signatures["serving_default"](**tensors_in), with dictionaries of tensors keyed by the respective input and output names and subject to the signature's shape and dtype constraints.
@tf.function-decorated methods of the saved object (if any) are restored as tf.function objects that can be called by all combinations of Tensor and non-Tensor arguments for which the tf.function had been traced prior to saving. In particular, if there is an
obj.__call__method with suitable traces,
objitself can be called like a Python function. A simple example could look like
output_tensor = obj(input_tensor, training=False).
This leaves enormous liberty in the interfaces that SavedModels can
implement. The Reusable SavedModels interface
obj establishes conventions such that client code, including adapters
hub.KerasLayer, know how to use the SavedModel.
Some SavedModels may not follow that convention, especially whole models not meant to be reused in larger models, and just provide serving signatures.
The trainable variables in a SavedModel are reloaded as trainable,
tf.GradientTape will watch them by default. See the section on
fine-tuning below for some caveats, and consider avoiding this for starters.
Even if you want to fine-tune, you may want to see if
advises to re-train only a subset of the originally trainable variables.
Creating SavedModels for TF Hub
SavedModel is TensorFlow's standard serialization format for trained models or model pieces. It stores the model's trained weights together with the exact TensorFlow operations to perform its computation. It can be used independently from the code that created it. In particular, it can be reused across different high-level model-building APIs like Keras, because TensorFlow operations are their common basic language.
Saving from Keras
Starting with TensorFlow 2,
tf.keras.models.save_model() default to the SavedModel format (not HDF5).
The resulting SavedModels that can be used with
hub.KerasLayer and similar adapters for other high-level APIs
as they become available.
To share a complete Keras Model, just save it with
To share a piece of a Keras Model, make the piece a Model in itself and then save that. You can either lay out the code like that from the start....
piece_to_share = tf.keras.Model(...) full_model = tf.keras.Sequential([piece_to_share, ...]) full_model.fit(...) piece_to_share.save(...)
...or cut out the piece to share after the fact (if it aligns with the layering of your full model):
full_model = tf.keras.Model(...) sharing_input = full_model.get_layer(...).get_output_at(0) sharing_output = full_model.get_layer(...).get_output_at(0) piece_to_share = tf.keras.Model(sharing_input, sharing_output) piece_to_share.save(..., include_optimizer=False)
TensorFlow Models on GitHub
uses the former approach for BERT (see
note the split between
and the the latter approach for ResNet (see
Saving from low-level TensorFlow
This requires good familiarity with TensorFlow's SavedModel Guide.
If you want to provide more than just a serving signature, you should implement the Reusable SavedModel interface. Conceptually, this looks like
class MyMulModel(tf.train.Checkpoint): def __init__(self, v_init): super(MyMulModel, self).__init__() self.v = tf.Variable(v_init) self.variables = [self.v] self.trainable_variables = [self.v] self.regularization_losses = [ tf.function(input_signature=)(lambda: 0.001 * self.v**2), ] @tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)]) def __call__(self, inputs): return tf.multiply(inputs, self.v) tf.saved_model.save(MyMulModel(2.0), "/tmp/my_mul") layer = hub.KerasLayer("/tmp/my_mul") print(layer([10., 20.])) # [20., 40.] layer.trainable = True print(layer.trainable_weights) # [2.] print(layer.losses) # 0.004
The code at
contains larger examples, esp. the
Training the already-trained variables of an imported SavedModel together with those of the model around it is called fine-tuning the SavedModel. This can result in better quality, but often makes the training more demanding (may take more time, depend more on the optimizer and its hyperparameters, increase the risk of overfitting and require dataset augmentation, esp. for CNNs). We advise SavedModel consumers to look into fine-tuning only after having established a good training regime, and only if the SavedModel publisher recommends it.
Fine-tuning changes the "continuous" model parameters that are trained. It does not change hard-coded transformations, such as tokenizing text input and mapping tokens to their corresponding entries in an embedding matrix.
For SavedModel consumers
layer = hub.KerasLayer(..., trainable=True)
enables fine-tuning of the SavedModel loaded by the layer. It adds the trainable weights and weight regularizers declared in the SavedModel to the Keras model, and runs the SavedModel's computation in training mode (think of dropout etc.).
The image classification colab contains an end-to-end example with optional fine-tuning.
Re-exporting the fine-tuning result
Advanced users may want to save the results of fine-tuning back into a SavedModel that can be used instead of the originally loaded one. This can be done with code like
loaded_obj = hub.load("https://tfhub.dev/...") hub_layer = hub.KerasLayer(loaded_obj, trainable=True, ...) model = keras.Sequential([..., hub_layer, ...]) model.compile(...) model.fit(...) export_module_dir = os.path.join(os.getcwd(), "finetuned_model_export") tf.saved_model.save(loaded_obj, export_module_dir)
For SavedModel creators
When creating a SavedModel for sharing on TensorFlow Hub, think ahead if and how its consumers should fine-tune it, and provide guidance in the documentation.
Saving from a Keras Model should make all the mechanics of fine-tuning work
(saving weight regularization losses, declaring trainable variables, tracing
__call__ for both
Choose a model interface that plays well with gradient flow, e.g., output logits instead of softmax probabilities or top-k predictions.
If the model use dropout, batch normalization, or similar training techniques that involve hyperparameters, set them to values that make sense across many expected target problems and batch sizes. (As of this writing, saving from Keras does not make it easy to let consumers adjust them, but see tensorflow/examples/saved_model/integration_tests/export_mnist.py for some crude workarounds.)
Weight regularizers on individual layers are saved (with their regularization
strength coefficients), but weight regularization from within the optimizer
is lost. Advise consumers of your SavedModel accordingly.