Estimators

This document introduces tf.estimator—a high-level TensorFlow API. Estimators encapsulate the following actions:

  • Training
  • Evaluation
  • Prediction
  • Export for serving

TensorFlow implements several pre-made Estimators. Custom Estimators are still supported, but mainly as a backwards-compatibility measure; they should not be used for new code. All Estimators, whether pre-made or custom, are classes based on the tf.estimator.Estimator class.

For a quick example, try the Estimator tutorials. For an overview of the API design, check out the white paper.

Setup

pip install -U tensorflow_datasets
import tempfile
import os

import tensorflow as tf
import tensorflow_datasets as tfds

Advantages

Similar to tf.keras.Model, an Estimator is a model-level abstraction. tf.estimator provides some capabilities that are still under development for tf.keras. These are:

  • Parameter server based training
  • Full TFX integration

Estimator capabilities

Estimators provide the following benefits:

  • You can run Estimator-based models on a local host or on a distributed multi-server environment without changing your model. Furthermore, you can run Estimator-based models on CPUs, GPUs, or TPUs without recoding your model.
  • Estimators provide a safe distributed training loop that controls how and when to:
    • Load data
    • Handle exceptions
    • Create checkpoint files and recover from failures
    • Save summaries for TensorBoard

When writing an application with Estimators, you must separate the data input pipeline from the model. This separation simplifies experiments with different datasets.

Using pre-made Estimators

Pre-made Estimators enable you to work at a much higher conceptual level than the base TensorFlow APIs. You no longer have to worry about creating the computational graph or sessions since Estimators handle all the "plumbing" for you. Furthermore, pre-made Estimators let you experiment with different model architectures by making only minimal code changes. tf.estimator.DNNClassifier, for example, is a pre-made Estimator class that trains classification models based on dense, feed-forward neural networks.
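
For instance, here is a minimal sketch of instantiating the pre-made DNNClassifier; the single age feature column below is an illustrative assumption, not part of this guide's dataset:

# A minimal sketch: a pre-made DNN classifier with two dense hidden layers.
# The 'age' feature column is hypothetical.
classifier = tf.estimator.DNNClassifier(
    feature_columns=[tf.feature_column.numeric_column('age')],
    hidden_units=[30, 10],
    n_classes=2)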

A TensorFlow program relying on a pre-made Estimator typically consists of the following four steps:

1. Write the input functions.

For example, you might create one function to import the training set and another function to import the test set. Estimators expect their inputs to be formatted as a pair of objects:

  • A dictionary in which the keys are feature names and the values are Tensors (or SparseTensors) containing the corresponding feature data
  • A Tensor containing one or more labels

The input_fn should return a tf.data.Dataset that yields pairs in that format.

For example, the following code builds a tf.data.Dataset from the Titanic dataset's train.csv file:

def train_input_fn():
  titanic_file = tf.keras.utils.get_file("train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
  titanic = tf.data.experimental.make_csv_dataset(
      titanic_file, batch_size=32,
      label_name="survived")
  titanic_batches = (
      titanic.cache().repeat().shuffle(500)
      .prefetch(tf.data.AUTOTUNE))
  return titanic_batches

The input_fn is executed in a tf.Graph. It can also directly return a (features_dict, labels) pair containing graph tensors, but this is error-prone outside of simple cases like returning constants.
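
For instance, a minimal input_fn that returns such a pair of constants directly might look like this (a sketch with hypothetical feature values):

def constant_input_fn():
  # Return a (features_dict, labels) pair of constant tensors directly.
  features = {'age': tf.constant([29.0, 40.0, 2.0])}
  labels = tf.constant([0, 1, 1])
  return features, labels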

2. Define the feature columns.

Each tf.feature_column identifies a feature name, its type, and any input pre-processing.

For example, the following snippet creates three feature columns:

  • The first uses the age feature directly as a floating-point input.
  • The second uses the class feature as a categorical input.
  • The third uses the embark_town feature as a categorical input, but uses the hashing trick to avoid the need to enumerate the vocabulary, setting the number of hash buckets instead.

For further information, check the feature columns tutorial.

age = tf.feature_column.numeric_column('age')
cls = tf.feature_column.categorical_column_with_vocabulary_list('class', ['First', 'Second', 'Third']) 
embark = tf.feature_column.categorical_column_with_hash_bucket('embark_town', 32)

3. Instantiate the relevant pre-made Estimator.

For example, here's a sample instantiation of a pre-made Estimator named LinearClassifier:

model_dir = tempfile.mkdtemp()
model = tf.estimator.LinearClassifier(
    model_dir=model_dir,
    feature_columns=[embark, cls, age],
    n_classes=2
)

For more information, you can check the linear classifier tutorial.

4. Call a training, evaluation, or inference method.

All Estimators provide train, evaluate, and predict methods.

model.train(input_fn=train_input_fn, steps=100)
result = model.evaluate(train_input_fn, steps=10)

for key, value in result.items():
  print(key, ":", value)

for pred in model.predict(train_input_fn):
  for key, value in pred.items():
    print(key, ":", value)
  break

Benefits of pre-made Estimators

Pre-made Estimators encode best practices, providing the following benefits:

  • Best practices for determining where different parts of the computational graph should run, implementing strategies on a single machine or on a cluster.
  • Best practices for event (summary) writing and universally useful summaries.

If you don't use pre-made Estimators, you must implement the preceding features yourself.

Custom Estimators

The heart of every Estimator—whether pre-made or custom—is its model function, model_fn, which is a method that builds graphs for training, evaluation, and prediction. When you are using a pre-made Estimator, someone else has already implemented the model function. When relying on a custom Estimator, you must write the model function yourself.
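
The general shape of a model_fn, as a rough sketch (the 'x' feature key, the squared-error loss, and the optimizer below are illustrative assumptions, not this guide's recipe):

def my_model_fn(features, labels, mode):
  # 'x' is a hypothetical feature key; reshape to [batch, 1] for the layer.
  x = tf.reshape(tf.cast(features['x'], tf.float32), [-1, 1])
  logits = tf.keras.layers.Dense(1)(x)

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode, predictions={'logits': logits})

  # Reshape labels to match the [batch, 1] logits before taking the loss.
  labels = tf.reshape(tf.cast(labels, tf.float32), [-1, 1])
  loss = tf.reduce_mean(tf.square(logits - labels))

  if mode == tf.estimator.ModeKeys.EVAL:
    return tf.estimator.EstimatorSpec(mode, loss=loss)

  # TRAIN: a v1 optimizer supplies the train_op that the Estimator expects.
  optimizer = tf.compat.v1.train.GradientDescentOptimizer(0.05)
  train_op = optimizer.minimize(
      loss, global_step=tf.compat.v1.train.get_or_create_global_step())
  return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)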

Create an Estimator from a Keras model

You can convert existing Keras models to Estimators with tf.keras.estimator.model_to_estimator. This is helpful if you want to modernize your model code, but your training pipeline still requires Estimators.

Instantiate a Keras MobileNet V2 model and compile the model with the optimizer, loss, and metrics to train with:

keras_mobilenet_v2 = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False)
keras_mobilenet_v2.trainable = False

estimator_model = tf.keras.Sequential([
    keras_mobilenet_v2,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1)
])

# Compile the model
estimator_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy'])

Create an Estimator from the compiled Keras model. The initial model state of the Keras model is preserved in the created Estimator:

est_mobilenet_v2 = tf.keras.estimator.model_to_estimator(keras_model=estimator_model)

Treat the derived Estimator as you would any other Estimator.

IMG_SIZE = 160  # All images will be resized to 160x160

def preprocess(image, label):
  image = tf.cast(image, tf.float32)
  image = (image/127.5) - 1
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  return image, label

def train_input_fn(batch_size):
  data = tfds.load('cats_vs_dogs', as_supervised=True)
  train_data = data['train']
  train_data = train_data.map(preprocess).shuffle(500).batch(batch_size)
  return train_data

To train, call the Estimator's train function:

est_mobilenet_v2.train(input_fn=lambda: train_input_fn(32), steps=50)

Similarly, to evaluate, call the Estimator's evaluate function:

est_mobilenet_v2.evaluate(input_fn=lambda: train_input_fn(32), steps=10)

For more details, please refer to the documentation for tf.keras.estimator.model_to_estimator.

Saving object-based checkpoints with Estimator

Estimators by default save checkpoints with variable names rather than the object graph described in the Checkpoint guide. tf.train.Checkpoint will read name-based checkpoints, but variable names may change when moving parts of a model outside of the Estimator's model_fn. For forwards compatibility, saving object-based checkpoints makes it easier to train a model inside an Estimator and then use it outside of one.

import tensorflow.compat.v1 as tf_compat

def toy_dataset():
  inputs = tf.range(10.)[:, None]
  labels = inputs * 5. + tf.range(5.)[None, :]
  return tf.data.Dataset.from_tensor_slices(
    dict(x=inputs, y=labels)).repeat().batch(2)

class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)

  def call(self, x):
    return self.l1(x)

def model_fn(features, labels, mode):
  net = Net()
  opt = tf.keras.optimizers.Adam(0.1)
  ckpt = tf.train.Checkpoint(step=tf_compat.train.get_global_step(),
                             optimizer=opt, net=net)
  with tf.GradientTape() as tape:
    output = net(features['x'])
    loss = tf.reduce_mean(tf.abs(output - features['y']))
  variables = net.trainable_variables
  gradients = tape.gradient(loss, variables)
  return tf.estimator.EstimatorSpec(
    mode,
    loss=loss,
    train_op=tf.group(opt.apply_gradients(zip(gradients, variables)),
                      ckpt.step.assign_add(1)),
    # Tell the Estimator to save "ckpt" in an object-based format.
    scaffold=tf_compat.train.Scaffold(saver=ckpt))

tf.keras.backend.clear_session()
est = tf.estimator.Estimator(model_fn, './tf_estimator_example/')
est.train(toy_dataset, steps=10)

tf.train.Checkpoint can then load the Estimator's checkpoints from its model_dir.

opt = tf.keras.optimizers.Adam(0.1)
net = Net()
ckpt = tf.train.Checkpoint(
  step=tf.Variable(1, dtype=tf.int64), optimizer=opt, net=net)
ckpt.restore(tf.train.latest_checkpoint('./tf_estimator_example/'))
ckpt.step.numpy()  # From est.train(..., steps=10)

SavedModels from Estimators

Estimators export SavedModels through tf.estimator.Estimator.export_saved_model.

input_column = tf.feature_column.numeric_column("x")

estimator = tf.estimator.LinearClassifier(feature_columns=[input_column])

def input_fn():
  return tf.data.Dataset.from_tensor_slices(
    ({"x": [1., 2., 3., 4.]}, [1, 1, 0, 0])).repeat(200).shuffle(64).batch(16)
estimator.train(input_fn)

To save an Estimator you need to create a serving_input_receiver_fn. This function builds a part of a tf.Graph that parses the raw data received by the SavedModel.

The tf.estimator.export module contains functions to help build these receivers.

The following code builds a receiver, based on the feature_columns, that accepts serialized tf.Example protocol buffers, which are often used with TensorFlow Serving.

tmpdir = tempfile.mkdtemp()

serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
  tf.feature_column.make_parse_example_spec([input_column]))

estimator_base_path = os.path.join(tmpdir, 'from_estimator')
estimator_path = estimator.export_saved_model(estimator_base_path, serving_input_fn)

You can also load and run that model from Python:

imported = tf.saved_model.load(estimator_path)

def predict(x):
  example = tf.train.Example()
  example.features.feature["x"].float_list.value.extend([x])
  return imported.signatures["predict"](
    examples=tf.constant([example.SerializeToString()]))

print(predict(1.5))
print(predict(3.5))

tf.estimator.export.build_raw_serving_input_receiver_fn allows you to create serving input functions that take raw tensors rather than serialized tf.train.Example protos.
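
As a sketch of the raw-tensor alternative, an equivalent receiver can also be written by hand with tf.estimator.export.ServingInputReceiver (reusing the "x" feature from above; the v1 placeholder is valid because export_saved_model calls this function while building a graph):

def raw_serving_input_fn():
  # Accept a raw float tensor named 'x' instead of serialized tf.Example protos.
  x = tf.compat.v1.placeholder(dtype=tf.float32, shape=[None, 1], name='x')
  return tf.estimator.export.ServingInputReceiver(
      features={'x': x}, receiver_tensors={'x': x})

raw_path = estimator.export_saved_model(
    os.path.join(tmpdir, 'from_estimator_raw'), raw_serving_input_fn)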

Using tf.distribute.Strategy with Estimator (Limited support)

tf.estimator is a distributed training TensorFlow API that originally supported the asynchronous parameter server approach. tf.estimator now also supports tf.distribute.Strategy. If you're using tf.estimator, you can switch to distributed training with very few changes to your code. With this, Estimator users can now do synchronous distributed training on multiple GPUs and multiple workers, as well as use TPUs. This support is, however, limited. Check out the What's supported now section below for more details.

Using tf.distribute.Strategy with Estimator is slightly different than in the Keras case. Instead of using strategy.scope, you pass the strategy object into the RunConfig for the Estimator.

You can refer to the distributed training guide for more information.

Here is a snippet of code that shows this with a pre-made Estimator LinearRegressor and MirroredStrategy:

mirrored_strategy = tf.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(
    train_distribute=mirrored_strategy, eval_distribute=mirrored_strategy)
regressor = tf.estimator.LinearRegressor(
    feature_columns=[tf.feature_column.numeric_column('feats')],
    optimizer='SGD',
    config=config)

Here, you use a pre-made Estimator, but the same code works with a custom Estimator as well. train_distribute determines how training will be distributed, and eval_distribute determines how evaluation will be distributed. This is another difference from Keras, where you use the same strategy for both training and evaluation.

Now you can train and evaluate this Estimator with an input function:

def input_fn():
  dataset = tf.data.Dataset.from_tensors(({"feats":[1.]}, [1.]))
  return dataset.repeat(1000).batch(10)
regressor.train(input_fn=input_fn, steps=10)
regressor.evaluate(input_fn=input_fn, steps=10)

Another difference to highlight here between Estimator and Keras is the input handling. In Keras, each batch of the dataset is split automatically across the multiple replicas. In Estimator, however, there is no automatic batch splitting, and the data is not automatically sharded across different workers. You have full control over how you want your data to be distributed across workers and devices, and you must provide an input_fn to specify how to distribute your data.

Your input_fn is called once per worker, thus giving one dataset per worker. Then one batch from that dataset is fed to one replica on that worker, thereby consuming N batches for N replicas on 1 worker. In other words, the dataset returned by the input_fn should provide batches of size PER_REPLICA_BATCH_SIZE. And the global batch size for a step can be obtained as PER_REPLICA_BATCH_SIZE * strategy.num_replicas_in_sync.
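
For example, to target a hypothetical global batch size of 64 with the MirroredStrategy above (a sketch):

GLOBAL_BATCH_SIZE = 64  # hypothetical target
per_replica_batch_size = (
    GLOBAL_BATCH_SIZE // mirrored_strategy.num_replicas_in_sync)

def per_replica_input_fn():
  # Batch by the per-replica size; the strategy feeds one batch per replica.
  dataset = tf.data.Dataset.from_tensors(({"feats": [1.]}, [1.]))
  return dataset.repeat(1000).batch(per_replica_batch_size)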

When performing multi-worker training, you should either split your data across the workers, or shuffle with a random seed on each. You can check an example of how to do this in the Multi-worker training with Estimator tutorial.
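
One possible sketch of per-worker sharding, assuming TF_CONFIG is set as in that tutorial:

import json

def sharded_input_fn():
  # Shard the dataset by this worker's task index so workers see disjoint data.
  tf_config = json.loads(os.environ['TF_CONFIG'])
  num_workers = len(tf_config['cluster']['worker'])
  worker_index = tf_config['task']['index']
  dataset = tf.data.Dataset.from_tensors(({"feats": [1.]}, [1.]))
  return dataset.repeat(1000).shard(num_workers, worker_index).batch(10)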

Similarly, you can use multi-worker and parameter server strategies as well. The code remains the same, but you need to use tf.estimator.train_and_evaluate and set TF_CONFIG environment variables for each binary running in your cluster.
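
A sketch of that entry point (the hosts in TF_CONFIG are hypothetical, and each binary sets its own task index):

import json

# Each binary in the cluster sets TF_CONFIG, then runs the same code.
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {'worker': ['host1:12345', 'host2:12345']},  # hypothetical hosts
    'task': {'type': 'worker', 'index': 0}  # differs per binary
})

train_spec = tf.estimator.TrainSpec(input_fn=input_fn, max_steps=100)
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn, steps=10)
tf.estimator.train_and_evaluate(regressor, train_spec, eval_spec)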

What's supported now?

There is limited support for training with Estimator using all strategies except TPUStrategy. Basic training and evaluation should work, but a number of advanced features such as tf.compat.v1.train.Scaffold do not. There may also be a number of bugs in this integration and there are no plans to actively improve this support (the focus is on Keras and custom training loop support). If at all possible, you should prefer to use tf.distribute with those APIs instead.

Estimator API support by distribution strategy:

  • MirroredStrategy: Limited support
  • TPUStrategy: Not supported
  • MultiWorkerMirroredStrategy: Limited support
  • CentralStorageStrategy: Limited support
  • ParameterServerStrategy: Limited support

Examples and tutorials

Here are some end-to-end examples that show how to use various strategies with Estimator:

  1. The Multi-worker Training with Estimator tutorial shows how you can train with multiple workers using MultiWorkerMirroredStrategy on the MNIST dataset.
  2. An end-to-end example of running multi-worker training with distribution strategies in tensorflow/ecosystem using Kubernetes templates. It starts with a Keras model and converts it to an Estimator using the tf.keras.estimator.model_to_estimator API.
  3. The official ResNet50 model, which can be trained using either MirroredStrategy or MultiWorkerMirroredStrategy.