Estimators

View on TensorFlow.org Run in Google Colab View source on GitHub Download notebook

This document introduces tf.estimator—a high-level TensorFlow API. Estimators encapsulate the following actions:

  • training
  • evaluation
  • prediction
  • export for serving

You may either use the pre-made Estimators we provide or write your own custom Estimators. All Estimators—whether pre-made or custom—are classes based on the tf.estimator.Estimator class.

For a quick example try Estimator tutorials. For an overview of the API design, see the white paper.

Advantages

Similar to a tf.keras.Model, an estimator is a model-level abstraction. The tf.estimator provides some capabilities currently still under development for tf.keras. These are:

  • Parameter server based training
  • Full TFX integration.

Estimators Capabilities

Estimators provide the following benefits:

  • You can run Estimator-based models on a local host or on a distributed multi-server environment without changing your model. Furthermore, you can run Estimator-based models on CPUs, GPUs, or TPUs without recoding your model.
  • Estimators provide a safe distributed training loop that controls how and when to:
    • load data
    • handle exceptions
    • create checkpoint files and recover from failures
    • save summaries for TensorBoard

When writing an application with Estimators, you must separate the data input pipeline from the model. This separation simplifies experiments with different data sets.

Using pre-made Estimators

Pre-made Estimators enable you to work at a much higher conceptual level than the base TensorFlow APIs. You no longer have to worry about creating the computational graph or sessions since Estimators handle all the "plumbing" for you. Furthermore, pre-made Estimators let you experiment with different model architectures by making only minimal code changes. tf.estimator.DNNClassifier, for example, is a pre-made Estimator class that trains classification models based on dense, feed-forward neural networks.

A TensorFlow program relying on a pre-made Estimator typically consists of the following four steps:

1. Write an input functions

For example, you might create one function to import the training set and another function to import the test set. Estimators expect their inputs to be formatted as a pair of objects:

  • A dictionary in which the keys are feature names and the values are Tensors (or SparseTensors) containing the corresponding feature data
  • A Tensor containing one or more labels

The input_fn should return a tf.data.Dataset that yields pairs in that format.

For example, the following code builds a tf.data.Dataset from the Titanic dataset's train.csv file:

import tensorflow as tf

def train_input_fn():
  titanic_file = tf.keras.utils.get_file("train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
  titanic = tf.data.experimental.make_csv_dataset(
      titanic_file, batch_size=32,
      label_name="survived")
  titanic_batches = (
      titanic.cache().repeat().shuffle(500)
      .prefetch(tf.data.experimental.AUTOTUNE))
  return titanic_batches

The input_fn is executed in a tf.Graph and can also directly return a (features_dics, labels) pair containing graph tensors, but this is error prone outside of simple cases like returning constants.

2. Define the feature columns.

Each tf.feature_column identifies a feature name, its type, and any input pre-processing.

For example, the following snippet creates three feature columns.

  • The first uses the age feature directly as a floating-point input.
  • The second uses the class feature as a categorical input.
  • The third uses the embark_town as a categorical input, but uses the hashing trick to avoid the need to enumerate the options, and to set the number of options.

For further information, see the feature columns tutorial.

age = tf.feature_column.numeric_column('age')
cls = tf.feature_column.categorical_column_with_vocabulary_list('class', ['First', 'Second', 'Third']) 
embark = tf.feature_column.categorical_column_with_hash_bucket('embark_town', 32)

3. Instantiate the relevant pre-made Estimator.

For example, here's a sample instantiation of a pre-made Estimator named LinearClassifier:

import tempfile
model_dir = tempfile.mkdtemp()
model = tf.estimator.LinearClassifier(
    model_dir=model_dir,
    feature_columns=[embark, cls, age],
    n_classes=2
)
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpbsi4iylb', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

For further information, see the linear classifier tutorial.

4. Call a training, evaluation, or inference method.

All Estimators provide train, evaluate, and predict methods.

model = model.train(input_fn=train_input_fn, steps=100)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/canned/linear.py:1481: Layer.add_variable (from tensorflow.python.keras.engine.base_layer_v1) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/ftrl.py:112: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpbsi4iylb/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.6931472, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 100...
INFO:tensorflow:Saving checkpoints for 100 into /tmp/tmpbsi4iylb/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 100...
INFO:tensorflow:Loss for final step: 0.6262238.

result = model.evaluate(train_input_fn, steps=10)

for key, value in result.items():
  print(key, ":", value)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-09-19T01:23:19Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpbsi4iylb/model.ckpt-100
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/10]
INFO:tensorflow:Evaluation [2/10]
INFO:tensorflow:Evaluation [3/10]
INFO:tensorflow:Evaluation [4/10]
INFO:tensorflow:Evaluation [5/10]
INFO:tensorflow:Evaluation [6/10]
INFO:tensorflow:Evaluation [7/10]
INFO:tensorflow:Evaluation [8/10]
INFO:tensorflow:Evaluation [9/10]
INFO:tensorflow:Evaluation [10/10]
INFO:tensorflow:Inference Time : 0.60420s
INFO:tensorflow:Finished evaluation at 2020-09-19-01:23:20
INFO:tensorflow:Saving dict for global step 100: accuracy = 0.709375, accuracy_baseline = 0.634375, auc = 0.7508315, auc_precision_recall = 0.6325826, average_loss = 0.5779631, global_step = 100, label/mean = 0.365625, loss = 0.5779631, precision = 0.77272725, prediction/mean = 0.30943406, recall = 0.2905983
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 100: /tmp/tmpbsi4iylb/model.ckpt-100
accuracy : 0.709375
accuracy_baseline : 0.634375
auc : 0.7508315
auc_precision_recall : 0.6325826
average_loss : 0.5779631
label/mean : 0.365625
loss : 0.5779631
precision : 0.77272725
prediction/mean : 0.30943406
recall : 0.2905983
global_step : 100

for pred in model.predict(train_input_fn):
  for key, value in pred.items():
    print(key, ":", value)
  break
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpbsi4iylb/model.ckpt-100
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
logits : [-0.5147792]
logistic : [0.37407383]
probabilities : [0.62592614 0.37407383]
class_ids : [0]
classes : [b'0']
all_class_ids : [0 1]
all_classes : [b'0' b'1']

Benefits of pre-made Estimators

Pre-made Estimators encode best practices, providing the following benefits:

  • Best practices for determining where different parts of the computational graph should run, implementing strategies on a single machine or on a cluster.
  • Best practices for event (summary) writing and universally useful summaries.

If you don't use pre-made Estimators, you must implement the preceding features yourself.

Custom Estimators

The heart of every Estimator—whether pre-made or custom—is its model function, which is a method that builds graphs for training, evaluation, and prediction. When you are using a pre-made Estimator, someone else has already implemented the model function. When relying on a custom Estimator, you must write the model function yourself.

So the recommended workflow is:

  1. Assuming a suitable pre-made Estimator exists, use it to build your first model and use its results to establish a baseline.
  2. Build and test your overall pipeline, including the integrity and reliability of your data with this pre-made Estimator.
  3. If suitable alternative pre-made Estimators are available, run experiments to determine which pre-made Estimator produces the best results.
  4. Possibly, further improve your model by building your own custom Estimator.

Create an Estimator from a Keras model

You can convert existing Keras models to Estimators with tf.keras.estimator.model_to_estimator. Doing so enables your Keras model to access Estimator's strengths, such as distributed training.

Instantiate a Keras MobileNet V2 model and compile the model with the optimizer, loss, and metrics to train with:

import tensorflow as tf
import tensorflow_datasets as tfds
keras_mobilenet_v2 = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False)
keras_mobilenet_v2.trainable = False

estimator_model = tf.keras.Sequential([
    keras_mobilenet_v2,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1)
])

# Compile the model
estimator_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy'])
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_160_no_top.h5
9412608/9406464 [==============================] - 0s 0us/step

Create an Estimator from the compiled Keras model. The initial model state of the Keras model is preserved in the created Estimator:

est_mobilenet_v2 = tf.keras.estimator.model_to_estimator(keras_model=estimator_model)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp6jubajxm
INFO:tensorflow:Using the Keras model provided.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py:220: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp6jubajxm', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Treat the derived Estimator as you would with any other Estimator.

IMG_SIZE = 160  # All images will be resized to 160x160

def preprocess(image, label):
  image = tf.cast(image, tf.float32)
  image = (image/127.5) - 1
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  return image, label
def train_input_fn(batch_size):
  data = tfds.load('cats_vs_dogs', as_supervised=True)
  train_data = data['train']
  train_data = train_data.map(preprocess).shuffle(500).batch(batch_size)
  return train_data

To train, call Estimator's train function:

est_mobilenet_v2.train(input_fn=lambda: train_input_fn(32), steps=500)
Downloading and preparing dataset cats_vs_dogs/4.0.0 (download: 786.68 MiB, generated: Unknown size, total: 786.68 MiB) to /home/kbuilder/tensorflow_datasets/cats_vs_dogs/4.0.0...

Warning:absl:1738 images were corrupted and were skipped

Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/cats_vs_dogs/4.0.0.incomplete1VDT6U/cats_vs_dogs-train.tfrecord
Dataset cats_vs_dogs downloaded and prepared to /home/kbuilder/tensorflow_datasets/cats_vs_dogs/4.0.0. Subsequent calls will reuse this data.
INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp6jubajxm/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})

INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp6jubajxm/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})

INFO:tensorflow:Warm-starting from: /tmp/tmp6jubajxm/keras/keras_model.ckpt

INFO:tensorflow:Warm-starting from: /tmp/tmp6jubajxm/keras/keras_model.ckpt

INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.

INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.

INFO:tensorflow:Warm-started 158 variables.

INFO:tensorflow:Warm-started 158 variables.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...

INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp6jubajxm/model.ckpt.

INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp6jubajxm/model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...

INFO:tensorflow:loss = 0.69570315, step = 0

INFO:tensorflow:loss = 0.69570315, step = 0

INFO:tensorflow:global_step/sec: 23.7702

INFO:tensorflow:global_step/sec: 23.7702

INFO:tensorflow:loss = 0.6749977, step = 100 (4.209 sec)

INFO:tensorflow:loss = 0.6749977, step = 100 (4.209 sec)

INFO:tensorflow:global_step/sec: 26.3102

INFO:tensorflow:global_step/sec: 26.3102

INFO:tensorflow:loss = 0.69422066, step = 200 (3.800 sec)

INFO:tensorflow:loss = 0.69422066, step = 200 (3.800 sec)

INFO:tensorflow:global_step/sec: 25.8025

INFO:tensorflow:global_step/sec: 25.8025

INFO:tensorflow:loss = 0.5480626, step = 300 (3.876 sec)

INFO:tensorflow:loss = 0.5480626, step = 300 (3.876 sec)

INFO:tensorflow:global_step/sec: 25.8782

INFO:tensorflow:global_step/sec: 25.8782

INFO:tensorflow:loss = 0.6515007, step = 400 (3.865 sec)

INFO:tensorflow:loss = 0.6515007, step = 400 (3.865 sec)

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 500...

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 500...

INFO:tensorflow:Saving checkpoints for 500 into /tmp/tmp6jubajxm/model.ckpt.

INFO:tensorflow:Saving checkpoints for 500 into /tmp/tmp6jubajxm/model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 500...

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 500...

INFO:tensorflow:Loss for final step: 0.7829801.

INFO:tensorflow:Loss for final step: 0.7829801.

<tensorflow_estimator.python.estimator.estimator.EstimatorV2 at 0x7f87c0520518>

Similarly, to evaluate, call the Estimator's evaluate function:

est_mobilenet_v2.evaluate(input_fn=lambda: train_input_fn(32), steps=10)
INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Calling model_fn.

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Starting evaluation at 2020-09-19T01:24:35Z

INFO:tensorflow:Starting evaluation at 2020-09-19T01:24:35Z

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from /tmp/tmp6jubajxm/model.ckpt-500

INFO:tensorflow:Restoring parameters from /tmp/tmp6jubajxm/model.ckpt-500

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Evaluation [1/10]

INFO:tensorflow:Evaluation [1/10]

INFO:tensorflow:Evaluation [2/10]

INFO:tensorflow:Evaluation [2/10]

INFO:tensorflow:Evaluation [3/10]

INFO:tensorflow:Evaluation [3/10]

INFO:tensorflow:Evaluation [4/10]

INFO:tensorflow:Evaluation [4/10]

INFO:tensorflow:Evaluation [5/10]

INFO:tensorflow:Evaluation [5/10]

INFO:tensorflow:Evaluation [6/10]

INFO:tensorflow:Evaluation [6/10]

INFO:tensorflow:Evaluation [7/10]

INFO:tensorflow:Evaluation [7/10]

INFO:tensorflow:Evaluation [8/10]

INFO:tensorflow:Evaluation [8/10]

INFO:tensorflow:Evaluation [9/10]

INFO:tensorflow:Evaluation [9/10]

INFO:tensorflow:Evaluation [10/10]

INFO:tensorflow:Evaluation [10/10]

INFO:tensorflow:Inference Time : 1.74243s

INFO:tensorflow:Inference Time : 1.74243s

INFO:tensorflow:Finished evaluation at 2020-09-19-01:24:37

INFO:tensorflow:Finished evaluation at 2020-09-19-01:24:37

INFO:tensorflow:Saving dict for global step 500: accuracy = 0.61875, global_step = 500, loss = 0.6329565

INFO:tensorflow:Saving dict for global step 500: accuracy = 0.61875, global_step = 500, loss = 0.6329565

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 500: /tmp/tmp6jubajxm/model.ckpt-500

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 500: /tmp/tmp6jubajxm/model.ckpt-500

{'accuracy': 0.61875, 'loss': 0.6329565, 'global_step': 500}

For more details, please refer to the documentation for tf.keras.estimator.model_to_estimator.