
This document introduces `tf.estimator`, a high-level TensorFlow API. Estimators encapsulate the following actions:

- Training
- Evaluation
- Prediction
- Export for serving

TensorFlow implements several pre-made Estimators. Custom Estimators are still supported, but mainly as a backwards compatibility measure. **Custom Estimators should not be used for new code**. All Estimators, pre-made or custom, are classes based on the `tf.estimator.Estimator` class.

For a quick example, try the Estimator tutorials. For an overview of the API design, check the white paper.

## Setup

`pip install -U tensorflow_datasets`

```
import tempfile
import os
import tensorflow as tf
import tensorflow_datasets as tfds
```

## Advantages

Similar to a `tf.keras.Model`, an `estimator` is a model-level abstraction. The `tf.estimator` API provides some capabilities that are currently still under development for `tf.keras`. These are:

- Parameter server based training
- Full TFX integration

## Estimator Capabilities

Estimators provide the following benefits:

- You can run Estimator-based models on a local host or in a distributed multi-server environment without changing your model. Furthermore, you can run Estimator-based models on CPUs, GPUs, or TPUs without recoding your model.
- Estimators provide a safe distributed training loop that controls how and when to:
  - Load data
  - Handle exceptions
  - Create checkpoint files and recover from failures
  - Save summaries for TensorBoard

When writing an application with Estimators, you must separate the data input pipeline from the model. This separation simplifies experiments with different datasets.

## Using pre-made Estimators

Pre-made Estimators enable you to work at a much higher conceptual level than the base TensorFlow APIs. You no longer have to worry about creating the computational graph or sessions since Estimators handle all the "plumbing" for you. Furthermore, pre-made Estimators let you experiment with different model architectures by making only minimal code changes. `tf.estimator.DNNClassifier`, for example, is a pre-made Estimator class that trains classification models based on dense, feed-forward neural networks.

A TensorFlow program relying on a pre-made Estimator typically consists of the following four steps:

### 1. Write one or more input functions

For example, you might create one function to import the training set and another function to import the test set. Estimators expect their inputs to be formatted as a pair of objects:

- A dictionary in which the keys are feature names and the values are Tensors (or SparseTensors) containing the corresponding feature data
- A Tensor containing one or more labels

The `input_fn` should return a `tf.data.Dataset` that yields pairs in that format.

For example, the following code builds a `tf.data.Dataset` from the Titanic dataset's `train.csv` file:

```
def train_input_fn():
  titanic_file = tf.keras.utils.get_file("train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
  titanic = tf.data.experimental.make_csv_dataset(
      titanic_file, batch_size=32,
      label_name="survived")
  titanic_batches = (
      titanic.cache().repeat().shuffle(500)
      .prefetch(tf.data.AUTOTUNE))
  return titanic_batches
```
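To make the expected `(features, labels)` structure concrete, here is a plain-Python sketch of what `label_name="survived"` does conceptually: each CSV row is split into a dictionary of feature columns plus a separate label. This is only an illustration with a made-up two-row CSV string and a hypothetical helper, `split_features_and_label`; `make_csv_dataset` itself also batches rows, infers column types, and returns tensors, whereas here the feature values stay as strings.

```
import csv
import io

# Made-up two-row CSV for illustration only (not real rows from train.csv).
SAMPLE_CSV = """survived,sex,age,class
0,male,22.0,Third
1,female,38.0,First
"""

def split_features_and_label(csv_text, label_name):
    """Yield (features_dict, label) pairs, one per CSV row.

    Hypothetical helper: mimics in plain Python the pairing that an
    Estimator input_fn is expected to produce.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = int(row.pop(label_name))  # remove the label column from the features
        yield row, label

for features, label in split_features_and_label(SAMPLE_CSV, "survived"):
    print(features, label)
```

The key point is that the label column is removed from the feature dictionary, so the model never sees the target as an input feature.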

The `input_fn` is executed in a `tf.Graph` and can also directly return a `(features_dict, labels)` pair containing graph tensors, but this is error-prone outside of simple cases like returning constants.

### 2. Define the feature columns.

Each `tf.feature_column` identifies a feature name, its type, and any input pre-processing.

For example, the following snippet creates three feature columns.

- The first uses the `age` feature directly as a floating-point input.
- The second uses the `class` feature as a categorical input.
- The third uses the `embark_town` feature as a categorical input, but uses the hashing trick to avoid the need to enumerate the options and to fix the number of options (here, 32 hash buckets).

For further information, check the feature columns tutorial.

```
age = tf.feature_column.numeric_column('age')
cls = tf.feature_column.categorical_column_with_vocabulary_list('class', ['First', 'Second', 'Third'])
embark = tf.feature_column.categorical_column_with_hash_bucket('embark_town', 32)
```
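The hashing trick behind `categorical_column_with_hash_bucket` can be sketched in plain Python: hash the string and take the result modulo the bucket count, so any value, including one never seen during training, lands in a fixed range without a vocabulary. This is an illustration only. The helper `hash_bucket` is hypothetical and uses MD5 as a deterministic stand-in, whereas TensorFlow uses its own fingerprint function, so the bucket ids below will not match the ones TensorFlow assigns.

```
import hashlib

def hash_bucket(value: str, num_buckets: int) -> int:
    """Map a string to a bucket id in [0, num_buckets) via a stable hash.

    Hypothetical helper: MD5 is only a stand-in for TensorFlow's own
    fingerprint function, so ids differ from what TensorFlow produces.
    """
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets

# "unknown" plays the role of a value absent from any vocabulary.
for town in ["Southampton", "Cherbourg", "Queenstown", "unknown"]:
    print(town, "->", hash_bucket(town, 32))
```

Distinct values can collide in the same bucket; that loss of precision is the price paid for not maintaining a vocabulary.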

### 3. Instantiate the relevant pre-made Estimator.

For example, here's a sample instantiation of a pre-made Estimator named `LinearClassifier`:

```
model_dir = tempfile.mkdtemp()
model = tf.estimator.LinearClassifier(
    model_dir=model_dir,
    feature_columns=[embark, cls, age],
    n_classes=2
)
```

For more information, you can go to the linear classifier tutorial.

### 4. Call a training, evaluation, or inference method.

All Estimators provide `train`, `evaluate`, and `predict` methods.

```
model = model.train(input_fn=train_input_fn, steps=100)
```

```
result = model.evaluate(train_input_fn, steps=10)

for key, value in result.items():
  print(key, ":", value)
```

```
for pred in model.predict(train_input_fn):
  for key, value in pred.items():
    print(key, ":", value)
  break
```

### Benefits of pre-made Estimators

Pre-made Estimators encode best practices, providing the following benefits:

- Best practices for determining where different parts of the computational graph should run, implementing strategies on a single machine or on a cluster.
- Best practices for event (summary) writing and universally useful summaries.

If you don't use pre-made Estimators, you must implement the preceding features yourself.

## Custom Estimators

The heart of every Estimator, whether pre-made or custom, is its *model function*, `model_fn`, which is a method that builds graphs for training, evaluation, and prediction. When you are using a pre-made Estimator, someone else has already implemented the model function. When relying on a custom Estimator, you must write the model function yourself.

## Create an Estimator from a Keras model

You can convert existing Keras models to Estimators with `tf.keras.estimator.model_to_estimator`. This is helpful if you want to modernize your model code, but your training pipeline still requires Estimators.

Instantiate a Keras MobileNet V2 model and compile the model with the optimizer, loss, and metrics to train with:

```
keras_mobilenet_v2 = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False)
keras_mobilenet_v2.trainable = False

estimator_model = tf.keras.Sequential([
    keras_mobilenet_v2,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1)
])

# Compile the model
estimator_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy'])
```

Create an `Estimator` from the compiled Keras model. The initial model state of the Keras model is preserved in the created `Estimator`:

```
est_mobilenet_v2 = tf.keras.estimator.model_to_estimator(keras_model=estimator_model)
```

Treat the derived `Estimator` as you would any other `Estimator`.

```
IMG_SIZE = 160  # All images will be resized to 160x160

def preprocess(image, label):
  image = tf.cast(image, tf.float32)
  image = (image/127.5) - 1
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  return image, label
```
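The `(image/127.5) - 1` line maps 8-bit pixel values from [0, 255] into [-1, 1], the input range MobileNetV2 expects. A plain-Python check of that affine map at the endpoints and midpoint (the helper `scale_pixel` is hypothetical and operates on a single number rather than a tensor):

```
def scale_pixel(value: float) -> float:
    """Apply the same affine map as `preprocess`: (value / 127.5) - 1."""
    return value / 127.5 - 1.0

for v in (0, 127.5, 255):
    print(v, "->", scale_pixel(v))
# 0 -> -1.0
# 127.5 -> 0.0
# 255 -> 1.0
```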

```
def train_input_fn(batch_size):
  data = tfds.load('cats_vs_dogs', as_supervised=True)
  train_data = data['train']
  train_data = train_data.map(preprocess).shuffle(500).batch(batch_size)
  return train_data
```

To train, call the Estimator's `train` method:

```
est_mobilenet_v2.train(input_fn=lambda: train_input_fn(32), steps=50)
```

Similarly, to evaluate, call the Estimator's `evaluate` method:

```
est_mobilenet_v2.evaluate(input_fn=lambda: train_input_fn(32), steps=10)
```

For more details, please refer to the documentation for `tf.keras.estimator.model_to_estimator`.

## Saving object-based checkpoints with Estimator

By default, Estimators save checkpoints with variable names rather than the object graph described in the Checkpoint guide. `tf.train.Checkpoint` will read name-based checkpoints, but variable names may change when moving parts of a model outside of the Estimator's `model_fn`. For forward compatibility, saving object-based checkpoints makes it easier to train a model inside an Estimator and then use it outside of one.

```
import tensorflow.compat.v1 as tf_compat
```

```
def toy_dataset():
  inputs = tf.range(10.)[:, None]
  labels = inputs * 5. + tf.range(5.)[None, :]
  return tf.data.Dataset.from_tensor_slices(
    dict(x=inputs, y=labels)).repeat().batch(2)
```

```
class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)

  def call(self, x):
    return self.l1(x)
```

```
def model_fn(features, labels, mode):
  net = Net()
  opt = tf.keras.optimizers.Adam(0.1)
  ckpt = tf.train.Checkpoint(step=tf_compat.train.get_global_step(),
                             optimizer=opt, net=net)
  with tf.GradientTape() as tape:
    output = net(features['x'])
    loss = tf.reduce_mean(tf.abs(output - features['y']))
  variables = net.trainable_variables
  gradients = tape.gradient(loss, variables)
  return tf.estimator.EstimatorSpec(
    mode,
    loss=loss,
    train_op=tf.group(opt.apply_gradients(zip(gradients, variables)),
                      ckpt.step.assign_add(1)),
    # Tell the Estimator to save "ckpt" in an object-based format.
    scaffold=tf_compat.train.Scaffold(saver=ckpt))

tf.keras.backend.clear_session()

est = tf.estimator.Estimator(model_fn, './tf_estimator_example/')
est.train(toy_dataset, steps=10)
```

`tf.train.Checkpoint` can then load the Estimator's checkpoints from its `model_dir`.

```
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
ckpt = tf.train.Checkpoint(
  step=tf.Variable(1, dtype=tf.int64), optimizer=opt, net=net)
ckpt.restore(tf.train.latest_checkpoint('./tf_estimator_example/'))
ckpt.step.numpy()  # From est.train(..., steps=10)
```

## SavedModels from Estimators

Estimators export SavedModels through `tf.estimator.Estimator.export_saved_model`.

```
input_column = tf.feature_column.numeric_column("x")

estimator = tf.estimator.LinearClassifier(feature_columns=[input_column])

def input_fn():
  return tf.data.Dataset.from_tensor_slices(
    ({"x": [1., 2., 3., 4.]}, [1, 1, 0, 0])).repeat(200).shuffle(64).batch(16)

estimator.train(input_fn)
```

To save an `Estimator`, you need to create a `serving_input_receiver`. This function builds a part of a `tf.Graph` that parses the raw data received by the SavedModel.

The `tf.estimator.export` module contains functions to help build these `receivers`.

The following code builds a receiver, based on the `feature_columns`, that accepts serialized `tf.Example` protocol buffers, which are often used with TensorFlow Serving.

```
tmpdir = tempfile.mkdtemp()
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
tf.feature_column.make_parse_example_spec([input_column]))
estimator_base_path = os.path.join(tmpdir, 'from_estimator')
estimator_path = estimator.export_saved_model(estimator_base_path, serving_input_fn)
```

You can also load and run that model from Python:

```
imported = tf.saved_model.load(estimator_path)

def predict(x):
  example = tf.train.Example()
  example.features.feature["x"].float_list.value.extend([x])
  return imported.signatures["predict"](
    examples=tf.constant([example.SerializeToString()]))
```

```
print(predict(1.5))
print(predict(3.5))
```

`tf.estimator.export.build_raw_serving_input_receiver_fn` allows you to create input functions that take raw tensors rather than `tf.train.Example`s.

## Using `tf.distribute.Strategy` with Estimator (Limited support)

`tf.estimator` is a distributed training TensorFlow API that originally supported the async parameter server approach. `tf.estimator` now supports `tf.distribute.Strategy`. If you're using `tf.estimator`, you can change to distributed training with very few changes to your code. With this, Estimator users can now do synchronous distributed training on multiple GPUs and multiple workers, as well as use TPUs. This support in Estimator is, however, limited. Check out the What's supported now section below for more details.

Using `tf.distribute.Strategy` with Estimator is slightly different from the Keras case. Instead of using `strategy.scope`, you pass the strategy object into the `RunConfig` for the Estimator.

You can refer to the distributed training guide for more information.

Here is a snippet of code that shows this with a premade Estimator `LinearRegressor` and `MirroredStrategy`:

```
mirrored_strategy = tf.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(
    train_distribute=mirrored_strategy, eval_distribute=mirrored_strategy)
regressor = tf.estimator.LinearRegressor(
    feature_columns=[tf.feature_column.numeric_column('feats')],
    optimizer='SGD',
    config=config)
```

Here, you use a premade Estimator, but the same code works with a custom Estimator as well. `train_distribute` determines how training will be distributed, and `eval_distribute` determines how evaluation will be distributed. This is another difference from Keras, where the same strategy is used for both training and evaluation.

Now you can train and evaluate this Estimator with an input function:

```
def input_fn():
  dataset = tf.data.Dataset.from_tensors(({"feats":[1.]}, [1.]))
  return dataset.repeat(1000).batch(10)

regressor.train(input_fn=input_fn, steps=10)
regressor.evaluate(input_fn=input_fn, steps=10)
```

Another difference to highlight here between Estimator and Keras is the input handling. In Keras, each batch of the dataset is split automatically across the multiple replicas. Estimator, however, neither splits batches automatically nor automatically shards the data across different workers. You have full control over how you want your data to be distributed across workers and devices, and you must provide an `input_fn` to specify how to distribute your data.

Your `input_fn` is called once per worker, thus giving one dataset per worker. Then one batch from that dataset is fed to one replica on that worker, thereby consuming N batches for N replicas on 1 worker. In other words, the dataset returned by the `input_fn` should provide batches of size `PER_REPLICA_BATCH_SIZE`. The global batch size for a step can be obtained as `PER_REPLICA_BATCH_SIZE * strategy.num_replicas_in_sync`.
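The batch arithmetic above can be checked with a small plain-Python sketch. The helper `step_consumption` is hypothetical, and it assumes every worker has the same number of replicas (for example, identical GPU counts per machine), so that `num_replicas_in_sync` is simply workers times replicas per worker.

```
PER_REPLICA_BATCH_SIZE = 16  # batch size the input_fn's dataset should yield

def step_consumption(num_workers, replicas_per_worker, per_replica_batch_size):
    """Return (batches taken from each worker's dataset per step, global batch size).

    Assumes a uniform number of replicas on every worker.
    """
    num_replicas_in_sync = num_workers * replicas_per_worker
    batches_per_worker = replicas_per_worker  # one batch per replica on that worker
    global_batch = per_replica_batch_size * num_replicas_in_sync
    return batches_per_worker, global_batch

print(step_consumption(num_workers=2, replicas_per_worker=4,
                       per_replica_batch_size=PER_REPLICA_BATCH_SIZE))
# (4, 128)
```

With 2 workers and 4 replicas each, every worker's dataset is drawn from 4 times per step, and the 8 replicas together process 128 examples per step.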

When performing multi-worker training, you should either split your data across the workers, or shuffle with a random seed on each. You can check an example of how to do this in the Multi-worker training with Estimator tutorial.
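One common way to split data across workers is for each worker to keep every k-th record, which is the selection rule `tf.data.Dataset.shard(num_shards, index)` applies. The plain-Python sketch below mimics that rule with a hypothetical `shard` helper. In a real `input_fn` you would likely call `dataset.shard(...)` with the number of workers and this worker's index; how you obtain those two numbers depends on your cluster setup and is an assumption here, not something Estimator does for you.

```
def shard(elements, num_shards, index):
    """Keep every num_shards-th element starting at position `index`.

    Hypothetical helper mirroring the selection rule of
    tf.data.Dataset.shard(num_shards, index).
    """
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    return [e for i, e in enumerate(elements) if i % num_shards == index]

records = list(range(10))
for worker_index in range(3):
    print(worker_index, shard(records, num_shards=3, index=worker_index))
# 0 [0, 3, 6, 9]
# 1 [1, 4, 7]
# 2 [2, 5, 8]
```

Each record goes to exactly one worker, so no example is seen twice per epoch across the cluster.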

You can use multi-worker and parameter server strategies in the same way. The code remains the same, but you need to use `tf.estimator.train_and_evaluate` and set the `TF_CONFIG` environment variable for each binary running in your cluster.
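`TF_CONFIG` is a JSON string with a `cluster` section listing every task address by job name and a `task` section naming the role of the current process. The sketch below builds one with the standard library; the host names, ports, cluster shape, and the helper `make_tf_config` are hypothetical placeholders for your own deployment.

```
import json
import os

# Hypothetical cluster: one chief, two workers, one parameter server.
cluster = {
    "chief": ["host0.example.com:2222"],
    "worker": ["host1.example.com:2222", "host2.example.com:2222"],
    "ps": ["host3.example.com:2222"],
}

def make_tf_config(task_type: str, task_index: int) -> str:
    """Serialize the shared cluster spec plus this process's own role as JSON."""
    return json.dumps({"cluster": cluster,
                       "task": {"type": task_type, "index": task_index}})

# On the first non-chief worker, for example:
os.environ["TF_CONFIG"] = make_tf_config("worker", 0)
print(os.environ["TF_CONFIG"])
```

Every binary receives the same `cluster` section; only the `task` entry differs between processes.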

### What's supported now?

There is limited support for training with Estimator using all strategies except `TPUStrategy`. Basic training and evaluation should work, but a number of advanced features such as `v1.train.Scaffold` do not. There may also be a number of bugs in this integration, and there are no plans to actively improve this support (the focus is on Keras and custom training loop support). If at all possible, you should prefer to use `tf.distribute` with those APIs instead.

Training API | MirroredStrategy | TPUStrategy | MultiWorkerMirroredStrategy | CentralStorageStrategy | ParameterServerStrategy |
---|---|---|---|---|---|
Estimator API | Limited support | Not supported | Limited support | Limited support | Limited support |

### Examples and tutorials

Here are some end-to-end examples that show how to use various strategies with Estimator:

- The Multi-worker Training with Estimator tutorial shows how you can train with multiple workers using `MultiWorkerMirroredStrategy` on the MNIST dataset.
- An end-to-end example of running multi-worker training with distribution strategies in `tensorflow/ecosystem` using Kubernetes templates. It starts with a Keras model and converts it to an Estimator using the `tf.keras.estimator.model_to_estimator` API.
- The official ResNet50 model, which can be trained using either `MirroredStrategy` or `MultiWorkerMirroredStrategy`.