
This guide covers training, evaluation, and prediction (inference) for models in TensorFlow 2.0, in two broad situations:

- When using built-in APIs for training & validation (such as `model.fit()`, `model.evaluate()`, `model.predict()`). This is covered in the section **"Using built-in training & evaluation loops"**.
- When writing custom loops from scratch using eager execution and the `GradientTape` object. This is covered in the section **"Writing your own training & evaluation loops from scratch"**.

In general, whether you are using built-in loops or writing your own, model training & evaluation works strictly in the same way across every kind of Keras model -- Sequential models, models built with the Functional API, and models written from scratch via model subclassing.

This guide doesn't cover distributed training.

## Setup

```
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
```

## Part I: Using built-in training & evaluation loops

When passing data to the built-in training loops of a model, you should either use **Numpy arrays** (if your data is small and fits in memory) or **tf.data Dataset** objects. In the next few paragraphs, we'll use the MNIST dataset as Numpy arrays, in order to demonstrate how to use optimizers, losses, and metrics.

### API overview: a first end-to-end example

Let's consider the following model (here, we build it with the Functional API, but it could be a Sequential model or a subclassed model as well):

```
from tensorflow import keras
from tensorflow.keras import layers
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
```

Here's what the typical end-to-end workflow looks like, consisting of training, validation on a holdout set generated from the original training data, and finally evaluation on the test data:

Load a toy dataset for the sake of this example

```
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
y_train = y_train.astype('float32')
y_test = y_test.astype('float32')
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
```

Specify the training configuration (optimizer, loss, metrics)

```
model.compile(optimizer=keras.optimizers.RMSprop(),  # Optimizer
              # Loss function to minimize
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              # List of metrics to monitor
              metrics=['sparse_categorical_accuracy'])
```

Train the model by slicing the data into "batches" of size "batch_size", and repeatedly iterating over the entire dataset for a given number of "epochs"

```
print('# Fit model on training data')
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=3,
                    # We pass some validation data for
                    # monitoring validation loss and metrics
                    # at the end of each epoch
                    validation_data=(x_val, y_val))
print('\nhistory dict:', history.history)
```

```
# Fit model on training data
Train on 50000 samples, validate on 10000 samples
Epoch 1/3
50000/50000 [==============================] - 3s 55us/sample - loss: 0.3340 - sparse_categorical_accuracy: 0.9054 - val_loss: 0.1746 - val_sparse_categorical_accuracy: 0.9499
Epoch 2/3
50000/50000 [==============================] - 2s 41us/sample - loss: 0.1585 - sparse_categorical_accuracy: 0.9525 - val_loss: 0.1305 - val_sparse_categorical_accuracy: 0.9614
Epoch 3/3
50000/50000 [==============================] - 2s 40us/sample - loss: 0.1147 - sparse_categorical_accuracy: 0.9656 - val_loss: 0.1365 - val_sparse_categorical_accuracy: 0.9607

history dict: {'loss': [0.3339568813562393, 0.1584551115450263, 0.11470136372566223], 'sparse_categorical_accuracy': [0.9054, 0.95252, 0.96556], 'val_loss': [0.17461313144266605, 0.13053996562361717, 0.13651921340860426], 'val_sparse_categorical_accuracy': [0.9499, 0.9614, 0.9607]}
```

The returned "history" object holds a record of the loss values and metric values during training

```
# Evaluate the model on the test data using `evaluate`
print('\n# Evaluate on test data')
results = model.evaluate(x_test, y_test, batch_size=128)
print('test loss, test acc:', results)
# Generate predictions (probabilities -- the output of the last layer)
# on new data using `predict`
print('\n# Generate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions shape:', predictions.shape)
```

```
# Evaluate on test data
10000/10000 [==============================] - 0s 13us/sample - loss: 0.1406 - sparse_categorical_accuracy: 0.9557
test loss, test acc: [0.14064121429026127, 0.9557]

# Generate predictions for 3 samples
predictions shape: (3, 10)
```

### Specifying a loss, metrics, and an optimizer

To train a model with `fit`, you need to specify a loss function, an optimizer, and optionally, some metrics to monitor.

You pass these to the model as arguments to the `compile()` method:

```
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['sparse_categorical_accuracy'])
```

The `metrics` argument should be a list -- your model can have any number of metrics.

If your model has multiple outputs, you can specify different losses and metrics for each output,
and you can modulate the contribution of each output to the total loss of the model. You will find more details about this in the section "**Passing data to multi-input, multi-output models**".

Note that in many cases, the loss and metrics are specified via string identifiers, as a shortcut:

```
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
```

For later reuse, let's put our model definition and compile step in functions; we will call them several times across different examples in this guide.

```
def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name='digits')
    x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
    x = layers.Dense(64, activation='relu', name='dense_2')(x)
    outputs = layers.Dense(10, name='predictions')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['sparse_categorical_accuracy'])
    return model
```

#### Many built-in optimizers, losses, and metrics are available

In general, you won't have to create your own losses, metrics, or optimizers from scratch, because what you need is likely already part of the Keras API:

Optimizers:

- `SGD()` (with or without momentum)
- `RMSprop()`
- `Adam()`
- etc.

Losses:

- `MeanSquaredError()`
- `KLDivergence()`
- `CosineSimilarity()`
- etc.

Metrics:

- `AUC()`
- `Precision()`
- `Recall()`
- etc.
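
Each of these is configured through constructor arguments; here's a quick sketch of instantiating a few of them (the hyperparameter values below are illustrative, not recommendations):

```
from tensorflow import keras

# SGD with momentum; hyperparameters are constructor arguments.
optimizer = keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)
# Losses and metrics are instantiated the same way.
loss = keras.losses.MeanSquaredError()
metric = keras.metrics.AUC()
```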

#### Custom losses

There are two ways to provide custom losses with Keras. The first is to create a function that accepts inputs `y_true` and `y_pred`. The following example shows a loss function that computes the average absolute error between the real data and the predictions:

```
def basic_loss_function(y_true, y_pred):
    return tf.math.reduce_mean(tf.abs(y_true - y_pred))

model.compile(optimizer=keras.optimizers.Adam(),
              loss=basic_loss_function)

model.fit(x_train, y_train, batch_size=64, epochs=3)
```

```
Train on 50000 samples
Epoch 1/3
50000/50000 [==============================] - 2s 33us/sample - loss: 1.4199
Epoch 2/3
50000/50000 [==============================] - 1s 28us/sample - loss: 0.7038
Epoch 3/3
50000/50000 [==============================] - 1s 28us/sample - loss: 0.5798

<tensorflow.python.keras.callbacks.History at 0x7f39a06d29b0>
```

If you need a loss function that takes in parameters beside `y_true` and `y_pred`, you can subclass the `tf.keras.losses.Loss` class and implement the following two methods:

- `__init__(self)` -- Accepts parameters to pass during the call of your loss function
- `call(self, y_true, y_pred)` -- Uses the targets (`y_true`) and the model predictions (`y_pred`) to compute the model's loss

Parameters passed into `__init__()` can be used during `call()` when calculating loss.

The following example shows how to implement a `WeightedBinaryCrossEntropy` loss that calculates a binary cross-entropy loss, where the loss of a certain class or the whole function can be modified by a scalar.

```
class WeightedBinaryCrossEntropy(keras.losses.Loss):
    """
    Args:
      pos_weight: Scalar to affect the positive labels of the loss function.
      weight: Scalar to affect the entirety of the loss function.
      from_logits: Whether to compute loss from logits or the probability.
      reduction: Type of tf.keras.losses.Reduction to apply to loss.
      name: Name of the loss function.
    """
    def __init__(self, pos_weight, weight, from_logits=False,
                 reduction=keras.losses.Reduction.AUTO,
                 name='weighted_binary_crossentropy'):
        super().__init__(reduction=reduction, name=name)
        self.pos_weight = pos_weight
        self.weight = weight
        self.from_logits = from_logits

    def call(self, y_true, y_pred):
        ce = tf.losses.binary_crossentropy(
            y_true, y_pred, from_logits=self.from_logits)[:, None]
        ce = self.weight * (ce * (1 - y_true) + self.pos_weight * ce * y_true)
        return ce
```

This is a binary loss, but the dataset has 10 classes, so we apply the loss as if the model were making an independent binary prediction for each class. To do that, start by creating one-hot vectors from the class indices:

```
one_hot_y_train = tf.one_hot(y_train.astype(np.int32), depth=10)
```

Now use those one-hot vectors and the custom loss to train a model:

```
model = get_uncompiled_model()

model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=WeightedBinaryCrossEntropy(
        pos_weight=0.5, weight=2, from_logits=True)
)

model.fit(x_train, one_hot_y_train, batch_size=64, epochs=5)
```

```
Train on 50000 samples
Epoch 1/5
50000/50000 [==============================] - 2s 41us/sample - loss: 0.1767
Epoch 2/5
50000/50000 [==============================] - 2s 35us/sample - loss: 0.0668
Epoch 3/5
50000/50000 [==============================] - 2s 35us/sample - loss: 0.0484
Epoch 4/5
50000/50000 [==============================] - 2s 35us/sample - loss: 0.0385
Epoch 5/5
50000/50000 [==============================] - 2s 34us/sample - loss: 0.0322

<tensorflow.python.keras.callbacks.History at 0x7f39a066f860>
```

#### Custom metrics

If you need a metric that isn't part of the API, you can easily create custom metrics by subclassing the `Metric` class. You will need to implement 4 methods:

- `__init__(self)`, in which you will create state variables for your metric.
- `update_state(self, y_true, y_pred, sample_weight=None)`, which uses the targets `y_true` and the model predictions `y_pred` to update the state variables.
- `result(self)`, which uses the state variables to compute the final results.
- `reset_states(self)`, which reinitializes the state of the metric.

State update and results computation are kept separate (in `update_state()` and `result()`, respectively) because in some cases, results computation might be very expensive, and would only be done periodically.

Here's a simple example showing how to implement a `CategoricalTruePositives` metric, which counts how many samples were correctly classified as belonging to a given class:

```
class CategoricalTruePositives(keras.metrics.Metric):

    def __init__(self, name='categorical_true_positives', **kwargs):
        super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
        self.true_positives = self.add_weight(name='tp', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.reshape(tf.argmax(y_pred, axis=1), shape=(-1, 1))
        values = tf.cast(y_true, 'int32') == tf.cast(y_pred, 'int32')
        values = tf.cast(values, 'float32')
        if sample_weight is not None:
            sample_weight = tf.cast(sample_weight, 'float32')
            values = tf.multiply(values, sample_weight)
        self.true_positives.assign_add(tf.reduce_sum(values))

    def result(self):
        return self.true_positives

    def reset_states(self):
        # The state of the metric will be reset at the start of each epoch.
        self.true_positives.assign(0.)
```

```
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[CategoricalTruePositives()])
model.fit(x_train, y_train,
          batch_size=64,
          epochs=3)
```

```
Train on 50000 samples
Epoch 1/3
50000/50000 [==============================] - 2s 43us/sample - loss: 0.0687 - categorical_true_positives: 48964.0000
Epoch 2/3
50000/50000 [==============================] - 2s 36us/sample - loss: 0.0566 - categorical_true_positives: 49125.0000
Epoch 3/3
50000/50000 [==============================] - 2s 36us/sample - loss: 0.0498 - categorical_true_positives: 49256.0000

<tensorflow.python.keras.callbacks.History at 0x7f39a0506128>
```

#### Handling losses and metrics that don't fit the standard signature

The overwhelming majority of losses and metrics can be computed from `y_true` and `y_pred`, where `y_pred` is an output of your model. But not all of them. For instance, a regularization loss may only require the activation of a layer (there are no targets in this case), and this activation may not be a model output.

In such cases, you can call `self.add_loss(loss_value)` from inside the `call` method of a custom layer. Here's a simple example that adds activity regularization (note that activity regularization is built in to all Keras layers -- this layer is just for the sake of providing a concrete example):

```
class ActivityRegularizationLayer(layers.Layer):

    def call(self, inputs):
        self.add_loss(tf.reduce_sum(inputs) * 0.1)
        return inputs  # Pass-through layer.

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# The displayed loss will be much higher than before
# due to the regularization component.
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)
```

```
Train on 50000 samples
50000/50000 [==============================] - 2s 41us/sample - loss: 2.4873

<tensorflow.python.keras.callbacks.History at 0x7f39a03d77f0>
```

You can do the same for logging metric values:

```
class MetricLoggingLayer(layers.Layer):

    def call(self, inputs):
        # The `aggregation` argument defines
        # how to aggregate the per-batch values
        # over each epoch:
        # in this case we simply average them.
        self.add_metric(keras.backend.std(inputs),
                        name='std_of_activation',
                        aggregation='mean')
        return inputs  # Pass-through layer.

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
# Insert std logging as a layer.
x = MetricLoggingLayer()(x)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)
```

```
Train on 50000 samples
50000/50000 [==============================] - 2s 41us/sample - loss: 0.3458 - std_of_activation: 0.9716

<tensorflow.python.keras.callbacks.History at 0x7f39a0224898>
```

In the Functional API, you can also call `model.add_loss(loss_tensor)` or `model.add_metric(metric_tensor, name, aggregation)`.

Here's a simple example:

```
inputs = keras.Input(shape=(784,), name='digits')
x1 = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x2 = layers.Dense(64, activation='relu', name='dense_2')(x1)
outputs = layers.Dense(10, name='predictions')(x2)
model = keras.Model(inputs=inputs, outputs=outputs)

model.add_loss(tf.reduce_sum(x1) * 0.1)

model.add_metric(keras.backend.std(x1),
                 name='std_of_activation',
                 aggregation='mean')

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)
```

```
Train on 50000 samples
50000/50000 [==============================] - 2s 45us/sample - loss: 2.4805 - std_of_activation: 0.0018

<tensorflow.python.keras.callbacks.History at 0x7f39a00a96a0>
```

#### Automatically setting apart a validation holdout set

In the first end-to-end example you saw, we used the `validation_data` argument to pass a tuple of Numpy arrays `(x_val, y_val)` to the model for evaluating a validation loss and validation metrics at the end of each epoch.

Here's another option: the argument `validation_split` allows you to automatically reserve part of your training data for validation. The argument value represents the fraction of the data to be reserved for validation, so it should be set to a number higher than 0 and lower than 1. For instance, `validation_split=0.2` means "use 20% of the data for validation", and `validation_split=0.6` means "use 60% of the data for validation".

The way the validation is computed is by *taking the last x% of samples of the arrays received by the `fit` call, before any shuffling*.

You can only use `validation_split` when training with Numpy data.

```
model = get_compiled_model()
model.fit(x_train, y_train, batch_size=64, validation_split=0.2, epochs=1, steps_per_epoch=1)
```

```
Train on 40000 samples, validate on 10000 samples
64/40000 [..............................] - ETA: 7:37 - loss: 2.3859 - sparse_categorical_accuracy: 0.0625 - val_loss: 2.2227 - val_sparse_categorical_accuracy: 0.1936

<tensorflow.python.keras.callbacks.History at 0x7f396c332278>
```

### Training & evaluation from tf.data Datasets

In the past few paragraphs, you've seen how to handle losses, metrics, and optimizers, and you've seen how to use the `validation_data` and `validation_split` arguments in `fit`, when your data is passed as Numpy arrays.

Let's now take a look at the case where your data comes in the form of a tf.data Dataset.

The tf.data API is a set of utilities in TensorFlow 2.0 for loading and preprocessing data in a way that's fast and scalable.

For a complete guide about creating Datasets, see the tf.data documentation.

You can pass a Dataset instance directly to the methods `fit()`, `evaluate()`, and `predict()`:

```
model = get_compiled_model()
# First, let's create a training Dataset instance.
# For the sake of our example, we'll use the same MNIST data as before.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
# Now we get a test dataset.
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(64)
# Since the dataset already takes care of batching,
# we don't pass a `batch_size` argument.
model.fit(train_dataset, epochs=3)
# You can also evaluate or predict on a dataset.
print('\n# Evaluate')
result = model.evaluate(test_dataset)
dict(zip(model.metrics_names, result))
```

```
Train for 782 steps
Epoch 1/3
782/782 [==============================] - 2s 3ms/step - loss: 0.3370 - sparse_categorical_accuracy: 0.9045
Epoch 2/3
782/782 [==============================] - 2s 3ms/step - loss: 0.1573 - sparse_categorical_accuracy: 0.9530
Epoch 3/3
782/782 [==============================] - 2s 3ms/step - loss: 0.1152 - sparse_categorical_accuracy: 0.9656

# Evaluate
157/157 [==============================] - 0s 2ms/step - loss: 0.1098 - sparse_categorical_accuracy: 0.9665

{'loss': 0.1097613838482737, 'sparse_categorical_accuracy': 0.9665}
```

Note that the Dataset is reset at the end of each epoch, so it can be reused for the next epoch.

If you want to run training only on a specific number of batches from this Dataset, you can pass the `steps_per_epoch` argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch.

If you do this, the dataset is not reset at the end of each epoch; instead, we just keep drawing the next batches. The dataset will eventually run out of data (unless it is an infinitely-looping dataset).

```
model = get_compiled_model()
# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
# Only use 100 batches per epoch (that's 64 * 100 samples)
model.fit(train_dataset.take(100), epochs=3)
```

```
Train for 100 steps
Epoch 1/3
100/100 [==============================] - 1s 7ms/step - loss: 0.8050 - sparse_categorical_accuracy: 0.8009
Epoch 2/3
100/100 [==============================] - 0s 4ms/step - loss: 0.3327 - sparse_categorical_accuracy: 0.9058
Epoch 3/3
100/100 [==============================] - 0s 4ms/step - loss: 0.2532 - sparse_categorical_accuracy: 0.9297

<tensorflow.python.keras.callbacks.History at 0x7f396c172358>
```
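
The `steps_per_epoch` argument itself works just as well as slicing the dataset; here's a minimal sketch, under the assumption that the dataset is repeated so it doesn't run out of data across epochs:

```
model = get_compiled_model()
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Repeat indefinitely, since the dataset is not reset between epochs.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64).repeat()
# Each epoch runs exactly 100 steps of this dataset.
model.fit(train_dataset, epochs=3, steps_per_epoch=100)
```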

#### Using a validation dataset

You can pass a Dataset instance as the `validation_data` argument in `fit`:

```
model = get_compiled_model()
# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
# Prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)
model.fit(train_dataset, epochs=3, validation_data=val_dataset)
```

```
Train for 782 steps, validate for 157 steps
Epoch 1/3
782/782 [==============================] - 3s 3ms/step - loss: 0.3417 - sparse_categorical_accuracy: 0.9032 - val_loss: 0.1910 - val_sparse_categorical_accuracy: 0.9429
Epoch 2/3
782/782 [==============================] - 2s 3ms/step - loss: 0.1590 - sparse_categorical_accuracy: 0.9528 - val_loss: 0.1335 - val_sparse_categorical_accuracy: 0.9617
Epoch 3/3
782/782 [==============================] - 2s 3ms/step - loss: 0.1171 - sparse_categorical_accuracy: 0.9656 - val_loss: 0.1406 - val_sparse_categorical_accuracy: 0.9594

<tensorflow.python.keras.callbacks.History at 0x7f396c0ab978>
```

At the end of each epoch, the model will iterate over the validation Dataset and compute the validation loss and validation metrics.

If you want to run validation only on a specific number of batches from this Dataset, you can pass the `validation_steps` argument, which specifies how many validation steps the model should run with the validation Dataset before interrupting validation and moving on to the next epoch:

```
model = get_compiled_model()
# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
# Prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)
model.fit(train_dataset, epochs=3,
          # Only run validation using the first 10 batches of the dataset
          # using the `validation_steps` argument
          validation_data=val_dataset, validation_steps=10)
```

```
Train for 782 steps, validate for 10 steps
Epoch 1/3
782/782 [==============================] - 2s 3ms/step - loss: 0.3271 - sparse_categorical_accuracy: 0.9078 - val_loss: 0.2916 - val_sparse_categorical_accuracy: 0.9203
Epoch 2/3
782/782 [==============================] - 2s 3ms/step - loss: 0.1529 - sparse_categorical_accuracy: 0.9545 - val_loss: 0.2307 - val_sparse_categorical_accuracy: 0.9344
Epoch 3/3
782/782 [==============================] - 2s 3ms/step - loss: 0.1097 - sparse_categorical_accuracy: 0.9669 - val_loss: 0.1927 - val_sparse_categorical_accuracy: 0.9406

<tensorflow.python.keras.callbacks.History at 0x7f394c6daf98>
```

Note that the validation Dataset will be reset after each use (so that you will always be evaluating on the same samples from epoch to epoch).

The argument `validation_split` (generating a holdout set from the training data) is not supported when training from Dataset objects, since this feature requires the ability to index the samples of the datasets, which is not possible in general with the Dataset API.

### Other input formats supported

Besides Numpy arrays and TensorFlow Datasets, it's possible to train a Keras model using Pandas dataframes, or from Python generators that yield batches.

In general, we recommend that you use Numpy input data if your data is small and fits in memory, and Datasets otherwise.
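
For instance, here's a minimal sketch of training from a Python generator (assuming your version of `fit` accepts generators directly, as recent TF 2.x releases do; the generator below and its batch size are illustrative):

```
import numpy as np

def batch_generator(x, y, batch_size=64):
    # Yield (inputs, targets) batches indefinitely.
    while True:
        idx = np.random.randint(0, len(x), batch_size)
        yield x[idx], y[idx]

model = get_compiled_model()
# With an infinite generator, `fit` must be told how many steps make an epoch.
model.fit(batch_generator(x_train, y_train), steps_per_epoch=100, epochs=2)
```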

### Using sample weighting and class weighting

Besides input data and target data, it is possible to pass sample weights or class weights to a model when using `fit`:

- When training from Numpy data: via the `sample_weight` and `class_weight` arguments.
- When training from Datasets: by having the Dataset return a tuple `(input_batch, target_batch, sample_weight_batch)`.

A "sample weights" array is an array of numbers that specify how much weight each sample in a batch should have in computing the total loss. It is commonly used in imbalanced classification problems (the idea being to give more weight to rarely-seen classes). When the weights used are ones and zeros, the array can be used as a *mask* for the loss function (entirely discarding the contribution of certain samples to the total loss).
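
Here's a quick sketch of that masking idea (the choice to drop class "0" from the loss is purely illustrative):

```
# 0/1 sample weights act as a mask: samples of class "0"
# contribute nothing to the total loss.
mask = (y_train != 0).astype('float32')
model = get_compiled_model()
model.fit(x_train, y_train, sample_weight=mask, batch_size=64, epochs=1)
```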

A "class weights" dict is a more specific instance of the same concept: it maps class indices to the sample weight that should be used for samples belonging to this class. For instance, if class "0" is half as represented as class "1" in your data, you could use `class_weight={0: 1., 1: 0.5}`.

Here's a Numpy example where we use class weights or sample weights to give more importance to the correct classification of class #5 (which is the digit "5" in the MNIST dataset).

```
import numpy as np

class_weight = {0: 1., 1: 1., 2: 1., 3: 1., 4: 1.,
                # Set weight "2" for class "5",
                # making this class 2x more important
                5: 2.,
                6: 1., 7: 1., 8: 1., 9: 1.}
print('Fit with class weight')
model.fit(x_train, y_train,
          class_weight=class_weight,
          batch_size=64,
          epochs=4)
```

```
Fit with class weight
WARNING:tensorflow:sample_weight modes were coerced from ... to ['...']
Train on 50000 samples
Epoch 1/4
50000/50000 [==============================] - 2s 47us/sample - loss: 0.0980 - sparse_categorical_accuracy: 0.9724
Epoch 2/4
50000/50000 [==============================] - 2s 43us/sample - loss: 0.0800 - sparse_categorical_accuracy: 0.9772
Epoch 3/4
50000/50000 [==============================] - 2s 48us/sample - loss: 0.0694 - sparse_categorical_accuracy: 0.9802
Epoch 4/4
50000/50000 [==============================] - 2s 43us/sample - loss: 0.0600 - sparse_categorical_accuracy: 0.9830

<tensorflow.python.keras.callbacks.History at 0x7f394c5a5978>
```

```
# Here's the same example using `sample_weight` instead:
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.
print('\nFit with sample weight')
model = get_compiled_model()
model.fit(x_train, y_train,
          sample_weight=sample_weight,
          batch_size=64,
          epochs=4)
```

```
Fit with sample weight
WARNING:tensorflow:sample_weight modes were coerced from ... to ['...']
Train on 50000 samples
Epoch 1/4
50000/50000 [==============================] - 3s 51us/sample - loss: 0.3750 - sparse_categorical_accuracy: 0.9021
Epoch 2/4
50000/50000 [==============================] - 2s 43us/sample - loss: 0.1733 - sparse_categorical_accuracy: 0.9518
Epoch 3/4
50000/50000 [==============================] - 2s 42us/sample - loss: 0.1262 - sparse_categorical_accuracy: 0.9649
Epoch 4/4
50000/50000 [==============================] - 2s 42us/sample - loss: 0.1005 - sparse_categorical_accuracy: 0.9715

<tensorflow.python.keras.callbacks.History at 0x7f394c4a2128>
```

Here's a matching Dataset example:

```
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.
# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train, sample_weight))
# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
model = get_compiled_model()
model.fit(train_dataset, epochs=3)
```

```
Train for 782 steps
Epoch 1/3
782/782 [==============================] - 3s 3ms/step - loss: 0.3709 - sparse_categorical_accuracy: 0.9038
Epoch 2/3
782/782 [==============================] - 2s 3ms/step - loss: 0.1652 - sparse_categorical_accuracy: 0.9535
Epoch 3/3
782/782 [==============================] - 2s 3ms/step - loss: 0.1206 - sparse_categorical_accuracy: 0.9664

<tensorflow.python.keras.callbacks.History at 0x7f394c31bc18>
```

### Passing data to multi-input, multi-output models

In the previous examples, we were considering a model with a single input (a tensor of shape `(784,)`) and a single output (a prediction tensor of shape `(10,)`). But what about models that have multiple inputs or outputs?

Consider the following model, which has an image input of shape `(32, 32, 3)` (that's `(height, width, channels)`) and a timeseries input of shape `(None, 10)` (that's `(timesteps, features)`). Our model will have two outputs computed from the combination of these inputs: a "score" (of shape `(1,)`) and a probability distribution over five classes (of shape `(5,)`).

```
from tensorflow import keras
from tensorflow.keras import layers
image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')
x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)
x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)
x = layers.concatenate([x1, x2])
score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, name='class_output')(x)
model = keras.Model(inputs=[image_input, timeseries_input],
                    outputs=[score_output, class_output])
```

Let's plot this model, so you can clearly see what we're doing here (note that the shapes shown in the plot are batch shapes, rather than per-sample shapes).

```
keras.utils.plot_model(model, 'multi_input_and_output_model.png', show_shapes=True)
```

At compilation time, we can specify different losses for different outputs, by passing the loss functions as a list:

```
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy(from_logits=True)])
```

If we only passed a single loss function to the model, the same loss function would be applied to every output, which is not appropriate here.

Likewise for metrics:

```
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy(from_logits=True)],
    metrics=[[keras.metrics.MeanAbsolutePercentageError(),
              keras.metrics.MeanAbsoluteError()],
             [keras.metrics.CategoricalAccuracy()]])
```

Since we gave names to our output layers, we could also specify per-output losses and metrics via a dict:

```
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'score_output': keras.losses.MeanSquaredError(),
          'class_output': keras.losses.CategoricalCrossentropy(from_logits=True)},
    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
                              keras.metrics.MeanAbsoluteError()],
             'class_output': [keras.metrics.CategoricalAccuracy()]})
```

We recommend the use of explicit names and dicts if you have more than 2 outputs.

It's possible to give different weights to different output-specific losses (for instance, one might wish to privilege the "score" loss in our example, by giving it 2x the importance of the class loss), using the `loss_weights` argument:

```
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'score_output': keras.losses.MeanSquaredError(),
          'class_output': keras.losses.CategoricalCrossentropy(from_logits=True)},
    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
                              keras.metrics.MeanAbsoluteError()],
             'class_output': [keras.metrics.CategoricalAccuracy()]},
    loss_weights={'score_output': 2., 'class_output': 1.})
```

You could also choose not to compute a loss for certain outputs, if these outputs are meant for prediction but not for training:

```
# List loss version
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[None, keras.losses.CategoricalCrossentropy(from_logits=True)])

# Or dict loss version
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'class_output': keras.losses.CategoricalCrossentropy(from_logits=True)})
```

```
WARNING:tensorflow:Output score_output missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to score_output.
```

Passing data to a multi-input or multi-output model in `fit` works in a similar way as specifying a loss function in `compile`: you can pass *lists of Numpy arrays (with 1:1 mapping to the outputs that received a loss function)* or *dicts mapping output names to Numpy arrays of training data*.

```
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy(from_logits=True)])

# Generate dummy Numpy data
img_data = np.random.random_sample(size=(100, 32, 32, 3))
ts_data = np.random.random_sample(size=(100, 20, 10))
score_targets = np.random.random_sample(size=(100, 1))
class_targets = np.random.random_sample(size=(100, 5))

# Fit on lists
model.fit([img_data, ts_data], [score_targets, class_targets],
          batch_size=32,
          epochs=3)

# Alternatively, fit on dicts
model.fit({'img_input': img_data, 'ts_input': ts_data},
          {'score_output': score_targets, 'class_output': class_targets},
          batch_size=32,
          epochs=3)
```

```
Train on 100 samples
Epoch 1/3
100/100 [==============================] - 2s 21ms/sample - loss: 6.1054 - score_output_loss: 0.5598 - class_output_loss: 5.4202
Epoch 2/3
100/100 [==============================] - 0s 184us/sample - loss: 5.6142 - score_output_loss: 0.2550 - class_output_loss: 5.5780
Epoch 3/3
100/100 [==============================] - 0s 188us/sample - loss: 5.4071 - score_output_loss: 0.1756 - class_output_loss: 5.2709
Train on 100 samples
Epoch 1/3
100/100 [==============================] - 0s 231us/sample - loss: 5.3040 - score_output_loss: 0.1111 - class_output_loss: 4.9626
Epoch 2/3
100/100 [==============================] - 0s 183us/sample - loss: 5.2867 - score_output_loss: 0.1113 - class_output_loss: 5.0326
Epoch 3/3
100/100 [==============================] - 0s 185us/sample - loss: 5.2791 - score_output_loss: 0.0877 - class_output_loss: 5.2646

<tensorflow.python.keras.callbacks.History at 0x7f39347d6b38>
```

Here's the Dataset use case: similarly to what we did for Numpy arrays, the Dataset should return a tuple of dicts.

```
train_dataset = tf.data.Dataset.from_tensor_slices(
    ({'img_input': img_data, 'ts_input': ts_data},
     {'score_output': score_targets, 'class_output': class_targets}))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model.fit(train_dataset, epochs=3)
```

```
Train for 2 steps
Epoch 1/3
2/2 [==============================] - 0s 223ms/step - loss: 5.3278 - score_output_loss: 0.1094 - class_output_loss: 5.2509
Epoch 2/3
2/2 [==============================] - 0s 10ms/step - loss: 5.2688 - score_output_loss: 0.1082 - class_output_loss: 5.1669
Epoch 3/3
2/2 [==============================] - 0s 10ms/step - loss: 5.2546 - score_output_loss: 0.1183 - class_output_loss: 5.1350

<tensorflow.python.keras.callbacks.History at 0x7f394c5ac8d0>
```

### Using callbacks

Callbacks in Keras are objects that are called at different points during training (at the start of an epoch, at the end of a batch, at the end of an epoch, etc.) and which can be used to implement behaviors such as:

- Doing validation at different points during training (beyond the built-in per-epoch validation)
- Checkpointing the model at regular intervals or when it exceeds a certain accuracy threshold
- Changing the learning rate of the model when training seems to be plateauing
- Doing fine-tuning of the top layers when training seems to be plateauing
- Sending email or instant message notifications when training ends or when a certain performance threshold is exceeded
- Etc.

Callbacks can be passed as a list to your call to `fit`:

```
model = get_compiled_model()
callbacks = [
    keras.callbacks.EarlyStopping(
        # Stop training when `val_loss` is no longer improving
        monitor='val_loss',
        # "no longer improving" being defined as "no better than 1e-2 less"
        min_delta=1e-2,
        # "no longer improving" being further defined as "for at least 2 epochs"
        patience=2,
        verbose=1)
]
model.fit(x_train, y_train,
          epochs=20,
          batch_size=64,
          callbacks=callbacks,
          validation_split=0.2)
```

```
Train on 40000 samples, validate on 10000 samples
Epoch 1/20
40000/40000 [==============================] - 2s 52us/sample - loss: 0.3816 - sparse_categorical_accuracy: 0.8925 - val_loss: 0.2310 - val_sparse_categorical_accuracy: 0.9310
Epoch 2/20
40000/40000 [==============================] - 2s 41us/sample - loss: 0.1744 - sparse_categorical_accuracy: 0.9487 - val_loss: 0.2201 - val_sparse_categorical_accuracy: 0.9315
Epoch 3/20
40000/40000 [==============================] - 2s 41us/sample - loss: 0.1224 - sparse_categorical_accuracy: 0.9628 - val_loss: 0.1460 - val_sparse_categorical_accuracy: 0.9547
Epoch 4/20
40000/40000 [==============================] - 2s 41us/sample - loss: 0.0962 - sparse_categorical_accuracy: 0.9711 - val_loss: 0.1440 - val_sparse_categorical_accuracy: 0.9576
Epoch 5/20
40000/40000 [==============================] - 2s 41us/sample - loss: 0.0792 - sparse_categorical_accuracy: 0.9766 - val_loss: 0.1287 - val_sparse_categorical_accuracy: 0.9627
Epoch 6/20
40000/40000 [==============================] - 2s 41us/sample - loss: 0.0656 - sparse_categorical_accuracy: 0.9799 - val_loss: 0.1259 - val_sparse_categorical_accuracy: 0.9629
Epoch 7/20
40000/40000 [==============================] - 2s 40us/sample - loss: 0.0568 - sparse_categorical_accuracy: 0.9830 - val_loss: 0.1293 - val_sparse_categorical_accuracy: 0.9657
Epoch 00007: early stopping

<tensorflow.python.keras.callbacks.History at 0x7f393467b048>
```

#### Many built-in callbacks are available

- `ModelCheckpoint`: Periodically save the model.
- `EarlyStopping`: Stop training when training is no longer improving the validation metrics.
- `TensorBoard`: Periodically write model logs that can be visualized in TensorBoard (more details in the section "Visualization").
- `CSVLogger`: Streams loss and metrics data to a CSV file.
- etc.
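
As a quick sketch, several built-in callbacks can be combined in a single `fit` call (the file paths below are illustrative):

```
callbacks = [
    # Save the model at the end of every epoch.
    keras.callbacks.ModelCheckpoint(filepath='ckpt_{epoch}'),
    # Append per-epoch loss and metric values to a CSV file.
    keras.callbacks.CSVLogger('training_log.csv'),
]
model = get_compiled_model()
model.fit(x_train, y_train, batch_size=64, epochs=3, callbacks=callbacks)
```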

#### Writing your own callback

You can create a custom callback by extending the base class `keras.callbacks.Callback`. A callback has access to its associated model through the class property `self.model`.

Here's a simple example saving a list of per-batch loss values during training:

```
class LossHistory(keras.callbacks.Callback):

    def on_train_begin(self, logs):
        self.losses = []

    def on_batch_end(self, batch, logs):
        self.losses.append(logs.get('loss'))
```
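
And here's a minimal usage sketch of that callback:

```
history_cb = LossHistory()
model = get_compiled_model()
model.fit(x_train, y_train, batch_size=64, epochs=1, callbacks=[history_cb])
# One loss value was recorded per batch.
print('%d batch losses recorded' % len(history_cb.losses))
```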

### Checkpointing models

When you're training a model on relatively large datasets, it's crucial to save checkpoints of your model at frequent intervals.

The easiest way to achieve this is with the `ModelCheckpoint` callback:

```
model = get_compiled_model()
callbacks = [
    keras.callbacks.ModelCheckpoint(
        # Path where to save the model
        filepath='mymodel_{epoch}',
        # The two parameters below mean that we will overwrite
        # the current checkpoint if and only if
        # the `val_loss` score has improved.
        save_best_only=True,
        monitor='val_loss',
        verbose=1)
]
model.fit(x_train, y_train,
          epochs=3,
          batch_size=64,
          callbacks=callbacks,
          validation_split=0.2)
```

```
Train on 40000 samples, validate on 10000 samples
Epoch 1/3
39104/40000 [============================>.] - ETA: 0s - loss: 0.3736 - sparse_categorical_accuracy: 0.8961
Epoch 00001: val_loss improved from inf to 0.22722, saving model to mymodel_1
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: mymodel_1/assets
40000/40000 [==============================] - 2s 61us/sample - loss: 0.3701 - sparse_categorical_accuracy: 0.8971 - val_loss: 0.2272 - val_sparse_categorical_accuracy: 0.9319
Epoch 2/3
39744/40000 [============================>.] - ETA: 0s - loss: 0.1719 - sparse_categorical_accuracy: 0.9492
Epoch 00002: val_loss improved from 0.22722 to 0.17655, saving model to mymodel_2
INFO:tensorflow:Assets written to: mymodel_2/assets
40000/40000 [==============================] - 2s 49us/sample - loss: 0.1717 - sparse_categorical_accuracy: 0.9492 - val_loss: 0.1766 - val_sparse_categorical_accuracy: 0.9484
Epoch 3/3
39488/40000 [============================>.] - ETA: 0s - loss: 0.1225 - sparse_categorical_accuracy: 0.9637
Epoch 00003: val_loss improved from 0.17655 to 0.15457, saving model to mymodel_3
INFO:tensorflow:Assets written to: mymodel_3/assets
40000/40000 [==============================] - 2s 50us/sample - loss: 0.1226 - sparse_categorical_accuracy: 0.9636 - val_loss: 0.1546 - val_sparse_categorical_accuracy: 0.9553

<tensorflow.python.keras.callbacks.History at 0x7f393452a358>
```

You can also write your own callback for saving and restoring models.
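
For example, here's a minimal sketch of such a callback, saving weights every N batches (the interval and file path are illustrative assumptions):

```
class PeriodicWeightsSaver(keras.callbacks.Callback):

    def __init__(self, save_every=100, path='periodic_weights'):
        super().__init__()
        self.save_every = save_every
        self.path = path

    def on_batch_end(self, batch, logs=None):
        # `self.model` is attached by Keras when the callback is used.
        if batch % self.save_every == 0:
            self.model.save_weights(self.path)
```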

For a complete guide on serialization and saving, see Guide to Saving and Serializing Models.

### Using learning rate schedules

A common pattern when training deep learning models is to gradually reduce the learning rate as training progresses. This is generally known as "learning rate decay".

The learning rate decay schedule could be static (fixed in advance, as a function of the current epoch or the current batch index), or dynamic (responding to the current behavior of the model, in particular the validation loss).

#### Passing a schedule to an optimizer

You can easily use a static learning rate decay schedule by passing a schedule object as the `learning_rate` argument in your optimizer:

```
initial_learning_rate = 0.1
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)

optimizer = keras.optimizers.RMSprop(learning_rate=lr_schedule)
```

Several built-in schedules are available: `ExponentialDecay`, `PiecewiseConstantDecay`, `PolynomialDecay`, and `InverseTimeDecay`.
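
For instance, here's a quick sketch of `PiecewiseConstantDecay` (the boundaries and values below are illustrative):

```
# Learning rate of 0.1 for the first 1000 steps, 0.01 for the
# next 1000 steps, and 0.001 for any steps after that.
lr_schedule = keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[1000, 2000],
    values=[0.1, 0.01, 0.001])
optimizer = keras.optimizers.SGD(learning_rate=lr_schedule)
```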

#### Using callbacks to implement a dynamic learning rate schedule

A dynamic learning rate schedule (for instance, decreasing the learning rate when the validation loss is no longer improving) cannot be achieved with these schedule objects, since the optimizer does not have access to validation metrics.

However, callbacks do have access to all metrics, including validation metrics! You can thus achieve this pattern by using a callback that modifies the current learning rate on the optimizer. In fact, this is even built in as the `ReduceLROnPlateau` callback.
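
Here's a minimal sketch of `ReduceLROnPlateau` in use (the factor and patience values are illustrative):

```
callbacks = [
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',  # Watch the validation loss
        factor=0.5,          # Halve the learning rate...
        patience=2)          # ...after 2 epochs without improvement
]
model = get_compiled_model()
model.fit(x_train, y_train, batch_size=64, epochs=5,
          validation_split=0.2, callbacks=callbacks)
```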

### Visualizing loss and metrics during training

The best way to keep an eye on your model during training is to use TensorBoard, a browser-based application that you can run locally that provides you with:

- Live plots of the loss and metrics for training and evaluation
- (optionally) Visualizations of the histograms of your layer activations
- (optionally) 3D visualizations of the embedding spaces learned by your `Embedding` layers

If you have installed TensorFlow with pip, you should be able to launch TensorBoard from the command line:

```
tensorboard --logdir=/full_path_to_your_logs
```

#### Using the TensorBoard callback

The easiest way to use TensorBoard with a Keras model and the `fit` method is the `TensorBoard` callback.

In the simplest case, just specify where you want the callback to write logs, and you're good to go:

```
tensorboard_cbk = keras.callbacks.TensorBoard(log_dir='/full_path_to_your_logs')
model.fit(dataset, epochs=10, callbacks=[tensorboard_cbk])
```

The `TensorBoard` callback has many useful options, including whether to log embeddings and histograms, and how often to write logs:

```
keras.callbacks.TensorBoard(
    log_dir='/full_path_to_your_logs',
    histogram_freq=0,  # How often to log histogram visualizations
    embeddings_freq=0,  # How often to log embedding visualizations
    update_freq='epoch')  # How often to write logs (default: once per epoch)
```

## Part II: Writing your own training & evaluation loops from scratch

If you want lower-level control over your training & evaluation loops than what `fit()` and `evaluate()` provide, you should write your own. It's actually pretty simple! But you should be ready to do a lot more debugging on your own.

### Using the GradientTape: a first end-to-end example

Calling a model inside a `GradientTape` scope enables you to retrieve the gradients of the trainable weights of the layer with respect to a loss value. Using an optimizer instance, you can use these gradients to update these variables (which you can retrieve using `model.trainable_weights`).

Let's reuse our initial MNIST model from Part I, and let's train it using mini-batch gradient descent with a custom training loop.

```
# Get the model.
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
```

Run a training loop for a few epochs:

```
epochs = 3
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables autodifferentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * 64))
```

```
Start of epoch 0
Training loss (for one batch) at step 0: 2.373323917388916
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.1732797622680664
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.1088500022888184
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.0335419178009033
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 1.8996050357818604
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.8157986402511597
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.6991560459136963
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.604067325592041
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 1.5966098308563232
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.5976207256317139
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.4300774335861206
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.2870347499847412
Seen so far: 38464 samples
```

### Low-level handling of metrics

Let's add metrics to the mix. You can readily reuse the built-in metrics (or custom ones you wrote) in such training loops written from scratch. Here's the flow:

- Instantiate the metric at the start of the loop
- Call `metric.update_state()` after each batch
- Call `metric.result()` when you need to display the current value of the metric
- Call `metric.reset_states()` when you need to clear the state of the metric (typically at the end of an epoch)

Let's use this knowledge to compute `SparseCategoricalAccuracy` on validation data at the end of each epoch:

```
# Get model
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)
```

Run a training loop for a few epochs:

```
epochs = 3
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Update training metric.
        train_acc_metric(y_batch_train, logits)

        # Log every 200 batches.
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * 64))

    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print('Training acc over epoch: %s' % (float(train_acc),))
    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()

    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        val_logits = model(x_batch_val)
        # Update val metrics
        val_acc_metric(y_batch_val, val_logits)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print('Validation acc: %s' % (float(val_acc),))
```

```
Start of epoch 0
Training loss (for one batch) at step 0: 2.36129093170166
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.2219269275665283
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.1605920791625977
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.04164981842041
Seen so far: 38464 samples
Training acc over epoch: 0.272460013628006
Validation acc: 0.4361000061035156
Start of epoch 1
Training loss (for one batch) at step 0: 2.0127315521240234
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.9280786514282227
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.8632136583328247
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.6286849975585938
Seen so far: 38464 samples
Training acc over epoch: 0.5415999889373779
Validation acc: 0.6420999765396118
Start of epoch 2
Training loss (for one batch) at step 0: 1.6193240880966187
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.5449484586715698
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.3990552425384521
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.3469998836517334
Seen so far: 38464 samples
Training acc over epoch: 0.692080020904541
Validation acc: 0.7674999833106995
```

### Low-level handling of extra losses

You saw in the previous section that it is possible for regularization losses to be added by a layer by calling `self.add_loss(value)` in the `call` method.

In the general case, you will want to take these losses into account in your training loops (unless you've written the model yourself and you already know that it creates no such losses).

Recall this example from the previous section, featuring a layer that creates a regularization loss:

```
class ActivityRegularizationLayer(layers.Layer):

    def call(self, inputs):
        self.add_loss(1e-2 * tf.reduce_sum(inputs))
        return inputs

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
```

When you call a model, like this:

```
logits = model(x_train)
```

the losses it creates during the forward pass are added to the `model.losses` attribute:

```
logits = model(x_train[:64])
print(model.losses)
```

```
[<tf.Tensor: shape=(), dtype=float32, numpy=8.973582>]
```

The tracked losses are first cleared at the start of the model `__call__`, so you will only see the losses created during this one forward pass. For instance, calling the model repeatedly and then querying `losses` only displays the latest losses, created during the last call:

```
logits = model(x_train[:64])
logits = model(x_train[64: 128])
logits = model(x_train[128: 192])
print(model.losses)
```

```
[<tf.Tensor: shape=(), dtype=float32, numpy=8.614785>]
```

To take these losses into account during training, all you have to do is to modify your training loop to add `sum(model.losses)` to your total loss:

```
optimizer = keras.optimizers.SGD(learning_rate=1e-3)

epochs = 3
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))

    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train)
            loss_value = loss_fn(y_batch_train, logits)

            # Add extra losses created during this forward pass:
            loss_value += sum(model.losses)

        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * 64))
```

```
Start of epoch 0
Training loss (for one batch) at step 0: 11.443477630615234
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.4743411540985107
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.4310429096221924
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.354671001434326
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.3424928188323975
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.3235275745391846
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.316230535507202
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.3230934143066406
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 2.3138749599456787
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.3191802501678467
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.3080945014953613
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.314190626144409
Seen so far: 38464 samples
```

That was the last piece of the puzzle! You've reached the end of this guide.

Now you know everything there is to know about using built-in training loops and writing your own from scratch.