tf.train.CheckpointManager

Manages multiple checkpoints by keeping some and deleting unneeded ones.

Example usage:

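import tensorflow as tf

# `optimizer` and `model` are assumed to be created elsewhere.
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5)
# Restore from the most recent checkpoint in the directory, if there is one.
status = checkpoint.restore(manager.latest_checkpoint)
while True:
  # train
  manager.save()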

CheckpointManager preserves its own state across instantiations (see the __init__ documentation for details). Only one should be active in a particular directory at a time.
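
The following is a minimal, self-contained sketch of how that preserved state can be observed. It is an illustration under stated assumptions, not part of the official example: the single `weights` variable is a hypothetical stand-in for a real model, and the temporary directory is used only to keep the demo disposable. The first manager is discarded before the second is created, so only one is active in the directory at a time.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in for a real model: a single variable.
weights = tf.Variable([1.0, 2.0, 3.0])
checkpoint = tf.train.Checkpoint(weights=weights)

directory = tempfile.mkdtemp()  # throwaway directory for the demo
manager = tf.train.CheckpointManager(
    checkpoint, directory=directory, max_to_keep=3)

for _ in range(7):
    weights.assign_add([0.1, 0.1, 0.1])  # placeholder for a training step
    manager.save()

# File names of the checkpoints still in the active set.
kept = [os.path.basename(p) for p in manager.checkpoints]

# Drop the first manager, then open a new one on the same directory.
del manager
manager2 = tf.train.CheckpointManager(
    checkpoint, directory=directory, max_to_keep=3)
recovered = [os.path.basename(p) for p in manager2.checkpoints]

# Overwrite the variable, then restore it from the latest checkpoint.
weights.assign([0.0, 0.0, 0.0])
checkpoint.restore(manager2.latest_checkpoint)
```

In this sketch, `recovered` should list the same checkpoint names as `kept`, because the second manager reads the "checkpoint" state file written by the first.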

checkpoint The tf.train.Checkpoint instance to save and manage checkpoints for.
directory The path to a directory in which to write checkpoints. A special file named "checkpoint" is also written to this directory (in a human-readable text format) which contains the state of the CheckpointManager.
max_to_keep An integer, the number of checkpoints to keep. Unless preserved by keep_checkpoint_every_n_hours, checkpoints will be deleted from the active set, oldest first, until only max_to_keep checkpoints remain. If None, no checkpoints are deleted and everything stays in the active set. Note that max_to_keep=None will keep all checkpoint paths in memory and in the checkpoint state protocol buffer on disk.
keep_checkpoint_every_n_hours Upon removal from the active set, a checkpoint will be preserved if it has been at least keep_checkpoint_every_n_hours since the last preserved checkpoint. The default setting of None does not preserve any checkpoints in this way.
checkpoint_name Custom name for the checkpoint file.
step_counter A tf.Variable instance for checking the current step counter value, in case users want to save checkpoints every N steps.
checkpoint_interval An integer indicating the minimum step interval between two checkpoints.
init_fn Callable. A function that performs customized initialization if no checkpoints are in the directory.
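
As a hedged illustration of the last three arguments, the sketch below combines `step_counter`, `checkpoint_interval`, and `init_fn`. It assumes the TensorFlow 2.x behavior as I understand it: `manager.restore_or_initialize()` calls `init_fn` when the directory holds no checkpoint, and `manager.save()` returns `None` when the interval has not yet elapsed. The variable names, the `init_calls` list, and the temporary directory are hypothetical and exist only for the demo.

```python
import tempfile

import tensorflow as tf

step = tf.Variable(0, dtype=tf.int64)  # training step counter
weights = tf.Variable([1.0, 2.0])      # hypothetical stand-in for a model
checkpoint = tf.train.Checkpoint(step=step, weights=weights)

init_calls = []  # records whether init_fn ran (demo only)


def init_fn():
    # Intended to run only when there is no checkpoint to restore.
    init_calls.append("called")
    weights.assign([0.0, 0.0])


manager = tf.train.CheckpointManager(
    checkpoint,
    directory=tempfile.mkdtemp(),  # empty throwaway directory
    max_to_keep=3,
    step_counter=step,
    checkpoint_interval=5,
    init_fn=init_fn,
)

# Nothing to restore in an empty directory, so init_fn is used instead.
restored_path = manager.restore_or_initialize()

saved_at_steps = []
for _ in range(12):
    step.assign_add(1)     # placeholder for one training step
    path = manager.save()  # None when the save is skipped
    if path is not None:
        saved_at_steps.append(int(step.numpy()))
```

Calling `manager.save()` on every step keeps the training loop simple; the manager decides whether enough steps have passed since the previous checkpoint.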

ValueError If max_to_keep is not a positive integer.

checkpoint Returns the tf.train.Checkpoint object.
checkpoint_interval

checkpoints A list of managed checkpoints.

Note that checkpoints saved due to keep_checkpoint_every_n_hours will not show up in this list (to avoid ever-growing filename lists).
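
To make the difference between the active set and preserved checkpoints concrete, here is a toy, pure-Python model of the retention rules described for `max_to_keep` and `keep_checkpoint_every_n_hours`. It is a sketch of the documented behavior, not TensorFlow's implementation: the function `simulate_retention`, its parameters, and the assumption that the "last preserved" clock starts at `start_time` are all hypothetical.

```python
def simulate_retention(save_times, max_to_keep,
                       keep_every_n_hours=None, start_time=0.0):
    """Toy model of the documented retention rules (not TensorFlow code).

    save_times: save times in hours, in increasing order.
    Returns (active, preserved, deleted) as lists of 1-based save indices.
    """
    active = []                  # (index, time) pairs, oldest first
    preserved, deleted = [], []
    last_preserved = start_time  # assumption: the clock starts here

    for index, time_h in enumerate(save_times, start=1):
        active.append((index, time_h))
        # Trim the active set, oldest first, down to max_to_keep.
        while max_to_keep is not None and len(active) > max_to_keep:
            old_index, old_time = active.pop(0)
            if (keep_every_n_hours is not None
                    and old_time - last_preserved >= keep_every_n_hours):
                # Kept on disk, but no longer part of the active set.
                preserved.append(old_index)
                last_preserved = old_time
            else:
                deleted.append(old_index)

    return [i for i, _ in active], preserved, deleted


# Ten saves, one every half hour; keep 2 active, preserve one every 2 hours.
times = [0.5 * i for i in range(1, 11)]
active, preserved, deleted = simulate_retention(
    times, max_to_keep=2, keep_every_n_hours=2)
print(active, preserved, deleted)
# prints [9, 10] [4, 8] [1, 2, 3, 5, 6, 7]
```

In this model, `active` plays the role of the `checkpoints` list, while the entries in `preserved` remain on disk without being tracked in it.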

directory
