Manages multiple checkpoints by keeping some and deleting unneeded ones.

Used in the notebooks

Used in the guide Used in the tutorials

Example usage:

import tensorflow as tf
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5)
status = checkpoint.restore(manager.latest_checkpoint)
while True:
  # train

CheckpointManager preserves its own state across instantiations (see the __init__ documentation for details). Only one should be active in a particular directory at a time.

checkpoint The tf.train.Checkpoint instance to save and manage checkpoints for.
directory The path to a directory in which to write checkpoints. A special file named "checkpoint" is also written to this directory (in a human-readable text format) which contains the state of the CheckpointManager.
max_to_keep An integer, the number of checkpoints to keep. Unless preserved by keep_checkpoint_every_n_hours, checkpoints will be deleted from the active set, oldest first, until only max_to_keep checkpoints remain. If None, no checkpoints are deleted and everything stays in the active set. Note that max_to_keep=None will keep all checkpoint paths in memory and in the checkpoint state protocol buffer on disk.
keep_checkpoint_every_n_hours Upon removal from the active set, a checkpoint will be preserved if it has been at least keep_checkpoint_every_n_hours since the last preserved checkpoint. The default setting of None does not preserve any checkpoints in this way.
checkpoint_name Custom name for the checkpoint file.
step_counter A tf.Variable instance for checking the current step counter value, in case users want to save checkpoints every N steps.
checkpoint_interval An integer, indicates the minimum