Saves and restores variables.
```python
tf.compat.v1.train.Saver(
    var_list=None, reshape=False, sharded=False, max_to_keep=5,
    keep_checkpoint_every_n_hours=10000.0, name=None,
    restore_sequentially=False, saver_def=None, builder=None,
    defer_build=False, allow_empty=False,
    write_version=tf.train.SaverDef.V2, pad_step_number=False,
    save_relative_paths=False, filename=None
)
```
See Variables for an overview of variables, saving and restoring.
The `Saver` class adds ops to save and restore variables to and from
checkpoints. It also provides convenience methods to run these ops.
Checkpoints are binary files in a proprietary format which map variable names
to tensor values. The best way to examine the contents of a checkpoint is to
load it using a `Saver`.
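For instance, here is a minimal sketch of loading a checkpoint back into variables; the checkpoint prefix `'my-model-1000'` and the variable names and shapes are assumptions chosen purely for illustration:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# Re-create the same variables that existed when the checkpoint was written.
v1 = tf.compat.v1.get_variable("v1", shape=[3])
v2 = tf.compat.v1.get_variable("v2", shape=[5])

saver = tf.compat.v1.train.Saver()

with tf.compat.v1.Session() as sess:
    # restore() runs the restore ops, so the restored variables do not
    # need to be initialized separately.
    saver.restore(sess, "my-model-1000")
    print(sess.run(v1))
```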
Savers can automatically number checkpoint filenames with a provided counter. This lets you keep multiple checkpoints at different steps while training a model. For example, you can number the checkpoint filenames with the training step number. To avoid filling up disks, savers manage checkpoint files automatically. For example, they can keep only the N most recent files, or one checkpoint for every N hours of training.
You number checkpoint filenames by passing a value to the optional
`global_step` argument to `save()`:
```python
saver.save(sess, 'my-model', global_step=0)    ==> filename: 'my-model-0'
...
saver.save(sess, 'my-model', global_step=1000) ==> filename: 'my-model-1000'
```
Additionally, optional arguments to the
Saver() constructor let you control
the proliferation of checkpoint files on disk:
* `max_to_keep` indicates the maximum number of recent checkpoint files to keep. As new files are created, older files are deleted. If `None` or `0`, no checkpoints are deleted from the filesystem but only the last one is kept in the `checkpoint` file. Defaults to 5 (that is, the 5 most recent checkpoint files are kept).
* `keep_checkpoint_every_n_hours`: In addition to keeping the most recent `max_to_keep` checkpoint files, you might want to keep one checkpoint file for every N hours of training. This can be useful if you want to later analyze how a model progressed during a long training session. For example, passing `keep_checkpoint_every_n_hours=2` ensures that you keep one checkpoint file for every 2 hours of training. The default value of 10,000 hours effectively disables the feature.
Note that you still have to call the
save() method to save the model.
Passing these arguments to the constructor will not save variables
automatically for you.
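As a short sketch of how these constructor arguments combine with an explicit `save()` call (the variable, checkpoint prefix, and the specific values passed here are illustrative assumptions):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

w = tf.compat.v1.get_variable("w", shape=[2, 2])

# Keep at most the 10 most recent checkpoints, plus one checkpoint for
# every 2 hours of training time.
saver = tf.compat.v1.train.Saver(
    max_to_keep=10, keep_checkpoint_every_n_hours=2)

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    # The retention policy above only takes effect when save() is called;
    # the constructor alone writes nothing to disk.
    saver.save(sess, 'my-model', global_step=0)
```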
A training program that saves regularly looks like:
```python
...
# Create a saver.
saver = tf.compat.v1.train.Saver(...variables...)
# Launch the graph and train, saving the model every 1,000 steps.
sess = tf.compat.v1.Session()
for step in range(1000000):
    sess.run(..training_op..)
    if step % 1000 == 0:
        # Append the step number to the checkpoint name:
        saver.save(sess, 'my-model', global_step=step)
```
In addition to checkpoint files, savers keep a protocol buffer on disk with
the list of recent checkpoints. This is used to manage numbered checkpoint
files and by
latest_checkpoint(), which makes it easy to discover the path
to the most recent checkpoint. That protocol buffer is stored in a file named
'checkpoint' next to the checkpoint files.
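For example, a sketch of resuming training from the newest checkpoint, if one exists; the directory `'./ckpts'` and the variable are assumptions for illustration:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

w = tf.compat.v1.get_variable("w", shape=[2, 2])
saver = tf.compat.v1.train.Saver()

# latest_checkpoint() reads the 'checkpoint' protocol buffer file in the
# directory and returns the prefix of the most recent checkpoint, or
# None if no checkpoint is recorded there.
ckpt = tf.train.latest_checkpoint('./ckpts')

with tf.compat.v1.Session() as sess:
    if ckpt is not None:
        saver.restore(sess, ckpt)
    else:
        sess.run(tf.compat.v1.global_variables_initializer())
```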
If you create several savers, you can specify a different filename for the
protocol buffer file in the call to `save()`.
| Args | Description |
|---|---|
| `var_list` | A list of `Variable`/`SaveableObject`, or a dictionary mapping names to `SaveableObject`s. If `None`, defaults to the list of all saveable objects. |
| `max_to_keep` | Maximum number of recent checkpoints to keep. Defaults to 5. |
| `keep_checkpoint_every_n_hours` | How often to keep checkpoints. Defaults to 10,000 hours. |
| `name` | String. Optional name to use as a prefix when adding operations. |
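As an illustrative sketch of passing `var_list` as a dictionary, which controls the names under which variables are stored in the checkpoint (the keys `"weights"` and `"biases"` and the variable shapes are assumptions, not part of the API):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

v1 = tf.compat.v1.get_variable("v1", shape=[3])
v2 = tf.compat.v1.get_variable("v2", shape=[5])

# The dictionary keys become the checkpoint names for the mapped variables,
# independent of the variables' names in the graph.
saver = tf.compat.v1.train.Saver(var_list={"weights": v1, "biases": v2})

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    saver.save(sess, 'my-model')
```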