tf.train.MonitoredTrainingSession(master='', is_chief=True, checkpoint_dir=None, scaffold=None, hooks=None, chief_only_hooks=None, save_checkpoint_secs=600, save_summaries_steps=100, config=None)
See the guide: Training > Distributed execution
Creates a MonitoredSession for training.

For a chief, this utility sets the proper session initializer/restorer. It also creates hooks related to checkpoint and summary saving. For workers, this utility sets a proper session creator which waits for the chief to initialize/restore the session.
Args:

master: String. The TensorFlow master to use.
is_chief: If True, it will take care of initialization and recovery of the underlying TensorFlow session. If False, it will wait on a chief to initialize or recover the TensorFlow session.
checkpoint_dir: A string. Optional path to a directory from which to restore variables.
scaffold: A Scaffold used for gathering or building supportive ops. If not specified, a default one is created. It's used to finalize the graph.
hooks: Optional list of SessionRunHook objects.
chief_only_hooks: List of SessionRunHook objects. Activate these hooks if is_chief==True, ignore otherwise.
save_checkpoint_secs: The frequency, in seconds, that a checkpoint is saved using a default checkpoint saver. If save_checkpoint_secs is set to None, then the default checkpoint saver isn't used.
save_summaries_steps: The frequency, in number of global steps, that the summaries are written to disk using a default summary saver. If save_summaries_steps is set to None, then the default summary saver isn't used.
config: An instance of tf.ConfigProto used to configure the session. It's the config argument of the constructor of tf.Session.
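A minimal single-process sketch of the API described above, assuming TF 1.x-style graph mode (reached via tf.compat.v1 on TensorFlow 2.x; on TF 1.x the same names live directly under tf.train). The global-step counter and the StopAtStepHook are illustrative choices, not part of the signature:

```python
import tensorflow as tf

tf1 = tf.compat.v1          # TF 1.x-style API surface
tf1.disable_eager_execution()  # MonitoredTrainingSession requires graph mode

# Illustrative "training": just increment the global step each iteration.
global_step = tf1.train.get_or_create_global_step()
train_op = tf1.assign_add(global_step, 1)

# A hook that stops the session once the global step reaches 5.
hooks = [tf1.train.StopAtStepHook(last_step=5)]

with tf1.train.MonitoredTrainingSession(
        is_chief=True,        # this process initializes/recovers the session
        checkpoint_dir=None,  # no default checkpoint or summary savers here
        hooks=hooks) as sess:
    while not sess.should_stop():
        step = sess.run(train_op)
```

Because checkpoint_dir is None, the default checkpoint and summary savers are not created; passing a directory instead would enable both at the save_checkpoint_secs/save_summaries_steps frequencies described above.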