tf.train.MonitoredTrainingSession(master='', is_chief=True, checkpoint_dir=None, scaffold=None, hooks=None, chief_only_hooks=None, save_checkpoint_secs=600, save_summaries_steps=100, config=None)

See the guide: Training > Distributed execution

Creates a MonitoredSession for training.

For a chief, this utility sets the proper session initializer/restorer and creates hooks for checkpoint and summary saving. For workers, it sets a session creator that waits for the chief to initialize or restore the session.

Args:

  • master: String. The TensorFlow master to use.
  • is_chief: If True, it will take care of initialization and recovery of the underlying TensorFlow session. If False, it will wait on a chief to initialize or recover the TensorFlow session.
  • checkpoint_dir: A string. Optional path to a directory from which to restore variables.
  • scaffold: A Scaffold used for gathering or building supportive ops. If not specified, a default one is created. It's used to finalize the graph.
  • hooks: Optional list of SessionRunHook objects.
  • chief_only_hooks: Optional list of SessionRunHook objects. These hooks are activated if is_chief==True and ignored otherwise.
  • save_checkpoint_secs: The frequency, in seconds, that a checkpoint is saved using a default checkpoint saver. If save_checkpoint_secs is set to None, then the default checkpoint saver isn't used.
  • save_summaries_steps: The frequency, in number of global steps, that the summaries are written to disk using a default summary saver. If save_summaries_steps is set to None, then the default summary saver isn't used.
  • config: An instance of tf.ConfigProto used to configure the session. It is passed as the config argument of the tf.Session constructor.
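
The interaction between is_chief, hooks, chief_only_hooks, and the two save_* arguments can be hard to keep straight. The following is a simplified pure-Python model of which hooks end up active, written from the argument descriptions above; it is not the library's implementation. The function name select_hooks and the string hook labels are hypothetical, and the model assumes the default savers are only created when checkpoint_dir is given, which the text above does not state explicitly.

```python
def select_hooks(is_chief, hooks=None, chief_only_hooks=None,
                 checkpoint_dir=None, save_checkpoint_secs=600,
                 save_summaries_steps=100):
    """Return labels of the hooks that would be active (illustrative model only)."""
    active = []
    if is_chief:
        # chief_only_hooks are activated only on the chief.
        active.extend(chief_only_hooks or [])
        # Assumption: default savers need a directory to write to.
        if checkpoint_dir is not None:
            # Setting a frequency to None disables the matching default saver.
            if save_summaries_steps is not None:
                active.append("default_summary_saver")
            if save_checkpoint_secs is not None:
                active.append("default_checkpoint_saver")
    # Regular hooks run on the chief and on workers alike.
    active.extend(hooks or [])
    return active
```

Under this model, a worker (is_chief=False) runs only the entries in hooks, while a chief also runs chief_only_hooks and any default savers that have not been disabled with None.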

Returns:

A MonitoredSession object.
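
As a usage illustration, here is a minimal single-process sketch of a training loop built on this function. It assumes a TensorFlow installation that provides the tensorflow.compat.v1 module (older 1.x releases would use import tensorflow as tf directly). The toy model, the train helper, and the temporary checkpoint directory are all hypothetical choices made for the example.

```python
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # this API expects graph-mode execution


def train(num_steps=5):
    """Run a tiny training loop and return the path of the latest checkpoint."""
    checkpoint_dir = tempfile.mkdtemp()  # hypothetical checkpoint location
    with tf.Graph().as_default():
        global_step = tf.train.get_or_create_global_step()
        # Toy problem: move a scalar w toward 3.0.
        w = tf.get_variable("w", shape=[], initializer=tf.zeros_initializer())
        loss = tf.square(w - 3.0)
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=global_step)

        # Ask the session to stop once the global step reaches num_steps.
        hooks = [tf.train.StopAtStepHook(last_step=num_steps)]

        with tf.train.MonitoredTrainingSession(
                is_chief=True,
                checkpoint_dir=checkpoint_dir,
                hooks=hooks,
                save_checkpoint_secs=600,
                save_summaries_steps=100) as sess:
            while not sess.should_stop():
                sess.run(train_op)

    return tf.train.latest_checkpoint(checkpoint_dir)
```

Calling train() runs the loop until the stop hook fires and returns the checkpoint path written under the temporary directory. In a distributed job, each non-chief worker would pass is_chief=False along with the appropriate master string.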

Defined in tensorflow/python/training/monitored_session.py.