A manager for facilitating multiple in-progress evaluations.
tff.program.FederatedDataSource,
aggregated_metrics_manager: Optional[release_manager.ReleaseManager[release_manager.ReleasableStructure, int]],
create_state_manager_fn: Callable[[str], tff.program.FileProgramStateManager],
create_process_fn: Callable[[str], tuple[learning_process.LearningProcess, Optional[release_manager.ReleaseManager[release_manager.ReleasableStructure, int]]]],
cohort_size: int,
duration: datetime.timedelta = datetime.timedelta(hours=24)
)
This manager has three responsibilities:
- Prepares, starts and tracks new evaluation loops. This involves creating a new evaluation process and a state manager for that process, adding the new process to the list of tracked in-progress evaluations, and creating a new asyncio.Task to run the evaluation loop.
- Records evaluations that have finished. This removes the evaluation from the list of in-progress evaluations.
- If the program has restarted, loads the most recent state of in-progress evaluations and restarts each of them.
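The three responsibilities above can be illustrated with a minimal sketch that uses only asyncio. ToyEvaluationTracker, its method names, and the simulated evaluation loop are hypothetical stand-ins chosen for illustration; they are not part of the TFF API and do not reflect how this class is implemented.

```python
import asyncio


class ToyEvaluationTracker:
    """Hypothetical stand-in that mimics the three responsibilities above."""

    def __init__(self):
        # Maps a training round number to the asyncio.Task running its loop.
        self._tasks = {}
        # Round numbers of evaluations that are still in progress.
        self.in_progress = []

    async def _run_evaluation(self, train_round, num_eval_rounds):
        # Placeholder for a real evaluation loop: yield to the event loop
        # once per simulated evaluation round, then record completion.
        for _ in range(num_eval_rounds):
            await asyncio.sleep(0)
        self.record_finished(train_round)

    def start(self, train_round, num_eval_rounds=3):
        # Responsibility 1: track the new evaluation and schedule its loop.
        self.in_progress.append(train_round)
        self._tasks[train_round] = asyncio.create_task(
            self._run_evaluation(train_round, num_eval_rounds))

    def record_finished(self, train_round):
        # Responsibility 2: drop the evaluation from the in-progress list.
        self.in_progress.remove(train_round)

    def resume(self, saved_in_progress):
        # Responsibility 3: after a restart, relaunch each saved evaluation.
        for train_round in saved_in_progress:
            self.start(train_round)

    async def wait_all(self):
        # Block until every scheduled evaluation loop has completed.
        await asyncio.gather(*self._tasks.values())


async def demo():
    tracker = ToyEvaluationTracker()
    tracker.start(train_round=1)
    tracker.resume(saved_in_progress=[5, 7])
    started = list(tracker.in_progress)
    await tracker.wait_all()
    return started, list(tracker.in_progress)


started, remaining = asyncio.run(demo())
```

In this sketch the in-progress list lives only in memory. The real manager persists it through a program state manager so that it survives a restart.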
This class uses N + 1 tff.program.ProgramStateManagers to enable resumable evaluations:
- The first state manager is for this class itself and manages the list of in-progress evaluations via two tensor objects. Tensor objects must be used (rather than Python lists) because tff.program.FileProgramStateManager does not support state objects whose Python structure changes across versions (e.g. to load the next version we must know its shape, but after a restart we don't). Alternatively, we can use tensor or ndarray objects with shape [None] to support changing shapes of the structure's leaf elements.
- The next N state managers manage the cross-round metric aggregation, one for each evaluation process started.
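The fixed-structure constraint described above can be sketched as follows. This is only an illustration under stated assumptions: the standard-library array module stands in for 1-D tensors or ndarrays, and the choice of what the two arrays hold (training round numbers and start timestamps) as well as the helper names are hypothetical. The point is that the state always consists of the same two leaves, and only their lengths change as evaluations start and finish.

```python
from array import array


def empty_state():
    # Fixed structure: always exactly two 1-D arrays, whatever their length.
    return {
        "train_rounds": array("q"),      # 64-bit signed integers
        "start_timestamps": array("q"),  # seconds since the epoch
    }


def add_evaluation(state, train_round, start_timestamp_seconds):
    # Growing the leaves changes their length but not the structure.
    state["train_rounds"].append(train_round)
    state["start_timestamps"].append(start_timestamp_seconds)


def remove_evaluation(state, train_round):
    # Remove the matching entry from both parallel arrays.
    index = state["train_rounds"].index(train_round)
    del state["train_rounds"][index]
    del state["start_timestamps"][index]


state = empty_state()
add_evaluation(state, 10, 1_700_000_000)
add_evaluation(state, 20, 1_700_003_600)
remove_evaluation(state, 10)
```

By contrast, a Python list of per-evaluation records would change the number of leaves each time an evaluation is added or removed, which is the situation the text says the file-based state manager cannot load without knowing the shape in advance.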
A callable that returns a
A callable that returns a 2-tuple of
An integer denoting the size of each evaluation round to
select from the iterator created from
record_evaluations_finished( train_round )
Removes the evaluation for train_round from the internal state manager.
|train_round|The integer round number of the training round that has finished evaluation.|
Loads the most recent state and restarts in-progress evaluations.
start_evaluation( train_round, start_timestamp_seconds, model_weights )
Starts a new evaluation loop for the incoming model_weights.
Creates an awaitable that blocks until all evaluations are finished.