
Shared interface of all model server runners.

A model server runner manages the model server job and its related resources on the serving platform. For example, a model server runner for Kubernetes launches a Pod running the model server with the required resources allocated, and tears down all the Kubernetes resources once infra validation is done. Note that a model server runner does not interact with the model server application itself.

A model server job has five states: Initial, Scheduled, Running, Aborted, and Succeeded. The state transitions are depicted in the diagram below.

           +-----------+
           |  Initial  |
           +-----+-----+
                 | Start()
           +-----v-----+
        +--+ Scheduled |
        |  +-----+-----+
        |        | WaitUntilRunning()
        |  +-----v-----+
        +--+  Running  +--+
        |  +-----------+  | Stop()
        |                 |
  +-----v-----+     +-----v-----+
  |  Aborted  |     | Succeeded |
  +-----------+     +-----------+

At any step, the job can be aborted in the serving platform. The model server runner will NOT recover a job from failure (even if it can) and regards the abort as a validation failure.

All the infra validation logic (waiting for the model to load, sending requests, measuring metrics, etc.) happens after the model server job has reached the Running state. This logic is outside the scope of the model server runner.

Depending on the serving platform, some of the states might be the same. For example, a GCP cloud AI prediction service has a global model server instance running, which makes the Scheduled and Running states indistinguishable. In such a case, the WaitUntilRunning() action is a no-op.
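The state transitions in the diagram can be sketched as a small lookup table. This is only an illustration, not part of the runner interface: `JobState`, `_TRANSITIONS`, and `JobStateTracker` are hypothetical names introduced here, and the table encodes only the edges drawn in the diagram.

```python
import enum


class JobState(enum.Enum):
    INITIAL = "Initial"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    ABORTED = "Aborted"
    SUCCEEDED = "Succeeded"


# Allowed transitions, mirroring the edges drawn in the diagram.
# Aborted and Succeeded are terminal states.
_TRANSITIONS = {
    JobState.INITIAL: {JobState.SCHEDULED},
    JobState.SCHEDULED: {JobState.RUNNING, JobState.ABORTED},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.ABORTED},
    JobState.ABORTED: set(),
    JobState.SUCCEEDED: set(),
}


class JobStateTracker:
    """Hypothetical helper that rejects transitions the diagram does not show."""

    def __init__(self):
        self.state = JobState.INITIAL

    def transition(self, new_state):
        if new_state not in _TRANSITIONS[self.state]:
            raise ValueError(
                f"Illegal transition: {self.state.value} -> {new_state.value}")
        self.state = new_state


tracker = JobStateTracker()
tracker.transition(JobState.SCHEDULED)  # corresponds to Start()
tracker.transition(JobState.RUNNING)    # corresponds to WaitUntilRunning()
tracker.transition(JobState.SUCCEEDED)  # corresponds to Stop()
```

On a platform where Scheduled and Running are indistinguishable, the second transition would happen immediately, which is why WaitUntilRunning() can be a no-op there.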





Get the endpoint used to connect to the model server.

The endpoint is available only after the model server job has reached the Running state.


  • AssertionError: if the runner hasn't reached the Running state.




Start the model server in a non-blocking manner.

Start() transitions the job state from Initial to Scheduled. The serving platform will move the job into the Running state at some later point.

In Start(), the model server runner should prepare the resources the model server requires, including config files, environment variables, volumes, proper authentication, computing resource allocation, etc. These resources are not cleaned up automatically; if you have ever called Start(), you should call Stop() to clean them up.

Start() must not be called twice. If you need to restart the job, create another model server runner instance.




Stop the model server in a blocking manner.

The model server job is stopped gracefully once the infra validation logic is done. This is where you need to clean up every resource you created in Start(). It is recommended not to raise an error during Stop(), as it will usually be called in a finally block.

Stop() is always called if Start() has ever been called, even if an error was raised during Start() before it completed. The Stop() implementation should take into account the case where Start() has not been called.
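The Start()/Stop() contract above might look like the following minimal sketch. `SketchRunner`, its `fail_on` parameter, and the resource names are hypothetical placeholders, not a real runner implementation. The point is that Stop() releases only what was actually created, works after a partial or missing Start(), and logs cleanup failures instead of raising.

```python
import logging


class SketchRunner:
    """Hypothetical runner; the resource names are placeholders."""

    def __init__(self, fail_on=None):
        self._fail_on = fail_on  # simulate a failure while creating this resource
        self._created = []       # resources that actually exist right now
        self._started = False

    def Start(self):
        assert not self._started, "Start() must not be called twice."
        self._started = True
        for name in ("config", "volume", "pod"):
            if name == self._fail_on:
                raise RuntimeError(f"failed to create {name}")
            self._created.append(name)

    def Stop(self):
        # Release in reverse creation order. The loop is a no-op if Start()
        # was never called or failed before creating anything.
        while self._created:
            name = self._created.pop()
            try:
                self._release(name)
            except Exception:
                # Stop() usually runs in a finally block, so log, don't raise.
                logging.exception("Failed to release %s", name)

    def _release(self, name):
        logging.info("Released %s", name)


runner = SketchRunner(fail_on="pod")
start_error = None
try:
    runner.Start()
    # WaitUntilRunning() and the infra validation logic would go here.
except RuntimeError as e:
    start_error = e
finally:
    runner.Stop()  # cleans up "config" and "volume" despite the failed Start()
```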




Wait until the model server job is running.

When this method returns without error, the model server job is in the Running state, where you can perform all the infra validation logic. This does not guarantee that the model server job will remain in the Running state forever (e.g. preemption could happen on some serving platforms), and any kind of infra validation logic failure can be caused by the model server job no longer being in the Running state. Such a failure is still a validation failure, and the model is blamed for it.


  • deadline: The deadline, as a UTC timestamp in seconds.


Whether the model is available or not.
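A deadline expressed as a UTC timestamp in seconds can be compared directly with `time.time()`, which returns seconds since the epoch. The helper below is a minimal sketch of deadline-based polling. `wait_until`, `predicate`, and `poll_interval` are hypothetical names, and returning a boolean on timeout is an assumption of this sketch, not documented behavior of WaitUntilRunning().

```python
import time


def wait_until(predicate, deadline, poll_interval=0.01):
    """Polls predicate() until it is true or the deadline (epoch seconds) passes."""
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(poll_interval)
    # One last check so a condition met right at the deadline is not missed.
    return bool(predicate())


# A relative timeout becomes an absolute deadline by adding it to "now".
ready_at = time.time() + 0.05
became_ready = wait_until(lambda: time.time() >= ready_at, time.time() + 0.5)

# A condition that never holds makes the helper give up at the deadline.
timed_out = wait_until(lambda: False, time.time() + 0.05)
```

Passing an absolute deadline rather than a timeout lets several consecutive waits share one overall time budget.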