tf.distribute.cluster_resolver.KubernetesClusterResolver

ClusterResolver for Kubernetes.

Inherits From: ClusterResolver

This is an implementation of ClusterResolver for Kubernetes. Given the Kubernetes namespace and label selector for pods, it retrieves the IP addresses of all running pods matching the selector and returns a ClusterSpec based on that information.
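
The resolution step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the Pod record, the matches and resolve_cluster helpers, the pod names and IP addresses, and the port 8470 are all hypothetical, and the sketch simply skips pods that are not running.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical stand-in for a Kubernetes pod record."""
    name: str
    labels: dict
    phase: str
    ip: str


def matches(pod, selector):
    """Returns True if the pod carries the 'key=value' label in `selector`."""
    key, _, value = selector.partition("=")
    return pod.labels.get(key) == value


def resolve_cluster(pods, job_to_label_mapping, tf_server_port):
    """Builds a {job: ["ip:port", ...]} dict from the running pods."""
    cluster = {}
    for job, selectors in job_to_label_mapping.items():
        addresses = []
        for selector in selectors:
            # Sorting by pod name keeps task indices stable between calls
            # (an assumption of this sketch).
            selected = sorted(
                (p for p in pods
                 if matches(p, selector) and p.phase == "Running"),
                key=lambda p: p.name)
            addresses.extend(f"{p.ip}:{tf_server_port}" for p in selected)
        cluster[job] = addresses
    return cluster


# Hypothetical pods; in a real cluster these come from the Kubernetes API.
pods = [
    Pod("worker-b-0", {"job-name": "worker-cluster-b"}, "Running", "10.0.0.3"),
    Pod("worker-a-0", {"job-name": "worker-cluster-a"}, "Running", "10.0.0.1"),
    Pod("worker-a-1", {"job-name": "worker-cluster-a"}, "Pending", "10.0.0.2"),
]
mapping = {"worker": ["job-name=worker-cluster-a", "job-name=worker-cluster-b"]}
cluster = resolve_cluster(pods, mapping, 8470)
print(cluster)
```

A dictionary of this shape, mapping each job name to a list of host:port strings, is the kind of input that tf.train.ClusterSpec accepts.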

Usage example with tf.distribute.Strategy:

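  # On every worker
  import tensorflow as tf
  KubernetesClusterResolver = (
      tf.distribute.cluster_resolver.KubernetesClusterResolver)

  # On worker 0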
  cluster_resolver = KubernetesClusterResolver(
      {"worker": ["job-name=worker-cluster-a", "job-name=worker-cluster-b"]})
  cluster_resolver.task_type = "worker"
  cluster_resolver.task_id = 0
  strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
      cluster_resolver=cluster_resolver)

  # On worker 1
  cluster_resolver = KubernetesClusterResolver(
      {"worker": ["job-name=worker-cluster-a", "job-name=worker-cluster-b"]})
  cluster_resolver.task_type = "worker"
  cluster_resolver.task_id = 1
  strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
      cluster_resolver=cluster_resolver)

job_to_label_mapping A mapping of TensorFlow jobs to label selectors. This allows users to specify many TensorFlow jobs in one ClusterResolver, and each job can have pods that belong to different label selectors. For example, a sample mapping might be:

{'worker': ['job-name=worker-cluster-a', 'job-name=worker-cluster-b'],
 'ps': ['job-name=ps-1', 'job-name=ps-2']}

tf_server_port The port the TensorFlow server is listening on.
rpc_layer (Optional) The RPC layer TensorFlow should use to communicate between tasks in Kubernetes. Defaults to 'grpc'.
override_client The Kubernetes client to use. By default it is obtained automatically (from kubernetes import client as k8sclient). If you pass one in, you are responsible for setting Kubernetes credentials manually.
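
To show how tf_server_port, rpc_layer, task_type, and task_id fit together, the following minimal sketch forms the address of one task from a resolved cluster dictionary. The task_address and master_url helpers, the IP addresses, and the port 8470 are hypothetical and chosen only for illustration; the library's own formatting may differ.

```python
def task_address(cluster, task_type, task_id):
    """Looks up the "ip:port" entry for one task in a {job: [addresses]} dict."""
    return cluster[task_type][task_id]


def master_url(address, rpc_layer="grpc"):
    """Prefixes the address with the RPC layer, e.g. "grpc://ip:port"."""
    return f"{rpc_layer}://{address}" if rpc_layer else address


# Hypothetical resolved cluster: two workers listening on port 8470.
cluster = {"worker": ["10.0.0.1:8470", "10.0.0.3:8470"]}

# Worker 1 (task_type="worker", task_id=1) using the default RPC layer.
url = master_url(task_address(cluster, "worker", 1))
print(url)
```

In this sketch every task of a job shares the same port, so tasks are told apart only by IP address and by their index in the job's address list.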