Connects to the given cluster.
tf.config.experimental_connect_to_cluster(
    cluster_spec_or_resolver,
    job_name='localhost',
    task_index=0,
    protocol=None,
    make_master_device_default=True,
    cluster_device_filters=None
)
Makes devices on the cluster available for use. Calling this function more than once works, but invalidates any tensor handles on the old remote devices.
If the given local job name is not present in the cluster specification, it is added automatically, using an unused port on localhost.
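The automatic addition of the local job can be pictured with a small, TensorFlow-free sketch. This is only a model of the behavior described above, not TensorFlow's implementation: the helper names `pick_unused_port` and `with_local_job`, the plain-dict cluster layout, and the example addresses are all assumptions made for illustration.

```python
import socket


def pick_unused_port():
    """Hypothetical helper: ask the OS for a currently free TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


def with_local_job(cluster, job_name="localhost"):
    """Hypothetical helper: return a copy of `cluster` that contains `job_name`.

    `cluster` maps job names to lists of "host:port" addresses. If the job
    is already present, the copy is returned unchanged.
    """
    result = {job: list(addrs) for job, addrs in cluster.items()}
    if job_name not in result:
        result[job_name] = ["localhost:%d" % pick_unused_port()]
    return result


# Example addresses are placeholders, not real endpoints.
cluster = {
    "worker": ["10.0.0.1:2222", "10.0.0.2:2222"],
    "ps": ["10.0.0.3:2222"],
}
augmented = with_local_job(cluster)
```

In this model, a second call on `augmented` leaves the existing `localhost` entry in place, since the job name is already present.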
Device filters can be specified to isolate groups of remote tasks and avoid undesired accesses between workers. A worker that accesses resources or launches ops or functions on filtered remote devices gets an error (unknown device). For any remote task, if no device filter is present, all cluster devices are visible; if any device filter is specified, the task can only see devices matching at least one filter. Devices on the task itself are always visible. Device filters can be partially specified.
For example, for a cluster set up for parameter server training, the following device filters might be specified:
import tensorflow as tf

# num_workers, num_ps, and cluster_def are assumed to be defined by the caller.
cdf = tf.config.experimental.ClusterDeviceFilters()
# For any worker, only the devices on PS nodes and itself are visible
for i in range(num_workers):
  cdf.set_device_filters('worker', i, ['/job:ps'])
# Similarly for any ps, only the devices on workers and itself are visible
for i in range(num_ps):
  cdf.set_device_filters('ps', i, ['/job:worker'])
tf.config.experimental_connect_to_cluster(cluster_def,
                                          cluster_device_filters=cdf)
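The visibility rules stated earlier can be checked against this parameter server setup with a small pure-Python model. It is a sketch under assumptions, not TensorFlow's matching logic: the function `visible_devices`, the dict-based filter table, the device names, and the simple "filter is a path prefix of the device name" matching are all hypothetical simplifications.

```python
def visible_devices(task, all_devices, filters):
    """Hypothetical model of which devices a task can see.

    task: (job_name, task_index) tuple.
    all_devices: list of full device names in the cluster.
    filters: dict mapping (job_name, task_index) to a list of filters;
        a task missing from the dict has no filters set.
    """
    own_prefix = "/job:%s/task:%d" % task
    task_filters = filters.get(task)

    def matches(device, flt):
        # Assumption: a partially specified filter matches as a path prefix.
        return device == flt or device.startswith(flt + "/")

    visible = []
    for device in all_devices:
        if device.startswith(own_prefix + "/"):
            visible.append(device)  # devices on the task itself
        elif task_filters is None:
            visible.append(device)  # no filters: everything is visible
        elif any(matches(device, f) for f in task_filters):
            visible.append(device)  # matches at least one filter
    return visible


# Two workers and one ps, mirroring the filters in the example above.
devices = [
    "/job:worker/task:0/device:CPU:0",
    "/job:worker/task:1/device:CPU:0",
    "/job:ps/task:0/device:CPU:0",
]
filters = {
    ("worker", 0): ["/job:ps"],
    ("worker", 1): ["/job:ps"],
    ("ps", 0): ["/job:worker"],
}

worker0_view = visible_devices(("worker", 0), devices, filters)
ps0_view = visible_devices(("ps", 0), devices, filters)
```

Under this model, worker 0 sees its own device and the ps device but not worker 1, while the ps task sees both workers and itself.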