Initializes accelerators and communication fabrics for DTensor.
tf.experimental.dtensor.initialize_accelerator_system(
device_type: Optional[str] = None,
enable_coordination_service: Optional[bool] = True
) -> str
DTensor configures TensorFlow to run in either local mode or multi-client mode.
- In local mode, a mesh can only use devices attached to the current process.
- In multi-client mode, a mesh can span across devices from multiple clients.
If DTENSOR_JOBS is non-empty, DTensor configures TensorFlow to run in
multi-client mode using the distributed runtime. In multi-client mode, devices
on different clients can communicate with each other.
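The mode selection rule above can be sketched in a few lines. This is only an illustration of the documented rule, not DTensor's actual implementation; the helper name `dtensor_mode` and the returned strings are hypothetical.

```python
import os


def dtensor_mode(environ=None):
    """Return "multi-client" if DTENSOR_JOBS is non-empty, else "local".

    Hypothetical helper: it mirrors the rule described in this page and
    does not call into TensorFlow.
    """
    env = os.environ if environ is None else environ
    jobs = env.get("DTENSOR_JOBS", "")
    return "multi-client" if jobs else "local"


# An unset or empty DTENSOR_JOBS selects local mode.
print(dtensor_mode({}))
print(dtensor_mode({"DTENSOR_JOBS": "localhost:10000,localhost:10001"}))
```

Passing a plain dict instead of reading `os.environ` directly keeps the check easy to exercise without modifying the real process environment.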
The following environment variables control the behavior of this function:
- DTENSOR_JOBS: string, a comma-separated list. Each item in the list is of format {hostname}:{port}. If empty, DTensor runs in local mode. Examples of valid DTENSOR_JOBS values:
  - 4 clients on localhost: localhost:10000,localhost:10001,localhost:10002,localhost:10003
  - 2 clients on host1, 2 clients on host2: host1:10000,host1:10001,host2:10000,host2:10003

  If the hostnames are BNS addresses, the items must be sorted in alphabetical order.
- DTENSOR_CLIENT_ID: integer, between 0 and num_clients - 1, to identify the client id of the current process. The default value is 0.
- DTENSOR_JOB_NAME: string, the name of the TensorFlow job. The job name controls the job name section of the TensorFlow DeviceSpecs, e.g., job:worker in /job:worker/replica:0/task:0/device:TPU:0 when the job name is worker. The default value is localhost in local mode, and worker in multi-client mode. All DTensor clients within the same multi-client cluster share the same job name.
- DTENSOR_USE_PARALLEL_EXECUTOR: string, with its value being pw to specify that the backend is Pathways, and TensorFlow otherwise.
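As an illustration of how these variables fit together, the sketch below builds the environment for one client of a multi-client cluster. The helper `client_env` is hypothetical and not part of TensorFlow. It assumes that num_clients equals the number of entries in DTENSOR_JOBS, which matches the examples above but is not stated explicitly.

```python
def client_env(jobs, client_id, job_name="worker"):
    """Build the DTensor environment variables for one client process.

    Hypothetical helper. Assumes num_clients == len(jobs).
    """
    for item in jobs:
        # Each item must look like {hostname}:{port}.
        host, sep, port = item.rpartition(":")
        if not (host and sep and port.isdigit()):
            raise ValueError(f"expected hostname:port, got {item!r}")
    if not 0 <= client_id < len(jobs):
        raise ValueError(
            f"client_id must be between 0 and {len(jobs) - 1}, got {client_id}"
        )
    return {
        "DTENSOR_JOBS": ",".join(jobs),
        "DTENSOR_CLIENT_ID": str(client_id),
        "DTENSOR_JOB_NAME": job_name,
    }


# Environment for the third of four clients on localhost.
jobs = [f"localhost:{port}" for port in range(10000, 10004)]
env = client_env(jobs, client_id=2)
print(env["DTENSOR_JOBS"])
```

Each client process would be launched with its own DTENSOR_CLIENT_ID, while DTENSOR_JOBS and DTENSOR_JOB_NAME stay identical across the cluster.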
Args | |
---|---|
device_type | Type of accelerator to use: CPU, GPU, or TPU. If None, uses tf.experimental.dtensor.preferred_device_type(). |
enable_coordination_service | If true, enables the distributed coordination service so that, when there is more than one client, workers know about each other's devices. |
Returns | |
---|---|
device_type | The type of accelerator that was initialized. |
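A minimal usage sketch, assuming a TensorFlow installation that includes tf.experimental.dtensor. The wrapper name `init_dtensor` is hypothetical; only the call to initialize_accelerator_system and its two arguments come from the signature above.

```python
def init_dtensor(device_type=None):
    """Initialize DTensor accelerators and return the initialized device type.

    Hypothetical wrapper. TensorFlow is imported lazily so that defining
    this function does not itself require TensorFlow.
    """
    from tensorflow.experimental import dtensor

    # With device_type=None, DTensor falls back to
    # tf.experimental.dtensor.preferred_device_type().
    return dtensor.initialize_accelerator_system(
        device_type=device_type,
        enable_coordination_service=True,
    )
```

A caller would typically invoke `init_dtensor("CPU")` (or leave the argument as None) once per process and use the returned string when building a mesh.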