
Computes a device_assignment of a computation across a TPU topology.


Attempts to choose a compact grid of cores for locality.

Returns a DeviceAssignment that describes which physical cores in the topology are assigned to each logical core of each replica.

computation_shape and computation_stride values should be powers of 2 for optimal packing.
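The packing arithmetic behind the powers-of-2 advice can be sketched in plain Python. This is a hypothetical helper (`replica_capacity` is not a TensorFlow function), and it assumes that a replica's footprint along each axis is `computation_shape * computation_stride` and that strided replicas can interleave in each other's gaps. It illustrates the idea and is not the library's implementation.

```python
from math import prod


def replica_capacity(mesh_shape, computation_shape, computation_stride):
    """Hypothetical helper: how many replicas fit into a mesh of cores.

    Assumes each replica occupies `computation_shape` cores spaced
    `computation_stride` apart, so its footprint along an axis is
    shape * stride. With stride > 1, other replicas can fill the gaps.
    """
    footprint = [s * t for s, t in zip(computation_shape, computation_stride)]
    if any(f > m for f, m in zip(footprint, mesh_shape)):
        raise ValueError("computation footprint exceeds the topology")
    # Number of whole footprints that fit along each axis.
    block_counts = [m // f for m, f in zip(mesh_shape, footprint)]
    # Each footprint can host `stride` interleaved replicas along each axis.
    replica_counts = [b * t for b, t in zip(block_counts, computation_stride)]
    return prod(replica_counts)


print(replica_capacity((4, 4, 2), (2, 2, 1), (1, 1, 1)))  # 8 replicas x 4 cores
print(replica_capacity((4, 4, 2), (3, 1, 1), (1, 1, 1)))  # 8 replicas x 3 cores
```

On an assumed 4x4x2 mesh (32 cores), a 2x2x1 block fits 8 replicas that use all 32 cores. A 3x1x1 block also fits 8 replicas, but they use only 24 cores: the last x-plane (8 cores) stays idle because 3 does not divide 4.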

Args:
  • topology: A Topology object that describes the TPU cluster topology. To obtain a TPU topology, evaluate the Tensor returned by initialize_system. Either a serialized TopologyProto or a Topology object may be passed. Note: you must evaluate the Tensor first; you cannot pass an unevaluated Tensor here.
  • computation_shape: A rank 1 int32 numpy array with size equal to the topology rank, describing the shape of the computation's block of cores. If None, the computation_shape is [1] * topology_rank.
  • computation_stride: A rank 1 int32 numpy array of size topology_rank, describing the inter-core spacing of the computation_shape cores in the TPU topology. If None, the computation_stride is [1] * topology_rank.
  • num_replicas: The number of computation replicas to run. The replicas will be packed into the free spaces of the topology.
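The documented defaults for computation_shape and computation_stride can be illustrated with a small sketch. `resolve_shapes` is a hypothetical helper, plain Python lists stand in for int32 numpy arrays, and a topology rank of 4 is assumed only for illustration. It also assumes that the number of cores per replica is the product of computation_shape.

```python
from math import prod


def resolve_shapes(topology_rank, computation_shape=None, computation_stride=None):
    """Hypothetical helper mirroring the documented defaults.

    A None shape or stride becomes [1] * topology_rank, i.e. one core per
    replica with unit spacing along every axis.
    """
    if computation_shape is None:
        computation_shape = [1] * topology_rank
    if computation_stride is None:
        computation_stride = [1] * topology_rank
    # Assumed: cores per replica is the volume of the computation's block.
    cores_per_replica = prod(computation_shape)
    return list(computation_shape), list(computation_stride), cores_per_replica


print(resolve_shapes(4))                # ([1, 1, 1, 1], [1, 1, 1, 1], 1)
print(resolve_shapes(4, [2, 2, 1, 1]))  # ([2, 2, 1, 1], [1, 1, 1, 1], 4)
```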

Returns:
A DeviceAssignment object, which describes the mapping between the logical cores in each computation replica and the physical cores in the TPU topology.
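To make the logical-to-physical mapping concrete, the following toy sketch tiles a mesh with contiguous blocks in row-major order and lists, for each replica, the physical coordinates of each logical core. The result is nested as [num_replicas][cores_per_replica][topology_rank], which is assumed here to match the layout of `DeviceAssignment.core_assignment`. `toy_core_assignment` is hypothetical, supports only unit stride, and is not the ordering TensorFlow actually chooses.

```python
from itertools import product


def toy_core_assignment(mesh_shape, computation_shape, num_replicas):
    """Hypothetical, simplified assignment with unit stride.

    Tiles the mesh with contiguous blocks of `computation_shape` cores in
    row-major order and returns, per replica, the physical coordinates of
    each of its logical cores.
    """
    block_counts = [m // c for m, c in zip(mesh_shape, computation_shape)]
    # Origin (in block units) of every block that fits in the mesh.
    block_origins = list(product(*(range(n) for n in block_counts)))
    if num_replicas > len(block_origins):
        raise ValueError("replicas cannot fit into the topology")
    assignment = []
    for origin in block_origins[:num_replicas]:
        base = [o * c for o, c in zip(origin, computation_shape)]
        # Logical core i of this replica maps to base + i-th offset in the block.
        cores = [
            tuple(b + off for b, off in zip(base, offset))
            for offset in product(*(range(c) for c in computation_shape))
        ]
        assignment.append(cores)
    return assignment


print(toy_core_assignment((2, 2), (1, 2), 2))
# [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
```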

Raises:
  • ValueError: If topology is not a valid Topology object.
  • ValueError: If computation_shape or computation_stride are not 1D int32 numpy arrays of size topology_rank where all values are positive.
  • ValueError: If computation's replicas cannot fit into the TPU topology.
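The shape and positivity conditions for computation_shape and computation_stride can be mirrored by a short validation sketch. `validate_shape` is a hypothetical helper that accepts plain Python lists of ints instead of int32 numpy arrays; the real checks may differ in detail and in their error messages.

```python
def validate_shape(name, value, topology_rank):
    """Hypothetical check mirroring the documented ValueError conditions."""
    if len(value) != topology_rank:
        raise ValueError(
            f"{name} must have shape [{topology_rank}]; got [{len(value)}]")
    # Reject non-integers (including nested lists and bools) and non-positive values.
    if any(isinstance(v, bool) or not isinstance(v, int) or v <= 0 for v in value):
        raise ValueError(f"{name} must contain only positive integers; got {value}")
    return list(value)


print(validate_shape("computation_shape", [2, 2, 1, 1], 4))  # [2, 2, 1, 1]
```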