tf.data.experimental.service.register_dataset

Registers a dataset with the tf.data service.

register_dataset registers a dataset with the tf.data service so that datasets can be created later with tf.data.experimental.service.from_dataset_id. This is useful when the dataset is registered by one process, then used in another process. When the same process is both registering and reading from the dataset, it is simpler to use tf.data.experimental.service.distribute instead.

If the dataset is already registered with the tf.data service, register_dataset returns the already-registered dataset's id.

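import tensorflow as tf

# Start an in-process dispatcher and one worker that connects to it.
dispatcher = tf.data.experimental.service.DispatchServer()
dispatcher_address = dispatcher.target.split("://")[1]
worker = tf.data.experimental.service.WorkerServer(
    tf.data.experimental.service.WorkerConfig(
        dispatcher_address=dispatcher_address))

# Register the dataset; the returned id can be passed to another process.
dataset = tf.data.Dataset.range(10)
dataset_id = tf.data.experimental.service.register_dataset(
    dispatcher.target, dataset)

# Recreate the dataset from its id. element_spec is read from the original
# dataset before the name `dataset` is rebound to the new one.
dataset = tf.data.experimental.service.from_dataset_id(
    processing_mode="parallel_epochs",
    service=dispatcher.target,
    dataset_id=dataset_id,
    element_spec=dataset.element_spec)
print(list(dataset.as_numpy_iterator()))
# Output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]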

Args

service: A string or a tuple indicating how to connect to the tf.data service. If it's a string, it should be in the format [<protocol>://]<address>, where <address> identifies the dispatcher address and <protocol> can optionally be used to override the default protocol to use. If it's a tuple, it should be (protocol, address).
dataset: A tf.data.Dataset to register with the tf.data service.

Returns

A scalar int64 tensor of the registered dataset's id.
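
The two accepted forms of the service argument can be illustrated with a small pure-Python sketch. It is not TensorFlow's own parsing code: split_service is a hypothetical helper written for this example, and the "grpc" fallback is an assumption made for illustration, since the actual default protocol is chosen by TensorFlow. The sketch only shows how a string of the form [<protocol>://]<address> and a (protocol, address) tuple describe the same connection.

```python
from typing import Tuple, Union

# Assumption: "grpc" stands in for the default protocol in this sketch.
# The real default is determined by TensorFlow and may differ.
ASSUMED_DEFAULT_PROTOCOL = "grpc"


def split_service(
    service: Union[str, Tuple[str, str]],
    default_protocol: str = ASSUMED_DEFAULT_PROTOCOL,
) -> Tuple[str, str]:
    """Hypothetical helper: normalize `service` to (protocol, address)."""
    if isinstance(service, tuple):
        # Tuple form: must already be (protocol, address).
        if len(service) != 2:
            raise ValueError("tuple form must be (protocol, address)")
        protocol, address = service
        return protocol, address
    if not isinstance(service, str) or not service:
        raise ValueError("service must be a non-empty string or a tuple")
    # String form: [<protocol>://]<address>
    parts = service.split("://")
    if len(parts) == 1:
        # No protocol given, so fall back to the assumed default.
        return default_protocol, parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"malformed service string: {service!r}")


print(split_service("localhost:5000"))            # ('grpc', 'localhost:5000')
print(split_service("grpc://localhost:5000"))     # ('grpc', 'localhost:5000')
print(split_service(("grpc", "localhost:5000")))  # ('grpc', 'localhost:5000')
```

Under these assumptions, all three calls resolve to the same protocol and address, which is why either form can be passed as service.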