tf.data.experimental.TFRecordWriter


Writes data to a TFRecord file.

To write a dataset to a single TFRecord file:

import tensorflow as tf

dataset = ... # dataset to be written
writer = tf.data.experimental.TFRecordWriter(PATH)
write_op = writer.write(dataset)  # in graph mode, run this op to perform the write

To shard a dataset across multiple TFRecord files:

dataset = ... # dataset to be written

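def reduce_func(key, dataset):
  # Called once per shard key; writes that shard's elements to its own file.
  filename = tf.strings.join([PATH_PREFIX, tf.strings.as_string(key)])
  writer = tf.data.experimental.TFRecordWriter(filename)
  # Drop the index added by enumerate() and keep only the original element.
  writer.write(dataset.map(lambda _, x: x))
  return tf.data.Dataset.from_tensors(filename)

# Pair each element with its index, then group by index modulo NUM_SHARDS.
# A window size of tf.int64.max keeps every element of a shard in one window.
dataset = dataset.enumerate()
dataset = dataset.apply(tf.data.experimental.group_by_window(
  lambda i, _: i % NUM_SHARDS, reduce_func, tf.int64.max
))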
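The shard assignment above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the library's implementation: the value of `NUM_SHARDS` and the `elements` list are hypothetical stand-ins, and the dictionary plays the role of the per-key windows that `group_by_window` hands to `reduce_func`.

```python
# Hypothetical stand-ins for NUM_SHARDS and the dataset's elements.
NUM_SHARDS = 3
elements = ["a", "b", "c", "d", "e", "f", "g"]

shards = {}
for i, x in enumerate(elements):          # mirrors dataset.enumerate()
    key = i % NUM_SHARDS                  # mirrors the key function lambda i, _: i % NUM_SHARDS
    shards.setdefault(key, []).append(x)  # one group per key, index dropped

for key in sorted(shards):
    print(key, shards[key])
```

Elements are dealt out round-robin, so shard sizes differ by at most one element.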

Methods

write


Returns a tf.Operation to write a dataset to a file.

Args
dataset: A tf.data.Dataset whose elements are to be written to a file. Each element must be a scalar tf.string tensor.

Returns
A tf.Operation that, when run, writes the contents of dataset to the file.
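Because write only builds an operation, nothing reaches disk until that operation is run. The following is a minimal sketch of running it in graph mode, assuming a TensorFlow installation where `tf.compat.v1` is available; the temporary path and the three byte-string records are hypothetical, chosen only for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "example.tfrecord")

graph = tf.Graph()
with graph.as_default():
    # Elements are scalar byte strings (illustrative data).
    dataset = tf.data.Dataset.from_tensor_slices([b"first", b"second", b"third"])
    writer = tf.data.experimental.TFRecordWriter(path)
    write_op = writer.write(dataset)  # builds the op; no file is written yet

    # Read the file back in the same graph to check the round trip.
    read_back = tf.data.TFRecordDataset(path)
    next_record = tf.compat.v1.data.make_one_shot_iterator(read_back).get_next()

with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(write_op)  # the file is written here
    records = [sess.run(next_record) for _ in range(3)]

print(records)
```

Building everything inside an explicit `tf.Graph` keeps the sketch in graph mode even where eager execution is the default.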