
Creates a beam pipeline yielding TFDS examples.

Each dataset shard will be processed in parallel.


builder = tfds.builder('my_dataset')

_ = (
    pipeline
    | tfds.beam.ReadFromTFDS(builder, split='train')
    | beam.Map(tfds.as_numpy)
    | ...
)

Use tfds.as_numpy to convert each example from tf.Tensor to NumPy arrays.
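As a fuller illustration, the snippet above can be wrapped into a small end-to-end pipeline. This is a minimal sketch, not the canonical usage: the helper name `count_examples`, the `'mnist'` default, the step labels, and the final count-and-print steps are all assumptions added for illustration. It also assumes the dataset has to be prepared with `builder.download_and_prepare()` before it is read.

```python
def count_examples(dataset_name="mnist", split="train"):
    """Hypothetical helper: read a TFDS split with Beam and print its size."""
    # Imported lazily so that defining this function does not require the
    # heavy dependencies to be installed.
    import apache_beam as beam
    import tensorflow_datasets as tfds

    builder = tfds.builder(dataset_name)
    # Assumption: the dataset must exist on disk before it can be read.
    builder.download_and_prepare()

    # The pipeline runs when the `with` block exits.
    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            # The `pipeline` argument of ReadFromTFDS is filled in by `|`.
            | "Read" >> tfds.beam.ReadFromTFDS(builder, split=split)
            # Convert each example from tf.Tensor to NumPy.
            | "ToNumpy" >> beam.Map(tfds.as_numpy)
            # Illustrative downstream steps: count the examples and print.
            | "Count" >> beam.combiners.Count.Globally()
            | "Print" >> beam.Map(print)
        )
```

Calling `count_examples()` downloads the dataset on first use, so it may take a while and needs network access.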

The split argument can use sub-splits, e.g. 'train[:100]', but only when batch_size=None (passed through as_dataset_kwargs). Note: the examples will be yielded in a different order than with tfds.load(split='train[:100]'), but the same examples will be used.
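The ordering note can be pictured with a toy model in plain Python. This is not how TFDS or Beam read data internally. It is only a sketch, under the assumption that shards are consumed independently and may complete in any order, and the function names and the tiny integer "examples" are hypothetical.

```python
import random


def sequential_read(shards):
    """Toy stand-in for a sequential read: shards are visited in order."""
    return [example for shard in shards for example in shard]


def parallel_style_read(shards, seed=0):
    """Toy stand-in for a parallel read: shards complete in arbitrary order."""
    rng = random.Random(seed)
    completion_order = list(range(len(shards)))
    rng.shuffle(completion_order)
    return [example for i in completion_order for example in shards[i]]


# Four hypothetical shards holding ten examples in total.
shards = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
seq = sequential_read(shards)
par = parallel_style_read(shards)

# The order may differ, but the same examples are present in both reads.
assert sorted(seq) == sorted(par)
```

Downstream steps that depend on example order should therefore sort or key the elements explicitly rather than rely on the read order.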

Args:
pipeline: Beam pipeline (automatically set).
builder: Dataset builder to load.
split: Split name to load (e.g. train+test, train).
**as_dataset_kwargs: Arguments forwarded to builder.as_dataset.

Returns:
The PCollection containing the TFDS examples.