

Mock tfds to generate random data.


This function requires the true metadata files (dataset_info.json, label.txt, vocabulary files) to be stored in data_dir/dataset_name/version, as they would be for the true dataset. The examples themselves are randomly generated using builder.info.features.get_tensor_info(). The download and prepare step is skipped.
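To make the expected folder layout concrete, here is a minimal sketch that builds the data_dir/dataset_name/version path and checks for dataset_info.json. The helper names (metadata_dir, has_metadata) are hypothetical and not part of tfds. Only dataset_info.json is checked, on the assumption that label and vocabulary files vary per dataset.

```python
from pathlib import Path

# The one metadata file checked here; label and vocabulary files are
# dataset-specific, so this sketch does not look for them.
REQUIRED_FILE = "dataset_info.json"


def metadata_dir(data_dir, dataset_name, version):
    """Return data_dir/dataset_name/version as a Path (hypothetical helper)."""
    return Path(data_dir) / dataset_name / version


def has_metadata(data_dir, dataset_name, version):
    """Return True if dataset_info.json exists in the expected folder."""
    return (metadata_dir(data_dir, dataset_name, version) / REQUIRED_FILE).is_file()
```

A check like this can run before entering mock_data, so that a missing metadata folder is reported with a clear message.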

Usage (automated):

import tensorflow_datasets as tfds

with tfds.testing.mock_data(num_examples=5):
  ds = tfds.load('some_dataset', split='train')

  for ex in ds:  # ds will yield randomly generated examples.
    print(ex)

The examples are generated deterministically. The train and test splits yield the same examples.
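The following sketch illustrates what deterministic generation implies. It is not the tfds implementation: it uses the standard-library random module with a fixed seed, and the FEATURES spec and function names are hypothetical stand-ins for the real feature metadata.

```python
import random

# Hypothetical feature spec: name -> (shape, max_value). Not the tfds format.
FEATURES = {"image": ((2, 2), 255), "label": ((), 9)}


def _random_value(rng, shape, max_value):
    """Build a nested list of the given shape with ints in [0, max_value]."""
    if not shape:
        return rng.randint(0, max_value)
    return [_random_value(rng, shape[1:], max_value) for _ in range(shape[0])]


def generate_examples(num_examples, seed=0):
    """Yield num_examples fake examples; a fixed seed makes them repeatable."""
    rng = random.Random(seed)
    for _ in range(num_examples):
        yield {
            name: _random_value(rng, shape, max_value)
            for name, (shape, max_value) in FEATURES.items()
        }


# The generator takes no split argument, so two "splits" see identical data.
train = list(generate_examples(num_examples=5))
test = list(generate_examples(num_examples=5))
assert train == test
```

Because both splits hold the same examples, mocked data is suited to checking that an input pipeline runs, not to measuring how well a model generalizes.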

If you want more fine-grained control over the generated examples, you can manually override the DatasetBuilder._as_dataset method.

Usage (manual):

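import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

num_examples = 5  # Illustrative value; pick whatever your test needs.

def as_dataset(self, *args, **kwargs):
  return tf.data.Dataset.from_generator(
      lambda: ({
          'image': np.ones(shape=(28, 28, 1), dtype=np.uint8),
          'label': i % 10,
      } for i in range(num_examples)),
      # from_generator needs the element dtypes and shapes.
      output_types=self.info.features.dtype,
      output_shapes=self.info.features.shape,
  )

with tfds.testing.mock_data(as_dataset_fn=as_dataset):
  ds = tfds.load('some_dataset', split='train')

  for ex in ds:  # ds will yield the fake data example of 'as_dataset'.
    print(ex)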


  • num_examples: int, the number of fake examples to generate.
  • as_dataset_fn: if provided, replaces the default random example generator. This function mocks FileAdapterBuilder._as_dataset.
  • data_dir: str, the folder from which to load the metadata. Overrides the data_dir kwarg passed to tfds.load.