Join us at TensorFlow World, Oct 28-31. Use code TF20 for 20% off select passes. Register now

tfds.testing.mock_data

Mock tfds to generate random data.

tfds.testing.mock_data(
    *args,
    **kwds
)

This function requires the true metadata files (dataset_info.json, label.txt, vocabulary files) to be stored in data_dir/dataset_name/version, as they would be for the true dataset. The actual examples will be randomly generated using builder.info.features.get_tensor_info(). Download and prepare step will be skipped.

Usage (automated):

with mock_data(num_examples=5):
  ds = tfds.load('some_dataset', split='train')

  for ex in ds:  # ds will yield randomly generated examples.
    ex

If you want more fine grain control over the generated examples, you can manually overwrite the DatasetBuilder._as_dataset method. Usage (manual):

def as_dataset(self, *args, **kwargs):
  return tf.data.Dataset.from_generator(
      lambda: ({
          'image': np.ones(shape=(28, 28, 1), dtype=np.uint8),
          'label': i % 10,
      } for i in range(num_examples)),
      output_types=self.info.features.dtype,
      output_shapes=self.info.features.shape,
  )

with mock_data(as_dataset_fn=as_dataset):
  ds = tfds.load('some_dataset', split='train')

  for ex in ds:  # ds will yield the fake data example of 'as_dataset'.
    ex

Args:

  • num_examples: int, the number of fake example to generate.
  • as_dataset_fn: if provided, will replace the default random example generator. This function mock the FileAdapterBuilder._as_dataset
  • data_dir: str, data_dir folder from where to load the metadata. Will overwrite data_dir kwargs from tfds.load.

Yields:

None