
tfds.testing.mock_data

Mock tfds to generate random data.

@contextlib.contextmanager
tfds.testing.mock_data(
    num_examples: int = 1,
    num_sub_examples: int = 1,
    max_value: Optional[int] = None,
    *,
    policy: MockPolicy = tfds.testing.MockPolicy.AUTO,
    as_dataset_fn: Optional[Callable[..., tf.data.Dataset]] = None,
    data_dir: Optional[str] = None,
    mock_array_record_data_source: Optional[PickableDataSourceMock] = None
) -> Iterator[None]

Usage

Usage (automated):

import tensorflow_datasets as tfds

with tfds.testing.mock_data(num_examples=5):
  ds = tfds.load('some_dataset', split='train')

  for ex in ds:  # ds will yield randomly generated examples.
    ex

All calls to tfds.load/tfds.data_source within the context manager then return deterministic mocked data.

Usage (manual):

For more control over the generated examples, you can manually override the DatasetBuilder._as_dataset method by passing a replacement function as as_dataset_fn:

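import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

num_examples = 5  # Number of fake examples the generator below yields.

def as_dataset(self, *args, **kwargs):
  # `self` is the dataset builder whose `_as_dataset` is being replaced.
  return tf.data.Dataset.from_generator(
      lambda: ({
          'image': np.ones(shape=(28, 28, 1), dtype=np.uint8),
          'label': i % 10,
      } for i in range(num_examples)),
      output_types=self.info.features.dtype,
      output_shapes=self.info.features.shape,
  )

with tfds.testing.mock_data(as_dataset_fn=as_dataset):
  ds = tfds.load('some_dataset', split='train')

  for ex in ds:  # ds will yield the fake data example of 'as_dataset'.
    ex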

Policy

For improved results, you can copy the true metadata files (dataset_info.json, label.txt, vocabulary files) into data_dir/dataset_name/version. This allows the mocked dataset to use the true metadata computed during generation (split names, for example).

If the metadata files are not found, the info from the original class is used, but the features computed during generation are not available (for example, split names are unknown, so any split is accepted).
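
The directory layout described above can be prepared with a few lines of standard-library code. The following is a minimal sketch, not part of TFDS: `copy_metadata` is a hypothetical helper, and the dataset name, version string, and placeholder file contents are illustrative assumptions.

```python
import pathlib
import shutil
import tempfile


def copy_metadata(src_dir, data_dir, dataset_name, version):
    """Hypothetical helper: copy every file in src_dir to data_dir/dataset_name/version."""
    target = pathlib.Path(data_dir) / dataset_name / version
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(pathlib.Path(src_dir).iterdir()):
        if path.is_file():
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return target, copied


# Demo with placeholder files standing in for the real metadata.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    src = root / "real_metadata"
    src.mkdir()
    (src / "dataset_info.json").write_text("{}")
    (src / "label.txt").write_text("cat\ndog\n")

    mock_data_dir = root / "mock_data_dir"
    target, copied = copy_metadata(src, mock_data_dir, "some_dataset", "1.0.0")
    layout_ok = (target / "dataset_info.json").is_file()
    relative_target = target.relative_to(mock_data_dir).as_posix()
```

The resulting directory would then be passed as `data_dir` to `tfds.testing.mock_data`, for instance together with `policy=tfds.testing.MockPolicy.USE_FILES`.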

Miscellaneous

The examples are generated deterministically. The train and test splits will yield the same examples.
The example values are random, with shapes and dtypes taken from builder.info.features.get_tensor_info().
The download and prepare step is always a no-op.
Warning: info.splits['train'].num_examples won't match len(list(ds_train))

Some of those points could be improved. If you have suggestions or run into issues with this function, please open a new issue on our GitHub.

Args
`num_examples`	Number of fake examples to generate.
`num_sub_examples`	Number of examples to generate in nested Dataset features.
`max_value`	The maximum value present in generated tensors. If `max_value` is None or 0, random numbers are generated in the range from 0 to 255.
`policy`	Strategy used to generate the fake examples. See `tfds.testing.MockPolicy`.
`as_dataset_fn`	If provided, replaces the default random example generator. This function mocks `FileAdapterBuilder._as_dataset`.
`data_dir`	Folder containing the metadata files (searched in `data_dir/dataset_name/version`). Overrides the `data_dir` kwarg from `tfds.load`. Used in `MockPolicy.USE_FILES` mode.
`mock_array_record_data_source`	Overrides the mock for the underlying ArrayRecord data source, if one is used.
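
To make the `num_examples` and `max_value` semantics concrete, here is a toy sketch in plain Python. It is not the TFDS implementation: `fake_examples`, its `seed` parameter, and the `values` field are hypothetical, and treating the upper bound as inclusive is an assumption, since the description above does not say.

```python
import random


def fake_examples(num_examples=1, max_value=None, seed=0):
    """Toy stand-in for a mocked example generator (not the TFDS code)."""
    # None or 0 falls back to the 0..255 range described for `max_value`.
    upper = max_value or 255
    # A fixed seed makes every call return the same data, which mirrors
    # the note that all splits yield the same examples.
    rng = random.Random(seed)
    return [
        {"values": [rng.randint(0, upper) for _ in range(4)]}
        for _ in range(num_examples)
    ]


train = fake_examples(num_examples=5)
test = fake_examples(num_examples=5)
small = fake_examples(num_examples=3, max_value=9)
```

Here `train` and `test` are identical lists of five examples, and every value in `small` stays at or below 9.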

Yields
None
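
Because the context manager yields `None`, there is nothing useful to bind with `as`; its whole effect is the patching that is active inside the `with` block. The toy sketch below illustrates that pattern with the standard library only. The `loader` namespace and `mock_loader` function are hypothetical stand-ins, not how TFDS implements the mock.

```python
import contextlib
import types
from unittest import mock

# Hypothetical stand-in for a module that exposes a `load` function.
loader = types.SimpleNamespace(load=lambda name: f"real:{name}")


@contextlib.contextmanager
def mock_loader():
    """Temporarily replace loader.load; yields None."""
    with mock.patch.object(loader, "load", lambda name: f"mock:{name}"):
        yield


before = loader.load("some_dataset")
with mock_loader() as handle:
    inside = loader.load("some_dataset")  # Patched call.
after = loader.load("some_dataset")  # Original behavior is restored.
```

Only the call made inside the block sees the mocked function, and `handle` is `None`.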
