TFDS CLI is a command-line tool that provides various commands to easily work with TensorFlow Datasets.
Disable TF logs on import:

```
%%capture
%env TF_CPP_MIN_LOG_LEVEL=1  # Disable logs on TF import
```
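Outside a notebook, the same effect can be had with plain Python. A minimal sketch, equivalent to the `%env` magic: the variable must be set before TensorFlow is imported for the log filtering to take effect.

```python
import os

# Set before importing TensorFlow; 1 filters out INFO-level C++ logs.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"
```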
## Installation
The CLI tool is installed with `tensorflow-datasets` (or `tfds-nightly`):

```sh
pip install -q tfds-nightly
tfds --version
```
For the list of all CLI commands:

```sh
tfds --help
```
```
usage: tfds [-h] [--helpfull] [--version] {build,new} ...

Tensorflow Datasets CLI tool

optional arguments:
  -h, --help   show this help message and exit
  --helpfull   show full help message and exit
  --version    show program's version number and exit

command:
  {build,new}
    build      Commands for downloading and preparing datasets.
    new        Creates a new dataset directory from the template.
```
## `tfds new`: Implementing a new Dataset

This command will help you kickstart writing your new Python dataset by creating a `<dataset_name>/` directory containing default implementation files.
Usage:

```sh
tfds new my_dataset
```
`tfds new my_dataset` will create:

```sh
ls -1 my_dataset/
```

```
CITATIONS.bib
README.md
my_dataset.py
my_dataset_test.py
```
An optional `--data_format` flag can be used to generate format-specific dataset builders (e.g., `conll`). If no data format is given, it will generate a template for a standard <a href="https://www.tensorflow.org/datasets/api_docs/python/tfds/core/GeneratorBasedBuilder"><code>tfds.core.GeneratorBasedBuilder</code></a>. Refer to the documentation for details on the available format-specific dataset builders.

See our writing dataset guide for more info.
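The heart of the generated `my_dataset.py` template is a `_generate_examples` method, which must yield unique `(key, example)` pairs. A dependency-free sketch of that contract; the function name, input, and feature fields here are illustrative, not taken from the TFDS template:

```python
# Illustrative sketch of the (key, example) contract a TFDS
# `_generate_examples` method follows; plain Python, no TFDS required.
def generate_examples(lines):
    for i, line in enumerate(lines):
        text = line.strip()
        # Keys must be unique and deterministic across runs.
        yield i, {"text": text, "length": len(text)}

examples = dict(generate_examples(["hello world\n", "tfds cli\n"]))
```

In the real builder, TFDS consumes these pairs to write shuffled, sharded records; the key is used for deterministic shuffling.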
Available options:

```sh
tfds new --help
```
```
usage: tfds new [-h] [--helpfull] [--data_format {standard,conll,conllu}]
                [--dir DIR]
                dataset_name

positional arguments:
  dataset_name          Name of the dataset to be created (in snake_case)

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --data_format {standard,conll,conllu}
                        Optional format of the input data, which is used to
                        generate a format-specific template.
  --dir DIR             Path where the dataset directory will be created.
                        Defaults to current directory.
```
## `tfds build`: Download and prepare a dataset

Use `tfds build <my_dataset>` to generate a new dataset. `<my_dataset>` can be:
* A path to a `dataset/` folder or `dataset.py` file (empty for the current directory):

  ```sh
  tfds build datasets/my_dataset/
  cd datasets/my_dataset/ && tfds build
  cd datasets/my_dataset/ && tfds build my_dataset
  cd datasets/my_dataset/ && tfds build my_dataset.py
  ```
* A registered dataset:

  ```sh
  tfds build mnist
  tfds build my_dataset --imports my_project.datasets
  ```
Available options:

```sh
tfds build --help
```
```
usage: tfds build [-h] [--helpfull]
                  [--datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]]
                  [--overwrite] [--fail_if_exists]
                  [--max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]]
                  [--data_dir DATA_DIR] [--download_dir DOWNLOAD_DIR]
                  [--extract_dir EXTRACT_DIR] [--manual_dir MANUAL_DIR]
                  [--add_name_to_manual_dir] [--download_only]
                  [--config CONFIG] [--config_idx CONFIG_IDX]
                  [--imports IMPORTS] [--register_checksums]
                  [--force_checksums_validation]
                  [--beam_pipeline_options BEAM_PIPELINE_OPTIONS]
                  [--file_format FILE_FORMAT] [--publish_dir PUBLISH_DIR]
                  [--skip_if_published] [--exclude_datasets EXCLUDE_DATASETS]
                  [--experimental_latest_version]
                  [datasets ...]

positional arguments:
  datasets              Name(s) of the dataset(s) to build. Default to current
                        dir. See https://www.tensorflow.org/datasets/cli for
                        accepted values.

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]
                        Datasets can also be provided as keyword argument.

Debug & tests:
  --pdb                 Enter post-mortem debugging mode if an exception is
                        raised.
  --overwrite           Delete pre-existing dataset if it exists.
  --fail_if_exists      Fails the program if there is a pre-existing dataset.
  --max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]
                        When set, only generate the first X examples (default
                        to 1), rather than the full dataset. If set to 0, only
                        execute the `_split_generators` (which download the
                        original data), but skip `_generator_examples`.

Paths:
  --data_dir DATA_DIR   Where to place datasets. Default to
                        `~/tensorflow_datasets/` or `TFDS_DATA_DIR`
                        environment variable.
  --download_dir DOWNLOAD_DIR
                        Where to place downloads. Default to
                        `<data_dir>/downloads/`.
  --extract_dir EXTRACT_DIR
                        Where to extract files. Default to
                        `<download_dir>/extracted/`.
  --manual_dir MANUAL_DIR
                        Where to manually download data (required for some
                        datasets). Default to `<download_dir>/manual/`.
  --add_name_to_manual_dir
                        If true, append the dataset name to the `manual_dir`
                        (e.g. `<download_dir>/manual/<dataset_name>/`). Useful
                        to avoid collisions if many datasets are generated.

Generation:
  --download_only       If True, download all files but do not prepare the
                        dataset. Uses the checksum.tsv to find out what to
                        download. Therefore, this does not work in combination
                        with --register_checksums.
  --config CONFIG, -c CONFIG
                        Config name to build. Build all configs if not set.
                        Can also be a json of the kwargs forwarded to the
                        config `__init__` (for custom configs).
  --config_idx CONFIG_IDX
                        Config id to build
                        (`builder_cls.BUILDER_CONFIGS[config_idx]`). Mutually
                        exclusive with `--config`.
  --imports IMPORTS, -i IMPORTS
                        Comma separated list of module to import to register
                        datasets.
  --register_checksums  If True, store size and checksum of downloaded files.
  --force_checksums_validation
                        If True, raise an error if the checksums are not
                        found.
  --beam_pipeline_options BEAM_PIPELINE_OPTIONS
                        A (comma-separated) list of flags to pass to
                        `PipelineOptions` when preparing with Apache Beam
                        (see: https://www.tensorflow.org/datasets/beam_datasets).
                        Example:
                        `--beam_pipeline_options=job_name=my-job,project=my-project`
  --file_format FILE_FORMAT
                        File format to which generate the tf-examples.
                        Available values: ['tfrecord', 'riegeli'] (see
                        `tfds.core.FileFormat`).

Publishing:
  Options for publishing successfully created datasets.

  --publish_dir PUBLISH_DIR
                        Where to optionally publish the dataset after it has
                        been generated successfully. Should be the root data
                        dir under which datasets are stored. If unspecified,
                        dataset will not be published.
  --skip_if_published   If the dataset with the same version and config is
                        already published, then it will not be regenerated.

Automation:
  Used by automated scripts.

  --exclude_datasets EXCLUDE_DATASETS
                        If set, generate all datasets except the one defined
                        here. Comma separated list of datasets to exclude.
  --experimental_latest_version
                        Build the latest Version(experiments=...) available
                        rather than default version.
```
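Note that `--beam_pipeline_options` takes a single comma-separated string, as in the `job_name=my-job,project=my-project` example above. A hypothetical sketch (not TFDS's actual parser) of how such a value splits into the `--key=value` flags that Beam's `PipelineOptions` consumes:

```python
# Hypothetical helper, for illustration only: split a comma-separated
# `--beam_pipeline_options` value into individual `--key=value` flags.
def to_flags(opts: str) -> list:
    return ["--" + kv for kv in opts.split(",") if kv]

flags = to_flags("job_name=my-job,project=my-project")
```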