TFDS command-line interface

The TFDS CLI is a command-line tool that provides various commands for working easily with TensorFlow Datasets.

Disable TF logs on import
%%capture
%env TF_CPP_MIN_LOG_LEVEL=1  # Disable logs on TF import
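
Outside a notebook, the same effect can be sketched in plain Python. This is a minimal sketch, assuming the variable is set before TensorFlow is first imported in the process; a value of 1 filters out INFO-level messages from TensorFlow's C++ backend.

```python
import os

# Must run before the first `import tensorflow` in this process;
# the variable is read when TensorFlow's native library is loaded.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

print(os.environ["TF_CPP_MIN_LOG_LEVEL"])
```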

Installation

The CLI tool is installed with tensorflow-datasets (or tfds-nightly).

pip install -q tfds-nightly
tfds --version
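
To confirm from a script that the install put the `tfds` executable on the PATH, something like the following could be used. This is a hedged sketch: the helper name `tfds_cli_version` is hypothetical, and the format of the returned version string is not assumed.

```python
import shutil
import subprocess


def tfds_cli_version():
    """Return the text printed by `tfds --version`, or None if `tfds` is not on PATH.

    Hypothetical helper for illustration; not part of TFDS.
    """
    exe = shutil.which("tfds")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    # The version may be printed on either stream, so fall back to stderr.
    return (result.stdout or result.stderr).strip()


print(tfds_cli_version())
```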

To list all CLI commands:

tfds --help
usage: tfds [-h] [--helpfull] [--version] {build,new} ...

Tensorflow Datasets CLI tool

optional arguments:
  -h, --help   show this help message and exit
  --helpfull   show full help message and exit
  --version    show program's version number and exit

command:
  {build,new}
    build      Commands for downloading and preparing datasets.
    new        Creates a new dataset directory from the template.

tfds new: Implementing a new dataset

This command helps you get started writing a new Python dataset by creating a <dataset_name>/ directory containing the default implementation files.

Usage:

tfds new my_dataset
Dataset generated at /tmpfs/src/temp/docs/my_dataset
You can start searching `TODO(my_dataset)` to complete the implementation.
Please check https://www.tensorflow.org/datasets/add_dataset for additional details.

This creates:

ls -1 my_dataset/
__init__.py
checksums.tsv
dummy_data/
my_dataset.py
my_dataset_test.py

See our guide to writing datasets for more information.
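
As a quick sanity check after running `tfds new`, the generated directory can be compared against the listing above. This is a minimal sketch: the helper `missing_scaffold_entries` is hypothetical, and it assumes the two module files are named after the dataset (`<dataset_name>.py` and `<dataset_name>_test.py`), as the `my_dataset` listing suggests.

```python
from pathlib import Path


def missing_scaffold_entries(dataset_dir, dataset_name):
    """Return the expected scaffold entries that are absent from dataset_dir.

    Hypothetical helper; the expected names mirror the `ls -1 my_dataset/`
    listing, with the dataset name substituted (an assumption).
    """
    root = Path(dataset_dir)
    expected = [
        "__init__.py",
        "checksums.tsv",
        "dummy_data",
        f"{dataset_name}.py",
        f"{dataset_name}_test.py",
    ]
    return [entry for entry in expected if not (root / entry).exists()]


# An empty list means every expected file or directory was found.
print(missing_scaffold_entries("my_dataset", "my_dataset"))
```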

Available options:

tfds new --help
usage: tfds new [-h] [--helpfull] [--dir DIR] dataset_name

positional arguments:
  dataset_name  Name of the dataset to be created (in snake_case)

optional arguments:
  -h, --help    show this help message and exit
  --helpfull    show full help message and exit
  --dir DIR     Path where the dataset directory will be created. Defaults to
                current directory.
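
When scripting `tfds new`, the positional name and the `--dir` option shown in the help text can be assembled into an argument list. This is a hedged sketch: `tfds_new_argv` is a hypothetical helper, and the snake_case pattern is a conservative approximation chosen for illustration, not necessarily the rule the CLI itself enforces.

```python
import re

# Conservative snake_case approximation (assumption, not the CLI's own check).
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def tfds_new_argv(dataset_name, directory=None):
    """Build the argv list for `tfds new <dataset_name> [--dir DIR]`."""
    if not _SNAKE_CASE.fullmatch(dataset_name):
        raise ValueError(f"dataset name should be snake_case: {dataset_name!r}")
    argv = ["tfds", "new", dataset_name]
    if directory is not None:
        # Without --dir, the help text says the current directory is used.
        argv += ["--dir", str(directory)]
    return argv


print(tfds_new_argv("my_dataset", directory="datasets"))
```

The resulting list can be passed to `subprocess.run` once the `tfds` executable is available.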

tfds build: Download and prepare a dataset

Use tfds build <my_dataset> to generate a new dataset. <my_dataset> can be:

  • A path to a dataset/ folder or dataset.py file (empty for the current directory):

    • tfds build datasets/my_dataset/
    • cd datasets/my_dataset/ && tfds build
    • cd datasets/my_dataset/ && tfds build my_dataset
    • cd datasets/my_dataset/ && tfds build my_dataset.py
  • A registered dataset:

    • tfds build mnist
    • tfds build my_dataset --imports my_project.datasets
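
The forms listed above can also be composed programmatically. The sketch below is illustrative only: `tfds_build_argv` is a hypothetical helper that builds the argument list without running anything, and it joins multiple modules with commas because the help text describes `--imports` as a comma-separated list.

```python
import shlex


def tfds_build_argv(target=None, imports=None):
    """Build the argv list for `tfds build [target] [--imports a,b]`.

    target: a dataset folder, a dataset .py file, or a registered name;
            None means "use the current directory".
    imports: optional iterable of module names that register datasets.
    """
    argv = ["tfds", "build"]
    if target:
        argv.append(str(target))
    if imports:
        argv += ["--imports", ",".join(imports)]
    return argv


for argv in (
    tfds_build_argv("datasets/my_dataset/"),
    tfds_build_argv(),
    tfds_build_argv("mnist"),
    tfds_build_argv("my_dataset", imports=["my_project.datasets"]),
):
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(argv))
```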

Available options:

tfds build --help
usage: tfds build [-h] [--helpfull]
                  [--datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]]
                  [--overwrite]
                  [--max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]]
                  [--data_dir DATA_DIR] [--download_dir DOWNLOAD_DIR]
                  [--extract_dir EXTRACT_DIR] [--manual_dir MANUAL_DIR]
                  [--add_name_to_manual_dir] [--config CONFIG]
                  [--config_idx CONFIG_IDX] [--imports IMPORTS]
                  [--register_checksums] [--force_checksums_validation]
                  [--beam_pipeline_options BEAM_PIPELINE_OPTIONS]
                  [--file_format FILE_FORMAT]
                  [--exclude_datasets EXCLUDE_DATASETS]
                  [--experimental_latest_version]
                  [datasets [datasets ...]]

positional arguments:
  datasets              Name(s) of the dataset(s) to build. Default to current
                        dir. See https://www.tensorflow.org/datasets/cli for
                        accepted values.

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]
                        Datasets can also be provided as keyword argument.

Debug & tests:
  --pdb Enter post-mortem debugging mode if an exception is raised.

  --overwrite           Delete pre-existing dataset if it exists.
  --max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]
                        When set, only generate the first X examples (default
                        to 1), rather than the full dataset.If set to 0, only
                        execute the `_split_generators` (which download the
                        original data), but skip `_generator_examples`

Paths:
  --data_dir DATA_DIR   Where to place datasets. Default to
                        `~/tensorflow_datasets/` or `TFDS_DATA_DIR`
                        environement variable.
  --download_dir DOWNLOAD_DIR
                        Where to place downloads. Default to
                        `<data_dir>/downloads/`.
  --extract_dir EXTRACT_DIR
                        Where to extract files. Default to
                        `<download_dir>/extracted/`.
  --manual_dir MANUAL_DIR
                        Where to manually download data (required for some
                        datasets). Default to `<download_dir>/manual/`.
  --add_name_to_manual_dir
                        If true, append the dataset name to the `manual_dir`
                        (e.g. `<download_dir>/manual/<dataset_name>/`. Useful
                        to avoid collisions if many datasets are generated.

Generation:
  --config CONFIG, -c CONFIG
                        Config name to build. Build all configs if not set.
  --config_idx CONFIG_IDX
                        Config id to build
                        (`builder_cls.BUILDER_CONFIGS[config_idx]`). Mutually
                        exclusive with `--config`.
  --imports IMPORTS, -i IMPORTS
                        Comma separated list of module to import to register
                        datasets.
  --register_checksums  If True, store size and checksum of downloaded files.
  --force_checksums_validation
                        If True, raise an error if the checksums are not
                        found.
  --beam_pipeline_options BEAM_PIPELINE_OPTIONS
                        A (comma-separated) list of flags to pass to
                        `PipelineOptions` when preparing with Apache Beam.
                        (see:
                        https://www.tensorflow.org/datasets/beam_datasets).
                        Example: `--beam_pipeline_options=job_name=my-
                        job,project=my-project`
  --file_format FILE_FORMAT
                        File format to which generate the tf-examples.
                        Available values: ['tfrecord', 'riegeli'] (see
                        `tfds.core.FileFormat`).

Automation:
  Used by automated scripts.

  --exclude_datasets EXCLUDE_DATASETS
                        If set, generate all datasets except the one defined
                        here. Comma separated list of datasets to exclude.
  --experimental_latest_version
                        Build the latest Version(experiments=...) available
                        rather than default version.
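
While iterating on a dataset, several of the flags above are often useful together: `--overwrite` to discard a previous build, `--max_examples_per_split` to generate only a few examples, and optionally `--data_dir` and `--config`. The following is a minimal sketch of such a "smoke build" command; `smoke_build_argv` is a hypothetical helper, and it only assembles the arguments rather than running them.

```python
def smoke_build_argv(dataset, data_dir=None, config=None, max_examples=1):
    """Build the argv list for a quick, truncated `tfds build` run."""
    # The dataset goes first and --max_examples_per_split always gets an
    # explicit value: the flag takes an optional argument, so a bare flag
    # placed right before the dataset name could consume it as its value.
    argv = [
        "tfds", "build", str(dataset),
        "--overwrite",
        "--max_examples_per_split", str(max_examples),
    ]
    if data_dir is not None:
        argv += ["--data_dir", str(data_dir)]
    if config is not None:
        # Per the help text, all configs are built when --config is not set.
        argv += ["--config", str(config)]
    return argv


print(smoke_build_argv("my_dataset", data_dir="/tmp/tfds_smoke"))
```

Per the help text, passing `max_examples=0` would only run `_split_generators` (which downloads the original data) and skip example generation.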