TFDS CLI

TFDS CLI adalah alat baris perintah yang menyediakan berbagai perintah untuk bekerja dengan mudah dengan Kumpulan Data TensorFlow.

Disable TF logs on import
%%capture
%env TF_CPP_MIN_LOG_LEVEL=1  # Disable logs on TF import
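If you run the same steps as a plain Python script instead of a notebook, you can set the same variable from Python. It must be set before TensorFlow is first imported. This is a minimal sketch. The value `1` hides TensorFlow's C++ INFO messages but still shows warnings and errors.

```python
import os

# Must be set before TensorFlow (or tensorflow_datasets) is first imported,
# because the C++ logging level is read when the TensorFlow runtime loads.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"  # 1 = hide INFO, keep WARNING/ERROR

# import tensorflow_datasets as tfds  # import only after setting the variable
```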

Installation

The CLI tool is installed with tensorflow-datasets (or tfds-nightly).

pip install -q tfds-nightly
tfds --version

To list all CLI commands:

tfds --help
usage: tfds [-h] [--helpfull] [--version] {build,new} ...

Tensorflow Datasets CLI tool

optional arguments:
  -h, --help   show this help message and exit
  --helpfull   show full help message and exit
  --version    show program's version number and exit

command:
  {build,new}
    build      Commands for downloading and preparing datasets.
    new        Creates a new dataset directory from the template.

tfds new: Implement a new dataset

This command helps you get started writing your new Python dataset. It creates a <dataset_name>/ folder containing default implementation files.

Usage:

tfds new my_dataset
Dataset generated at /tmpfs/src/temp/docs/my_dataset
You can start searching `TODO(my_dataset)` to complete the implementation.
Please check https://www.tensorflow.org/datasets/add_dataset for additional details.

This will create:

ls -1 my_dataset/
__init__.py
checksums.tsv
dummy_data/
my_dataset.py
my_dataset_test.py
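To confirm that scaffolding succeeded, a script could check for the entries listed above. This is a minimal sketch. `check_scaffold` and `expected_entries` are hypothetical helpers. The expected names are copied from this particular listing and may differ between TFDS versions. The sketch assumes that the file names follow the dataset (folder) name.

```python
from __future__ import annotations

from pathlib import Path


def expected_entries(dataset_name: str) -> list[str]:
    """Entries shown by `ls -1 my_dataset/`; a trailing "/" marks a directory."""
    return [
        "__init__.py",
        "checksums.tsv",
        "dummy_data/",
        f"{dataset_name}.py",
        f"{dataset_name}_test.py",
    ]


def check_scaffold(dataset_dir: str) -> list[str]:
    """Return the expected entries that are missing from `dataset_dir`."""
    root = Path(dataset_dir)
    missing = []
    for entry in expected_entries(root.name):
        path = root / entry.rstrip("/")
        # Directories must be directories, everything else must be a file.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

For example, `check_scaffold("my_dataset")` should return an empty list right after `tfds new my_dataset`.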

See our guide to writing datasets for more information.

Available options:

tfds new --help
usage: tfds new [-h] [--helpfull] [--dir DIR] dataset_name

positional arguments:
  dataset_name  Name of the dataset to be created (in snake_case)

optional arguments:
  -h, --help    show this help message and exit
  --helpfull    show full help message and exit
  --dir DIR     Path where the dataset directory will be created. Defaults to
                current directory.
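The name must be in snake_case, so a wrapper script might validate it before calling the CLI. This is a minimal sketch. The regular expression is an assumption about what counts as snake_case: lowercase letters, digits, and single underscores, starting with a letter. It is not TFDS's own validation rule. `new_dataset_command` and `run_tfds_new` are hypothetical helpers. They use only the `dataset_name` argument and the `--dir` option shown above.

```python
import re
import shutil
import subprocess
from typing import List, Optional

# Assumed definition of snake_case; TFDS itself may accept or reject other names.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def new_dataset_command(dataset_name: str, directory: Optional[str] = None) -> List[str]:
    """Build the argv for `tfds new`, rejecting names that are not snake_case."""
    if not _SNAKE_CASE.fullmatch(dataset_name):
        raise ValueError(f"dataset name must be snake_case, got {dataset_name!r}")
    cmd = ["tfds", "new", dataset_name]
    if directory is not None:
        cmd += ["--dir", directory]  # without --dir, the current directory is used
    return cmd


def run_tfds_new(dataset_name: str, directory: Optional[str] = None) -> None:
    """Run `tfds new`, failing early if the CLI is not on PATH."""
    if shutil.which("tfds") is None:
        raise RuntimeError("tfds CLI not found; install tensorflow-datasets or tfds-nightly")
    subprocess.run(new_dataset_command(dataset_name, directory), check=True)
```

With the CLI installed, `run_tfds_new("my_dataset", directory="datasets")` should create the dataset folder inside `datasets/`.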

tfds build: Download and prepare a dataset

Use tfds build <my_dataset> to generate a new dataset. <my_dataset> can be:

  • A path to a dataset/ folder or a dataset.py file (empty for the current directory):

    • tfds build datasets/my_dataset/
    • cd datasets/my_dataset/ && tfds build
    • cd datasets/my_dataset/ && tfds build my_dataset
    • cd datasets/my_dataset/ && tfds build my_dataset.py
  • A registered dataset:

    • tfds build mnist
    • tfds build my_dataset --imports my_project.datasets
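These rules work like a small resolver. It decides whether the argument points to a local dataset script or names a registered dataset. This is a simplified, hypothetical sketch that reproduces the examples listed above. It is not TFDS's actual lookup code, which may handle more cases. It assumes that the script inside a dataset folder is named after the folder, as `tfds new` creates it (`my_dataset/my_dataset.py`).

```python
from pathlib import Path
from typing import Tuple


def resolve_build_target(arg: str = "", cwd: str = ".") -> Tuple[str, str]:
    """Return ("script", path) for a local dataset, or ("registered", name)."""
    base = Path(cwd)
    # An empty argument means the current directory.
    path = base / arg if arg else base
    # A dataset.py file, e.g. `tfds build my_dataset.py`.
    if path.suffix == ".py" and path.is_file():
        return "script", str(path)
    # A dataset/ folder, e.g. `tfds build datasets/my_dataset/` or a bare `tfds build`.
    if path.is_dir():
        script = path / f"{path.resolve().name}.py"
        if script.is_file():
            return "script", str(script)
    # A name matching a script in the current directory, e.g. `tfds build my_dataset`.
    script = base / f"{arg}.py"
    if arg and script.is_file():
        return "script", str(script)
    # Anything else is treated as a registered dataset, e.g. `tfds build mnist`.
    return "registered", arg
```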

Available options:

tfds build --help
usage: tfds build [-h] [--helpfull]
                  [--datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]]
                  [--overwrite]
                  [--max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]]
                  [--data_dir DATA_DIR] [--download_dir DOWNLOAD_DIR]
                  [--extract_dir EXTRACT_DIR] [--manual_dir MANUAL_DIR]
                  [--add_name_to_manual_dir] [--config CONFIG]
                  [--config_idx CONFIG_IDX] [--imports IMPORTS]
                  [--register_checksums] [--force_checksums_validation]
                  [--beam_pipeline_options BEAM_PIPELINE_OPTIONS]
                  [--file_format FILE_FORMAT]
                  [--exclude_datasets EXCLUDE_DATASETS]
                  [--experimental_latest_version]
                  [datasets [datasets ...]]

positional arguments:
  datasets              Name(s) of the dataset(s) to build. Default to current
                        dir. See https://www.tensorflow.org/datasets/cli for
                        accepted values.

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]
                        Datasets can also be provided as keyword argument.

Debug & tests:
  --pdb                 Enter post-mortem debugging mode if an exception is
                        raised.
  --overwrite           Delete pre-existing dataset if it exists.
  --max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]
                        When set, only generate the first X examples (default
                        to 1), rather than the full dataset.If set to 0, only
                        execute the `_split_generators` (which download the
                        original data), but skip `_generator_examples`

Paths:
  --data_dir DATA_DIR   Where to place datasets. Default to
                        `~/tensorflow_datasets/` or `TFDS_DATA_DIR`
                        environement variable.
  --download_dir DOWNLOAD_DIR
                        Where to place downloads. Default to
                        `<data_dir>/downloads/`.
  --extract_dir EXTRACT_DIR
                        Where to extract files. Default to
                        `<download_dir>/extracted/`.
  --manual_dir MANUAL_DIR
                        Where to manually download data (required for some
                        datasets). Default to `<download_dir>/manual/`.
  --add_name_to_manual_dir
                        If true, append the dataset name to the `manual_dir`
                        (e.g. `<download_dir>/manual/<dataset_name>/`. Useful
                        to avoid collisions if many datasets are generated.

Generation:
  --config CONFIG, -c CONFIG
                        Config name to build. Build all configs if not set.
  --config_idx CONFIG_IDX
                        Config id to build
                        (`builder_cls.BUILDER_CONFIGS[config_idx]`). Mutually
                        exclusive with `--config`.
  --imports IMPORTS, -i IMPORTS
                        Comma separated list of module to import to register
                        datasets.
  --register_checksums  If True, store size and checksum of downloaded files.
  --force_checksums_validation
                        If True, raise an error if the checksums are not
                        found.
  --beam_pipeline_options BEAM_PIPELINE_OPTIONS
                        A (comma-separated) list of flags to pass to
                        `PipelineOptions` when preparing with Apache Beam.
                        (see:
                        https://www.tensorflow.org/datasets/beam_datasets).
                        Example: `--beam_pipeline_options=job_name=my-
                        job,project=my-project`
  --file_format FILE_FORMAT
                        File format to which generate the tf-examples.
                        Available values: ['tfrecord', 'riegeli'] (see
                        `tfds.core.FileFormat`).

Automation:
  Used by automated scripts.

  --exclude_datasets EXCLUDE_DATASETS
                        If set, generate all datasets except the one defined
                        here. Comma separated list of datasets to exclude.
  --experimental_latest_version
                        Build the latest Version(experiments=...) available
                        rather than default version.
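The defaults in the Paths group form a chain. Unless a directory is overridden, it is derived from the directory before it. This minimal sketch takes the help text as its specification. `resolve_paths` is a hypothetical helper. Giving `TFDS_DATA_DIR` precedence over `~/tensorflow_datasets/` is an assumption based on the wording of `--data_dir`.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def resolve_paths(
    dataset_name: str,
    data_dir: Optional[str] = None,
    download_dir: Optional[str] = None,
    extract_dir: Optional[str] = None,
    manual_dir: Optional[str] = None,
    add_name_to_manual_dir: bool = False,
) -> Dict[str, Path]:
    """Apply the default-directory rules described in the Paths group."""
    if data_dir is None:
        # Assumed precedence: TFDS_DATA_DIR if set, else the home-directory default.
        data_dir = os.environ.get("TFDS_DATA_DIR", "~/tensorflow_datasets/")
    data = Path(data_dir).expanduser()
    download = Path(download_dir).expanduser() if download_dir else data / "downloads"
    extract = Path(extract_dir).expanduser() if extract_dir else download / "extracted"
    manual = Path(manual_dir).expanduser() if manual_dir else download / "manual"
    if add_name_to_manual_dir:
        manual = manual / dataset_name  # avoids collisions between datasets
    return {
        "data_dir": data,
        "download_dir": download,
        "extract_dir": extract,
        "manual_dir": manual,
    }
```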
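Part of the usage text above follows Python `argparse` notation. `[--max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]]` is an option whose value is optional, and `--config` and `--config_idx` are mutually exclusive. The sketch below reproduces only those behaviours, plus the comma-separated `--imports` list, with `argparse`. It is an illustrative reconstruction, not TFDS's source code, and it models only a few of the options.

```python
import argparse
from typing import List


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tfds build")
    parser.add_argument("datasets", nargs="*")
    # Flag alone -> 1 example per split; explicit value (e.g. 0) -> that value;
    # flag absent -> None, i.e. the full dataset is generated.
    parser.add_argument("--max_examples_per_split", type=int, nargs="?", const=1)
    # Only one way of selecting a config may be given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--config", "-c")
    group.add_argument("--config_idx", type=int)
    # Comma-separated module list, split into a Python list.
    parser.add_argument("--imports", "-i", type=lambda s: [m for m in s.split(",") if m])
    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    return make_parser().parse_args(argv)
```

In this sketch, a bare `--max_examples_per_split` must not be followed directly by a dataset name, because `argparse` would try to read that name as the option's integer value.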