Generic text translation dataset created from a manually prepared directory.
Inherits From: DatasetBuilder
tfds.folder_dataset.TranslateFolder(
root_dir: str
)
The directory content should be as follows:
path/to/my_data/
lang1.train.txt
lang2.train.txt
lang1.test.txt
lang2.test.txt
...
Each file should have one example per line. Line order should match between files.
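As an illustration of the layout described above, the following minimal sketch writes a tiny parallel corpus to a temporary directory and reads it back as aligned pairs. The helper names `write_parallel_split` and `read_aligned_pairs` are hypothetical and are not part of TFDS; the sentences are made-up sample data.

```python
import tempfile
from pathlib import Path


def write_parallel_split(root, split, sentences_by_lang):
    """Write one `<lang>.<split>.txt` file per language, one example per line."""
    lengths = {len(lines) for lines in sentences_by_lang.values()}
    if len(lengths) != 1:
        raise ValueError("every language needs the same number of lines")
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for lang, lines in sentences_by_lang.items():
        path = root / f"{lang}.{split}.txt"
        path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")


def read_aligned_pairs(root, split, lang_a, lang_b):
    """Pair line i of one language file with line i of the other."""
    root = Path(root)
    a = (root / f"{lang_a}.{split}.txt").read_text(encoding="utf-8").splitlines()
    b = (root / f"{lang_b}.{split}.txt").read_text(encoding="utf-8").splitlines()
    if len(a) != len(b):
        raise ValueError(f"line count mismatch in split {split!r}")
    return list(zip(a, b))


# Hypothetical sample data, written to a throwaway directory.
demo_root = Path(tempfile.mkdtemp()) / "my_data"
write_parallel_split(
    demo_root, "train",
    {"lang1": ["hello", "thank you"], "lang2": ["bonjour", "merci"]},
)
write_parallel_split(
    demo_root, "test",
    {"lang1": ["good night"], "lang2": ["bonne nuit"]},
)
train_pairs = read_aligned_pairs(demo_root, "train", "lang1", "lang2")
```

Checking line counts before building the dataset catches misaligned files early, since a missing line in one file would otherwise shift every later pair.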
To use it:
import tensorflow_datasets as tfds
builder = tfds.TranslateFolder(root_dir='path/to/my_data/')
print(builder.info) # Splits, num examples,... are automatically calculated
ds = builder.as_dataset(split='train', shuffle_files=True)
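To give a rough idea of what "automatically calculated" could mean for splits and example counts, here is a standalone sketch that derives them from the `<lang>.<split>.txt` file names. This is not the TFDS implementation; `summarize_folder` is a hypothetical helper, and the three-part file name pattern is assumed from the layout shown earlier.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def summarize_folder(root_dir):
    """Return {split: {lang: num_lines}} for files named `<lang>.<split>.txt`."""
    summary = defaultdict(dict)
    for path in sorted(Path(root_dir).glob("*.txt")):
        parts = path.name.split(".")
        if len(parts) != 3:
            continue  # Skip files that do not follow the assumed pattern.
        lang, split, _ext = parts
        with path.open(encoding="utf-8") as f:
            summary[split][lang] = sum(1 for _ in f)
    return {split: dict(langs) for split, langs in summary.items()}


# Hypothetical sample files in a throwaway directory.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "lang1.train.txt").write_text("a\nb\nc\n", encoding="utf-8")
(demo_root / "lang2.train.txt").write_text("x\ny\nz\n", encoding="utf-8")
(demo_root / "lang1.test.txt").write_text("d\n", encoding="utf-8")
(demo_root / "lang2.test.txt").write_text("w\n", encoding="utf-8")

folder_summary = summarize_folder(demo_root)
```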
Args | |
---|---|
data_dir | Directory to read/write data. Defaults to the value of the environment variable TFDS_DATA_DIR if set; otherwise falls back to "~/tensorflow_datasets". |
config | tfds.core.BuilderConfig or str name; optional configuration for the dataset that affects the data generated on disk. Different builder_configs will have their own subdirectories and versions. |
version | Optional version at which to load the dataset. An error is raised if the specified version cannot be satisfied, e.g. '1.2.3', '1.2.*'. The special value "experimental_latest" will use the highest version, even if not default. This is not recommended unless you know what you are doing, as the version could be broken. |
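The two rules described in the table (the data_dir fallback order and wildcard version requests such as '1.2.*') can be sketched in plain Python as follows. Both functions are hypothetical illustrations, not TFDS internals. In particular, treating an empty TFDS_DATA_DIR as unset and matching versions component by component are assumptions.

```python
import os


def resolve_data_dir(data_dir=None, environ=None):
    """Pick the data directory: explicit argument, then env var, then default."""
    environ = os.environ if environ is None else environ
    if data_dir:
        return os.path.expanduser(data_dir)
    # Assumption: an empty TFDS_DATA_DIR is treated the same as an unset one.
    return os.path.expanduser(
        environ.get("TFDS_DATA_DIR") or "~/tensorflow_datasets"
    )


def version_matches(requested, available):
    """Return True if `available` satisfies `requested`, where '*' is a wildcard."""
    req = requested.split(".")
    avail = available.split(".")
    if len(req) != len(avail):
        return False
    return all(r == "*" or r == a for r, a in zip(req, avail))


# Passing an explicit mapping keeps the demo independent of the real environment.
from_env = resolve_data_dir(environ={"TFDS_DATA_DIR": "/tmp/tfds"})
fallback = resolve_data_dir(environ={})
wildcard_ok = version_matches("1.2.*", "1.2.3")
wildcard_miss = version_matches("1.2.*", "1.3.0")
```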
Attributes | |
---|---|
builder_config | tfds.core.BuilderConfig for this builder. |
canonical_version | |
data_dir | |
data_path | |
info | tfds.core.DatasetInfo for this builder. |
release_notes | |
supported_versions | |
version | |
versions | Versions (canonical + available), in preference order. |
Methods
as_dataset
as_dataset(
split=None, *, batch_size=None, shuffle_files=False, decoders=None,
read_config=None, as_supervised=False
)
Constructs a tf.data.Dataset.
Callers must pass arguments as keyword arguments.
The output types vary depending on the parameters. Examples:
import tensorflow as tf
import tensorflow_datasets as tfds
builder = tfds.builder('imdb_reviews')
builder.download_and_prepare()
# Default parameters: Returns the dict of tf.data.Dataset
ds_all_dict = builder.as_dataset()
assert isinstance(ds_all_dict, dict)
print(ds_all_dict.keys()) # ==> ['test', 'train', 'unsupervised']
assert isinstance(ds_all_dict['test'], tf.data.Dataset)
# Each dataset (test, train, unsup.) consists of dictionaries
# {'label': <tf.Tensor: .. dtype=int64, numpy=1>,
# 'text': <tf.Tensor: .. dtype=string, numpy=b"I've watched the movie ..">}
# {'label': <tf.Tensor: .. dtype=int64, numpy=1>,
# 'text': <tf.Tensor: .. dtype=string, numpy=b'If you love Japanese ..'>}
# With as_supervised: tf.data.Dataset only contains (feature, label) tuples
ds_all_supervised = builder.as_dataset(as_supervised=True)
assert isinstance(ds_all_supervised, dict)
print(ds_all_supervised.keys()) # ==> ['test', 'train', 'unsupervised']
assert isinstance(ds_all_supervised['test'], tf.data.Dataset)
# Each dataset (test, train, unsup.) consists of tuples (text, label)
# (<tf.Tensor: ... dtype=string, numpy=b"I've watched the movie ..">,
# <tf.Tensor: ... dtype=int64, numpy=1>)
# (<tf.Tensor: ... dtype=string, numpy=b"If you love Japanese ..">,
# <tf.Tensor: ... dtype=int64, numpy=1>)
# Same as above plus requesting a particular split
ds_test_supervised = builder.as_dataset(as_supervised=True, split='test')
assert isinstance(ds_test_supervised, tf.data.Dataset)
# The dataset consists of tuples (text, label)
# (<tf.Tensor: ... dtype=string, numpy=b"I've watched the movie ..">,
# <tf.Tensor: ... dtype=int64, numpy=1>)
# (<tf.Tensor: ... dtype=string, numpy=b"If you love Japanese ..">,
# <tf.Tensor: ... dtype=int64, numpy=1>)
Args | |
---|---|
split | Which split of the data to load (e.g. 'train', 'test', ['train', 'test'], 'train[80%:]', ...). See our split API guide. If None, will return all splits in a Dict[Split, tf.data.Dataset]. |
batch_size | int, batch size. Note that variable-length features will be 0-padded if batch_size is set. Users that want more custom behavior should use batch_size=None and use the tf.data API to construct a custom pipeline. If batch_size == -1, will return feature dictionaries of the whole dataset with tf.Tensors instead of a tf.data.Dataset. |
shuffle_files | bool, whether to shuffle the input files. Defaults to False. |
decoders | Nested dict of Decoder objects that allow customizing the decoding. The structure should match the feature structure, but only customized feature keys need to be present. See the guide for more info. |
read_config | tfds.ReadConfig, additional options to configure the input pipeline (e.g. seed, num parallel reads, ...). |
as_supervised | bool. If True, the returned tf.data.Dataset will have a 2-tuple structure (input, label) according to builder.info.supervised_keys. If False (the default), the returned tf.data.Dataset will have a dictionary with all the features. |
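The note that variable-length features are 0-padded when batch_size is set can be pictured with a small plain-Python sketch. This only illustrates the padding idea under the assumption that each batch is padded to its own longest element; it does not use tf.data, and `padded_batches` is a hypothetical helper.

```python
def padded_batches(sequences, batch_size, pad_value=0):
    """Yield batches in which every sequence is right-padded to the batch maximum."""
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        width = max(len(seq) for seq in chunk)
        yield [list(seq) + [pad_value] * (width - len(seq)) for seq in chunk]


# Hypothetical token id sequences of different lengths.
token_ids = [[5, 7, 9], [3], [4, 4], [8, 1, 2, 6]]
batches = list(padded_batches(token_ids, batch_size=2))
# batches[0] == [[5, 7, 9], [3, 0, 0]]
# batches[1] == [[4, 4, 0, 0], [8, 1, 2, 6]]
```

If 0 is a meaningful value in the data, this kind of padding is ambiguous, which is one reason to use batch_size=None and build a custom pipeline instead.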
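Returns | |
---|---|
tf.data.Dataset, or if split=None, dict<key: tfds.Split, value: tf.data.Dataset>. If batch_size is -1, returns feature dictionaries of the whole dataset with tf.Tensors instead of a tf.data.Dataset. |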
download_and_prepare
download_and_prepare(
**kwargs
)
Downloads and prepares dataset for reading.
Args | |
---|---|
download_dir | str, directory where downloaded files are stored. Defaults to "~/tensorflow-datasets/downloads". |
download_config | tfds.download.DownloadConfig, further configuration for downloading and preparing dataset. |
Raises | |
---|---|
IOError | If there is not enough disk space available. |
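A caller who wants to fail early can run a similar disk space check before preparing a dataset. The sketch below uses only the standard library. `ensure_free_space` and the byte threshold are hypothetical, and this is not how TFDS itself performs the check.

```python
import shutil
import tempfile


def ensure_free_space(path, required_bytes):
    """Raise IOError if the filesystem holding `path` has too little free space."""
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        raise IOError(
            f"Not enough disk space: need {required_bytes} bytes, "
            f"only {free} bytes free at {path!r}"
        )
    return free


# A requirement of 0 bytes always passes; a real caller would pass an estimate
# of the prepared dataset size.
free_bytes = ensure_free_space(tempfile.gettempdir(), required_bytes=0)
```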
Class Variables | |
---|---|
BUILDER_CONFIGS | |
MANUAL_DOWNLOAD_INSTRUCTIONS | None |
RELEASE_NOTES | |
SUPPORTED_VERSIONS | |
VERSION | tfds.core.Version |
builder_configs | |
code_path | |
name | 'translate_folder' |
url_infos | None |