The output types vary depending on the parameters. Examples:
import tensorflow as tf
import tensorflow_datasets as tfds

builder = tfds.builder('imdb_reviews')
builder.download_and_prepare()
# Default parameters: returns a dict of tf.data.Dataset objects
ds_all_dict = builder.as_dataset()
assert isinstance(ds_all_dict, dict)
print(list(ds_all_dict.keys()))  # ==> ['test', 'train', 'unsupervised']
assert isinstance(ds_all_dict['test'], tf.data.Dataset)
# Each dataset (test, train, unsup.) consists of dictionaries
# {'label': <tf.Tensor: .. dtype=int64, numpy=1>,
# 'text': <tf.Tensor: .. dtype=string, numpy=b"I've watched the movie ..">}
# {'label': <tf.Tensor: .. dtype=int64, numpy=1>,
# 'text': <tf.Tensor: .. dtype=string, numpy=b'If you love Japanese ..'>}
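# A quick way to inspect a single dict-structured example in eager mode
# (next(iter(...)) is just for illustration, not a recommended input pipeline):
example = next(iter(ds_all_dict['train']))
print(example['label'].numpy(), example['text'].numpy()[:50])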
# With as_supervised=True: each tf.data.Dataset contains only (feature, label) tuples
ds_all_supervised = builder.as_dataset(as_supervised=True)
assert isinstance(ds_all_supervised, dict)
print(list(ds_all_supervised.keys()))  # ==> ['test', 'train', 'unsupervised']
assert isinstance(ds_all_supervised['test'], tf.data.Dataset)
# Each dataset (test, train, unsup.) consists of tuples (text, label)
# (<tf.Tensor: ... dtype=string, numpy=b"I've watched the movie ..">,
# <tf.Tensor: ... dtype=int64, numpy=1>)
# (<tf.Tensor: ... dtype=string, numpy=b"If you love Japanese ..">,
# <tf.Tensor: ... dtype=int64, numpy=1>)
# Same as above plus requesting a particular split
ds_test_supervised = builder.as_dataset(as_supervised=True, split='test')
assert isinstance(ds_test_supervised, tf.data.Dataset)
# The dataset consists of tuples (text, label)
# (<tf.Tensor: ... dtype=string, numpy=b"I've watched the movie ..">,
# <tf.Tensor: ... dtype=int64, numpy=1>)
# (<tf.Tensor: ... dtype=string, numpy=b"If you love Japanese ..">,
# <tf.Tensor: ... dtype=int64, numpy=1>)
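For instance, the supervised test split above can be iterated directly in eager mode; a minimal sketch (take(2) merely limits the output, and the byte slicing is only for display):

for text, label in ds_test_supervised.take(2):
    print(label.numpy(), text.numpy()[:50])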
Args:
  split: Which split of the data to load (e.g. 'train', 'test',
    ['train', 'test'], 'train[80%:]', ...). See the split API guide. If
    None, returns all splits in a Dict[Split, tf.data.Dataset]. (Example
    split strings appear in the combined sketch after this argument list.)
  batch_size: int, the batch size. Note that variable-length features are
    0-padded when batch_size is set. Users who want more custom behavior
    should use batch_size=None and build a custom pipeline with the tf.data
    API. If batch_size == -1, returns feature dictionaries of the whole
    dataset as tf.Tensors instead of a tf.data.Dataset.
  shuffle_files: bool, whether to shuffle the input files. Defaults to
    False.
  decoders: Nested dict of Decoder objects which customize the decoding.
    The structure should match the feature structure, but only customized
    feature keys need to be present. See the decoding guide for more info;
    a SkipDecoding sketch follows this argument list.
  read_config: tfds.ReadConfig, additional options to configure the input
    pipeline (e.g. seed, num parallel reads, ...).
  as_supervised: bool, if True, the returned tf.data.Dataset will have a
    2-tuple structure (input, label) according to
    builder.info.supervised_keys. If False (the default), the returned
    tf.data.Dataset will have a dictionary with all the features.
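A sketch combining several of these arguments (the values here, the 80% slice, batch size 32, and seed 42, are arbitrary illustrations, and tfds.ReadConfig field names can vary across TFDS versions):

read_config = tfds.ReadConfig(shuffle_seed=42)  # deterministic file shuffling
ds = builder.as_dataset(
    split='train[:80%]',   # first 80% of the train split
    batch_size=32,         # elements become batched tensors
    shuffle_files=True,
    read_config=read_config,
    as_supervised=True,    # yields (text, label) tuples for imdb_reviews
)

decoders is most useful for skipping expensive decoding of encoded features such as images; a separate hedged sketch ('mnist' is only an illustrative dataset choice):

mnist_builder = tfds.builder('mnist')
mnist_builder.download_and_prepare()
ds_raw = mnist_builder.as_dataset(
    split='train',
    decoders={'image': tfds.decode.SkipDecoding()},  # keep encoded image bytes
)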
Returns:
  tf.data.Dataset, or if split=None, a dict<key: tfds.Split, value:
  tf.data.Dataset>.

  If batch_size is -1, returns feature dictionaries containing the entire
  dataset in tf.Tensors instead of a tf.data.Dataset.
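A common follow-on to the batch_size=-1 behavior is converting the whole split to NumPy with tfds.as_numpy; a minimal sketch:

ds_full = builder.as_dataset(split='test', batch_size=-1)
np_full = tfds.as_numpy(ds_full)  # dict of NumPy arrays
print(np_full['label'].shape)     # e.g. (25000,) for the imdb_reviews test split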