Generates a tf.data.Dataset from text files in a directory.
tf.keras.preprocessing.text_dataset_from_directory(
directory,
labels='inferred',
label_mode='int',
class_names=None,
batch_size=32,
max_length=None,
shuffle=True,
seed=None,
validation_split=None,
subset=None,
follow_links=False,
verbose=True
)
If your directory structure is:
main_directory/
...class_a/
......a_text_1.txt
......a_text_2.txt
...class_b/
......b_text_1.txt
......b_text_2.txt
Then calling text_dataset_from_directory(main_directory, labels='inferred') will return a tf.data.Dataset that yields batches of texts from the subdirectories class_a and class_b, together with labels 0 and 1 (0 corresponding to class_a and 1 corresponding to class_b).
Only .txt files are supported at this time.
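The label-inference rule above can be sketched in plain Python. This is a hypothetical illustration, not the Keras implementation. The helper name `infer_text_labels` is made up for this example.

The sketch makes these assumptions:

- Class names are the subdirectory names sorted alphanumerically, which is the order Keras uses when `class_names` is not given. Each name's position in that sorted list becomes its integer label.
- Only `.txt` files are kept.
- Only files directly inside each class folder are scanned. This is a simplification.

```python
import os
import tempfile
from typing import List, Tuple


def infer_text_labels(directory: str) -> Tuple[List[str], List[str], List[int]]:
    """Hypothetical sketch of labels='inferred'. Returns (class_names, file_paths, labels)."""
    # Each immediate subdirectory is one class; sorting fixes the class order.
    class_names = sorted(
        d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d))
    )
    file_paths: List[str] = []
    labels: List[int] = []
    for index, class_name in enumerate(class_names):
        class_dir = os.path.join(directory, class_name)
        # Simplification: only files directly inside the class folder are scanned.
        for name in sorted(os.listdir(class_dir)):
            # Keep .txt files only (case-insensitive check, an assumption of this sketch).
            if name.lower().endswith(".txt"):
                file_paths.append(os.path.join(class_dir, name))
                labels.append(index)  # 0 for the first sorted class, 1 for the next, ...
    return class_names, file_paths, labels


# Recreate the example layout: main_directory/class_a and main_directory/class_b.
with tempfile.TemporaryDirectory() as main_directory:
    for cls in ("class_a", "class_b"):
        os.makedirs(os.path.join(main_directory, cls))
        for i in (1, 2):
            path = os.path.join(main_directory, cls, f"{cls[-1]}_text_{i}.txt")
            with open(path, "w") as f:
                f.write(f"text {i} of {cls}")
    # A non-.txt file, which the sketch skips.
    open(os.path.join(main_directory, "class_a", "notes.md"), "w").close()

    names, paths, labels = infer_text_labels(main_directory)
    print(names)   # ['class_a', 'class_b']
    print(labels)  # [0, 0, 1, 1]
```

Because the order comes from sorting, renaming the folders can change which class receives label 0.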
Returns
A tf.data.Dataset object.
- If label_mode is None, it yields string tensors of shape (batch_size,), containing the contents of a batch of text files.
- Otherwise, it yields a tuple (texts, labels), where texts has shape (batch_size,) and labels follows the format described below.
Rules regarding labels format:
- If label_mode is int, the labels are an int32 tensor of shape (batch_size,).
- If label_mode is binary, the labels are a float32 tensor of 1s and 0s of shape (batch_size, 1).
- If label_mode is categorical, the labels are a float32 tensor of shape (batch_size, num_classes), representing a one-hot encoding of the class index.
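The three label encodings can be illustrated with a small pure-Python sketch that converts one batch of integer class indices. This is a hypothetical helper (`format_labels`), not a Keras function:

- Python ints and floats stand in for the int32 and float32 tensors.
- Nested lists stand in for the 2-D shapes.
- Rejecting binary mode unless there are exactly two classes is an assumption of this sketch.

```python
from typing import List, Union

Labels = Union[List[int], List[List[float]]]


def format_labels(int_labels: List[int], label_mode: str, num_classes: int) -> Labels:
    """Hypothetical sketch of the int / binary / categorical encodings for one batch."""
    if label_mode == "int":
        # Shape (batch_size,): the class index itself.
        return list(int_labels)
    if label_mode == "binary":
        # Shape (batch_size, 1): one 0.0 or 1.0 per example.
        # Assumption of this sketch: binary mode only makes sense with 2 classes.
        if num_classes != 2:
            raise ValueError("binary label_mode expects exactly 2 classes")
        return [[float(label)] for label in int_labels]
    if label_mode == "categorical":
        # Shape (batch_size, num_classes): a one-hot row per example.
        return [
            [1.0 if column == label else 0.0 for column in range(num_classes)]
            for label in int_labels
        ]
    raise ValueError(f"unknown label_mode: {label_mode!r}")


batch = [0, 2, 1]
print(format_labels(batch, "int", num_classes=3))  # [0, 2, 1]
print(format_labels(batch, "categorical", num_classes=3))
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(format_labels([0, 1, 1], "binary", num_classes=2))  # [[0.0], [1.0], [1.0]]
```

The categorical rows contain the same information as the int labels, spread over num_classes columns. That form matches losses that expect one-hot targets.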