tf.keras.utils.text_dataset_from_directory

Generates a tf.data.Dataset from text files in a directory.

If your directory structure is:

main_directory/
...class_a/
......a_text_1.txt
......a_text_2.txt
...class_b/
......b_text_1.txt
......b_text_2.txt

Then calling text_dataset_from_directory(main_directory, labels='inferred') will return a tf.data.Dataset that yields batches of texts from the subdirectories class_a and class_b, together with labels 0 and 1 (0 corresponding to class_a and 1 corresponding to class_b).

Only .txt files are supported at this time.

directory Directory where the data is located. If labels is "inferred", it should contain subdirectories, each containing text files for a class. Otherwise, the directory structure is ignored.
labels Either "inferred" (labels are generated from the directory structure), None (no labels), or a list/tuple of integer labels of the same size as the number of text files found in the directory. Labels should be sorted according to the alphanumeric order of the text