tf.data.TextLineDataset

Creates a Dataset comprising lines from one or more text files.

Inherits From: Dataset

The tf.data.TextLineDataset loads text from text files and creates a dataset where each line of the files becomes an element of the dataset.

For example, suppose we have two files, "text_lines0.txt" and "text_lines1.txt", with the following lines:

with open('/tmp/text_lines0.txt', 'w') as f:
  f.write('the cow\n')
  f.write('jumped over\n')
  f.write('the moon\n')
with open('/tmp/text_lines1.txt', 'w') as f:
  f.write('jack and jill\n')
  f.write('went up\n')
  f.write('the hill\n')

We can construct a TextLineDataset from them as follows:

dataset = tf.data.TextLineDataset(['/tmp/text_lines0.txt',
                                   '/tmp/text_lines1.txt'])

The elements of the dataset are expected to be:

for element in dataset.as_numpy_iterator():
  print(element)
b'the cow'
b'jumped over'
b'the moon'
b'jack and jill'
b'went up'
b'the hill'
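
To make the element semantics above concrete, here is a minimal plain-Python sketch of roughly equivalent behavior. The helper `iter_text_lines` and the file names are hypothetical and used only for illustration; this is not how TensorFlow implements the dataset, and it simplifies line-ending handling to a single trailing "\n".

```python
import os
import tempfile


def iter_text_lines(filenames):
    """Hypothetical helper: yield every line of every file as bytes.

    Files are read in the order given, and the trailing newline of each
    line is removed, mirroring the output shown in the example above.
    """
    for filename in filenames:
        with open(filename, "rb") as f:
            for line in f:
                yield line.rstrip(b"\n")


# Write two small files into a temporary directory (illustrative names).
tmp_dir = tempfile.mkdtemp()
contents = [b"the cow\njumped over\n", b"jack and jill\nwent up\n"]
paths = []
for i, data in enumerate(contents):
    path = os.path.join(tmp_dir, f"sketch_lines{i}.txt")
    with open(path, "wb") as f:
        f.write(data)
    paths.append(path)

elements = list(iter_text_lines(paths))
print(elements)
```

As with the dataset, all lines of the first file come before any line of the second, and each element is a byte string rather than a Python `str`.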

Args

filenames: A tf.string tensor or tf.data.Dataset containing one or more filenames.
compression_type: (Optional.) A tf.string scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP".
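
The `compression_type` argument can be illustrated with a short sketch that reads a gzip-compressed file. The temporary directory and the file name `text_lines.txt.gz` are assumptions made for this example; the file is written with Python's standard `gzip` module.

```python
import gzip
import os
import tempfile

import tensorflow as tf

# Write a small gzip-compressed text file (illustrative path and name).
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "text_lines.txt.gz")
with gzip.open(gz_path, "wb") as f:
    f.write(b"the cow\n")
    f.write(b"jumped over\n")

# Tell the dataset that the input is gzip-compressed.
dataset = tf.data.TextLineDataset(gz_path, compression_type="GZIP")

lines = list(dataset.as_numpy_iterator())
print(lines)
```

The default, `compression_type=""`, treats the files as uncompressed, so reading a compressed file without setting this argument would not yield the original lines.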