Creates a Dataset comprising lines from one or more text files.

Inherits From: Dataset

The tf.data.TextLineDataset loads text from text files and creates a dataset where each line of the files becomes an element of the dataset.

For example, suppose we have 2 files "text_lines0.txt" and "text_lines1.txt" with the following lines:

with open('/tmp/text_lines0.txt', 'w') as f:
  f.write('the cow\n')
  f.write('jumped over\n')
  f.write('the moon\n')
with open('/tmp/text_lines1.txt', 'w') as f:
  f.write('jack and jill\n')
  f.write('went up\n')
  f.write('the hill\n')

We can construct a TextLineDataset from them as follows:

dataset = tf.data.TextLineDataset(['/tmp/text_lines0.txt',
                                   '/tmp/text_lines1.txt'])

The elements of the dataset are expected to be:

for element in dataset.as_numpy_iterator():
  print(element)
b'the cow'
b'jumped over'
b'the moon'
b'jack and jill'
b'went up'
b'the hill'
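The output above shows that each element arrives as a byte string. A minimal sketch of turning those elements into Python str values, assuming TensorFlow 2.x is installed; the temporary file name poem.txt is hypothetical and chosen only for illustration:

```python
import os
import tempfile

import tensorflow as tf

# Write a small text file into a temporary directory (hypothetical name).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "poem.txt")
with open(path, "w") as f:
    f.write("the cow\njumped over\nthe moon\n")

dataset = tf.data.TextLineDataset(path)

# Each element is a scalar tf.string tensor; as_numpy_iterator() yields
# Python bytes with the trailing newline already stripped.
lines = [raw.decode("utf-8") for raw in dataset.as_numpy_iterator()]
print(lines)  # ['the cow', 'jumped over', 'the moon']
```

Decoding in Python like this is convenient for inspection; inside an input pipeline the elements would normally stay as tf.string tensors.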

filenames A tf.data.Dataset whose elements are tf.string scalars, a tf.string tensor, or a value that can be converted to a tf.string tensor (such as a list of Python strings).
compression_type (Optional.) A tf.string scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP".
buffer_size (Optional.) A tf.int64 scalar denoting the number of bytes to buffer. A value of 0 results in the default buffering values chosen based on the compression type.
num_parallel_reads (Optional.) A tf.int64 scalar representing the number of files to read in parallel. If greater than one, the records of files read in parallel are output in an interleaved order. If your input pipeline is I/O bottlenecked, consider setting this parameter to a value greater than one to parallelize the I/O. If None, files will be read sequentially.
name (Optional.) A name for the operation.
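The compression_type and num_parallel_reads arguments described above can be sketched together as follows. This is an illustrative example, assuming TensorFlow 2.x; the file names part0.txt.gz and part1.txt.gz and their contents are hypothetical:

```python
import gzip
import os
import tempfile

import tensorflow as tf

# Create two small gzip-compressed text files (hypothetical names).
tmp_dir = tempfile.mkdtemp()
paths = []
for i, text in enumerate(["a0\na1\n", "b0\nb1\n"]):
    path = os.path.join(tmp_dir, f"part{i}.txt.gz")
    with gzip.open(path, "wt") as f:
        f.write(text)
    paths.append(path)

# Sequential read: all lines of the first file, then the second.
sequential = tf.data.TextLineDataset(paths, compression_type="GZIP")
seq_lines = list(sequential.as_numpy_iterator())
print(seq_lines)  # [b'a0', b'a1', b'b0', b'b1']

# Parallel read: lines from the two files are interleaved.
parallel = tf.data.TextLineDataset(
    paths, compression_type="GZIP", num_parallel_reads=2)
par_lines = list(parallel.as_numpy_iterator())
print(par_lines)
```

Both datasets contain the same lines; only the order differs, so code that depends on file order should leave num_parallel_reads unset.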

element_spec The type specification of an element of this dataset.

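dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset.element_spec
TensorSpec(shape=(), dtype=tf.int32, name=None)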
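For a TextLineDataset specifically, element_spec should describe a scalar string, since every element is one line of text. A short sketch, assuming TensorFlow 2.x; the file name lines.txt is hypothetical:

```python
import os
import tempfile

import tensorflow as tf

# Write a one-line file into a temporary directory (hypothetical name).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "lines.txt")
with open(path, "w") as f:
    f.write("the cow\n")

spec = tf.data.TextLineDataset(path).element_spec

# The spec is a tf.TensorSpec with an empty (scalar) shape and string dtype.
print(spec.shape.rank, spec.dtype == tf.string)
```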