A Dataset comprising lines from one or more CSV files.

Inherits From: Dataset

The class provides a minimal CSV Dataset interface. There is also a richer function which provides additional convenience features such as column header parsing, column type-inference, automatic shuffling, and file interleaving.

The elements of this dataset correspond to records from the file(s). CSV files are expected to follow the RFC 4180 format. Note that leading and trailing spaces are allowed for int or float fields.
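To make the format expectations concrete, the sketch below uses Python's standard `csv` module, not the TensorFlow parser, so edge-case behavior may differ. The sample record is made up for illustration. It shows RFC 4180 style quoting, where a field containing a comma is wrapped in double quotes and an embedded quote is doubled, and it shows that numeric text with surrounding spaces still converts cleanly.

```python
import csv
import io

# Made-up sample: a header line and one record, terminated by CRLF as in RFC 4180.
raw = 'name,value,count\r\n"Smith, ""Al""", 4.28E10 , 12 \r\n'

# newline='' leaves line endings untouched so the csv module handles them itself.
rows = list(csv.reader(io.StringIO(raw, newline='')))
header, record = rows

text_field = record[0]          # quoted field: comma kept, doubled quotes unescaped
float_field = float(record[1])  # float() ignores the surrounding spaces
int_field = int(record[2])      # int() ignores the surrounding spaces

print(header)
print(text_field, float_field, int_field)
```

Here the spaces survive tokenization and are only discarded during numeric conversion, which is one plausible way to read the note about int and float fields.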

For example, suppose we have a file 'my_file0.csv' with four CSV columns of different data types:

with open('/tmp/my_file0.csv', 'w') as f:
  f.write('abcdefg,4.28E10,5.55E6,12\nhijklmn,-5.3E14,,2\n')  # first column: placeholder text

We can construct a CsvDataset from it as follows:

import tensorflow as tf

dataset = tf.data.experimental.CsvDataset(
  '/tmp/my_file0.csv',
  [tf.float32,  # Required field, use dtype or empty tensor
   tf.constant([0.0], dtype=tf.float32),  # Optional field, default to 0.0
   tf.int32,  # Required field, use dtype or empty tensor
  ],
  select_cols=[1,2,3]  # Only parse last three columns
)

The expected output of its iterations is:

for element in dataset.as_numpy_iterator():
  print(element)
(4.28e10, 5.55e6, 12)
(-5.3e14, 0.0, 2)
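The interaction between required fields, optional fields with defaults, and column selection can be sketched in plain Python. This is an illustration of the semantics described above, not TensorFlow's implementation: `parse_rows` and the `REQUIRED` sentinel are hypothetical names, the (converter, default) pairs stand in for the dtype-or-tensor entries, and the first-column text in the sample is a placeholder.

```python
import csv
import io

# Hypothetical sentinel marking a required column (stands in for a bare dtype).
REQUIRED = object()

def parse_rows(text, record_defaults, select_cols):
    """Parse the selected CSV columns, applying converters and defaults.

    record_defaults holds one (converter, default) pair per selected column.
    """
    results = []
    for row in csv.reader(io.StringIO(text, newline='')):
        parsed = []
        for (convert, default), col in zip(record_defaults, select_cols):
            field = row[col].strip()
            if field:
                parsed.append(convert(field))
            elif default is REQUIRED:
                raise ValueError(f'column {col} is required but empty')
            else:
                parsed.append(default)  # empty optional field: use its default
        results.append(tuple(parsed))
    return results

# Column 0 is placeholder text and is never parsed, because select_cols skips it.
sample = 'abcdefg,4.28E10,5.55E6,12\nhijklmn,-5.3E14,,2\n'
records = parse_rows(
    sample,
    record_defaults=[(float, REQUIRED), (float, 0.0), (int, REQUIRED)],
    select_cols=[1, 2, 3],
)
print(records)
```

In this sketch the empty third field of the second record falls back to 0.0, while an empty field in a required column raises an error instead of being silently filled.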

See for more in-depth example usage.