tf.data.experimental.CsvDataset

A Dataset comprising lines from one or more CSV files.

Inherits From: Dataset

The tf.data.experimental.CsvDataset class provides a minimal CSV Dataset interface. There is also a richer tf.data.experimental.make_csv_dataset function which provides additional convenience features such as column header parsing, column type-inference, automatic shuffling, and file interleaving.

The elements of this dataset correspond to records from the file(s). CSV files are expected to follow the RFC 4180 format (https://tools.ietf.org/html/rfc4180). Leading and trailing spaces are allowed in int and float fields.
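The sketch below illustrates the RFC 4180 quoting rules and the whitespace tolerance for numeric fields. It uses Python's standard csv module rather than TensorFlow, so it only approximates what CsvDataset accepts. The sample data is invented for the illustration.

```python
import csv
import io

# RFC 4180 style input: a field containing a comma is quoted, and a literal
# double quote inside a quoted field is escaped by doubling it. Records end
# in CRLF. The numeric "value" fields carry stray leading/trailing spaces.
text = 'id,comment,value\r\n1,"hello, world", 3.5 \r\n2,"say ""hi""",  7\r\n'

rows = list(csv.reader(io.StringIO(text)))
header, records = rows[0], rows[1:]

# int() and float() ignore surrounding whitespace, mirroring the tolerance
# for spaces around int/float fields.
parsed = [(int(r[0]), r[1], float(r[2])) for r in records]
```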

For example, suppose we have a file 'my_file0.csv' with four CSV columns of different data types:

with open('/tmp/my_file0.csv', 'w') as f:
  f.write('abcdefg,4.28E10,5.55E6,12\n')
  f.write('hijklmn,-5.3E14,,2\n')

We can construct a CsvDataset from it as follows:

import tensorflow as tf

dataset = tf.data.experimental.CsvDataset(
  "/tmp/my_file0.csv",
  [tf.float32,  # Required field, use dtype or empty tensor
   tf.constant([0.0], dtype=tf.float32),  # Optional field, default to 0.0
   tf.int32,  # Required field, use dtype or empty tensor
  ],
  select_cols=[1,2,3]  # Only parse last three columns
)

Iterating over the dataset produces the following elements:

for element in dataset.as_numpy_iterator():
  print(element)
(4.28e10, 5.55e6, 12)
(-5.3e14, 0.0, 2)
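The way record_defaults and select_cols combine can be modeled in plain Python. This sketch is an illustration built on assumptions and is not TensorFlow's implementation. Each default is a hypothetical (type, default) pair, and the hypothetical REQUIRED marker stands in for a bare dtype such as tf.float32. An empty required field raises an error, while an empty optional field takes its default. The model reads files in the order given and returns Python floats rather than float32 values.

```python
import csv
import os
import tempfile

# Hypothetical marker standing in for a bare dtype (e.g. tf.float32),
# i.e. a field that has no default and must be present.
REQUIRED = object()


def model_csv_dataset(filenames, record_defaults, select_cols=None):
    """Pure-Python model of how CsvDataset fills in each record (illustration only).

    `record_defaults` holds hypothetical (type, default) pairs, one per
    selected column; `select_cols` lists the column indices to keep.
    """
    for filename in filenames:  # the model reads files in the order given
        with open(filename, newline="") as f:
            for row in csv.reader(f):
                fields = row if select_cols is None else [row[i] for i in select_cols]
                if len(fields) != len(record_defaults):
                    raise ValueError("expected one default per selected column")
                record = []
                for value, (kind, default) in zip(fields, record_defaults):
                    if value == "":
                        if default is REQUIRED:
                            raise ValueError("required field is missing")
                        record.append(default)  # optional field: use the default
                    else:
                        record.append(kind(value))
                yield tuple(record)


# Same two records as the example above, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "my_file0.csv")
with open(path, "w") as f:
    f.write("abcdefg,4.28E10,5.55E6,12\n")
    f.write("hijklmn,-5.3E14,,2\n")

rows = list(model_csv_dataset(
    [path],
    [(float, REQUIRED), (float, 0.0), (int, REQUIRED)],
    select_cols=[1, 2, 3],  # skip the string column, as in the example
))
```

Run against the two records from the example, the model yields the same values as the CsvDataset output listed above. The empty middle field of the second record is replaced by its default of 0.0.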

See https://www.tensorflow.org/tutorials/load_data/csv#tfdataexperimentalcsvdataset for more in-depth example usage.

Args

filenames: A tf.string tensor containing one or more filenames.
record_defaults: A list of default values for the CSV fields. Each item in the list is either a valid CSV DType (float32, float64, int32, int64, string), or a Tensor object with one of the above types. One per column of CSV data, with either a s