tfdv.DecodeCSV


Decodes CSV records into Arrow RecordBatches.

column_names List of feature names. Order must match the order in the CSV file.
delimiter A one-character string used to separate fields.
skip_blank_lines A boolean to indicate whether to skip over blank lines rather than interpreting them as missing values.
schema An optional schema of the input data. If provided, types are inferred from the schema, and the feature names in the schema must equal column_names.
desired_batch_size Batch size. Each output Arrow RecordBatch will have as many rows as desired_batch_size.
multivalent_columns Names of the columns that can contain multiple values.
secondary_delimiter Delimiter used for parsing multivalent columns.
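
The way the arguments above interact can be illustrated with a small standard-library sketch. It is not TFDV's implementation: `decode_csv_sketch` is a hypothetical helper, it returns a plain dict of Python lists rather than Arrow RecordBatches, it keeps every value as a string instead of inferring types, and it ignores `schema` and `desired_batch_size`. It assumes an empty field is treated as a missing value (`None`) and that a blank line kept with `skip_blank_lines=False` becomes a row of missing values.

```python
import csv


def decode_csv_sketch(lines, column_names, delimiter=",", skip_blank_lines=True,
                      multivalent_columns=None, secondary_delimiter=None):
    """Hypothetical helper mirroring the DecodeCSV arguments; not TFDV code."""
    multivalent = set(multivalent_columns or [])
    if multivalent and secondary_delimiter is None:
        raise ValueError("secondary_delimiter is required for multivalent columns")

    # One list of per-row values for each column, in column_names order.
    columns = {name: [] for name in column_names}
    for line in lines:
        if not line.strip():
            if skip_blank_lines:
                continue
            # Assumption: a blank line that is kept is a row of missing values.
            cells = [""] * len(column_names)
        else:
            cells = next(csv.reader([line], delimiter=delimiter))
        if len(cells) != len(column_names):
            raise ValueError(f"expected {len(column_names)} fields, got {len(cells)}")

        for name, cell in zip(column_names, cells):
            if cell == "":
                columns[name].append(None)  # missing value
            elif name in multivalent:
                # Split multivalent cells on the secondary delimiter.
                columns[name].append(cell.split(secondary_delimiter))
            else:
                columns[name].append([cell])
    return columns


records = ["1,a|b", "", "2,"]
skipped = decode_csv_sketch(records, ["id", "tags"],
                            multivalent_columns=["tags"], secondary_delimiter="|")
kept = decode_csv_sketch(records, ["id", "tags"], skip_blank_lines=False,
                         multivalent_columns=["tags"], secondary_delimiter="|")
print(skipped)
print(kept)
```

With `skip_blank_lines=True` the empty record is dropped, so each column has two entries; with `skip_blank_lines=False` it contributes a third row in which every column is missing.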

Class Variables

  • pipeline = None