tfdv.DecodeCSV

Class DecodeCSV

Decodes CSV records into an in-memory dict representation.

Each column is currently assumed to hold a single value per record.

__init__

__init__(
    column_names,
    delimiter=',',
    skip_blank_lines=True,
    schema=None,
    infer_type_from_schema=False
)

Initializes the CSV decoder.

Args:

  • column_names: List of feature names. The order must match the column order in the CSV file.
  • delimiter: A one-character string used to separate fields.
  • skip_blank_lines: A boolean to indicate whether to skip over blank lines rather than interpreting them as missing values.
  • schema: An optional schema of the input data.
  • infer_type_from_schema: A boolean to indicate whether the feature types should be inferred from the schema. If set to True, an input schema must be provided.
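To make the roles of column_names and delimiter concrete, here is a minimal plain-Python sketch of decoding one line into a dict. It is not the TFDV implementation: decode_line is a hypothetical helper, and both the column-count check and the choice to represent an empty cell as None are assumptions made for illustration.

```python
import csv


def decode_line(line, column_names, delimiter=","):
    """Hypothetical helper: decode one CSV line into a dict keyed by column name."""
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a one-character string")
    # csv.reader copes with quoted fields that contain the delimiter.
    cells = next(csv.reader([line], delimiter=delimiter))
    if len(cells) != len(column_names):
        raise ValueError(
            "expected %d columns, got %d" % (len(column_names), len(cells)))
    # Names are paired with cells by position, which is why the order of
    # column_names has to match the file. Assumption: empty cell -> None.
    return {name: (cell if cell != "" else None)
            for name, cell in zip(column_names, cells)}


record = decode_line('7|"a|b"|', ["id", "label", "score"], delimiter="|")
print(record)
```

Because the pairing is purely positional, swapping two entries in column_names would silently attach values to the wrong feature names.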

Methods

expand

expand(lines)

Decodes the input CSV records into an in-memory dict representation.

Args:

  • lines: A PCollection of strings representing the lines in the CSV file.

Returns:

A PCollection of dicts representing the CSV records.
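The lines-to-dicts mapping performed by expand can be sketched without Beam by letting a plain list stand in for the PCollection. This is an illustration under stated assumptions, not the library's code: decode_lines is a hypothetical function, values are left as strings, and a blank line that is not skipped is assumed to become a record in which every column is missing (None).

```python
import csv


def decode_lines(lines, column_names, delimiter=",", skip_blank_lines=True):
    """Hypothetical stand-in for expand(): yield one dict per decoded line."""
    for line in lines:
        if line.strip() == "":
            if skip_blank_lines:
                continue
            # Assumption: a retained blank line means "all values missing".
            yield {name: None for name in column_names}
            continue
        cells = next(csv.reader([line], delimiter=delimiter))
        # Empty cells become None; zip pairs names and cells by position
        # (this sketch does not check that the counts match).
        yield {name: (cell or None) for name, cell in zip(column_names, cells)}


lines = ["1,red", "", "2,"]
skipped = list(decode_lines(lines, ["id", "color"]))
kept = list(decode_lines(lines, ["id", "color"], skip_blank_lines=False))
print(len(skipped), len(kept))
```

With skip_blank_lines left at its default, the empty middle line produces no record; with it disabled, the same input yields one extra record whose values are all missing.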