Decodes CSV records into Arrow RecordBatches.
tfdv.DecodeCSV(
column_names: List[types.FeatureName],
delimiter: Text = ',',
skip_blank_lines: bool = True,
schema: Optional[schema_pb2.Schema] = None,
desired_batch_size: Optional[int] = constants.DEFAULT_DESIRED_INPUT_BATCH_SIZE,
multivalent_columns: Optional[List[types.FeatureName]] = None,
secondary_delimiter: Optional[Union[Text, bytes]] = None
)
DEPRECATED: please use tfx_bsl.public.CsvTFXIO instead.
Args |
column_names
|
List of feature names. The order must match the order of the columns in
the CSV file.
|
delimiter
|
A one-character string used to separate fields.
|
skip_blank_lines
|
A boolean to indicate whether to skip over blank lines
rather than interpreting them as missing values.
|
schema
|
An optional schema of the input data. If provided, column types
are taken from the schema, and the feature names in the schema
must equal column_names.
|
desired_batch_size
|
Batch size. The output Arrow RecordBatches will have
as many rows as desired_batch_size.
|
multivalent_columns
|
Names of columns that can contain multiple
values.
|
secondary_delimiter
|
Delimiter used for parsing multivalent columns.
|
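The following is a minimal plain-Python sketch of how the arguments above might interact. It is not the TFDV implementation and does not produce Arrow RecordBatches. The helper name `parse_csv_lines` is hypothetical, and two behaviours are assumptions made for illustration: a blank line kept with `skip_blank_lines=False` becomes a row of missing values, and an empty cell is treated as a missing value.

```python
import csv
from typing import Dict, List, Optional, Sequence, Union

# A cell is missing (None), a single value, or a list of values (multivalent).
Cell = Union[None, str, List[str]]


def parse_csv_lines(
    lines: Sequence[str],
    column_names: List[str],
    delimiter: str = ",",
    skip_blank_lines: bool = True,
    multivalent_columns: Optional[List[str]] = None,
    secondary_delimiter: Optional[str] = None,
) -> List[Dict[str, Cell]]:
    """Hypothetical helper that mirrors the documented argument semantics."""
    multivalent = set(multivalent_columns or [])
    rows: List[Dict[str, Cell]] = []
    for line in lines:
        if not line.strip():
            if skip_blank_lines:
                continue
            # Assumption: a retained blank line is a row of missing values.
            rows.append({name: None for name in column_names})
            continue
        # Split the line on the primary delimiter.
        cells = next(csv.reader([line], delimiter=delimiter))
        if len(cells) != len(column_names):
            raise ValueError(
                f"expected {len(column_names)} fields, got {len(cells)}"
            )
        row: Dict[str, Cell] = {}
        # column_names must be in the same order as the CSV columns.
        for name, cell in zip(column_names, cells):
            if cell == "":
                # Assumption: an empty cell is a missing value.
                row[name] = None
            elif name in multivalent and secondary_delimiter:
                # Multivalent cells are split on the secondary delimiter.
                row[name] = cell.split(secondary_delimiter)
            else:
                row[name] = cell
        rows.append(row)
    return rows


rows = parse_csv_lines(
    ["1,a|b", "", "2,c"],
    column_names=["id", "tags"],
    multivalent_columns=["tags"],
    secondary_delimiter="|",
)
print(rows)
```

In this sketch the `tags` column always yields a list, even for a single value, so downstream code can treat multivalent columns uniformly.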
Class Variables |
pipeline
|
None
|