tf.contrib.data.make_csv_dataset(
    file_pattern,
    batch_size,
    column_names=None,
    column_defaults=None,
    label_name=None,
    field_delim=',',
    use_quote_delim=True,
    na_value='',
    header=True,
    comment=None,
    num_epochs=None,
    shuffle=True,
    shuffle_buffer_size=10000,
    shuffle_seed=None,
    prefetch_buffer_size=1,
    num_parallel_reads=1,
    num_parallel_parser_calls=2,
    sloppy=False,
    default_float_type=tf.float32,
    num_rows_for_inference=100
)
Reads CSV files into a dataset.
Each element of the resulting dataset is a (features, labels) tuple that corresponds to a batch of CSV rows. The features dictionary maps feature column names to Tensors containing the corresponding feature data, and labels is a Tensor containing the batch's label data.
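As a minimal sketch of a typical call, the snippet below writes a small CSV file and wraps the function in a helper. The file contents, the column names (age, city, label), and the helper names write_sample_csv and build_dataset are hypothetical and chosen only for illustration. The helper assumes a TensorFlow 1.x installation in which tf.contrib is available, so the import is kept local to it.

```python
import csv
import os
import tempfile


def write_sample_csv(directory):
    """Write a tiny CSV file with a header row and return its path."""
    path = os.path.join(directory, "sample.csv")
    rows = [
        ["age", "city", "label"],  # header row, matching header=True
        ["34", "Oslo", "1"],
        ["27", "Lima", "0"],
        ["45", "Pune", "1"],
        ["19", "Kyiv", "0"],
    ]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def build_dataset(file_pattern):
    """Build a batched (features, labels) dataset from CSV files.

    Assumption: TensorFlow 1.x with tf.contrib available. The import is
    local so the rest of this sketch runs without TensorFlow installed.
    """
    import tensorflow as tf

    return tf.contrib.data.make_csv_dataset(
        file_pattern,
        batch_size=2,
        label_name="label",  # returned separately from the features dict
        num_epochs=1,        # one pass instead of repeating forever
        shuffle=False,       # keep file order for a reproducible demo
    )


sample_dir = tempfile.mkdtemp()
sample_path = write_sample_csv(sample_dir)
```

Calling build_dataset(sample_path) in a suitable environment should return a dataset whose elements follow the (features, labels) layout described above.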
file_pattern: List of files or patterns of file paths containing CSV records. See tf.gfile.Glob for pattern rules.
batch_size: An int representing the number of consecutive elements of this dataset to combine in a single batch.
column_names: An optional list of strings that corresponds to the CSV columns, in order. One per column of the input record. If this is not provided, infers the column names from the first row of the records. These names will be the keys of the features dict of each dataset element.
column_defaults: An optional list of default values for the CSV fields, one item per column of the input record. Each item in the list is either a valid CSV dtype (float32, float64, int32, int64, or string) or a Tensor with one of the aforementioned types. The tensor can be either a scalar default value (if the column is optional) or an empty tensor (if the column is required). If a dtype is provided instead of a tensor, the column is also treated as required. If this list is not provided, the function tries to infer types by reading the first num_rows_for_inference rows of the specified files and assumes all columns are optional, defaulting to 0 for numeric values and "" for string values.
label_name: An optional string corresponding to the label column. If provided, the data for this column is returned as a separate Tensor from the features dictionary, so that the dataset complies with the format expected by a tf.Estimator.train or tf.Estimator.evaluate input function.
field_delim: An optional string. Defaults to ",". Char delimiter to separate fields in a record.
use_quote_delim: An optional bool. Defaults to True. If false, treats double quotation marks as regular characters inside of the string fields.
na_value: Additional string to recognize as NA/NaN.
header: A bool that indicates whether the first rows of provided CSV files correspond to header lines with column names, and should not be included in the data.
comment: An optional character string that marks lines that should not be parsed as CSV records. If provided, lines that start with this character are not parsed.
num_epochs: An int specifying the number of times this dataset is repeated. If None, cycles through the dataset forever.
shuffle: A bool that indicates whether the input should be shuffled.
shuffle_buffer_size: Buffer size to use for shuffling. A larger buffer size gives better shuffling but increases memory usage and startup time.
shuffle_seed: Randomization seed to use for shuffling.
prefetch_buffer_size: An int specifying the number of feature batches to prefetch for performance improvement. Recommended value is the number of batches consumed per training step.
num_parallel_reads: Number of threads used to read CSV records from files. If greater than 1, the results will be interleaved.
num_parallel_parser_calls: Number of parallel invocations of the CSV parsing function on CSV records.
sloppy: If True, reading performance will be improved at the cost of non-deterministic ordering. If False, the order of elements produced is deterministic prior to shuffling (elements are still randomized if shuffle=True; note that if the seed is set, the order of elements after shuffling is deterministic). Defaults to False.
default_float_type: Either tf.float32 or tf.float64. If defaults are not provided, float-like strings are interpreted to be this type.
num_rows_for_inference: Number of rows of a file to use for type inference if column_defaults is not provided. If None, reads all the rows of all the files. Defaults to 100.
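To illustrate how inferring types from a limited number of rows can affect the result, here is a plain-Python sketch. It is not TensorFlow's implementation: the helper names infer_type and infer_column_types are hypothetical, the type set is simplified to int, float, and string, and the handling of na_value (skipping matching fields) is an assumption made for this example.

```python
import itertools


def infer_type(value):
    """Return 'int', 'float', or 'string' for a single CSV field."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except ValueError:
            continue
    return "string"


def infer_column_types(rows, num_rows_for_inference=100, na_value=""):
    """Infer one type per column from the first rows of an iterable.

    If num_rows_for_inference is None, every row is inspected.
    Fields equal to na_value are skipped; a column that never shows a
    value falls back to 'string' (an arbitrary choice for this sketch).
    """
    rank = {"int": 0, "float": 1, "string": 2}
    widest = None
    for row in itertools.islice(rows, num_rows_for_inference):
        if widest is None:
            widest = [None] * len(row)
        for i, field in enumerate(row):
            if field == na_value:
                continue
            t = infer_type(field)
            # Keep the most general type seen so far for this column.
            if widest[i] is None or rank[t] > rank[widest[i]]:
                widest[i] = t
    return [t or "string" for t in (widest or [])]


rows = [["1", "2.5", "a"], ["2", "", "b"], ["x", "4", "c"]]
types_two_rows = infer_column_types(rows, num_rows_for_inference=2)
types_all_rows = infer_column_types(rows, num_rows_for_inference=None)
```

In this sketch the first column looks like an integer column when only two rows are inspected, but the third row contains a non-numeric value. That mismatch is the kind of case where passing explicit column_defaults, or a larger num_rows_for_inference, may be the safer choice.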
A dataset, where each element is a (features, labels) tuple that corresponds to a batch of batch_size CSV rows. The features dictionary maps feature column names to Tensors containing the corresponding column data, and labels is a Tensor containing the column data for the label column.
ValueError: If any of the arguments is malformed.
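To make the shape of each returned element concrete, the following plain-Python sketch mimics the (features, labels) batching with the standard csv module. It is an illustration only: it yields lists of strings rather than Tensors, the names batched_elements and _to_element are hypothetical, and keeping a final partial batch is a choice made for this sketch rather than documented behavior of the function.

```python
import csv
import io


def batched_elements(csv_text, batch_size, label_name):
    """Yield (features, labels) tuples, each covering up to batch_size rows.

    features maps each non-label column name to a list of that column's
    values; labels is the list of values from the label column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield _to_element(batch, label_name)
            batch = []
    if batch:  # final partial batch, kept in this sketch
        yield _to_element(batch, label_name)


def _to_element(batch, label_name):
    """Split a list of row dicts into a features dict and a labels list."""
    names = [n for n in batch[0] if n != label_name]
    features = {n: [row[n] for row in batch] for n in names}
    labels = [row[label_name] for row in batch]
    return features, labels


csv_text = "age,city,label\n34,Oslo,1\n27,Lima,0\n45,Pune,1\n"
elements = list(batched_elements(csv_text, batch_size=2, label_name="label"))
```

Each entry of elements pairs a dictionary keyed by feature column name with the label values for the same rows, mirroring the layout described above.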