Input processing

Queue and read batched input data.

tf.contrib.learn.extract_dask_data(data)

Extract data from dask.Series or dask.DataFrame for predictors.


tf.contrib.learn.extract_dask_labels(labels)

Extract data from dask.Series for labels.


tf.contrib.learn.extract_pandas_data(data)

Extract data from pandas.DataFrame for predictors.

Given a DataFrame, extracts its values and casts them to float. The DataFrame is expected to contain only values of type int, float, or bool.

Args:
  • data: pandas.DataFrame containing the data to be extracted.
Returns:

A numpy ndarray of the DataFrame's values as floats.

Raises:
  • ValueError: if data contains types other than int, float or bool.
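The behavior described above can be sketched with pandas and numpy alone; this is a hypothetical stand-in for illustration, not the contrib function itself, and the column names are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in mirroring the documented behavior of
# tf.contrib.learn.extract_pandas_data: extract the DataFrame's values
# and cast them to float; any non-numeric, non-bool column is rejected.
def extract_pandas_data_sketch(data):
    bad = [c for c in data.columns
           if not (np.issubdtype(data[c].dtype, np.number)
                   or data[c].dtype == np.bool_)]
    if bad:
        raise ValueError("Columns must be int, float or bool: %s" % bad)
    return data.values.astype(float)

df = pd.DataFrame({"a": [1, 2], "b": [True, False]})
result = extract_pandas_data_sketch(df)  # bools become 1.0 / 0.0
```

A string column would raise ValueError here, matching the documented contract.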

tf.contrib.learn.extract_pandas_labels(labels)

Extract data from pandas.DataFrame for labels.

Args:
  • labels: pandas.DataFrame or pandas.Series containing one column of labels to be extracted.
Returns:

A numpy ndarray of labels from the DataFrame.

Raises:
  • ValueError: if more than one column is found or type is not int, float or bool.
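The single-column contract above can be sketched in plain pandas; this is an illustrative stand-in, not the contrib implementation, and the column name is made up:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the documented behavior of
# tf.contrib.learn.extract_pandas_labels: accept a Series or a
# one-column DataFrame of int, float or bool labels.
def extract_pandas_labels_sketch(labels):
    if isinstance(labels, pd.DataFrame):
        if len(labels.columns) != 1:
            raise ValueError("Only one column of labels is allowed.")
        labels = labels[labels.columns[0]]
    if not (np.issubdtype(labels.dtype, np.number) or labels.dtype == np.bool_):
        raise ValueError("Labels must be int, float or bool.")
    return labels.values

y = extract_pandas_labels_sketch(pd.DataFrame({"y": [0, 1, 1]}))
```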

tf.contrib.learn.extract_pandas_matrix(data)

Extracts a numpy matrix from a pandas DataFrame.

Args:
  • data: pandas.DataFrame containing the data to be extracted.
Returns:

A numpy ndarray of the DataFrame's values.
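Unlike extract_pandas_data, no cast or dtype check is described here; the result is the DataFrame's raw values. A quick illustration (column names are made up):

```python
import pandas as pd

# The documented behavior amounts to taking the DataFrame's
# underlying ndarray directly, without casting.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
matrix = df.values  # rows are DataFrame rows, columns are DataFrame columns
```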


tf.contrib.learn.read_batch_examples(file_pattern, batch_size, reader, randomize_input=True, num_epochs=None, queue_capacity=10000, num_threads=1, read_batch_size=1, parse_fn=None, name=None)

Adds operations to read, queue, and batch Example protos.

Given a file pattern (or list of files), sets up a queue of file names, reads Example protos using the provided reader, and uses a batch queue to create batches of examples of size batch_size.

All queue runners are added to the queue runners collection, and may be started via start_queue_runners.

All ops are added to the default graph.

Use parse_fn if you need to parse or otherwise process single examples.

Args:
  • file_pattern: List of files or pattern of file paths containing Example records. See tf.gfile.Glob for pattern rules.
  • batch_size: An int or scalar Tensor specifying the batch size to use.
  • reader: A function or class that returns an object with a read method, (filename tensor) -> (example tensor).
  • randomize_input: Whether the input should be randomized.
  • num_epochs: Integer specifying the number of times to read through the dataset. If None, cycles through the dataset forever. NOTE - If specified, creates a variable that must be initialized, so call tf.initialize_local_variables() as shown in the tests.
  • queue_capacity: Capacity for input queue.
  • num_threads: The number of threads enqueuing examples.
  • read_batch_size: An int or scalar Tensor specifying the number of records to read at once.
  • parse_fn: Parsing function that takes an Example Tensor and returns a parsed representation. If None, no parsing is done.
  • name: Name of resulting op.
Returns:

A string Tensor of batched Example protos.

Raises:
  • ValueError: for invalid inputs.
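The read-shuffle-batch flow described above can be sketched in plain Python. This is a conceptual stand-in only: the real op uses TF queues and reader threads, and the file layout, names, and trailing-batch handling here are illustrative assumptions:

```python
import glob
import os
import random
import tempfile

# Plain-Python sketch of the described pipeline: resolve a file pattern,
# read one serialized record per line, optionally shuffle, then yield
# fixed-size batches (a trailing partial batch is dropped in this sketch).
def batch_examples_sketch(file_pattern, batch_size,
                          randomize_input=True, parse_fn=None, seed=0):
    records = []
    for path in sorted(glob.glob(file_pattern)):
        with open(path) as f:
            records.extend(line.rstrip("\n") for line in f)
    if randomize_input:
        random.Random(seed).shuffle(records)
    if parse_fn is not None:
        records = [parse_fn(r) for r in records]
    for i in range(0, len(records) - batch_size + 1, batch_size):
        yield records[i:i + batch_size]

# Illustrative usage with a throwaway file of four records.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "part-0.txt"), "w") as f:
    f.write("a\nb\nc\nd\n")
batches = list(batch_examples_sketch(os.path.join(tmp, "part-*.txt"),
                                     batch_size=2, randomize_input=False))
# batches is [['a', 'b'], ['c', 'd']]
```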

tf.contrib.learn.read_batch_features(file_pattern, batch_size, features, reader, randomize_input=True, num_epochs=None, queue_capacity=10000, feature_queue_capacity=100, reader_num_threads=1, parser_num_threads=1, parse_fn=None, name=None)

Adds operations to read, queue, batch and parse Example protos.

Given a file pattern (or list of files), sets up a queue of file names, reads Example protos using the provided reader, uses a batch queue to create batches of examples of size batch_size, and parses each example according to the features specification.

All queue runners are added to the queue runners collection, and may be started via start_queue_runners.

All ops are added to the default graph.

Args:
  • file_pattern: List of files or pattern of file paths containing Example records. See tf.gfile.Glob for pattern rules.
  • batch_size: An int or scalar Tensor specifying the batch size to use.
  • features: A dict mapping feature keys to FixedLenFeature or VarLenFeature values.
  • reader: A function or class that returns an object with a read method, (filename tensor) -> (example tensor).
  • randomize_input: Whether the input should be randomized.
  • num_epochs: Integer specifying the number of times to read through the dataset. If None, cycles through the dataset forever. NOTE - If specified, creates a variable that must be initialized, so call tf.initialize_local_variables() as shown in the tests.
  • queue_capacity: Capacity for input queue.
  • feature_queue_capacity: Capacity of the parsed features queue. Set this value to a small number, for example 5, if the parsed features are large.
  • reader_num_threads: The number of threads to read examples.
  • parser_num_threads: The number of threads to parse examples.
  • parse_fn: Parsing function that takes an Example Tensor and returns a parsed representation. If None, no parsing is done.
  • name: Name of resulting op.
Returns:

A dict of Tensor or SparseTensor objects, one for each key in features.

Raises:
  • ValueError: for invalid inputs.
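The parsing step that distinguishes this op from read_batch_examples can be sketched in plain Python: each batched record is run through a features spec to produce a dict keyed like features. The CSV layout, feature names, and casters below are illustrative assumptions, not part of the contrib API:

```python
import numpy as np

# Plain-Python sketch: turn a batch of serialized records into a dict
# of arrays, one entry per key in the features spec. Here "features"
# maps a name to a caster instead of a FixedLenFeature/VarLenFeature.
def parse_batch_sketch(batch, features):
    columns = {name: [] for name in features}
    for record in batch:
        fields = record.split(",")
        for (name, caster), field in zip(features.items(), fields):
            columns[name].append(caster(field))
    return {name: np.array(vals) for name, vals in columns.items()}

features = {"age": int, "weight": float}   # illustrative spec
batch = ["32,70.5", "41,80.0"]             # illustrative records
parsed = parse_batch_sketch(batch, features)
# parsed["age"] is array([32, 41]); parsed["weight"] is array([70.5, 80.0])
```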

tf.contrib.learn.read_batch_record_features(file_pattern, batch_size, features, randomize_input=True, num_epochs=None, queue_capacity=10000, reader_num_threads=1, parser_num_threads=1, name='dequeue_record_examples')

Reads TFRecords, then queues, batches, and parses Example protos.

See read_batch_examples for a more detailed description.

Args:
  • file_pattern: List of files or pattern of file paths containing Example records. See tf.gfile.Glob for pattern rules.
  • batch_size: An int or scalar Tensor specifying the batch size to use.
  • features: A dict mapping feature keys to FixedLenFeature or VarLenFeature values.
  • randomize_input: Whether the input should be randomized.
  • num_epochs: Integer specifying the number of times to read through the dataset. If None, cycles through the dataset forever. NOTE - If specified, creates a variable that must be initialized, so call tf.initialize_local_variables() as shown in the tests.
  • queue_capacity: Capacity for input queue.
  • reader_num_threads: The number of threads to read examples.
  • parser_num_threads: The number of threads to parse examples.
  • name: Name of resulting op.
Returns:

A dict of Tensor or SparseTensor objects, one for each key in features.

Raises:
  • ValueError: for invalid inputs.