A Reader that outputs keys and tf.Example values from a BigQuery table.
Inherits From: ReaderBase
tf.contrib.cloud.BigQueryReader(
    project_id, dataset_id, table_id, timestamp_millis, num_partitions,
    features=None, columns=None, test_end_point=None, name=None
)
Example use:
import tensorflow as tf

# PROJECT, DATASET, TABLE, TIME and NUM_PARTITIONS are placeholders.
# Assume a BigQuery table has the following schema,
#     name STRING,
#     age INT,
#     state STRING

# Create the parse_example dict of features.
features = dict(
    name=tf.io.FixedLenFeature([1], tf.string),
    # parse_example accepts only float32, int64 and string features.
    age=tf.io.FixedLenFeature([1], tf.int64),
    state=tf.io.FixedLenFeature([1], dtype=tf.string, default_value="UNK"))

# Create a Reader.
reader = tf.contrib.cloud.BigQueryReader(project_id=PROJECT,
                                         dataset_id=DATASET,
                                         table_id=TABLE,
                                         timestamp_millis=TIME,
                                         num_partitions=NUM_PARTITIONS,
                                         features=features)

# Populate a queue with the BigQuery Table partitions.
queue = tf.compat.v1.train.string_input_producer(reader.partitions())

# Read and parse examples.
row_id, examples_serialized = reader.read(queue)
examples = tf.io.parse_example(examples_serialized, features=features)

# Process the Tensors examples["name"], examples["age"], etc...
Note that creating a reader requires a snapshot timestamp, which lets
the reader see a consistent snapshot of the table.
For more information, see 'Table Decorators' in the BigQuery docs.
See ReaderBase for supported methods.
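The snapshot timestamp is an absolute time in milliseconds since the epoch. A minimal sketch of one way to compute such a value with the standard library follows; the helper name `snapshot_timestamp_millis` is hypothetical and is not part of TensorFlow or BigQuery.

```python
import datetime


def snapshot_timestamp_millis(moment):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch.

    Hypothetical helper for illustration only.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    # Dividing two timedeltas with // yields an integer count of milliseconds.
    millis = (moment - epoch) // datetime.timedelta(milliseconds=1)
    # Relative (negative or zero) snapshot times are not allowed.
    if millis <= 0:
        raise ValueError("snapshot time must be a positive absolute timestamp")
    return millis


# Snapshot the table as of the current moment.
TIME = snapshot_timestamp_millis(datetime.datetime.now(datetime.timezone.utc))
```

The resulting integer could then be passed as `timestamp_millis` when constructing the reader.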
Args
project_id | GCP project ID.
dataset_id | BigQuery dataset ID.
table_id | BigQuery table ID.
timestamp_millis | Timestamp at which to snapshot the table, in milliseconds since the epoch. Relative (negative or zero) snapshot times are not allowed. For more details, see 'Table Decorators' in the BigQuery docs.
num_partitions | Number of non-overlapping partitions to read from.
features | parse_example-compatible dict mapping keys to VarLenFeature and FixedLenFeature objects. Keys are read as columns from the table.
columns | List of columns to read. Must be set if and only if features is None.
test_end_point | Used only for testing purposes (optional).
name | A name for the operation (optional).
Raises
TypeError | If features is neither None nor a dict, if columns is neither None nor a list, or if features and columns are both None or both set.
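The three TypeError conditions amount to a type check on each argument plus an "exactly one of the two" check. The sketch below mirrors those documented rules in plain Python; `check_reader_columns` is a hypothetical helper written for illustration and is not the library's own validation code.

```python
def check_reader_columns(features=None, columns=None):
    """Apply the documented TypeError rules and return the column names.

    Hypothetical helper; not the TensorFlow implementation.
    """
    if features is not None and not isinstance(features, dict):
        raise TypeError("features must be None or a dict")
    if columns is not None and not isinstance(columns, list):
        raise TypeError("columns must be None or a list")
    # Exactly one of features and columns must be provided.
    if (features is None) == (columns is None):
        raise TypeError("exactly one of features and columns must be set")
    # With features, the dict keys name the columns to read.
    return list(features) if features is not None else list(columns)
```

For example, `check_reader_columns(columns=["name", "age"])` passes, while omitting both arguments or passing both raises TypeError.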
Attributes
reader_ref | Op that implements the reader.
supports_serialize | Whether the Reader implementation can serialize its state.
Methods
num_records_produced
num_records_produced(
    name=None
)
Returns the number of records this reader has produced.
This is the same as the number of Read executions that have
succeeded.
Args
name | A name for the operation (optional).
num_work_units_completed
num_work_units_completed(
    name=None
)
Returns the number of work units this reader has finished processing.
Args
name | A name for the operation (optional).
partitions
partitions(
    name=None
)
Returns serialized BigQueryTablePartition messages.
These messages represent a non-overlapping division of a table for a
bulk read.
Args
name | A name for the operation (optional).
Returns
1-D string Tensor of serialized BigQueryTablePartition messages.
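To illustrate what a non-overlapping division for a bulk read can look like, here is a plain-Python sketch that splits a row range into contiguous pieces. It is only an assumed model: the function `split_rows` is hypothetical, and neither the actual partitioning scheme nor the contents of a BigQueryTablePartition message are specified in this section.

```python
def split_rows(num_rows, num_partitions):
    """Split rows [0, num_rows) into contiguous, non-overlapping ranges.

    Returns a list of half-open (start, end) pairs. Hypothetical helper.
    """
    if num_rows < 0 or num_partitions <= 0:
        raise ValueError("num_rows must be >= 0 and num_partitions > 0")
    base, extra = divmod(num_rows, num_partitions)
    ranges = []
    start = 0
    for index in range(num_partitions):
        # The first `extra` partitions take one additional row each.
        size = base + (1 if index < extra else 0)
        if size == 0:
            continue  # Skip empty partitions when rows run out.
        ranges.append((start, start + size))
        start += size
    return ranges
```

Because each range starts where the previous one ends, every row falls into exactly one partition, which is the property a bulk read relies on.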
read
read(
    queue, name=None
)
Returns the next record (key, value) pair produced by a reader.
Will dequeue a work unit from queue if necessary (e.g. when the
Reader needs to start reading from a new file since it has
finished with the previous file).
Args
queue | A Queue or a mutable string Tensor representing a handle to a Queue, with string work items.
name | A name for the operation (optional).
Returns
A tuple of Tensors (key, value).
key | A string scalar Tensor.
value | A string scalar Tensor.
read_up_to
read_up_to(
    queue, num_records, name=None
)
Returns up to num_records (key, value) pairs produced by a reader.
Will dequeue a work unit from queue if necessary (e.g., when the
Reader needs to start reading from a new file since it has
finished with the previous file).
It may return fewer than num_records records, even before the last batch.
Args
queue | A Queue or a mutable string Tensor representing a handle to a Queue, with string work items.
num_records | Number of records to read.
name | A name for the operation (optional).
Returns
A tuple of Tensors (keys, values).
keys | A 1-D string Tensor.
values | A 1-D string Tensor.
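The interplay between read, read_up_to, and the work queue can be easier to follow in a toy model. The sketch below is plain Python and not the TensorFlow implementation: the class `ToyReader`, its key format, and the assumption that a batch never spans two work units are all illustrative choices.

```python
from collections import deque


class ToyReader:
    """Toy model of a queue-driven reader (illustrative assumption only)."""

    def __init__(self, work_units):
        # work_units maps a work-unit name to the list of values it holds.
        self._work_units = work_units
        self._pending = deque()  # (key, value) pairs left in the current unit
        self.num_records_produced = 0
        self.num_work_units_completed = 0

    def read(self, queue):
        # Dequeue work units only when the current one is exhausted.
        # popleft() raises IndexError once the queue itself is empty.
        while not self._pending:
            name = queue.popleft()
            self._pending.extend(
                ("%s:%d" % (name, i), value)
                for i, value in enumerate(self._work_units[name]))
            if not self._pending:
                self.num_work_units_completed += 1  # Empty work unit.
        record = self._pending.popleft()
        self.num_records_produced += 1
        if not self._pending:
            self.num_work_units_completed += 1
        return record

    def read_up_to(self, queue, num_records):
        if num_records < 1:
            raise ValueError("num_records must be at least 1")
        keys, values = [], []
        key, value = self.read(queue)  # May dequeue a new work unit.
        keys.append(key)
        values.append(value)
        # Keep reading only while the current work unit still has records,
        # so a batch can be short even before the final batch.
        while self._pending and len(keys) < num_records:
            key, value = self.read(queue)
            keys.append(key)
            values.append(value)
        return keys, values


queue = deque(["p0", "p1"])
reader = ToyReader({"p0": ["a", "b", "c"], "p1": ["d", "e"]})
first = reader.read_up_to(queue, 2)   # (["p0:0", "p0:1"], ["a", "b"])
second = reader.read_up_to(queue, 2)  # (["p0:2"], ["c"]), a short batch
```

In this model the second call returns a single record because work unit "p0" ran out, even though "p1" is still waiting in the queue.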
reset
reset(
    name=None
)
Restore a reader to its initial clean state.
Args
name | A name for the operation (optional).
Returns
The created Operation.
restore_state
restore_state(
    state, name=None
)
Restore a reader to a previously saved state.
Not all Readers support being restored, so this can produce an
Unimplemented error.
Args
state | A string Tensor. Result of a SerializeState of a Reader with matching type.
name | A name for the operation (optional).
Returns
The created Operation.
serialize_state
serialize_state(
    name=None
)
Produce a string tensor that encodes the state of a reader.
Not all Readers support being serialized, so this can produce an
Unimplemented error.
Args
name | A name for the operation (optional).