tfio.bigquery.BigQueryReadSession

Class BigQueryReadSession

Entry point for reading data from Cloud BigQuery.

__init__

__init__(
    parent,
    project_id,
    table_id,
    dataset_id,
    selected_fields,
    output_types,
    row_restriction,
    requested_streams,
    streams,
    avro_schema,
    client_resource
)

Initialize self. See help(type(self)) for accurate signature.

Methods

get_streams

get_streams()

Returns a Tensor with the names of the streams used for reading data from BigQuery.

Returns:

Tensor with stream names.

parallel_read_rows

parallel_read_rows(
    cycle_length=None,
    sloppy=False,
    block_length=1
)

Retrieves rows from the BigQuery service in parallel streams.

from tensorflow_io.bigquery import BigQueryClient

bq_client = BigQueryClient()
bq_read_session = bq_client.read_session(...)
ds1 = bq_read_session.parallel_read_rows(...)

Args:

  • cycle_length: Number of threads to run in parallel. If not specified, it defaults to the number of streams in the read session.
  • sloppy: If False, elements are produced in deterministic order. Otherwise, the implementation may produce elements in a non-deterministic order for the sake of speed.
  • block_length: The number of consecutive elements to pull from an input Dataset before advancing to the next input Dataset.

Returns:

A tf.data.Dataset that yields the rows read from BigQuery.

Raises:

  • ValueError: If the configured probability is unexpected.
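
The interaction between cycle_length and block_length is easier to see on plain lists. The sketch below is a simplified pure-Python model of deterministic interleaving (the sloppy=False case), assuming parallel_read_rows follows the usual tf.data interleave semantics that the argument descriptions suggest. The function interleave_order and the sample streams are hypothetical and are not part of tensorflow_io; the real implementation may differ in details such as how exhausted streams are replaced.

```python
from itertools import islice


def interleave_order(streams, cycle_length=None, block_length=1):
    """Model the element order of a deterministic interleave.

    Hypothetical helper for illustration only; not a tensorflow_io API.
    `streams` is a list of lists, one inner list per read stream.
    """
    if not streams:
        return []
    if cycle_length is None:
        # Mirrors the documented default: one slot per stream.
        cycle_length = len(streams)
    if cycle_length < 1 or block_length < 1:
        raise ValueError("cycle_length and block_length must be >= 1")

    pending = iter(streams)
    # Open the first `cycle_length` streams.
    active = [iter(s) for s in islice(pending, cycle_length)]
    out = []
    while active:
        next_active = []
        for it in active:
            # Pull up to `block_length` consecutive elements from this stream.
            block = list(islice(it, block_length))
            out.extend(block)
            if len(block) == block_length:
                next_active.append(it)
            else:
                # Stream exhausted: hand its slot to the next unopened stream.
                replacement = next(pending, None)
                if replacement is not None:
                    next_active.append(iter(replacement))
        active = next_active
    return out


stream_a = ["a1", "a2", "a3"]
stream_b = ["b1", "b2", "b3"]

one_at_a_time = interleave_order([stream_a, stream_b], block_length=1)
two_at_a_time = interleave_order([stream_a, stream_b], block_length=2)
single_slot = interleave_order([stream_a, stream_b], cycle_length=1)
```

In this model, block_length=1 alternates between the two streams, block_length=2 takes two rows from each stream before moving on, and cycle_length=1 drains the first stream completely before opening the second. With sloppy=True the real dataset is allowed to deviate from any such fixed order.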

read_rows

read_rows(stream)

Retrieves rows (including values) from the BigQuery service.

Args:

  • stream: name of the stream to read from.

Returns:

A tf.data.Dataset that yields the rows read from BigQuery.

Raises:

  • ValueError: If the configured probability is unexpected.