tfio.bigquery.BigQueryClient

BigQueryClient is the entrypoint for interacting with Cloud BigQuery in TF.

BigQueryClient encapsulates a connection to Cloud BigQuery and exposes the read_session method, which initiates a BigQuery read session.

Child Classes

class DataFormat

class FieldMode

Methods

read_session

Opens a session and returns a BigQueryReadSession object.

Args
parent String of the form projects/{project_id} indicating the project this ReadSession is associated with. This is the project that will be billed for usage.
project_id The assigned ID of the project that contains the dataset.
table_id The ID of the table in the dataset.
dataset_id The ID of the dataset in the project.
selected_fields A list or a dict. If a list, it contains the names of the table fields to read. If a dict, it should have the form:
    {
      "field_a_name": {"mode": "repeated", "output_type": dtypes.int64},
      "field_b_name": {"mode": "nullable", "output_type": dtypes.string},
      ...
      "field_x_name": {"mode": "repeated", "output_type": dtypes.string}
    }
  "mode" is the BigQuery column attribute and can be 'repeated', 'nullable' or 'required'. The output field order is unrelated to the order of fields in selected_fields. If "mode" is not specified, it defaults to "nullable". If "output_type" is not specified, DT_STRING is implied for all Tensors.
output_types Types for the output tensors, in the same order as selected_fields. Needed only when selected_fields is a list; if selected_fields is a dict, the output types are given in selected_fields as described above. If not specified, DT_STRING is implied for all Tensors.
row_restriction Optional. SQL text filtering statement, similar to a WHERE clause in a query.
requested_streams Initial number of streams. Must be non-negative. If unset or 0, a number of streams is chosen to produce reasonable throughput. The actual number of streams may be lower than the requested number, depending on the amount of parallelism that is reasonable for the table and the maximum parallelism allowed by the system.
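
The defaults described for selected_fields and output_types can be illustrated with a small pure-Python sketch. The helper name normalize_selected_fields is hypothetical and is not part of the library; type names are plain strings so the sketch runs without TensorFlow, with "string" standing in for DT_STRING. Applying the "nullable" default to the list form and checking that both lists have the same length are assumptions made for illustration.

```python
def normalize_selected_fields(selected_fields, output_types=None):
    """Hypothetical helper: apply the documented defaults to selected_fields.

    Returns a dict mapping field name -> {"mode": ..., "output_type": ...}.
    """
    default_mode = "nullable"   # documented default for "mode"
    default_type = "string"     # stand-in for DT_STRING

    if isinstance(selected_fields, dict):
        # Dict form: each field carries its own optional mode and output_type.
        return {
            name: {
                "mode": spec.get("mode", default_mode),
                "output_type": spec.get("output_type", default_type),
            }
            for name, spec in selected_fields.items()
        }

    # List form: output_types, if given, runs parallel to the field names.
    names = list(selected_fields)
    if output_types is None:
        output_types = [default_type] * len(names)
    if len(output_types) != len(names):
        # Assumption: a length mismatch is treated as an error in this sketch.
        raise ValueError("output_types must match selected_fields in length")
    return {
        name: {"mode": default_mode, "output_type": dtype}
        for name, dtype in zip(names, output_types)
    }
```

Both forms end up describing the same thing per field: a mode and an output type, with string output assumed wherever no type is given.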

Returns
A BigQueryReadSession Python object representing the operations available on the table.
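
A call to read_session might look roughly like the sketch below. The wrapper name open_read_session is hypothetical; the client is passed in as an argument so the sketch does not assume how a BigQueryClient is constructed. It assumes the argument names listed above are accepted as keyword arguments, that the billed project is the same project that holds the table, and that output_types may be omitted so that all fields are read as strings.

```python
def open_read_session(client, project_id, dataset_id, table_id, field_names,
                      row_restriction="", requested_streams=0):
    """Hypothetical wrapper around client.read_session.

    `client` is expected to be a BigQueryClient instance.
    """
    return client.read_session(
        # Assumption: bill the same project that contains the table.
        parent="projects/" + project_id,
        project_id=project_id,
        table_id=table_id,
        dataset_id=dataset_id,
        # List form of selected_fields: field names only, so every
        # output tensor defaults to the string type.
        selected_fields=list(field_names),
        # Filter text similar to a WHERE clause; empty means no filter here.
        row_restriction=row_restriction,
        # 0 leaves the choice of stream count to the service.
        requested_streams=requested_streams,
    )
```

The return value is whatever read_session returns, which per the description above is a BigQueryReadSession object.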