tfio.bigquery.BigQueryClient

BigQueryClient is the entry point for interacting with Cloud BigQuery in TensorFlow.

BigQueryClient encapsulates a connection to Cloud BigQuery and exposes the read_session method, which starts a BigQuery read session.
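The sketch below wraps a typical read_session call. It assumes the same project both owns the table and is billed for the read. The helper name build_read_session is hypothetical and not part of tensorflow_io. It takes the client as a parameter, so the code runs without credentials. Any object with a compatible read_session method works, and in real use you would pass tensorflow_io.bigquery.BigQueryClient().

```python
def build_read_session(client, project_id, dataset_id, table_id,
                       selected_fields, output_types=None,
                       row_restriction="", requested_streams=2):
    """Hypothetical helper: open a read session on project_id.dataset_id.table_id.

    `client` is expected to behave like tensorflow_io.bigquery.BigQueryClient.
    Assumption: the project that owns the table is also the billed project.
    """
    if requested_streams < 1:
        raise ValueError("requested_streams must be a positive number")
    # `parent` is a resource name of the form projects/{project_id},
    # not a bare project ID.
    parent = f"projects/{project_id}"
    return client.read_session(
        parent=parent,
        project_id=project_id,
        table_id=table_id,
        dataset_id=dataset_id,
        selected_fields=selected_fields,
        output_types=output_types,
        row_restriction=row_restriction,
        requested_streams=requested_streams,
    )
```

With the real client, first run `from tensorflow_io.bigquery import BigQueryClient`. Then a call might look like `build_read_session(BigQueryClient(), "my-project", "my_dataset", "my_table", ["name", "age"], [tf.string, tf.int64])`. The project, dataset, table, and field names in that call are placeholders.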

Child Classes

class DataFormat

class FieldMode

Methods

read_session

Opens a session and returns a BigQueryReadSession object.

Args
parent String of the form projects/{project_id} indicating the project this ReadSession is associated with. This is the project that will be billed for usage.
project_id The ID of the project that contains the table.
table_id The ID of the table in the dataset.
dataset_id The ID of the dataset in the project.
selected_fields This can be a list or a dict. If a list, it holds the names of the table fields that should be read. If a dict, it should have a form like:
{
  "field_a_name": {"mode": "repeated", "output_type": dtypes.int64},
  "field_b_name": {"mode": "nullable", "output_type": dtypes.int32, "default_value": 0},
  ...
  "field_x_name": {"mode": "repeated", "output_type": dtypes.string, "default_value": ""}
}
"mode" is the BigQuery column attribute and can be 'repeated', 'nullable' or 'required'. If "mode" is not specified, it defaults to "nullable". If "output_type" is not specified, DT_STRING is implied. The order of fields in the output is unrelated to the order of fields in selected_fields.
output_types Types for the output tensors, in the same order as selected_fields. This is only needed when selected_fields is a list. When selected_fields is a dict, the type information is included in selected_fields as described above. If not specified, DT_STRING is implied for all tensors.
default_values Default values to use when the underlying value is null, in the same order as selected_fields. If not specified, meaningful defaults are used (0 for numerics, an empty string for strings, and False for booleans).
row_restriction Optional. SQL text filtering statement, similar to a WHERE clause in a query.
requested_streams The desired number of streams that can be read in parallel. Must be a positive number. The number of streams BigQuery actually returns may be lower, depending on the amount of parallelism that is reasonable for the table and the maximum amount of parallelism the system allows.
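The list form (selected_fields plus positional output_types and default_values) and the dict form carry the same information. The sketch below illustrates the defaulting rules described above by converting one form to the other. These helpers are hypothetical and not part of tensorflow_io. Two assumptions apply. First, type names are plain strings ("int64", "string", ...) standing in for tf.dtypes values, so the code runs without TensorFlow. Second, the sketch assumes a missing "default_value" in the dict form falls back to the same defaults documented for default_values.

```python
# Hypothetical helpers, not part of tensorflow_io. Type names are plain
# strings standing in for tf.dtypes values (e.g. "int64" for dtypes.int64).
VALID_MODES = {"repeated", "nullable", "required"}
NUMERIC_TYPES = {"int32", "int64", "float32", "float64"}


def implied_default(output_type):
    """Null replacement: 0 for numerics, False for booleans, "" for strings."""
    if output_type in NUMERIC_TYPES:
        return 0
    if output_type == "bool":
        return False
    return ""


def normalize_selected_fields(selected_fields):
    """Fill in documented defaults for the dict form of selected_fields."""
    normalized = {}
    for name, spec in selected_fields.items():
        mode = spec.get("mode", "nullable")  # "nullable" when unspecified
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode {mode!r} for field {name!r}")
        output_type = spec.get("output_type", "string")  # DT_STRING implied
        normalized[name] = {
            "mode": mode,
            "output_type": output_type,
            # Assumption: same fallback as the list-form default_values.
            "default_value": spec.get("default_value", implied_default(output_type)),
        }
    return normalized


def list_form_to_dict_form(fields, output_types=None, default_values=None):
    """Pair positional output_types/default_values with field names."""
    if output_types is None:
        output_types = ["string"] * len(fields)
    if len(output_types) != len(fields):
        raise ValueError("output_types must have one entry per field")
    spec = {name: {"output_type": t} for name, t in zip(fields, output_types)}
    if default_values is not None:
        if len(default_values) != len(fields):
            raise ValueError("default_values must have one entry per field")
        for name, value in zip(fields, default_values):
            spec[name]["default_value"] = value
    return normalize_selected_fields(spec)
```

For example, `list_form_to_dict_form(["x", "y"], ["bool", "float64"])` yields nullable fields whose null replacements are False and 0 respectively.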

Returns
A BigQueryReadSession Python object representing the operations available on the table.
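A common next step is to read rows from the returned session. In the tensorflow_io BigQuery tutorial, this is done with read_session.parallel_read_rows(), which produces a tf.data.Dataset. Each element is, as far as we know, a dict-like mapping from field name to tensor. The hypothetical helper below splits such a row into features and a label for training. It is plain Python, so it can be tested with ordinary dicts.

```python
def split_features_and_label(row, label_field):
    """Hypothetical helper: split one row (field name -> value) into (features, label)."""
    features = dict(row)  # copy so the caller's row is not mutated
    label = features.pop(label_field)
    return features, label
```

Assuming a field named "label" exists, the helper could be applied as `read_session.parallel_read_rows().map(lambda row: split_features_and_label(row, "label"))`. tf.data passes a dict element to the mapped function as a single argument, so the lambda receives the whole row.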