Compute data statistics from TFRecord files containing TFExamples.

Used in the notebooks

Used in the tutorials

Runs a Beam pipeline to compute the data statistics and return the result data statistics proto.

This is a convenience method for users with data in TFRecord format. Users with data in unsupported file/data formats, or users who wish to create their own Beam pipelines need to use the 'GenerateStatistics' PTransform API directly instead.

data_location The location of the input data files.
output_path The file path to output data statistics result to. If None, we use a temporary directory. It will be a TFRecord file containing a single data statistics proto, and can be read with the 'load_statistics' API. If you run this function on Google Cloud, you must specify an output_path. Specifying None may cause an error.
stats_options tfdv.StatsOptions for generating data statistics.
pipeline_options Optional beam pipeline options. This allows users to specify various beam pipeline execution parameters like pipeline runner (DirectRunner or DataflowRunner), cloud dataflow service project id, etc. See for more details.

A DatasetFeatureStatisticsList proto.