Compute data statistics from TFRecord files containing TFExamples.

Runs a Beam pipeline to compute the data statistics and returns the resulting data statistics proto.

This is a convenience method for users with data in TFRecord format. Users with data in unsupported file or data formats, or users who wish to create their own Beam pipelines, need to use the 'GenerateStatistics' PTransform API directly instead.


Args:

  • data_location: The location of the input data files.
  • output_path: The file path to write the data statistics result to. If None, a temporary directory is used. The output is a TFRecord file containing a single data statistics proto, and can be read with the 'load_statistics' API.
  • stats_options: tfdv.StatsOptions for generating data statistics.
  • pipeline_options: Optional Beam pipeline options. These let users specify Beam pipeline execution parameters such as the pipeline runner (DirectRunner or DataflowRunner), the Cloud Dataflow service project ID, etc. See the Beam pipeline options documentation for more details.


Returns:

A DatasetFeatureStatisticsList proto.
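
The following is a minimal sketch of how the arguments described above might be passed together, assuming the function is exposed as `tfdv.generate_statistics_from_tfrecord` and that Apache Beam is installed. The wrapper name `compute_tfrecord_stats` and its `runner` parameter are hypothetical and exist only for illustration. The imports are done inside the function so that defining it does not require either library.

```python
def compute_tfrecord_stats(data_location, output_path=None, runner="DirectRunner"):
    """Hypothetical wrapper: compute statistics for TFRecord files of TFExamples.

    data_location: location (file pattern) of the input TFRecord files.
    output_path:   where to write the statistics proto; None means a
                   temporary directory is used.
    runner:        Beam runner name, e.g. "DirectRunner" or "DataflowRunner".
    """
    # Imported lazily so this module can be loaded without TFDV/Beam installed.
    import tensorflow_data_validation as tfdv
    from apache_beam.options.pipeline_options import PipelineOptions

    # Beam execution parameters are passed as command-line style flags.
    pipeline_options = PipelineOptions([f"--runner={runner}"])

    # Default statistics options; customize tfdv.StatsOptions as needed.
    stats_options = tfdv.StatsOptions()

    # Runs the Beam pipeline and returns a DatasetFeatureStatisticsList proto.
    return tfdv.generate_statistics_from_tfrecord(
        data_location=data_location,
        output_path=output_path,
        stats_options=stats_options,
        pipeline_options=pipeline_options,
    )
```

A call such as `compute_tfrecord_stats("/path/to/data*.tfrecord", output_path="/path/to/stats.tfrecord")` (both paths are placeholders) would return the proto directly; if an `output_path` was given, the written file can later be read back with `tfdv.load_statistics(output_path)`.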