Compute data statistics from TFRecord files containing TFExamples.
tfdv.generate_statistics_from_tfrecord(
    data_location: Text,
    output_path: Optional[bytes] = None,
    stats_options: tfdv.StatsOptions = options.StatsOptions(),
    pipeline_options: Optional[PipelineOptions] = None,
    compression_type: Text = CompressionTypes.AUTO
) -> statistics_pb2.DatasetFeatureStatisticsList
Runs a Beam pipeline to compute the data statistics and returns the resulting
data statistics proto.

This is a convenience method for users with data in TFRecord format. Users with
data in an unsupported file or data format, or who wish to create their own
Beam pipelines, should instead use the 'GenerateStatistics' PTransform API
directly.
Args:
  data_location: The location of the input data files.
  output_path: The file path to which to write the data statistics result. If
    None, a temporary directory is used. The output is a TFRecord file
    containing a single data statistics proto, and can be read with the
    'load_statistics' API. If you run this function on Google Cloud, you must
    specify an output_path; specifying None may cause an error.
  stats_options: tfdv.StatsOptions for generating data statistics.
  pipeline_options: Optional Beam pipeline options. This allows users to
    specify various Beam pipeline execution parameters, such as the pipeline
    runner (DirectRunner or DataflowRunner) and the Cloud Dataflow service
    project id. See
    https://cloud.google.com/dataflow/pipelines/specifying-exec-params for
    more details.
  compression_type: Used to handle compressed input files. The default value
    is CompressionTypes.AUTO, in which case the file path's extension is used
    to detect the compression.

Returns:
  A DatasetFeatureStatisticsList proto.
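A minimal usage sketch, assuming tensorflow_data_validation and its Apache Beam
dependency are installed; the data and output paths below are illustrative
placeholders, not paths from this documentation:

```python
import tensorflow_data_validation as tfdv

# Illustrative paths: point data_location at your own TFRecord files.
stats = tfdv.generate_statistics_from_tfrecord(
    data_location='/data/train-*.tfrecord',      # hypothetical input glob
    output_path='/tmp/train_stats.tfrecord',     # hypothetical output file
    stats_options=tfdv.StatsOptions(sample_rate=0.1),  # sample 10% of examples
)

# The return value is a DatasetFeatureStatisticsList proto; the same proto
# can be reloaded later from output_path with tfdv.load_statistics.
print(stats.datasets[0].num_examples)
```

Because the pipeline runs with DirectRunner by default, this executes locally;
pass a configured PipelineOptions (e.g. with DataflowRunner) to run the same
computation on Cloud Dataflow.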