Validates TFExamples in TFRecord files.

Runs a Beam pipeline to detect anomalies on a per-example basis. If this function detects anomalous examples, it generates summary statistics regarding the set of examples that exhibit each anomaly.

This is a convenience function for users with data in TFRecord format. Users with data in unsupported file or data formats, or users who want to build their own Beam pipelines, should use the 'IdentifyAnomalousExamples' PTransform API directly instead.

data_location The location of the input data files.
stats_options tfdv.StatsOptions for generating data statistics. This must contain a schema.
output_path The file path to which the data statistics result is written. If None, the function uses a temporary directory. The output is a TFRecord file containing a single data statistics list proto, which can be read with the 'load_statistics' function. If you run this function on Google Cloud, you must specify an output_path; passing None may cause an error.
pipeline_options Optional Beam pipeline options. These let users specify Beam pipeline execution parameters such as the pipeline runner (DirectRunner or DataflowRunner), the Cloud Dataflow service project ID, etc.
num_sampled_examples If set, returns up to this many examples of each anomaly type as a map from anomaly reason string to a list of tf.Examples.
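The arguments above can be put together as in the following minimal sketch. It assumes this page documents `tfdv.validate_examples_in_tfrecord` from the `tensorflow_data_validation` package (the text does not name the function). The wrapper name `validate_tfrecords` and its `schema` parameter are hypothetical and exist only for illustration.

```python
from typing import Any, Optional


def validate_tfrecords(
    data_location: str,
    schema: Any,
    output_path: Optional[str] = None,
    num_sampled_examples: int = 0,
) -> Any:
    """Runs per-example validation over TFRecord files (illustrative wrapper)."""
    # Imported lazily so this module can be loaded without TFDV installed.
    import tensorflow_data_validation as tfdv

    # The stats options must carry a schema; without one a ValueError is raised.
    stats_options = tfdv.StatsOptions(schema=schema)

    # Assumption: the documented function is tfdv.validate_examples_in_tfrecord.
    # pipeline_options is left at its default here.
    return tfdv.validate_examples_in_tfrecord(
        data_location=data_location,
        stats_options=stats_options,
        output_path=output_path,
        num_sampled_examples=num_sampled_examples,
    )
```

When running on Google Cloud, pass an explicit `output_path` rather than relying on the temporary-directory default.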

If num_sampled_examples is zero, returns a single DatasetFeatureStatisticsList proto in which each dataset consists of the set of examples that exhibit a particular anomaly. If num_sampled_examples is nonzero, returns the same statistics proto as well as a mapping from anomaly to a list of tf.Examples that exhibited that anomaly.
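Because the return shape depends on num_sampled_examples, callers may find it convenient to normalize it. The helper below is a hypothetical sketch, not part of the library. It assumes that in the nonzero case the two values come back as a (statistics, samples) tuple in that order, which the description implies but does not state outright.

```python
from typing import Any, Dict, List, Tuple


def split_validation_result(
    result: Any, num_sampled_examples: int
) -> Tuple[Any, Dict[str, List[Any]]]:
    """Normalizes the return value to a (statistics, samples) pair."""
    if num_sampled_examples == 0:
        # Only the statistics proto is returned; no samples were requested.
        return result, {}
    # Assumed shape: (statistics proto, {anomaly reason: [tf.Example, ...]}).
    statistics, samples = result
    return statistics, dict(samples)
```

With this helper, downstream code can always unpack two values, and an empty dict signals that no sampling was requested.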

ValueError If the specified stats_options does not include a schema.
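To surface this error before any pipeline work starts, a caller could check the options up front. The following is a sketch only: `require_schema` is a hypothetical helper, and it assumes the options object exposes its schema through a `schema` attribute.

```python
from typing import Any


def require_schema(stats_options: Any) -> Any:
    """Returns stats_options unchanged, or raises if it carries no schema."""
    # Assumption: the schema is exposed as a `schema` attribute.
    if getattr(stats_options, "schema", None) is None:
        raise ValueError("stats_options must include a schema.")
    return stats_options
```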