tfdv.validate_examples_in_tfrecord

Validates TFExamples in TFRecord files.

tfdv.validate_examples_in_tfrecord(
    data_location: Text,
    stats_options: tfdv.StatsOptions,
    output_path: Optional[Text] = None,
    pipeline_options: Optional[PipelineOptions] = None,
    num_sampled_examples=0
) -> Union[
    statistics_pb2.DatasetFeatureStatisticsList,
    Tuple[
        statistics_pb2.DatasetFeatureStatisticsList,
        Mapping[str, List[tf.train.Example]]
    ]
]

Runs a Beam pipeline to detect anomalies on a per-example basis. If this function detects anomalous examples, it generates summary statistics regarding the set of examples that exhibit each anomaly.

This is a convenience function for users with data in TFRecord format. Users whose data is in an unsupported file or data format, or who want to build their own Beam pipelines, should use the `IdentifyAnomalousExamples` PTransform API directly instead.
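
A minimal sketch of a local call is shown here. The wrapper name `validate_tfrecords` and the `schema_path` argument are hypothetical, and the schema is assumed to be stored as a text-format proto readable with `tfdv.load_schema_text`. The TFDV import is deferred into the function body so the snippet can be defined without the library being installed.

```python
def validate_tfrecords(data_location, schema_path, output_path=None):
    """Hypothetical wrapper: validate TFRecord files against a stored schema."""
    # Deferred import: TFDV is only required when the function is called.
    import tensorflow_data_validation as tfdv

    # Assumption: the schema lives in a text-format proto file.
    schema = tfdv.load_schema_text(schema_path)

    # The schema must be attached to the StatsOptions passed in.
    stats_options = tfdv.StatsOptions(schema=schema)

    return tfdv.validate_examples_in_tfrecord(
        data_location=data_location,
        stats_options=stats_options,
        output_path=output_path,
    )
```

A call might look like `validate_tfrecords("data/*.tfrecord", "schema.pbtxt")`, where both paths are placeholders.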

Args
`data_location`	The location of the input data files.
`stats_options`	`tfdv.StatsOptions` for generating data statistics. This must contain a schema.
`output_path`	The file path to which the data statistics result is written. If None, the function uses a temporary directory. The output is a TFRecord file containing a single data statistics list proto, and can be read with the 'load_statistics' function. If you run this function on Google Cloud, you must specify an output_path; passing None may cause an error.
`pipeline_options`	Optional Beam pipeline options. These let users specify Beam pipeline execution parameters such as the pipeline runner (DirectRunner or DataflowRunner), the Cloud Dataflow service project ID, etc. See https://cloud.google.com/dataflow/pipelines/specifying-exec-params for more details.
`num_sampled_examples`	If nonzero, the function also returns up to this many examples of each anomaly type, as a map from anomaly reason string to a list of tf.Examples.
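
As a sketch of how `pipeline_options` might be assembled, the helper below builds a list of Beam command-line flags and wraps it in a `PipelineOptions` object. The helper names `build_beam_flags` and `make_pipeline_options` are hypothetical, and the project ID and bucket in the usage note are placeholders. The Beam import is deferred so the flag builder works without Beam installed.

```python
from typing import List, Optional


def build_beam_flags(runner: str = "DirectRunner",
                     project: Optional[str] = None,
                     temp_location: Optional[str] = None) -> List[str]:
    """Hypothetical helper: assemble Beam flags for the chosen runner."""
    flags = [f"--runner={runner}"]
    if project:
        flags.append(f"--project={project}")
    if temp_location:
        flags.append(f"--temp_location={temp_location}")
    return flags


def make_pipeline_options(flags: List[str]):
    """Wrap the flags in a Beam PipelineOptions object."""
    # Deferred import: Apache Beam is only needed when this is called.
    from apache_beam.options.pipeline_options import PipelineOptions
    return PipelineOptions(flags)
```

For a Dataflow run, something like `make_pipeline_options(build_beam_flags("DataflowRunner", project="my-project", temp_location="gs://my-bucket/tmp"))` could be passed as `pipeline_options`, together with an explicit `output_path`.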

Returns
If num_sampled_examples is zero, returns a single DatasetFeatureStatisticsList proto in which each dataset consists of the set of examples that exhibit a particular anomaly. If num_sampled_examples is nonzero, returns the same statistics proto as well as a mapping from anomaly to a list of tf.Examples that exhibited that anomaly.
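
Because the return type depends on `num_sampled_examples`, calling code may want to normalize the result. The sketch below uses two hypothetical helpers: `split_validation_result` always yields a `(statistics, samples)` pair, and `count_examples_per_dataset` reads the `name` and `num_examples` fields of each dataset in the statistics proto. That each dataset name identifies the anomaly is an assumption based on the description above.

```python
from typing import Any, Dict, List, Tuple


def split_validation_result(result: Any) -> Tuple[Any, Dict[str, List[Any]]]:
    """Hypothetical helper: return (statistics, samples) for either return form."""
    if isinstance(result, tuple):
        # Nonzero num_sampled_examples: (statistics, anomaly -> examples).
        stats, samples = result
        return stats, dict(samples)
    # num_sampled_examples == 0: only the statistics proto is returned.
    return result, {}


def count_examples_per_dataset(stats: Any) -> Dict[str, int]:
    """Map each dataset name to its example count.

    Assumption: the dataset name identifies the anomaly it groups.
    """
    return {dataset.name: dataset.num_examples for dataset in stats.datasets}
```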

Raises
`ValueError`	If the specified stats_options does not include a schema.
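
To surface a missing schema before any pipeline work is set up, a caller could run a small pre-check. This is only a sketch: `require_schema` is a hypothetical helper, and it assumes the options object exposes its schema through a `schema` attribute.

```python
def require_schema(stats_options):
    """Hypothetical pre-check: fail fast if no schema is attached."""
    # Assumption: the options object exposes the schema as `.schema`.
    if getattr(stats_options, "schema", None) is None:
        raise ValueError("stats_options must include a schema.")
    return stats_options
```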
