Module: tfdv

Init module for TensorFlow Data Validation.

Classes

class CombinerStatsGenerator: A StatsGenerator which computes statistics using a combiner function.

class CrossFeatureView: View of a single cross feature.

class DatasetListView: View of statistics for multiple datasets (slices).

class DatasetView: View of statistics for a dataset (slice).

class DetectFeatureSkew: API for detecting feature skew between training and serving examples.

class FeaturePath: Represents the path to a feature in an input example.

class FeatureView: View of a single feature.

class GenerateStatistics: API for generating data statistics.

class MergeDatasetFeatureStatisticsList: API for merging sharded DatasetFeatureStatisticsList.

class StatsOptions: Options for generating statistics.

class TransformStatsGenerator: A StatsGenerator which wraps an arbitrary Beam PTransform.

class WriteStatisticsToBinaryFile: API for writing serialized data statistics to a binary file.

class WriteStatisticsToRecordsAndBinaryFile: API for writing statistics to both sharded records and a binary proto file.

class WriteStatisticsToTFRecord: API for writing serialized data statistics to a TFRecord file.

Functions

compare_slices(...): Compare statistics of two slices using Facets.

default_sharded_output_suffix(...): Returns the default sharded output suffix.

default_sharded_output_supported(...): True if sharded output is supported by default.

display_anomalies(...): Displays the input anomalies (for use in a Jupyter notebook).

display_schema(...): Displays the input schema (for use in a Jupyter notebook).

experimental_get_feature_value_slicer(...): Returns a function that generates sliced record batches for a given record batch.

generate_dummy_schema_with_paths(...): Generate a schema with the requested paths and no other information.

generate_statistics_from_csv(...): Compute data statistics from CSV files.

generate_statistics_from_dataframe(...): Compute data statistics for the input pandas DataFrame.

generate_statistics_from_tfrecord(...): Compute data statistics from TFRecord files containing TFExamples.

get_confusion_count_dataframes(...): Returns a pandas dataframe representation of a sequence of ConfusionCount.

get_domain(...): Get the domain associated with the input feature from the schema.

get_feature(...): Get a feature from the schema.

get_feature_stats(...): Get feature statistics from the dataset statistics.

get_match_stats_dataframe(...): Formats MatchStats as a pandas dataframe.

get_skew_result_dataframe(...): Formats FeatureSkew results as a pandas dataframe.

get_slice_stats(...): Get statistics associated with a specific slice.

get_statistics_html(...): Build the HTML for visualizing the input statistics using Facets.

infer_schema(...): Infers schema from the input statistics.

load_anomalies_text(...): Loads the Anomalies proto stored in text format in the input path.

load_schema_text(...): Loads the schema stored in text format in the input path.

load_sharded_statistics(...): Read a sharded DatasetFeatureStatisticsList from disk as a DatasetListView.

load_statistics(...): Loads data statistics proto from file.

load_stats_binary(...): Loads a serialized DatasetFeatureStatisticsList proto from a file.

load_stats_text(...): Loads the specified DatasetFeatureStatisticsList proto stored in text format.

set_domain(...): Sets the domain for the input feature in the schema.

update_schema(...): Updates input schema to conform to the input statistics.

validate_corresponding_slices(...): Validates corresponding sliced statistics.

validate_examples_in_csv(...): Validates examples in CSV files.

validate_examples_in_tfrecord(...): Validates TFExamples in TFRecord files.

validate_statistics(...): Validates the input statistics against the provided input schema.

visualize_statistics(...): Visualize the input statistics using Facets.

write_anomalies_text(...): Writes the Anomalies proto to a file in text format.

write_schema_text(...): Writes input schema to a file in text format.

write_stats_text(...): Writes a DatasetFeatureStatisticsList proto to a file in text format.
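
Several of the functions listed above are commonly chained: compute statistics, infer a schema from them, then validate a second dataset against that schema. The following is a minimal sketch of that flow, not an official recipe. The helper names validate_against_training and anomalous_feature_names are hypothetical and not part of tfdv; the sketch assumes pandas DataFrames as input and relies only on generate_statistics_from_dataframe, infer_schema, and validate_statistics with their default options.

```python
def validate_against_training(train_df, serving_df):
    """Infer a schema from train_df and validate serving_df against it.

    Hypothetical helper (not part of tfdv). Both arguments are assumed to be
    pandas DataFrames. Returns a (schema, anomalies) pair of TFDV protos.
    """
    # Imported inside the function so that the small helper below can be
    # used even in an environment where TFDV is not installed.
    import tensorflow_data_validation as tfdv

    # Summarize the training data and derive a schema from the summary.
    train_stats = tfdv.generate_statistics_from_dataframe(train_df)
    schema = tfdv.infer_schema(train_stats)

    # Summarize the second dataset and check it against the inferred schema.
    serving_stats = tfdv.generate_statistics_from_dataframe(serving_df)
    anomalies = tfdv.validate_statistics(serving_stats, schema)
    return schema, anomalies


def anomalous_feature_names(anomalies):
    """Return the sorted feature names flagged in an Anomalies proto.

    Hypothetical helper. It only reads the anomaly_info map, which is keyed
    by feature name.
    """
    return sorted(anomalies.anomaly_info.keys())
```

In a Jupyter notebook, the returned protos can be passed to display_schema and display_anomalies for inspection, and write_schema_text can persist the schema for later runs. Whether a given difference between the two datasets is reported as an anomaly depends on the inferred schema and the validation defaults, so no expected output is shown here.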

Other Members

__version__: '1.15.1'