TFDV checks for anomalies by comparing a schema and statistics proto(s). The following chart lists the anomaly types that TFDV can detect, the schema and statistics fields that are used to detect each anomaly type, and the condition(s) under which each anomaly type is detected.
BOOL_TYPE_BIG_INT
- Schema Fields:
  - feature.bool_domain
- Statistics Fields:
  - features.num_stats.max
  - features.type
- Detection Condition:
  - feature.bool_domain is specified and
  - features.type == INT and
  - features.num_stats.max > 1
BOOL_TYPE_BYTES_NOT_INT
- Anomaly type not detected in TFDV
BOOL_TYPE_BYTES_NOT_STRING
- Anomaly type not detected in TFDV
BOOL_TYPE_FLOAT_NOT_INT
- Anomaly type not detected in TFDV
BOOL_TYPE_FLOAT_NOT_STRING
- Anomaly type not detected in TFDV
BOOL_TYPE_INT_NOT_STRING
- Anomaly type not detected in TFDV
BOOL_TYPE_SMALL_INT
- Schema Fields:
  - feature.bool_domain
- Statistics Fields:
  - features.num_stats.min
  - features.type
- Detection Condition:
  - features.type == INT and
  - feature.bool_domain is specified and
  - features.num_stats.min < 0
BOOL_TYPE_STRING_NOT_INT
- Anomaly type not detected in TFDV
BOOL_TYPE_UNEXPECTED_STRING
- Schema Fields:
  - feature.bool_domain
- Statistics Fields:
  - features.string_stats.rank_histogram*
- Detection Condition:
  - features.type == STRING and
  - feature.bool_domain is specified and
  - at least one value in rank_histogram* is not feature.bool_domain.true_value or feature.bool_domain.false_value
BOOL_TYPE_UNEXPECTED_FLOAT
- Schema Fields:
  - feature.bool_domain
- Statistics Fields:
  - features.num_stats.min
  - features.num_stats.max
  - features.num_stats.histograms.num_nan
  - features.num_stats.histograms.buckets.low_value
  - features.num_stats.histograms.buckets.high_value
  - features.type
- Detection Condition:
  - features.type == FLOAT and
  - feature.bool_domain is specified and either
    - (features.num_stats.min != 0 and features.num_stats.min != 1) or
    - (features.num_stats.max != 0 and features.num_stats.max != 1) or
    - features.num_stats.histograms.num_nan > 0 or
    - (features.num_stats.histograms.buckets.low_value != 0 or features.num_stats.histograms.buckets.high_value != 1) and features.num_stats.histograms.buckets.sample_count > 0
BOOL_TYPE_INVALID_CONFIG
- Schema Fields:
  - feature.bool_domain
- Statistics Fields:
  - features.type
- Detection Condition:
  - If features.type == INT or FLOAT, feature.bool_domain is specified and feature.bool_domain.true_value or feature.bool_domain.false_value is specified, or
  - if features.type == STRING, feature.bool_domain is specified and feature.bool_domain.true_value and feature.bool_domain.false_value are not specified
ENUM_TYPE_BYTES_NOT_STRING
- Anomaly type not detected in TFDV
ENUM_TYPE_FLOAT_NOT_STRING
- Anomaly type not detected in TFDV
ENUM_TYPE_INT_NOT_STRING
- Anomaly type not detected in TFDV
ENUM_TYPE_INVALID_UTF8
- Statistics Fields:
  - features.string_stats.invalid_utf8_count
- Detection Condition:
  - invalid_utf8_count > 0
ENUM_TYPE_UNEXPECTED_STRING_VALUES
- Schema Fields:
  - string_domain and feature.domain; or feature.string_domain
  - feature.distribution_constraints.min_domain_mass
- Statistics Fields:
  - features.string_stats.rank_histogram*
- Detection Condition:
  - Either (number of values in rank_histogram* that are not in domain / total number of values) > (1 - feature.distribution_constraints.min_domain_mass) or
  - feature.distribution_constraints.min_domain_mass == 1.0 and there are values in the histogram that are not in the domain
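
The ENUM_TYPE_UNEXPECTED_STRING_VALUES condition can be illustrated with a minimal sketch in plain Python. It is not TFDV's implementation: the function name and the dict-based histogram (value to count) are hypothetical stand-ins for the rank_histogram and the string domain.

```python
def has_unexpected_string_values(rank_histogram, domain, min_domain_mass):
    """Illustrative check only; names and data layout are assumptions.

    rank_histogram: dict mapping each string value to its count.
    domain: set of values allowed by the string domain.
    min_domain_mass: minimum fraction of values that must be in the domain.
    """
    total = sum(rank_histogram.values())
    if total == 0:
        # Assumption: with no values there is nothing to flag.
        return False
    out_of_domain = sum(
        count for value, count in rank_histogram.items() if value not in domain
    )
    if min_domain_mass == 1.0:
        # Special case from the condition: any out-of-domain value is an anomaly.
        return out_of_domain > 0
    return out_of_domain / total > 1.0 - min_domain_mass
```

With 2 out-of-domain values out of 100, a min_domain_mass of 0.95 tolerates the deviation, while 0.99 or 1.0 does not.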
FEATURE_TYPE_HIGH_NUMBER_VALUES
- Schema Fields:
  - feature.value_count.max
  - feature.value_counts.value_count.max
- Statistics Fields:
  - features.common_stats.max_num_values
  - features.common_stats.presence_and_valency_stats.max_num_values
- Detection Condition:
  - If feature.value_count.max is specified, features.common_stats.max_num_values > feature.value_count.max; or
  - if feature.value_counts is specified, feature.value_counts.value_count.max < features.common_stats.presence_and_valency_stats.max_num_values at a given nestedness level
FEATURE_TYPE_LOW_FRACTION_PRESENT
- Schema Fields:
  - feature.presence.min_fraction
- Statistics Fields:
  - features.common_stats.num_non_missing*
  - num_examples*
- Detection Condition:
  - feature.presence.min_fraction is specified and (features.common_stats.num_non_missing* / num_examples*) < feature.presence.min_fraction or
  - feature.presence.min_fraction == 1.0 and common_stats.num_missing != 0
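
As a rough illustration of the FEATURE_TYPE_LOW_FRACTION_PRESENT condition, the sketch below evaluates both branches on plain integers. The function name and argument layout are hypothetical, and treating an empty dataset as non-anomalous is an assumption rather than documented behavior.

```python
def low_fraction_present(num_non_missing, num_missing, num_examples, min_fraction):
    """Illustrative check only; not TFDV's implementation."""
    if num_examples == 0:
        # Assumption: an empty dataset is not flagged by this check.
        return False
    if min_fraction == 1.0:
        # Second branch of the condition: every example must carry the feature.
        return num_missing != 0
    return num_non_missing / num_examples < min_fraction
```

For example, a feature present in 90 of 100 examples falls below a min_fraction of 0.95, while one present in 99 of 100 does not.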
FEATURE_TYPE_LOW_NUMBER_PRESENT
- Schema Fields:
  - feature.presence.min_count
- Statistics Fields:
  - features.common_stats.num_non_missing*
- Detection Condition:
  - feature.presence.min_count is specified and either
    - features.common_stats.num_non_missing* == 0 or
    - features.common_stats.num_non_missing* < feature.presence.min_count
FEATURE_TYPE_LOW_NUMBER_VALUES
- Schema Fields:
  - feature.value_count.min
  - feature.value_counts.value_count.min
- Statistics Fields:
  - features.common_stats.min_num_values
  - features.common_stats.presence_and_valency_stats.min_num_values
- Detection Condition:
  - If feature.value_count.min is specified, features.common_stats.min_num_values < feature.value_count.min; or
  - if feature.value_counts is specified, features.common_stats.presence_and_valency_stats.min_num_values < feature.value_counts.value_count.min at a given nestedness level
FEATURE_TYPE_NOT_PRESENT
- Schema Fields:
  - feature.in_environment or feature.not_in_environment or schema.default_environment
  - feature.lifecycle_stage
  - feature.presence.min_count or feature.presence.min_fraction
- Statistics Fields:
  - features.common_stats.num_non_missing*
- Detection Condition:
  - feature.lifecycle_stage not in [PLANNED, ALPHA, DEBUG, DEPRECATED] and
  - common_stats.num_non_missing* == 0 and
  - (feature.presence.min_count > 0 or feature.presence.min_fraction > 0) and either
    - feature.in_environment == current environment or
    - feature.not_in_environment != current environment or
    - schema.default_environment != current environment
FEATURE_TYPE_NO_VALUES
- Anomaly type not detected in TFDV
FEATURE_TYPE_UNEXPECTED_REPEATED
- Anomaly type not detected in TFDV
FEATURE_TYPE_HIGH_UNIQUE
- Schema Fields:
  - feature.unique_constraints.max
- Statistics Fields:
  - features.string_stats.unique
- Detection Condition:
  - features.string_stats.unique > feature.unique_constraints.max
FEATURE_TYPE_LOW_UNIQUE
- Schema Fields:
  - feature.unique_constraints.min
- Statistics Fields:
  - features.string_stats.unique
- Detection Condition:
  - features.string_stats.unique < feature.unique_constraints.min
FEATURE_TYPE_NO_UNIQUE
- Schema Fields:
  - feature.unique_constraints
- Statistics Fields:
  - features.string_stats.unique
- Detection Condition:
  - feature.unique_constraints is specified but no features.string_stats.unique is present (as is the case where the feature is not a string or categorical)
FLOAT_TYPE_BIG_FLOAT
- Schema Fields:
  - feature.float_domain.max
- Statistics Fields:
  - features.type
  - features.num_stats.max or features.string_stats.rank_histogram
- Detection Condition:
  - If features.type == FLOAT, features.num_stats.max > feature.float_domain.max; or
  - if features.type == BYTES or STRING, maximum value in features.string_stats.rank_histogram (when converted to float) > feature.float_domain.max
FLOAT_TYPE_NOT_FLOAT
- Anomaly type not detected in TFDV
FLOAT_TYPE_SMALL_FLOAT
- Schema Fields:
  - feature.float_domain.min
- Statistics Fields:
  - features.type
  - features.num_stats.min or features.string_stats.rank_histogram
- Detection Condition:
  - If features.type == FLOAT, features.num_stats.min < feature.float_domain.min; or
  - if features.type == BYTES or STRING, minimum value in features.string_stats.rank_histogram (when converted to float) < feature.float_domain.min
FLOAT_TYPE_STRING_NOT_FLOAT
- Schema Fields:
  - feature.float_domain
- Statistics Fields:
  - features.type
  - features.string_stats.rank_histogram
- Detection Condition:
  - features.type == BYTES or STRING and
  - features.string_stats.rank_histogram has at least one value that cannot be converted to a float
FLOAT_TYPE_NON_STRING
- Anomaly type not detected in TFDV
FLOAT_TYPE_UNKNOWN_TYPE_NUMBER
- Anomaly type not detected in TFDV
FLOAT_TYPE_HAS_NAN
- Schema Fields:
  - feature.float_domain.disallow_nan
- Statistics Fields:
  - features.type
  - features.num_stats.histograms.num_nan
- Detection Condition:
  - float_domain.disallow_nan is true and
  - features.num_stats.histograms.num_nan > 0
FLOAT_TYPE_HAS_INF
- Schema Fields:
  - feature.float_domain.disallow_inf
- Statistics Fields:
  - features.type
  - features.num_stats.min
  - features.num_stats.max
- Detection Condition:
  - features.type == FLOAT and
  - float_domain.disallow_inf is true and either
    - features.num_stats.min == inf/-inf or
    - features.num_stats.max == inf/-inf
INT_TYPE_BIG_INT
- Schema Fields:
  - feature.int_domain.max
- Statistics Fields:
  - features.type
  - features.num_stats.max
  - features.string_stats.rank_histogram
- Detection Condition:
  - If features.type == INT, features.num_stats.max > feature.int_domain.max; or
  - if features.type == BYTES or STRING, maximum value in features.string_stats.rank_histogram (when converted to int) > feature.int_domain.max
INT_TYPE_INT_EXPECTED
- Anomaly type not detected in TFDV
INT_TYPE_NOT_INT_STRING
- Schema Fields:
  - feature.int_domain
- Statistics Fields:
  - features.type
  - features.string_stats.rank_histogram
- Detection Condition:
  - features.type == BYTES or STRING and
  - features.string_stats.rank_histogram has at least one value that cannot be converted to an int
INT_TYPE_NOT_STRING
- Anomaly type not detected in TFDV
INT_TYPE_SMALL_INT
- Schema Fields:
  - feature.int_domain.min
- Statistics Fields:
  - features.type
  - features.num_stats.min
  - features.string_stats.rank_histogram
- Detection Condition:
  - If features.type == INT, features.num_stats.min < feature.int_domain.min; or
  - if features.type == BYTES or STRING, minimum value in features.string_stats.rank_histogram (when converted to int) < feature.int_domain.min
INT_TYPE_STRING_EXPECTED
- Anomaly type not detected in TFDV
INT_TYPE_UNKNOWN_TYPE_NUMBER
- Anomaly type not detected in TFDV
LOW_SUPPORTED_IMAGE_FRACTION
- Schema Fields:
  - feature.image_domain.minimum_supported_image_fraction
- Statistics Fields:
  - features.custom_stats.rank_histogram for the custom_stats with name image_format_histogram. Note that semantic domain stats must be enabled for the image_format_histogram to be generated and for this validation to be performed. Semantic domain stats are not generated by default.
- Detection Condition:
  - The fraction of values that are supported TensorFlow image types to all image types is less than feature.image_domain.minimum_supported_image_fraction.
SCHEMA_MISSING_COLUMN
- Schema Fields:
  - feature.in_environment or feature.not_in_environment or schema.default_environment
  - feature.lifecycle_stage
  - feature.presence.min_count or feature.presence.min_fraction
- Detection Condition:
  - feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and
  - (feature.presence.min_count > 0 or feature.presence.min_fraction > 0) and
  - (feature.in_environment == current environment or feature.not_in_environment != current environment or schema.default_environment != current environment) and
  - no feature with the specified name/path is found in the statistics proto
SCHEMA_NEW_COLUMN
- Detection Condition:
  - there is a feature in the statistics proto but no feature with its name/path in the schema proto
SCHEMA_TRAINING_SERVING_SKEW
- Anomaly type not detected in TFDV
STRING_TYPE_NOW_FLOAT
- Anomaly type not detected in TFDV
STRING_TYPE_NOW_INT
- Anomaly type not detected in TFDV
COMPARATOR_CONTROL_DATA_MISSING
- Schema Fields:
  - feature.skew_comparator.infinity_norm.threshold
  - feature.drift_comparator.infinity_norm.threshold
- Detection Condition:
  - control statistics proto (i.e., serving statistics for skew or previous statistics for drift) is available but does not contain the specified feature
COMPARATOR_TREATMENT_DATA_MISSING
- Anomaly type not detected in TFDV
COMPARATOR_L_INFTY_HIGH
- Schema Fields:
  - feature.skew_comparator.infinity_norm.threshold
  - feature.drift_comparator.infinity_norm.threshold
- Statistics Fields:
  - features.string_stats.rank_histogram*
- Detection Condition:
  - L-infinity norm of the vector that represents the difference between the normalized counts from the features.string_stats.rank_histogram* in the control statistics (i.e., serving statistics for skew or previous statistics for drift) and the treatment statistics (i.e., training statistics for skew or current statistics for drift) > feature.skew_comparator.infinity_norm.threshold or feature.drift_comparator.infinity_norm.threshold
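
The L-infinity comparison described for COMPARATOR_L_INFTY_HIGH can be sketched in a few lines of plain Python. This is an illustration of the distance itself, not TFDV's code: the function names and the dict-based histograms (value to count) are hypothetical.

```python
def normalize(histogram):
    """Turn raw counts into fractions that sum to 1 (empty if no counts)."""
    total = sum(histogram.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in histogram.items()}


def l_infinity_distance(control_histogram, treatment_histogram):
    """Largest absolute difference between normalized counts of any value."""
    control = normalize(control_histogram)
    treatment = normalize(treatment_histogram)
    values = set(control) | set(treatment)
    return max(
        (abs(control.get(v, 0.0) - treatment.get(v, 0.0)) for v in values),
        default=0.0,
    )
```

An anomaly would then correspond to this distance exceeding the configured infinity_norm threshold. A value that appears on only one side contributes its full normalized count to the difference.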
COMPARATOR_NORMALIZED_ABSOLUTE_DIFFERENCE_HIGH
- Schema Fields:
  - feature.skew_comparator.normalized_abs_difference.threshold
  - feature.drift_comparator.normalized_abs_difference.threshold
- Statistics Fields:
  - features.string_stats.rank_histogram
- Detection Condition:
  - The normalized absolute count difference of value counts from the features.string_stats.rank_histogram in the control statistics (i.e., serving statistics for skew or previous statistics for drift) and the treatment statistics (i.e., training statistics for skew or current statistics for drift) exceeded feature.skew_comparator.normalized_abs_difference.threshold or feature.drift_comparator.normalized_abs_difference.threshold. Count differences are normalized by the total count across both conditions.
COMPARATOR_JENSEN_SHANNON_DIVERGENCE_HIGH
- Schema Fields:
  - feature.skew_comparator.jensen_shannon_divergence.threshold
  - feature.drift_comparator.jensen_shannon_divergence.threshold
- Statistics Fields:
  - features.num_stats.histograms of type STANDARD
  - features.string_stats.rank_histogram*
- Detection Condition:
  - Approximate Jensen-Shannon divergence computed between the control statistics (i.e., serving statistics for skew or previous statistics for drift) and the treatment statistics (i.e., training statistics for skew or current statistics for drift) > feature.skew_comparator.jensen_shannon_divergence.threshold or feature.drift_comparator.jensen_shannon_divergence.threshold. The approximate Jensen-Shannon divergence is computed based on the normalized sample counts in both the features.num_stats.histograms standard histogram and features.string_stats.rank_histogram*.
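
For orientation, the sketch below computes the textbook Jensen-Shannon divergence (base 2, so the result lies in [0, 1]) between two count histograms. TFDV's computation is described above as approximate, so this is not its algorithm; the function name and the dict-based histograms (bucket or value to count) are hypothetical.

```python
import math


def jensen_shannon_divergence(control_counts, treatment_counts):
    """Base-2 Jensen-Shannon divergence of two count histograms."""
    control_total = sum(control_counts.values())
    treatment_total = sum(treatment_counts.values())
    if control_total == 0 or treatment_total == 0:
        raise ValueError("both histograms need a positive total count")
    divergence = 0.0
    for key in set(control_counts) | set(treatment_counts):
        p = control_counts.get(key, 0) / control_total
        q = treatment_counts.get(key, 0) / treatment_total
        m = 0.5 * (p + q)  # mixture distribution at this key
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence
```

Identical distributions give 0 and distributions with no shared values give 1, which is a convenient scale for picking a threshold.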
NO_DATA_IN_SPAN
- Anomaly type not detected in TFDV
SPARSE_FEATURE_MISSING_VALUE
- Schema Fields:
  - sparse_feature.value_feature
- Statistics Fields:
  - features.custom_stats
- Detection Condition:
  - features.custom_stats with "missing_value" as name and
  - missing_value custom stat != 0
SPARSE_FEATURE_MISSING_INDEX
- Schema Fields:
  - sparse_feature.index_feature
- Statistics Fields:
  - features.custom_stats
- Detection Condition:
  - features.custom_stats with "missing_index" as name and
  - missing_index custom stat contains any value != 0
SPARSE_FEATURE_LENGTH_MISMATCH
- Schema Fields:
  - sparse_feature.value_feature
  - sparse_feature.index_feature
- Statistics Fields:
  - features.custom_stats
- Detection Condition:
  - features.custom_stats with "min_length_diff" or "max_length_diff" as name and
  - min_length_diff or max_length_diff custom stat contains any value != 0
SPARSE_FEATURE_NAME_COLLISION
- Schema Fields:
  - sparse_feature.name
  - sparse_feature.lifecycle_stage
  - feature.name
  - feature.lifecycle_stage
- Detection Condition:
  - sparse_feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED, and
  - feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED, and
  - sparse_feature.name == feature.name
SEMANTIC_DOMAIN_UPDATE
- Schema Fields:
  - feature.domain_info
- Statistics Fields:
  - features.custom_stats
- Detection Condition:
  - features.custom_stats with "domain_info" as name and
  - feature.domain_info is not already set in the schema and
  - there is a single domain_info custom stat for the feature
COMPARATOR_LOW_NUM_EXAMPLES
- Schema Fields:
  - schema.dataset_constraints.num_examples_drift_comparator.min_fraction_threshold
  - schema.dataset_constraints.num_examples_version_comparator.min_fraction_threshold
- Statistics Fields:
  - num_examples*
- Detection Condition:
  - num_examples* > 0 and
  - previous statistics proto is available and
  - num_examples* / previous statistics num_examples* < comparator min_fraction_threshold
COMPARATOR_HIGH_NUM_EXAMPLES
- Schema Fields:
  - schema.dataset_constraints.num_examples_drift_comparator.max_fraction_threshold
  - schema.dataset_constraints.num_examples_version_comparator.max_fraction_threshold
- Statistics Fields:
  - num_examples*
- Detection Condition:
  - num_examples* > 0 and
  - previous statistics proto is available and
  - num_examples* / previous statistics num_examples* > comparator max_fraction_threshold
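
The two num_examples comparator conditions share the same ratio, which the following sketch makes explicit. It is an illustration in plain Python with hypothetical names, not TFDV's implementation; passing None for the previous count stands in for "previous statistics proto is not available", and skipping a previous count of 0 is an assumption made here to avoid dividing by zero.

```python
def compare_num_examples(current, previous,
                         min_fraction_threshold=None,
                         max_fraction_threshold=None):
    """Return the comparator anomaly names triggered by the example counts."""
    anomalies = []
    # Both conditions require current examples and available previous statistics.
    if current <= 0 or previous is None or previous <= 0:
        return anomalies
    ratio = current / previous
    if min_fraction_threshold is not None and ratio < min_fraction_threshold:
        anomalies.append("COMPARATOR_LOW_NUM_EXAMPLES")
    if max_fraction_threshold is not None and ratio > max_fraction_threshold:
        anomalies.append("COMPARATOR_HIGH_NUM_EXAMPLES")
    return anomalies
```

With thresholds of 0.8 and 1.2, a dataset that shrinks from 100 to 50 examples triggers the low anomaly, and one that grows from 100 to 150 triggers the high anomaly.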
DATASET_LOW_NUM_EXAMPLES
- Schema Fields:
  - schema.dataset_constraints.min_examples_count
- Statistics Fields:
  - num_examples*
- Detection Condition:
  - num_examples* < dataset_constraints.min_examples_count
DATASET_HIGH_NUM_EXAMPLES
- Schema Fields:
  - schema.dataset_constraints.max_examples_count
- Statistics Fields:
  - num_examples*
- Detection Condition:
  - num_examples* > dataset_constraints.max_examples_count
WEIGHTED_FEATURE_NAME_COLLISION
- Schema Fields:
  - weighted_feature.name
  - weighted_feature.lifecycle_stage
  - sparse_feature.name
  - sparse_feature.lifecycle_stage
  - feature.name
  - feature.lifecycle_stage
- Detection Condition:
  - weighted_feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and either
    - if feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED, weighted_feature.name == feature.name; or
    - if sparse_feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED, weighted_feature.name == sparse_feature.name
WEIGHTED_FEATURE_MISSING_VALUE
- Schema Fields:
  - weighted_feature.feature
- Statistics Fields:
  - features.custom_stats
- Detection Condition:
  - features.custom_stats with "missing_value" as name and
  - missing_value custom stat != 0
WEIGHTED_FEATURE_MISSING_WEIGHT
- Schema Fields:
  - weighted_feature.weight_feature
- Statistics Fields:
  - features.custom_stats
- Detection Condition:
  - features.custom_stats with "missing_weight" as name and
  - missing_weight custom stat != 0
WEIGHTED_FEATURE_LENGTH_MISMATCH
- Schema Fields:
  - weighted_feature.feature
  - weighted_feature.weight_feature
- Statistics Fields:
  - features.custom_stats
- Detection Condition:
  - features.custom_stats with "min_weight_length_diff" or "max_weight_length_diff" as name, and
  - min_weight_length_diff or max_weight_length_diff custom stat != 0
VALUE_NESTEDNESS_MISMATCH
- Schema Fields:
  - feature.value_count
  - feature.value_counts
- Statistics Fields:
  - features.common_stats.presence_and_valency_stats
- Detection Condition:
  - feature.value_count is specified, and there is a repeated presence_and_valency_stats of the feature (which indicates a nestedness level that is greater than one), or
  - feature.value_counts is specified, and the number of times the presence_and_valency_stats of the feature is repeated does not match the number of times value_count is repeated within feature.value_counts
DOMAIN_INVALID_FOR_TYPE
- Schema Fields:
  - feature.type
  - feature.domain_info
- Statistics Fields:
  - features.type
- Detection Condition:
  - If features.type == BYTES, feature.domain_info is of an incompatible type; or
  - if features.type != BYTES, feature.domain_info does not match feature.type (e.g., int_domain is specified, but feature's type is FLOAT)
FEATURE_MISSING_NAME
- Schema Fields:
  - feature.name
- Detection Condition:
  - feature.name is not specified
FEATURE_MISSING_TYPE
- Schema Fields:
  - feature.type
- Detection Condition:
  - feature.type is not specified
INVALID_SCHEMA_SPECIFICATION
- Schema Fields:
  - feature.domain_info
  - feature.presence.min_fraction
  - feature.value_count.min
  - feature.value_count.max
  - feature.distribution_constraints
- Detection Condition:
  - feature.presence.min_fraction < 0.0 or > 1.0, or
  - feature.value_count.min < 0 or > feature.value_count.max, or
  - a bool, int, float, struct, or semantic domain is specified for a feature and feature.distribution_constraints is also specified for that feature, or
  - feature.distribution_constraints is specified for a feature, but neither a schema-level domain nor feature.string_domain is specified for that feature
INVALID_DOMAIN_SPECIFICATION
- Schema Fields:
  - feature.domain_info
  - feature.bool_domain
  - feature.string_domain
- Detection Condition:
  - Unknown feature.domain_info type is specified or
  - feature.domain is specified, but there is no matching domain specified at the schema level, or
  - if feature.bool_domain, feature.bool_domain.true_value, and feature.bool_domain.false_value are specified, feature.bool_domain.true_value == feature.bool_domain.false_value, or
  - if feature.string_domain is specified,
    - has duplicated feature.string_domain.values or
    - feature.string_domain exceeds the maximum size
UNEXPECTED_DATA_TYPE
- Schema Fields:
  - feature.type
- Statistics Fields:
  - features.type
- Detection Condition:
  - features.type is not of type specified in feature.type
SEQUENCE_VALUE_TOO_FEW_OCCURRENCES
- Schema Fields:
  - feature.natural_language_domain.token_constraints.min_per_sequence
- Statistics Fields:
  - features.custom_stats.nl_statistics.token_statistics.per_sequence_min_frequency
- Detection Condition:
  - min_per_sequence > per_sequence_min_frequency
SEQUENCE_VALUE_TOO_MANY_OCCURRENCES
- Schema Fields:
  - feature.natural_language_domain.token_constraints.max_per_sequence
- Statistics Fields:
  - features.custom_stats.nl_statistics.token_statistics.per_sequence_max_frequency
- Detection Condition:
  - max_per_sequence < per_sequence_max_frequency
SEQUENCE_VALUE_TOO_SMALL_FRACTION
- Schema Fields:
  - feature.natural_language_domain.token_constraints.min_fraction_of_sequences
- Statistics Fields:
  - features.custom_stats.nl_statistics.token_statistics.fraction_of_sequences
- Detection Condition:
  - min_fraction_of_sequences > fraction_of_sequences
SEQUENCE_VALUE_TOO_LARGE_FRACTION
- Schema Fields:
  - feature.natural_language_domain.token_constraints.max_fraction_of_sequences
- Statistics Fields:
  - features.custom_stats.nl_statistics.token_statistics.fraction_of_sequences
- Detection Condition:
  - max_fraction_of_sequences < fraction_of_sequences
FEATURE_COVERAGE_TOO_LOW
- Schema Fields:
  - feature.natural_language_domain.coverage.min_coverage
- Statistics Fields:
  - features.custom_stats.nl_statistics.feature_coverage
- Detection Condition:
  - feature_coverage < coverage.min_coverage
FEATURE_COVERAGE_TOO_SHORT_AVG_TOKEN_LENGTH
- Schema Fields:
  - feature.natural_language_domain.coverage.min_avg_token_length
- Statistics Fields:
  - features.custom_stats.nl_statistics.avg_token_length
- Detection Condition:
  - avg_token_length < min_avg_token_length
NLP_WRONG_LOCATION
- Anomaly type not detected in TFDV
EMBEDDING_SHAPE_INVALID
- Anomaly type not detected in TFDV
MAX_IMAGE_BYTE_SIZE_EXCEEDED
- Schema Fields:
  - feature.image_domain.max_image_byte_size
- Statistics Fields:
  - features.bytes_stats.max_num_bytes_int
- Detection Condition:
  - max_num_bytes_int > max_image_byte_size
INVALID_FEATURE_SHAPE
- Schema Fields:
  - feature.shape
- Statistics Fields:
  - features.common_stats.num_missing
  - features.common_stats.min_num_values
  - features.common_stats.max_num_values
  - features.common_stats.presence_and_valency_stats.num_missing
  - features.common_stats.presence_and_valency_stats.min_num_values
  - features.common_stats.presence_and_valency_stats.max_num_values
  - features.common_stats.weighted_presence_and_valency_stats
- Detection Condition:
  - feature.shape is specified, and either
    - the feature may be missing (num_missing != 0) at some nest level or
    - the feature may have variable number of values (min_num_values != max_num_values) at some nest level or
    - the specified shape is not compatible with the feature's value count stats. For example, shape [16] is compatible with (min_num_values == max_num_values == [2, 2, 4] (for a 3-nested feature))
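
The three INVALID_FEATURE_SHAPE conditions can be sketched as follows. The compatibility rule used here (the product of the shape dimensions equals the product of the per-level value counts) is an assumption inferred from the [16] versus [2, 2, 4] example, not a statement of TFDV's exact rule; the function name and the per-nest-level lists are hypothetical.

```python
import math


def shape_is_valid(shape, num_missing, min_num_values, max_num_values):
    """Illustrative check; each stats argument is a list with one entry per nest level."""
    # The feature may be missing at some nest level.
    if any(count != 0 for count in num_missing):
        return False
    # The feature may have a variable number of values at some nest level.
    if min_num_values != max_num_values:
        return False
    # Assumed compatibility rule: total element counts must agree.
    return math.prod(shape) == math.prod(min_num_values)
```

Under that assumption, shape [16] and shape [4, 4] are both accepted for a 3-nested feature with value counts [2, 2, 4], while shape [8] is rejected.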
STATS_NOT_AVAILBLE
- Anomaly occurs when stats needed to validate constraints are not present.
DERIVED_FEATURE_BAD_LIFECYCLE
- Schema Fields:
  - feature.lifecycle_stage
- Statistics Fields:
  - features.validation_derived_source
- Detection Condition:
  - feature.lifecycle_stage is not one of DERIVED or DISABLED, and features.validation_derived_source is present, indicating that this is a derived feature.
DERIVED_FEATURE_INVALID_SOURCE
- Schema Fields:
  - feature.validation_derived_source
- Statistics Fields:
  - features.validation_derived_source
- Detection Condition:
  - features.validation_derived_source is present for a feature, but the corresponding feature.validation_derived_source is not.
* If a weighted statistic is available for this field, it will be used instead of the non-weighted statistic.