TFDV checks for anomalies by comparing a schema and statistics proto(s). The following chart lists the anomaly types that TFDV can detect, the schema and statistics fields that are used to detect each anomaly type, and the condition(s) under which each anomaly type is detected.
BOOL_TYPE_BIG_INT
- Schema Fields:
feature.bool_domain
feature.type
- Statistics Fields:
feature.num_stats.max
- Detection Condition:
feature.type == INT and
feature.bool_domain is specified and
feature.num_stats.max > 1
BOOL_TYPE_BYTES_NOT_INT
- Anomaly type not detected in TFDV
BOOL_TYPE_BYTES_NOT_STRING
- Anomaly type not detected in TFDV
BOOL_TYPE_FLOAT_NOT_INT
- Anomaly type not detected in TFDV
BOOL_TYPE_FLOAT_NOT_STRING
- Anomaly type not detected in TFDV
BOOL_TYPE_INT_NOT_STRING
- Anomaly type not detected in TFDV
BOOL_TYPE_SMALL_INT
- Schema Fields:
feature.bool_domain
feature.type
- Statistics Fields:
feature.num_stats.min
- Detection Condition:
feature.type == INT and
feature.bool_domain is specified and
feature.num_stats.min < 0
BOOL_TYPE_STRING_NOT_INT
- Anomaly type not detected in TFDV
BOOL_TYPE_UNEXPECTED_STRING
- Schema Fields:
feature.bool_domain
feature.type
- Statistics Fields:
feature.string_stats.rank_histogram*
- Detection Condition:
- at least one value in rank_histogram is not feature.bool_domain.true_value or feature.bool_domain.false_value
BOOL_TYPE_UNEXPECTED_FLOAT
- Schema Fields:
feature.bool_domain
feature.type
- Statistics Fields:
feature.num_stats.min
feature.num_stats.max
feature.num_stats.histograms.num_nan
feature.num_stats.histograms.buckets.low_value
feature.num_stats.histograms.buckets.high_value
- Detection Condition:
feature.type == FLOAT and
feature.bool_domain is specified and
(feature.num_stats.min != 0 and feature.num_stats.min != 1) or
(feature.num_stats.max != 0 and feature.num_stats.max != 1) or
feature.num_stats.histograms.num_nan > 0 or
feature.num_stats.histograms.buckets.low_value < 0 or
feature.num_stats.histograms.buckets.high_value > 1 or
(feature.num_stats.histograms.buckets.low_value > 0 and high_value < 1)
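The BOOL_TYPE_UNEXPECTED_FLOAT condition combines several checks, so a small sketch may help. The function below restates the condition in plain Python for a feature that is already known to be of type FLOAT with a bool_domain specified. It is not TFDV's implementation: the function name and the representation of histogram buckets as (low_value, high_value) pairs are assumptions made for the example.

```python
def bool_float_is_anomalous(min_value, max_value, num_nan, buckets):
    """Return True if float statistics are inconsistent with a bool domain.

    buckets is a list of (low_value, high_value) pairs (an assumed layout).
    """
    # min and max must each be exactly 0 or 1.
    if min_value not in (0.0, 1.0) or max_value not in (0.0, 1.0):
        return True
    # Any NaN value is flagged.
    if num_nan > 0:
        return True
    for low, high in buckets:
        # A bucket that reaches outside [0, 1], or that lies strictly
        # inside (0, 1), is flagged.
        if low < 0 or high > 1 or (low > 0 and high < 1):
            return True
    return False
```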
ENUM_TYPE_BYTES_NOT_STRING
- Anomaly type not detected in TFDV
ENUM_TYPE_FLOAT_NOT_STRING
- Anomaly type not detected in TFDV
ENUM_TYPE_INT_NOT_STRING
- Anomaly type not detected in TFDV
ENUM_TYPE_INVALID_UTF8
- Statistics Fields:
feature.string_stats.invalid_utf8_count
- Detection Condition:
invalid_utf8_count > 0
ENUM_TYPE_UNEXPECTED_STRING_VALUES
- Schema Fields:
string_domain and feature.domain; or feature.string_domain
feature.distribution_constraints.min_domain_mass
- Statistics Fields:
feature.string_stats.rank_histogram*
- Detection Condition:
- (number of values in rank_histogram that are not in domain / total number of values) > (1 - feature.distribution_constraints.min_domain_mass); or
feature.distribution_constraints.min_domain_mass == 1.0 and there are values in the histogram that are not in the domain
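As a rough illustration of the ENUM_TYPE_UNEXPECTED_STRING_VALUES condition, the sketch below treats the rank histogram as a plain dict from value to count. The function name, the dict representation, and the handling of an empty histogram are assumptions for the example and are not taken from TFDV.

```python
def has_unexpected_string_values(rank_histogram, domain, min_domain_mass=1.0):
    """rank_histogram maps each string value to its count; domain is a set."""
    total = sum(rank_histogram.values())
    if total == 0:
        # Assumption: an empty histogram raises no anomaly.
        return False
    out_of_domain = sum(
        count for value, count in rank_histogram.items() if value not in domain
    )
    if min_domain_mass == 1.0:
        # Every value must be in the domain.
        return out_of_domain > 0
    # Otherwise, tolerate up to (1 - min_domain_mass) out-of-domain mass.
    return out_of_domain / total > 1.0 - min_domain_mass
```

With a histogram of 98 in-domain and 2 out-of-domain values, this check fires for min_domain_mass of 1.0 but not for 0.95.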
FEATURE_TYPE_HIGH_NUMBER_VALUES
- Schema Fields:
feature.value_count.max
feature.value_counts.value_count.max
- Statistics Fields:
feature.common_stats.max_num_values
feature.common_stats.presence_and_valency_stats.max_num_values
- Detection Condition:
feature.value_count.max is specified and feature.common_stats.max_num_values > feature.value_count.max; or
feature.value_counts is specified and feature.common_stats.presence_and_valency_stats.max_num_values > feature.value_counts.value_count.max at a given nestedness level
FEATURE_TYPE_LOW_FRACTION_PRESENT
- Schema Fields:
feature.presence.min_fraction
- Statistics Fields:
feature.common_stats.num_non_missing*
num_examples*
- Detection Condition:
feature.presence.min_fraction is specified and (feature.common_stats.num_non_missing / num_examples) < feature.presence.min_fraction; or
feature.presence.min_fraction == 1.0 and common_stats.num_missing != 0
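The two branches of the FEATURE_TYPE_LOW_FRACTION_PRESENT condition can be sketched as follows. This is an illustrative restatement, not TFDV code: the function name is hypothetical, num_missing is assumed to equal num_examples minus num_non_missing, and a dataset with zero examples is assumed to raise no anomaly.

```python
def is_low_fraction_present(num_non_missing, num_examples, min_fraction=None):
    """Return True if the feature is present in too few examples."""
    if min_fraction is None or num_examples == 0:
        # Assumption: nothing to check without a constraint or without data.
        return False
    # Assumption: every example either has the feature or is missing it.
    num_missing = num_examples - num_non_missing
    if min_fraction == 1.0:
        # A required feature may not be missing from any example.
        return num_missing != 0
    return num_non_missing / num_examples < min_fraction
```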
FEATURE_TYPE_LOW_NUMBER_PRESENT
- Schema Fields:
feature.presence.min_count
- Statistics Fields:
feature.common_stats.num_non_missing*
- Detection Condition:
feature.presence.min_count is specified and
feature.common_stats.num_non_missing == 0 or
feature.common_stats.num_non_missing < feature.presence.min_count
FEATURE_TYPE_LOW_NUMBER_VALUES
- Schema Fields:
feature.value_count.min
feature.value_counts.value_count.min
- Statistics Fields:
feature.common_stats.min_num_values
feature.common_stats.presence_and_valency_stats.min_num_values
- Detection Condition:
feature.value_count.min is specified and feature.common_stats.min_num_values < feature.value_count.min; or
feature.value_counts is specified and feature.common_stats.presence_and_valency_stats.min_num_values < feature.value_counts.value_count.min at a given nestedness level
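The FEATURE_TYPE_HIGH_NUMBER_VALUES and FEATURE_TYPE_LOW_NUMBER_VALUES conditions mirror each other. The sketch below covers only the single-level feature.value_count case; the function name and the use of None for an unspecified bound are assumptions for the example.

```python
def value_count_anomalies(min_num_values, max_num_values,
                          count_min=None, count_max=None):
    """Compare observed per-example value counts with schema bounds."""
    found = []
    if count_min is not None and min_num_values < count_min:
        found.append("FEATURE_TYPE_LOW_NUMBER_VALUES")
    if count_max is not None and max_num_values > count_max:
        found.append("FEATURE_TYPE_HIGH_NUMBER_VALUES")
    return found
```

For nested features, the same comparison would be repeated once per nestedness level against the matching entry in feature.value_counts.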
FEATURE_TYPE_NOT_PRESENT
- Schema Fields:
feature.in_environment or feature.not_in_environment or schema.default_environment
feature.lifecycle_stage
feature.presence.min_count or feature.presence.min_fraction
- Statistics Fields:
feature.common_stats.num_non_missing*
- Detection Condition:
feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and
(feature.presence.min_count > 0 or feature.presence.min_fraction > 0) and
(feature.in_environment == current environment or feature.not_in_environment != current environment or schema.default_environment != current environment) and
common_stats.num_non_missing* == 0
FEATURE_TYPE_NO_VALUES
- Anomaly type not detected in TFDV
FEATURE_TYPE_UNEXPECTED_REPEATED
- Anomaly type not detected in TFDV
FEATURE_TYPE_HIGH_UNIQUE
- Schema Fields:
feature.unique_constraints.max
- Statistics Fields:
feature.string_stats.unique
- Detection Condition:
feature.string_stats.unique > feature.unique_constraints.max
FEATURE_TYPE_LOW_UNIQUE
- Schema Fields:
feature.unique_constraints.min
- Statistics Fields:
feature.string_stats.unique
- Detection Condition:
feature.string_stats.unique < feature.unique_constraints.min
FEATURE_TYPE_NO_UNIQUE
- Schema Fields:
feature.unique_constraints
- Statistics Fields:
feature.string_stats.unique
- Detection Condition:
feature.unique_constraints is specified but no feature.string_stats.unique is present (as is the case where the feature is not a string or categorical)
FLOAT_TYPE_BIG_FLOAT
- Schema Fields:
feature.float_domain.max
- Statistics Fields:
feature.type
feature.num_stats.max or feature.string_stats.rank_histogram
- Detection Condition:
feature.type == FLOAT, BYTES, or STRING and
- if feature.type is FLOAT: feature.num_stats.max > feature.float_domain.max
- if feature.type is BYTES or STRING: maximum value in feature.string_stats.rank_histogram (when converted to float) > feature.float_domain.max
FLOAT_TYPE_NOT_FLOAT
- Anomaly type not detected in TFDV
FLOAT_TYPE_SMALL_FLOAT
- Schema Fields:
feature.float_domain.min
- Statistics Fields:
feature.type
feature.num_stats.min or feature.string_stats.rank_histogram
- Detection Condition:
feature.type == FLOAT, BYTES, or STRING and
- if feature.type is FLOAT: feature.num_stats.min < feature.float_domain.min
- if feature.type is BYTES or STRING: minimum value in feature.string_stats.rank_histogram (when converted to float) < feature.float_domain.min
FLOAT_TYPE_STRING_NOT_FLOAT
- Schema Fields:
feature.float_domain
- Statistics Fields:
feature.type
feature.string_stats.rank_histogram
- Detection Condition:
feature.type == BYTES or STRING and
feature.string_stats.rank_histogram has at least one value that cannot be converted to a float
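For BYTES or STRING features with a float_domain, the three float checks above (FLOAT_TYPE_BIG_FLOAT, FLOAT_TYPE_SMALL_FLOAT, and FLOAT_TYPE_STRING_NOT_FLOAT) all work on the histogram values after conversion to float. The sketch below shows that idea with Python's built-in float(), which is an assumption: it accepts inputs such as "nan", "inf", or "1_000" that TFDV's own parser may treat differently. The function name is hypothetical.

```python
def string_float_anomalies(values, domain_min=None, domain_max=None):
    """values are the string values taken from a rank histogram."""
    found = set()
    parsed = []
    for value in values:
        try:
            parsed.append(float(value))
        except ValueError:
            # At least one value cannot be converted to a float.
            found.add("FLOAT_TYPE_STRING_NOT_FLOAT")
    if parsed and domain_max is not None and max(parsed) > domain_max:
        found.add("FLOAT_TYPE_BIG_FLOAT")
    if parsed and domain_min is not None and min(parsed) < domain_min:
        found.add("FLOAT_TYPE_SMALL_FLOAT")
    return found
```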
FLOAT_TYPE_NON_STRING
- Anomaly type not detected in TFDV
FLOAT_TYPE_UNKNOWN_TYPE_NUMBER
- Anomaly type not detected in TFDV
FLOAT_TYPE_HAS_NAN
- Schema Fields:
feature.float_domain.disallow_nan
- Statistics Fields:
feature.type
feature.num_stats.histograms.num_nan
- Detection Condition:
float_domain.disallow_nan is true and
feature.num_stats.histograms.num_nan > 0
FLOAT_TYPE_HAS_INF
- Schema Fields:
feature.float_domain.disallow_inf
- Statistics Fields:
feature.type
feature.num_stats.min
feature.num_stats.max
- Detection Condition:
float_domain.disallow_inf is true and
feature.num_stats.min == inf/-inf or
feature.num_stats.max == inf/-inf
INT_TYPE_BIG_INT
- Schema Fields:
feature.int_domain.max
- Statistics Fields:
feature.type
feature.num_stats.max or feature.string_stats.rank_histogram
- Detection Condition:
feature.type == INT, BYTES, or STRING and
- if feature.type is INT: feature.num_stats.max > feature.int_domain.max
- if feature.type is BYTES or STRING: maximum value in feature.string_stats.rank_histogram (when converted to int) > feature.int_domain.max
INT_TYPE_INT_EXPECTED
- Anomaly type not detected in TFDV
INT_TYPE_NOT_INT_STRING
- Schema Fields:
feature.int_domain
- Statistics Fields:
feature.type
feature.string_stats.rank_histogram
- Detection Condition:
feature.type == BYTES or STRING and
feature.string_stats.rank_histogram has at least one value that cannot be converted to an int
INT_TYPE_NOT_STRING
- Anomaly type not detected in TFDV
INT_TYPE_SMALL_INT
- Schema Fields:
feature.int_domain.min
- Statistics Fields:
feature.type
feature.num_stats.min or feature.string_stats.rank_histogram
- Detection Condition:
feature.type == INT, BYTES, or STRING and
- if feature.type is INT: feature.num_stats.min < feature.int_domain.min
- if feature.type is BYTES or STRING: minimum value in feature.string_stats.rank_histogram (when converted to int) < feature.int_domain.min
INT_TYPE_STRING_EXPECTED
- Anomaly type not detected in TFDV
INT_TYPE_UNKNOWN_TYPE_NUMBER
- Anomaly type not detected in TFDV
LOW_SUPPORTED_IMAGE_FRACTION
- Schema Fields:
feature.image_domain.minimum_supported_image_fraction
- Statistics Fields:
feature.custom_stats.rank_histogram for the custom_stats with name image_format_histogram. Note that semantic domain stats must be enabled for the image_format_histogram to be generated and for this validation to be performed. Semantic domain stats are not generated by default.
- Detection Condition:
- The fraction of values that are supported TensorFlow image types to all image types is less than feature.image_domain.minimum_supported_image_fraction.
SCHEMA_MISSING_COLUMN
- Schema Fields:
feature.in_environment or feature.not_in_environment or schema.default_environment
feature.lifecycle_stage
feature.presence.min_count or feature.presence.min_fraction
- Detection Condition:
feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and
(feature.presence.min_count > 0 or feature.presence.min_fraction > 0) and
(feature.in_environment == current environment or feature.not_in_environment != current environment or schema.default_environment != current environment) and
no feature with the specified name/path is found in the statistics proto
SCHEMA_NEW_COLUMN
- Detection Condition:
- there is a feature in the statistics proto but no feature with its name/path in the schema proto
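The SCHEMA_NEW_COLUMN condition amounts to a set difference between the feature names (or paths) in the statistics and those in the schema. A minimal sketch, assuming both sides have already been reduced to plain lists of names, with a hypothetical function name:

```python
def new_columns(stats_feature_names, schema_feature_names):
    """Return features that appear in the statistics but not in the schema."""
    return sorted(set(stats_feature_names) - set(schema_feature_names))
```

The reverse difference is only a starting point for SCHEMA_MISSING_COLUMN, which additionally depends on the lifecycle stage, presence, and environment conditions listed for that anomaly type.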
SCHEMA_TRAINING_SERVING_SKEW
- Anomaly type not detected in TFDV
STRING_TYPE_NOW_FLOAT
- Anomaly type not detected in TFDV
STRING_TYPE_NOW_INT
- Anomaly type not detected in TFDV
COMPARATOR_CONTROL_DATA_MISSING
- Schema Fields:
feature.skew_comparator.infinity_norm.threshold
feature.drift_comparator.infinity_norm.threshold
- Detection Condition:
- control statistics proto (i.e., serving statistics for skew or previous statistics for drift) is available but does not contain the specified feature
COMPARATOR_TREATMENT_DATA_MISSING
- Anomaly type not detected in TFDV
COMPARATOR_L_INFTY_HIGH
- Schema Fields:
feature.skew_comparator.infinity_norm.threshold
feature.drift_comparator.infinity_norm.threshold
- Statistics Fields:
feature.string_stats.rank_histogram*
- Detection Condition:
- L-infinity norm of the vector that represents the difference between the normalized counts from the feature.string_stats.rank_histogram in the control statistics (i.e., serving statistics for skew or previous statistics for drift) and the treatment statistics (i.e., training statistics for skew or current statistics for drift) > feature.skew_comparator.infinity_norm.threshold or feature.drift_comparator.infinity_norm.threshold
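To make the L-infinity comparison concrete, the sketch below computes the largest absolute difference between normalized counts of two rank histograms. It is an illustration under assumptions, not TFDV's implementation: histograms are represented as dicts from value to count, values absent from one side count as zero, and the function name is hypothetical.

```python
def l_infinity_distance(control_counts, treatment_counts):
    """Largest per-value gap between two normalized count distributions."""
    control_total = sum(control_counts.values())
    treatment_total = sum(treatment_counts.values())
    if control_total <= 0 or treatment_total <= 0:
        raise ValueError("both histograms need a positive total count")
    values = set(control_counts) | set(treatment_counts)
    return max(
        abs(control_counts.get(v, 0) / control_total
            - treatment_counts.get(v, 0) / treatment_total)
        for v in values
    )
```

An anomaly would then correspond to this distance exceeding the configured skew or drift threshold.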
COMPARATOR_NORMALIZED_ABSOLUTE_DIFFERENCE_HIGH
- Schema Fields:
feature.skew_comparator.normalized_abs_difference.threshold
feature.drift_comparator.normalized_abs_difference.threshold
- Statistics Fields:
feature.string_stats.rank_histogram
- Detection Condition:
- The normalized absolute count difference of value counts from the feature.string_stats.rank_histogram in the control statistics (i.e., serving statistics for skew or previous statistics for drift) and the treatment statistics (i.e., training statistics for skew or current statistics for drift) exceeded feature.skew_comparator.normalized_abs_difference.threshold or feature.drift_comparator.normalized_abs_difference.threshold. Count differences are normalized by the total count across both conditions.
COMPARATOR_JENSEN_SHANNON_DIVERGENCE_HIGH
- Schema Fields:
feature.skew_comparator.jensen_shannon_divergence.threshold
feature.drift_comparator.jensen_shannon_divergence.threshold
- Statistics Fields:
feature.num_stats.histograms* of type STANDARD
- Detection Condition:
- Approximate Jensen-Shannon divergence computed between the control statistics (i.e., serving statistics for skew or previous statistics for drift) and the treatment statistics (i.e., training statistics for skew or current statistics for drift) > feature.skew_comparator.jensen_shannon_divergence.threshold or feature.drift_comparator.jensen_shannon_divergence.threshold. The approximate Jensen-Shannon divergence is computed based on the normalized sample counts in both num_stats standard histogram and string_stats rank histogram.
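For readers unfamiliar with the Jensen-Shannon divergence, the sketch below computes it from two lists of sample counts. It assumes the two histograms already share the same buckets in the same order and uses base-2 logarithms so that the result lies in [0, 1]; both are assumptions for the example, and TFDV's approximate computation may align buckets and scale the result differently. The function name is hypothetical.

```python
import math


def js_divergence(control_counts, treatment_counts):
    """Jensen-Shannon divergence of two aligned count histograms."""
    if len(control_counts) != len(treatment_counts):
        raise ValueError("histograms must have the same buckets")
    control_total = sum(control_counts)
    treatment_total = sum(treatment_counts)
    if control_total <= 0 or treatment_total <= 0:
        raise ValueError("both histograms need a positive total count")
    divergence = 0.0
    for c, t in zip(control_counts, treatment_counts):
        p = c / control_total      # normalized control count
        q = t / treatment_total    # normalized treatment count
        m = 0.5 * (p + q)          # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence
```

Identical distributions give 0, and distributions with no overlapping buckets give 1 under the base-2 convention used here.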
NO_DATA_IN_SPAN
- Anomaly type not detected in TFDV
SPARSE_FEATURE_MISSING_VALUE
- Schema Fields:
sparse_feature.value_feature
- Statistics Fields:
feature.custom_stats with "missing_value" as name
- Detection Condition:
missing_value custom stat != 0
SPARSE_FEATURE_MISSING_INDEX
- Schema Fields:
sparse_feature.index_feature
- Statistics Fields:
feature.custom_stats with "missing_index" as name
- Detection Condition:
missing_index custom stat contains any value != 0
SPARSE_FEATURE_LENGTH_MISMATCH
- Schema Fields:
sparse_feature.value_feature
sparse_feature.index_feature
- Statistics Fields:
feature.custom_stats with "min_length_diff" or "max_length_diff" as name
- Detection Condition:
min_length_diff or max_length_diff custom stat contains any value != 0
SPARSE_FEATURE_NAME_COLLISION
- Schema Fields:
sparse_feature.name
sparse_feature.lifecycle_stage
feature.name
feature.lifecycle_stage
- Detection Condition:
sparse_feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and
feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and
sparse_feature.name == feature.name
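The SPARSE_FEATURE_NAME_COLLISION condition can be read as "an active sparse feature shares its name with an active feature". The sketch below expresses that with plain dicts mapping each name to its lifecycle stage (or None when unset); the dict representation and the function name are assumptions for the example.

```python
INACTIVE_STAGES = {"PLANNED", "ALPHA", "DEBUG", "DEPRECATED"}


def sparse_name_collisions(features, sparse_features):
    """Both arguments map a name to its lifecycle stage (or None)."""
    active_features = {
        name for name, stage in features.items() if stage not in INACTIVE_STAGES
    }
    return sorted(
        name
        for name, stage in sparse_features.items()
        if stage not in INACTIVE_STAGES and name in active_features
    )
```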
SEMANTIC_DOMAIN_UPDATE
- Schema Fields:
feature.domain_info
- Statistics Fields:
feature.custom_stats with "domain_info" as name
- Detection Condition:
feature.domain_info is not already set in the schema and
there is a single domain_info custom stat for the feature
COMPARATOR_LOW_NUM_EXAMPLES
- Schema Fields:
schema.dataset_constraints.num_examples_drift_comparator.min_fraction_threshold
schema.dataset_constraints.num_examples_version_comparator.min_fraction_threshold
- Statistics Fields:
num_examples*
- Detection Condition:
num_examples > 0 and
previous statistics proto is available and
num_examples / previous statistics num_examples < comparator min_fraction_threshold
COMPARATOR_HIGH_NUM_EXAMPLES
- Schema Fields:
schema.dataset_constraints.num_examples_drift_comparator.max_fraction_threshold
schema.dataset_constraints.num_examples_version_comparator.max_fraction_threshold
- Statistics Fields:
num_examples*
- Detection Condition:
num_examples > 0 and
previous statistics proto is available and
num_examples / previous statistics num_examples > comparator max_fraction_threshold
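The COMPARATOR_LOW_NUM_EXAMPLES and COMPARATOR_HIGH_NUM_EXAMPLES conditions both compare the ratio of current to previous example counts against a threshold. A minimal sketch, assuming previous statistics are passed as a plain count (None when unavailable) and that a previous count of zero raises no anomaly; the function name is hypothetical.

```python
def num_examples_comparator_anomalies(num_examples, previous_num_examples,
                                      min_fraction_threshold=None,
                                      max_fraction_threshold=None):
    """Compare the current example count with the previous one."""
    found = []
    if previous_num_examples is None or num_examples <= 0:
        return found
    if previous_num_examples == 0:
        # Assumption: skip the ratio check rather than divide by zero.
        return found
    ratio = num_examples / previous_num_examples
    if min_fraction_threshold is not None and ratio < min_fraction_threshold:
        found.append("COMPARATOR_LOW_NUM_EXAMPLES")
    if max_fraction_threshold is not None and ratio > max_fraction_threshold:
        found.append("COMPARATOR_HIGH_NUM_EXAMPLES")
    return found
```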
DATASET_LOW_NUM_EXAMPLES
- Schema Fields:
schema.dataset_constraints.min_examples_count
- Statistics Fields:
num_examples*
- Detection Condition:
num_examples < dataset_constraints.min_examples_count
DATASET_HIGH_NUM_EXAMPLES
- Schema Fields:
schema.dataset_constraints.max_examples_count
- Statistics Fields:
num_examples*
- Detection Condition:
num_examples > dataset_constraints.max_examples_count
WEIGHTED_FEATURE_NAME_COLLISION
- Schema Fields:
weighted_feature.name
weighted_feature.lifecycle_stage
sparse_feature.name
sparse_feature.lifecycle_stage
feature.name
feature.lifecycle_stage
- Detection Condition:
weighted_feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and either:
- feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and weighted_feature.name == feature.name
- sparse_feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and weighted_feature.name == sparse_feature.name
WEIGHTED_FEATURE_MISSING_VALUE
- Schema Fields:
weighted_feature.feature
- Statistics Fields:
feature.custom_stats with "missing_value" as name
- Detection Condition:
missing_value custom stat != 0
WEIGHTED_FEATURE_MISSING_WEIGHT
- Schema Fields:
weighted_feature.weight_feature
- Statistics Fields:
feature.custom_stats with "missing_weight" as name
- Detection Condition:
missing_weight custom stat != 0
WEIGHTED_FEATURE_LENGTH_MISMATCH
- Schema Fields:
weighted_feature.feature
weighted_feature.weight_feature
- Statistics Fields:
feature.custom_stats with "min_weight_length_diff" or "max_weight_length_diff" as name
- Detection Condition:
min_weight_length_diff or max_weight_length_diff custom stat != 0
VALUE_NESTEDNESS_MISMATCH
- Schema Fields:
feature.value_count
feature.value_counts
- Statistics Fields:
feature.common_stats.presence_and_valency_stats
- Detection Condition:
feature.value_count is specified, and there is a repeated presence_and_valency_stats for the feature (which indicates a nestedness level that is greater than one)
feature.value_counts is specified, and the number of times the presence_and_valency stats for the feature is repeated does not match the number of times value_count is repeated within feature.value_counts
DOMAIN_INVALID_FOR_TYPE
- Schema Fields:
feature.type
feature.domain_info
- Statistics Fields:
type for each feature
- Detection Condition:
feature.domain_info does not match feature's type (e.g., int_domain is specified, but feature's type is float)
feature is of type BYTES in statistics but feature.domain_info is of an incompatible type
FEATURE_MISSING_NAME
- Schema Fields:
feature.name
- Detection Condition:
feature.name is not specified
FEATURE_MISSING_TYPE
- Schema Fields:
feature.type
- Detection Condition:
feature.type is not specified
INVALID_SCHEMA_SPECIFICATION
- Schema Fields:
feature.domain_info
feature.presence.min_fraction
feature.value_count.min
feature.value_count.max
feature.distribution_constraints
- Detection Condition:
feature.presence.min_fraction < 0.0 or > 1.0
feature.value_count.min < 0 or > feature.value_count.max
a bool, int, float, struct, or semantic domain is specified for a feature and feature.distribution_constraints is also specified for that feature
feature.distribution_constraints is specified for a feature, but neither a schema-level domain nor feature.string_domain is specified for that feature
INVALID_DOMAIN_SPECIFICATION
- Schema Fields:
feature.domain_info
feature.bool_domain
feature.string_domain
- Detection Condition:
unknown feature.domain_info type is specified
feature.domain is specified, but there is no matching domain specified at the schema level
feature.bool_domain.true_value == feature.bool_domain.false_value
repeated values in feature.string_domain
feature.string_domain exceeds the maximum size
UNEXPECTED_DATA_TYPE
- Schema Fields:
feature.type
- Statistics Fields:
type for each feature
- Detection Condition:
- feature's type is not of type specified in feature.type
SEQUENCE_VALUE_TOO_FEW_OCCURRENCES
- Schema Fields:
feature.natural_language_domain.token_constraints.min_per_sequence
- Statistics Fields:
feature.custom_stats.nl_statistics.token_statistics.per_sequence_min_frequency
- Detection Condition:
min_per_sequence > per_sequence_min_frequency
SEQUENCE_VALUE_TOO_MANY_OCCURRENCES
- Schema Fields:
feature.natural_language_domain.token_constraints.max_per_sequence
- Statistics Fields:
feature.custom_stats.nl_statistics.token_statistics.per_sequence_max_frequency
- Detection Condition:
max_per_sequence < per_sequence_max_frequency
SEQUENCE_VALUE_TOO_SMALL_FRACTION
- Schema Fields:
feature.natural_language_domain.token_constraints.min_fraction_of_sequences
- Statistics Fields:
feature.custom_stats.nl_statistics.token_statistics.fraction_of_sequences
- Detection Condition:
min_fraction_of_sequences > fraction_of_sequences
SEQUENCE_VALUE_TOO_LARGE_FRACTION
- Schema Fields:
feature.natural_language_domain.token_constraints.max_fraction_of_sequences
- Statistics Fields:
feature.custom_stats.nl_statistics.token_statistics.fraction_of_sequences
- Detection Condition:
max_fraction_of_sequences < fraction_of_sequences
FEATURE_COVERAGE_TOO_LOW
- Schema Fields:
feature.natural_language_domain.coverage.min_coverage
- Statistics Fields:
feature.custom_stats.nl_statistics.feature_coverage
- Detection Condition:
feature_coverage < coverage.min_coverage
FEATURE_COVERAGE_TOO_SHORT_AVG_TOKEN_LENGTH
- Schema Fields:
feature.natural_language_domain.coverage.min_avg_token_length
- Statistics Fields:
feature.custom_stats.nl_statistics.avg_token_length
- Detection Condition:
avg_token_length < min_avg_token_length
NLP_WRONG_LOCATION
- Anomaly type not detected in TFDV
EMBEDDING_SHAPE_INVALID
- Anomaly type not detected in TFDV
MAX_IMAGE_BYTE_SIZE_EXCEEDED
- Schema Fields:
feature.image_domain.max_image_byte_size
- Statistics Fields:
feature.bytes_stats.max_num_bytes_int
- Detection Condition:
max_num_bytes_int > max_image_byte_size
INVALID_FEATURE_SHAPE
- Schema Fields:
feature.shape
- Statistics Fields:
feature.common_stats.num_missing
feature.common_stats.min_num_values
feature.common_stats.max_num_values
feature.common_stats.presence_and_valency_stats.num_missing
feature.common_stats.presence_and_valency_stats.min_num_values
feature.common_stats.presence_and_valency_stats.max_num_values
feature.common_stats.weighted_presence_and_valency_stats
- Detection Condition:
feature.shape is specified, and one of the following:
- the feature may be missing (num_missing != 0) at some nest level.
- the feature may have variable number of values (min_num_values != max_num_values) at some nest level.
- the specified shape is not compatible with the feature's value count stats. For example, shape [16] is compatible with min_num_values == max_num_values == [2, 2, 4] (for a 3-nested feature).
STATS_NOT_AVAILABLE
- Anomaly occurs when stats needed to validate constraints are not present.
DERIVED_FEATURE_BAD_LIFECYCLE
- Schema Fields:
feature.lifecycle_stage
- Statistics Fields:
feature.derived_source
- Detection Condition:
feature.lifecycle_stage is not one of DERIVED or DISABLED, and feature.derived_source is present, indicating that this is a derived feature.
DERIVED_FEATURE_INVALID_SOURCE
- Schema Fields:
feature.derived_source
- Statistics Fields:
feature.derived_source
- Detection Condition:
statistics.feature.derived_source is present for a feature, but the corresponding schema.feature.derived_source is not.
* If a weighted statistic is available for this field, it will be used instead of the non-weighted statistic.