tfdv.update_schema

Updates the input schema to conform to the input statistics.

tfdv.update_schema(
    schema,
    statistics,
    infer_feature_shape=True,
    max_string_domain_size=100
)

Args:

  • schema: A Schema protocol buffer.
  • statistics: A DatasetFeatureStatisticsList protocol buffer. Schema inference is currently supported only for lists with a single DatasetFeatureStatistics proto or lists with multiple DatasetFeatureStatistics protos corresponding to data slices that include the default slice (i.e., the slice with all examples). If a list with multiple DatasetFeatureStatistics protos is used, this function will update the schema to conform to the statistics corresponding to the default slice.
  • infer_feature_shape: A boolean indicating whether the shapes of the features should be inferred from the statistics.
  • max_string_domain_size: Maximum number of unique values a string feature may have and still be interpreted as a categorical feature.

Returns:

A Schema protocol buffer.
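
A minimal usage sketch, assuming TFDV and pandas are installed. The toy DataFrames, the column names, and the wrapper function `update_schema_example` are hypothetical and exist only for illustration. The import is guarded so the snippet returns None where TFDV is unavailable. `tfdv.generate_statistics_from_dataframe` and `tfdv.infer_schema` are used here to produce the two inputs; any other source of a Schema and a DatasetFeatureStatisticsList should work equally well.

```python
def update_schema_example():
    """Returns an updated Schema proto, or None if TFDV is not installed."""
    try:
        import pandas as pd
        import tensorflow_data_validation as tfdv
    except ImportError:
        return None

    # Hypothetical toy data: the newer batch contains a string value
    # ("blue") that the original batch did not.
    train_df = pd.DataFrame({"color": ["red", "green", "red"],
                             "size": [1, 2, 3]})
    new_df = pd.DataFrame({"color": ["red", "blue", "green"],
                           "size": [4, 5, 6]})

    # Build an initial schema from the first batch of statistics.
    train_stats = tfdv.generate_statistics_from_dataframe(train_df)
    schema = tfdv.infer_schema(statistics=train_stats)

    # Compute statistics for the newer batch and update the schema so that
    # it conforms to them. The keyword values shown are the documented
    # defaults.
    new_stats = tfdv.generate_statistics_from_dataframe(new_df)
    return tfdv.update_schema(
        schema,
        new_stats,
        infer_feature_shape=True,
        max_string_domain_size=100,
    )


updated_schema = update_schema_example()
```

The returned value can be used wherever a Schema protocol buffer is expected, for example as the schema argument when validating later batches of statistics.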

Raises:

  • TypeError: If an input argument is not of the expected type.
  • ValueError: If the input statistics proto contains multiple datasets, none of which corresponds to the default slice.
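
The rule for multi-dataset statistics lists, together with the ValueError case, can be sketched in plain Python. This is an illustration of the documented behavior, not TFDV's implementation: `DatasetStats` is a hypothetical stand-in for a DatasetFeatureStatistics proto, `select_dataset_for_schema_update` is a hypothetical helper, and the default slice name "All Examples" is an assumption that the text above does not state.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the default slice (the slice with all examples) is identified
# by this dataset name. The reference text does not specify the key.
DEFAULT_SLICE_NAME = "All Examples"


@dataclass
class DatasetStats:
    """Hypothetical stand-in for a DatasetFeatureStatistics proto."""
    name: str
    num_examples: int


def select_dataset_for_schema_update(
        datasets: List[DatasetStats]) -> DatasetStats:
    """Picks the dataset whose statistics the schema should conform to."""
    # A list with a single dataset is used as-is.
    if len(datasets) == 1:
        return datasets[0]
    # Otherwise the list must include the default slice, and the statistics
    # of that slice are the ones the schema is updated against.
    for dataset in datasets:
        if dataset.name == DEFAULT_SLICE_NAME:
            return dataset
    raise ValueError("No dataset corresponds to the default slice.")
```

For example, a list holding a hypothetical "country_US" slice plus the default slice resolves to the default slice, while a list of two non-default slices raises ValueError.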