tfdv.infer_schema

Infers schema from the input statistics.

tfdv.infer_schema(
    statistics,
    infer_feature_shape=True,
    max_string_domain_size=100,
    schema_transformations=None
)

Args:

  • statistics: A DatasetFeatureStatisticsList protocol buffer. Schema inference is currently supported only for lists that contain either a single DatasetFeatureStatistics proto or multiple DatasetFeatureStatistics protos that correspond to data slices and include the default slice (i.e., the slice with all examples). In the latter case, the schema is inferred from the statistics of the default slice.
  • infer_feature_shape: A boolean indicating whether the shape of each feature should be inferred from the statistics.
  • max_string_domain_size: Maximum number of distinct values a string feature may have and still be interpreted as a categorical feature.
  • schema_transformations: List of transformation functions to apply to the auto-inferred schema. Each transformation function should take the schema and statistics as input and should return the transformed schema. The transformations are applied in the order provided in the list.
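
The contract described for schema_transformations (each function takes the schema and statistics, returns a schema, and runs in list order) can be sketched in plain Python. This is an illustration only, not the library's implementation: plain dicts stand in for the Schema and DatasetFeatureStatisticsList protos, and mark_required, drop_optional, and apply_transformations are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-ins: plain dicts play the role of the Schema and
# DatasetFeatureStatisticsList protos so the sketch runs without TFDV.
Schema = Dict[str, Dict[str, Any]]
Statistics = Dict[str, Dict[str, Any]]
Transformation = Callable[[Schema, Statistics], Schema]


def mark_required(schema: Schema, statistics: Statistics) -> Schema:
    """Hypothetical transformation: flag features with no missing values."""
    for name, feature in schema.items():
        feature["required"] = statistics[name]["num_missing"] == 0
    return schema


def drop_optional(schema: Schema, statistics: Statistics) -> Schema:
    """Hypothetical transformation: keep only features flagged as required."""
    return {name: f for name, f in schema.items() if f.get("required")}


def apply_transformations(
    schema: Schema,
    statistics: Statistics,
    transformations: Optional[List[Transformation]] = None,
) -> Schema:
    """Applies each transformation in list order, feeding the output of one
    into the next. None (the documented default) means no transformations."""
    for transform in transformations or []:
        schema = transform(schema, statistics)
    return schema


inferred = {"age": {"type": "INT"}, "city": {"type": "BYTES"}}
stats = {"age": {"num_missing": 0}, "city": {"num_missing": 3}}

# Order matters: mark_required must run before drop_optional.
result = apply_transformations(inferred, stats, [mark_required, drop_optional])
print(result)
```

With TFDV itself, functions of the same shape (operating on the real protos rather than dicts) would be passed as schema_transformations=[...]. Reversing the list in the sketch yields an empty schema, because drop_optional would run before any feature has been flagged.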

Returns:

A Schema protocol buffer.

Raises:

  • TypeError: If the input argument is not of the expected type.
  • ValueError: If the input statistics proto contains multiple datasets, none of which corresponds to the default slice.
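
The slice-selection behavior described above (use the only dataset if there is one, otherwise use the default slice, otherwise raise ValueError) can be sketched as follows. This is a hedged illustration and not TFDV code: DatasetStats is a hypothetical stand-in for a DatasetFeatureStatistics proto, select_dataset_for_inference is a made-up helper, and the default slice name "All Examples" is an assumption that should be checked against the installed TFDV version.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the default slice is identified by this name. Verify it
# against your TFDV version before relying on it.
DEFAULT_SLICE_NAME = "All Examples"


@dataclass
class DatasetStats:
    """Hypothetical stand-in for a DatasetFeatureStatistics proto."""
    name: str
    num_examples: int


def select_dataset_for_inference(datasets: List[DatasetStats]) -> DatasetStats:
    """Picks the dataset whose statistics would drive schema inference."""
    if not isinstance(datasets, list):
        raise TypeError("datasets must be a list of DatasetStats.")
    # A single dataset is used as-is, whatever its name.
    if len(datasets) == 1:
        return datasets[0]
    # With several datasets (slices), only the default slice is used.
    for dataset in datasets:
        if dataset.name == DEFAULT_SLICE_NAME:
            return dataset
    raise ValueError("No dataset corresponds to the default slice.")


slices = [
    DatasetStats(name="country_US", num_examples=40),
    DatasetStats(name=DEFAULT_SLICE_NAME, num_examples=100),
]
chosen = select_dataset_for_inference(slices)
print(chosen.name, chosen.num_examples)
```

In this sketch an empty list also raises ValueError; the source does not say how the real function treats that case.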