tfdv.GenerateStatistics

Class GenerateStatistics

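API for generating data statistics, implemented as a Beam PTransform. As the example below shows, it is applied to a PCollection of decoded examples, and its output can be written with a ProtoCoder for statistics_pb2.DatasetFeatureStatisticsList.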

Example:

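  import apache_beam as beam
  from tensorflow_metadata.proto.v0 import statistics_pb2
  # Assumes a TFDV release that exports both names at the package top level.
  from tensorflow_data_validation import GenerateStatistics, TFExampleDecoder

  # data_location and output_path are file paths supplied by the caller.
  with beam.Pipeline(runner=...) as p:
    _ = (p
         | 'ReadData' >> beam.io.ReadFromTFRecord(data_location)
         | 'DecodeData' >> beam.Map(TFExampleDecoder().decode)
         | 'GenerateStatistics' >> GenerateStatistics()
         # An empty shard_name_template writes one unsharded file.
         | 'WriteStatsOutput' >> beam.io.WriteToTFRecord(
             output_path, shard_name_template='',
             coder=beam.coders.ProtoCoder(
                 statistics_pb2.DatasetFeatureStatisticsList)))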

__init__

__init__(options=stats_options.StatsOptions())

Initializes the transform.

Args:

  • options: A stats_options.StatsOptions instance holding the options for generating data statistics. Defaults to stats_options.StatsOptions().

Raises:

  • TypeError: If any of the input options is not of the expected type.
  • ValueError: If any of the input options is invalid.
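The constructor contract above can be sketched as follows. This is a minimal illustration, not the library's own usage guidance: it assumes `StatsOptions` and `GenerateStatistics` are both reachable from the top-level `tensorflow_data_validation` package, and that `num_top_values` is an accepted `StatsOptions` argument in the installed release. The helper names `build_stats_transform` and `try_build` are hypothetical.

```python
def build_stats_transform(**option_kwargs):
    """Returns a GenerateStatistics transform configured via StatsOptions.

    The import is deferred so that this module can be loaded even where
    TFDV is not installed; it is only needed when the function is called.
    """
    # Assumption: both classes are exported at the package top level.
    import tensorflow_data_validation as tfdv

    # Invalid option values are expected to raise TypeError or ValueError,
    # matching the Raises section of the constructor documentation.
    options = tfdv.StatsOptions(**option_kwargs)
    return tfdv.GenerateStatistics(options=options)


def try_build(**option_kwargs):
    """Hypothetical helper: returns (transform, error) instead of raising."""
    try:
        return build_stats_transform(**option_kwargs), None
    except (TypeError, ValueError) as err:
        # Surface the validation error to the caller for inspection.
        return None, err
```

With TFDV installed, `try_build(num_top_values=10)` should yield a transform and no error, while options of the wrong type or out of range should come back as the second element of the tuple rather than aborting pipeline construction.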

Methods

expand

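expand(dataset)

Applies the transform to dataset, the input PCollection. Beam invokes this method when the transform is used in a pipeline; it is not normally called directly.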