tfdv.generate_statistics_from_dataframe

tfdv.generate_statistics_from_dataframe(
    dataframe,
    stats_options=options.StatsOptions(),
    n_jobs=1
)

Computes data statistics for the input pandas DataFrame.

This is a utility method for users with in-memory data represented as a pandas DataFrame.

Args:

  • dataframe: Input pandas DataFrame.
  • stats_options: tfdv.StatsOptions for generating data statistics.
  • n_jobs: Number of processes to use (defaults to 1). If -1 is provided, one process is used per CPU core.
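
The n_jobs rule above can be sketched in plain Python. This is only an illustration of the documented behavior, not TFDV's actual implementation: resolve_n_jobs is a hypothetical helper, and rejecting values other than -1 or a positive integer is an assumption.

```python
import multiprocessing


def resolve_n_jobs(n_jobs: int = 1) -> int:
    """Hypothetical helper: map an n_jobs argument to a process count."""
    if n_jobs == -1:
        # Documented rule: -1 means one process per CPU core.
        return multiprocessing.cpu_count()
    if n_jobs < 1:
        # Assumption: any other non-positive value is treated as invalid.
        raise ValueError("n_jobs should be -1 or >= 1, got %d" % n_jobs)
    return n_jobs


print(resolve_n_jobs())    # the default of 1 is returned unchanged
print(resolve_n_jobs(-1))  # number of CPU cores on this machine
```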

Returns:

A DatasetFeatureStatisticsList proto.