Compute data statistics for the input pandas DataFrame.
tfdv.generate_statistics_from_dataframe(
    dataframe: DataFrame,
    stats_options: tfdv.StatsOptions = options.StatsOptions(),
    n_jobs: int = 1
) -> statistics_pb2.DatasetFeatureStatisticsList
This is a utility method for users with in-memory data represented
as a pandas DataFrame.
Args
  dataframe: Input pandas DataFrame.
  stats_options: tfdv.StatsOptions for generating data statistics.
  n_jobs: Number of processes to run (defaults to 1). If -1 is provided, uses the same number of processes as the number of CPU cores.

Returns
  A DatasetFeatureStatisticsList proto.