tfdf.keras.pd_dataframe_to_tf_dataset

Converts a Pandas DataFrame into a TensorFlow Dataset.

Details:

  • Ensures that each column has a uniform type.
  • If "label" is provided, separates it as a second channel in the tf.data.Dataset (as expected by TF-DF).
  • If "task" is provided, ensures the label has the correct dtype. If the task is a classification and the label is a string, the labels are converted to integers: the label values are extracted from the dataset and sorted lexicographically. Warning: this logic does not work as expected if the training and testing datasets contain different label values. In that case, convert the label to integers beforehand, making sure the same encoding is used for all datasets.
  • Returns a dataset created with "tf.data.Dataset.from_tensor_slices".
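A minimal sketch of the label handling described in the list above, written in plain pandas. The helpers lexicographic_vocab, encode_labels, and split_features_and_label are hypothetical and are not part of TF-DF; the (dict of columns, label array) pair is our assumption about the shape handed to tf.data.Dataset.from_tensor_slices. The sketch shows why encoding each dataset on its own breaks when the label sets differ, and how one shared vocabulary avoids the problem.

```python
import pandas as pd


def lexicographic_vocab(labels):
    """Sorted unique label values (lexicographic order for strings)."""
    return sorted(set(labels))


def encode_labels(df, label, vocab):
    """Return a copy of df whose label column holds integer indices into vocab."""
    mapping = {value: index for index, value in enumerate(vocab)}
    encoded = df.copy()
    encoded[label] = encoded[label].map(mapping)
    if encoded[label].isna().any():
        raise ValueError(f"column {label!r} has values missing from the vocabulary")
    encoded[label] = encoded[label].astype("int64")
    return encoded


def split_features_and_label(df, label):
    """Hypothetical helper: separate the label from the feature columns.

    Returns (features, labels), where features maps column name -> NumPy array.
    """
    features = {name: df[name].to_numpy() for name in df.columns if name != label}
    return features, df[label].to_numpy()


train = pd.DataFrame({"x": [1.0, 2.0, 3.0], "species": ["cat", "dog", "fish"]})
test = pd.DataFrame({"x": [4.0, 5.0], "species": ["dog", "fish"]})

# Pitfall: per-dataset vocabularies give "dog" index 1 in train but 0 in test.
train_vocab = lexicographic_vocab(train["species"])  # ['cat', 'dog', 'fish']
test_vocab = lexicographic_vocab(test["species"])  # ['dog', 'fish']

# Fix: build one vocabulary from every dataset and reuse it everywhere.
shared_vocab = lexicographic_vocab(pd.concat([train["species"], test["species"]]))
train_encoded = encode_labels(train, "species", shared_vocab)
test_encoded = encode_labels(test, "species", shared_vocab)

features, labels = split_features_and_label(test_encoded, "species")
print(sorted(features))  # ['x']
print(labels.tolist())  # [1, 2]
```

Per the description above, integerization only applies to string labels, so frames encoded this way should keep the shared mapping when passed to the function with label="species".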

Args:

  • dataframe: Pandas DataFrame containing a training or evaluation dataset.
  • label: Name of the label column.
  • task: Target task of the dataset (for example, classification or regression).
  • max_num_classes: Maximum number of classes for a classification task. A large number of unique label values (classes) may indicate that the problem is a regression or ranking task rather than a classification. Set to None to disable the check on the number of classes.
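A minimal sketch of the max_num_classes check in plain Python. check_num_classes is a hypothetical helper, not a TF-DF function; the exact comparison TF-DF performs (here, a count strictly greater than the limit fails) and its error message are assumptions.

```python
from typing import Hashable, Iterable, Optional


def check_num_classes(labels: Iterable[Hashable], max_num_classes: Optional[int]) -> int:
    """Count distinct label values and reject suspiciously many.

    Assumption: a count strictly greater than max_num_classes fails.
    max_num_classes=None disables the check, as documented.
    """
    num_classes = len(set(labels))
    if max_num_classes is not None and num_classes > max_num_classes:
        raise ValueError(
            f"{num_classes} unique label values exceed max_num_classes={max_num_classes}; "
            "this may be a regression or ranking problem rather than a classification."
        )
    return num_classes


print(check_num_classes(["cat", "dog", "cat"], max_num_classes=5))  # 2

# A continuous-looking label with 1000 distinct values trips the check...
try:
    check_num_classes([i * 0.5 for i in range(1000)], max_num_classes=100)
except ValueError as error:
    print("rejected:", error)

# ...unless the check is disabled.
print(check_num_classes(range(1000), max_num_classes=None))  # 1000
```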

Returns:

A TensorFlow Dataset (tf.data.Dataset).