
tfdf.keras.FeatureUsage

Semantic and hyper-parameters for a single feature.

This class allows you to:

  1. Limit the input features of the model.
  2. Manually set the semantic of a feature.
  3. Specify feature-specific hyper-parameters.

Note that the model's "features" argument is optional. If it is not specified, all available features are used. See the "CoreModel" class documentation for more details.

Usage example:

import tensorflow_decision_forests as tfdf

# A feature named "A". The semantic is detected automatically, and the
# model's global hyper-parameters are used.
feature_a = tfdf.keras.FeatureUsage(name="A")

# A feature named "B" representing a CATEGORICAL value.
# Specifying the semantic ensures the feature is interpreted correctly: if the
# values are stored as integers, they would otherwise be detected as NUMERICAL.
feature_b = tfdf.keras.FeatureUsage(
    name="B", semantic=tfdf.keras.FeatureSemantic.CATEGORICAL)

# A CATEGORICAL feature named "C" with a maximum dictionary size of 32.
feature_c = tfdf.keras.FeatureUsage(
    name="C",
    semantic=tfdf.keras.FeatureSemantic.CATEGORICAL,
    max_vocab_count=32)

model = tfdf.keras.CoreModel(features=[feature_a, feature_b, feature_c])
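The example notes that a categorical feature stored as integers would be detected as NUMERICAL. The following is a minimal sketch of a purely type-based inference rule that shows why this happens. The enum `SketchSemantic` and the function `guess_semantic` are hypothetical; the rule is an assumption for illustration and not necessarily how TF-DF infers semantics.

```python
from enum import Enum


class SketchSemantic(Enum):
    """Hypothetical stand-in for the library's semantic enum."""
    NUMERICAL = "NUMERICAL"
    CATEGORICAL = "CATEGORICAL"


def guess_semantic(values):
    """Hypothetical, simplified inference based only on the Python value type.

    Not TF-DF's actual logic: strings become CATEGORICAL and numbers become
    NUMERICAL, regardless of what the numbers mean.
    """
    if not values:
        raise ValueError("Cannot infer a semantic from an empty column.")
    if all(isinstance(v, str) for v in values):
        return SketchSemantic.CATEGORICAL
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return SketchSemantic.NUMERICAL
    raise ValueError("Mixed types: specify the semantic explicitly.")


print(guess_semantic(["FR", "US", "JP"]))  # SketchSemantic.CATEGORICAL
# Category IDs stored as integers look numerical to a type-based rule,
# which is why the semantic should be set explicitly for such features.
print(guess_semantic([3, 17, 3, 42]))  # SketchSemantic.NUMERICAL
```

A rule like this can only see the storage type, not what the values mean, so passing `semantic` explicitly is the way to override it.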

name: The name of the feature. Used as an identifier if the dataset is a dictionary of tensors.
semantic: Semantic of the feature. If None, the semantic is determined automatically. The semantic controls how the model interprets a feature; using the wrong semantic (e.g. numerical instead of categorical) will hurt your model. See "FeatureSemantic" and "Semantic" for the definitions of the available semantics.
discretized: For NUMERICAL features only. If set, the numerical values are discretized into a small set of unique values. This makes training faster but often leads to worse models. A reasonable discretization value is 255.
max_vocab_count: For CATEGORICAL features only. Maximum number of unique categorical values. If more categorical values are present, the least frequent values are grouped into an out-of-vocabulary item. Reducing this value can either improve or hurt the model.
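The `discretized` option can be pictured with a minimal sketch that maps numerical values into at most `num_bins` bins using equal-frequency (quantile) boundaries. The helpers `quantile_boundaries` and `discretize` are hypothetical, and quantile binning is an assumed technique; this page does not say which discretization algorithm TF-DF uses.

```python
import bisect


def quantile_boundaries(values, num_bins):
    """Return sorted, unique cut points at equal-frequency positions.

    At most num_bins - 1 cut points are returned, so values map to at most
    num_bins distinct bins.
    """
    ordered = sorted(values)
    if not ordered or num_bins < 2:
        return []
    n = len(ordered)
    # Index (i * n) // num_bins is always < n because i < num_bins.
    cuts = {ordered[(i * n) // num_bins] for i in range(1, num_bins)}
    return sorted(cuts)


def discretize(value, boundaries):
    """Return the index of the bin containing value, in [0, len(boundaries)]."""
    return bisect.bisect_right(boundaries, value)


values = list(range(1000))
boundaries = quantile_boundaries(values, num_bins=255)
bins = {discretize(v, boundaries) for v in values}
print(len(boundaries), len(bins))  # 254 255
```

With 255 bins, a tree learner only needs to consider at most 254 candidate thresholds per feature instead of one per unique value, which is one way to see the speed-up; the cost is that a split can no longer separate two values that fall in the same bin.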
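The grouping described for `max_vocab_count` can be illustrated with a minimal sketch: keep the most frequent values and map all others to a single out-of-vocabulary item. The names `build_vocab`, `encode`, and the `"<OOV>"` token are hypothetical, and breaking frequency ties by value is an assumption made here for determinism; this page does not specify how the library breaks ties or whether the out-of-vocabulary item counts toward the limit.

```python
from collections import Counter

OOV = "<OOV>"  # Hypothetical placeholder for the out-of-vocabulary item.


def build_vocab(values, max_vocab_count):
    """Keep the max_vocab_count most frequent values."""
    counts = Counter(values)
    # Sort by decreasing frequency, then by value so ties are deterministic.
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    return set(ranked[:max_vocab_count])


def encode(value, vocab):
    """Map values outside the vocabulary to the shared OOV item."""
    return value if value in vocab else OOV


colors = ["red"] * 5 + ["blue"] * 3 + ["green"] * 2 + ["teal", "plum"]
vocab = build_vocab(colors, max_vocab_count=3)
print(sorted(vocab))  # ['blue', 'green', 'red']
print([encode(c, vocab) for c in ["red", "teal", "plum"]])  # ['red', '<OOV>', '<OOV>']
```

Lowering the limit merges more rare values into the OOV item: this can reduce overfitting on rare categories, or discard useful signal, which matches the note that reducing the value can either improve or hurt the model.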