tfdf.keras.FeatureUsage

Semantic and hyper-parameters for a single feature.

This class allows you to:

  1. Limit the input features of the model.
  2. Manually set the semantic of a feature.
  3. Specify feature-specific hyper-parameters.

Note that the model's "features" argument is optional. If it is not specified, all available features will be used. See the "CoreModel" class documentation for more details.

Usage example:

from tensorflow_decision_forests.keras import CoreModel, FeatureUsage
from tensorflow_decision_forests.keras import FeatureSemantic as Semantic

# A feature named "A". The semantic will be detected automatically. The
# global hyper-parameters of the model will be used.
feature_a = FeatureUsage(name="A")

# A feature named "B" representing a CATEGORICAL value.
# Specifying the semantic ensures the feature is correctly detected.
# In this case, the feature might be stored as an integer, and would
# otherwise have been detected as NUMERICAL.
feature_b = FeatureUsage(name="B", semantic=Semantic.CATEGORICAL)

# A feature named "C" with a specific maximum dictionary size.
feature_c = FeatureUsage(name="C",
                         semantic=Semantic.CATEGORICAL,
                         max_vocab_count=32)

model = CoreModel(features=[feature_a, feature_b, feature_c])
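
To illustrate why the semantic matters for an integer-coded feature such as "B", the sketch below compares the two kinds of split condition a decision tree could apply to it. This is a simplified illustration and not the library's implementation: the helper names (`best_threshold_errors`, `set_split_errors`) and the toy data are hypothetical.

```python
# Toy data: an integer-coded category and a binary label.
# Categories 1 and 3 are positive; categories 0, 2 and 4 are negative.
values = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
labels = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]


def count_errors(predictions, labels):
    """Number of predictions that disagree with the labels."""
    return sum(int(p != y) for p, y in zip(predictions, labels))


def best_threshold_errors(values, labels):
    """NUMERICAL view: error of the best single split `value >= t`."""
    best = len(labels)
    for t in sorted(set(values)) + [max(values) + 1]:
        # Try predicting "positive" on either side of the threshold.
        for positive_side in (True, False):
            preds = [int((v >= t) == positive_side) for v in values]
            best = min(best, count_errors(preds, labels))
    return best


def set_split_errors(values, labels, positive_set):
    """CATEGORICAL view: error of one split `value in positive_set`."""
    preds = [int(v in positive_set) for v in values]
    return count_errors(preds, labels)


numerical_errors = best_threshold_errors(values, labels)
categorical_errors = set_split_errors(values, labels, {1, 3})
print(numerical_errors, categorical_errors)  # prints: 4 0
```

On this toy data, no single threshold separates the classes because the integer codes carry no meaningful order, while one set-membership condition does. A real model can combine several threshold splits, so the wrong semantic typically costs model size and quality rather than making learning impossible.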

name: The name of the feature. Used as an identifier if the dataset is a dictionary of tensors.
semantic: Semantic of the feature. If None, the semantic is automatically determined. The semantic controls how a feature is interpreted by a model. Using the wrong semantic (e.g. numerical instead of categorical) will hurt your model. See "FeatureSemantic" and "Semantic" for the definitions of the available semantics.
num_discretized_numerical_bins: For DISCRETIZED_NUMERICAL features only. Number of bins used to discretize the feature values.
max_vocab_count: For CATEGORICAL and CATEGORICAL_SET features only. Maximum number of unique categorical values stored as strings. If more categorical values are present, the least frequent values are grouped into an out-of-vocabulary item. Reducing the value can improve or hurt the model.
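
The effect of the max_vocab_count and num_discretized_numerical_bins parameters can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function names, the "<OOV>" placeholder, the choice not to count the out-of-vocabulary item toward the limit, and the quantile-style bin boundaries are all hypothetical, since this page does not describe those details.

```python
from bisect import bisect_right
from collections import Counter

OOV = "<OOV>"  # Hypothetical placeholder for the out-of-vocabulary item.


def cap_vocabulary(values, max_vocab_count):
    """Keeps the most frequent values and maps the rest to OOV.

    Assumption: the OOV item does not count toward max_vocab_count.
    """
    counts = Counter(values)
    kept = {value for value, _ in counts.most_common(max_vocab_count)}
    return [value if value in kept else OOV for value in values]


def discretize(values, num_bins):
    """Maps each number to a bin index in [0, num_bins).

    Assumption: boundaries are quantiles of the observed values.
    """
    ordered = sorted(values)
    boundaries = [
        ordered[(i * len(ordered)) // num_bins] for i in range(1, num_bins)
    ]
    return [bisect_right(boundaries, v) for v in values]


colors = ["red", "red", "red", "blue", "blue", "green", "teal"]
capped = cap_vocabulary(colors, max_vocab_count=2)
print(capped)
# ['red', 'red', 'red', 'blue', 'blue', '<OOV>', '<OOV>']

ages = [18, 22, 25, 31, 40, 47, 52, 66]
binned = discretize(ages, num_bins=4)
print(binned)
# [0, 0, 1, 1, 2, 2, 3, 3]
```

A smaller vocabulary merges rare values that may be noise, which can help generalization, but it can also hide rare values that are informative. This is why reducing max_vocab_count can either improve or hurt the model.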