tfdf.keras.FeatureUsage

Semantic and hyper-parameters for a single feature.

This class lets you:

  1. Limit the input features of the model.
  2. Manually set the semantic of a feature.
  3. Specify feature-specific hyper-parameters.

Note that the model's "features" argument is optional. If it is not specified, all available features will be used. See the "CoreModel" class documentation for more details.

Usage example:

# A feature named "A". The semantic will be detected automatically. The
# global hyper-parameters of the model will be used.
feature_a = FeatureUsage(name="A")

# A feature named "B" representing a CATEGORICAL value.
# Specifying the semantic ensures the feature is correctly detected.
# In this case, the feature might be stored as an integer, and would otherwise
# have been detected as NUMERICAL.
feature_b = FeatureUsage(name="B", semantic=Semantic.CATEGORICAL)

# A feature with a specific maximum dictionary size.
feature_c = FeatureUsage(name="C",
                         semantic=Semantic.CATEGORICAL,
                         max_vocab_count=32)

model = CoreModel(features=[feature_a, feature_b, feature_c])
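
The comment on feature "B" above can be illustrated with a small, library-free sketch of why an explicit semantic matters. The helper `resolve_semantic` and its type-based guess are hypothetical and only mimic the behaviour described in the example (integers detected as NUMERICAL); this is not the detection code used by the library.

```python
def resolve_semantic(values, semantic=None):
    """Return the semantic to use for a column of raw Python values.

    Hypothetical helper: an explicit `semantic` always wins; otherwise a
    naive type-based guess is made. Assumption, not the library's logic.
    """
    if semantic is not None:
        return semantic
    present = [v for v in values if v is not None]
    # Treat a column as NUMERICAL only if every present value is a number.
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in present
    )
    return "NUMERICAL" if is_numeric else "CATEGORICAL"


# Zip codes are identifiers, but they are stored as integers.
zip_codes = [94103, 10001, 60601]

auto_semantic = resolve_semantic(zip_codes)
forced_semantic = resolve_semantic(zip_codes, semantic="CATEGORICAL")
text_semantic = resolve_semantic(["red", "blue", None])
```

Here the automatic guess treats the zip codes as numbers, so a model would compare them by magnitude; passing the semantic explicitly avoids that.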

name The name of the feature. Used as an identifier if the dataset is a dictionary of tensors.
semantic Semantic of the feature. If None, the semantic is automatically determined. The semantic controls how a feature is interpreted by a model. Using the wrong semantic (e.g. numerical instead of categorical) will hurt your model. See "FeatureSemantic" and "Semantic" for the definitions of the available semantics.
num_discretized_numerical_bins For DISCRETIZED_NUMERICAL features only. Number of bins used to discretize the feature values.
max_vocab_count For CATEGORICAL and CATEGORICAL_SET features only. Maximum number of unique categorical values stored as strings. If more categorical values are present, the least frequent values are grouped into an out-of-vocabulary item. Reducing the value can improve or hurt the model.
min_vocab_frequency For CATEGORICAL and CATEGORICAL_SET features only. Minimum number of occurrences of a categorical value. Values present fewer than "min_vocab_frequency" times in the training dataset are treated as "Out-of-vocabulary".
override_global_imputation_value For CATEGORICAL and CATEGORICAL_SET features only. If set, replaces the global imputation value used to handle missing values. That is, at inference time, missing values will be treated as "override_global_imputation_value". "override_global_imputation_value" can only be used on categorical features and on columns not containing missing values in the training dataset. If the algorithm used to handle missing values is not "GLOBAL_IMPUTATION" (default algorithm), this value is ignored.
monotonic Monotonic constraint between the feature and the model output. Use None (default) for a feature without a monotonic constraint. Monotonic.INCREASING ensures the model output is monotonically increasing with respect to the feature. Monotonic.DECREASING ensures the model output is monotonically decreasing with respect to the feature. Alternatively, you can use 0, +1 and -1 to define, respectively, a non-constrained, a monotonically increasing, and a monotonically decreasing feature.
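
To make the interaction of "max_vocab_count" and "min_vocab_frequency" described above more concrete, here is a minimal pure-Python sketch of dictionary pruning. The function names, the "<OOV>" placeholder, the tie-breaking rule, and the choice not to count the out-of-vocabulary item against the limit are all assumptions made for illustration; the library's actual implementation may differ.

```python
from collections import Counter

OOV = "<OOV>"  # hypothetical name for the out-of-vocabulary item


def build_vocabulary(values, max_vocab_count, min_vocab_frequency):
    """Return the categorical values kept in the dictionary.

    Illustrative sketch only: values seen fewer than `min_vocab_frequency`
    times are dropped, then the `max_vocab_count` most frequent remain.
    """
    counts = Counter(values)
    frequent = [(v, c) for v, c in counts.items() if c >= min_vocab_frequency]
    # Most frequent first; ties broken alphabetically for determinism.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [value for value, _ in frequent[:max_vocab_count]]


def encode(values, vocabulary):
    """Replace every value outside the vocabulary with the OOV item."""
    kept = set(vocabulary)
    return [v if v in kept else OOV for v in values]


colors = ["red"] * 5 + ["blue"] * 3 + ["green"] * 2 + ["teal"]
vocab = build_vocabulary(colors, max_vocab_count=2, min_vocab_frequency=2)
encoded = encode(colors, vocab)
```

In this toy dataset "teal" is dropped by the frequency threshold, while "green" passes the threshold but is cut by the size limit; both end up in the out-of-vocabulary item.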
guide