
tfdf.keras.FeatureUsage


Semantic and hyper-parameters for a single feature.


This class lets you:

  1. Limit the input features of the model.
  2. Manually set the semantic of a feature.
  3. Specify feature-specific hyper-parameters.

Note that the model's "features" argument is optional. If it is not specified, all available features are used. See the "CoreModel" class documentation for more details.

Usage example:

import tensorflow_decision_forests as tfdf

# A feature named "A". The semantic is detected automatically, and the
# model's global hyper-parameters are used.
feature_a = tfdf.keras.FeatureUsage(name="A")

# A feature named "B" representing a CATEGORICAL value.
# Specifying the semantic ensures the feature is interpreted correctly.
# In this case, the feature might be stored as an integer and would
# otherwise have been detected as NUMERICAL.
feature_b = tfdf.keras.FeatureUsage(
    name="B", semantic=tfdf.keras.FeatureSemantic.CATEGORICAL)

# A feature with a specific maximum dictionary size.
feature_c = tfdf.keras.FeatureUsage(
    name="C",
    semantic=tfdf.keras.FeatureSemantic.CATEGORICAL,
    max_vocab_count=32)

model = tfdf.keras.CoreModel(features=[feature_a, feature_b, feature_c])

name The name of the feature. Used as an identifier if the dataset is a dictionary of tensors.
semantic Semantic of the feature. If None, the semantic is determined automatically. The semantic controls how a model interprets the feature. Using the wrong semantic (e.g. numerical instead of categorical) will hurt your model. See "FeatureSemantic" and "Semantic" for the definitions of the available semantics.
num_discretized_numerical_bins For DISCRETIZED_NUMERICAL features only. Number of bins used to discretize the feature.
max_vocab_count For CATEGORICAL and CATEGORICAL_SET features only. Maximum number of unique categorical values stored as strings. If more categorical values are present, the least frequent values are grouped into an out-of-vocabulary item. Reducing the value can improve or hurt the model.
min_vocab_frequency For CATEGORICAL and CATEGORICAL_SET features only. Minimum number of occurrences of a categorical value. Values that appear fewer than "min_vocab_frequency" times in the training dataset are treated as out-of-vocabulary.
override_global_imputation_value For CATEGORICAL and CATEGORICAL_SET features only. If set, replaces the global imputation value used to handle missing values. That is, at inference time, missing values will be treated as "override_global_imputation_value". "override_global_imputation_value" can only be used on categorical features and on columns not containing missing values in the training dataset. If the algorithm used to handle missing values is not "GLOBAL_IMPUTATION" (default algorithm), this value is ignored.
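
To make the effect of "max_vocab_count" and "min_vocab_frequency" concrete, here is a minimal pure-Python sketch of the dictionary pruning described above. It is an illustration only, not the library's implementation: the helper names build_vocabulary and encode, the "<OOV>" placeholder, and the alphabetical tie-breaking are all assumptions made for this example, and the sketch does not model whether the out-of-vocabulary item itself counts toward the limit.

```python
from collections import Counter

# Hypothetical placeholder for the out-of-vocabulary item (an assumption
# for this sketch, not a name taken from the library).
OOV = "<OOV>"


def build_vocabulary(values, max_vocab_count, min_vocab_frequency):
    """Returns the set of categorical values kept in the dictionary."""
    counts = Counter(values)
    # Drop values seen fewer than min_vocab_frequency times.
    frequent = [(v, c) for v, c in counts.items() if c >= min_vocab_frequency]
    # Sort by decreasing frequency; ties are broken alphabetically here.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    # Keep at most max_vocab_count of the most frequent values.
    return {v for v, _ in frequent[:max_vocab_count]}


def encode(values, vocabulary):
    """Replaces every value outside the vocabulary with the OOV item."""
    return [v if v in vocabulary else OOV for v in values]


# "teal" is too rare (min_vocab_frequency), and "green" is frequent enough
# but falls outside the two most frequent values (max_vocab_count).
colors = ["red"] * 5 + ["blue"] * 3 + ["green"] * 2 + ["teal"]
vocab = build_vocabulary(colors, max_vocab_count=2, min_vocab_frequency=2)
encoded = encode(colors, vocab)
```

In this toy dataset only "red" and "blue" survive, so the model would see three distinct items for the feature: the two kept values and the shared out-of-vocabulary item.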