Semantic and hyper-parameters for a single feature.
tfdf.keras.FeatureUsage(name: Text, semantic: Optional[tfdf.keras.FeatureSemantic] = None, discretized: Optional[int] = None, max_vocab_count: Optional[int] = None)
This class allows you to:
- Limit the input features of the model.
- Manually set the semantic of a feature.
- Specify feature-specific hyper-parameters.
Note that the model's "features" argument is optional. If it is not specified, all available features will be used. See the "CoreModel" class documentation for more details.
    # A feature named "A". The semantic will be detected automatically. The
    # global hyper-parameters of the model will be used.
    feature_a = FeatureUsage(name="A")

    # A feature named "B" representing a CATEGORICAL value.
    # Specifying the semantic ensures the feature is correctly detected.
    # In this case, the feature might be stored as an integer, and would
    # otherwise have been detected as NUMERICAL.
    feature_b = FeatureUsage(name="B", semantic=Semantic.CATEGORICAL)

    # A feature with a specific maximum dictionary size.
    feature_c = FeatureUsage(name="C",
                             semantic=Semantic.CATEGORICAL,
                             max_vocab_count=32)

    model = CoreModel(features=[feature_a, feature_b, feature_c])
|name|The name of the feature. Used as an identifier if the dataset is a dictionary of tensors.|
|semantic|Semantic of the feature. If None, the semantic is automatically determined. The semantic controls how a feature is interpreted by a model. Using the wrong semantic (e.g. numerical instead of categorical) will hurt your model. See "FeatureSemantic" and "Semantic" for the definitions of the available semantics.|
|discretized|For NUMERICAL features only. If set, the numerical values are discretized into a small set of unique values. This makes training faster but often leads to worse models. A reasonable discretization value is 255.|
|max_vocab_count|For CATEGORICAL and CATEGORICAL_SET features only. Number of unique categorical values stored as strings. If more categorical values are present, the least frequent values are grouped into an out-of-vocabulary item. Reducing the value can improve or hurt the model.|
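
To make the effect of "max_vocab_count" and "discretized" more concrete, the following pure-Python sketch mimics the two ideas on small lists. It is only an illustration under stated assumptions, not the library's implementation: the helper names "cap_vocabulary" and "discretize", the "<OOV>" placeholder, and the quantile-based choice of bin boundaries are all hypothetical, and the library may count the out-of-vocabulary slot or pick boundaries differently.

```python
from bisect import bisect_right
from collections import Counter

OOV = "<OOV>"  # hypothetical placeholder for the out-of-vocabulary item


def cap_vocabulary(values, max_vocab_count):
    """Keep the max_vocab_count most frequent values; map the rest to OOV."""
    counts = Counter(values)
    # Assumption: the OOV item does not consume a vocabulary slot here.
    kept = {value for value, _ in counts.most_common(max_vocab_count)}
    return [value if value in kept else OOV for value in values]


def discretize(values, num_bins):
    """Replace each number by the index of its (approximate) quantile bin."""
    ordered = sorted(values)
    # Assumption: boundaries sit at evenly spaced ranks of the sorted data.
    boundaries = sorted(
        {ordered[(i * len(ordered)) // num_bins] for i in range(1, num_bins)}
    )
    return [bisect_right(boundaries, value) for value in values]


colors = ["red", "red", "red", "blue", "blue", "green", "teal"]
capped = cap_vocabulary(colors, max_vocab_count=2)
# capped == ["red", "red", "red", "blue", "blue", "<OOV>", "<OOV>"]

measurements = [0.1, 0.2, 0.3, 0.4, 10.0, 20.0, 30.0, 40.0]
binned = discretize(measurements, num_bins=4)
# binned == [0, 0, 1, 1, 2, 2, 3, 3]
```

In the first call, the two rare colors collapse into a single item, which is why lowering "max_vocab_count" can either remove noise or discard useful signal. In the second call, eight distinct numbers become four bin indices, which is the kind of information loss that makes training faster but can reduce model quality.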