# tfdf.keras.FeatureSemantic

Semantic (e.g. numerical, categorical) of an input feature.

Inherits From: `Enum`

Determines how the model interprets a feature. Similar to the "column type" in Yggdrasil Decision Forests.

`NUMERICAL` Numerical value. Generally for quantities or counts with full ordering. For example, the age of a person, or the number of items in a bag. Can be a float or an integer. Missing values are represented by `math.nan` or by an empty sparse tensor. If a numerical tensor contains multiple values, its size should be constant, and each dimension is treated independently (and each dimension should always have the same "meaning").
`CATEGORICAL` A categorical value. Generally for a type/class in a finite set of possible values without ordering. For example, the color RED in the set {RED, BLUE, GREEN}. Can be a string or an integer. Missing values are represented by "" (empty string), the value -2, or an empty sparse tensor. An out-of-vocabulary value (i.e. a value that was never seen in training) is represented by any new string value or the value -1. If a tensor contains multiple values, its size should be constant, and each value is treated independently (each value in the tensor should always have the same meaning). Integer categorical values: (1) The training logic and model representation are optimized with the assumption that values are dense. (2) Internally, the value is stored as int32, so values should be below roughly 2 billion. (3) The number of possible values is computed automatically from the training dataset. During inference, integer values greater than any value seen during training are treated as out-of-vocabulary. (4) Minimum frequency and maximum vocabulary size constraints don't apply.
`HASH` The hash of a string value. Used when only the equality between values is important (not the value itself). Currently only used for groups in ranking problems, e.g. the query in a query/document problem. The hash is computed with Google's FarmHash and stored as a uint64.
`CATEGORICAL_SET` Set of categorical values. Well suited to representing tokenized text. Can be a string or an integer in a sparse tensor or a ragged tensor (recommended). Unlike CATEGORICAL, the number of items in a CATEGORICAL_SET can change and the order/index of each item doesn't matter.
`BOOLEAN` Boolean value. WARNING: Boolean values are not yet supported for training. Can be a float or an integer. Missing values are represented by `math.nan` or by an empty sparse tensor. If a numerical tensor contains multiple values, its size should be constant, and each dimension is treated independently (and each dimension should always have the same "meaning").
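
The integer `CATEGORICAL` conventions described above (-2 for missing, -1 for out-of-vocabulary, and values larger than any seen in training treated as out-of-vocabulary) can be sketched in plain Python. This only illustrates the documented rules and is not the library's implementation; the names `MISSING`, `OUT_OF_VOCABULARY`, `fit_max_value` and `normalize` are hypothetical and introduced here for the example.

```python
from typing import Iterable, Optional

MISSING = -2            # documented marker for a missing integer categorical value
OUT_OF_VOCABULARY = -1  # documented marker for a value never seen in training


def fit_max_value(training_values: Iterable[int]) -> int:
    """Returns the largest non-negative value found in the training data."""
    valid = [v for v in training_values if v >= 0]
    # Hypothetical fallback: with no valid value, every input becomes OOV.
    return max(valid) if valid else -1


def normalize(value: Optional[int], max_seen: int) -> int:
    """Maps an inference-time value to itself, MISSING, or OUT_OF_VOCABULARY."""
    if value is None or value == MISSING:
        return MISSING
    # Assumption of this sketch: any other negative value is out-of-vocabulary.
    if value < 0 or value > max_seen:
        return OUT_OF_VOCABULARY
    return value


max_seen = fit_max_value([0, 1, 2, 2, MISSING, 5])
encoded = [normalize(v, max_seen) for v in [1, 5, 6, None, -1]]
```

In this sketch a value such as 3, which is below the maximum but absent from the training data, is passed through unchanged; the text above only describes the greater-than-maximum case, which is consistent with the stated assumption that integer values are dense.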

BOOLEAN `<Semantic.BOOLEAN: 5>`
CATEGORICAL `<Semantic.CATEGORICAL: 2>`
CATEGORICAL_SET `<Semantic.CATEGORICAL_SET: 4>`
HASH `<Semantic.HASH: 3>`
NUMERICAL `<Semantic.NUMERICAL: 1>`
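
Because `FeatureSemantic` inherits from `Enum`, its members should support the standard lookups by name and by value. The stand-in below mirrors the members listed above using Python's `enum` module so that it runs without TensorFlow Decision Forests installed; `SemanticSketch` is a hypothetical name, not the library class.

```python
import enum


class SemanticSketch(enum.Enum):
    """Stand-in that copies the member names and values listed above."""
    NUMERICAL = 1
    CATEGORICAL = 2
    HASH = 3
    CATEGORICAL_SET = 4
    BOOLEAN = 5


# Standard Enum lookups: by member name and by member value.
by_name = SemanticSketch["CATEGORICAL"]
by_value = SemanticSketch(4)

# Member names ordered by their integer value.
names_in_value_order = [
    member.name for member in sorted(SemanticSketch, key=lambda m: m.value)
]
```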

