Computes an approximate probability density at each x, given the bins.
tft.estimated_probability_density(
    x: tf.Tensor,
    boundaries: Optional[Union[tf.Tensor, int]] = None,
    categorical: bool = False,
    name: Optional[str] = None
) -> tf.Tensor
Using this type of fixed-interval method has several benefits compared to quantile-based bucketization, although it may not always be preferred.
- Quantiles do not work on categorical data.
- The quantiles algorithm does not currently operate on multiple features jointly, only independently.
Example: outlier detection in a multi-modal or arbitrary distribution. Imagine a value x where a simple model is highly predictive of a target y within certain densely populated ranges. Outside these ranges, we may want to treat the data differently, but there are too few samples for the model to learn how to handle them case by case. One option would be to use the density estimate for this purpose:
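outputs['x_density'] = tft.estimated_probability_density(inputs['x'], boundaries=100)
# Flag values whose estimated density falls below the threshold (1 = outlier).
outputs['outlier_x'] = tf.where(outputs['x_density'] < OUTLIER_THRESHOLD,
                                tf.constant([1]), tf.constant([0]))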
This exercise uses a single variable for illustration, but a direct density metric would become more useful with higher dimensions.
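To get a feel for the thresholding step without building a TFT pipeline, the idea can be approximated in plain NumPy. This is only a sketch under stated assumptions: `fixed_width_density` is a hypothetical helper (not part of TFT), the synthetic bimodal sample and the `OUTLIER_THRESHOLD` value are made up for illustration, and the bin lookup may differ in detail from what TFT does internally.

```python
import numpy as np

def fixed_width_density(x, num_bins):
    """Histogram density estimate at each value of x, using equal-width bins.

    Hypothetical helper for illustration; not TFT's implementation.
    """
    counts, edges = np.histogram(x, bins=num_bins)
    bin_width = edges[1] - edges[0]
    # Fraction of samples per bin, divided by the bin width.
    densities = counts / counts.sum() / bin_width
    # Map each value to its bin; clip so the maximum lands in the last bin.
    idx = np.clip(np.digitize(x, edges) - 1, 0, num_bins - 1)
    return densities[idx]

rng = np.random.default_rng(0)
# Two densely populated modes plus a single value far from both.
x = np.concatenate([
    rng.normal(-5.0, 0.5, 500),
    rng.normal(5.0, 0.5, 500),
    [0.0],
])

OUTLIER_THRESHOLD = 0.02  # assumed cutoff, chosen only for this example
x_density = fixed_width_density(x, num_bins=100)
outlier_x = np.where(x_density < OUTLIER_THRESHOLD, 1, 0)
```

The isolated value at 0.0 sits alone in its bin, so its density falls below the cutoff and it is flagged, while values near the two modes are not. Sparse values in the tails of each mode may be flagged as well, depending on the threshold.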
Note that we normalize by the average bin width (bin_width) to arrive at a probability density estimate. The result resembles a PDF, not the probability that a value falls in the bucket (except in the categorical case).
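The distinction between a per-bucket probability and a density can be seen in a small NumPy sketch. It is an illustration only, not TFT's code: the sample values, the choice of four bins, and the variable names are all assumptions made for this example.

```python
import numpy as np

values = np.array([0.1, 0.2, 0.3, 0.4, 1.9, 2.0])
counts, edges = np.histogram(values, bins=4)  # equal-width bins over [0.1, 2.0]

probabilities = counts / counts.sum()   # chance of landing in each bin
bin_width = edges[1] - edges[0]
densities = probabilities / bin_width   # probability per unit of x

# Probabilities sum to 1; densities integrate to 1 over the binned range.
total_probability = probabilities.sum()
total_area = (densities * bin_width).sum()

# Categorical case: there is no bin width to divide by, so the estimate
# is simply the relative frequency of each key.
labels = np.array(["a", "b", "a", "c", "a", "b"])
keys, key_counts = np.unique(labels, return_counts=True)
frequencies = dict(zip(keys.tolist(), (key_counts / key_counts.sum()).tolist()))
```

Here the first bin holds four of the six values but is less than one unit wide, so its density exceeds 1 even though its probability is 2/3. That is why the numeric result should be read as a density rather than a probability.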