tfr.utils.de_noise

Returns a float Tensor of de-noised counts.

The implementation is based on the paper by Zhang and Xu, "Fast Exact Maximum Likelihood Estimation for Mixture of Language Models." It assumes that the observed counts are generated from a mixture of a noise distribution and the true distribution: ratio * noise_distribution + (1 - ratio) * true_distribution, where ratio controls the contribution of noise. This method returns an estimate of the true distribution.
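To make the mixture assumption concrete, here is a minimal pure-Python sketch that recovers a true distribution from counts when the noise distribution and ratio are known. It uses plain expectation-maximization (EM), which is not the exact algorithm from the paper and not the library's implementation. The name de_noise_em, the num_iters parameter, and the uniform starting point are all hypothetical choices made for illustration.

```python
def de_noise_em(counts, noise, ratio, num_iters=200):
    """Hypothetical helper: per-row EM estimate of the true distribution.

    Assumes each row of counts was drawn from
    ratio * noise_distribution + (1 - ratio) * true_distribution.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be in (0, 1)")
    result = []
    for count_row, noise_row in zip(counts, noise):
        # Normalize the noise row so that it sums to 1.
        noise_total = sum(noise_row)
        q = [v / noise_total for v in noise_row]
        # Start from a uniform guess for the true distribution (an assumption).
        p = [1.0 / len(count_row)] * len(count_row)
        for _ in range(num_iters):
            # E-step: expected part of each count attributed to the true distribution.
            weighted = [
                c * (1.0 - ratio) * p_i / (ratio * q_i + (1.0 - ratio) * p_i)
                for c, p_i, q_i in zip(count_row, p, q)
            ]
            total = sum(weighted)
            if total == 0.0:
                break  # All counts are zero; keep the current guess.
            # M-step: renormalize so the row sums to 1.
            p = [w / total for w in weighted]
        result.append(p)
    return result


# Counts built as 1000 * (0.5 * uniform + 0.5 * [0.7, 0.1, 0.1, 0.1]).
counts = [[475.0, 175.0, 175.0, 175.0]]
noise = [[1.0, 1.0, 1.0, 1.0]]
estimate = de_noise_em(counts, noise, ratio=0.5)
print([round(v, 3) for v in estimate[0]])
```

Because the example counts match the mixture exactly, the printed row should be close to [0.7, 0.1, 0.1, 0.1]. With real, sampled counts the estimate would only approximate the true distribution.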

Args:
  counts: A 2-D Tensor representing the observations. All values should be nonnegative.
  noise: A 2-D Tensor representing the noise distribution, with the same shape as counts. All values should be positive; each row is normalized to sum to 1.
  ratio: A float in (0, 1) representing the contribution from noise.

Returns:
  A 2-D float Tensor in which each row lies on the probability simplex (nonnegative entries that sum to 1).

Raises:
  ValueError: if ratio is not in (0, 1).
  InvalidArgumentError: if any value in counts is negative or any value in noise is not positive.