MinDiff is a model remediation technique that seeks to equalize two distributions. In practice, it can be used to balance error rates across different slices of your data by penalizing distributional differences.
Typically, you apply MinDiff when trying to ensure group fairness, such as minimizing the difference in either false positive rate (FPR) or false negative rate (FNR) between a slice of data belonging to a sensitive class and a better-performing slice. For an in-depth discussion of fairness metrics, review the literature on this subject. [1][2][3]
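To make the FPR gap between two slices concrete, here is a minimal sketch in plain Python. The helper name false_positive_rate and the toy labels and predictions are hypothetical and chosen only for illustration; they are not part of the Model Remediation library.

```python
def false_positive_rate(labels, predictions):
    """FPR = FP / (FP + TN), computed over the examples whose true label is 0."""
    negatives = [p for y, p in zip(labels, predictions) if y == 0]
    if not negatives:
        raise ValueError("slice contains no negative examples")
    # Every positive prediction on a true negative is a false positive.
    return sum(negatives) / len(negatives)


# Hypothetical toy data: 1 = predicted/labeled positive, 0 = negative.
sensitive_labels = [0, 0, 0, 0, 1, 1]
sensitive_preds = [1, 1, 0, 0, 1, 1]
reference_labels = [0, 0, 0, 0, 1, 1]
reference_preds = [1, 0, 0, 0, 1, 0]

fpr_sensitive = false_positive_rate(sensitive_labels, sensitive_preds)
fpr_reference = false_positive_rate(reference_labels, reference_preds)

# The gap that a remediation technique such as MinDiff aims to shrink.
fpr_gap = fpr_sensitive - fpr_reference
print(f"FPR sensitive={fpr_sensitive}, reference={fpr_reference}, gap={fpr_gap}")
```

The same pattern applies to FNR by filtering on the examples whose true label is 1 and counting negative predictions instead.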
How does MinDiff work?
Given two sets of examples from our dataset, MinDiff penalizes the model during training for differences in the distribution of scores between the two sets. The less distinguishable the two sets are based on prediction scores, the smaller the penalty that will be applied.
The penalty is applied by adding a component to the loss that the model is using for training. It can be thought of as a measurement of the difference in distribution of model predictions. As the model trains, it tries to minimize the penalty by bringing the distributions closer together, as shown in the graphs below.
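The idea of adding a distribution-difference penalty to the training loss can be sketched as follows. The text above does not say which distance measure is used, so this sketch assumes a Gaussian-kernel maximum mean discrepancy (MMD), one common choice. The function names, the bandwidth, and the penalty_weight value are all hypothetical.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """Similarity between two scores; 1.0 when equal, near 0 when far apart."""
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_penalty(scores_a, scores_b, bandwidth=0.1):
    """Biased estimate of squared MMD between two sets of prediction scores."""
    return (
        mean_kernel(scores_a, scores_a, bandwidth)
        + mean_kernel(scores_b, scores_b, bandwidth)
        - 2 * mean_kernel(scores_a, scores_b, bandwidth)
    )


def total_loss(task_loss, scores_a, scores_b, penalty_weight=1.5):
    """Original task loss plus the weighted distribution-difference penalty."""
    return task_loss + penalty_weight * mmd_penalty(scores_a, scores_b)


# Hypothetical prediction scores for two slices of data.
slice_a = [0.10, 0.20, 0.15]
far_slice = [0.80, 0.90, 0.85]    # easy to tell apart from slice_a
close_slice = [0.12, 0.18, 0.16]  # hard to tell apart from slice_a

far_apart = mmd_penalty(slice_a, far_slice)
close = mmd_penalty(slice_a, close_slice)
identical = mmd_penalty(slice_a, slice_a)
print(f"far={far_apart:.4f} close={close:.4f} identical={identical:.4f}")
```

In this sketch the penalty shrinks as the two sets of scores become harder to tell apart and vanishes when they coincide, so minimizing total_loss pushes the score distributions together while still optimizing the original task. A larger penalty_weight places more emphasis on closing the gap relative to task performance.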
Applying MinDiff may involve tradeoffs in performance on the original task. MinDiff can be effective without degrading performance beyond what the product requires, but the product owner should decide deliberately how to balance task performance against the effectiveness of MinDiff. For examples showing how to implement MinDiff, see the model remediation case study notebook.
For a tutorial on applying MinDiff to a text classification model, see the MinDiff Keras notebook.
For a post about MinDiff on the TensorFlow blog, see the Applying MinDiff to improve model blog post.
For the full Model Remediation library, see the model-remediation GitHub repo.
Chouldechova, A. (2016). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments.