## Introduction

There are two steps to integrating MinDiff into your model:

- Prepare the data (covered in the input preparation guide).
- Alter or create a model that will integrate MinDiff during training.

This guide covers the simplest way to complete the second step: using `MinDiffModel`.

## Setup

`pip install --upgrade tensorflow-model-remediation`

```
import tensorflow as tf
tf.get_logger().setLevel('ERROR') # Avoid TF warnings.
from tensorflow_model_remediation import min_diff
from tensorflow_model_remediation.tools.tutorials_utils import uci as tutorials_utils
```

First, download the data. For succinctness, the input preparation logic has been factored out into helper functions as described in the input preparation guide. You can read the full guide for details on this process.

```
# Original DataFrame for training, sampled at 0.3 for reduced runtimes.
train_df = tutorials_utils.get_uci_data(split='train', sample=0.3)
# Dataset needed to train with MinDiff.
train_with_min_diff_ds = (
    tutorials_utils.get_uci_with_min_diff_dataset(split='train', sample=0.3))
```

## Original Model

This guide uses a basic, untuned `keras.Model` built with the Functional API to highlight using MinDiff. In a real-world application, you would carefully choose the model architecture and tune it to improve model quality before attempting to address any fairness issues.

Since `MinDiffModel` is designed to work with most Keras `Model` classes, the logic of building the model has been factored out into a helper function: `get_uci_model`.

### Training with a Pandas DataFrame

This guide trains for a single epoch for speed; you could likely improve the model's performance by increasing the number of epochs.

```
model = tutorials_utils.get_uci_model()
model.compile(optimizer='adam', loss='binary_crossentropy')
df_without_target = train_df.drop(['target'], axis=1) # Drop 'target' for x.
_ = model.fit(
    x=dict(df_without_target),  # The model expects a dictionary of features.
    y=train_df['target'],
    batch_size=128,
    epochs=1)
```

77/77 [==============================] - 2s 7ms/step - loss: 0.5387

### Training with a `tf.data.Dataset`

The equivalent training with a `tf.data.Dataset` would look very similar (although initialization and input randomness may yield slightly different results).

```
model = tutorials_utils.get_uci_model()
model.compile(optimizer='adam', loss='binary_crossentropy')
_ = model.fit(
    tutorials_utils.df_to_dataset(train_df, batch_size=128),  # Converted to Dataset.
    epochs=1)
```

77/77 [==============================] - 1s 7ms/step - loss: 0.5925

## Integrating MinDiff for training

Once the data has been prepared, apply MinDiff to your model with the following steps:

- Create the original model as you would without MinDiff.

```
original_model = tutorials_utils.get_uci_model()
```

- Wrap it in a `MinDiffModel`.

```
min_diff_model = min_diff.keras.MinDiffModel(
    original_model=original_model,
    loss=min_diff.losses.MMDLoss(),
    loss_weight=1)
```

- Compile it as you would without MinDiff.

```
min_diff_model.compile(optimizer='adam', loss='binary_crossentropy')
```

- Train it with the MinDiff dataset (`train_with_min_diff_ds` in this case).

```
_ = min_diff_model.fit(train_with_min_diff_ds, epochs=1)
```

2022-04-01 00:10:53.139306: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:907] Skipping loop optimization for Merge node with control input: min_diff_model/mmd_loss_inputs/assert_non_negative/assert_less_equal/Assert/AssertGuard/branch_executed/_8

36/36 [==============================] - 4s 12ms/step - loss: 0.7614 - min_diff_loss: 0.0350
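The `MMDLoss` used in the wrapping step takes its name from Maximum Mean Discrepancy, a measure of how different two distributions of values are. To give a rough intuition for what such a loss penalizes, here is a minimal plain-Python sketch of a biased squared-MMD estimate between the predictions for two groups. It is not the library's implementation: the Gaussian kernel, the `kernel_length` value, the function names, and the example predictions are all assumptions made for illustration.

```python
import math


def gaussian_kernel(a, b, kernel_length=0.1):
    """Gaussian (RBF) kernel between two scalar predictions (assumed kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * kernel_length ** 2))


def mmd_squared(preds_a, preds_b, kernel_length=0.1):
    """Biased estimate of the squared MMD between two lists of predictions."""

    def mean_kernel(xs, ys):
        # Average kernel value over every pair (x, y).
        total = sum(gaussian_kernel(x, y, kernel_length) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(preds_a, preds_a)
            + mean_kernel(preds_b, preds_b)
            - 2.0 * mean_kernel(preds_a, preds_b))


# Hypothetical model predictions for two groups of examples.
group_a = [0.10, 0.20, 0.15, 0.30]
group_b = [0.70, 0.80, 0.65, 0.90]

print(mmd_squared(group_a, group_a))  # Identical distributions: approximately zero.
print(mmd_squared(group_a, group_b))  # Clearly separated distributions: larger value.
```

The closer the two sets of predictions are in distribution, the smaller this quantity becomes, which is the behavior a MinDiff-style penalty encourages during training.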

## Evaluation and Prediction with `MinDiffModel`

Both evaluating and predicting with a `MinDiffModel` are similar to doing so with the original model.

When calling `evaluate`, you can pass in either the original dataset or the one containing MinDiff data. If you include MinDiff data in the call to `evaluate`, two things will differ:

- An additional metric called `min_diff_loss` will be present in the output.
- The value of the `loss` metric will be the sum of the original `loss` metric (not shown in the output) and the `min_diff_loss`.

```
_ = min_diff_model.evaluate(
    tutorials_utils.df_to_dataset(train_df, batch_size=128))
# Calling with MinDiff data will include min_diff_loss in metrics.
_ = min_diff_model.evaluate(train_with_min_diff_ds)
```

77/77 [==============================] - 1s 6ms/step - loss: 0.7653

1/36 [..............................] - ETA: 36s - loss: 0.6000 - min_diff_loss: 0.0291

2022-04-01 00:10:55.551612: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:907] Skipping loop optimization for Merge node with control input: min_diff_model/mmd_loss_inputs/assert_non_negative/assert_less_equal/Assert/AssertGuard/branch_executed/_8

36/36 [==============================] - 1s 12ms/step - loss: 0.8052 - min_diff_loss: 0.0284
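Because the reported `loss` is the sum of the original loss and the `min_diff_loss` when MinDiff data is included, the original component can be recovered by subtraction. The following small sketch does that arithmetic with the values from the second `evaluate` call in this guide; the helper names are hypothetical and only illustrate the relationship described above.

```python
def total_loss(original_loss, min_diff_loss):
    """Loss reported when evaluating with MinDiff data: the sum of both parts."""
    return original_loss + min_diff_loss


def recover_original_loss(reported_loss, min_diff_loss):
    """Back out the original loss, which is not shown in the output."""
    return reported_loss - min_diff_loss


# Values taken from the evaluation output with MinDiff data.
reported_loss = 0.8052
reported_min_diff_loss = 0.0284

original_loss = recover_original_loss(reported_loss, reported_min_diff_loss)
print(f"original loss: {original_loss:.4f}")  # original loss: 0.7768
```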

When calling `predict`, you can technically also pass in the dataset with the MinDiff data, but it will be ignored and will not affect the output.

```
_ = min_diff_model.predict(
    tutorials_utils.df_to_dataset(train_df, batch_size=128))
_ = min_diff_model.predict(train_with_min_diff_ds) # Identical to results above.
```
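To illustrate why both `predict` calls give identical results, here is a toy sketch of a predict function that discards any MinDiff data before computing its output. Everything in it is hypothetical: the `PackedInputs` container, its field names, and `toy_predict` are stand-ins made up for this example and are not the library's actual types or logic.

```python
from collections import namedtuple

# Hypothetical container pairing the original inputs with extra MinDiff data.
PackedInputs = namedtuple("PackedInputs", ["original_inputs", "min_diff_data"])


def unpack_original_inputs(inputs):
    """Return only the original inputs, dropping MinDiff data if present."""
    if isinstance(inputs, PackedInputs):
        return inputs.original_inputs
    return inputs


def toy_predict(inputs):
    """Stand-in for a model's predict: depends only on the original inputs."""
    features = unpack_original_inputs(inputs)
    return [2.0 * x for x in features]


plain = toy_predict([1.0, 2.0, 3.0])
packed = toy_predict(PackedInputs([1.0, 2.0, 3.0], min_diff_data=[9.0, 9.0]))
print(plain == packed)  # True: the MinDiff data has no effect on the output.
```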

2022-04-01 00:10:57.629525: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:907] Skipping loop optimization for Merge node with control input: min_diff_model/mmd_loss_inputs/assert_non_negative/assert_less_equal/Assert/AssertGuard/branch_executed/_8

## Limitations of using `MinDiffModel` directly

When using `MinDiffModel` as described above, most methods will use the default implementations of `tf.keras.Model` (exceptions listed in the API documentation).

```
print('MinDiffModel.fit == keras.Model.fit')
print(min_diff.keras.MinDiffModel.fit == tf.keras.Model.fit)
print('MinDiffModel.train_step == keras.Model.train_step')
print(min_diff.keras.MinDiffModel.train_step == tf.keras.Model.train_step)
```

MinDiffModel.fit == keras.Model.fit True MinDiffModel.train_step == keras.Model.train_step True

For `keras.Sequential` or `keras.Model`, this is perfectly fine since they use the same functions.

```
print('Sequential.fit == keras.Model.fit')
print(tf.keras.Sequential.fit == tf.keras.Model.fit)
print('tf.keras.Sequential.train_step == keras.Model.train_step')
print(tf.keras.Sequential.train_step == tf.keras.Model.train_step)
```

Sequential.fit == keras.Model.fit True tf.keras.Sequential.train_step == keras.Model.train_step True

However, if your model is a subclass of `keras.Model` that overrides methods such as `train_step`, wrapping it with `MinDiffModel` will effectively lose the customization.

```
class CustomModel(tf.keras.Model):

    def train_step(self, **kwargs):
        pass  # Custom implementation.

print('CustomModel.train_step == keras.Model.train_step')
print(CustomModel.train_step == tf.keras.Model.train_step)
```

CustomModel.train_step == keras.Model.train_step False

If this is your use case, you should not use `MinDiffModel` directly. Instead, you will need to subclass it as described in the customization guide.
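The reason the customization is lost can be shown without TensorFlow. In the plain-Python sketch below, a wrapper that merely holds a custom object still runs its own inherited `train_step`, whereas a class that subclasses both the wrapper and the custom class picks up the override. All class names here are hypothetical stand-ins, and whether the multiple-inheritance pattern matches the approach in the customization guide is an assumption.

```python
class BaseModel:
    """Stand-in for a base model class with default training logic."""

    def train_step(self, batch):
        return "default train_step"

    def fit(self, batches):
        # fit always dispatches to self.train_step, not the wrapped model's.
        return [self.train_step(batch) for batch in batches]


class CustomModel(BaseModel):
    """Stand-in for a user model with a customized train_step."""

    def train_step(self, batch):
        return "custom train_step"


class WrapperModel(BaseModel):
    """Stand-in for a wrapper that stores the original model by composition."""

    def __init__(self, original_model):
        self.original_model = original_model


class CustomWrapperModel(WrapperModel, CustomModel):
    """Subclassing both lets the custom train_step appear in the wrapper."""


wrapped = WrapperModel(CustomModel())
custom_wrapped = CustomWrapperModel(CustomModel())

print(wrapped.fit([0]))         # ['default train_step']
print(custom_wrapped.fit([0]))  # ['custom train_step']
```

In the first case the override lives on the wrapped object, which the wrapper's `fit` never consults; in the second it is found through the wrapper class's own method resolution order.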

## Additional Resources

- For an in-depth discussion on fairness evaluation, see the Fairness Indicators guidance.
- For general information on Remediation and MinDiff, see the remediation overview.
- For details on requirements surrounding MinDiff see this guide.
- To see an end-to-end tutorial on using MinDiff in Keras, see this tutorial.