Build MinDiff dataset from sensitive and nonsensitive datasets.
model_remediation.min_diff.keras.utils.build_min_diff_dataset(sensitive_group_dataset, nonsensitive_group_dataset) -> tf.data.Dataset
This function builds a tf.data.Dataset containing examples that are meant to be used only when calculating a min_diff_loss. The resulting dataset must then be packed with the dataset used for the model's original task, which can be done by calling a separate packing utility.
Each input dataset must output a tuple in the format used in tf.keras.Model.fit. Specifically, the output must be a tuple of length 1, 2 or 3 in the form (x, y, sample_weight).
This output will be parsed internally in the following way:
    batch = ...  # Batch from any of the input datasets.
    x, y, sample_weight = tf.keras.utils.unpack_x_y_sample_weight(batch)
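To make the parsing rule concrete, here is a plain-Python sketch of how a tuple of length 1, 2 or 3 is assumed to map onto (x, y, sample_weight). The helper name unpack_batch is hypothetical; it only imitates the documented behaviour of tf.keras.utils.unpack_x_y_sample_weight and is not the library's code.

```python
def unpack_batch(batch):
    """Hypothetical stand-in for tf.keras.utils.unpack_x_y_sample_weight."""
    # Anything that is not a tuple is treated as x alone.
    if not isinstance(batch, tuple):
        return batch, None, None
    if not 1 <= len(batch) <= 3:
        raise ValueError("batch must be a tuple of length 1, 2 or 3")
    # Pad the missing trailing components with None.
    return batch + (None,) * (3 - len(batch))


# Toy batch of two examples (values are illustrative only).
features, labels, weights = [[0.1], [0.2]], [0, 1], [1.0, 0.5]

print(unpack_batch((features,)))                  # x only
print(unpack_batch((features, labels)))           # x and y
print(unpack_batch((features, labels, weights)))  # x, y and sample_weight
```

Components that are absent from the tuple come back as None, which is why a dataset that yields only (x,) is still a valid input.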
Every batch from the returned tf.data.Dataset will contain one batch from each of the input datasets. Each returned batch will be a tuple or structure (matching the structure of the inputs) of (min_diff_x, min_diff_membership, min_diff_sample_weight) where, for each pair of input datasets:
min_diff_x: is formed by concatenating the x components of the paired datasets. The structure of these must match. If they don't, the dataset will raise an error at the first batch.
min_diff_membership: is a tensor of size [min_diff_batch_size, 1] indicating which dataset each example comes from.
min_diff_sample_weight: is formed by concatenating the sample_weight components of the paired datasets. If both are None, then this will be set to None. If only one is None, it is replaced with a Tensor of ones of the appropriate shape.
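The three components described above can be sketched in plain Python, using lists in place of tensors. This is an illustration of the described semantics only, not the library's implementation: the function name combine_min_diff_batches is hypothetical, the weights are kept as flat lists for simplicity, and the membership encoding (1.0 for the sensitive group, 0.0 for the nonsensitive group) is an assumption that the text above does not spell out.

```python
def combine_min_diff_batches(sensitive_batch, nonsensitive_batch):
    """Hypothetical sketch; each argument is an (x, y, sample_weight) tuple."""
    # y is unpacked but unused: it is not part of the returned tuple.
    sens_x, _, sens_w = sensitive_batch
    nonsens_x, _, nonsens_w = nonsensitive_batch

    # min_diff_x: x components concatenated along the batch dimension.
    min_diff_x = sens_x + nonsens_x

    # min_diff_membership: one row per example, shape [min_diff_batch_size, 1].
    # ASSUMPTION: 1.0 marks the sensitive group, 0.0 the nonsensitive group.
    min_diff_membership = ([[1.0] for _ in sens_x]
                           + [[0.0] for _ in nonsens_x])

    # min_diff_sample_weight: None only when both inputs are None; a single
    # missing side is replaced with ones before concatenating.
    if sens_w is None and nonsens_w is None:
        min_diff_sample_weight = None
    else:
        if sens_w is None:
            sens_w = [1.0] * len(sens_x)
        if nonsens_w is None:
            nonsens_w = [1.0] * len(nonsens_x)
        min_diff_sample_weight = sens_w + nonsens_w

    return min_diff_x, min_diff_membership, min_diff_sample_weight


# Toy batches: two sensitive examples with weights, three nonsensitive without.
sensitive = ([[0.1], [0.2]], None, [2.0, 3.0])
nonsensitive = ([[0.7], [0.8], [0.9]], None, None)

x, membership, weight = combine_min_diff_batches(sensitive, nonsensitive)
print(x)
print(membership)
print(weight)
```

In this toy run the combined batch has five rows, so the MinDiff batch size is the sum of the two input batch sizes, and the missing nonsensitive weights are filled with ones.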