Introduction to Fairness Indicators


Overview

Fairness Indicators is a suite of tools built on top of TensorFlow Model Analysis (TFMA) that enables regular evaluation of fairness metrics in product pipelines. TFMA is a library for evaluating both TensorFlow and non-TensorFlow machine learning models. It allows you to evaluate your models on large amounts of data in a distributed manner, compute in-graph and other metrics over different slices of data, and visualize them in notebooks.

Fairness Indicators is packaged with TensorFlow Data Validation (TFDV) and the What-If Tool. Using Fairness Indicators allows you to:

  • Evaluate model performance, sliced across defined groups of users
  • Gain confidence about results with confidence intervals and evaluations at multiple thresholds
  • Evaluate the distribution of datasets
  • Dive deep into individual slices to explore root causes and opportunities for improvement

In this notebook, you will use Fairness Indicators to fix fairness issues in a model you train using the Civil Comments dataset. Watch this video for more details and context on the real-world scenario this is based on, which is also one of the primary motivations for creating Fairness Indicators.

Dataset

In this notebook, you will work with the Civil Comments dataset, approximately 2 million public comments released by the Civil Comments platform in 2017 for ongoing research. This effort was sponsored by Jigsaw, which has hosted competitions on Kaggle to help classify toxic comments and minimize unintended model bias.

Each individual text comment in the dataset has a toxicity label, with the label being 1 if the comment is toxic and 0 if the comment is non-toxic. Within the data, a subset of comments are labeled with a variety of identity attributes, including categories for gender, sexual orientation, religion, and race or ethnicity.

Setup

Install fairness-indicators and witwidget.

pip install -q fairness-indicators
pip install -q witwidget
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

google-api-python-client 1.12.8 requires httplib2<1dev,>=0.15.0, but you'll have httplib2 0.9.2 which is incompatible.
apache-beam 2.25.0 requires httplib2<0.18.0,>=0.8, but you'll have httplib2 0.18.1 which is incompatible.

You must restart the Colab runtime after installing. Select Runtime > Restart runtime from the Colab menu.

Do not proceed with the rest of this tutorial without first restarting the runtime.

Import all other required libraries.

import os
import tempfile
import apache_beam as beam
import numpy as np
import pandas as pd
from datetime import datetime
import pprint

import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_data_validation as tfdv

from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view

from fairness_indicators.documentation.examples import util

from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget

Download and analyze the data

By default, this notebook downloads a preprocessed version of this dataset, but you may use the original dataset and re-run the processing steps if desired. In the original dataset, each comment is labeled with the percentage of raters who believed that the comment corresponds to a particular identity. For example, a comment might be labeled with the following:

{ male: 0.3, female: 1.0, transgender: 0.0, heterosexual: 0.8, homosexual_gay_or_lesbian: 1.0 }

The processing step groups identities by category (gender, sexual_orientation, etc.) and removes identities with a score less than 0.5. So the example above would be converted to the following:

{ gender: [female], sexual_orientation: [heterosexual, homosexual_gay_or_lesbian] }

download_original_data = False

if download_original_data:
  train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')

  # Identity terms are grouped by category (see 'IDENTITY_COLUMNS') using a
  # 0.5 threshold. Only the identity term columns, the text column, and the
  # label column are kept after processing.
  train_tf_file = util.convert_comments_data(train_tf_file)
  validate_tf_file = util.convert_comments_data(validate_tf_file)

else:
  train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')
Downloading data from https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord
488161280/488153424 [==============================] - 6s 0us/step
Downloading data from https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord
324943872/324941336 [==============================] - 9s 0us/step
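
To make the processing step described earlier concrete, here is a minimal sketch of the grouping and thresholding logic in plain Python. It is not the implementation of util.convert_comments_data; the function name group_identities and the IDENTITY_CATEGORIES mapping are hypothetical and cover only the identity terms from the example.

```python
# Hypothetical mapping from identity term to category. It only covers the
# terms used in the example; the real dataset has more terms and categories.
IDENTITY_CATEGORIES = {
    'male': 'gender',
    'female': 'gender',
    'transgender': 'gender',
    'heterosexual': 'sexual_orientation',
    'homosexual_gay_or_lesbian': 'sexual_orientation',
}


def group_identities(scores, threshold=0.5):
    """Groups identity terms by category, dropping scores below threshold."""
    grouped = {}
    for term, score in scores.items():
        if score >= threshold:
            grouped.setdefault(IDENTITY_CATEGORIES[term], []).append(term)
    return grouped


raw = {
    'male': 0.3,
    'female': 1.0,
    'transgender': 0.0,
    'heterosexual': 0.8,
    'homosexual_gay_or_lesbian': 1.0,
}
print(group_identities(raw))
# {'gender': ['female'], 'sexual_orientation': ['heterosexual', 'homosexual_gay_or_lesbian']}
```

Categories with no identity at or above the threshold are simply absent from the result, which matches the variable-length identity features used later in the notebook.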

Use TFDV to analyze the data and find potential problems in it, such as missing values and data imbalances, that can lead to fairness disparities.

stats = tfdv.generate_statistics_from_tfrecord(data_location=train_tf_file)
tfdv.visualize_statistics(stats)
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.

Warning:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_data_validation/utils/stats_util.py:247: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`

TFDV shows that there are some significant imbalances in the data which could lead to biased model outcomes.

  • The toxicity label (the value predicted by the model) is unbalanced. Only 8% of the examples in the training set are toxic, which means that a classifier could get 92% accuracy by predicting that all comments are non-toxic.

  • In the fields relating to identity terms, only 6.6k out of the 1.08 million (0.61%) training examples deal with homosexuality, and those related to bisexuality are even rarer. This indicates that performance on these slices may suffer due to a lack of training data.
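
The arithmetic behind these two observations can be checked with a short sketch. The function name majority_baseline is hypothetical, and the counts are the rounded figures quoted above rather than exact dataset statistics.

```python
def majority_baseline(num_examples, toxic_fraction):
    """Accuracy and toxic recall of a classifier that always predicts non-toxic."""
    num_toxic = round(num_examples * toxic_fraction)
    num_non_toxic = num_examples - num_toxic
    # Every non-toxic comment is classified correctly, every toxic one is missed.
    accuracy = num_non_toxic / num_examples
    recall = 0 / num_toxic if num_toxic else 0.0
    return accuracy, recall


accuracy, recall = majority_baseline(num_examples=1_080_000, toxic_fraction=0.08)
print(f'accuracy={accuracy:.2f}, toxic recall={recall:.2f}')
# accuracy=0.92, toxic recall=0.00

# Share of training examples that deal with homosexuality (6.6k of 1.08 million).
homosexual_share = 6_600 / 1_080_000
print(f'{homosexual_share:.2%}')
# 0.61%
```

High accuracy with zero recall on the toxic class is why accuracy alone is a poor signal for this dataset.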

Prepare the data

Define a feature map to parse the data. Each example has a label, the comment text, and the identity features associated with the text: sexual orientation, gender, religion, race, and disability.

BASE_DIR = tempfile.gettempdir()

TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'
FEATURE_MAP = {
    # Label:
    LABEL: tf.io.FixedLenFeature([], tf.float32),
    # Text:
    TEXT_FEATURE:  tf.io.FixedLenFeature([], tf.string),

    # Identities:
    'sexual_orientation':tf.io.VarLenFeature(tf.string),
    'gender':tf.io.VarLenFeature(tf.string),
    'religion':tf.io.VarLenFeature(tf.string),
    'race':tf.io.VarLenFeature(tf.string),
    'disability':tf.io.VarLenFeature(tf.string),
}

Next, set up an input function to feed data into the model. Add a weight column to each example and upweight the toxic examples to account for the class imbalance identified by TFDV. The identity features are used only during the evaluation phase; only the comment text is fed into the model during training.

def train_input_fn():
  def parse_function(serialized):
    parsed_example = tf.io.parse_single_example(
        serialized=serialized, features=FEATURE_MAP)
    # Adds a weight column to deal with unbalanced classes.
    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
    return (parsed_example,
            parsed_example[LABEL])
  train_dataset = tf.data.TFRecordDataset(
      filenames=[train_tf_file]).map(parse_function).batch(512)
  return train_dataset
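
As a rough check on the weighting in parse_function, the following sketch works through the arithmetic in plain Python. It assumes labels are exactly 0 or 1 and that about 8% of examples are toxic (the approximate figure from the TFDV analysis); example_weight is a hypothetical stand-in for the tf.add call.

```python
def example_weight(label):
    """Mirrors tf.add(label, 0.1) from parse_function for a scalar label."""
    return label + 0.1


toxic_fraction = 0.08  # Assumed: approximate share of toxic training examples.

# Total weight contributed by each class, per training example on average.
toxic_mass = toxic_fraction * example_weight(1.0)
non_toxic_mass = (1.0 - toxic_fraction) * example_weight(0.0)

ratio = example_weight(1.0) / example_weight(0.0)
toxic_share = toxic_mass / (toxic_mass + non_toxic_mass)

print(f'per-example weight ratio: {ratio:.0f}x')
print(f'toxic share of total weight: {toxic_share:.3f}')
```

Under these assumptions a toxic example counts 11 times as much as a non-toxic one, so toxic comments contribute a little under half of the total weight instead of roughly 8%.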

Train the model

Create and train a deep learning model on the data.

model_dir = os.path.join(BASE_DIR, 'train', datetime.now().strftime(
    "%Y%m%d-%H%M%S"))

embedded_text_feature_column = hub.text_embedding_column(
    key=TEXT_FEATURE,
    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')

classifier = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    weight_column='weight',
    feature_columns=[embedded_text_feature_column],
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.003),
    loss_reduction=tf.losses.Reduction.SUM,
    n_classes=2,
    model_dir=model_dir)

classifier.train(input_fn=train_input_fn, steps=1000)
INFO:tensorflow:Using default config.

INFO:tensorflow:Using config: {'_model_dir': '/tmp/train/20201121-100727', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:517: NumericColumn._get_dense_tensor (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/feature_column/feature_column.py:2192: NumericColumn._transform_feature (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/adagrad.py:83: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/train/20201121-100727/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 59.616936, step = 0
INFO:tensorflow:global_step/sec: 21.9404
INFO:tensorflow:loss = 56.416004, step = 100 (4.560 sec)
INFO:tensorflow:global_step/sec: 22.2819
INFO:tensorflow:loss = 47.268112, step = 200 (4.488 sec)
INFO:tensorflow:global_step/sec: 22.1474
INFO:tensorflow:loss = 55.89038, step = 300 (4.515 sec)
INFO:tensorflow:global_step/sec: 22.2912
INFO:tensorflow:loss = 56.21254, step = 400 (4.486 sec)
INFO:tensorflow:global_step/sec: 22.2124
INFO:tensorflow:loss = 41.3372, step = 500 (4.502 sec)
INFO:tensorflow:global_step/sec: 22.2862
INFO:tensorflow:loss = 45.430252, step = 600 (4.487 sec)
INFO:tensorflow:global_step/sec: 22.4628
INFO:tensorflow:loss = 51.093895, step = 700 (4.452 sec)
INFO:tensorflow:global_step/sec: 21.8744
INFO:tensorflow:loss = 47.681576, step = 800 (4.572 sec)
INFO:tensorflow:global_step/sec: 22.2926
INFO:tensorflow:loss = 48.37216, step = 900 (4.486 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000...
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/train/20201121-100727/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000...
INFO:tensorflow:Loss for final step: 51.016426.

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7f790812fac8>

Analyze the model

After obtaining the trained model, analyze it to compute fairness metrics using TFMA and Fairness Indicators. Begin by exporting the model as a SavedModel.

Export SavedModel

def eval_input_receiver_fn():
  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_placeholder')

  # This *must* be a dictionary containing a single key 'examples', which
  # points to the input placeholder.
  receiver_tensors = {'examples': serialized_tf_example}

  features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)
  features['weight'] = tf.ones_like(features[LABEL])

  return tfma.export.EvalInputReceiver(
    features=features,
    receiver_tensors=receiver_tensors,
    labels=features[LABEL])

tfma_export_dir = tfma.export.export_eval_savedmodel(
  estimator=classifier,
  export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model'),
  eval_input_receiver_fn=eval_input_receiver_fn)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_model_analysis/eval_saved_model/encoding.py:141: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']
Warning:tensorflow:Export includes no default signature!
INFO:tensorflow:Restoring parameters from /tmp/train/20201121-100727/model.ckpt-1000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfma_eval_model/temp-1605953301/assets
INFO:tensorflow:SavedModel written to: /tmp/tfma_eval_model/temp-1605953301/saved_model.pb

Compute Fairness Metrics

Select the identity to compute metrics for and whether to run with confidence intervals using the dropdown in the panel on the right.

Fairness Indicators Computation Options

Slice selection: sexual_orientation
Compute confidence intervals: False

Warning:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_model_analysis/eval_saved_model/load.py:169: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.

INFO:tensorflow:Restoring parameters from /tmp/tfma_eval_model/1605953301/variables/variables

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_model_analysis/eval_saved_model/graph_ref.py:189: get_tensor_from_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.get_tensor_from_tensor_info or tf.compat.v1.saved_model.get_tensor_from_tensor_info.

Visualize data using the What-If Tool

In this section, you'll use the What-If Tool's interactive visual interface to explore and manipulate data at a micro-level.

Each point on the scatter plot on the right-hand panel represents one of the examples in the subset loaded into the tool. Click on one of the points to see details about this particular example in the left-hand panel. The comment text, ground truth toxicity, and applicable identities are shown. At the bottom of this left-hand panel, you see the inference results from the model you just trained.

Modify the text of the example and then click the Run inference button to see how your changes affect the predicted toxicity.

DEFAULT_MAX_EXAMPLES = 1000

# Load 100000 examples in memory. When first rendered, 
# What-If Tool should only display 1000 of these due to browser constraints.
def wit_dataset(file, num_examples=100000):
  dataset = tf.data.TFRecordDataset(
      filenames=[file]).take(num_examples)
  return [tf.train.Example.FromString(d.numpy()) for d in dataset]

wit_data = wit_dataset(train_tf_file)
config_builder = WitConfigBuilder(wit_data[:DEFAULT_MAX_EXAMPLES]).set_estimator_and_feature_spec(
    classifier, FEATURE_MAP).set_label_vocab(['non-toxicity', LABEL]).set_target_feature(LABEL)
wit = WitWidget(config_builder)

Render Fairness Indicators

Render the Fairness Indicators widget with the exported evaluation results.

Below you will see bar charts displaying performance of each slice of the data on selected metrics. You can adjust the baseline comparison slice as well as the displayed threshold(s) using the dropdown menus at the top of the visualization.

The Fairness Indicators widget is integrated with the What-If Tool rendered above. If you select one slice of the data in the bar chart, the What-If Tool will update to show you examples from the selected slice. When the data reloads in the What-If Tool above, try changing Color By to toxicity. This can give you a visual understanding of the toxicity balance of examples by slice.

event_handlers={'slice-selected':
                wit.create_selection_callback(wit_data, DEFAULT_MAX_EXAMPLES)}
widget_view.render_fairness_indicator(eval_result=eval_result,
                                      slicing_column=slice_selection,
                                      event_handlers=event_handlers
                                      )
FairnessIndicatorViewer(slicingMetrics=[{'sliceValue': 'Overall', 'slice': 'Overall', 'metrics': {'label/mean'…

With this particular dataset and task, systematically higher false positive and false negative rates for certain identities can lead to negative consequences. For example, in a content moderation system, a higher-than-overall false positive rate for a certain group can lead to those voices being silenced. Thus, it is important to regularly evaluate these types of criteria as you develop and improve models, and utilize tools such as Fairness Indicators, TFDV, and WIT to help illuminate potential problems. Once you've identified fairness issues, you can experiment with new data sources, data balancing, or other techniques to improve performance on underperforming groups.
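
The false positive and false negative rates discussed above are simple ratios of confusion-matrix counts. The sketch below derives them in plain Python; rates_from_counts is a hypothetical helper rather than part of the TFMA API, and the counts are copied from the first confusion matrix of the baseline (overall) slice printed in the "Use fairness evaluation results" section of this notebook.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Derives rate metrics from raw confusion-matrix counts."""
    return {
        # Share of non-toxic comments wrongly flagged as toxic.
        'false_positive_rate': fp / (fp + tn),
        # Share of toxic comments the model failed to flag.
        'false_negative_rate': fn / (fn + tp),
        'precision': tp / (tp + fp),
        'recall': tp / (tp + fn),
    }


# Counts for the overall slice at the first reported threshold.
overall = rates_from_counts(tp=57556, fp=613006, tn=51048, fn=340)
for name, value in overall.items():
    print(f'{name}: {value:.4f}')
```

Running the same calculation on the counts for an individual slice and comparing the result with the overall slice is one way to quantify the kind of gap the bar charts display.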

See here for more information and guidance on how to use Fairness Indicators.

Use fairness evaluation results

The eval_result object, rendered above in render_fairness_indicator(), has its own API that you can leverage to read TFMA results into your programs.

Get evaluated slices and metrics

Use get_slice_names() and get_metric_names() to get the evaluated slices and metrics, respectively.

pp = pprint.PrettyPrinter()

print("Slices:")
pp.pprint(eval_result.get_slice_names())
print("\nMetrics:")
pp.pprint(eval_result.get_metric_names())
Slices:
[(),
 (('sexual_orientation', 'homosexual_gay_or_lesbian'),),
 (('sexual_orientation', 'heterosexual'),),
 (('sexual_orientation', 'bisexual'),),
 (('sexual_orientation', 'other_sexual_orientation'),)]

Metrics:
['post_export_metrics/false_omission_rate@0.50',
 'precision',
 'post_export_metrics/false_negative_rate@0.90',
 'post_export_metrics/false_discovery_rate@0.10',
 'post_export_metrics/true_positive_rate@0.30',
 'post_export_metrics/false_discovery_rate@0.70',
 'post_export_metrics/true_positive_rate@0.90',
 'post_export_metrics/true_negative_rate@0.10',
 'post_export_metrics/positive_rate@0.30',
 'average_loss',
 'accuracy',
 'label/mean',
 'post_export_metrics/negative_rate@0.70',
 'post_export_metrics/false_omission_rate@0.30',
 'post_export_metrics/negative_rate@0.50',
 'post_export_metrics/true_positive_rate@0.10',
 'post_export_metrics/positive_rate@0.70',
 'post_export_metrics/false_omission_rate@0.90',
 'post_export_metrics/negative_rate@0.30',
 'post_export_metrics/negative_rate@0.10',
 'auc',
 'post_export_metrics/false_positive_rate@0.10',
 'post_export_metrics/false_positive_rate@0.30',
 'post_export_metrics/positive_rate@0.50',
 'post_export_metrics/false_omission_rate@0.10',
 'post_export_metrics/false_discovery_rate@0.50',
 'recall',
 'post_export_metrics/false_positive_rate@0.70',
 'post_export_metrics/positive_rate@0.90',
 'post_export_metrics/true_negative_rate@0.90',
 'post_export_metrics/fairness/confusion_matrix_at_thresholds',
 'post_export_metrics/false_omission_rate@0.70',
 'post_export_metrics/false_negative_rate@0.70',
 'post_export_metrics/false_positive_rate@0.50',
 'post_export_metrics/false_negative_rate@0.30',
 'post_export_metrics/false_negative_rate@0.50',
 'post_export_metrics/false_negative_rate@0.10',
 'prediction/mean',
 'post_export_metrics/true_negative_rate@0.50',
 'accuracy_baseline',
 'post_export_metrics/false_positive_rate@0.90',
 'post_export_metrics/false_discovery_rate@0.30',
 'post_export_metrics/negative_rate@0.90',
 'post_export_metrics/true_negative_rate@0.70',
 'post_export_metrics/true_positive_rate@0.50',
 'post_export_metrics/true_positive_rate@0.70',
 'post_export_metrics/positive_rate@0.10',
 'auc_precision_recall',
 'post_export_metrics/example_count',
 'post_export_metrics/false_discovery_rate@0.90',
 'post_export_metrics/true_negative_rate@0.30']
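
The threshold-specific metric names in the list above follow a name@threshold pattern. As a small sketch (both helper names are hypothetical and not part of TFMA), you could split or build these keys as follows:

```python
def parse_metric_name(metric_name):
    """Splits 'name@threshold' into (name, threshold); threshold is None if absent."""
    name, sep, threshold = metric_name.partition('@')
    return (name, float(threshold)) if sep else (name, None)


def metric_key(name, threshold):
    """Builds a key such as 'post_export_metrics/false_positive_rate@0.50'."""
    return f'post_export_metrics/{name}@{threshold:.2f}'


names = [
    'post_export_metrics/false_positive_rate@0.50',
    'post_export_metrics/false_negative_rate@0.90',
    'post_export_metrics/example_count',
    'auc',
]
for metric_name in names:
    print(parse_metric_name(metric_name))

print(metric_key('false_positive_rate', 0.5))
# post_export_metrics/false_positive_rate@0.50
```

Keys built this way can be used to look up a single metric in the dictionary of metrics for a slice.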

Use get_metrics_for_slice() to get the metrics for a particular slice as a dictionary mapping metric names to metric values.

baseline_slice = ()
heterosexual_slice = (('sexual_orientation', 'heterosexual'),)

print("Baseline metric values:")
pp.pprint(eval_result.get_metrics_for_slice(baseline_slice))
print("\nHeterosexual metric values:")
pp.pprint(eval_result.get_metrics_for_slice(heterosexual_slice))
Baseline metric values:
{'accuracy': {'doubleValue': 0.7180622220039368},
 'accuracy_baseline': {'doubleValue': 0.9198060631752014},
 'auc': {'doubleValue': 0.7966873049736023},
 'auc_precision_recall': {'doubleValue': 0.30154213309288025},
 'average_loss': {'doubleValue': 0.5623575448989868},
 'label/mean': {'doubleValue': 0.08019392192363739},
 'post_export_metrics/example_count': {'doubleValue': 721950.0},
 'post_export_metrics/fairness/confusion_matrix_at_thresholds': {'confusionMatrixAtThresholds': {'matrices': [{'boundedFalseNegatives': {'value': 340.0},
                                                                                                               'boundedFalsePositives': {'value': 613006.0},
                                                                                                               'boundedPrecision': {'value': 0.08583248406648636},
                                                                                                               'boundedRecall': {'value': 0.9941273927688599},
                                                                                                               'boundedTrueNegatives': {'value': 51048.0},
                                                                                                               'boundedTruePositives': {'value': 57556.0},
                                                                                                               'falseNegatives': 340.0,
                                                                                                               'falsePositives': 613006.0,
                                                                                                               'precision': 0.08583248406648636,
                                                                                                               'recall': 0.9941273927688599,
                                                                                                               'tDistributionFalseNegatives': {'unsampledValue': 340.0},
                                                                                                               'tDistributionFalsePositives': {'unsampledValue': 613006.0},
                                                                                                               'tDistributionPrecision': {'unsampledValue': 0.08583248406648636},
                                                                                                               'tDistributionRecall': {'unsampledValue': 0.9941273927688599},
                                                                                                               'tDistributionTrueNegatives': {'unsampledValue': 51048.0},
                                                                                                               'tDistributionTruePositives': {'unsampledValue': 57556.0},
                                                                                                               'threshold': 0.10000000149011612,
                                                                                                               'trueNegatives': 51048.0,
                                                                                                               'truePositives': 57556.0},
                                                                                                              {'boundedFalseNegatives': {'value': 5018.0},
                                                                                                               'boundedFalsePositives': {'value': 385545.0},
                                                                                                               'boundedPrecision': {'value': 0.12060955166816711},
                                                                                                               'boundedRecall': {'value': 0.9133273363113403},
                                                                                                               'boundedTrueNegatives': {'value': 278509.0},
                                                                                                               'boundedTruePositives': {'value': 52878.0},
                                                                                                               'falseNegatives': 5018.0,
                                                                                                               'falsePositives': 385545.0,
                                                                                                               'precision': 0.12060955166816711,
                                                                                                               'recall': 0.9133273363113403,
                                                                                                               'tDistributionFalseNegatives': {'unsampledValue': 5018.0},
                                                                                                               'tDistributionFalsePositives': {'unsampledValue': 385545.0},
                                                                                                               'tDistributionPrecision': {'unsampledValue': 0.12060955166816711},
                                                                                                               'tDistributionRecall': {'unsampledValue': 0.9133273363113403},
                                                                                                               'tDistributionTrueNegatives': {'unsampledValue': 278509.0},
                                                                                                               'tDistributionTruePositives': {'unsampledValue': 52878.0},
                                                                                                               'threshold': 0.30000001192092896,
                                                                                                               'trueNegatives': 278509.0,
                                                                                                               'truePositives': 52878.0},
                                                                                                              {'boundedFalseNegatives': {'value': 15714.0},
                                                                                                               'boundedFalsePositives': {'value': 187831.0},
                                                                                                               'boundedPrecision': {'value': 0.18338963389396667},
                                                                                                               'boundedRecall': {'value': 0.7285822629928589},
                                                                                                               'boundedTrueNegatives': {'value': 476223.0},
                                                                                                               'boundedTruePositives': {'value': 42182.0},
                                                                                                               'falseNegatives': 15714.0,
                                                                                                               'falsePositives': 187831.0,
                                                                                                               'precision': 0.18338963389396667,
                                                                                                               'recall': 0.7285822629928589,
                                                                                                               'tDistributionFalseNegatives': {'unsampledValue': 15714.0},
                                                                                                               'tDistributionFalsePositives': {'unsampledValue': 187831.0},
                                                                                                               'tDistributionPrecision': {'unsampledValue': 0.18338963389396667},
                                                                                                               'tDistributionRecall': {'unsampledValue': 0.7285822629928589},
                                                                                                               'tDistributionTrueNegatives': {'unsampledValue': 476223.0},
                                                                                                               'tDistributionTruePositives': {'unsampledValue': 42182.0},
                                                                                                               'threshold': 0.5,
                                                                                                               'trueNegatives': 476223.0,
                                                                                                               'truePositives': 42182.0},
                                                                                                              {'boundedFalseNegatives': {'value': 31427.0},
                                                                                                               'boundedFalsePositives': {'value': 64504.0},
                                                                                                               'boundedPrecision': {'value': 0.29095447063446045},
                                                                                                               'boundedRecall': {'value': 0.457181841135025},
                                                                                                               'boundedTrueNegatives': {'value': 599550.0},
                                                                                                               'boundedTruePositives': {'value': 26469.0},
                                                                                                               'falseNegatives': 31427.0,
                                                                                                               'falsePositives': 64504.0,
                                                                                                               'precision': 0.29095447063446045,
                                                                                                               'recall': 0.457181841135025,
                                                                                                               'tDistributionFalseNegatives': {'unsampledValue': 31427.0},
                                                                                                               'tDistributionFalsePositives': {'unsampledValue': 64504.0},
                                                                                                               'tDistributionPrecision': {'unsampledValue': 0.29095447063446045},
                                                                                                               'tDistributionRecall': {'unsampledValue': 0.457181841135025},
                                                                                                               'tDistributionTrueNegatives': {'unsampledValue': 599550.0},
                                                                                                               'tDistributionTruePositives': {'unsampledValue': 26469.0},
                                                                                                               'threshold': 0.699999988079071,
                                                                                                               'trueNegatives': 599550.0,
                                                                                                               'truePositives': 26469.0},
                                                                                                              {'boundedFalseNegatives': {'value': 51317.0},
                                                                                                               'boundedFalsePositives': {'value': 6104.0},
                                                                                                               'boundedPrecision': {'value': 0.5187258720397949},
                                                                                                               'boundedRecall': {'value': 0.11363479495048523},
                                                                                                               'boundedTrueNegatives': {'value': 657950.0},
                                                                                                               'boundedTruePositives': {'value': 6579.0},
                                                                                                               'falseNegatives': 51317.0,
                                                                                                               'falsePositives': 6104.0,
                                                                                                               'precision': 0.5187258720397949,
                                                                                                               'recall': 0.11363479495048523,
                                                                                                               'tDistributionFalseNegatives': {'unsampledValue': 51317.0},
                                                                                                               'tDistributionFalsePositives': {'unsampledValue': 6104.0},
                                                                                                               'tDistributionPrecision': {'unsampledValue': 0.5187258720397949},
                                                                                                               'tDistributionRecall': {'unsampledValue': 0.11363479495048523},
                                                                                                               'tDistributionTrueNegatives': {'unsampledValue': 657950.0},
                                                                                                               'tDistributionTruePositives': {'unsampledValue': 6579.0},
                                                                                                               'threshold': 0.8999999761581421,
                                                                                                               'trueNegatives': 657950.0,
                                                                                                               'truePositives': 6579.0}]}},
 'post_export_metrics/false_discovery_rate@0.10': {'doubleValue': 0.9141675233840942},
 'post_export_metrics/false_discovery_rate@0.30': {'doubleValue': 0.8793904781341553},
 'post_export_metrics/false_discovery_rate@0.50': {'doubleValue': 0.8166103363037109},
 'post_export_metrics/false_discovery_rate@0.70': {'doubleValue': 0.7090455293655396},
 'post_export_metrics/false_discovery_rate@0.90': {'doubleValue': 0.48127415776252747},
 'post_export_metrics/false_negative_rate@0.10': {'doubleValue': 0.0058725993148982525},
 'post_export_metrics/false_negative_rate@0.30': {'doubleValue': 0.08667265623807907},
 'post_export_metrics/false_negative_rate@0.50': {'doubleValue': 0.2714177072048187},
 'post_export_metrics/false_negative_rate@0.70': {'doubleValue': 0.5428181290626526},
 'post_export_metrics/false_negative_rate@0.90': {'doubleValue': 0.8863652348518372},
 'post_export_metrics/false_omission_rate@0.10': {'doubleValue': 0.0066163307055830956},
 'post_export_metrics/false_omission_rate@0.30': {'doubleValue': 0.017698490992188454},
 'post_export_metrics/false_omission_rate@0.50': {'doubleValue': 0.03194311633706093},
 'post_export_metrics/false_omission_rate@0.70': {'doubleValue': 0.04980688542127609},
 'post_export_metrics/false_omission_rate@0.90': {'doubleValue': 0.07235216349363327},
 'post_export_metrics/false_positive_rate@0.10': {'doubleValue': 0.923126757144928},
 'post_export_metrics/false_positive_rate@0.30': {'doubleValue': 0.5805928707122803},
 'post_export_metrics/false_positive_rate@0.50': {'doubleValue': 0.28285500407218933},
 'post_export_metrics/false_positive_rate@0.70': {'doubleValue': 0.09713667631149292},
 'post_export_metrics/false_positive_rate@0.90': {'doubleValue': 0.009192023426294327},
 'post_export_metrics/negative_rate@0.10': {'doubleValue': 0.07117944210767746},
 'post_export_metrics/negative_rate@0.30': {'doubleValue': 0.39272385835647583},
 'post_export_metrics/negative_rate@0.50': {'doubleValue': 0.6814003586769104},
 'post_export_metrics/negative_rate@0.70': {'doubleValue': 0.8739898800849915},
 'post_export_metrics/negative_rate@0.90': {'doubleValue': 0.9824323058128357},
 'post_export_metrics/positive_rate@0.10': {'doubleValue': 0.9288205504417419},
 'post_export_metrics/positive_rate@0.30': {'doubleValue': 0.6072761416435242},
 'post_export_metrics/positive_rate@0.50': {'doubleValue': 0.3185996115207672},
 'post_export_metrics/positive_rate@0.70': {'doubleValue': 0.12601010501384735},
 'post_export_metrics/positive_rate@0.90': {'doubleValue': 0.017567697912454605},
 'post_export_metrics/true_negative_rate@0.10': {'doubleValue': 0.07687326520681381},
 'post_export_metrics/true_negative_rate@0.30': {'doubleValue': 0.4194071590900421},
 'post_export_metrics/true_negative_rate@0.50': {'doubleValue': 0.7171449661254883},
 'post_export_metrics/true_negative_rate@0.70': {'doubleValue': 0.9028633236885071},
 'post_export_metrics/true_negative_rate@0.90': {'doubleValue': 0.9908079504966736},
 'post_export_metrics/true_positive_rate@0.10': {'doubleValue': 0.9941273927688599},
 'post_export_metrics/true_positive_rate@0.30': {'doubleValue': 0.9133273363113403},
 'post_export_metrics/true_positive_rate@0.50': {'doubleValue': 0.7285822629928589},
 'post_export_metrics/true_positive_rate@0.70': {'doubleValue': 0.457181841135025},
 'post_export_metrics/true_positive_rate@0.90': {'doubleValue': 0.11363479495048523},
 'precision': {'doubleValue': 0.18338963389396667},
 'prediction/mean': {'doubleValue': 0.4002366364002228},
 'recall': {'doubleValue': 0.7285822629928589}}

Heterosexual metric values:
{'accuracy': {'doubleValue': 0.5304877758026123},
 'accuracy_baseline': {'doubleValue': 0.7601625919342041},
 'auc': {'doubleValue': 0.6684832572937012},
 'auc_precision_recall': {'doubleValue': 0.408615380525589},
 'average_loss': {'doubleValue': 0.8278010487556458},
 'label/mean': {'doubleValue': 0.2398373931646347},
 'post_export_metrics/example_count': {'doubleValue': 492.0},
 'post_export_metrics/fairness/confusion_matrix_at_thresholds': {'confusionMatrixAtThresholds': {'matrices': [{'boundedFalseNegatives': {'value': 0.0},
                                                                                                               'boundedFalsePositives': {'value': 362.0},
                                                                                                               'boundedPrecision': {'value': 0.24583333730697632},
                                                                                                               'boundedRecall': {'value': 1.0},
                                                                                                               'boundedTrueNegatives': {'value': 12.0},
                                                                                                               'boundedTruePositives': {'value': 118.0},
                                                                                                               'falsePositives': 362.0,
                                                                                                               'precision': 0.24583333730697632,
                                                                                                               'recall': 1.0,
                                                                                                               'tDistributionFalseNegatives': {'unsampledValue': 0.0},
                                                                                                               'tDistributionFalsePositives': {'unsampledValue': 362.0},
                                                                                                               'tDistributionPrecision': {'unsampledValue': 0.24583333730697632},
                                                                                                               'tDistributionRecall': {'unsampledValue': 1.0},
                                                                                                               'tDistributionTrueNegatives': {'unsampledValue': 12.0},
                                                                                                               'tDistributionTruePositives': {'unsampledValue': 118.0},
                                                                                                               'threshold': 0.10000000149011612,
                                                                                                               'trueNegatives': 12.0,
                                                                                                               'truePositives': 118.0},
                                                                                                              {'boundedFalseNegatives': {'value': 9.0},
                                                                                                               'boundedFalsePositives': {'value': 289.0},
                                                                                                               'boundedPrecision': {'value': 0.2738693356513977},
                                                                                                               'boundedRecall': {'value': 0.9237288236618042},
                                                                                                               'boundedTrueNegatives': {'value': 85.0},
                                                                                                               'boundedTruePositives': {'value': 109.0},
                                                                                                               'falseNegatives': 9.0,
                                                                                                               'falsePositives': 289.0,
                                                                                                               'precision': 0.2738693356513977,
                                                                                                               'recall': 0.9237288236618042,
                                                                                                               'tDistributionFalseNegatives': {'unsampledValue': 9.0},
                                                                                                               'tDistributionFalsePositives': {'unsampledValue': 289.0},
                                                                                                               'tDistributionPrecision': {'unsampledValue': 0.2738693356513977},
                                                                                                               'tDistributionRecall': {'unsampledValue': 0.9237288236618042},
                                                                                                               'tDistributionTrueNegatives': {'unsampledValue': 85.0},
                                                                                                               'tDistributionTruePositives': {'unsampledValue': 109.0},
                                                                                                               'threshold': 0.30000001192092896,
                                                                                                               'trueNegatives': 85.0,
                                                                                                               'truePositives': 109.0},
                                                                                                              {'boundedFalseNegatives': {'value': 32.0},
                                                                                                               'boundedFalsePositives': {'value': 199.0},
                                                                                                               'boundedPrecision': {'value': 0.3017543852329254},
                                                                                                               'boundedRecall': {'value': 0.7288135886192322},
                                                                                                               'boundedTrueNegatives': {'value': 175.0},
                                                                                                               'boundedTruePositives': {'value': 86.0},
                                                                                                               'falseNegatives': 32.0,
                                                                                                               'falsePositives': 199.0,
                                                                                                               'precision': 0.3017543852329254,
                                                                                                               'recall': 0.7288135886192322,
                                                                                                               'tDistributionFalseNegatives': {'unsampledValue': 32.0},
                                                                                                               'tDistributionFalsePositives': {'unsampledValue': 199.0},
                                                                                                               'tDistributionPrecision': {'unsampledValue': 0.3017543852329254},
                                                                                                               'tDistributionRecall': {'unsampledValue': 0.7288135886192322},
                                                                                                               'tDistributionTrueNegatives': {'unsampledValue': 175.0},
                                                                                                               'tDistributionTruePositives': {'unsampledValue': 86.0},
                                                                                                               'threshold': 0.5,
                                                                                                               'trueNegatives': 175.0,
                                                                                                               'truePositives': 86.0},
                                                                                                              {'boundedFalseNegatives': {'value': 58.0},
                                                                                                               'boundedFalsePositives': {'value': 115.0},
                                                                                                               'boundedPrecision': {'value': 0.34285715222358704},
                                                                                                               'boundedRecall': {'value': 0.508474588394165},
                                                                                                               'boundedTrueNegatives': {'value': 259.0},
                                                                                                               'boundedTruePositives': {'value': 60.0},
                                                                                                               'falseNegatives': 58.0,
                                                                                                               'falsePositives': 115.0,
                                                                                                               'precision': 0.34285715222358704,
                                                                                                               'recall': 0.508474588394165,
                                                                                                               'tDistributionFalseNegatives': {'unsampledValue': 58.0},
                                                                                                               'tDistributionFalsePositives': {'unsampledValue': 115.0},
                                                                                                               'tDistributionPrecision': {'unsampledValue': 0.34285715222358704},
                                                                                                               'tDistributionRecall': {'unsampledValue': 0.508474588394165},
                                                                                                               'tDistributionTrueNegatives': {'unsampledValue': 259.0},
                                                                                                               'tDistributionTruePositives': {'unsampledValue': 60.0},
                                                                                                               'threshold': 0.699999988079071,
                                                                                                               'trueNegatives': 259.0,
                                                                                                               'truePositives': 60.0},
                                                                                                              {'boundedFalseNegatives': {'value': 95.0},
                                                                                                               'boundedFalsePositives': {'value': 17.0},
                                                                                                               'boundedPrecision': {'value': 0.574999988079071},
                                                                                                               'boundedRecall': {'value': 0.19491524994373322},
                                                                                                               'boundedTrueNegatives': {'value': 357.0},
                                                                                                               'boundedTruePositives': {'value': 23.0},
                                                                                                               'falseNegatives': 95.0,
                                                                                                               'falsePositives': 17.0,
                                                                                                               'precision': 0.574999988079071,
                                                                                                               'recall': 0.19491524994373322,
                                                                                                               'tDistributionFalseNegatives': {'unsampledValue': 95.0},
                                                                                                               'tDistributionFalsePositives': {'unsampledValue': 17.0},
                                                                                                               'tDistributionPrecision': {'unsampledValue': 0.574999988079071},
                                                                                                               'tDistributionRecall': {'unsampledValue': 0.19491524994373322},
                                                                                                               'tDistributionTrueNegatives': {'unsampledValue': 357.0},
                                                                                                               'tDistributionTruePositives': {'unsampledValue': 23.0},
                                                                                                               'threshold': 0.8999999761581421,
                                                                                                               'trueNegatives': 357.0,
                                                                                                               'truePositives': 23.0}]} },
 'post_export_metrics/false_discovery_rate@0.10': {'doubleValue': 0.7541666626930237},
 'post_export_metrics/false_discovery_rate@0.30': {'doubleValue': 0.7261306643486023},
 'post_export_metrics/false_discovery_rate@0.50': {'doubleValue': 0.6982455849647522},
 'post_export_metrics/false_discovery_rate@0.70': {'doubleValue': 0.6571428775787354},
 'post_export_metrics/false_discovery_rate@0.90': {'doubleValue': 0.42500001192092896},
 'post_export_metrics/false_negative_rate@0.10': {'doubleValue': 0.0},
 'post_export_metrics/false_negative_rate@0.30': {'doubleValue': 0.0762711837887764},
 'post_export_metrics/false_negative_rate@0.50': {'doubleValue': 0.2711864411830902},
 'post_export_metrics/false_negative_rate@0.70': {'doubleValue': 0.49152541160583496},
 'post_export_metrics/false_negative_rate@0.90': {'doubleValue': 0.805084764957428},
 'post_export_metrics/false_omission_rate@0.10': {'doubleValue': 0.0},
 'post_export_metrics/false_omission_rate@0.30': {'doubleValue': 0.09574468433856964},
 'post_export_metrics/false_omission_rate@0.50': {'doubleValue': 0.15458936989307404},
 'post_export_metrics/false_omission_rate@0.70': {'doubleValue': 0.18296529352664948},
 'post_export_metrics/false_omission_rate@0.90': {'doubleValue': 0.2101769894361496},
 'post_export_metrics/false_positive_rate@0.10': {'doubleValue': 0.9679144620895386},
 'post_export_metrics/false_positive_rate@0.30': {'doubleValue': 0.7727272510528564},
 'post_export_metrics/false_positive_rate@0.50': {'doubleValue': 0.5320855379104614},
 'post_export_metrics/false_positive_rate@0.70': {'doubleValue': 0.3074866235256195},
 'post_export_metrics/false_positive_rate@0.90': {'doubleValue': 0.04545454680919647},
 'post_export_metrics/negative_rate@0.10': {'doubleValue': 0.024390242993831635},
 'post_export_metrics/negative_rate@0.30': {'doubleValue': 0.19105690717697144},
 'post_export_metrics/negative_rate@0.50': {'doubleValue': 0.42073169350624084},
 'post_export_metrics/negative_rate@0.70': {'doubleValue': 0.6443089246749878},
 'post_export_metrics/negative_rate@0.90': {'doubleValue': 0.9186992049217224},
 'post_export_metrics/positive_rate@0.10': {'doubleValue': 0.9756097793579102},
 'post_export_metrics/positive_rate@0.30': {'doubleValue': 0.8089430928230286},
 'post_export_metrics/positive_rate@0.50': {'doubleValue': 0.5792682766914368},
 'post_export_metrics/positive_rate@0.70': {'doubleValue': 0.3556910455226898},
 'post_export_metrics/positive_rate@0.90': {'doubleValue': 0.08130080997943878},
 'post_export_metrics/true_negative_rate@0.10': {'doubleValue': 0.03208556026220322},
 'post_export_metrics/true_negative_rate@0.30': {'doubleValue': 0.22727273404598236},
 'post_export_metrics/true_negative_rate@0.50': {'doubleValue': 0.4679144322872162},
 'post_export_metrics/true_negative_rate@0.70': {'doubleValue': 0.6925133466720581},
 'post_export_metrics/true_negative_rate@0.90': {'doubleValue': 0.9545454382896423},
 'post_export_metrics/true_positive_rate@0.10': {'doubleValue': 1.0},
 'post_export_metrics/true_positive_rate@0.30': {'doubleValue': 0.9237288236618042},
 'post_export_metrics/true_positive_rate@0.50': {'doubleValue': 0.7288135886192322},
 'post_export_metrics/true_positive_rate@0.70': {'doubleValue': 0.508474588394165},
 'post_export_metrics/true_positive_rate@0.90': {'doubleValue': 0.19491524994373322},
 'precision': {'doubleValue': 0.3017543852329254},
 'prediction/mean': {'doubleValue': 0.5579363107681274},
 'recall': {'doubleValue': 0.7288135886192322} }
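
The rate metrics in the output above can be cross-checked against the raw confusion-matrix counts. The following is a minimal sketch, assuming the standard textbook definitions of each rate; the helper name rates_from_counts is hypothetical and is not part of TFMA or Fairness Indicators. It uses the counts from the threshold 0.9 matrix printed above.

```python
import math

def rates_from_counts(tp, fp, tn, fn):
    """Derive standard confusion-matrix rates from raw counts.

    Hypothetical helper for illustration only. Assumes every
    denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "false_discovery_rate": fp / (fp + tp),
        "false_omission_rate": fn / (fn + tn),
        "positive_rate": (tp + fp) / total,
        "negative_rate": (tn + fn) / total,
    }

# Counts copied from the threshold 0.9 matrix in the output above.
rates = rates_from_counts(tp=23.0, fp=17.0, tn=357.0, fn=95.0)
for name, value in rates.items():
    print(f"{name}@0.90: {value:.4f}")
```

The values this produces agree with the post_export_metrics/...@0.90 entries printed above up to float32 rounding, which suggests (but does not prove) that the reported rates follow these definitions.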

Use get_metrics_for_all_slices() to get the metrics for every slice at once. It returns a dictionary that maps each slice key to the same metrics dictionary that get_metrics_for_slice() returns for that slice. The overall (unsliced) result is stored under the empty tuple ().
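
Because the result is a plain dictionary, it can be walked with ordinary Python. The sketch below does not call TFMA: it uses a small hand-copied excerpt of the output printed in this notebook in place of eval_result.get_metrics_for_all_slices(), and the helper name summarize is hypothetical.

```python
def summarize(all_slices, metric_names):
    """Flatten a slice -> metrics mapping into one row per slice.

    Hypothetical helper for illustration only. Assumes each requested
    metric is stored as {'doubleValue': <float>}.
    """
    rows = []
    for slice_key, metrics in all_slices.items():
        # A slice key is a tuple of (column, value) pairs; () is the overall slice.
        label = ", ".join(f"{col}={val}" for col, val in slice_key) or "Overall"
        row = {"slice": label}
        for name in metric_names:
            row[name] = metrics[name]["doubleValue"]
        rows.append(row)
    return rows

# Excerpt copied by hand from the notebook output; not a live TFMA result.
excerpt = {
    (): {
        "accuracy": {"doubleValue": 0.7180622220039368},
        "auc": {"doubleValue": 0.7966873049736023},
        "post_export_metrics/example_count": {"doubleValue": 721950.0},
    },
    (("sexual_orientation", "bisexual"),): {
        "accuracy": {"doubleValue": 0.5431034564971924},
        "auc": {"doubleValue": 0.6311360001564026},
        "post_export_metrics/example_count": {"doubleValue": 116.0},
    },
}

rows = summarize(excerpt, ["accuracy", "auc", "post_export_metrics/example_count"])
for row in rows:
    print(row)
```

In the notebook itself, the excerpt could be swapped for the real dictionary. Note that some entries in the full output omit zero-valued fields, so a lookup on an arbitrary key may need a default.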

pp.pprint(eval_result.get_metrics_for_all_slices())
{(): {'accuracy': {'doubleValue': 0.7180622220039368},
      'accuracy_baseline': {'doubleValue': 0.9198060631752014},
      'auc': {'doubleValue': 0.7966873049736023},
      'auc_precision_recall': {'doubleValue': 0.30154213309288025},
      'average_loss': {'doubleValue': 0.5623575448989868},
      'label/mean': {'doubleValue': 0.08019392192363739},
      'post_export_metrics/example_count': {'doubleValue': 721950.0},
      'post_export_metrics/fairness/confusion_matrix_at_thresholds': {'confusionMatrixAtThresholds': {'matrices': [{'boundedFalseNegatives': {'value': 340.0},
                                                                                                                    'boundedFalsePositives': {'value': 613006.0},
                                                                                                                    'boundedPrecision': {'value': 0.08583248406648636},
                                                                                                                    'boundedRecall': {'value': 0.9941273927688599},
                                                                                                                    'boundedTrueNegatives': {'value': 51048.0},
                                                                                                                    'boundedTruePositives': {'value': 57556.0},
                                                                                                                    'falseNegatives': 340.0,
                                                                                                                    'falsePositives': 613006.0,
                                                                                                                    'precision': 0.08583248406648636,
                                                                                                                    'recall': 0.9941273927688599,
                                                                                                                    'tDistributionFalseNegatives': {'unsampledValue': 340.0},
                                                                                                                    'tDistributionFalsePositives': {'unsampledValue': 613006.0},
                                                                                                                    'tDistributionPrecision': {'unsampledValue': 0.08583248406648636},
                                                                                                                    'tDistributionRecall': {'unsampledValue': 0.9941273927688599},
                                                                                                                    'tDistributionTrueNegatives': {'unsampledValue': 51048.0},
                                                                                                                    'tDistributionTruePositives': {'unsampledValue': 57556.0},
                                                                                                                    'threshold': 0.10000000149011612,
                                                                                                                    'trueNegatives': 51048.0,
                                                                                                                    'truePositives': 57556.0},
                                                                                                                   {'boundedFalseNegatives': {'value': 5018.0},
                                                                                                                    'boundedFalsePositives': {'value': 385545.0},
                                                                                                                    'boundedPrecision': {'value': 0.12060955166816711},
                                                                                                                    'boundedRecall': {'value': 0.9133273363113403},
                                                                                                                    'boundedTrueNegatives': {'value': 278509.0},
                                                                                                                    'boundedTruePositives': {'value': 52878.0},
                                                                                                                    'falseNegatives': 5018.0,
                                                                                                                    'falsePositives': 385545.0,
                                                                                                                    'precision': 0.12060955166816711,
                                                                                                                    'recall': 0.9133273363113403,
                                                                                                                    'tDistributionFalseNegatives': {'unsampledValue': 5018.0},
                                                                                                                    'tDistributionFalsePositives': {'unsampledValue': 385545.0},
                                                                                                                    'tDistributionPrecision': {'unsampledValue': 0.12060955166816711},
                                                                                                                    'tDistributionRecall': {'unsampledValue': 0.9133273363113403},
                                                                                                                    'tDistributionTrueNegatives': {'unsampledValue': 278509.0},
                                                                                                                    'tDistributionTruePositives': {'unsampledValue': 52878.0},
                                                                                                                    'threshold': 0.30000001192092896,
                                                                                                                    'trueNegatives': 278509.0,
                                                                                                                    'truePositives': 52878.0},
                                                                                                                   {'boundedFalseNegatives': {'value': 15714.0},
                                                                                                                    'boundedFalsePositives': {'value': 187831.0},
                                                                                                                    'boundedPrecision': {'value': 0.18338963389396667},
                                                                                                                    'boundedRecall': {'value': 0.7285822629928589},
                                                                                                                    'boundedTrueNegatives': {'value': 476223.0},
                                                                                                                    'boundedTruePositives': {'value': 42182.0},
                                                                                                                    'falseNegatives': 15714.0,
                                                                                                                    'falsePositives': 187831.0,
                                                                                                                    'precision': 0.18338963389396667,
                                                                                                                    'recall': 0.7285822629928589,
                                                                                                                    'tDistributionFalseNegatives': {'unsampledValue': 15714.0},
                                                                                                                    'tDistributionFalsePositives': {'unsampledValue': 187831.0},
                                                                                                                    'tDistributionPrecision': {'unsampledValue': 0.18338963389396667},
                                                                                                                    'tDistributionRecall': {'unsampledValue': 0.7285822629928589},
                                                                                                                    'tDistributionTrueNegatives': {'unsampledValue': 476223.0},
                                                                                                                    'tDistributionTruePositives': {'unsampledValue': 42182.0},
                                                                                                                    'threshold': 0.5,
                                                                                                                    'trueNegatives': 476223.0,
                                                                                                                    'truePositives': 42182.0},
                                                                                                                   {'boundedFalseNegatives': {'value': 31427.0},
                                                                                                                    'boundedFalsePositives': {'value': 64504.0},
                                                                                                                    'boundedPrecision': {'value': 0.29095447063446045},
                                                                                                                    'boundedRecall': {'value': 0.457181841135025},
                                                                                                                    'boundedTrueNegatives': {'value': 599550.0},
                                                                                                                    'boundedTruePositives': {'value': 26469.0},
                                                                                                                    'falseNegatives': 31427.0,
                                                                                                                    'falsePositives': 64504.0,
                                                                                                                    'precision': 0.29095447063446045,
                                                                                                                    'recall': 0.457181841135025,
                                                                                                                    'tDistributionFalseNegatives': {'unsampledValue': 31427.0},
                                                                                                                    'tDistributionFalsePositives': {'unsampledValue': 64504.0},
                                                                                                                    'tDistributionPrecision': {'unsampledValue': 0.29095447063446045},
                                                                                                                    'tDistributionRecall': {'unsampledValue': 0.457181841135025},
                                                                                                                    'tDistributionTrueNegatives': {'unsampledValue': 599550.0},
                                                                                                                    'tDistributionTruePositives': {'unsampledValue': 26469.0},
                                                                                                                    'threshold': 0.699999988079071,
                                                                                                                    'trueNegatives': 599550.0,
                                                                                                                    'truePositives': 26469.0},
                                                                                                                   {'boundedFalseNegatives': {'value': 51317.0},
                                                                                                                    'boundedFalsePositives': {'value': 6104.0},
                                                                                                                    'boundedPrecision': {'value': 0.5187258720397949},
                                                                                                                    'boundedRecall': {'value': 0.11363479495048523},
                                                                                                                    'boundedTrueNegatives': {'value': 657950.0},
                                                                                                                    'boundedTruePositives': {'value': 6579.0},
                                                                                                                    'falseNegatives': 51317.0,
                                                                                                                    'falsePositives': 6104.0,
                                                                                                                    'precision': 0.5187258720397949,
                                                                                                                    'recall': 0.11363479495048523,
                                                                                                                    'tDistributionFalseNegatives': {'unsampledValue': 51317.0},
                                                                                                                    'tDistributionFalsePositives': {'unsampledValue': 6104.0},
                                                                                                                    'tDistributionPrecision': {'unsampledValue': 0.5187258720397949},
                                                                                                                    'tDistributionRecall': {'unsampledValue': 0.11363479495048523},
                                                                                                                    'tDistributionTrueNegatives': {'unsampledValue': 657950.0},
                                                                                                                    'tDistributionTruePositives': {'unsampledValue': 6579.0},
                                                                                                                    'threshold': 0.8999999761581421,
                                                                                                                    'trueNegatives': 657950.0,
                                                                                                                    'truePositives': 6579.0}]} },
      'post_export_metrics/false_discovery_rate@0.10': {'doubleValue': 0.9141675233840942},
      'post_export_metrics/false_discovery_rate@0.30': {'doubleValue': 0.8793904781341553},
      'post_export_metrics/false_discovery_rate@0.50': {'doubleValue': 0.8166103363037109},
      'post_export_metrics/false_discovery_rate@0.70': {'doubleValue': 0.7090455293655396},
      'post_export_metrics/false_discovery_rate@0.90': {'doubleValue': 0.48127415776252747},
      'post_export_metrics/false_negative_rate@0.10': {'doubleValue': 0.0058725993148982525},
      'post_export_metrics/false_negative_rate@0.30': {'doubleValue': 0.08667265623807907},
      'post_export_metrics/false_negative_rate@0.50': {'doubleValue': 0.2714177072048187},
      'post_export_metrics/false_negative_rate@0.70': {'doubleValue': 0.5428181290626526},
      'post_export_metrics/false_negative_rate@0.90': {'doubleValue': 0.8863652348518372},
      'post_export_metrics/false_omission_rate@0.10': {'doubleValue': 0.0066163307055830956},
      'post_export_metrics/false_omission_rate@0.30': {'doubleValue': 0.017698490992188454},
      'post_export_metrics/false_omission_rate@0.50': {'doubleValue': 0.03194311633706093},
      'post_export_metrics/false_omission_rate@0.70': {'doubleValue': 0.04980688542127609},
      'post_export_metrics/false_omission_rate@0.90': {'doubleValue': 0.07235216349363327},
      'post_export_metrics/false_positive_rate@0.10': {'doubleValue': 0.923126757144928},
      'post_export_metrics/false_positive_rate@0.30': {'doubleValue': 0.5805928707122803},
      'post_export_metrics/false_positive_rate@0.50': {'doubleValue': 0.28285500407218933},
      'post_export_metrics/false_positive_rate@0.70': {'doubleValue': 0.09713667631149292},
      'post_export_metrics/false_positive_rate@0.90': {'doubleValue': 0.009192023426294327},
      'post_export_metrics/negative_rate@0.10': {'doubleValue': 0.07117944210767746},
      'post_export_metrics/negative_rate@0.30': {'doubleValue': 0.39272385835647583},
      'post_export_metrics/negative_rate@0.50': {'doubleValue': 0.6814003586769104},
      'post_export_metrics/negative_rate@0.70': {'doubleValue': 0.8739898800849915},
      'post_export_metrics/negative_rate@0.90': {'doubleValue': 0.9824323058128357},
      'post_export_metrics/positive_rate@0.10': {'doubleValue': 0.9288205504417419},
      'post_export_metrics/positive_rate@0.30': {'doubleValue': 0.6072761416435242},
      'post_export_metrics/positive_rate@0.50': {'doubleValue': 0.3185996115207672},
      'post_export_metrics/positive_rate@0.70': {'doubleValue': 0.12601010501384735},
      'post_export_metrics/positive_rate@0.90': {'doubleValue': 0.017567697912454605},
      'post_export_metrics/true_negative_rate@0.10': {'doubleValue': 0.07687326520681381},
      'post_export_metrics/true_negative_rate@0.30': {'doubleValue': 0.4194071590900421},
      'post_export_metrics/true_negative_rate@0.50': {'doubleValue': 0.7171449661254883},
      'post_export_metrics/true_negative_rate@0.70': {'doubleValue': 0.9028633236885071},
      'post_export_metrics/true_negative_rate@0.90': {'doubleValue': 0.9908079504966736},
      'post_export_metrics/true_positive_rate@0.10': {'doubleValue': 0.9941273927688599},
      'post_export_metrics/true_positive_rate@0.30': {'doubleValue': 0.9133273363113403},
      'post_export_metrics/true_positive_rate@0.50': {'doubleValue': 0.7285822629928589},
      'post_export_metrics/true_positive_rate@0.70': {'doubleValue': 0.457181841135025},
      'post_export_metrics/true_positive_rate@0.90': {'doubleValue': 0.11363479495048523},
      'precision': {'doubleValue': 0.18338963389396667},
      'prediction/mean': {'doubleValue': 0.4002366364002228},
      'recall': {'doubleValue': 0.7285822629928589} },
 (('sexual_orientation', 'bisexual'),): {'accuracy': {'doubleValue': 0.5431034564971924},
                                         'accuracy_baseline': {'doubleValue': 0.8017241358757019},
                                         'auc': {'doubleValue': 0.6311360001564026},
                                         'auc_precision_recall': {'doubleValue': 0.32485702633857727},
                                         'average_loss': {'doubleValue': 0.7483880519866943},
                                         'label/mean': {'doubleValue': 0.1982758641242981},
                                         'post_export_metrics/example_count': {'doubleValue': 116.0},
                                         'post_export_metrics/fairness/confusion_matrix_at_thresholds': {'confusionMatrixAtThresholds': {'matrices': [{'boundedFalseNegatives': {'value': 0.0},
                                                                                                                                                       'boundedFalsePositives': {'value': 85.0},
                                                                                                                                                       'boundedPrecision': {'value': 0.21296297013759613},
                                                                                                                                                       'boundedRecall': {'value': 1.0},
                                                                                                                                                       'boundedTrueNegatives': {'value': 8.0},
                                                                                                                                                       'boundedTruePositives': {'value': 23.0},
                                                                                                                                                       'falsePositives': 85.0,
                                                                                                                                                       'precision': 0.21296297013759613,
                                                                                                                                                       'recall': 1.0,
                                                                                                                                                       'tDistributionFalseNegatives': {'unsampledValue': 0.0},
                                                                                                                                                       'tDistributionFalsePositives': {'unsampledValue': 85.0},
                                                                                                                                                       'tDistributionPrecision': {'unsampledValue': 0.21296297013759613},
                                                                                                                                                       'tDistributionRecall': {'unsampledValue': 1.0},
                                                                                                                                                       'tDistributionTrueNegatives': {'unsampledValue': 8.0},
                                                                                                                                                       'tDistributionTruePositives': {'unsampledValue': 23.0},
                                                                                                                                                       'threshold': 0.10000000149011612,
                                                                                                                                                       'trueNegatives': 8.0,
                                                                                                                                                       'truePositives': 23.0},
                                                                                                                                                      {'boundedFalseNegatives': {'value': 4.0},
                                                                                                                                                       'boundedFalsePositives': {'value': 68.0},
                                                                                                                                                       'boundedPrecision': {'value': 0.2183908075094223},
                                                                                                                                                       'boundedRecall': {'value': 0.8260869383811951},
                                                                                                                                                       'boundedTrueNegatives': {'value': 25.0},
                                                                                                                                                       'boundedTruePositives': {'value': 19.0},
                                                                                                                                                       'falseNegatives': 4.0,
                                                                                                                                                       'falsePositives': 68.0,
                                                                                                                                                       'precision': 0.2183908075094223,
                                                                                                                                                       'recall': 0.8260869383811951,
                                                                                                                                                       'tDistributionFalseNegatives': {'unsampledValue': 4.0},
                                                                                                                                                       'tDistributionFalsePositives': {'unsampledValue': 68.0},
                                                                                                                                                       'tDistributionPrecision': {'unsampledValue': 0.2183908075094223},
                                                                                                                                                       'tDistributionRecall': {'unsampledValue': 0.8260869383811951},
                                                                                                                                                       'tDistributionTrueNegatives': {'unsampledValue': 25.0},
                                                                                                                                                       'tDistributionTruePositives': {'unsampledValue': 19.0},
                                                                                                                                                       'threshold': 0.30000001192092896,
                                                                                                                                                       'trueNegatives': 25.0,
                                                                                                                                                       'truePositives': 19.0},
                                                                                                                                                      {'boundedFalseNegatives': {'value': 9.0},
                                                                                                                                                       'boundedFalsePositives': {'value': 44.0},
                                                                                                                                                       'boundedPrecision': {'value': 0.24137930572032928},
                                                                                                                                                       'boundedRecall': {'value': 0.6086956262588501},
                                                                                                                                                       'boundedTrueNegatives': {'value': 49.0},
                                                                                                                                                       'boundedTruePositives': {'value': 14.0},
                                                                                                                                                       'falseNegatives': 9.0,
                                                                                                                                                       'falsePositives': 44.0,
                                                                                                                                                       'precision': 0.24137930572032928,
                                                                                                                                                       'recall': 0.6086956262588501,
                                                                                                                                                       'tDistributionFalseNegatives': {'unsampledValue': 9.0},
                                                                                                                                                       'tDistributionFalsePositives': {'unsampledValue': 44.0},
                                                                                                                                                       'tDistributionPrecision': {'unsampledValue': 0.24137930572032928},
                                                                                                                                                       'tDistributionRecall': {'unsampledValue': 0.6086956262588501},
                                                                                                                                                       'tDistributionTrueNegatives': {'unsampledValue': 49.0},
                                                                                                                                                       'tDistributionTruePositives': {'unsampledValue': 14.0},
                                                                                                                                                       'threshold': 0.5,
                                                                                                                                                       'trueNegatives': 49.0,
                                                                                                                                                       'truePositives': 14.0},
                                                                                                                                                      {'boundedFalseNegatives': {'value': 15.0},
                                                                                                                                                       'boundedFalsePositives': {'value': 20.0},
                                                                                                                                                       'boundedPrecision': {'value': 0.2857142984867096},
                                                                                                                                                       'boundedRecall': {'value': 0.3478260934352875},
                                                                                                                                                       'boundedTrueNegatives': {'value': 73.0},
                                                                                                                                                       'boundedTruePositives': {'value': 8.0},
                                                                                                                                                       'falseNegatives': 15.0,
                                                                                                                                                       'falsePositives': 20.0,
                                                                                                                                                       'precision': 0.2857142984867096,
                                                                                                                                                       'recall': 0.3478260934352875,
                                                                                                                                                       'tDistributionFalseNegatives': {'unsampledValue': 15.0},
                                                                                                                                                       'tDistributionFalsePositives': {'unsampledValue': 20.0},
                                                                                                                                                       'tDistributionPrecision': {'unsampledValue': 0.2857142984867096},
                                                                                                                                                       'tDistributionRecall': {'unsampledValue': 0.3478260934352875},
                                                                                                                                                       'tDistributionTrueNegatives': {'unsampledValue': 73.0},
                                                                                                                                                       'tDistributionTruePositives': {'unsampledValue': 8.0},
                                                                                                                                                       'threshold': 0.699999988079071,
                                                                                                                                                       'trueNegatives': 73.0,
                                                                                                                                                       'truePositives': 8.0},
                                                                                                                                                      {'boundedFalseNegatives': {'value': 22.0},
                                                                                                                                                       'boundedFalsePositives': {'value': 2.0},
                                                                                                                                                       'boundedPrecision': {'value': 0.3333333432674408},
                                                                                                                                                       'boundedRecall': {'value': 0.043478261679410934},
                                                                                                                                                       'boundedTrueNegatives': {'value': 91.0},
                                                                                                                                                       'boundedTruePositives': {'value': 1.0},
                                                                                                                                                       'falseNegatives': 22.0,
                                                                                                                                                       'falsePositives': 2.0,
                                                                                                                                                       'precision': 0.3333333432674408,
                                                                                                                                                       'recall': 0.043478261679410934,
                                                                                                                                                       'tDistributionFalseNegatives': {'unsampledValue': 22.0},
                                                                                                                                                       'tDistributionFalsePositives': {'unsampledValue': 2.0},
                                                                                                                                                       'tDistributionPrecision': {'unsampledValue': 0.3333333432674408},
                                                                                                                                                       'tDistributionRecall': {'unsampledValue': 0.043478261679410934},
                                                                                                                                                       'tDistributionTrueNegatives': {'unsampledValue': 91.0},
                                                                                                                                                       'tDistributionTruePositives': {'unsampledValue': 1.0},
                                                                                                                                                       'threshold': 0.8999999761581421,
                                                                                                                                                       'trueNegatives': 91.0,
                                                                                                                                                       'truePositives': 1.0}]} },
                                         'post_export_metrics/false_discovery_rate@0.10': {'doubleValue': 0.7870370149612427},
                                         'post_export_metrics/false_discovery_rate@0.30': {'doubleValue': 0.7816091775894165},
                                         'post_export_metrics/false_discovery_rate@0.50': {'doubleValue': 0.7586206793785095},
                                         'post_export_metrics/false_discovery_rate@0.70': {'doubleValue': 0.7142857313156128},
                                         'post_export_metrics/false_discovery_rate@0.90': {'doubleValue': 0.6666666865348816},
                                         'post_export_metrics/false_negative_rate@0.10': {'doubleValue': 0.0},
                                         'post_export_metrics/false_negative_rate@0.30': {'doubleValue': 0.17391304671764374},
                                         'post_export_metrics/false_negative_rate@0.50': {'doubleValue': 0.3913043439388275},
                                         'post_export_metrics/false_negative_rate@0.70': {'doubleValue': 0.6521739363670349},
                                         'post_export_metrics/false_negative_rate@0.90': {'doubleValue': 0.95652174949646},
                                         'post_export_metrics/false_omission_rate@0.10': {'doubleValue': 0.0},
                                         'post_export_metrics/false_omission_rate@0.30': {'doubleValue': 0.13793103396892548},
                                         'post_export_metrics/false_omission_rate@0.50': {'doubleValue': 0.1551724076271057},
                                         'post_export_metrics/false_omission_rate@0.70': {'doubleValue': 0.17045454680919647},
                                         'post_export_metrics/false_omission_rate@0.90': {'doubleValue': 0.1946902722120285},
                                         'post_export_metrics/false_positive_rate@0.10': {'doubleValue': 0.9139785170555115},
                                         'post_export_metrics/false_positive_rate@0.30': {'doubleValue': 0.7311828136444092},
                                         'post_export_metrics/false_positive_rate@0.50': {'doubleValue': 0.47311827540397644},
                                         'post_export_metrics/false_positive_rate@0.70': {'doubleValue': 0.2150537669658661},
                                         'post_export_metrics/false_positive_rate@0.90': {'doubleValue': 0.02150537632405758},
                                         'post_export_metrics/negative_rate@0.10': {'doubleValue': 0.06896551698446274},
                                         'post_export_metrics/negative_rate@0.30': {'doubleValue': 0.25},
                                         'post_export_metrics/negative_rate@0.50': {'doubleValue': 0.5},
                                         'post_export_metrics/negative_rate@0.70': {'doubleValue': 0.7586206793785095},
                                         'post_export_metrics/negative_rate@0.90': {'doubleValue': 0.9741379022598267},
                                         'post_export_metrics/positive_rate@0.10': {'doubleValue': 0.931034505367279},
                                         'post_export_metrics/positive_rate@0.30': {'doubleValue': 0.75},
                                         'post_export_metrics/positive_rate@0.50': {'doubleValue': 0.5},
                                         'post_export_metrics/positive_rate@0.70': {'doubleValue': 0.24137930572032928},
                                         'post_export_metrics/positive_rate@0.90': {'doubleValue': 0.0258620698004961},
                                         'post_export_metrics/true_negative_rate@0.10': {'doubleValue': 0.08602150529623032},
                                         'post_export_metrics/true_negative_rate@0.30': {'doubleValue': 0.2688172161579132},
                                         'post_export_metrics/true_negative_rate@0.50': {'doubleValue': 0.5268816947937012},
                                         'post_export_metrics/true_negative_rate@0.70': {'doubleValue': 0.7849462628364563},
                                         'post_export_metrics/true_negative_rate@0.90': {'doubleValue': 0.9784946441650391},
                                         'post_export_metrics/true_positive_rate@0.10': {'doubleValue': 1.0},
                                         'post_export_metrics/true_positive_rate@0.30': {'doubleValue': 0.8260869383811951},
                                         'post_export_metrics/true_positive_rate@0.50': {'doubleValue': 0.6086956262588501},
                                         'post_export_metrics/true_positive_rate@0.70': {'doubleValue': 0.3478260934352875},
                                         'post_export_metrics/true_positive_rate@0.90': {'doubleValue': 0.043478261679410934},
                                         'precision': {'doubleValue': 0.24137930572032928},
                                         'prediction/mean': {'doubleValue': 0.4921306073665619},
                                         'recall': {'doubleValue': 0.6086956262588501} },
 (('sexual_orientation', 'heterosexual'),): {'accuracy': {'doubleValue': 0.5304877758026123},
                                             'accuracy_baseline': {'doubleValue': 0.7601625919342041},
                                             'auc': {'doubleValue': 0.6684832572937012},
                                             'auc_precision_recall': {'doubleValue': 0.408615380525589},
                                             'average_loss': {'doubleValue': 0.8278010487556458},
                                             'label/mean': {'doubleValue': 0.2398373931646347},
                                             'post_export_metrics/example_count': {'doubleValue': 492.0},
                                             'post_export_metrics/fairness/confusion_matrix_at_thresholds': {'confusionMatrixAtThresholds': {'matrices': [{'boundedFalseNegatives': {'value': 0.0},
                                                                                                                                                           'boundedFalsePositives': {'value': 362.0},
                                                                                                                                                           'boundedPrecision': {'value': 0.24583333730697632},
                                                                                                                                                           'boundedRecall': {'value': 1.0},
                                                                                                                                                           'boundedTrueNegatives': {'value': 12.0},
                                                                                                                                                           'boundedTruePositives': {'value': 118.0},
                                                                                                                                                           'falsePositives': 362.0,
                                                                                                                                                           'precision': 0.24583333730697632,
                                                                                                                                                           'recall': 1.0,
                                                                                                                                                           'tDistributionFalseNegatives': {'unsampledValue': 0.0},
                                                                                                                                                           'tDistributionFalsePositives': {'unsampledValue': 362.0},
                                                                                                                                                           'tDistributionPrecision': {'unsampledValue': 0.24583333730697632},
                                                                                                                                                           'tDistributionRecall': {'unsampledValue': 1.0},
                                                                                                                                                           'tDistributionTrueNegatives': {'unsampledValue': 12.0},
                                                                                                                                                           'tDistributionTruePositives': {'unsampledValue': 118.0},
                                                                                                                                                           'threshold': 0.10000000149011612,
                                                                                                                                                           'trueNegatives': 12.0,
                                                                                                                                                           'truePositives': 118.0},
                                                                                                                                                          {'boundedFalseNegatives': {'value': 9.0},
                                                                                                                                                           'boundedFalsePositives': {'value': 289.0},
                                                                                                                                                           'boundedPrecision': {'value': 0.2738693356513977},
                                                                                                                                                           'boundedRecall': {'value': 0.9237288236618042},
                                                                                                                                                           'boundedTrueNegatives': {'value': 85.0},
                                                                                                                                                           'boundedTruePositives': {'value': 109.0},
                                                                                                                                                           'falseNegatives': 9.0,
                                                                                                                                                           'falsePositives': 289.0,
                                                                                                                                                           'precision': 0.2738693356513977,
                                                                                                                                                           'recall': 0.9237288236618042,
                                                                                                                                                           'tDistributionFalseNegatives': {'unsampledValue': 9.0},
                                                                                                                                                           'tDistributionFalsePositives': {'unsampledValue': 289.0},
                                                                                                                                                           'tDistributionPrecision': {'unsampledValue': 0.2738693356513977},
                                                                                                                                                           'tDistributionRecall': {'unsampledValue': 0.9237288236618042},
                                                                                                                                                           'tDistributionTrueNegatives': {'unsampledValue': 85.0},
                                                                                                                                                           'tDistributionTruePositives': {'unsampledValue': 109.0},
                                                                                                                                                           'threshold': 0.30000001192092896,
                                                                                                                                                           'trueNegatives': 85.0,
                                                                                                                                                           'truePositives': 109.0},
                                                                                                                                                          {'boundedFalseNegatives': {'value': 32.0},
                                                                                                                                                           'boundedFalsePositives': {'value': 199.0},
                                                                                                                                                           'boundedPrecision': {'value': 0.3017543852329254},
                                                                                                                                                           'boundedRecall': {'value': 0.7288135886192322},
                                                                                                                                                           'boundedTrueNegatives': {'value': 175.0},
                                                                                                                                                           'boundedTruePositives': {'value': 86.0},
                                                                                                                                                           'falseNegatives': 32.0,
                                                                                                                                                           'falsePositives': 199.0,
                                                                                                                                                           'precision': 0.3017543852329254,
                                                                                                                                                           'recall': 0.7288135886192322,
                                                                                                                                                           'tDistributionFalseNegatives': {'unsampledValue': 32.0},
                                                                                                                                                           'tDistributionFalsePositives': {'unsampledValue': 199.0},
                                                                                                                                                           'tDistributionPrecision': {'unsampledValue': 0.3017543852329254},
                                                                                                                                                           'tDistributionRecall': {'unsampledValue': 0.7288135886192322},
                                                                                                                                                           'tDistributionTrueNegatives': {'unsampledValue': 175.0},
                                                                                                                                                           'tDistributionTruePositives': {'unsampledValue': 86.0},
                                                                                                                                                           'threshold': 0.5,
                                                                                                                                                           'trueNegatives': 175.0,
                                                                                                                                                           'truePositives': 86.0},
                                                                                                                                                          {'boundedFalseNegatives': {'value': 58.0},
                                                                                                                                                           'boundedFalsePositives': {'value': 115.0},
                                                                                                                                                           'boundedPrecision': {'value': 0.34285715222358704},
                                                                                                                                                           'boundedRecall': {'value': 0.508474588394165},
                                                                                                                                                           'boundedTrueNegatives': {'value': 259.0},
                                                                                                                                                           'boundedTruePositives': {'value': 60.0},
                                                                                                                                                           'falseNegatives': 58.0,
                                                                                                                                                           'falsePositives': 115.0,
                                                                                                                                                           'precision': 0.34285715222358704,
                                                                                                                                                           'recall': 0.508474588394165,
                                                                                                                                                           'tDistributionFalseNegatives': {'unsampledValue': 58.0},
                                                                                                                                                           'tDistributionFalsePositives': {'unsampledValue': 115.0},
                                                                                                                                                           'tDistributionPrecision': {'unsampledValue': 0.34285715222358704},
                                                                                                                                                           'tDistributionRecall': {'unsampledValue': 0.508474588394165},
                                                                                                                                                           'tDistributionTrueNegatives': {'unsampledValue': 259.0},
                                                                                                                                                           'tDistributionTruePositives': {'unsampledValue': 60.0},
                                                                                                                                                           'threshold': 0.699999988079071,
                                                                                                                                                           'trueNegatives': 259.0,
                                                                                                                                                           'truePositives': 60.0},
                                                                                                                                                          {'boundedFalseNegatives': {'value': 95.0},
                                                                                                                                                           'boundedFalsePositives': {'value': 17.0},
                                                                                                                                                           'boundedPrecision': {'value': 0.574999988079071},
                                                                                                                                                           'boundedRecall': {'value': 0.19491524994373322},
                                                                                                                                                           'boundedTrueNegatives': {'value': 357.0},
                                                                                                                                                           'boundedTruePositives': {'value': 23.0},
                                                                                                                                                           'falseNegatives': 95.0,
                                                                                                                                                           'falsePositives': 17.0,
                                                                                                                                                           'precision': 0.574999988079071,
                                                                                                                                                           'recall': 0.19491524994373322,
                                                                                                                                                           'tDistributionFalseNegatives': {'unsampledValue': 95.0},
                                                                                                                                                           'tDistributionFalsePositives': {'unsampledValue': 17.0},
                                                                                                                                                           'tDistributionPrecision': {'unsampledValue': 0.574999988079071},
                                                                                                                                                           'tDistributionRecall': {'unsampledValue': 0.19491524994373322},
                                                                                                                                                           'tDistributionTrueNegatives': {'unsampledValue': 357.0},
                                                                                                                                                           'tDistributionTruePositives': {'unsampledValue': 23.0},
                                                                                                                                                           'threshold': 0.8999999761581421,
                                                                                                                                                           'trueNegatives': 357.0,
                                                                                                                                                           'truePositives': 23.0}]}},
                                             'post_export_metrics/false_discovery_rate@0.10': {'doubleValue': 0.7541666626930237},
                                             'post_export_metrics/false_discovery_rate@0.30': {'doubleValue': 0.7261306643486023},
                                             'post_export_metrics/false_discovery_rate@0.50': {'doubleValue': 0.6982455849647522},
                                             'post_export_metrics/false_discovery_rate@0.70': {'doubleValue': 0.6571428775787354},
                                             'post_export_metrics/false_discovery_rate@0.90': {'doubleValue': 0.42500001192092896},
                                             'post_export_metrics/false_negative_rate@0.10': {'doubleValue': 0.0},
                                             'post_export_metrics/false_negative_rate@0.30': {'doubleValue': 0.0762711837887764},
                                             'post_export_metrics/false_negative_rate@0.50': {'doubleValue': 0.2711864411830902},
                                             'post_export_metrics/false_negative_rate@0.70': {'doubleValue': 0.49152541160583496},
                                             'post_export_metrics/false_negative_rate@0.90': {'doubleValue': 0.805084764957428},
                                             'post_export_metrics/false_omission_rate@0.10': {'doubleValue': 0.0},
                                             'post_export_metrics/false_omission_rate@0.30': {'doubleValue': 0.09574468433856964},
                                             'post_export_metrics/false_omission_rate@0.50': {'doubleValue': 0.15458936989307404},
                                             'post_export_metrics/false_omission_rate@0.70': {'doubleValue': 0.18296529352664948},
                                             'post_export_metrics/false_omission_rate@0.90': {'doubleValue': 0.2101769894361496},
                                             'post_export_metrics/false_positive_rate@0.10': {'doubleValue': 0.9679144620895386},
                                             'post_export_metrics/false_positive_rate@0.30': {'doubleValue': 0.7727272510528564},
                                             'post_export_metrics/false_positive_rate@0.50': {'doubleValue': 0.5320855379104614},
                                             'post_export_metrics/false_positive_rate@0.70': {'doubleValue': 0.3074866235256195},
                                             'post_export_metrics/false_positive_rate@0.90': {'doubleValue': 0.04545454680919647},
                                             'post_export_metrics/negative_rate@0.10': {'doubleValue': 0.024390242993831635},
                                             'post_export_metrics/negative_rate@0.30': {'doubleValue': 0.19105690717697144},
                                             'post_export_metrics/negative_rate@0.50': {'doubleValue': 0.42073169350624084},
                                             'post_export_metrics/negative_rate@0.70': {'doubleValue': 0.6443089246749878},
                                             'post_export_metrics/negative_rate@0.90': {'doubleValue': 0.9186992049217224},
                                             'post_export_metrics/positive_rate@0.10': {'doubleValue': 0.9756097793579102},
                                             'post_export_metrics/positive_rate@0.30': {'doubleValue': 0.8089430928230286},
                                             'post_export_metrics/positive_rate@0.50': {'doubleValue': 0.5792682766914368},
                                             'post_export_metrics/positive_rate@0.70': {'doubleValue': 0.3556910455226898},
                                             'post_export_metrics/positive_rate@0.90': {'doubleValue': 0.08130080997943878},
                                             'post_export_metrics/true_negative_rate@0.10': {'doubleValue': 0.03208556026220322},
                                             'post_export_metrics/true_negative_rate@0.30': {'doubleValue': 0.22727273404598236},
                                             'post_export_metrics/true_negative_rate@0.50': {'doubleValue': 0.4679144322872162},
                                             'post_export_metrics/true_negative_rate@0.70': {'doubleValue': 0.6925133466720581},
                                             'post_export_metrics/true_negative_rate@0.90': {'doubleValue': 0.9545454382896423},
                                             'post_export_metrics/true_positive_rate@0.10': {'doubleValue': 1.0},
                                             'post_export_metrics/true_positive_rate@0.30': {'doubleValue': 0.9237288236618042},
                                             'post_export_metrics/true_positive_rate@0.50': {'doubleValue': 0.7288135886192322},
                                             'post_export_metrics/true_positive_rate@0.70': {'doubleValue': 0.508474588394165},
                                             'post_export_metrics/true_positive_rate@0.90': {'doubleValue': 0.19491524994373322},
                                             'precision': {'doubleValue': 0.3017543852329254},
                                             'prediction/mean': {'doubleValue': 0.5579363107681274},
                                             'recall': {'doubleValue': 0.7288135886192322}},
 (('sexual_orientation', 'homosexual_gay_or_lesbian'),): {'accuracy': {'doubleValue': 0.5831435322761536},
                                                          'accuracy_baseline': {'doubleValue': 0.7182232141494751},
                                                          'auc': {'doubleValue': 0.7068267464637756},
                                                          'auc_precision_recall': {'doubleValue': 0.4703059196472168},
                                                          'average_loss': {'doubleValue': 0.739447832107544},
                                                          'label/mean': {'doubleValue': 0.2817767560482025},
                                                          'post_export_metrics/example_count': {'doubleValue': 4390.0},
                                                          'post_export_metrics/fairness/confusion_matrix_at_thresholds': {'confusionMatrixAtThresholds': {'matrices': [{'boundedFalseNegatives': {'value': 2.0},
                                                                                                                                                                        'boundedFalsePositives': {'value': 3036.0},
                                                                                                                                                                        'boundedPrecision': {'value': 0.2891594469547272},
                                                                                                                                                                        'boundedRecall': {'value': 0.9983831644058228},
                                                                                                                                                                        'boundedTrueNegatives': {'value': 117.0},
                                                                                                                                                                        'boundedTruePositives': {'value': 1235.0},
                                                                                                                                                                        'falseNegatives': 2.0,
                                                                                                                                                                        'falsePositives': 3036.0,
                                                                                                                                                                        'precision': 0.2891594469547272,
                                                                                                                                                                        'recall': 0.9983831644058228,
                                                                                                                                                                        'tDistributionFalseNegatives': {'unsampledValue': 2.0},
                                                                                                                                                                        'tDistributionFalsePositives': {'unsampledValue': 3036.0},
                                                                                                                                                                        'tDistributionPrecision': {'unsampledValue': 0.2891594469547272},
                                                                                                                                                                        'tDistributionRecall': {'unsampledValue': 0.9983831644058228},
                                                                                                                                                                        'tDistributionTrueNegatives': {'unsampledValue': 117.0},
                                                                                                                                                                        'tDistributionTruePositives': {'unsampledValue': 1235.0},
                                                                                                                                                                        'threshold': 0.10000000149011612,
                                                                                                                                                                        'trueNegatives': 117.0,
                                                                                                                                                                        'truePositives': 1235.0},
                                                                                                                                                                       {'boundedFalseNegatives': {'value': 76.0},
                                                                                                                                                                        'boundedFalsePositives': {'value': 2383.0},
                                                                                                                                                                        'boundedPrecision': {'value': 0.32759594917297363},
                                                                                                                                                                        'boundedRecall': {'value': 0.9385610222816467},
                                                                                                                                                                        'boundedTrueNegatives': {'value': 770.0},
                                                                                                                                                                        'boundedTruePositives': {'value': 1161.0},
                                                                                                                                                                        'falseNegatives': 76.0,
                                                                                                                                                                        'falsePositives': 2383.0,
                                                                                                                                                                        'precision': 0.32759594917297363,
                                                                                                                                                                        'recall': 0.9385610222816467,
                                                                                                                                                                        'tDistributionFalseNegatives': {'unsampledValue': 76.0},
                                                                                                                                                                        'tDistributionFalsePositives': {'unsampledValue': 2383.0},
                                                                                                                                                                        'tDistributionPrecision': {'unsampledValue': 0.32759594917297363},
                                                                                                                                                                        'tDistributionRecall': {'unsampledValue': 0.9385610222816467},
                                                                                                                                                                        'tDistributionTrueNegatives': {'unsampledValue': 770.0},
                                                                                                                                                                        'tDistributionTruePositives': {'unsampledValue': 1161.0},
                                                                                                                                                                        'threshold': 0.30000001192092896,
                                                                                                                                                                        'trueNegatives': 770.0,
                                                                                                                                                                        'truePositives': 1161.0},
                                                                                                                                                                       {'boundedFalseNegatives': {'value': 284.0},
                                                                                                                                                                        'boundedFalsePositives': {'value': 1546.0},
                                                                                                                                                                        'boundedPrecision': {'value': 0.3813525438308716},
                                                                                                                                                                        'boundedRecall': {'value': 0.770412266254425},
                                                                                                                                                                        'boundedTrueNegatives': {'value': 1607.0},
                                                                                                                                                                        'boundedTruePositives': {'value': 953.0},
                                                                                                                                                                        'falseNegatives': 284.0,
                                                                                                                                                                        'falsePositives': 1546.0,
                                                                                                                                                                        'precision': 0.3813525438308716,
                                                                                                                                                                        'recall': 0.770412266254425,
                                                                                                                                                                        'tDistributionFalseNegatives': {'unsampledValue': 284.0},
                                                                                                                                                                        'tDistributionFalsePositives': {'unsampledValue': 1546.0},
                                                                                                                                                                        'tDistributionPrecision': {'unsampledValue': 0.3813525438308716},
                                                                                                                                                                        'tDistributionRecall': {'unsampledValue': 0.770412266254425},
                                                                                                                                                                        'tDistributionTrueNegatives': {'unsampledValue': 1607.0},
                                                                                                                                                                        'tDistributionTruePositives': {'unsampledValue': 953.0},
                                                                                                                                                                        'threshold': 0.5,
                                                                                                                                                                        'trueNegatives': 1607.0,
                                                                                                                                                                        'truePositives': 953.0},
                                                                                                                                                                       {'boundedFalseNegatives': {'value': 602.0},
                                                                                                                                                                        'boundedFalsePositives': {'value': 751.0},
                                                                                                                                                                        'boundedPrecision': {'value': 0.4581529498100281},
                                                                                                                                                                        'boundedRecall': {'value': 0.5133387446403503},
                                                                                                                                                                        'boundedTrueNegatives': {'value': 2402.0},
                                                                                                                                                                        'boundedTruePositives': {'value': 635.0},
                                                                                                                                                                        'falseNegatives': 602.0,
                                                                                                                                                                        'falsePositives': 751.0,
                                                                                                                                                                        'precision': 0.4581529498100281,
                                                                                                                                                                        'recall': 0.5133387446403503,
                                                                                                                                                                        'tDistributionFalseNegatives': {'unsampledValue': 602.0},
                                                                                                                                                                        'tDistributionFalsePositives': {'unsampledValue': 751.0},
                                                                                                                                                                        'tDistributionPrecision': {'unsampledValue': 0.4581529498100281},
                                                                                                                                                                        'tDistributionRecall': {'unsampledValue': 0.5133387446403503},
                                                                                                                                                                        'tDistributionTrueNegatives': {'unsampledValue': 2402.0},
                                                                                                                                                                        'tDistributionTruePositives': {'unsampledValue': 635.0},
                                                                                                                                                                        'threshold': 0.699999988079071,
                                                                                                                                                                        'trueNegatives': 2402.0,
                                                                                                                                                                        'truePositives': 635.0},
                                                                                                                                                                       {'boundedFalseNegatives': {'value': 1073.0},
                                                                                                                                                                        'boundedFalsePositives': {'value': 113.0},
                                                                                                                                                                        'boundedPrecision': {'value': 0.5920577645301819},
                                                                                                                                                                        'boundedRecall': {'value': 0.13257881999015808},
                                                                                                                                                                        'boundedTrueNegatives': {'value': 3040.0},
                                                                                                                                                                        'boundedTruePositives': {'value': 164.0},
                                                                                                                                                                        'falseNegatives': 1073.0,
                                                                                                                                                                        'falsePositives': 113.0,
                                                                                                                                                                        'precision': 0.5920577645301819,
                                                                                                                                                                        'recall': 0.13257881999015808,
                                                                                                                                                                        'tDistributionFalseNegatives': {'unsampledValue': 1073.0},
                                                                                                                                                                        'tDistributionFalsePositives': {'unsampledValue': 113.0},
                                                                                                                                                                        'tDistributionPrecision': {'unsampledValue': 0.5920577645301819},
                                                                                                                                                                        'tDistributionRecall': {'unsampledValue': 0.13257881999015808},
                                                                                                                                                                        'tDistributionTrueNegatives': {'unsampledValue': 3040.0},
                                                                                                                                                                        'tDistributionTruePositives': {'unsampledValue': 164.0},
                                                                                                                                                                        'threshold': 0.8999999761581421,
                                                                                                                                                                        'trueNegatives': 3040.0,
                                                                                                                                                                        'truePositives': 164.0}]}},
                                                          'post_export_metrics/false_discovery_rate@0.10': {'doubleValue': 0.7108405232429504},
                                                          'post_export_metrics/false_discovery_rate@0.30': {'doubleValue': 0.6724040508270264},
                                                          'post_export_metrics/false_discovery_rate@0.50': {'doubleValue': 0.6186474561691284},
                                                          'post_export_metrics/false_discovery_rate@0.70': {'doubleValue': 0.5418470501899719},
                                                          'post_export_metrics/false_discovery_rate@0.90': {'doubleValue': 0.4079422354698181},
                                                          'post_export_metrics/false_negative_rate@0.10': {'doubleValue': 0.001616814872249961},
                                                          'post_export_metrics/false_negative_rate@0.30': {'doubleValue': 0.061438966542482376},
                                                          'post_export_metrics/false_negative_rate@0.50': {'doubleValue': 0.22958771884441376},
                                                          'post_export_metrics/false_negative_rate@0.70': {'doubleValue': 0.48666128516197205},
                                                          'post_export_metrics/false_negative_rate@0.90': {'doubleValue': 0.8674212098121643},
                                                          'post_export_metrics/false_omission_rate@0.10': {'doubleValue': 0.016806723549962044},
                                                          'post_export_metrics/false_omission_rate@0.30': {'doubleValue': 0.08983451873064041},
                                                          'post_export_metrics/false_omission_rate@0.50': {'doubleValue': 0.15018509328365326},
                                                          'post_export_metrics/false_omission_rate@0.70': {'doubleValue': 0.2003994733095169},
                                                          'post_export_metrics/false_omission_rate@0.90': {'doubleValue': 0.26088014245033264},
                                                          'post_export_metrics/false_positive_rate@0.10': {'doubleValue': 0.962892472743988},
                                                          'post_export_metrics/false_positive_rate@0.30': {'doubleValue': 0.7557881474494934},
                                                          'post_export_metrics/false_positive_rate@0.50': {'doubleValue': 0.4903266727924347},
                                                          'post_export_metrics/false_positive_rate@0.70': {'doubleValue': 0.238185852766037},
                                                          'post_export_metrics/false_positive_rate@0.90': {'doubleValue': 0.035838883370161057},
                                                          'post_export_metrics/negative_rate@0.10': {'doubleValue': 0.027107061818242073},
                                                          'post_export_metrics/negative_rate@0.30': {'doubleValue': 0.19271071255207062},
                                                          'post_export_metrics/negative_rate@0.50': {'doubleValue': 0.4307517111301422},
                                                          'post_export_metrics/negative_rate@0.70': {'doubleValue': 0.6842824816703796},
                                                          'post_export_metrics/negative_rate@0.90': {'doubleValue': 0.9369020462036133},
                                                          'post_export_metrics/positive_rate@0.10': {'doubleValue': 0.9728929400444031},
                                                          'post_export_metrics/positive_rate@0.30': {'doubleValue': 0.8072893023490906},
                                                          'post_export_metrics/positive_rate@0.50': {'doubleValue': 0.5692483186721802},
                                                          'post_export_metrics/positive_rate@0.70': {'doubleValue': 0.31571754813194275},
                                                          'post_export_metrics/positive_rate@0.90': {'doubleValue': 0.06309794634580612},
                                                          'post_export_metrics/true_negative_rate@0.10': {'doubleValue': 0.03710751608014107},
                                                          'post_export_metrics/true_negative_rate@0.30': {'doubleValue': 0.24421186745166779},
                                                          'post_export_metrics/true_negative_rate@0.50': {'doubleValue': 0.5096732974052429},
                                                          'post_export_metrics/true_negative_rate@0.70': {'doubleValue': 0.7618141174316406},
                                                          'post_export_metrics/true_negative_rate@0.90': {'doubleValue': 0.9641610980033875},
                                                          'post_export_metrics/true_positive_rate@0.10': {'doubleValue': 0.9983831644058228},
                                                          'post_export_metrics/true_positive_rate@0.30': {'doubleValue': 0.9385610222816467},
                                                          'post_export_metrics/true_positive_rate@0.50': {'doubleValue': 0.770412266254425},
                                                          'post_export_metrics/true_positive_rate@0.70': {'doubleValue': 0.5133387446403503},
                                                          'post_export_metrics/true_positive_rate@0.90': {'doubleValue': 0.13257881999015808},
                                                          'precision': {'doubleValue': 0.3813525438308716},
                                                          'prediction/mean': {'doubleValue': 0.5444074869155884},
                                                          'recall': {'doubleValue': 0.770412266254425} },
 (('sexual_orientation', 'other_sexual_orientation'),): {'accuracy': {'doubleValue': 0.6000000238418579},
                                                         'accuracy_baseline': {'doubleValue': 0.800000011920929},
                                                         'auc': {'doubleValue': 1.0},
                                                         'auc_precision_recall': {'doubleValue': 1.0},
                                                         'average_loss': {'doubleValue': 0.748439610004425},
                                                         'label/mean': {'doubleValue': 0.20000000298023224},
                                                         'post_export_metrics/example_count': {'doubleValue': 5.0},
                                                         'post_export_metrics/fairness/confusion_matrix_at_thresholds': {'confusionMatrixAtThresholds': {'matrices': [{'boundedFalseNegatives': {'value': 0.0},
                                                                                                                                                                       'boundedFalsePositives': {'value': 4.0},
                                                                                                                                                                       'boundedPrecision': {'value': 0.20000000298023224},
                                                                                                                                                                       'boundedRecall': {'value': 1.0},
                                                                                                                                                                       'boundedTrueNegatives': {'value': 0.0},
                                                                                                                                                                       'boundedTruePositives': {'value': 1.0},
                                                                                                                                                                       'falsePositives': 4.0,
                                                                                                                                                                       'precision': 0.20000000298023224,
                                                                                                                                                                       'recall': 1.0,
                                                                                                                                                                       'tDistributionFalseNegatives': {'unsampledValue': 0.0},
                                                                                                                                                                       'tDistributionFalsePositives': {'unsampledValue': 4.0},
                                                                                                                                                                       'tDistributionPrecision': {'unsampledValue': 0.20000000298023224},
                                                                                                                                                                       'tDistributionRecall': {'unsampledValue': 1.0},
                                                                                                                                                                       'tDistributionTrueNegatives': {'unsampledValue': 0.0},
                                                                                                                                                                       'tDistributionTruePositives': {'unsampledValue': 1.0},
                                                                                                                                                                       'threshold': 0.10000000149011612,
                                                                                                                                                                       'truePositives': 1.0},
                                                                                                                                                                      {'boundedFalseNegatives': {'value': 0.0},
                                                                                                                                                                       'boundedFalsePositives': {'value': 3.0},
                                                                                                                                                                       'boundedPrecision': {'value': 0.25},
                                                                                                                                                                       'boundedRecall': {'value': 1.0},
                                                                                                                                                                       'boundedTrueNegatives': {'value': 1.0},
                                                                                                                                                                       'boundedTruePositives': {'value': 1.0},
                                                                                                                                                                       'falsePositives': 3.0,
                                                                                                                                                                       'precision': 0.25,
                                                                                                                                                                       'recall': 1.0,
                                                                                                                                                                       'tDistributionFalseNegatives': {'unsampledValue': 0.0},
                                                                                                                                                                       'tDistributionFalsePositives': {'unsampledValue': 3.0},
                                                                                                                                                                       'tDistributionPrecision': {'unsampledValue': 0.25},
                                                                                                                                                                       'tDistributionRecall': {'unsampledValue': 1.0},
                                                                                                                                                                       'tDistributionTrueNegatives': {'unsampledValue': 1.0},
                                                                                                                                                                       'tDistributionTruePositives': {'unsampledValue': 1.0},
                                                                                                                                                                       'threshold': 0.30000001192092896,
                                                                                                                                                                       'trueNegatives': 1.0,
                                                                                                                                                                       'truePositives': 1.0},
                                                                                                                                                                      {'boundedFalseNegatives': {'value': 0.0},
                                                                                                                                                                       'boundedFalsePositives': {'value': 2.0},
                                                                                                                                                                       'boundedPrecision': {'value': 0.3333333432674408},
                                                                                                                                                                       'boundedRecall': {'value': 1.0},
                                                                                                                                                                       'boundedTrueNegatives': {'value': 2.0},
                                                                                                                                                                       'boundedTruePositives': {'value': 1.0},
                                                                                                                                                                       'falsePositives': 2.0,
                                                                                                                                                                       'precision': 0.3333333432674408,
                                                                                                                                                                       'recall': 1.0,
                                                                                                                                                                       'tDistributionFalseNegatives': {'unsampledValue': 0.0},
                                                                                                                                                                       'tDistributionFalsePositives': {'unsampledValue': 2.0},
                                                                                                                                                                       'tDistributionPrecision': {'unsampledValue': 0.3333333432674408},
                                                                                                                                                                       'tDistributionRecall': {'unsampledValue': 1.0},
                                                                                                                                                                       'tDistributionTrueNegatives': {'unsampledValue': 2.0},
                                                                                                                                                                       'tDistributionTruePositives': {'unsampledValue': 1.0},
                                                                                                                                                                       'threshold': 0.5,
                                                                                                                                                                       'trueNegatives': 2.0,
                                                                                                                                                                       'truePositives': 1.0},
                                                                                                                                                                      {'boundedFalseNegatives': {'value': 0.0},
                                                                                                                                                                       'boundedFalsePositives': {'value': 1.0},
                                                                                                                                                                       'boundedPrecision': {'value': 0.5},
                                                                                                                                                                       'boundedRecall': {'value': 1.0},
                                                                                                                                                                       'boundedTrueNegatives': {'value': 3.0},
                                                                                                                                                                       'boundedTruePositives': {'value': 1.0},
                                                                                                                                                                       'falsePositives': 1.0,
                                                                                                                                                                       'precision': 0.5,
                                                                                                                                                                       'recall': 1.0,
                                                                                                                                                                       'tDistributionFalseNegatives': {'unsampledValue': 0.0},
                                                                                                                                                                       'tDistributionFalsePositives': {'unsampledValue': 1.0},
                                                                                                                                                                       'tDistributionPrecision': {'unsampledValue': 0.5},
                                                                                                                                                                       'tDistributionRecall': {'unsampledValue': 1.0},
                                                                                                                                                                       'tDistributionTrueNegatives': {'unsampledValue': 3.0},
                                                                                                                                                                       'tDistributionTruePositives': {'unsampledValue': 1.0},
                                                                                                                                                                       'threshold': 0.699999988079071,
                                                                                                                                                                       'trueNegatives': 3.0,
                                                                                                                                                                       'truePositives': 1.0},
                                                                                                                                                                      {'boundedFalseNegatives': {'value': 1.0},
                                                                                                                                                                       'boundedFalsePositives': {'value': 0.0},
                                                                                                                                                                       'boundedPrecision': {'value': 0.0},
                                                                                                                                                                       'boundedRecall': {'value': 0.0},
                                                                                                                                                                       'boundedTrueNegatives': {'value': 4.0},
                                                                                                                                                                       'boundedTruePositives': {'value': 0.0},
                                                                                                                                                                       'falseNegatives': 1.0,
                                                                                                                                                                       'tDistributionFalseNegatives': {'unsampledValue': 1.0},
                                                                                                                                                                       'tDistributionFalsePositives': {'unsampledValue': 0.0},
                                                                                                                                                                       'tDistributionPrecision': {'unsampledValue': 0.0},
                                                                                                                                                                       'tDistributionRecall': {'unsampledValue': 0.0},
                                                                                                                                                                       'tDistributionTrueNegatives': {'unsampledValue': 4.0},
                                                                                                                                                                       'tDistributionTruePositives': {'unsampledValue': 0.0},
                                                                                                                                                                       'threshold': 0.8999999761581421,
                                                                                                                                                                       'trueNegatives': 4.0}]} },
                                                         'post_export_metrics/false_discovery_rate@0.10': {'doubleValue': 0.800000011920929},
                                                         'post_export_metrics/false_discovery_rate@0.30': {'doubleValue': 0.75},
                                                         'post_export_metrics/false_discovery_rate@0.50': {'doubleValue': 0.6666666865348816},
                                                         'post_export_metrics/false_discovery_rate@0.70': {'doubleValue': 0.5},
                                                         'post_export_metrics/false_discovery_rate@0.90': {'doubleValue': 0.0},
                                                         'post_export_metrics/false_negative_rate@0.10': {'doubleValue': 0.0},
                                                         'post_export_metrics/false_negative_rate@0.30': {'doubleValue': 0.0},
                                                         'post_export_metrics/false_negative_rate@0.50': {'doubleValue': 0.0},
                                                         'post_export_metrics/false_negative_rate@0.70': {'doubleValue': 0.0},
                                                         'post_export_metrics/false_negative_rate@0.90': {'doubleValue': 1.0},
                                                         'post_export_metrics/false_omission_rate@0.10': {'doubleValue': 0.0},
                                                         'post_export_metrics/false_omission_rate@0.30': {'doubleValue': 0.0},
                                                         'post_export_metrics/false_omission_rate@0.50': {'doubleValue': 0.0},
                                                         'post_export_metrics/false_omission_rate@0.70': {'doubleValue': 0.0},
                                                         'post_export_metrics/false_omission_rate@0.90': {'doubleValue': 0.20000000298023224},
                                                         'post_export_metrics/false_positive_rate@0.10': {'doubleValue': 1.0},
                                                         'post_export_metrics/false_positive_rate@0.30': {'doubleValue': 0.75},
                                                         'post_export_metrics/false_positive_rate@0.50': {'doubleValue': 0.5},
                                                         'post_export_metrics/false_positive_rate@0.70': {'doubleValue': 0.25},
                                                         'post_export_metrics/false_positive_rate@0.90': {'doubleValue': 0.0},
                                                         'post_export_metrics/negative_rate@0.10': {'doubleValue': 0.0},
                                                         'post_export_metrics/negative_rate@0.30': {'doubleValue': 0.20000000298023224},
                                                         'post_export_metrics/negative_rate@0.50': {'doubleValue': 0.4000000059604645},
                                                         'post_export_metrics/negative_rate@0.70': {'doubleValue': 0.6000000238418579},
                                                         'post_export_metrics/negative_rate@0.90': {'doubleValue': 1.0},
                                                         'post_export_metrics/positive_rate@0.10': {'doubleValue': 1.0},
                                                         'post_export_metrics/positive_rate@0.30': {'doubleValue': 0.800000011920929},
                                                         'post_export_metrics/positive_rate@0.50': {'doubleValue': 0.6000000238418579},
                                                         'post_export_metrics/positive_rate@0.70': {'doubleValue': 0.4000000059604645},
                                                         'post_export_metrics/positive_rate@0.90': {'doubleValue': 0.0},
                                                         'post_export_metrics/true_negative_rate@0.10': {'doubleValue': 0.0},
                                                         'post_export_metrics/true_negative_rate@0.30': {'doubleValue': 0.25},
                                                         'post_export_metrics/true_negative_rate@0.50': {'doubleValue': 0.5},
                                                         'post_export_metrics/true_negative_rate@0.70': {'doubleValue': 0.75},
                                                         'post_export_metrics/true_negative_rate@0.90': {'doubleValue': 1.0},
                                                         'post_export_metrics/true_positive_rate@0.10': {'doubleValue': 1.0},
                                                         'post_export_metrics/true_positive_rate@0.30': {'doubleValue': 1.0},
                                                         'post_export_metrics/true_positive_rate@0.50': {'doubleValue': 1.0},
                                                         'post_export_metrics/true_positive_rate@0.70': {'doubleValue': 1.0},
                                                         'post_export_metrics/true_positive_rate@0.90': {'doubleValue': 0.0},
                                                         'precision': {'doubleValue': 0.3333333432674408},
                                                         'prediction/mean': {'doubleValue': 0.5993438363075256},
                                                         'recall': {'doubleValue': 1.0} } }
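
The per-threshold rates printed above can be checked against the raw confusion-matrix counts for the same slice. The following is a minimal sketch, not part of TFMA or Fairness Indicators: `rates_from_counts` is a hypothetical helper, and returning 0.0 for a zero denominator is an assumption chosen to match the printed `false_discovery_rate@0.90` of 0.0. The counts are copied from the `('sexual_orientation', 'other_sexual_orientation')` slice.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Derive rate metrics from raw confusion-matrix counts (hypothetical helper)."""

    def safe_div(num, den):
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0,
        # which matches the printed false_discovery_rate@0.90 for this slice.
        return num / den if den else 0.0

    total = tp + fp + tn + fn
    return {
        'true_positive_rate': safe_div(tp, tp + fn),
        'false_negative_rate': safe_div(fn, tp + fn),
        'true_negative_rate': safe_div(tn, tn + fp),
        'false_positive_rate': safe_div(fp, tn + fp),
        'false_discovery_rate': safe_div(fp, tp + fp),
        'false_omission_rate': safe_div(fn, tn + fn),
        'positive_rate': safe_div(tp + fp, total),
        'negative_rate': safe_div(tn + fn, total),
    }


# Counts copied from the printed output at threshold 0.5:
# truePositives=1, falsePositives=2, trueNegatives=2, no false negatives.
rates = rates_from_counts(tp=1, fp=2, tn=2, fn=0)
for name, value in sorted(rates.items()):
    print(f'{name}@0.50: {value:.4f}')
```

Because this slice contains only 5 examples (4 of them negative), its false positive rate can only change in steps of 0.25, so a single example shifts the metrics substantially. Treat results for slices this small with caution.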