Introducción a los indicadores de equidad

Ver en TensorFlow.org Ejecutar en Google Colab Ver en GitHub Descargar cuaderno Ver modelo TF Hub

Descripción general

Los indicadores de equidad es un conjunto de herramientas integradas en la parte superior del Modelo de Análisis TensorFlow (TFMA) que permiten una evaluación regular de la métrica de equidad en las tuberías de productos. TFMA es una biblioteca para evaluar modelos de aprendizaje automático TensorFlow y no TensorFlow. Le permite evaluar sus modelos en grandes cantidades de datos de manera distribuida, calcular en gráficos y otras métricas en diferentes segmentos de datos y visualizarlos en cuadernos.

Los indicadores de equidad está empaquetado con TensorFlow validación de datos (TFDV) y la herramienta Qué-Si . Usando indicadores de imparcialidad le permite:

  • Evaluar el rendimiento del modelo, rebanado en grupos de usuarios definidos
  • Obtenga confianza en los resultados con intervalos de confianza y evaluaciones en umbrales múltiples
  • Evaluar la distribución de conjuntos de datos
  • Sumergir profundamente en rodajas individuales para explorar causas y oportunidades de mejora de la raíz

En este cuaderno, que va a utilizar indicadores de equidad con respecto a solucionar los problemas de equidad en un modelo se entrena con el Comentarios Civil conjunto de datos . Vea este vídeo para obtener más detalles y el contexto en el escenario del mundo real esto se basa en que es también una de las motivaciones principales para la creación de indicadores de equidad.

conjunto de datos

En este cuaderno, que va a trabajar con el Comentarios Civil conjunto de datos , aproximadamente 2 millones de comentarios públicos hechos públicos por el Civil Comentarios plataforma en 2017 para la investigación en curso. Este esfuerzo fue patrocinado por Jigsaw , que han sido sede de las competiciones de Kaggle para ayudar a clasificar los comentarios tóxicos, así como minimizar el sesgo modelo no deseado.

Cada comentario de texto individual en el conjunto de datos tiene una etiqueta de toxicidad, siendo la etiqueta 1 si el comentario es tóxico y 0 si el comentario no es tóxico. Dentro de los datos, un subconjunto de comentarios está etiquetado con una variedad de atributos de identidad, incluidas categorías de género, orientación sexual, religión y raza o etnia.

Configuración

Instalar fairness-indicators y witwidget .

pip install -q -U pip==20.2

pip install -q fairness-indicators
pip install -q witwidget

Debe reiniciar el tiempo de ejecución de Colab después de la instalación. Elija un tiempo de ejecución> Reiniciar el tiempo de ejecución en el menú Colab.

No continúe con el resto de este tutorial sin antes reiniciar el tiempo de ejecución.

Importe todas las demás bibliotecas necesarias.

import os
import tempfile
import apache_beam as beam
import numpy as np
import pandas as pd
from datetime import datetime
import pprint

from google.protobuf import text_format

import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_data_validation as tfdv

from tfx_bsl.tfxio import tensor_adapter
from tfx_bsl.tfxio import tf_example_record

from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view

from fairness_indicators.tutorial_utils import util

from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget

from tensorflow_metadata.proto.v0 import schema_pb2

Descargar y analizar los datos.

De manera predeterminada, este cuaderno descarga una versión preprocesada de este conjunto de datos, pero puede usar el conjunto de datos original y volver a ejecutar los pasos de procesamiento si lo desea. En el conjunto de datos original, cada comentario está etiquetado con el porcentaje de evaluadores que creían que un comentario corresponde a una identidad particular. Por ejemplo, un comentario puede etiquetarse con lo siguiente: { masculino: 0.3, femenino: 1.0, transgénero: 0.0, heterosexual: 0.8, homosexual_gay_or_lesbian: 1.0 } El paso de procesamiento agrupa la identidad por categoría (género, orientación_sexual, etc.) y elimina Identidades con una puntuación inferior a 0.5. Entonces, el ejemplo anterior se convertiría en el siguiente: de evaluadores que creían que un comentario corresponde a una identidad particular. Por ejemplo, el comentario se etiquetaría con lo siguiente: { género: [femenino], orientación_sexual: [heterosexual, homosexual_gay_or_lesbiana] }

download_original_data = False

if download_original_data:
  train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')

  # The identity terms list will be grouped together by their categories
  # (see 'IDENTITY_COLUMNS') on threshould 0.5. Only the identity term column,
  # text column and label column will be kept after processing.
  train_tf_file = util.convert_comments_data(train_tf_file)
  validate_tf_file = util.convert_comments_data(validate_tf_file)

else:
  train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')

Use TFDV para analizar los datos y encontrar posibles problemas en ellos, como valores faltantes y desequilibrios de datos, que pueden generar disparidades en la equidad.

stats = tfdv.generate_statistics_from_tfrecord(data_location=train_tf_file)
tfdv.visualize_statistics(stats)
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_data_validation/utils/stats_util.py:247: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_data_validation/utils/stats_util.py:247: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`

TFDV muestra que existen algunos desequilibrios significativos en los datos que podrían conducir a resultados sesgados del modelo.

  • La etiqueta de toxicidad (el valor predicho por el modelo) está desequilibrada. Solo el 8% de los ejemplos en el conjunto de entrenamiento son tóxicos, lo que significa que un clasificador podría obtener un 92% de precisión al predecir que todos los comentarios no son tóxicos.

  • En los campos relacionados con los términos de identidad, solo 6,6k de los 1,08 millones (0,61%) de ejemplos de capacitación tratan sobre la homosexualidad, y los relacionados con la bisexualidad son aún más raros. Esto indica que el rendimiento en estos segmentos puede verse afectado debido a la falta de datos de entrenamiento.

preparar los datos

Defina un mapa de características para analizar los datos. Cada ejemplo tendrá una etiqueta, comentario de texto, y la identidad de las características sexual orientation , gender , religion , race y disability que están asociadas con el texto.

BASE_DIR = tempfile.gettempdir()

TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'
FEATURE_MAP = {
    # Label:
    LABEL: tf.io.FixedLenFeature([], tf.float32),
    # Text:
    TEXT_FEATURE:  tf.io.FixedLenFeature([], tf.string),

    # Identities:
    'sexual_orientation':tf.io.VarLenFeature(tf.string),
    'gender':tf.io.VarLenFeature(tf.string),
    'religion':tf.io.VarLenFeature(tf.string),
    'race':tf.io.VarLenFeature(tf.string),
    'disability':tf.io.VarLenFeature(tf.string),
}

A continuación, configure una función de entrada para alimentar datos en el modelo. Agregue una columna de ponderación a cada ejemplo y aumente la ponderación de los ejemplos tóxicos para tener en cuenta el desequilibrio de clase identificado por el TFDV. Use solo características de identidad durante la fase de evaluación, ya que solo los comentarios se introducen en el modelo durante el entrenamiento.

def train_input_fn():
  def parse_function(serialized):
    parsed_example = tf.io.parse_single_example(
        serialized=serialized, features=FEATURE_MAP)
    # Adds a weight column to deal with unbalanced classes.
    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
    return (parsed_example,
            parsed_example[LABEL])
  train_dataset = tf.data.TFRecordDataset(
      filenames=[train_tf_file]).map(parse_function).batch(512)
  return train_dataset

entrenar al modelo

Crea y entrena un modelo de aprendizaje profundo en los datos.

model_dir = os.path.join(BASE_DIR, 'train', datetime.now().strftime(
    "%Y%m%d-%H%M%S"))

embedded_text_feature_column = hub.text_embedding_column(
    key=TEXT_FEATURE,
    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')

classifier = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    weight_column='weight',
    feature_columns=[embedded_text_feature_column],
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.003),
    loss_reduction=tf.losses.Reduction.SUM,
    n_classes=2,
    model_dir=model_dir)

classifier.train(input_fn=train_input_fn, steps=1000)
INFO:tensorflow:Using default config.
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/train/20210923-205025', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Using config: {'_model_dir': '/tmp/train/20210923-205025', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
2021-09-23 20:50:26.540914: W tensorflow/core/common_runtime/graph_constructor.cc:1511] Importing a graph with a lower producer version 26 into an existing graph with producer version 808. Shape inference will have run different parts of the graph with different producer versions.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:512: NumericColumn._get_dense_tensor (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:512: NumericColumn._get_dense_tensor (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/feature_column/feature_column.py:2192: NumericColumn._transform_feature (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/feature_column/feature_column.py:2192: NumericColumn._transform_feature (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/keras/optimizer_v2/adagrad.py:84: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/keras/optimizer_v2/adagrad.py:84: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/train/20210923-205025/model.ckpt.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/train/20210923-205025/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 59.34932, step = 0
INFO:tensorflow:loss = 59.34932, step = 0
INFO:tensorflow:global_step/sec: 108.435
INFO:tensorflow:global_step/sec: 108.435
INFO:tensorflow:loss = 56.416668, step = 100 (0.924 sec)
INFO:tensorflow:loss = 56.416668, step = 100 (0.924 sec)
INFO:tensorflow:global_step/sec: 116.367
INFO:tensorflow:global_step/sec: 116.367
INFO:tensorflow:loss = 47.250374, step = 200 (0.859 sec)
INFO:tensorflow:loss = 47.250374, step = 200 (0.859 sec)
INFO:tensorflow:global_step/sec: 116.333
INFO:tensorflow:global_step/sec: 116.333
INFO:tensorflow:loss = 55.81682, step = 300 (0.860 sec)
INFO:tensorflow:loss = 55.81682, step = 300 (0.860 sec)
INFO:tensorflow:global_step/sec: 116.844
INFO:tensorflow:global_step/sec: 116.844
INFO:tensorflow:loss = 55.814293, step = 400 (0.856 sec)
INFO:tensorflow:loss = 55.814293, step = 400 (0.856 sec)
INFO:tensorflow:global_step/sec: 114.434
INFO:tensorflow:global_step/sec: 114.434
INFO:tensorflow:loss = 41.805046, step = 500 (0.874 sec)
INFO:tensorflow:loss = 41.805046, step = 500 (0.874 sec)
INFO:tensorflow:global_step/sec: 115.693
INFO:tensorflow:global_step/sec: 115.693
INFO:tensorflow:loss = 45.53726, step = 600 (0.864 sec)
INFO:tensorflow:loss = 45.53726, step = 600 (0.864 sec)
INFO:tensorflow:global_step/sec: 115.772
INFO:tensorflow:global_step/sec: 115.772
INFO:tensorflow:loss = 51.17028, step = 700 (0.864 sec)
INFO:tensorflow:loss = 51.17028, step = 700 (0.864 sec)
INFO:tensorflow:global_step/sec: 116.131
INFO:tensorflow:global_step/sec: 116.131
INFO:tensorflow:loss = 47.696205, step = 800 (0.861 sec)
INFO:tensorflow:loss = 47.696205, step = 800 (0.861 sec)
INFO:tensorflow:global_step/sec: 115.609
INFO:tensorflow:global_step/sec: 115.609
INFO:tensorflow:loss = 47.800926, step = 900 (0.865 sec)
INFO:tensorflow:loss = 47.800926, step = 900 (0.865 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000...
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000...
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/train/20210923-205025/model.ckpt.
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/train/20210923-205025/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000...
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000...
INFO:tensorflow:Loss for final step: 50.67367.
INFO:tensorflow:Loss for final step: 50.67367.
<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7f113351ebd0>

Analizar el modelo

Después de obtener el modelo entrenado, analícelo para calcular métricas de equidad utilizando TFMA e indicadores de equidad. Comience por exportar el modelo como un SavedModel .

Exportar modelo guardado

def eval_input_receiver_fn():
  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_placeholder')

  # This *must* be a dictionary containing a single key 'examples', which
  # points to the input placeholder.
  receiver_tensors = {'examples': serialized_tf_example}

  features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)
  features['weight'] = tf.ones_like(features[LABEL])

  return tfma.export.EvalInputReceiver(
    features=features,
    receiver_tensors=receiver_tensors,
    labels=features[LABEL])

tfma_export_dir = tfma.export.export_eval_savedmodel(
  estimator=classifier,
  export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model'),
  eval_input_receiver_fn=eval_input_receiver_fn)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/encoding.py:141: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/encoding.py:141: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
2021-09-23 20:50:39.359797: W tensorflow/core/common_runtime/graph_constructor.cc:1511] Importing a graph with a lower producer version 26 into an existing graph with producer version 808. Shape inference will have run different parts of the graph with different producer versions.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']
INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']
WARNING:tensorflow:Export includes no default signature!
WARNING:tensorflow:Export includes no default signature!
INFO:tensorflow:Restoring parameters from /tmp/train/20210923-205025/model.ckpt-1000
INFO:tensorflow:Restoring parameters from /tmp/train/20210923-205025/model.ckpt-1000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfma_eval_model/temp-1632430239/assets
INFO:tensorflow:Assets written to: /tmp/tfma_eval_model/temp-1632430239/assets
INFO:tensorflow:SavedModel written to: /tmp/tfma_eval_model/temp-1632430239/saved_model.pb
INFO:tensorflow:SavedModel written to: /tmp/tfma_eval_model/temp-1632430239/saved_model.pb

Métricas de equidad informática

Seleccione la identidad para calcular las métricas y si ejecutar con intervalos de confianza usando el menú desplegable en el panel de la derecha.

Indicadores de equidad Opciones de cálculo

Slice selection: sexual_orientation
Compute confidence intervals: False
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/load.py:169: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/load.py:169: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
INFO:tensorflow:Restoring parameters from /tmp/tfma_eval_model/1632430239/variables/variables
INFO:tensorflow:Restoring parameters from /tmp/tfma_eval_model/1632430239/variables/variables
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/graph_ref.py:189: get_tensor_from_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.get_tensor_from_tensor_info or tf.compat.v1.saved_model.get_tensor_from_tensor_info.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/graph_ref.py:189: get_tensor_from_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.get_tensor_from_tensor_info or tf.compat.v1.saved_model.get_tensor_from_tensor_info.
WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path matching: 
WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path matching: 
WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path matching:

Visualice los datos utilizando la herramienta que si

En esta sección, utilizará la interfaz visual interactiva de What-If Tool para explorar y manipular datos a un nivel micro.

Cada punto del gráfico de dispersión del panel de la derecha representa uno de los ejemplos del subconjunto cargado en la herramienta. Haga clic en uno de los puntos para ver detalles sobre este ejemplo en particular en el panel de la izquierda. Se muestra el texto de comentarios, la toxicidad de la verdad molida y las identidades aplicables. En la parte inferior de este panel izquierdo, verá los resultados de la inferencia del modelo que acaba de entrenar.

Modificar el texto del ejemplo y haga clic en el botón Ejecutar la inferencia a la vista de cómo los cambios causaron la predicción de la toxicidad percibido a cambio.

DEFAULT_MAX_EXAMPLES = 1000

# Load 100000 examples in memory. When first rendered, 
# What-If Tool should only display 1000 of these due to browser constraints.
def wit_dataset(file, num_examples=100000):
  dataset = tf.data.TFRecordDataset(
      filenames=[file]).take(num_examples)
  return [tf.train.Example.FromString(d.numpy()) for d in dataset]

wit_data = wit_dataset(train_tf_file)
config_builder = WitConfigBuilder(wit_data[:DEFAULT_MAX_EXAMPLES]).set_estimator_and_feature_spec(
    classifier, FEATURE_MAP).set_label_vocab(['non-toxicity', LABEL]).set_target_feature(LABEL)
wit = WitWidget(config_builder)

Indicadores de equidad de renderizado

Presta el widget de indicadores de equidad con los resultados de la evaluación exportada.

A continuación, verá gráficos de barras que muestran el rendimiento de cada porción de los datos en las métricas seleccionadas. Puede ajustar el segmento de comparación de la línea de base, así como los umbrales mostrados, utilizando los menús desplegables en la parte superior de la visualización.

El widget Indicador de imparcialidad está integrado con la herramienta de What-SI que se hace anteriormente. Si selecciona una porción de los datos en el gráfico de barras, la herramienta What-If se actualizará para mostrarle ejemplos de la porción seleccionada. Cuando vuelve a cargar datos en la herramienta de What-If anterior, intentan modificar Colour By a la toxicidad. Esto puede darle una comprensión visual del equilibrio de la toxicidad de los ejemplos por la rebanada.

event_handlers={'slice-selected':
                wit.create_selection_callback(wit_data, DEFAULT_MAX_EXAMPLES)}
widget_view.render_fairness_indicator(eval_result=eval_result,
                                      slicing_column=slice_selection,
                                      event_handlers=event_handlers
                                      )
FairnessIndicatorViewer(slicingMetrics=[{'sliceValue': 'Overall', 'slice': 'Overall', 'metrics': {'prediction/…

Con este conjunto de datos y esta tarea en particular, las tasas sistemáticamente más altas de falsos positivos y falsos negativos para ciertas identidades pueden tener consecuencias negativas. Por ejemplo, en un sistema de moderación de contenido, una tasa de falsos positivos superior a la general para un determinado grupo puede hacer que se silencien esas voces. Por lo tanto, es importante evaluar periódicamente este tipo de criterios a medida que desarrolla y mejora los modelos, y utiliza herramientas como los indicadores de equidad, TFDV y WIT para ayudar a identificar posibles problemas. Una vez que haya identificado los problemas de equidad, puede experimentar con nuevas fuentes de datos, equilibrio de datos u otras técnicas para mejorar el rendimiento en grupos de bajo rendimiento.

Ver aquí para más información y orientación sobre el uso de indicadores de equidad.

Utilizar los resultados de la evaluación de equidad

El eval_result objeto, rendido por encima de render_fairness_indicator() , tiene su propia API que puede aprovechar para leer los resultados TFMA en sus programas.

Obtenga segmentos y métricas evaluados

Use get_slice_names() y get_metric_names() para obtener las rodajas y las métricas evaluadas, respectivamente.

pp = pprint.PrettyPrinter()

print("Slices:")
pp.pprint(eval_result.get_slice_names())
print("\nMetrics:")
pp.pprint(eval_result.get_metric_names())
Slices:
[(),
 (('sexual_orientation', 'homosexual_gay_or_lesbian'),),
 (('sexual_orientation', 'heterosexual'),),
 (('sexual_orientation', 'bisexual'),),
 (('sexual_orientation', 'other_sexual_orientation'),)]

Metrics:
['fairness_indicators_metrics/negative_rate@0.1',
 'fairness_indicators_metrics/positive_rate@0.7',
 'fairness_indicators_metrics/false_discovery_rate@0.9',
 'fairness_indicators_metrics/false_negative_rate@0.3',
 'fairness_indicators_metrics/false_omission_rate@0.1',
 'accuracy',
 'fairness_indicators_metrics/false_discovery_rate@0.7',
 'fairness_indicators_metrics/false_negative_rate@0.7',
 'label/mean',
 'fairness_indicators_metrics/true_positive_rate@0.5',
 'fairness_indicators_metrics/false_positive_rate@0.1',
 'recall',
 'fairness_indicators_metrics/false_omission_rate@0.7',
 'fairness_indicators_metrics/false_positive_rate@0.7',
 'auc_precision_recall',
 'fairness_indicators_metrics/negative_rate@0.7',
 'fairness_indicators_metrics/negative_rate@0.3',
 'fairness_indicators_metrics/false_discovery_rate@0.3',
 'fairness_indicators_metrics/true_negative_rate@0.9',
 'fairness_indicators_metrics/false_omission_rate@0.3',
 'fairness_indicators_metrics/false_negative_rate@0.1',
 'fairness_indicators_metrics/true_negative_rate@0.3',
 'fairness_indicators_metrics/true_positive_rate@0.7',
 'fairness_indicators_metrics/false_positive_rate@0.3',
 'fairness_indicators_metrics/true_positive_rate@0.1',
 'fairness_indicators_metrics/true_positive_rate@0.9',
 'fairness_indicators_metrics/false_negative_rate@0.9',
 'fairness_indicators_metrics/positive_rate@0.5',
 'fairness_indicators_metrics/positive_rate@0.9',
 'fairness_indicators_metrics/negative_rate@0.9',
 'fairness_indicators_metrics/true_negative_rate@0.1',
 'fairness_indicators_metrics/false_omission_rate@0.5',
 'post_export_metrics/example_count',
 'fairness_indicators_metrics/false_omission_rate@0.9',
 'fairness_indicators_metrics/negative_rate@0.5',
 'fairness_indicators_metrics/false_positive_rate@0.5',
 'fairness_indicators_metrics/positive_rate@0.3',
 'prediction/mean',
 'accuracy_baseline',
 'fairness_indicators_metrics/true_negative_rate@0.5',
 'fairness_indicators_metrics/false_discovery_rate@0.5',
 'fairness_indicators_metrics/false_discovery_rate@0.1',
 'precision',
 'fairness_indicators_metrics/false_positive_rate@0.9',
 'fairness_indicators_metrics/true_positive_rate@0.3',
 'auc',
 'average_loss',
 'fairness_indicators_metrics/positive_rate@0.1',
 'fairness_indicators_metrics/false_negative_rate@0.5',
 'fairness_indicators_metrics/true_negative_rate@0.7']

Utilice get_metrics_for_slice() para obtener la métrica de una porción particular, como un diccionario de mapeo nombres métricas a los valores métricos .

baseline_slice = ()
heterosexual_slice = (('sexual_orientation', 'heterosexual'),)

print("Baseline metric values:")
pp.pprint(eval_result.get_metrics_for_slice(baseline_slice))
print("\nHeterosexual metric values:")
pp.pprint(eval_result.get_metrics_for_slice(heterosexual_slice))
Baseline metric values:
{'accuracy': {'doubleValue': 0.7174859642982483},
 'accuracy_baseline': {'doubleValue': 0.9198060631752014},
 'auc': {'doubleValue': 0.796409547328949},
 'auc_precision_recall': {'doubleValue': 0.3000231087207794},
 'average_loss': {'doubleValue': 0.5615971088409424},
 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.9139404145348933},
 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.8796606156634021},
 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.816806708107944},
 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7090802784427505},
 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4814937210839392},
 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.006079867348348763},
 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.08696628437197734},
 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.2705713693519414},
 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.5445108470360647},
 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.891598728755009},
 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.006604499315158452},
 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.017811407791031682},
 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.03187681488249431},
 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.04993640137936933},
 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.07271999842219298},
 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9202700382800194},
 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.5818879187535954},
 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28355525303665063},
 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.09679333307231039},
 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.00877639469079322},
 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.07382367199944595},
 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.39155620195304386},
 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.6806884133250225},
 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.8744414433132488},
 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9832342960038783},
 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.926176328000554},
 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.6084437980469561},
 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.3193115866749775},
 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.12555855668675117},
 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.016765703996121616},
 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0797299617199806},
 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.41811208124640464},
 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.7164447469633494},
 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.9032066669276896},
 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9912236053092068},
 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9939201326516512},
 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9130337156280227},
 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7294286306480586},
 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.45548915296393533},
 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.10840127124499102},
 'label/mean': {'doubleValue': 0.08019392192363739},
 'post_export_metrics/example_count': {'doubleValue': 721950.0},
 'precision': {'doubleValue': 0.18319329619407654},
 'prediction/mean': {'doubleValue': 0.3998037576675415},
 'recall': {'doubleValue': 0.7294286489486694} }

Heterosexual metric values:
{'accuracy': {'doubleValue': 0.5203251838684082},
 'accuracy_baseline': {'doubleValue': 0.7601625919342041},
 'auc': {'doubleValue': 0.6672822833061218},
 'auc_precision_recall': {'doubleValue': 0.4065391719341278},
 'average_loss': {'doubleValue': 0.8273133039474487},
 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7541666666666667},
 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7272727272727273},
 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.7062937062937062},
 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.655367231638418},
 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4473684210526316},
 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0},
 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0847457627118644},
 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.288135593220339},
 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4830508474576271},
 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8220338983050848},
 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0},
 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.10416666666666667},
 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.1650485436893204},
 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.18095238095238095},
 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.21365638766519823},
 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9679144385026738},
 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7700534759358288},
 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5401069518716578},
 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.31016042780748665},
 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.045454545454545456},
 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.024390243902439025},
 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.1951219512195122},
 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4186991869918699},
 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6402439024390244},
 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9227642276422764},
 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.975609756097561},
 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8048780487804879},
 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5813008130081301},
 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.3597560975609756},
 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.07723577235772358},
 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.03208556149732621},
 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.22994652406417113},
 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.45989304812834225},
 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.6898395721925134},
 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9545454545454546},
 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0},
 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9152542372881356},
 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.711864406779661},
 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5169491525423728},
 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.17796610169491525},
 'label/mean': {'doubleValue': 0.2398373931646347},
 'post_export_metrics/example_count': {'doubleValue': 492.0},
 'precision': {'doubleValue': 0.2937062978744507},
 'prediction/mean': {'doubleValue': 0.5578703880310059},
 'recall': {'doubleValue': 0.7118644118309021} }

Use get_metrics_for_all_slices() para obtener las métricas para todos los sectores como un mapeo diccionario de cada rebanada a las métricas correspondientes diccionario que obtiene de funcionamiento get_metrics_for_slice() en él.

pp.pprint(eval_result.get_metrics_for_all_slices())
{(): {'accuracy': {'doubleValue': 0.7174859642982483},
      'accuracy_baseline': {'doubleValue': 0.9198060631752014},
      'auc': {'doubleValue': 0.796409547328949},
      'auc_precision_recall': {'doubleValue': 0.3000231087207794},
      'average_loss': {'doubleValue': 0.5615971088409424},
      'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.9139404145348933},
      'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.8796606156634021},
      'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.816806708107944},
      'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7090802784427505},
      'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4814937210839392},
      'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.006079867348348763},
      'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.08696628437197734},
      'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.2705713693519414},
      'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.5445108470360647},
      'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.891598728755009},
      'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.006604499315158452},
      'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.017811407791031682},
      'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.03187681488249431},
      'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.04993640137936933},
      'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.07271999842219298},
      'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9202700382800194},
      'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.5818879187535954},
      'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28355525303665063},
      'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.09679333307231039},
      'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.00877639469079322},
      'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.07382367199944595},
      'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.39155620195304386},
      'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.6806884133250225},
      'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.8744414433132488},
      'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9832342960038783},
      'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.926176328000554},
      'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.6084437980469561},
      'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.3193115866749775},
      'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.12555855668675117},
      'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.016765703996121616},
      'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0797299617199806},
      'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.41811208124640464},
      'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.7164447469633494},
      'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.9032066669276896},
      'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9912236053092068},
      'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9939201326516512},
      'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9130337156280227},
      'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7294286306480586},
      'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.45548915296393533},
      'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.10840127124499102},
      'label/mean': {'doubleValue': 0.08019392192363739},
      'post_export_metrics/example_count': {'doubleValue': 721950.0},
      'precision': {'doubleValue': 0.18319329619407654},
      'prediction/mean': {'doubleValue': 0.3998037576675415},
      'recall': {'doubleValue': 0.7294286489486694} },
 (('sexual_orientation', 'bisexual'),): {'accuracy': {'doubleValue': 0.5258620977401733},
                                         'accuracy_baseline': {'doubleValue': 0.8017241358757019},
                                         'auc': {'doubleValue': 0.6252922415733337},
                                         'auc_precision_recall': {'doubleValue': 0.3546649217605591},
                                         'average_loss': {'doubleValue': 0.7461641430854797},
                                         'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7870370370370371},
                                         'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7816091954022989},
                                         'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.7666666666666667},
                                         'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7037037037037037},
                                         'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.0},
                                         'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0},
                                         'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.17391304347826086},
                                         'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.391304347826087},
                                         'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.6521739130434783},
                                         'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.9130434782608695},
                                         'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0},
                                         'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.13793103448275862},
                                         'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.16071428571428573},
                                         'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.16853932584269662},
                                         'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.18421052631578946},
                                         'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9139784946236559},
                                         'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7311827956989247},
                                         'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.4946236559139785},
                                         'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.20430107526881722},
                                         'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.0},
                                         'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.06896551724137931},
                                         'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.25},
                                         'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4827586206896552},
                                         'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.7672413793103449},
                                         'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9827586206896551},
                                         'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.9310344827586207},
                                         'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.75},
                                         'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5172413793103449},
                                         'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.23275862068965517},
                                         'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.017241379310344827},
                                         'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.08602150537634409},
                                         'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.26881720430107525},
                                         'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5053763440860215},
                                         'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.7956989247311828},
                                         'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 1.0},
                                         'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0},
                                         'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.8260869565217391},
                                         'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.6086956521739131},
                                         'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.34782608695652173},
                                         'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.08695652173913043},
                                         'label/mean': {'doubleValue': 0.1982758641242981},
                                         'post_export_metrics/example_count': {'doubleValue': 116.0},
                                         'precision': {'doubleValue': 0.23333333432674408},
                                         'prediction/mean': {'doubleValue': 0.4908219575881958},
                                         'recall': {'doubleValue': 0.6086956262588501} },
 (('sexual_orientation', 'heterosexual'),): {'accuracy': {'doubleValue': 0.5203251838684082},
                                             'accuracy_baseline': {'doubleValue': 0.7601625919342041},
                                             'auc': {'doubleValue': 0.6672822833061218},
                                             'auc_precision_recall': {'doubleValue': 0.4065391719341278},
                                             'average_loss': {'doubleValue': 0.8273133039474487},
                                             'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7541666666666667},
                                             'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7272727272727273},
                                             'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.7062937062937062},
                                             'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.655367231638418},
                                             'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4473684210526316},
                                             'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0},
                                             'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0847457627118644},
                                             'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.288135593220339},
                                             'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4830508474576271},
                                             'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8220338983050848},
                                             'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0},
                                             'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.10416666666666667},
                                             'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.1650485436893204},
                                             'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.18095238095238095},
                                             'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.21365638766519823},
                                             'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9679144385026738},
                                             'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7700534759358288},
                                             'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5401069518716578},
                                             'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.31016042780748665},
                                             'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.045454545454545456},
                                             'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.024390243902439025},
                                             'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.1951219512195122},
                                             'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4186991869918699},
                                             'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6402439024390244},
                                             'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9227642276422764},
                                             'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.975609756097561},
                                             'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8048780487804879},
                                             'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5813008130081301},
                                             'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.3597560975609756},
                                             'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.07723577235772358},
                                             'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.03208556149732621},
                                             'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.22994652406417113},
                                             'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.45989304812834225},
                                             'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.6898395721925134},
                                             'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9545454545454546},
                                             'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0},
                                             'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9152542372881356},
                                             'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.711864406779661},
                                             'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5169491525423728},
                                             'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.17796610169491525},
                                             'label/mean': {'doubleValue': 0.2398373931646347},
                                             'post_export_metrics/example_count': {'doubleValue': 492.0},
                                             'precision': {'doubleValue': 0.2937062978744507},
                                             'prediction/mean': {'doubleValue': 0.5578703880310059},
                                             'recall': {'doubleValue': 0.7118644118309021} },
 (('sexual_orientation', 'homosexual_gay_or_lesbian'),): {'accuracy': {'doubleValue': 0.5851936340332031},
                                                          'accuracy_baseline': {'doubleValue': 0.7182232141494751},
                                                          'auc': {'doubleValue': 0.7057511806488037},
                                                          'auc_precision_recall': {'doubleValue': 0.469566285610199},
                                                          'average_loss': {'doubleValue': 0.7369641661643982},
                                                          'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7107050831576481},
                                                          'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.6717557251908397},
                                                          'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.6172690763052209},
                                                          'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.5427319211102994},
                                                          'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4092664092664093},
                                                          'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0016168148746968471},
                                                          'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.06143896523848019},
                                                          'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.22958771220695232},
                                                          'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4939369442198868},
                                                          'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8763136620856912},
                                                          'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.01652892561983471},
                                                          'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.08909730363423213},
                                                          'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.14947368421052631},
                                                          'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.20225091029460443},
                                                          'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.2624061970467199},
                                                          'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9622581668252458},
                                                          'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7535680304471931},
                                                          'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.4874722486520774},
                                                          'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.2356485886457342},
                                                          'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.03361877576910879},
                                                          'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.0275626423690205},
                                                          'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.19430523917995443},
                                                          'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4328018223234624},
                                                          'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6881548974943053},
                                                          'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.941002277904328},
                                                          'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.9724373576309795},
                                                          'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8056947608200455},
                                                          'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5671981776765376},
                                                          'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.31184510250569475},
                                                          'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.05899772209567198},
                                                          'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0377418331747542},
                                                          'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.24643196955280686},
                                                          'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5125277513479226},
                                                          'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.7643514113542658},
                                                          'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9663812242308912},
                                                          'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9983831851253031},
                                                          'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9385610347615198},
                                                          'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7704122877930477},
                                                          'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5060630557801131},
                                                          'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.12368633791430882},
                                                          'label/mean': {'doubleValue': 0.2817767560482025},
                                                          'post_export_metrics/example_count': {'doubleValue': 4390.0},
                                                          'precision': {'doubleValue': 0.3827309310436249},
                                                          'prediction/mean': {'doubleValue': 0.5428739786148071},
                                                          'recall': {'doubleValue': 0.770412266254425} },
 (('sexual_orientation', 'other_sexual_orientation'),): {'accuracy': {'doubleValue': 0.6000000238418579},
                                                         'accuracy_baseline': {'doubleValue': 0.800000011920929},
                                                         'auc': {'doubleValue': 1.0},
                                                         'auc_precision_recall': {'doubleValue': 1.0},
                                                         'average_loss': {'doubleValue': 0.7521011829376221},
                                                         'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.8},
                                                         'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.75},
                                                         'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.6666666666666666},
                                                         'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.5},
                                                         'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 'NaN'},
                                                         'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.75},
                                                         'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5},
                                                         'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.25},
                                                         'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.2},
                                                         'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4},
                                                         'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6},
                                                         'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.8},
                                                         'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8},
                                                         'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.6},
                                                         'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.4},
                                                         'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.2},
                                                         'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.25},
                                                         'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5},
                                                         'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.75},
                                                         'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 1.0},
                                                         'label/mean': {'doubleValue': 0.20000000298023224},
                                                         'post_export_metrics/example_count': {'doubleValue': 5.0},
                                                         'precision': {'doubleValue': 0.3333333432674408},
                                                         'prediction/mean': {'doubleValue': 0.6101843118667603},
                                                         'recall': {'doubleValue': 1.0} } }