Fairness Indicators Example Colab

Overview

In this activity, you'll use Fairness Indicators to explore the Civil Comments dataset. Fairness Indicators is a suite of tools built on top of TensorFlow Model Analysis that enables regular evaluation of fairness metrics in product pipelines. This Introductory Video provides more details and context on the real-world scenario presented here, one of the primary motivations for creating Fairness Indicators.

About the Dataset

In this exercise, you'll work with the Civil Comments dataset, approximately 2 million comments released by the Civil Comments platform in 2017 for ongoing research. This effort was sponsored by Jigsaw, which has hosted competitions on Kaggle to help classify toxic comments and minimize unintended model bias.

Each individual text comment in the dataset has a toxicity label. Within the data, a subset of comments are labeled with a variety of identity attributes, including categories for gender, sexual orientation, religion, and race or ethnicity.

About the Tools

TensorFlow Model Analysis is a library for evaluating both TensorFlow and non-TensorFlow machine learning models. It allows users to evaluate their models on large amounts of data in a distributed manner, computing in-graph and other metrics over different slices of data and visualizing them in notebooks.

Fairness Indicators is built on top of TFMA. With Fairness Indicators, users will be able to:

  • Evaluate model performance, sliced across defined groups of users
  • Feel confident about results with confidence intervals and evaluations at multiple thresholds

Fairness Indicators is packaged with TensorFlow Data Validation and the What-If Tool to allow users to:

  • Evaluate the distribution of datasets
  • Dive deep into individual slices to explore root causes and opportunities for improvement with the What-If Tool

Importing

Run the following code to install the fairness_indicators library. This package contains the tools we'll be using in this exercise. You may be prompted to restart the runtime, but doing so is not necessary.

pip install -q --upgrade fairness-indicators
ERROR: apache-beam 2.21.0 has requirement oauth2client<4,>=2.0.1, but you'll have oauth2client 4.1.3 which is incompatible.
ERROR: tensorflow-serving-api 2.1.0 has requirement tensorflow~=2.1.0, but you'll have tensorflow 2.2.0 which is incompatible.
ERROR: tfx-bsl 0.22.0 has requirement absl-py<0.9,>=0.7, but you'll have absl-py 0.9.0 which is incompatible.
ERROR: tensorflow-model-analysis 0.22.1 has requirement absl-py<0.9,>=0.7, but you'll have absl-py 0.9.0 which is incompatible.
ERROR: tensorflow-transform 0.22.0 has requirement absl-py<0.9,>=0.7, but you'll have absl-py 0.9.0 which is incompatible.
ERROR: tensorflow-data-validation 0.22.0 has requirement absl-py<0.9,>=0.7, but you'll have absl-py 0.9.0 which is incompatible.
ERROR: tensorflow-data-validation 0.22.0 has requirement pandas<1,>=0.24, but you'll have pandas 1.0.4 which is incompatible.

import os
import tempfile
import apache_beam as beam
import numpy as np
import pandas as pd
from datetime import datetime

import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_data_validation as tfdv
from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view
from fairness_indicators.examples import util

from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget

Download and Understand the Data

In this exercise, you'll work with the Civil Comments dataset, approximately 2 million comments released by the Civil Comments platform in 2017. A subset of comments has additionally been labeled with a variety of identity attributes, representing the identities mentioned in the comment.

We've hosted the dataset on Google Cloud Platform for convenience. Run the following code to download the data from GCP. The data will take about a minute to download and analyze.

TensorFlow Data Validation is one tool you can use to analyze your data. You can use it to find potential problems in your data, such as missing values and data imbalances, that can lead to fairness disparities.

download_original_data = True 

if download_original_data:
  train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')

  # The identity terms will be grouped by category (see 'IDENTITY_COLUMNS')
  # using a threshold of 0.5. Only the identity term columns, the text column,
  # and the label column are kept after processing.
  train_tf_file = util.convert_comments_data(train_tf_file)
  validate_tf_file = util.convert_comments_data(validate_tf_file)

else:
  train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')
Downloading data from https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord
1439031296/1439024821 [==============================] - 26s 0us/step
Downloading data from https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord
958767104/958765415 [==============================] - 17s 0us/step

stats = tfdv.generate_statistics_from_tfrecord(data_location=train_tf_file)
tfdv.visualize_statistics(stats)
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.

Warning:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_data_validation/utils/stats_util.py:227: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`

There are several interesting things that we may want to note in this data. The first is that the toxicity label, which is what we are predicting, is unbalanced. Only 8% of examples in the training set are toxic, which means that a classifier could get 92% accuracy by predicting that all comments are non-toxic.
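
The arithmetic behind that accuracy figure can be checked with a small sketch. The label counts below are hypothetical (1,000 examples at the 8% toxic rate mentioned above), not values read from the dataset.

```python
# Hypothetical label set with the 8% toxic rate described above.
labels = [1] * 80 + [0] * 920

# A trivial "classifier" that predicts non-toxic (0) for every comment.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall on the toxic class: the share of toxic comments that were caught.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f}, toxic recall={recall:.2f}")
```

High accuracy here coincides with zero recall on the toxic class, which is why accuracy alone is a poor guide on imbalanced data.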

For the fields relating to identity terms, note that out of 1.08 million training examples, only around 6.6k deal with homosexuality, and those related to bisexuality are even rarer. This suggests that performance on these slices may suffer from a lack of training data.

Defining Constants

Here, we define the feature map that will be used to parse the data. Each example has a label, the comment text, and the identity features associated with the text: sexual orientation, gender, religion, race, and disability.

BASE_DIR = tempfile.gettempdir()

TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'
FEATURE_MAP = {
    # Label:
    LABEL: tf.io.FixedLenFeature([], tf.float32),
    # Text:
    TEXT_FEATURE: tf.io.FixedLenFeature([], tf.string),

    # Identities:
    'sexual_orientation': tf.io.VarLenFeature(tf.string),
    'gender': tf.io.VarLenFeature(tf.string),
    'religion': tf.io.VarLenFeature(tf.string),
    'race': tf.io.VarLenFeature(tf.string),
    'disability': tf.io.VarLenFeature(tf.string),
}
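
To make the identity features more concrete, here is a rough sketch of how per-term identity scores might be grouped into category columns like the ones declared above, using a 0.5 threshold. The `IDENTITY_TERMS` mapping, the `group_identities` helper, and the sample scores are hypothetical illustrations and are not taken from `util.convert_comments_data`.

```python
# Hypothetical mapping from category column to raw identity-term columns.
IDENTITY_TERMS = {
    'gender': ['male', 'female', 'transgender'],
    'sexual_orientation': [
        'heterosexual', 'homosexual_gay_or_lesbian', 'bisexual'],
}

def group_identities(scores, threshold=0.5):
  """Returns {category: [terms whose score meets the threshold]}."""
  grouped = {}
  for category, terms in IDENTITY_TERMS.items():
    grouped[category] = [
        term for term in terms if scores.get(term, 0.0) >= threshold
    ]
  return grouped

# Made-up identity scores for a single comment.
example_scores = {'female': 0.8, 'male': 0.2, 'bisexual': 0.5}
grouped = group_identities(example_scores)
print(grouped)
```

Because a comment can mention zero, one, or several identities within a category, a variable-length string feature is a natural fit for these columns.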

Train the Model

First, set up the input function to feed data into the model. Because our earlier TensorFlow Data Validation run revealed a class imbalance, we add a weight column to each example and upweight the toxic examples to account for it. Identity features are used only during the evaluation phase; only the comments are fed into the model at training time.

def train_input_fn():
  def parse_function(serialized):
    parsed_example = tf.io.parse_single_example(
        serialized=serialized, features=FEATURE_MAP)
    # Adds a weight column to deal with unbalanced classes.
    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
    return (parsed_example,
            parsed_example[LABEL])
  train_dataset = tf.data.TFRecordDataset(
      filenames=[train_tf_file]).map(parse_function).batch(512)
  return train_dataset
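
As a quick check of what this weighting does, the sketch below applies the same `label + 0.1` rule outside of TensorFlow. It assumes labels of 0.0 (non-toxic) and 1.0 (toxic); the `example_weight` helper is a hypothetical stand-in for the `tf.add` call above.

```python
def example_weight(label, offset=0.1):
  """Mirrors the weight rule in parse_function: label + 0.1."""
  return label + offset

non_toxic_weight = example_weight(0.0)
toxic_weight = example_weight(1.0)

# Under this rule a toxic example counts about 11 times as much
# as a non-toxic one.
ratio = toxic_weight / non_toxic_weight
print(non_toxic_weight, toxic_weight, round(ratio, 1))
```

The small offset keeps non-toxic examples from receiving a weight of zero, so they still contribute to the loss.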

Next, create a deep neural network model, and train it on the data:

model_dir = os.path.join(BASE_DIR, 'train', datetime.now().strftime(
    "%Y%m%d-%H%M%S"))

embedded_text_feature_column = hub.text_embedding_column(
    key=TEXT_FEATURE,
    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')

classifier = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    weight_column='weight',
    feature_columns=[embedded_text_feature_column],
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.003),
    loss_reduction=tf.losses.Reduction.SUM,
    n_classes=2,
    model_dir=model_dir)

classifier.train(input_fn=train_input_fn, steps=1000)
INFO:tensorflow:Using default config.

INFO:tensorflow:Using config: {'_model_dir': '/tmp/train/20200602-181446', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:1666: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Saver not created because there are no variables in the graph to restore

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:517: NumericColumn._get_dense_tensor (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/feature_column/feature_column.py:2167: NumericColumn._transform_feature (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/adagrad.py:106: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...

INFO:tensorflow:Saving checkpoints for 0 into /tmp/train/20200602-181446/model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...

INFO:tensorflow:loss = 58.801712, step = 0

INFO:tensorflow:global_step/sec: 24.4222

INFO:tensorflow:loss = 56.420044, step = 100 (4.097 sec)

INFO:tensorflow:global_step/sec: 23.7161

INFO:tensorflow:loss = 47.905754, step = 200 (4.217 sec)

INFO:tensorflow:global_step/sec: 21.4733

INFO:tensorflow:loss = 56.08833, step = 300 (4.657 sec)

INFO:tensorflow:global_step/sec: 21.4546

INFO:tensorflow:loss = 55.43097, step = 400 (4.661 sec)

INFO:tensorflow:global_step/sec: 24.5031

INFO:tensorflow:loss = 41.52231, step = 500 (4.081 sec)

INFO:tensorflow:global_step/sec: 24.9583

INFO:tensorflow:loss = 45.33443, step = 600 (4.007 sec)

INFO:tensorflow:global_step/sec: 25.1941

INFO:tensorflow:loss = 51.031197, step = 700 (3.969 sec)

INFO:tensorflow:global_step/sec: 24.0986

INFO:tensorflow:loss = 47.586723, step = 800 (4.150 sec)

INFO:tensorflow:global_step/sec: 24.7

INFO:tensorflow:loss = 48.077736, step = 900 (4.049 sec)

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000...

INFO:tensorflow:Saving checkpoints for 1000 into /tmp/train/20200602-181446/model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000...

INFO:tensorflow:Loss for final step: 50.77939.

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7fb01039eda0>

Run TensorFlow Model Analysis with Fairness Indicators

Export Saved Model

def eval_input_receiver_fn():
  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_placeholder')

  # This *must* be a dictionary containing a single key 'examples', which
  # points to the input placeholder.
  receiver_tensors = {'examples': serialized_tf_example}

  features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)
  features['weight'] = tf.ones_like(features[LABEL])

  return tfma.export.EvalInputReceiver(
    features=features,
    receiver_tensors=receiver_tensors,
    labels=features[LABEL])

tfma_export_dir = tfma.export.export_eval_savedmodel(
  estimator=classifier,
  export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model'),
  eval_input_receiver_fn=eval_input_receiver_fn)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_model_analysis/eval_saved_model/encoding.py:141: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Saver not created because there are no variables in the graph to restore

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Signatures INCLUDED in export for Classify: None

INFO:tensorflow:Signatures INCLUDED in export for Regress: None

INFO:tensorflow:Signatures INCLUDED in export for Predict: None

INFO:tensorflow:Signatures INCLUDED in export for Train: None

INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']

Warning:tensorflow:Export includes no default signature!

INFO:tensorflow:Restoring parameters from /tmp/train/20200602-181446/model.ckpt-1000

INFO:tensorflow:Assets added to graph.

INFO:tensorflow:Assets written to: /tmp/tfma_eval_model/temp-1591121742/assets

INFO:tensorflow:SavedModel written to: /tmp/tfma_eval_model/temp-1591121742/saved_model.pb

Compute Fairness Metrics

Select the identity to compute metrics for, and whether to run with confidence intervals, in the panel on the right-hand side. Depending on your configuration, this step will take 2-10 minutes to run.


tfma_eval_result_path = os.path.join(BASE_DIR, 'tfma_eval_result')


slice_selection = 'sexual_orientation' 

compute_confidence_intervals = False 

# Define slices that you want the evaluation to run on.
slice_spec = [
    tfma.slicer.SingleSliceSpec(), # Overall slice
    tfma.slicer.SingleSliceSpec(columns=[slice_selection]),
]

# Add the fairness metrics.
add_metrics_callbacks = [
  tfma.post_export_metrics.fairness_indicators(
      thresholds=[0.1, 0.3, 0.5, 0.7, 0.9],
      labels_key=LABEL
      )
]

eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=tfma_export_dir,
    add_metrics_callbacks=add_metrics_callbacks)

# Run the fairness evaluation.
with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'ReadData' >> beam.io.ReadFromTFRecord(validate_tf_file)
      | 'ExtractEvaluateAndWriteResults' >>
      tfma.ExtractEvaluateAndWriteResults(
          eval_shared_model=eval_shared_model,
          slice_spec=slice_spec,
          compute_confidence_intervals=compute_confidence_intervals,
          output_path=tfma_eval_result_path)
  )

eval_result = tfma.load_eval_result(output_path=tfma_eval_result_path)
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_model_analysis/eval_saved_model/load.py:169: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.

INFO:tensorflow:Restoring parameters from /tmp/tfma_eval_model/1591121742/variables/variables

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_model_analysis/eval_saved_model/graph_ref.py:189: get_tensor_from_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.get_tensor_from_tensor_info or tf.compat.v1.saved_model.get_tensor_from_tensor_info.
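
Confidence intervals matter most for small slices, where a metric can swing widely by chance. One common way to estimate them is bootstrap resampling, sketched below on a hypothetical slice. This illustrates the general idea only and is not a description of how TFMA computes its intervals; the slice data and the `false_positive_rate` and `bootstrap_interval` helpers are illustrative names.

```python
import random

def false_positive_rate(labels, predictions):
  """FP / (FP + TN), computed over the non-toxic (label 0) examples."""
  negatives = [p for y, p in zip(labels, predictions) if y == 0]
  if not negatives:
    return 0.0
  return sum(negatives) / len(negatives)

def bootstrap_interval(labels, predictions, num_samples=1000, seed=0):
  """Returns an approximate 95% percentile interval for the FPR."""
  rng = random.Random(seed)
  n = len(labels)
  estimates = []
  for _ in range(num_samples):
    # Resample the slice with replacement and recompute the metric.
    idx = [rng.randrange(n) for _ in range(n)]
    estimates.append(false_positive_rate(
        [labels[i] for i in idx], [predictions[i] for i in idx]))
  estimates.sort()
  low = estimates[int(0.025 * num_samples)]
  high = estimates[int(0.975 * num_samples)]
  return low, high

# Hypothetical small slice: 40 non-toxic comments, 10 of them flagged as toxic.
labels = [0] * 40
predictions = [1] * 10 + [0] * 30

point = false_positive_rate(labels, predictions)
low, high = bootstrap_interval(labels, predictions)
print(f"FPR={point:.2f}, 95% interval=({low:.2f}, {high:.2f})")
```

With only 40 examples the interval is wide, which is a reminder to read differences between small slices with caution.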

Render What-If Tool

In this section, you'll use the What-If Tool's interactive visual interface to explore and manipulate data at a micro-level.

On the right-hand panel in the visualization, you will see a scatter plot where each point represents one of the examples in the subset loaded into the tool. Click on one of the points. In the left-hand panel, you should now see details about this particular example. The comment text, ground truth toxicity, and applicable identities are shown. At the bottom of this left-hand panel, you see the inference results from the model you just trained.

Modify the text of the example. You can then click the "Run inference" button to view how your changes caused the perceived toxicity prediction to change.

DEFAULT_MAX_EXAMPLES = 1000

# Load 100000 examples in memory. When first rendered, 
# What-If Tool should only display 1000 of these due to browser constraints.
def wit_dataset(file, num_examples=100000):
  dataset = tf.data.TFRecordDataset(
      filenames=[file]).take(num_examples)
  return [tf.train.Example.FromString(d.numpy()) for d in dataset]

wit_data = wit_dataset(train_tf_file)
config_builder = WitConfigBuilder(wit_data[:DEFAULT_MAX_EXAMPLES]).set_estimator_and_feature_spec(
    classifier, FEATURE_MAP).set_label_vocab(['non-toxicity', LABEL]).set_target_feature(LABEL)
wit = WitWidget(config_builder)

Render Fairness Indicators

Render the Fairness Indicators widget with the exported evaluation results.

Below you will see bar charts displaying performance of each slice of the data on selected metrics. You can adjust the baseline comparison slice as well as the displayed threshold(s) using the drop-down menus at the top of the visualization.

The Fairness Indicators widget is integrated with the What-If Tool rendered above. If you select one slice of the data in the bar chart, the What-If Tool will update to show you examples from the selected slice. When the data reloads in the What-If Tool above, try modifying Color By to toxicity. This can give you a visual understanding of the toxicity balance of examples by slice.

event_handlers={'slice-selected':
                wit.create_selection_callback(wit_data, DEFAULT_MAX_EXAMPLES)}
widget_view.render_fairness_indicator(eval_result=eval_result,
                                      slicing_column=slice_selection,
                                      event_handlers=event_handlers
                                      )
FairnessIndicatorViewer(slicingMetrics=[{'sliceValue': 'Overall', 'slice': 'Overall', 'metrics': {'post_export…

With this particular dataset and task, systematically higher false positive and false negative rates for certain identities can lead to negative consequences. For example, in a content moderation system, a higher-than-overall false positive rate for a certain group can lead to those voices being silenced. Thus, it is important to regularly evaluate these types of criteria as you develop and improve models, and utilize tools such as Fairness Indicators, TFDV, and WIT to help illuminate potential problems. Once you've identified fairness issues, you can experiment with new data sources, data balancing, or other techniques to improve performance on underperforming groups.
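
To illustrate the kind of comparison described above, the sketch below computes false positive and false negative rates for each slice and for the overall data at a single threshold. The records, slice names, and `error_rates` helper are hypothetical; in the notebook these metrics come from the Fairness Indicators evaluation rather than from hand-written code.

```python
from collections import defaultdict

def error_rates(records, threshold):
  """Returns {slice: (false_positive_rate, false_negative_rate)}."""
  counts = defaultdict(lambda: {'fp': 0, 'tn': 0, 'fn': 0, 'tp': 0})
  for slice_name, label, score in records:
    predicted = score >= threshold
    if label and predicted:
      key = 'tp'
    elif label:
      key = 'fn'
    elif predicted:
      key = 'fp'
    else:
      key = 'tn'
    # Count each record for its own slice and for the overall slice.
    for name in (slice_name, 'Overall'):
      counts[name][key] += 1
  rates = {}
  for name, c in counts.items():
    fpr = c['fp'] / max(c['fp'] + c['tn'], 1)
    fnr = c['fn'] / max(c['fn'] + c['tp'], 1)
    rates[name] = (fpr, fnr)
  return rates

# Made-up (slice, label, model score) records.
records = [
    ('group_a', 0, 0.7), ('group_a', 0, 0.2), ('group_a', 1, 0.9),
    ('group_b', 0, 0.1), ('group_b', 0, 0.3), ('group_b', 1, 0.4),
]
rates = error_rates(records, threshold=0.5)
for name, (fpr, fnr) in sorted(rates.items()):
  print(f"{name}: FPR={fpr:.2f} FNR={fnr:.2f}")
```

In this toy data, group_a has a higher false positive rate than the overall slice, which is the pattern that would lead to non-toxic comments from that group being flagged more often.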

For more information and guidance on how Fairness Indicators can be used, see this link.