FaceSSD Fairness Indicators Example Colab

Overview

In this activity, you'll use Fairness Indicators to explore FaceSSD predictions on the Labeled Faces in the Wild dataset. Fairness Indicators is a suite of tools built on top of TensorFlow Model Analysis that enables regular evaluation of fairness metrics in product pipelines.

About the Dataset

In this exercise, you'll work with the FaceSSD prediction dataset: approximately 200k image predictions and groundtruths generated by the FaceSSD API.

About the Tools

TensorFlow Model Analysis is a library for evaluating both TensorFlow and non-TensorFlow machine learning models. It allows users to evaluate their models on large amounts of data in a distributed manner, computing in-graph and other metrics over different slices of data, and visualizing the results in notebooks.

TensorFlow Data Validation is one tool you can use to analyze your data. You can use it to find potential problems in your data, such as missing values and data imbalances, that can lead to fairness disparities.
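For instance, once statistics have been computed for the dataset (as we do in the "Download and Understand the Data" section below), a typical TFDV check looks like the following sketch, which infers a schema from the statistics and flags anomalies relative to it:

# Sketch: infer a schema from dataset statistics, then surface anomalies
# (e.g. missing values or skewed distributions) relative to that schema.
schema = tfdv.infer_schema(statistics=stats)
anomalies = tfdv.validate_statistics(statistics=stats, schema=schema)
tfdv.display_anomalies(anomalies)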

With Fairness Indicators, users will be able to:

  • Evaluate model performance, sliced across defined groups of users
  • Feel confident about results with confidence intervals and evaluations at multiple thresholds

Importing

Run the following code to install the fairness_indicators library. This package contains the tools we'll be using in this exercise. You may be asked to restart the runtime, but restarting is not necessary.

pip install apache_beam
pip install fairness-indicators
pip install witwidget
import os
import tempfile
import apache_beam as beam
import numpy as np
import pandas as pd
from datetime import datetime

import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_data_validation as tfdv
from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view
from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_predict as agnostic_predict
from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_evaluate_graph
from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_extractor

from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget

Download and Understand the Data

Labeled Faces in the Wild is a public benchmark dataset for face verification, also known as pair matching. LFW contains more than 13,000 images of faces collected from the web.

We ran FaceSSD predictions on this dataset to predict whether a face is present in a given image. In this Colab, we will slice the data by gender to observe whether there are any significant differences in model performance across gender groups.

If there is more than one face in an image, gender is labeled as "MISSING".

We've hosted the dataset on Google Cloud Platform for convenience. Run the following code to download the data from GCP; it will take about a minute to download and analyze.

data_location = tf.keras.utils.get_file('lfw_dataset.tf', 'https://storage.googleapis.com/facessd_dataset/lfw_dataset.tfrecord')

stats = tfdv.generate_statistics_from_tfrecord(data_location=data_location)
tfdv.visualize_statistics(stats)
Downloading data from https://storage.googleapis.com/facessd_dataset/lfw_dataset.tfrecord
200828483/200828483 [==============================] - 2s 0us/step
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.8/site-packages/tensorflow_data_validation/utils/statistics_io_impl.py:91: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
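
With the data downloaded, you can also peek at an individual record to see how the gender label (including the "MISSING" value) is stored. The following snippet is an optional sketch that uses standard TensorFlow APIs:

# Optional sketch: inspect the gender label of the first record.
raw_dataset = tf.data.TFRecordDataset(data_location)
for raw_record in raw_dataset.take(1):
  example = tf.train.Example.FromString(raw_record.numpy())
  print(example.features.feature['object/groundtruth/Gender'])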

Defining Constants

BASE_DIR = tempfile.gettempdir()

tfma_eval_result_path = os.path.join(BASE_DIR, 'tfma_eval_result')

compute_confidence_intervals = True

slice_key = 'object/groundtruth/Gender'
label_key = 'object/groundtruth/face'
prediction_key = 'object/prediction/face'

feature_map = {
    slice_key:
        tf.io.FixedLenFeature([], tf.string, default_value=['none']),
    label_key:
        tf.io.FixedLenFeature([], tf.float32, default_value=[0.0]),
    prediction_key:
        tf.io.FixedLenFeature([], tf.float32, default_value=[0.0]),
}
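
As an optional sanity check (a sketch, not required for the rest of the exercise), you can parse a single record with this feature spec to confirm that the keys and dtypes line up:

# Optional sketch: parse one record with the feature spec above.
raw_dataset = tf.data.TFRecordDataset(data_location)
for raw_record in raw_dataset.take(1):
  parsed = tf.io.parse_single_example(raw_record, feature_map)
  print({name: tensor.numpy() for name, tensor in parsed.items()})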

Model Agnostic Config for TFMA

model_agnostic_config = agnostic_predict.ModelAgnosticConfig(
    label_keys=[label_key],
    prediction_keys=[prediction_key],
    feature_spec=feature_map)

model_agnostic_extractors = [
    # Decode the precomputed predictions using the model-agnostic config
    # above, instead of running inference with a model.
    model_agnostic_extractor.ModelAgnosticExtractor(
        model_agnostic_config=model_agnostic_config, desired_batch_size=3),
    # SingleSliceSpec() computes metrics over the entire dataset; the second
    # spec computes them separately for each value of the gender column.
    tfma.extractors.slice_key_extractor.SliceKeyExtractor(
        [tfma.slicer.SingleSliceSpec(),
         tfma.slicer.SingleSliceSpec(columns=[slice_key])])
]

Fairness Callbacks and Computing Fairness Metrics

# Helper class for counting examples in a Beam PCollection.
class CountExamples(beam.CombineFn):
  def __init__(self, message):
    self.message = message

  def create_accumulator(self):
    return 0

  def add_input(self, current_sum, element):
    return current_sum + 1

  def merge_accumulators(self, accumulators):
    return sum(accumulators)

  def extract_output(self, final_sum):
    # Used only for its side effect: printing the final count.
    if final_sum:
      print("%s: %d" % (self.message, final_sum))
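
# A quick standalone check of CountExamples (a sketch, not needed for the
# rest of this Colab): running
#   with beam.Pipeline() as p:
#     _ = (p
#          | beam.Create([b'a', b'b', b'c'])
#          | beam.CombineGlobally(CountExamples('Toy count')))
# prints "Toy count: 3".
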
metrics_callbacks = [
  tfma.post_export_metrics.fairness_indicators(
      thresholds=[0.1, 0.3, 0.5, 0.7, 0.9],
      labels_key=label_key,
      target_prediction_keys=[prediction_key]),
  tfma.post_export_metrics.auc(
      curve='PR',
      labels_key=label_key,
      target_prediction_keys=[prediction_key]),
]

# Construct a model-agnostic EvalSharedModel: rather than loading a saved
# model, TFMA evaluates the precomputed predictions using the callbacks above.
eval_shared_model = tfma.types.EvalSharedModel(
    add_metrics_callbacks=metrics_callbacks,
    construct_fn=model_agnostic_evaluate_graph.make_construct_fn(
        add_metrics_callbacks=metrics_callbacks,
        config=model_agnostic_config))

with beam.Pipeline() as pipeline:
  # Read data.
  data = (
      pipeline
      | 'ReadData' >> beam.io.ReadFromTFRecord(data_location))

  # Count all examples.
  data_count = (
      data | 'Count number of examples' >> beam.CombineGlobally(
          CountExamples('Before filtering "Gender:MISSING"')))

  # If there is more than one face in an image, the gender feature is
  # 'MISSING', so we filter those images out.
  def filter_missing_gender(element):
    example = tf.train.Example.FromString(element)
    if example.features.feature[slice_key].bytes_list.value[0] != b'MISSING':
      yield element

  filtered_data = (
      data
      | 'Filter Missing Gender' >> beam.ParDo(filter_missing_gender))

  # Count after filtering "Gender:MISSING".
  filtered_data_count = (
      filtered_data | 'Count number of examples after filtering'
      >> beam.CombineGlobally(
          CountExamples('After filtering "Gender:MISSING"')))

  # Because every image in the LFW dataset contains a face, we add a
  # groundtruth label of 1.0 for all images.
  def add_face_groundtruth(element):
    example = tf.train.Example.FromString(element)
    example.features.feature[label_key].float_list.value[:] = [1.0]
    yield example.SerializeToString()

  final_data = (
      filtered_data
      | 'Add Face Groundtruth' >> beam.ParDo(add_face_groundtruth))

  # Run TFMA.
  _ = (
      final_data
      | 'ExtractEvaluateAndWriteResults' >>
      tfma.ExtractEvaluateAndWriteResults(
          eval_shared_model=eval_shared_model,
          compute_confidence_intervals=compute_confidence_intervals,
          output_path=tfma_eval_result_path,
          extractors=model_agnostic_extractors))

eval_result = tfma.load_eval_result(output_path=tfma_eval_result_path)
2023-11-16 10:13:47.541344: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.8/site-packages/tensorflow_model_analysis/post_export_metrics/post_export_metrics.py:167: auc (from tensorflow.python.ops.metrics_impl) is deprecated and will be removed in a future version.
Instructions for updating:
The value of AUC returned by this may race with the update so this is deprecated. Please use tf.keras.metrics.AUC instead.
Before filtering "Gender:MISSING": 13836
After filtering "Gender:MISSING": 11544
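
Before rendering the widget, you can optionally inspect the evaluation results programmatically. For example (a sketch), you can list the slices that were computed:

# Optional sketch: list the slices computed during evaluation.
for slice_spec, metrics in eval_result.slicing_metrics:
  print(slice_spec)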

Render Fairness Indicators

Render the Fairness Indicators widget with the exported evaluation results.

Below you will see bar charts displaying the performance of each slice of the data on the selected metrics. You can adjust the baseline comparison slice, as well as the displayed threshold(s), using the drop-down menus at the top of the visualization.

A relevant metric for this use case is the true positive rate, also known as recall. Use the selector on the left-hand side to choose the graph for true_positive_rate. These metric values match the values displayed on the model card.

For some photos, gender is labeled as "young" instead of "male" or "female" when the person in the photo is too young to be annotated accurately.

widget_view.render_fairness_indicator(eval_result=eval_result,
                                      slicing_column=slice_key)
FairnessIndicatorViewer(slicingMetrics=[{'sliceValue': 'Gender: Male', 'slice': 'object/groundtruth/Gender:Gen…