Fairness Indicators Case Study to showcase Lineage

In this activity, you'll use Fairness Indicators and ML Metadata to explore the COMPAS dataset within a TensorFlow Extended pipeline.

COMPAS Dataset

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a public dataset containing approximately 18,000 criminal cases from Broward County, Florida between January 2013 and December 2014. The data contains information about 11,000 unique defendants, including criminal history, demographics, and a risk score intended to represent the defendant’s likelihood of reoffending (recidivism). A machine learning model trained on this data has been used by judges and parole officers to determine whether or not to set bail and whether or not to grant parole.

In 2016, an article published in ProPublica found that the COMPAS model incorrectly predicted that African-American defendants would recidivate at much higher rates than they actually did, while for Caucasian defendants the model made mistakes in the opposite direction, incorrectly predicting that they would not commit another crime. The authors went on to show that these biases were likely due to an uneven distribution in the data between African-American and Caucasian defendants. Specifically, the proportions of negative examples (the defendant did not commit another crime) and positive examples (the defendant did commit another crime) in the ground truth labels differed between the two races. Since 2016, the COMPAS dataset has appeared frequently in the ML fairness literature 1, 2, 3, with researchers using it to demonstrate techniques for identifying and remediating fairness concerns. This tutorial from the FAT* 2018 conference illustrates how COMPAS can dramatically impact a defendant’s prospects in the real world.
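The disproportionate base rates the authors describe can be checked with a simple per-group computation. The sketch below uses made-up toy counts, not the real COMPAS numbers, purely to illustrate the idea:

```python
from collections import defaultdict

def base_rates(examples):
  """Computes P(label == 1) for each group in a list of (group, label) pairs."""
  totals = defaultdict(int)
  positives = defaultdict(int)
  for group, label in examples:
    totals[group] += 1
    positives[group] += label
  return {g: positives[g] / totals[g] for g in totals}

# Toy (race, is_recid) pairs -- illustrative only, not real COMPAS data.
toy = [('A', 1), ('A', 1), ('A', 0), ('B', 1), ('B', 0), ('B', 0)]
rates = base_rates(toy)
# Group 'A' has a positive rate of 2/3; group 'B' a rate of 1/3.
```

A gap between per-group base rates like this is exactly the kind of label imbalance that can translate into asymmetric model errors.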

It is important to note that developing a machine learning model to predict pre-trial detention has a number of important ethical considerations. You can learn more about these issues in the Partnership on AI “Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System.” The Partnership on AI is a multi-stakeholder organization -- of which Google is a member -- that creates guidelines around AI.

We’re using the COMPAS dataset only as an example of how to identify and remediate fairness concerns in data. This dataset is canonical in the algorithmic fairness literature.

About the Tools in this Case Study

  • TensorFlow Extended (TFX) is a Google-production-scale machine learning platform based on TensorFlow. It provides a configuration framework and shared libraries to integrate common components needed to define, launch, and monitor your machine learning system.

  • TensorFlow Model Analysis is a library for evaluating machine learning models. Users can evaluate their models on a large amount of data in a distributed manner and view metrics over different slices within a notebook.

  • Fairness Indicators is a suite of tools built on top of TensorFlow Model Analysis that enables regular evaluation of fairness metrics in product pipelines.

  • ML Metadata is a library for recording and retrieving the lineage and metadata of ML artifacts such as models, datasets, and metrics. Within TFX, ML Metadata will help us understand the artifacts created in a pipeline; an artifact is a unit of data that is passed between TFX components.

  • TensorFlow Data Validation is a library to analyze your data and check for errors that can affect model training or serving.
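To make "metrics over different slices" concrete, here is a minimal pure-Python sketch of slice-wise accuracy. The real libraries compute such metrics at scale with Apache Beam; this toy function only illustrates the concept:

```python
from collections import defaultdict

def sliced_accuracy(records):
  """Computes accuracy separately per slice.

  records: list of (slice_key, label, prediction) tuples.
  Returns a dict mapping each slice key to its accuracy.
  """
  correct = defaultdict(int)
  total = defaultdict(int)
  for key, label, pred in records:
    total[key] += 1
    correct[key] += int(label == pred)
  return {k: correct[k] / total[k] for k in total}

# Toy records sliced by a hypothetical 'sex' feature.
records = [('Male', 1, 1), ('Male', 0, 1), ('Female', 1, 1), ('Female', 0, 0)]
print(sliced_accuracy(records))  # {'Male': 0.5, 'Female': 1.0}
```

Fairness Indicators surfaces this kind of per-slice view for many metrics (false positive rate, false negative rate, etc.) across multiple decision thresholds.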

Case Study Overview

For the purposes of this case study, we will define a “fairness concern” as a bias within a model that negatively impacts a slice within our data. Specifically, we are trying to limit any recidivism prediction that could be biased with respect to race.

The walkthrough of the case study will proceed as follows:

  1. Download the data, preprocess, and explore the initial dataset.
  2. Build a TFX pipeline with the COMPAS dataset using a Keras binary classifier.
  3. Run our results through TensorFlow Model Analysis, TensorFlow Data Validation, and load Fairness Indicators to explore any potential fairness concerns within our model.
  4. Use ML Metadata to track all the artifacts for a model that we trained with TFX.
  5. Weight the initial COMPAS dataset for our second model to account for the uneven distribution of recidivism labels across races.
  6. Review the performance changes within the new dataset.
  7. Check the underlying changes within our TFX pipeline with ML Metadata to understand what changes were made between the two models.
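Step 5 refers to reweighting the data. One common scheme (not necessarily the exact weighting used later in this case study) is inverse-frequency weighting over (group, label) cells, sketched here in plain Python:

```python
from collections import Counter

def inverse_frequency_weights(examples):
  """Assigns each (group, label) pair a weight inversely proportional to its
  frequency, so that under-represented combinations count more in training.

  examples: list of hashable (group, label) pairs.
  Returns a list of per-example weights; the weights sum to len(examples).
  """
  counts = Counter(examples)
  n = len(examples)
  k = len(counts)
  # With these weights, each (group, label) cell contributes equally overall.
  return [n / (k * counts[e]) for e in examples]

# Toy data: the rare ('B', 1) combination receives the largest weight.
examples = [('A', 1), ('A', 1), ('A', 1), ('B', 1)]
weights = inverse_frequency_weights(examples)
```

In a TFX pipeline, weights like these would be attached as a feature column (as this case study does with `sample_weight`) and passed to the model's loss during training.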

Helpful Resources

This case study is an extension of the case studies below. We recommend working through those case studies first.

Setup

To start, we will install the necessary packages, download the data, and import the required modules for the case study.

To install the required packages for this case study in your notebook, run the pip command below.


  1. Wadsworth, C., Vera, F., Piech, C. (2017). Achieving Fairness Through Adversarial Learning: an Application to Recidivism Prediction. https://arxiv.org/abs/1807.00199

  2. Chouldechova, A., G’Sell, M. (2017). Fairer and more accurate, but for whom? https://arxiv.org/abs/1707.00046

  3. Berk, R., et al. (2017). Fairness in Criminal Justice Risk Assessments: The State of the Art. https://arxiv.org/abs/1703.09207

!python -m pip install -q -U \
  tensorflow==2.1.1 \
  tfx==0.22.0 \
  tensorflow-model-analysis==0.22.1 \
  tensorflow_data_validation==0.22.0 \
  tensorflow-metadata==0.22.0 \
  tensorflow-transform==0.22.0 \
  ml-metadata==0.22.0 \
  tfx-bsl==0.22.0 \
  absl-py==0.8.1

 # If prompted, please restart the Colab environment after the pip installs
 # as you might run into import errors.
ERROR: tensorflow-serving-api 2.2.0 has requirement tensorflow~=2.2.0, but you'll have tensorflow 2.1.1 which is incompatible.
ERROR: tensorflow-data-validation 0.22.0 has requirement pandas<1,>=0.24, but you'll have pandas 1.0.4 which is incompatible.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import tempfile
import six.moves.urllib as urllib

from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

import pandas as pd
from google.protobuf import text_format
from sklearn.utils import shuffle
import tensorflow as tf
import tensorflow_data_validation as tfdv

import tensorflow_model_analysis as tfma
from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view

import tfx
from tfx.components.evaluator.component import Evaluator
from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen
from tfx.components.schema_gen.component import SchemaGen
from tfx.components.statistics_gen.component import StatisticsGen
from tfx.components.trainer.component import Trainer
from tfx.components.transform.component import Transform
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
from tfx.proto import evaluator_pb2
from tfx.proto import trainer_pb2
from tfx.utils.dsl_utils import external_input

Download and preprocess the dataset

# Download the COMPAS dataset and set up the required file paths.
_DATA_ROOT = tempfile.mkdtemp(prefix='tfx-data')
_DATA_PATH = 'https://storage.googleapis.com/compas_dataset/cox-violent-parsed.csv'
_DATA_FILEPATH = os.path.join(_DATA_ROOT, 'compas-scores-two-years.csv')

data = urllib.request.urlopen(_DATA_PATH)
_COMPAS_DF = pd.read_csv(data)

# To simplify the case study, we will only use the columns needed for our
# model.
_COLUMN_NAMES = [
  'age',
  'c_charge_desc',
  'c_charge_degree',
  'c_days_from_compas',
  'is_recid',
  'juv_fel_count',
  'juv_misd_count',
  'juv_other_count',
  'priors_count',
  'r_days_from_arrest',
  'race',
  'sex',
  'vr_charge_desc',                
]
_COMPAS_DF = _COMPAS_DF[_COLUMN_NAMES]

# We will use 'is_recid' as our ground truth label, a boolean value indicating
# whether a defendant committed another crime. Some rows contain -1, indicating
# that the data is missing; we will drop these rows before training.
_COMPAS_DF = _COMPAS_DF[_COMPAS_DF['is_recid'] != -1]

# Given the distribution between races in this dataset, we will only focus on
# recidivism for African-Americans and Caucasians.
_COMPAS_DF = _COMPAS_DF[
  _COMPAS_DF['race'].isin(['African-American', 'Caucasian'])]

# Add a sample weight feature that will be used during the second part of this
# case study to help address fairness concerns.
_COMPAS_DF['sample_weight'] = 0.8

# Write the DataFrame back to a CSV file for our TFX model.
_COMPAS_DF.to_csv(_DATA_FILEPATH, index=False, na_rep='')

Building a TFX Pipeline


There are several TFX Pipeline Components that can be used for a production model, but for the purposes of this case study we will focus on only the components below:

  • ExampleGen to read our dataset.
  • StatisticsGen to calculate the statistics of our dataset.
  • SchemaGen to create a data schema.
  • Transform for feature engineering.
  • Trainer to run our machine learning model.

Create the InteractiveContext

To run TFX within a notebook, we first need to create an InteractiveContext to run the components interactively.

InteractiveContext will use a temporary directory with an ephemeral ML Metadata database instance. To use your own pipeline root or database, the optional properties pipeline_root and metadata_connection_config may be passed to InteractiveContext.

context = InteractiveContext()
WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp as root for pipeline outputs.
WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/metadata.sqlite.

TFX ExampleGen Component

# The ExampleGen TFX Pipeline component ingests data into TFX pipelines.
# It consumes external files/services to generate Examples, which will be read
# by other TFX components. It also provides consistent and configurable
# partitioning, and shuffles the dataset according to ML best practice.

example_gen = CsvExampleGen(input=external_input(_DATA_ROOT))
context.run(example_gen)
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.

Warning:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.

TFX StatisticsGen Component

# The StatisticsGen TFX pipeline component generates feature statistics over
# both training and serving data, which can be used by other pipeline
# components. StatisticsGen uses Beam to scale to large datasets.

statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen)

TFX SchemaGen Component

# Some TFX components use a description of your input data called a schema. The
# schema is an instance of schema.proto. It can specify data types for feature
# values, whether a feature has to be present in all examples, allowed value
# ranges, and other properties. A SchemaGen pipeline component will
# automatically generate a schema by inferring types, categories, and ranges
# from the training data.

infer_schema = SchemaGen(statistics=statistics_gen.outputs['statistics'])
context.run(infer_schema)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_data_validation/utils/stats_util.py:227: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`

TFX Transform Component

The Transform component performs data transformations and feature engineering. The results include an input TensorFlow graph which is used during both training and serving to preprocess the data before training or inference. This graph becomes part of the SavedModel that is the result of model training. Since the same input graph is used for both training and serving, the preprocessing will always be the same, and only needs to be written once.

The Transform component requires more code than many other components because of the arbitrary complexity of the feature engineering that you may need for the data and/or model that you're working with.

We will define some constants and functions for both the Transform component and the Trainer component. We define them in a Python module, saved to disk using the %%writefile magic command since we are working in a notebook.

The transformations that we will perform in this case study are as follows:

  • For string values, generate a vocabulary that maps each value to an integer via tft.compute_and_apply_vocabulary.
  • For integer values, standardize the column to mean 0 and variance 1 via tft.scale_to_z_score.
  • Replace missing values with an empty string or 0, depending on the feature type.
  • Append ‘_xf’ to column names to denote the features that were processed in the Transform component.
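The first two transformations can be sketched in plain Python. Note that these are toy stand-ins operating on in-memory lists, whereas the real tft ops build a TensorFlow graph whose statistics are computed over the full dataset:

```python
import math

def toy_compute_and_apply_vocabulary(values):
  """Toy stand-in for tft.compute_and_apply_vocabulary: maps each distinct
  string to an integer index (the real op orders the vocabulary by frequency)."""
  vocab = {v: i for i, v in enumerate(dict.fromkeys(values))}
  return [vocab[v] for v in values]

def toy_scale_to_z_score(values):
  """Toy stand-in for tft.scale_to_z_score: shifts and scales a numeric
  column to mean 0 and variance 1."""
  mean = sum(values) / len(values)
  var = sum((v - mean) ** 2 for v in values) / len(values)
  std = math.sqrt(var)
  return [(v - mean) / std for v in values]

print(toy_compute_and_apply_vocabulary(['M', 'F', 'M']))  # [0, 1, 0]
print(toy_scale_to_z_score([1.0, 2.0, 3.0]))  # mean 0, variance 1
```

Because tft computes the vocabulary and the mean/variance once over the training data and bakes them into the transform graph, the identical mapping is applied at serving time, which is what prevents training/serving skew.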

Now let's define a module containing the preprocessing_fn() function that we will pass to the Transform component:

# Setup paths for the Transform Component.
_transform_module_file = 'compas_transform.py'
%%writefile {_transform_module_file}
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import tensorflow_transform as tft

CATEGORICAL_FEATURE_KEYS = [
    'sex',
    'race',
    'c_charge_desc',
    'c_charge_degree',
]

INT_FEATURE_KEYS = [
    'age',
    'c_days_from_compas',
    'juv_fel_count',
    'juv_misd_count',
    'juv_other_count',
    'priors_count',
    'sample_weight',
]

LABEL_KEY = 'is_recid'

# List of the unique values for the items within CATEGORICAL_FEATURE_KEYS.
MAX_CATEGORICAL_FEATURE_VALUES = [
    2,
    6,
    513,
    14,
]


def transformed_name(key):
  return '{}_xf'.format(key)


def preprocessing_fn(inputs):
  """tf.transform's callback function for preprocessing inputs.

  Args:
    inputs: Map from feature keys to raw features.

  Returns:
    Map from string feature key to transformed feature operations.
  """
  outputs = {}
  for key in CATEGORICAL_FEATURE_KEYS:
    outputs[transformed_name(key)] = tft.compute_and_apply_vocabulary(
        _fill_in_missing(inputs[key]),
        vocab_filename=key)

  for key in INT_FEATURE_KEYS:
    outputs[transformed_name(key)] = tft.scale_to_z_score(
        _fill_in_missing(inputs[key]))

  # The target label indicates whether the defendant was charged with another crime.
  outputs[transformed_name(LABEL_KEY)] = _fill_in_missing(inputs[LABEL_KEY])
  return outputs


def _fill_in_missing(tensor_value):
  """Replaces a missing values in a SparseTensor.

  Fills in missing values of `tensor_value` with '' or 0, and converts to a
  dense tensor.

  Args:
    tensor_value: A `SparseTensor` of rank 2. Its dense shape should have size
      at most 1 in the second dimension.

  Returns:
    A rank 1 tensor where missing values of `tensor_value` are filled in.
  """
  default_value = '' if tensor_value.dtype == tf.string else 0
  sparse_tensor = tf.SparseTensor(
      tensor_value.indices,
      tensor_value.values,
      [tensor_value.dense_shape[0], 1])
  dense_tensor = tf.sparse.to_dense(sparse_tensor, default_value)
  return tf.squeeze(dense_tensor, axis=1)

Writing compas_transform.py

# Build and run the Transform Component.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=infer_schema.outputs['schema'],
    module_file=_transform_module_file
)
context.run(transform)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tfx/components/transform/executor.py:511: Schema (from tensorflow_transform.tf_metadata.dataset_schema) is deprecated and will be removed in a future version.
Instructions for updating:
Schema is a deprecated, use schema_utils.schema_from_feature_spec to create a `Schema`
WARNING:tensorflow:Tensorflow version (2.1.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 

Warning:apache_beam.utils.interactive_utils:Failed to alter the label of a transform with the ipython prompt metadata. Cannot figure out the pipeline that the given pvalueish ({DatasetKey(key='tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp-CsvExampleGen-examples-1-train-STAR'): <PCollection[Decode[AnalysisIndex0]/ApplyDecodeFn.None] at 0x7fd50efc1b38>}, None, {'_schema': feature {
  name: "c_charge_degree"
  value_count {
    min: 1
    max: 1
  }
  type: BYTES
  domain: "c_charge_degree"
  presence {
    min_count: 1
  }
}
feature {
  name: "c_charge_desc"
  value_count {
    min: 1
    max: 1
  }
  type: BYTES
  presence {
    min_count: 1
  }
}
feature {
  name: "race"
  value_count {
    min: 1
    max: 1
  }
  type: BYTES
  domain: "race"
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "sex"
  value_count {
    min: 1
    max: 1
  }
  type: BYTES
  domain: "sex"
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "vr_charge_desc"
  value_count {
    min: 1
    max: 1
  }
  type: BYTES
  domain: "vr_charge_desc"
  presence {
    min_count: 1
  }
}
feature {
  name: "age"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "c_days_from_compas"
  value_count {
    min: 1
    max: 1
  }
  type: FLOAT
  presence {
    min_count: 1
  }
}
feature {
  name: "is_recid"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  bool_domain {
  }
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "juv_fel_count"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "juv_misd_count"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "juv_other_count"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "priors_count"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "r_days_from_arrest"
  value_count {
    min: 1
    max: 1
  }
  type: FLOAT
  presence {
    min_count: 1
  }
}
feature {
  name: "sample_weight"
  value_count {
    min: 1
    max: 1
  }
  type: FLOAT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
string_domain {
  name: "c_charge_degree"
  value: "(CO3)"
  value: "(CT)"
  value: "(F1)"
  value: "(F2)"
  value: "(F3)"
  value: "(F5)"
  value: "(F6)"
  value: "(F7)"
  value: "(M1)"
  value: "(M2)"
  value: "(MO3)"
  value: "(NI0)"
  value: "(X)"
}
string_domain {
  name: "race"
  value: "African-American"
  value: "Caucasian"
}
string_domain {
  name: "sex"
  value: "Female"
  value: "Male"
}
string_domain {
  name: "vr_charge_desc"
  value: "Agg Assault Law Enforc Officer"
  value: "Agg Assault W/int Com Fel Dome"
  value: "Agg Batt W/Arm S/B/I 25 Min/Ma"
  value: "Agg Battery Bod Hrm-Deadly Weap"
  value: "Agg Battery Grt/Bod/Harm"
  value: "Agg Battery Law Enforc Officer"
  value: "Agg Flee/Eluding (Injury/Prop Damage)"
  value: "Agg Fleeing and Eluding"
  value: "Agg Fleeing/Eluding High Speed"
  value: "Aggrav Battery w/Deadly Weapon"
  value: "Aggrav Child Abuse-Agg Battery"
  value: "Aggravated Assault"
  value: "Aggravated Assault W/Dead Weap"
  value: "Aggravated Assault W/dead Weap"
  value: "Aggravated Assault w/Firearm"
  value: "Aggravated Battery"
  value: "Aggravated Battery / Pregnant"
  value: "Armed Carjacking"
  value: "Armed False Imprisonment"
  value: "Armed Sex Batt/vict 12 Yrs +"
  value: "Arson in the Second Degree"
  value: "Assault"
  value: "Assault On Law Enforc Officer"
  value: "Attempt Felony Murder"
  value: "Attempt Murder in the First Degree"
  value: "Attempted Robbery  No Weapon"
  value: "Attempted Robbery Firearm"
  value: "Battery"
  value: "Battery Emergency Care Provide"
  value: "Battery Spouse Or Girlfriend"
  value: "Battery Upon Detainee"
  value: "Battery on Law Enforc Officer"
  value: "Battery on a Person Over 65"
  value: "Burglary Conveyance Assault/Bat"
  value: "Burglary Dwelling Armed"
  value: "Burglary Dwelling Assault/Batt"
  value: "Burglary With Assault/battery"
  value: "Child Abuse"
  value: "Cruelty To Animals"
  value: "D.U.I. Serious Bodily Injury"
  value: "DOC/Engage In Fighting"
  value: "Felony Batt(Great Bodily Harm)"
  value: "Felony Battery"
  value: "Felony Battery (Dom Strang)"
  value: "Felony Battery w/Prior Convict"
  value: "Home Invasion Robbery"
  value: "Kidnapping (Facilitate Felony)"
  value: "Manslaughter with Weapon"
  value: "Murder in the First Degree"
  value: "Neglect Child / Bodily Harm"
  value: "Robbery"
  value: "Robbery / No Weapon"
  value: "Robbery Sudd Snatch No Weapon"
  value: "Robbery Sudd Snatch w/Weapon"
  value: "Robbery W/Deadly Weapon"
  value: "Robbery W/Firearm"
  value: "Robbery-Strong Arm W/mask"
  value: "Sex Batt Faml/Cust Vict 12-17Y"
  value: "Sexual Battery / Vict 12 Yrs +"
  value: "Shoot/Throw Into Vehicle"
  value: "Stalking (Aggravated)"
  value: "Strong Armed  Robbery"
  value: "Threat Public Servant"
  value: "Threaten Throw Destruct Device"
  value: "Throw Deadly Missile Into Veh"
  value: "Throw In Occupied Dwell"
  value: "Vehicular Homicide"
}
}) belongs to. Thus noop.

Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
WARNING:tensorflow:Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Transform/transform_graph/4/.temp_path/tftransform_tmp/8a79f16666014e5a8ee2c2d44b91e381/saved_model.pb
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
WARNING:tensorflow:Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Transform/transform_graph/4/.temp_path/tftransform_tmp/ea2513b118fe4759a0744c477a2fa8ca/saved_model.pb


Warning:tensorflow:Tensorflow version (2.1.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 

Warning:apache_beam.utils.interactive_utils:Failed to alter the label of a transform with the ipython prompt metadata. Cannot figure out the pipeline that the given pvalueish ((<PCollection[Decode[TransformIndex0]/ApplyDecodeFn.None] at 0x7fd50f0d90f0>, {'_schema': feature {
  name: "age"
  type: INT
}
feature {
  name: "c_charge_degree"
  type: BYTES
}
feature {
  name: "c_charge_desc"
  type: BYTES
}
feature {
  name: "c_days_from_compas"
  type: FLOAT
}
feature {
  name: "is_recid"
  type: INT
}
feature {
  name: "juv_fel_count"
  type: INT
}
feature {
  name: "juv_misd_count"
  type: INT
}
feature {
  name: "juv_other_count"
  type: INT
}
feature {
  name: "priors_count"
  type: INT
}
feature {
  name: "race"
  type: BYTES
}
feature {
  name: "sample_weight"
  type: FLOAT
}
feature {
  name: "sex"
  type: BYTES
}
}), (<PCollection[Analyze/CreateSavedModel/BindTensors/ReplaceWithConstants.None] at 0x7fd50fa517f0>, BeamDatasetMetadata(dataset_metadata={'_schema': feature {
  name: "age_xf"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "c_charge_degree_xf"
  type: INT
  int_domain {
    is_categorical: true
  }
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "c_charge_desc_xf"
  type: INT
  int_domain {
    is_categorical: true
  }
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "c_days_from_compas_xf"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "is_recid_xf"
  type: INT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "juv_fel_count_xf"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "juv_misd_count_xf"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "juv_other_count_xf"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "priors_count_xf"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "race_xf"
  type: INT
  int_domain {
    is_categorical: true
  }
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "sample_weight_xf"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "sex_xf"
  type: INT
  int_domain {
    is_categorical: true
  }
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
}, deferred_metadata=<PCollection[Analyze/ComputeDeferredMetadata.None] at 0x7fd50f0d27b8>))) belongs to. Thus noop.
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>

Warning:tensorflow:Tensorflow version (2.1.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 


INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Transform/transform_graph/4/.temp_path/tftransform_tmp/556d9114b9424a4c9c938c9351dfb698/assets
INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Transform/transform_graph/4/.temp_path/tftransform_tmp/556d9114b9424a4c9c938c9351dfb698/saved_model.pb
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\003sex"

Warning:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\004race"

Warning:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\rc_charge_desc"

Warning:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\017c_charge_degree"

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
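
Notice that every transformed feature in the schema printed above carries an _xf suffix. This is the pipeline's naming convention for distinguishing transformed features from their raw counterparts; the trainer module below defines the same transformed_name helper. A minimal sketch of the convention:

```python
# Sketch: the naming convention behind the "_xf" feature names in the
# schema above. The trainer module defines equivalent helpers.

def transformed_name(key):
  """Append the suffix that marks a transformed feature."""
  return '{}_xf'.format(key)


def transformed_names(keys):
  return [transformed_name(key) for key in keys]


print(transformed_names(['age', 'race', 'sex', 'priors_count']))
# ['age_xf', 'race_xf', 'sex_xf', 'priors_count_xf']
```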

TFX Trainer Component

The Trainer component trains a specified TensorFlow model.

To run the Trainer component, we need to create a Python module containing a function called trainer_fn that TFX will call; it must return an estimator for our model. If you prefer to create a Keras model, you can do so and then convert it to an estimator using keras.model_to_estimator().

For our case study, we will build a Keras model and convert it to an estimator with tf.keras.estimator.model_to_estimator().
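
As a minimal illustration of that conversion (a standalone sketch, not part of the trainer module; the toy model and layer sizes are arbitrary and assume only that TensorFlow is importable):

```python
import tensorflow as tf

# Build and compile a tiny Keras model. model_to_estimator requires the
# model to be compiled before conversion.
inputs = tf.keras.Input(shape=(4,), name='features')
outputs = tf.keras.layers.Dense(1, name='predictions')(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(
    loss=tf.keras.losses.MeanAbsoluteError(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5))

# Wrap the Keras model as a tf.estimator.Estimator, which exposes the
# train/evaluate/export interface the rest of the pipeline expects.
estimator = tf.keras.estimator.model_to_estimator(keras_model=model)
```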

# Setup paths for the Trainer Component.
_trainer_module_file = 'compas_trainer.py'
%%writefile {_trainer_module_file}
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import schema_utils

from compas_transform import *

_BATCH_SIZE = 1000
_LEARNING_RATE = 0.00001
_MAX_CHECKPOINTS = 1
_SAVE_CHECKPOINT_STEPS = 999


def transformed_names(keys):
  return [transformed_name(key) for key in keys]


def transformed_name(key):
  return '{}_xf'.format(key)


def _gzip_reader_fn(filenames):
  """Returns a record reader that can read gzip'ed files.

  Args:
    filenames: A tf.string tensor or tf.data.Dataset containing one or more
      filenames.

  Returns:
    A TFRecordDataset that reads gzip-compressed TFRecord files.
  """
  return tf.data.TFRecordDataset(filenames, compression_type='GZIP')


# Tf.Transform considers these features as "raw".
def _get_raw_feature_spec(schema):
  """Generates a feature spec from a Schema proto.

  Args:
    schema: A Schema proto.

  Returns:
    A feature spec defined as a dict whose keys are feature names and values are
      instances of FixedLenFeature, VarLenFeature or SparseFeature.
  """
  return schema_utils.schema_as_feature_spec(schema).feature_spec


def _example_serving_receiver_fn(tf_transform_output, schema):
  """Builds the serving in inputs.

  Args:
    tf_transform_output: A TFTransformOutput.
    schema: the schema of the input data.

  Returns:
    TensorFlow graph which parses examples, applying tf-transform to them.
  """
  raw_feature_spec = _get_raw_feature_spec(schema)
  raw_feature_spec.pop(LABEL_KEY)

  raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
      raw_feature_spec)
  serving_input_receiver = raw_input_fn()

  transformed_features = tf_transform_output.transform_raw_features(
      serving_input_receiver.features)
  transformed_features.pop(transformed_name(LABEL_KEY))
  return tf.estimator.export.ServingInputReceiver(
      transformed_features, serving_input_receiver.receiver_tensors)


def _eval_input_receiver_fn(tf_transform_output, schema):
  """Builds everything needed for the tf-model-analysis to run the model.

  Args:
    tf_transform_output: A TFTransformOutput.
    schema: the schema of the input data.

  Returns:
    EvalInputReceiver function, which contains:

      - TensorFlow graph which parses raw untransformed features, applies the
          tf-transform preprocessing operators.
      - Set of raw, untransformed features.
      - Label against which predictions will be compared.
  """
  # Notice that the inputs are raw features, not transformed features here.
  raw_feature_spec = _get_raw_feature_spec(schema)

  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_tensor')

  # Add a parse_example operator to the tensorflow graph, which will parse
  # raw, untransformed, tf examples.
  features = tf.io.parse_example(
      serialized=serialized_tf_example, features=raw_feature_spec)

  transformed_features = tf_transform_output.transform_raw_features(features)
  labels = transformed_features.pop(transformed_name(LABEL_KEY))

  receiver_tensors = {'examples': serialized_tf_example}

  return tfma.export.EvalInputReceiver(
      features=transformed_features,
      receiver_tensors=receiver_tensors,
      labels=labels)


def _input_fn(filenames, tf_transform_output, batch_size=200):
  """Generates features and labels for training or evaluation.

  Args:
    filenames: List of TFRecord files to read data from.
    tf_transform_output: A TFTransformOutput.
    batch_size: First dimension size of the Tensors returned by input_fn.

  Returns:
    A (features, indices) tuple where features is a dictionary of
      Tensors, and indices is a single Tensor of label indices.
  """
  transformed_feature_spec = (
      tf_transform_output.transformed_feature_spec().copy())

  dataset = tf.compat.v1.data.experimental.make_batched_features_dataset(
      filenames,
      batch_size,
      transformed_feature_spec,
      shuffle=False,
      reader=_gzip_reader_fn)

  transformed_features = dataset.make_one_shot_iterator().get_next()

  # We pop the label because we do not want to use it as a feature while we're
  # training.
  return transformed_features, transformed_features.pop(
      transformed_name(LABEL_KEY))


def _keras_model_builder():
  """Build a keras model for COMPAS dataset classification.
  
  Returns:
    A compiled Keras model.
  """
  feature_columns = []
  feature_layer_inputs = {}

  for key in transformed_names(INT_FEATURE_KEYS):
    feature_columns.append(tf.feature_column.numeric_column(key))
    feature_layer_inputs[key] = tf.keras.Input(shape=(1,), name=key)

  for key, num_buckets in zip(transformed_names(CATEGORICAL_FEATURE_KEYS),
                              MAX_CATEGORICAL_FEATURE_VALUES):
    feature_columns.append(
        tf.feature_column.indicator_column(
            tf.feature_column.categorical_column_with_identity(
                key, num_buckets=num_buckets)))
    feature_layer_inputs[key] = tf.keras.Input(
        shape=(1,), name=key, dtype=tf.dtypes.int32)

  feature_columns_input = tf.keras.layers.DenseFeatures(feature_columns)
  feature_layer_outputs = feature_columns_input(feature_layer_inputs)

  dense_layers = tf.keras.layers.Dense(
      20, activation='relu', name='dense_1')(feature_layer_outputs)
  dense_layers = tf.keras.layers.Dense(
      10, activation='relu', name='dense_2')(dense_layers)
  output = tf.keras.layers.Dense(
      1, name='predictions')(dense_layers)

  model = tf.keras.Model(
      inputs=[v for v in feature_layer_inputs.values()], outputs=output)

  model.compile(
      loss=tf.keras.losses.MeanAbsoluteError(),
      optimizer=tf.optimizers.Adam(learning_rate=_LEARNING_RATE))

  return model


# TFX will call this function.
def trainer_fn(hparams, schema):
  """Build the estimator using the high level API.

  Args:
    hparams: Hyperparameters used to train the model as name/value pairs.
    schema: Holds the schema of the training examples.

  Returns:
    A dict of the following:

      - estimator: The estimator that will be used for training and eval.
      - train_spec: Spec for training.
      - eval_spec: Spec for eval.
      - eval_input_receiver_fn: Input function for eval.
  """
  tf_transform_output = tft.TFTransformOutput(hparams.transform_output)

  train_input_fn = lambda: _input_fn(
      hparams.train_files,
      tf_transform_output,
      batch_size=_BATCH_SIZE)

  eval_input_fn = lambda: _input_fn(
      hparams.eval_files,
      tf_transform_output,
      batch_size=_BATCH_SIZE)

  train_spec = tf.estimator.TrainSpec(
      train_input_fn,
      max_steps=hparams.train_steps)

  serving_receiver_fn = lambda: _example_serving_receiver_fn(
      tf_transform_output, schema)

  exporter = tf.estimator.FinalExporter('compas', serving_receiver_fn)
  eval_spec = tf.estimator.EvalSpec(
      eval_input_fn,
      steps=hparams.eval_steps,
      exporters=[exporter],
      name='compas-eval')

  run_config = tf.estimator.RunConfig(
      save_checkpoints_steps=_SAVE_CHECKPOINT_STEPS,
      keep_checkpoint_max=_MAX_CHECKPOINTS)

  run_config = run_config.replace(model_dir=hparams.serving_model_dir)

  estimator = tf.keras.estimator.model_to_estimator(
      keras_model=_keras_model_builder(), config=run_config)

  # Create an input receiver for TFMA processing.
  receiver_fn = lambda: _eval_input_receiver_fn(tf_transform_output, schema)

  return {
      'estimator': estimator,
      'train_spec': train_spec,
      'eval_spec': eval_spec,
      'eval_input_receiver_fn': receiver_fn
  }
Writing compas_trainer.py

# Uses user-provided Python function that implements a model using TensorFlow's
# Estimators API.
trainer = Trainer(
    module_file=_trainer_module_file,
    transformed_examples=transform.outputs['transformed_examples'],
    schema=infer_schema.outputs['schema'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000)
)
context.run(trainer)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/feature_column/feature_column_v2.py:4267: IndicatorColumn._variable_shape (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/feature_column/feature_column_v2.py:4322: IdentityCategoricalColumn._num_buckets (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
INFO:tensorflow:Using the Keras model provided.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 999, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 999 or save_checkpoints_secs None.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From compas_trainer.py:136: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/keras/keras_model.ckpt
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-started 6 variables.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
INFO:tensorflow:loss = 0.5342612, step = 0
INFO:tensorflow:global_step/sec: 97.9243
INFO:tensorflow:loss = 0.53287333, step = 100 (1.022 sec)
INFO:tensorflow:global_step/sec: 102.076
INFO:tensorflow:loss = 0.51312083, step = 200 (0.980 sec)
INFO:tensorflow:global_step/sec: 101.858
INFO:tensorflow:loss = 0.4881493, step = 300 (0.982 sec)
INFO:tensorflow:global_step/sec: 101.893
INFO:tensorflow:loss = 0.5433291, step = 400 (0.981 sec)
INFO:tensorflow:global_step/sec: 99.7642
INFO:tensorflow:loss = 0.52852637, step = 500 (1.002 sec)
INFO:tensorflow:global_step/sec: 101.916
INFO:tensorflow:loss = 0.49936834, step = 600 (0.981 sec)
INFO:tensorflow:global_step/sec: 101.868
INFO:tensorflow:loss = 0.4800488, step = 700 (0.982 sec)
INFO:tensorflow:global_step/sec: 102.969
INFO:tensorflow:loss = 0.4775047, step = 800 (0.971 sec)
INFO:tensorflow:global_step/sec: 101.644
INFO:tensorflow:loss = 0.4938972, step = 900 (0.984 sec)
INFO:tensorflow:Saving checkpoints for 999 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-06-16T09:07:29Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt-999
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 48.13890s
INFO:tensorflow:Finished evaluation at 2020-06-16-09:08:17
INFO:tensorflow:Saving dict for global step 999: global_step = 999, loss = 0.49387324
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 999: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt-999
INFO:tensorflow:global_step/sec: 2.01538
INFO:tensorflow:loss = 0.48940262, step = 1000 (49.618 sec)
INFO:tensorflow:global_step/sec: 101.073
INFO:tensorflow:loss = 0.5083299, step = 1100 (0.990 sec)
INFO:tensorflow:global_step/sec: 100.781
INFO:tensorflow:loss = 0.51749176, step = 1200 (0.992 sec)
INFO:tensorflow:global_step/sec: 97.8131
INFO:tensorflow:loss = 0.52554464, step = 1300 (1.022 sec)
INFO:tensorflow:global_step/sec: 100.316
INFO:tensorflow:loss = 0.4616558, step = 1400 (0.997 sec)
INFO:tensorflow:global_step/sec: 102.187
INFO:tensorflow:loss = 0.5255707, step = 1500 (0.978 sec)
INFO:tensorflow:global_step/sec: 104.339
INFO:tensorflow:loss = 0.5168723, step = 1600 (0.959 sec)
INFO:tensorflow:global_step/sec: 104.489
INFO:tensorflow:loss = 0.49188238, step = 1700 (0.957 sec)
INFO:tensorflow:global_step/sec: 102.893
INFO:tensorflow:loss = 0.4943112, step = 1800 (0.972 sec)
INFO:tensorflow:global_step/sec: 102.76
INFO:tensorflow:loss = 0.47224206, step = 1900 (0.973 sec)
INFO:tensorflow:Saving checkpoints for 1998 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 101.683
INFO:tensorflow:loss = 0.46463668, step = 2000 (0.983 sec)
INFO:tensorflow:global_step/sec: 104.88
INFO:tensorflow:loss = 0.4862053, step = 2100 (0.954 sec)
INFO:tensorflow:global_step/sec: 102.792
INFO:tensorflow:loss = 0.48628467, step = 2200 (0.973 sec)
INFO:tensorflow:global_step/sec: 103.756
INFO:tensorflow:loss = 0.50660515, step = 2300 (0.964 sec)
INFO:tensorflow:global_step/sec: 100.728
INFO:tensorflow:loss = 0.517697, step = 2400 (0.993 sec)
INFO:tensorflow:global_step/sec: 100.99
INFO:tensorflow:loss = 0.4871972, step = 2500 (0.990 sec)
INFO:tensorflow:global_step/sec: 101.21
INFO:tensorflow:loss = 0.47229007, step = 2600 (0.988 sec)
INFO:tensorflow:global_step/sec: 99.2453
INFO:tensorflow:loss = 0.5163679, step = 2700 (1.008 sec)
INFO:tensorflow:global_step/sec: 101.049
INFO:tensorflow:loss = 0.50264055, step = 2800 (0.989 sec)
INFO:tensorflow:global_step/sec: 101.894
INFO:tensorflow:loss = 0.47996846, step = 2900 (0.981 sec)
INFO:tensorflow:Saving checkpoints for 2997 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 100.323
INFO:tensorflow:loss = 0.4581678, step = 3000 (0.996 sec)
INFO:tensorflow:global_step/sec: 104.978
INFO:tensorflow:loss = 0.463873, step = 3100 (0.953 sec)
INFO:tensorflow:global_step/sec: 102.691
INFO:tensorflow:loss = 0.47974345, step = 3200 (0.974 sec)
INFO:tensorflow:global_step/sec: 103.16
INFO:tensorflow:loss = 0.45533055, step = 3300 (0.969 sec)
INFO:tensorflow:global_step/sec: 104.027
INFO:tensorflow:loss = 0.4832232, step = 3400 (0.961 sec)
INFO:tensorflow:global_step/sec: 102.268
INFO:tensorflow:loss = 0.49356607, step = 3500 (0.978 sec)
INFO:tensorflow:global_step/sec: 102.923
INFO:tensorflow:loss = 0.48723742, step = 3600 (0.972 sec)
INFO:tensorflow:global_step/sec: 103.339
INFO:tensorflow:loss = 0.4375402, step = 3700 (0.967 sec)
INFO:tensorflow:global_step/sec: 102.731
INFO:tensorflow:loss = 0.49103037, step = 3800 (0.973 sec)
INFO:tensorflow:global_step/sec: 102.971
INFO:tensorflow:loss = 0.4781995, step = 3900 (0.971 sec)
INFO:tensorflow:Saving checkpoints for 3996 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 101.491
INFO:tensorflow:loss = 0.47222623, step = 4000 (0.985 sec)
INFO:tensorflow:global_step/sec: 103.265
INFO:tensorflow:loss = 0.46628028, step = 4100 (0.968 sec)
INFO:tensorflow:global_step/sec: 103.292
INFO:tensorflow:loss = 0.45644736, step = 4200 (0.968 sec)
INFO:tensorflow:global_step/sec: 102.998
INFO:tensorflow:loss = 0.44622418, step = 4300 (0.971 sec)
INFO:tensorflow:global_step/sec: 101.177
INFO:tensorflow:loss = 0.4538631, step = 4400 (0.988 sec)
INFO:tensorflow:global_step/sec: 102.838
INFO:tensorflow:loss = 0.44961756, step = 4500 (0.972 sec)
INFO:tensorflow:global_step/sec: 101.713
INFO:tensorflow:loss = 0.4616842, step = 4600 (0.983 sec)
INFO:tensorflow:global_step/sec: 101.35
INFO:tensorflow:loss = 0.47677568, step = 4700 (0.987 sec)
INFO:tensorflow:global_step/sec: 102.948
INFO:tensorflow:loss = 0.45817706, step = 4800 (0.971 sec)
INFO:tensorflow:global_step/sec: 103.847
INFO:tensorflow:loss = 0.44255066, step = 4900 (0.963 sec)
INFO:tensorflow:Saving checkpoints for 4995 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py:963: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 102.012
INFO:tensorflow:loss = 0.46741074, step = 5000 (0.980 sec)
INFO:tensorflow:global_step/sec: 104.216
INFO:tensorflow:loss = 0.45682105, step = 5100 (0.960 sec)
INFO:tensorflow:global_step/sec: 103.923
INFO:tensorflow:loss = 0.46161708, step = 5200 (0.963 sec)
INFO:tensorflow:global_step/sec: 103.588
INFO:tensorflow:loss = 0.42633456, step = 5300 (0.965 sec)
INFO:tensorflow:global_step/sec: 100.694
INFO:tensorflow:loss = 0.423661, step = 5400 (0.993 sec)
INFO:tensorflow:global_step/sec: 103.331
INFO:tensorflow:loss = 0.44440535, step = 5500 (0.968 sec)
INFO:tensorflow:global_step/sec: 102.536
INFO:tensorflow:loss = 0.42294195, step = 5600 (0.975 sec)
INFO:tensorflow:global_step/sec: 102.382
INFO:tensorflow:loss = 0.4267524, step = 5700 (0.977 sec)
INFO:tensorflow:global_step/sec: 101.686
INFO:tensorflow:loss = 0.4402061, step = 5800 (0.983 sec)
INFO:tensorflow:global_step/sec: 105.268
INFO:tensorflow:loss = 0.4383994, step = 5900 (0.950 sec)
INFO:tensorflow:Saving checkpoints for 5994 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 101.772
INFO:tensorflow:loss = 0.40447557, step = 6000 (0.982 sec)
INFO:tensorflow:global_step/sec: 102.961
INFO:tensorflow:loss = 0.44564992, step = 6100 (0.971 sec)
INFO:tensorflow:global_step/sec: 101.377
INFO:tensorflow:loss = 0.41465515, step = 6200 (0.987 sec)
INFO:tensorflow:global_step/sec: 101.733
INFO:tensorflow:loss = 0.45197776, step = 6300 (0.983 sec)
INFO:tensorflow:global_step/sec: 102.786
INFO:tensorflow:loss = 0.4346048, step = 6400 (0.973 sec)
INFO:tensorflow:global_step/sec: 101.437
INFO:tensorflow:loss = 0.41754076, step = 6500 (0.986 sec)
INFO:tensorflow:global_step/sec: 100.908
INFO:tensorflow:loss = 0.40131918, step = 6600 (0.991 sec)
INFO:tensorflow:global_step/sec: 102.451
INFO:tensorflow:loss = 0.41343847, step = 6700 (0.976 sec)
INFO:tensorflow:global_step/sec: 104.38
INFO:tensorflow:loss = 0.4107401, step = 6800 (0.959 sec)
INFO:tensorflow:global_step/sec: 103.13
INFO:tensorflow:loss = 0.4031917, step = 6900 (0.969 sec)
INFO:tensorflow:Saving checkpoints for 6993 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 103.406
INFO:tensorflow:loss = 0.41888848, step = 7000 (0.967 sec)
INFO:tensorflow:global_step/sec: 103.615
INFO:tensorflow:loss = 0.41445166, step = 7100 (0.965 sec)
INFO:tensorflow:global_step/sec: 104.51
INFO:tensorflow:loss = 0.40979365, step = 7200 (0.957 sec)
INFO:tensorflow:global_step/sec: 102.647
INFO:tensorflow:loss = 0.41860723, step = 7300 (0.974 sec)
INFO:tensorflow:global_step/sec: 102.629
INFO:tensorflow:loss = 0.40912846, step = 7400 (0.974 sec)
INFO:tensorflow:global_step/sec: 103.09
INFO:tensorflow:loss = 0.44190103, step = 7500 (0.970 sec)
INFO:tensorflow:global_step/sec: 103.964
INFO:tensorflow:loss = 0.40039375, step = 7600 (0.962 sec)
INFO:tensorflow:global_step/sec: 101.726
INFO:tensorflow:loss = 0.3894425, step = 7700 (0.983 sec)
INFO:tensorflow:global_step/sec: 101.743
INFO:tensorflow:loss = 0.4058127, step = 7800 (0.983 sec)
INFO:tensorflow:global_step/sec: 102.119
INFO:tensorflow:loss = 0.39976147, step = 7900 (0.979 sec)
INFO:tensorflow:Saving checkpoints for 7992 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 104.217
INFO:tensorflow:loss = 0.3762336, step = 8000 (0.959 sec)
...
INFO:tensorflow:loss = 0.37592337, step = 9900 (0.959 sec)
INFO:tensorflow:Saving checkpoints for 9990 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:Saving checkpoints for 10000 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-06-16T09:09:45Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 47.74377s
INFO:tensorflow:Finished evaluation at 2020-06-16-09:10:33
INFO:tensorflow:Saving dict for global step 10000: global_step = 10000, loss = 0.3880887
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt-10000
INFO:tensorflow:Performing the final export in the end of training.
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\003sex"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\004race"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\rc_charge_desc"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\017c_charge_degree"

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/export/compas/temp-b'1592298633'/assets
INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/export/compas/temp-b'1592298633'/saved_model.pb
INFO:tensorflow:Loss for final step: 0.38644403.
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\003sex"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\004race"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\rc_charge_desc"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\017c_charge_degree"

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']
WARNING:tensorflow:Export includes no default signature!
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/serving_model_dir/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/eval_model_dir/temp-b'1592298633'/assets
INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/5/eval_model_dir/temp-b'1592298633'/saved_model.pb

TensorFlow Model Analysis

Now that our model is developed and trained within TFX, we can use several additional components within the TFX ecosystem to understand our model's performance in more detail. Looking at metrics computed over different slices of the data gives us a better picture of overall performance and helps ensure the model is not underperforming for any subgroup.

First we'll examine TensorFlow Model Analysis, which is a library for evaluating TensorFlow models. It allows users to evaluate their models on large amounts of data in a distributed manner, using the same metrics defined in their trainer. These metrics can be computed over different slices of data and visualized in a notebook.

For a list of the metrics that can be added to TensorFlow Model Analysis, see here.

# Uses TensorFlow Model Analysis to compute evaluation statistics over
# features of a model.
model_analyzer = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],

    eval_config = text_format.Parse("""
      model_specs {
        label_key: 'is_recid'
      }
      metrics_specs {
        metrics {class_name: "BinaryAccuracy"}
        metrics {class_name: "AUC"}
        metrics {
          class_name: "FairnessIndicators"
          config: '{"thresholds": [0.25, 0.5, 0.75]}'
        }
      }
      slicing_specs {
        feature_keys: 'race'
      }
    """, tfma.EvalConfig())
)
context.run(model_analyzer)

Fairness Indicators

Load Fairness Indicators to examine the underlying data.

evaluation_uri = model_analyzer.outputs['output'].get()[0].uri
eval_result = tfma.load_eval_result(evaluation_uri)
tfma.addons.fairness.view.widget_view.render_fairness_indicator(eval_result)
FairnessIndicatorViewer(slicingMetrics=[{'sliceValue': 'Caucasian', 'slice': 'race:Caucasian', 'metrics': {'fa…

Fairness Indicators allows us to drill down into the performance of individual slices and is designed to support teams in evaluating and improving models for fairness concerns. It enables easy computation of commonly used fairness metrics for binary and multiclass classifiers and scales to use cases of any size.

We will load Fairness Indicators into this notebook and analyze the results. After you have had a moment to explore Fairness Indicators, examine the False Positive Rate and False Negative Rate tabs in the tool. In this case study, we're concerned with reducing false predictions of recidivism, which correspond to the False Positive Rate.

Type I and Type II errors

Within the Fairness Indicators tool you'll see two dropdown options:

  1. A "Baseline" option that is set by column_for_slicing.
  2. A "Thresholds" option that is set by fairness_indicator_thresholds.

“Baseline” is the slice that all other slices are compared to. Most commonly it is the overall slice, but it can also be one of the specific slices.
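To make the baseline comparison concrete, here is a small, hypothetical sketch (the slice names and metric values below are invented for illustration, not read from TFMA) of how per-slice metrics can be diffed against a baseline slice:

```python
def diff_from_baseline(slice_metrics, baseline='Overall'):
    """Return each slice's metric difference from the baseline slice."""
    base = slice_metrics[baseline]
    return {name: value - base
            for name, value in slice_metrics.items() if name != baseline}

# Hypothetical false positive rates per slice.
fprs = {'Overall': 0.20, 'race:African-American': 0.30, 'race:Caucasian': 0.08}
print(diff_from_baseline(fprs))
```

A positive difference means the slice fares worse than the baseline on an error-rate metric such as the false positive rate.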

"Threshold" is the cutoff a binary classification model uses to turn a predicted score into a positive or negative prediction. When setting a threshold there are two things you should keep in mind.

  1. Precision: What is the downside of a Type I error (a false positive)? In this case study, a lower threshold means we predict that more defendants will commit another crime when in fact they will not.
  2. Recall: What is the downside of a Type II error (a false negative)? In this case study, a higher threshold means we predict that more defendants will not commit another crime when in fact they will.
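As a sketch of how the threshold trades these errors off, here is a plain-NumPy helper (our own illustration, independent of how TFMA computes these rates internally):

```python
import numpy as np

def error_rates(scores, labels, threshold):
    """False positive and false negative rates of scores thresholded at `threshold`."""
    preds = np.asarray(scores) >= threshold
    labels = np.asarray(labels).astype(bool)
    fpr = np.mean(preds[~labels]) if (~labels).any() else 0.0  # FP / (FP + TN)
    fnr = np.mean(~preds[labels]) if labels.any() else 0.0     # FN / (FN + TP)
    return fpr, fnr

scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
# Raising the threshold trades false positives for false negatives.
print(error_rates(scores, labels, 0.5))
print(error_rates(scores, labels, 0.85))
```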

We will set an arbitrary threshold of 0.75 and focus only on the fairness metrics for African-American and Caucasian defendants; the sample sizes for the other races are too small to draw statistically significant conclusions.

The exact rates below may differ slightly depending on how the data was shuffled at the beginning of this case study, but take a look at the difference between African-American and Caucasian defendants. At a lower threshold, our model is more likely to predict that a Caucasian defendant will commit a second crime than an African-American defendant. However, this prediction inverts as we increase the threshold.

  • False Positive Rate @ 0.75
    • African-American: ~30%
      • AUC: 0.71
      • Binary Accuracy: 0.67
    • Caucasian: ~8%
      • AUC: 0.71
      • Binary Accuracy: 0.67

More information on Type I/II errors and threshold setting can be found here.

ML Metadata

To understand where disparity could be coming from and to take a snapshot of our current model, we can use ML Metadata for recording and retrieving metadata associated with our model. ML Metadata is an integral part of TFX, but is designed so that it can be used independently.

For this case study, we will list all of the artifacts that we developed earlier. By cycling through the artifacts, executions, and contexts we get a high-level view of our TFX pipeline and can dig into where any potential issues are coming from. This will provide a baseline overview of how our model was developed and which TFX components helped to develop it.

We will start by laying out the high-level artifact, execution, and context types in our pipeline.

# Connect to the TFX database.
connection_config = metadata_store_pb2.ConnectionConfig()

connection_config.sqlite.filename_uri = os.path.join(
  context.pipeline_root, 'metadata.sqlite')
store = metadata_store.MetadataStore(connection_config)

def _mlmd_type_to_dataframe(mlmd_type):
  """Helper function to turn MLMD types into a Pandas DataFrame.

  Args:
    mlmd_type: List of metadata store types.

  Returns:
    DataFrame containing type ID, Name, and Properties.
  """
  pd.set_option('display.max_columns', None)
  pd.set_option('display.expand_frame_repr', False)

  column_names = ['ID', 'Name', 'Properties']
  # DataFrame.append was removed in pandas 2.0, so build the rows up front.
  rows = [[a_type.id, a_type.name, a_type.properties] for a_type in mlmd_type]
  return pd.DataFrame(rows, columns=column_names)

# ML Metadata stores strong-typed Artifacts, Executions, and Contexts.
# First, we can use type APIs to understand what is defined in ML Metadata
# by the current version of TFX. We'll be able to view all the previous runs
# that created our initial model.
print('Artifact Types:')
display(_mlmd_type_to_dataframe(store.get_artifact_types()))

print('\nExecution Types:')
display(_mlmd_type_to_dataframe(store.get_execution_types()))

print('\nContext Types:')
display(_mlmd_type_to_dataframe(store.get_context_types()))

Artifact Types:


Execution Types:


Context Types:

Identify where the fairness issue could be coming from

For each of the above artifact, execution, and context types, we can use ML Metadata to dig into the attributes and into how each part of our ML pipeline was developed.

We'll start by diving into StatisticsGen to examine the underlying data that we initially fed into the model. Knowing the artifacts within our pipeline, we can use ML Metadata and TensorFlow Data Validation to look backward and forward within the pipeline to identify where a potential problem is coming from.

After running the cell below, select Lift (Y=1) in the second chart on the Chart to show tab to see the lift between the different data slices. Within race, the lift for African-American is approximately 1.08 whereas for Caucasian it is approximately 0.86.

statistics_gen = StatisticsGen(
    examples=example_gen.outputs['examples'],
    schema=infer_schema.outputs['schema'],
    stats_options=tfdv.StatsOptions(label_feature='is_recid'))
exec_result = context.run(statistics_gen)

for event in store.get_events_by_execution_ids([exec_result.execution_id]):
  if event.path.steps[0].key == 'statistics':
    statistics_w_schema_uri = store.get_artifacts_by_id([event.artifact_id])[0].uri

model_stats = tfdv.load_statistics(
    os.path.join(statistics_w_schema_uri, 'eval/stats_tfrecord/'))
tfdv.visualize_statistics(model_stats)
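For reference, the Lift (Y=1) statistic compares the label rate within a slice to the overall label rate. A minimal sketch of the idea (our own helper, not TFDV's implementation):

```python
import numpy as np

def lift_y1(labels, slice_mask):
    """Lift of P(label = 1 | slice) relative to P(label = 1) overall."""
    labels = np.asarray(labels, dtype=float)
    slice_mask = np.asarray(slice_mask, dtype=bool)
    return labels[slice_mask].mean() / labels.mean()

# A lift above 1.0 means the slice's positive-label rate exceeds the overall
# rate in the training data, a correlation the model can learn as a shortcut.
```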

Tracking a Model Change

Now that we have an idea of how we could improve the fairness of our model, we will first document our initial run within ML Metadata, both for our own records and for anyone else who might review our changes in the future.

ML Metadata can keep a log of our past models along with any notes we would like to add between runs. We'll add a simple note to our first run denoting that it was trained on the full COMPAS dataset.

_MODEL_NOTE_TO_ADD = 'First model that contains fairness concerns in the model.'

first_trained_model = store.get_artifacts_by_type('Model')[-1]

# Add the note above to the ML Metadata store.
first_trained_model.custom_properties['note'].string_value = _MODEL_NOTE_TO_ADD
store.put_artifacts([first_trained_model])

def _mlmd_model_to_dataframe(model, model_number):
  """Helper function to turn an MLMD model into a Pandas DataFrame.

  Args:
    model: List of metadata store model artifacts.
    model_number: Index of the model run within ML Metadata.

  Returns:
    DataFrame containing the ML Metadata model.
  """
  pd.set_option('display.max_columns', None)
  pd.set_option('display.expand_frame_repr', False)

  df = pd.DataFrame()
  custom_properties = ['name', 'note', 'state', 'producer_component',
                       'pipeline_name']
  df['id'] = [model[model_number].id]
  df['uri'] = [model[model_number].uri]
  for prop in custom_properties:
    # Read the property's string value directly instead of parsing the
    # proto's text representation.
    value = model[model_number].custom_properties.get(prop)
    df[prop] = [value.string_value if value is not None else '']
  return df

# Print the current model to see the results of the ML Metadata for the model.
display(_mlmd_model_to_dataframe(store.get_artifacts_by_type('Model'), 0))

Improving fairness concerns by weighting the model

There are several ways to approach fixing fairness concerns within a model. Manipulating the observed data or labels, implementing fairness constraints, and prejudice removal by regularization are some of the techniques1 that have been used. In this case study we will reweight the model by implementing a custom loss function in Keras.
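One simple reweighting scheme, shown here only for illustration (the weights actually used below come from the Transform component's sample_weight feature), is to weight each example inversely to its label's frequency so that both classes contribute equally to the loss:

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Weight each example so every label value has equal total weight."""
    labels = np.asarray(labels)
    values, counts = np.unique(labels, return_counts=True)
    # Weight for a class = N / (num_classes * class_count), the same formula
    # scikit-learn uses for 'balanced' class weighting.
    weight_by_value = {v: len(labels) / (len(values) * c)
                       for v, c in zip(values, counts)}
    return np.array([weight_by_value[v] for v in labels])
```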

The code below is the same as the above Trainer component, with the exception of a new class called LogisticEndpoint that we will use as our loss within Keras, plus a few parameter changes.


  1. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., &amp; Galstyan, A. (2019). A Survey on Bias and Fairness in Machine Learning. https://arxiv.org/pdf/1908.09635.pdf
%%writefile {_trainer_module_file}
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import schema_utils

from compas_transform import *

_BATCH_SIZE = 1000
_LEARNING_RATE = 0.00001
_MAX_CHECKPOINTS = 1
_SAVE_CHECKPOINT_STEPS = 999


def transformed_names(keys):
  return [transformed_name(key) for key in keys]


def transformed_name(key):
  return '{}_xf'.format(key)


def _gzip_reader_fn(filenames):
  """Returns a record reader that can read gzip'ed files.

  Args:
    filenames: A tf.string tensor or tf.data.Dataset containing one or more
      filenames.

  Returns:
    A TFRecordDataset that reads gzip-compressed TFRecord files.
  """
  return tf.data.TFRecordDataset(filenames, compression_type='GZIP')


# Tf.Transform considers these features as "raw".
def _get_raw_feature_spec(schema):
  """Generates a feature spec from a Schema proto.

  Args:
    schema: A Schema proto.

  Returns:
    A feature spec defined as a dict whose keys are feature names and values are
      instances of FixedLenFeature, VarLenFeature or SparseFeature.
  """
  return schema_utils.schema_as_feature_spec(schema).feature_spec


def _example_serving_receiver_fn(tf_transform_output, schema):
  """Builds the serving inputs.

  Args:
    tf_transform_output: A TFTransformOutput.
    schema: the schema of the input data.

  Returns:
    TensorFlow graph which parses examples, applying tf-transform to them.
  """
  raw_feature_spec = _get_raw_feature_spec(schema)
  raw_feature_spec.pop(LABEL_KEY)

  raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
      raw_feature_spec)
  serving_input_receiver = raw_input_fn()

  transformed_features = tf_transform_output.transform_raw_features(
      serving_input_receiver.features)
  transformed_features.pop(transformed_name(LABEL_KEY))
  return tf.estimator.export.ServingInputReceiver(
      transformed_features, serving_input_receiver.receiver_tensors)


def _eval_input_receiver_fn(tf_transform_output, schema):
  """Builds everything needed for the tf-model-analysis to run the model.

  Args:
    tf_transform_output: A TFTransformOutput.
    schema: the schema of the input data.

  Returns:
    EvalInputReceiver function, which contains:

      - TensorFlow graph which parses raw untransformed features, applies the
          tf-transform preprocessing operators.
      - Set of raw, untransformed features.
      - Label against which predictions will be compared.
  """
  # Notice that the inputs are raw features, not transformed features here.
  raw_feature_spec = _get_raw_feature_spec(schema)

  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_tensor')

  # Add a parse_example operator to the tensorflow graph, which will parse
  # raw, untransformed, tf examples.
  features = tf.io.parse_example(
      serialized=serialized_tf_example, features=raw_feature_spec)

  transformed_features = tf_transform_output.transform_raw_features(features)
  labels = transformed_features.pop(transformed_name(LABEL_KEY))

  receiver_tensors = {'examples': serialized_tf_example}

  return tfma.export.EvalInputReceiver(
      features=transformed_features,
      receiver_tensors=receiver_tensors,
      labels=labels)


def _input_fn(filenames, tf_transform_output, batch_size=200):
  """Generates features and labels for training or evaluation.

  Args:
    filenames: List of TFRecord files to read data from.
    tf_transform_output: A TFTransformOutput.
    batch_size: First dimension size of the Tensors returned by input_fn.

  Returns:
    A (features, indices) tuple where features is a dictionary of
      Tensors, and indices is a single Tensor of label indices.
  """
  transformed_feature_spec = (
      tf_transform_output.transformed_feature_spec().copy())

  dataset = tf.compat.v1.data.experimental.make_batched_features_dataset(
      filenames,
      batch_size,
      transformed_feature_spec,
      shuffle=False,
      reader=_gzip_reader_fn)

  transformed_features = dataset.make_one_shot_iterator().get_next()

  # We pop the label because we do not want to use it as a feature while we're
  # training.
  return transformed_features, transformed_features.pop(
      transformed_name(LABEL_KEY))


# TFX will call this function.
def trainer_fn(hparams, schema):
  """Build the estimator using the high level API.

  Args:
    hparams: Hyperparameters used to train the model as name/value pairs.
    schema: Holds the schema of the training examples.

  Returns:
    A dict of the following:

      - estimator: The estimator that will be used for training and eval.
      - train_spec: Spec for training.
      - eval_spec: Spec for eval.
      - eval_input_receiver_fn: Input function for eval.
  """
  tf_transform_output = tft.TFTransformOutput(hparams.transform_output)

  train_input_fn = lambda: _input_fn(
      hparams.train_files,
      tf_transform_output,
      batch_size=_BATCH_SIZE)

  eval_input_fn = lambda: _input_fn(
      hparams.eval_files,
      tf_transform_output,
      batch_size=_BATCH_SIZE)

  train_spec = tf.estimator.TrainSpec(
      train_input_fn,
      max_steps=hparams.train_steps)

  serving_receiver_fn = lambda: _example_serving_receiver_fn(
      tf_transform_output, schema)

  exporter = tf.estimator.FinalExporter('compas', serving_receiver_fn)
  eval_spec = tf.estimator.EvalSpec(
      eval_input_fn,
      steps=hparams.eval_steps,
      exporters=[exporter],
      name='compas-eval')

  run_config = tf.estimator.RunConfig(
      save_checkpoints_steps=_SAVE_CHECKPOINT_STEPS,
      keep_checkpoint_max=_MAX_CHECKPOINTS)

  run_config = run_config.replace(model_dir=hparams.serving_model_dir)

  estimator = tf.keras.estimator.model_to_estimator(
      keras_model=_keras_model_builder(), config=run_config)

  # Create an input receiver for TFMA processing.
  receiver_fn = lambda: _eval_input_receiver_fn(tf_transform_output, schema)

  return {
      'estimator': estimator,
      'train_spec': train_spec,
      'eval_spec': eval_spec,
      'eval_input_receiver_fn': receiver_fn
  }


def _keras_model_builder():
  """Build a keras model for COMPAS dataset classification.
  
  Returns:
    A compiled Keras model.
  """
  feature_columns = []
  feature_layer_inputs = {}

  for key in transformed_names(INT_FEATURE_KEYS):
    feature_columns.append(tf.feature_column.numeric_column(key))
    feature_layer_inputs[key] = tf.keras.Input(shape=(1,), name=key)

  for key, num_buckets in zip(transformed_names(CATEGORICAL_FEATURE_KEYS),
                              MAX_CATEGORICAL_FEATURE_VALUES):
    feature_columns.append(
        tf.feature_column.indicator_column(
            tf.feature_column.categorical_column_with_identity(
                key, num_buckets=num_buckets)))
    feature_layer_inputs[key] = tf.keras.Input(
        shape=(1,), name=key, dtype=tf.dtypes.int32)

  feature_columns_input = tf.keras.layers.DenseFeatures(feature_columns)
  feature_layer_outputs = feature_columns_input(feature_layer_inputs)

  dense_layers = tf.keras.layers.Dense(
      20, activation='relu', name='dense_1')(feature_layer_outputs)
  dense_layers = tf.keras.layers.Dense(
      10, activation='relu', name='dense_2')(dense_layers)
  output = tf.keras.layers.Dense(
      1, name='predictions')(dense_layers)

  model = tf.keras.Model(
      inputs=[v for v in feature_layer_inputs.values()], outputs=output)

  # To weight our model we will develop a custom loss class within Keras.
  # The old loss is commented out below and the new one used in its place.
  model.compile(
      # loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
      loss=LogisticEndpoint(),
      optimizer=tf.optimizers.Adam(learning_rate=_LEARNING_RATE))

  return model


class LogisticEndpoint(tf.keras.layers.Layer):
  """Custom loss layer that applies the per-example sample weight."""

  def __init__(self, name=None):
    super(LogisticEndpoint, self).__init__(name=name)
    self.loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)

  def __call__(self, y_true, y_pred, sample_weight=None):
    inputs = [y_true, y_pred]
    # Fall back to the transformed sample-weight feature produced by the
    # Transform component when no explicit weight is passed in.
    inputs += sample_weight or ['sample_weight_xf']
    return super(LogisticEndpoint, self).__call__(inputs)

  def call(self, inputs):
    y_true, y_pred = inputs[0], inputs[1]
    if len(inputs) == 3:
      sample_weight = inputs[2]
    else:
      sample_weight = None
    loss = self.loss_fn(y_true, y_pred, sample_weight)
    self.add_loss(loss)
    reduce_loss = tf.math.divide_no_nan(
        tf.math.reduce_sum(tf.nn.softmax(y_pred)), _BATCH_SIZE)
    return reduce_loss

Overwriting compas_trainer.py
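To see what per-example weighting does to the loss numerically, here is a plain-NumPy sketch of weighted binary cross-entropy. It is illustrative only: it takes probabilities rather than logits for simplicity, whereas the trainer above delegates the computation to tf.keras.losses.BinaryCrossentropy(from_logits=True).

```python
import numpy as np

def weighted_bce(y_true, probs, sample_weight, eps=1e-7):
    """Weighted average of per-example binary cross-entropy."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    per_example = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return np.average(per_example, weights=sample_weight)
```

Upweighting a poorly predicted example pulls the average loss toward that example, which is exactly how reweighting shifts what the optimizer prioritizes.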

Retrain the TFX model with the weighted model

In this next part we will rerun the same Trainer component as before, now using the reweighted trainer module (which consumes the sample weights produced by the Transform component), to see the improvement in fairness after the weighting is applied.

trainer_weighted = Trainer(
    module_file=_trainer_module_file,
    transformed_examples=transform.outputs['transformed_examples'],
    schema=infer_schema.outputs['schema'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000)
)
context.run(trainer_weighted)

INFO:tensorflow:Using the Keras model provided.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 999, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 999 or save_checkpoints_secs None.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/keras/keras_model.ckpt
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-started 6 variables.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt.
INFO:tensorflow:loss = 1.0, step = 0
INFO:tensorflow:global_step/sec: 98.0295
INFO:tensorflow:loss = 1.0, step = 100 (1.021 sec)
INFO:tensorflow:global_step/sec: 104.878
INFO:tensorflow:loss = 1.0, step = 200 (0.954 sec)
INFO:tensorflow:global_step/sec: 103.082
INFO:tensorflow:loss = 1.0, step = 300 (0.970 sec)
INFO:tensorflow:global_step/sec: 102.003
INFO:tensorflow:loss = 1.0, step = 400 (0.980 sec)
INFO:tensorflow:global_step/sec: 102.963
INFO:tensorflow:loss = 1.0, step = 500 (0.971 sec)
INFO:tensorflow:global_step/sec: 104.826
INFO:tensorflow:loss = 1.0, step = 600 (0.954 sec)
INFO:tensorflow:global_step/sec: 102.499
INFO:tensorflow:loss = 1.0, step = 700 (0.975 sec)
INFO:tensorflow:global_step/sec: 104.201
INFO:tensorflow:loss = 1.0, step = 800 (0.960 sec)
INFO:tensorflow:global_step/sec: 103.582
INFO:tensorflow:loss = 1.0, step = 900 (0.966 sec)
INFO:tensorflow:Saving checkpoints for 999 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-06-16T09:10:59Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt-999
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 48.41083s
INFO:tensorflow:Finished evaluation at 2020-06-16-09:11:47
INFO:tensorflow:Saving dict for global step 999: global_step = 999, loss = 1.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 999: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt-999
INFO:tensorflow:global_step/sec: 2.01342
INFO:tensorflow:loss = 1.0, step = 1000 (49.666 sec)
INFO:tensorflow:global_step/sec: 104.112
INFO:tensorflow:loss = 1.0, step = 1100 (0.961 sec)
...
INFO:tensorflow:loss = 1.0, step = 4100 (0.946 sec)
INFO:tensorflow:global_step/sec: 103.961
INFO:tensorflow:loss = 1.0, step = 4200 (0.962 sec)
INFO:tensorflow:global_step/sec: 103.67
INFO:tensorflow:loss = 1.0, step = 4300 (0.965 sec)
INFO:tensorflow:global_step/sec: 101.294
INFO:tensorflow:loss = 1.0, step = 4400 (0.987 sec)
INFO:tensorflow:global_step/sec: 101.572
INFO:tensorflow:loss = 1.0, step = 4500 (0.984 sec)
INFO:tensorflow:global_step/sec: 102.567
INFO:tensorflow:loss = 1.0, step = 4600 (0.975 sec)
INFO:tensorflow:global_step/sec: 101.35
INFO:tensorflow:loss = 1.0, step = 4700 (0.987 sec)
INFO:tensorflow:global_step/sec: 103.798
INFO:tensorflow:loss = 1.0, step = 4800 (0.963 sec)
INFO:tensorflow:global_step/sec: 101.807
INFO:tensorflow:loss = 1.0, step = 4900 (0.982 sec)
INFO:tensorflow:Saving checkpoints for 4995 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 101.262
INFO:tensorflow:loss = 1.0, step = 5000 (0.988 sec)
INFO:tensorflow:global_step/sec: 102.754
INFO:tensorflow:loss = 1.0, step = 5100 (0.973 sec)
INFO:tensorflow:global_step/sec: 101.638
INFO:tensorflow:loss = 1.0, step = 5200 (0.984 sec)
INFO:tensorflow:global_step/sec: 102.561
INFO:tensorflow:loss = 1.0, step = 5300 (0.975 sec)
INFO:tensorflow:global_step/sec: 102.365
INFO:tensorflow:loss = 1.0, step = 5400 (0.977 sec)
INFO:tensorflow:global_step/sec: 102.026
INFO:tensorflow:loss = 1.0, step = 5500 (0.980 sec)
INFO:tensorflow:global_step/sec: 103.501
INFO:tensorflow:loss = 1.0, step = 5600 (0.966 sec)
INFO:tensorflow:global_step/sec: 103.587
INFO:tensorflow:loss = 1.0, step = 5700 (0.965 sec)
INFO:tensorflow:global_step/sec: 102.083
INFO:tensorflow:loss = 1.0, step = 5800 (0.980 sec)
INFO:tensorflow:global_step/sec: 104.989
INFO:tensorflow:loss = 1.0, step = 5900 (0.953 sec)
INFO:tensorflow:Saving checkpoints for 5994 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 102.463
INFO:tensorflow:loss = 1.0, step = 6000 (0.976 sec)
INFO:tensorflow:global_step/sec: 102.777
INFO:tensorflow:loss = 1.0, step = 6100 (0.973 sec)
INFO:tensorflow:global_step/sec: 101.608
INFO:tensorflow:loss = 1.0, step = 6200 (0.984 sec)
INFO:tensorflow:global_step/sec: 100.998
INFO:tensorflow:loss = 1.0, step = 6300 (0.990 sec)
INFO:tensorflow:global_step/sec: 104.406
INFO:tensorflow:loss = 1.0, step = 6400 (0.958 sec)
INFO:tensorflow:global_step/sec: 103.868
INFO:tensorflow:loss = 1.0, step = 6500 (0.963 sec)
INFO:tensorflow:global_step/sec: 102.954
INFO:tensorflow:loss = 1.0, step = 6600 (0.971 sec)
INFO:tensorflow:global_step/sec: 103.535
INFO:tensorflow:loss = 1.0, step = 6700 (0.966 sec)
INFO:tensorflow:global_step/sec: 102.72
INFO:tensorflow:loss = 1.0, step = 6800 (0.974 sec)
INFO:tensorflow:global_step/sec: 103.74
INFO:tensorflow:loss = 1.0, step = 6900 (0.964 sec)
INFO:tensorflow:Saving checkpoints for 6993 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 104.564
INFO:tensorflow:loss = 1.0, step = 7000 (0.956 sec)
INFO:tensorflow:global_step/sec: 104.809
INFO:tensorflow:loss = 1.0, step = 7100 (0.955 sec)
INFO:tensorflow:global_step/sec: 102.638
INFO:tensorflow:loss = 1.0, step = 7200 (0.974 sec)
INFO:tensorflow:global_step/sec: 102.659
INFO:tensorflow:loss = 1.0, step = 7300 (0.974 sec)
INFO:tensorflow:global_step/sec: 103.391
INFO:tensorflow:loss = 1.0, step = 7400 (0.967 sec)
INFO:tensorflow:global_step/sec: 102.48
INFO:tensorflow:loss = 1.0, step = 7500 (0.976 sec)
INFO:tensorflow:global_step/sec: 100.341
INFO:tensorflow:loss = 1.0, step = 7600 (0.997 sec)
INFO:tensorflow:global_step/sec: 103.569
INFO:tensorflow:loss = 1.0, step = 7700 (0.965 sec)
INFO:tensorflow:global_step/sec: 103.456
INFO:tensorflow:loss = 1.0, step = 7800 (0.967 sec)
INFO:tensorflow:global_step/sec: 104.836
INFO:tensorflow:loss = 1.0, step = 7900 (0.954 sec)
INFO:tensorflow:Saving checkpoints for 7992 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 106.155
INFO:tensorflow:loss = 1.0, step = 8000 (0.942 sec)
INFO:tensorflow:global_step/sec: 101.01
INFO:tensorflow:loss = 1.0, step = 8100 (0.990 sec)
INFO:tensorflow:global_step/sec: 102.076
INFO:tensorflow:loss = 1.0, step = 8200 (0.980 sec)
INFO:tensorflow:global_step/sec: 101.999
INFO:tensorflow:loss = 1.0, step = 8300 (0.981 sec)
INFO:tensorflow:global_step/sec: 102.207
INFO:tensorflow:loss = 1.0, step = 8400 (0.978 sec)
INFO:tensorflow:global_step/sec: 103.124
INFO:tensorflow:loss = 1.0, step = 8500 (0.970 sec)
INFO:tensorflow:global_step/sec: 103.657
INFO:tensorflow:loss = 1.0, step = 8600 (0.965 sec)
INFO:tensorflow:global_step/sec: 101.446
INFO:tensorflow:loss = 1.0, step = 8700 (0.986 sec)
INFO:tensorflow:global_step/sec: 101.528
INFO:tensorflow:loss = 1.0, step = 8800 (0.985 sec)
INFO:tensorflow:global_step/sec: 101.803
INFO:tensorflow:loss = 1.0, step = 8900 (0.983 sec)
INFO:tensorflow:Saving checkpoints for 8991 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 104.658
INFO:tensorflow:loss = 1.0, step = 9000 (0.955 sec)
INFO:tensorflow:global_step/sec: 102.743
INFO:tensorflow:loss = 1.0, step = 9100 (0.973 sec)
INFO:tensorflow:global_step/sec: 104.382
INFO:tensorflow:loss = 1.0, step = 9200 (0.958 sec)
INFO:tensorflow:global_step/sec: 103.317
INFO:tensorflow:loss = 1.0, step = 9300 (0.968 sec)
INFO:tensorflow:global_step/sec: 100.44
INFO:tensorflow:loss = 1.0, step = 9400 (0.996 sec)
INFO:tensorflow:global_step/sec: 104.215
INFO:tensorflow:loss = 1.0, step = 9500 (0.959 sec)
INFO:tensorflow:global_step/sec: 103.324
INFO:tensorflow:loss = 1.0, step = 9600 (0.968 sec)
INFO:tensorflow:global_step/sec: 105.372
INFO:tensorflow:loss = 1.0, step = 9700 (0.949 sec)
INFO:tensorflow:global_step/sec: 105.583
INFO:tensorflow:loss = 1.0, step = 9800 (0.947 sec)
INFO:tensorflow:global_step/sec: 102.588
INFO:tensorflow:loss = 1.0, step = 9900 (0.975 sec)
INFO:tensorflow:Saving checkpoints for 9990 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:Saving checkpoints for 10000 into /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt.
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-06-16T09:13:15Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 49.02546s
INFO:tensorflow:Finished evaluation at 2020-06-16-09:14:04
INFO:tensorflow:Saving dict for global step 10000: global_step = 10000, loss = 1.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt-10000
INFO:tensorflow:Performing the final export in the end of training.
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\003sex"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\004race"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\rc_charge_desc"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\017c_charge_degree"

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/export/compas/temp-b'1592298844'/assets
INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/export/compas/temp-b'1592298844'/saved_model.pb
INFO:tensorflow:Loss for final step: 1.0.
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\003sex"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\004race"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\rc_charge_desc"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\017c_charge_degree"

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']
WARNING:tensorflow:Export includes no default signature!
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/serving_model_dir/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/eval_model_dir/temp-b'1592298844'/assets
INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2020-06-16T09_06_48.035368-t1obdwcp/Trainer/model/8/eval_model_dir/temp-b'1592298844'/saved_model.pb

# Again, we will run TensorFlow Model Analysis and load Fairness Indicators
# to examine the performance change in our weighted model.
model_analyzer_weighted = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer_weighted.outputs['model'],
    eval_config=text_format.Parse("""
      model_specs {
        label_key: 'is_recid'
      }
      metrics_specs {
        metrics {class_name: 'BinaryAccuracy'}
        metrics {class_name: 'AUC'}
        metrics {
          class_name: 'FairnessIndicators'
          config: '{"thresholds": [0.25, 0.5, 0.75]}'
        }
      }
      slicing_specs {
        feature_keys: 'race'
      }
    """, tfma.EvalConfig())
)
context.run(model_analyzer_weighted)
evaluation_uri_weighted = model_analyzer_weighted.outputs['output'].get()[0].uri
eval_result_weighted = tfma.load_eval_result(evaluation_uri_weighted)

multi_eval_results = {
    'Unweighted Model': eval_result,
    'Weighted Model': eval_result_weighted
}
tfma.addons.fairness.view.widget_view.render_fairness_indicator(
    multi_eval_results=multi_eval_results)
FairnessIndicatorViewer(evalName='Unweighted Model', evalNameCompare='Weighted Model', slicingMetrics=[{'slice…

After retraining the model with the weighted examples, we can once again look at the fairness metrics to gauge any improvement. This time, however, we will use the model comparison feature within Fairness Indicators to view the weighted and unweighted models side by side. Although we still see some fairness concerns with the weighted model, the discrepancy is far less pronounced.
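The reweighting applied to the trainer earlier in the notebook can be approximated with a small stdlib-only sketch (a hypothetical helper, not the notebook's actual code): each example receives a weight inversely proportional to the frequency of its (race, label) group, so under-represented combinations contribute more to the loss.

```python
from collections import Counter

def inverse_frequency_weights(groups):
    """Weight each example by total / (num_groups * group_count),
    so every (feature, label) group contributes equally overall."""
    counts = Counter(groups)
    total = len(groups)
    return [total / (len(counts) * counts[g]) for g in groups]

# Toy (race, is_recid) pairs; real code would derive these from the dataset.
examples = [('a', 0), ('a', 0), ('a', 1), ('b', 1)]
weights = inverse_frequency_weights(examples)
# The two singleton groups get weights above 1; the majority group below 1.
print(weights)
```

The weights sum to the number of examples, so the overall scale of the loss is unchanged while the per-group contributions are balanced.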

The drawback, however, is that our AUC and binary accuracy have also dropped after weighting the model.

  • False Positive Rate @ 0.75
    • African-American: ~1%
      • AUC: 0.47
      • Binary Accuracy: 0.59
    • Caucasian: ~0%
      • AUC: 0.47
      • Binary Accuracy: 0.58
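The thresholded metrics above come straight from Fairness Indicators; conceptually, the false positive rate at a given threshold can be computed per slice as in this sketch (toy data, not the COMPAS dataset):

```python
def false_positive_rate(labels, scores, threshold):
    """FPR = FP / (FP + TN), computed over the negative (label 0) examples."""
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return fp / (fp + tn) if (fp + tn) else 0.0

# Toy slices: label 1 = recidivated, scores = model outputs in [0, 1].
slices = {
    'slice_a': ([0, 0, 0, 1], [0.8, 0.2, 0.1, 0.9]),
    'slice_b': ([0, 0, 1, 1], [0.3, 0.1, 0.6, 0.7]),
}
fprs = {name: false_positive_rate(y, s, 0.75)
        for name, (y, s) in slices.items()}
```

A gap in FPR between slices at the same threshold is exactly the kind of discrepancy the Fairness Indicators widget surfaces.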

Examine the data of the second run

Finally, we can visualize the data with TensorFlow Data Validation, overlay the changes in the data between the two runs, and add a note to ML Metadata indicating that this model mitigated the fairness concerns.

# Pull the URIs for the statistics from the two runs in this case study.
first_stats_uri = store.get_artifacts_by_type('ExampleStatistics')[-1].uri
second_stats_uri = store.get_artifacts_by_type('ExampleStatistics')[0].uri

# Load the stats for both runs.
first_model_stats = tfdv.load_statistics(os.path.join(
    first_stats_uri, 'eval/stats_tfrecord/'))
second_model_stats = tfdv.load_statistics(os.path.join(
    second_stats_uri, 'eval/stats_tfrecord/'))

# Visualize the statistics between the two runs.
tfdv.visualize_statistics(
    lhs_statistics=second_model_stats,
    lhs_name='Sampled Model',
    rhs_statistics=first_model_stats,
    rhs_name='COMPAS Original')
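At its core, what `tfdv.visualize_statistics` overlays is a per-feature comparison of value distributions between two dataset versions. A stdlib-only toy sketch of that idea, with hypothetical data:

```python
from collections import Counter

def distribution(values):
    """Normalized value counts for one categorical feature."""
    counts = Counter(values)
    return {k: c / len(values) for k, c in counts.items()}

# Hypothetical 'race' column from two dataset versions.
original = ['African-American', 'African-American', 'African-American', 'Caucasian']
sampled = ['African-American', 'Caucasian', 'African-American', 'Caucasian']

orig_dist, samp_dist = distribution(original), distribution(sampled)
# Positive shift = the value became more frequent in the sampled version.
shift = {k: samp_dist.get(k, 0.0) - orig_dist.get(k, 0.0)
         for k in set(original) | set(sampled)}
```

A large shift for a sensitive feature is the kind of change worth recording in ML Metadata, as done below.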
# Add a new note within ML Metadata describing the weighted model.
_NOTE_TO_ADD = 'Weighted model between race and is_recid.'

# Pulling the URI for the weighted trained model.
second_trained_model = store.get_artifacts_by_type('Model')[-1]

# Add the note to ML Metadata.
second_trained_model.custom_properties['note'].string_value = _NOTE_TO_ADD
store.put_artifacts([second_trained_model])

display(_mlmd_model_to_dataframe(store.get_artifacts_by_type('Model'), -1))
display(_mlmd_model_to_dataframe(store.get_artifacts_by_type('Model'), 0))

Conclusion

Within this case study we developed a Keras classifier within a TFX pipeline on the COMPAS dataset and examined it for fairness concerns. After initially developing the TFX pipeline, the fairness concerns were not apparent until we examined individual slices of the model's performance along sensitive features -- in our case, race. After identifying the issues, we used TensorFlow Data Validation to track down their source, mitigated them via model weighting, and tracked and annotated the changes with ML Metadata. Although we were not able to fully resolve all the fairness concerns within the dataset, adding a note for future developers allows others to understand the issues we faced while developing this model.

Finally, it is important to note that this case study did not fix the fairness issues present in the COMPAS dataset. Improving the fairness of the model also reduced its AUC and accuracy. What we were able to do, however, was build a model that surfaced the fairness concerns and track down where the problems could be coming from by tracing our model's lineage, while annotating any concerns within the metadata.

For more information on the issues that predicting pre-trial detention can raise, see the FAT* 2018 talk on "Understanding the Context and Consequences of Pre-trial Detention."