
Scikit-Learn Model Card Toolkit Demo



This notebook demonstrates how to generate a model card using the Model Card Toolkit with a scikit-learn model in a Jupyter/Colab environment. You can learn more about model cards at


We first need to install and import the necessary packages.

Upgrade to Pip 21.3 and Install Packages

pip install --upgrade pip==21.3
pip install -U seaborn scikit-learn model-card-toolkit

Did you restart the runtime?

If you are using Google Colab, the first time that you run the cell above, you must restart the runtime (Runtime > Restart runtime ...).
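
After restarting, it can help to confirm which versions the runtime actually picked up. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, not part of the toolkit.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as passed to pip in the install cell.
for name in ("pip", "seaborn", "scikit-learn", "model-card-toolkit"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If a package still reports an old version, the runtime was probably not restarted after the install.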

Import packages

We import necessary packages, including scikit-learn.

from datetime import date
from io import BytesIO
from IPython import display
import model_card_toolkit as mctlib
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_roc_curve, plot_confusion_matrix

import base64
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import uuid

Load data

This example uses the Breast Cancer Wisconsin Diagnostic dataset that scikit-learn can load using the load_breast_cancer() function.

cancer = load_breast_cancer()

X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = pd.Series(cancer.target)

X_train, X_test, y_train, y_test = train_test_split(X, y)
17     0
117    0
195    1
337    0
509    0
dtype: int64
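
Because train_test_split is called with default settings, the rows that land in each split change from run to run, so the indices shown above will generally differ on your machine. If a repeatable split is wanted, one option is sketched below; random_state=0 and stratify=y are assumptions added for illustration and are not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = pd.Series(cancer.target)

# random_state fixes the shuffle; stratify=y keeps the class ratio
# similar in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y)

# Class proportions in each split should be close to each other.
print(y_train.value_counts(normalize=True).round(3))
print(y_test.value_counts(normalize=True).round(3))
```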

Plot data

We will create several plots from the data that we will include in the model card.

# Utility function that will export a plot to a base-64 encoded string that the model card will accept.

def plot_to_str():
    img = BytesIO()
    plt.savefig(img, format='png')
    return base64.encodebytes(img.getvalue()).decode('utf-8')

# Plot the mean radius feature for both the train and test sets

sns.displot(x=X_train['mean radius'], hue=y_train)
mean_radius_train = plot_to_str()

sns.displot(x=X_test['mean radius'], hue=y_test)
mean_radius_test = plot_to_str()



# Plot the mean texture feature for both the train and test sets

sns.displot(x=X_train['mean texture'], hue=y_train)
mean_texture_train = plot_to_str()

sns.displot(x=X_test['mean texture'], hue=y_test)
mean_texture_test = plot_to_str()
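
The strings returned by plot_to_str() are base-64 text, so they can be decoded back to bytes and sanity-checked before they go into the model card. The sketch below uses only the standard library; the helper names and the stand-in bytes are hypothetical and exist only to make the example runnable on its own.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def encode_image_bytes(raw):
    """Encode raw bytes the same way plot_to_str() encodes its buffer."""
    return base64.encodebytes(raw).decode("utf-8")


def looks_like_png(encoded):
    """Decode a base-64 string and check for the PNG file signature."""
    decoded = base64.decodebytes(encoded.encode("utf-8"))
    return decoded.startswith(PNG_SIGNATURE)


# Stand-in bytes for illustration only: a PNG signature plus padding,
# not a real image.
fake_png = PNG_SIGNATURE + b"\x00" * 16

print(looks_like_png(encode_image_bytes(fake_png)))
print(looks_like_png(encode_image_bytes(b"not an image")))
```

In the notebook, a call such as looks_like_png(mean_radius_train) would give a quick check that a plot was captured as PNG data.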



Train model

# Create a classifier and fit the training data

clf = GradientBoostingClassifier().fit(X_train, y_train)

Evaluate model

# Plot a ROC curve

plot_roc_curve(clf, X_test, y_test)
roc_curve = plot_to_str()
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/sklearn/utils/ FutureWarning: Function plot_roc_curve is deprecated; Function :func:`plot_roc_curve` is deprecated in 1.0 and will be removed in 1.2. Use one of the class methods: :meth:`sklearn.metric.RocCurveDisplay.from_predictions` or :meth:`sklearn.metric.RocCurveDisplay.from_estimator`.
  warnings.warn(msg, category=FutureWarning)


# Plot a confusion matrix

plot_confusion_matrix(clf, X_test, y_test)
confusion_matrix = plot_to_str()
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/sklearn/utils/ FutureWarning: Function plot_confusion_matrix is deprecated; Function `plot_confusion_matrix` is deprecated in 1.0 and will be removed in 1.2. Use one of the class methods: ConfusionMatrixDisplay.from_predictions or ConfusionMatrixDisplay.from_estimator.
  warnings.warn(msg, category=FutureWarning)


Create a model card

Initialize toolkit and model card

mct = mctlib.ModelCardToolkit()

model_card = mct.scaffold_assets()

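Annotate information into model card

model_card.model_details.name = 'Breast Cancer Wisconsin (Diagnostic) Dataset'
model_card.model_details.overview = (
    'This model predicts whether breast cancer is benign or malignant based on '
    'image measurements.')
model_card.model_details.owners = [
    mctlib.Owner(name='Model Cards Team', contact='')
]
model_card.model_details.references = []
model_card.model_details.version.name = str(uuid.uuid4())
model_card.model_details.version.date = str(date.today())

model_card.considerations.ethical_considerations = [mctlib.Risk(
    name=('Manual selection of image sections to digitize could create '
            'selection bias'),
    mitigation_strategy='Automate the selection process'
)]
model_card.considerations.limitations = [mctlib.Limitation(description='Breast cancer diagnosis')]
model_card.considerations.use_cases = [mctlib.UseCase(description='Breast cancer diagnosis')]
model_card.considerations.users = [mctlib.User(description='Medical professionals'), mctlib.User(description='ML researchers')]
model_card.model_parameters.data.append(mctlib.Dataset())
model_card.model_parameters.data[0].graphics.description = (
  f'{len(X_train)} rows with {len(X_train.columns)} features')
model_card.model_parameters.data[0].graphics.collection = [
    mctlib.Graphic(image=mean_radius_train),
    mctlib.Graphic(image=mean_texture_train)
]
model_card.model_parameters.data.append(mctlib.Dataset())
model_card.model_parameters.data[1].graphics.description = (
  f'{len(X_test)} rows with {len(X_test.columns)} features')
model_card.model_parameters.data[1].graphics.collection = [
    mctlib.Graphic(image=mean_radius_test),
    mctlib.Graphic(image=mean_texture_test)
]
model_card.quantitative_analysis.graphics.description = (
  'ROC curve and confusion matrix')
model_card.quantitative_analysis.graphics.collection = [
    mctlib.Graphic(image=roc_curve),
    mctlib.Graphic(image=confusion_matrix)
]
mct.update_model_card(model_card)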


Generate model card

# Return the model card document as an HTML page

html = mct.export_format()
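
Since the exported model card is an HTML string, it can be written to disk like any other text. The sketch below assumes only that; the helper name save_model_card_html, the file name model_card.html, and the placeholder HTML are hypothetical and stand in for the notebook's own html variable.

```python
import tempfile
from pathlib import Path


def save_model_card_html(html, path):
    """Write an HTML string to disk as UTF-8 and return the resolved path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path.resolve()


# Placeholder HTML so the sketch runs on its own; in the notebook,
# pass the `html` variable instead.
placeholder_html = "<html><body><h1>Model card</h1></body></html>"
out_dir = Path(tempfile.mkdtemp())
saved = save_model_card_html(placeholder_html, out_dir / "model_card.html")
print(saved)
```

Inside a notebook, display.display(display.HTML(html)) should render the card inline, using the IPython display module imported earlier.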