tensorflow/lite/g3doc/tutorials/pose_classification.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "L30JbHSkiVZx"
      },
      "source": [
        "##### Copyright 2024 The AI Edge Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "ZtimvKLdili0"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QXdiroR-Ue-Z"
      },
      "source": [
        "# Human Pose Classification with MoveNet and TensorFlow Lite\n",
        "\n",
        "This notebook teaches you how to train a pose classification model using MoveNet and TensorFlow Lite. The result is a new TensorFlow Lite model that accepts the output from the MoveNet model as its input, and outputs a pose classification, such as the name of a yoga pose.\n",
        "\n",
        "The procedure in this notebook consists of 3 parts:\n",
        "* Part 1: Preprocess the pose classification training data into a CSV file that specifies the landmarks (body keypoints) detected by the MoveNet model, along with the ground truth pose labels.\n",
        "* Part 2: Build and train a pose classification model that takes the landmark coordinates from the CSV file as input, and outputs the predicted labels.\n",
        "* Part 3: Convert the pose classification model to TFLite.\n",
        "\n",
        "By default, this notebook uses an image dataset with labeled yoga poses, but we've also included a section in Part 1 where you can upload your own image dataset of poses.\n",
        "\n",
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/lite/tutorials/pose_classification\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/tutorials/pose_classification.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/tutorials/pose_classification.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/tensorflow/tensorflow/lite/g3doc/tutorials/pose_classification.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://tfhub.dev/s?q=movenet\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/hub_logo_32px.png\" /\u003eSee TF Hub model\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IfQ3xP6-EY5r"
      },
      "source": [
        "## Preparation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Jpy4A1Vpi9jH"
      },
      "source": [
        "In this section, you'll import the necessary libraries and define several functions to preprocess the training images into a CSV file that contains the landmark coordinates and ground truth labels.\n",
        "\n",
        "Nothing observable happens here, but you can expand the hidden code cells to see the implementation for some of the functions we'll be calling later on.\n",
        "\n",
        "**If you only want to create the CSV file without knowing all the details, just run this section and proceed to Part 1.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "PWlbrkMCx-W-"
      },
      "outputs": [],
      "source": [
        "!pip install -q opencv-python"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KTkttSWnUi1Q"
      },
      "outputs": [],
      "source": [
        "import csv\n",
        "import cv2\n",
        "import itertools\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "import os\n",
        "import sys\n",
        "import tempfile\n",
        "import tqdm\n",
        "\n",
        "from matplotlib import pyplot as plt\n",
        "from matplotlib.collections import LineCollection\n",
        "\n",
        "import tensorflow as tf\n",
        "import tensorflow_hub as hub\n",
        "from tensorflow import keras\n",
        "\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.metrics import accuracy_score, classification_report, confusion_matrix"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KwRwEssyTciI"
      },
      "source": [
        "### Code to run pose estimation using MoveNet"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "48kW1c2F5l1R"
      },
      "outputs": [],
      "source": [
        "#@title Functions to run pose estimation with MoveNet\n",
        "\n",
        "#@markdown You'll download the MoveNet Thunder model from [TensorFlow Hub](https://www.google.com/url?sa=D\u0026q=https%3A%2F%2Ftfhub.dev%2Fs%3Fq%3Dmovenet), and reuse some inference and visualization logic from the [MoveNet Raspberry Pi (Python)](https://github.com/tensorflow/examples/tree/master/lite/examples/pose_estimation/raspberry_pi) sample app to detect landmarks (ear, nose, wrist etc.) from the input images.\n",
        "\n",
        "#@markdown *Note: You should use the most accurate pose estimation model (i.e. MoveNet Thunder) to detect the keypoints and use them to train the pose classification model to achieve the best accuracy. When running inference, you can use a pose estimation model of your choice (e.g. either MoveNet Lightning or Thunder).*\n",
        "\n",
        "# Download model from TF Hub and check out inference code from GitHub\n",
        "!wget -q -O movenet_thunder.tflite https://tfhub.dev/google/lite-model/movenet/singlepose/thunder/tflite/float16/4?lite-format=tflite\n",
        "!git clone https://github.com/tensorflow/examples.git\n",
        "pose_sample_rpi_path = os.path.join(os.getcwd(), 'examples/lite/examples/pose_estimation/raspberry_pi')\n",
        "sys.path.append(pose_sample_rpi_path)\n",
        "\n",
        "# Load MoveNet Thunder model\n",
        "import utils\n",
        "from data import BodyPart\n",
        "from ml import Movenet\n",
        "movenet = Movenet('movenet_thunder')\n",
        "\n",
        "# Define function to run pose estimation using MoveNet Thunder.\n",
        "# You'll apply MoveNet's cropping algorithm and run inference multiple times on\n",
        "# the input image to improve pose estimation accuracy.\n",
        "def detect(input_tensor, inference_count=3):\n",
        "  \"\"\"Runs detection on an input image.\n",
        " \n",
        "  Args:\n",
        "    input_tensor: A [height, width, 3] Tensor of type tf.float32.\n",
        "      Note that height and width can be anything since the image will be\n",
        "      immediately resized according to the needs of the model within this\n",
        "      function.\n",
        "    inference_count: Number of times the model should run repeatly on the\n",
        "      same input image to improve detection accuracy.\n",
        " \n",
        "  Returns:\n",
        "    A Person entity detected by the MoveNet.SinglePose.\n",
        "  \"\"\"\n",
        "  image_height, image_width, channel = input_tensor.shape\n",
        " \n",
        "  # Detect pose using the full input image\n",
        "  movenet.detect(input_tensor.numpy(), reset_crop_region=True)\n",
        " \n",
        "  # Repeatedly using previous detection result to identify the region of\n",
        "  # interest and only croping that region to improve detection accuracy\n",
        "  for _ in range(inference_count - 1):\n",
        "    person = movenet.detect(input_tensor.numpy(), \n",
        "                            reset_crop_region=False)\n",
        "\n",
        "  return person"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "fKo0NzwQJ5Rm"
      },
      "outputs": [],
      "source": [
        "#@title Functions to visualize the pose estimation results.\n",
        "\n",
        "def draw_prediction_on_image(\n",
        "    image, person, crop_region=None, close_figure=True,\n",
        "    keep_input_size=False):\n",
        "  \"\"\"Draws the keypoint predictions on image.\n",
        " \n",
        "  Args:\n",
        "    image: An numpy array with shape [height, width, channel] representing the\n",
        "      pixel values of the input image.\n",
        "    person: A person entity returned from the MoveNet.SinglePose model.\n",
        "    close_figure: Whether to close the plt figure after the function returns.\n",
        "    keep_input_size: Whether to keep the size of the input image.\n",
        " \n",
        "  Returns:\n",
        "    An numpy array with shape [out_height, out_width, channel] representing the\n",
        "    image overlaid with keypoint predictions.\n",
        "  \"\"\"\n",
        "  # Draw the detection result on top of the image.\n",
        "  image_np = utils.visualize(image, [person])\n",
        "  \n",
        "  # Plot the image with detection results.\n",
        "  height, width, channel = image.shape\n",
        "  aspect_ratio = float(width) / height\n",
        "  fig, ax = plt.subplots(figsize=(12 * aspect_ratio, 12))\n",
        "  im = ax.imshow(image_np)\n",
        " \n",
        "  if close_figure:\n",
        "    plt.close(fig)\n",
        " \n",
        "  if not keep_input_size:\n",
        "    image_np = utils.keep_aspect_ratio_resizer(image_np, (512, 512))\n",
        "\n",
        "  return image_np"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "QUkOW_26S6K-"
      },
      "outputs": [],
      "source": [
        "#@title Code to load the images, detect pose landmarks and save them into a CSV file\n",
        "\n",
        "class MoveNetPreprocessor(object):\n",
        "  \"\"\"Helper class to preprocess pose sample images for classification.\"\"\"\n",
        " \n",
        "  def __init__(self,\n",
        "               images_in_folder,\n",
        "               images_out_folder,\n",
        "               csvs_out_path):\n",
        "    \"\"\"Creates a preprocessor to detection pose from images and save as CSV.\n",
        "\n",
        "    Args:\n",
        "      images_in_folder: Path to the folder with the input images. It should\n",
        "        follow this structure:\n",
        "        yoga_poses\n",
        "        |__ downdog\n",
        "            |______ 00000128.jpg\n",
        "            |______ 00000181.bmp\n",
        "            |______ ...\n",
        "        |__ goddess\n",
        "            |______ 00000243.jpg\n",
        "            |______ 00000306.jpg\n",
        "            |______ ...\n",
        "        ...\n",
        "      images_out_folder: Path to write the images overlay with detected\n",
        "        landmarks. These images are useful when you need to debug accuracy\n",
        "        issues.\n",
        "      csvs_out_path: Path to write the CSV containing the detected landmark\n",
        "        coordinates and label of each image that can be used to train a pose\n",
        "        classification model.\n",
        "    \"\"\"\n",
        "    self._images_in_folder = images_in_folder\n",
        "    self._images_out_folder = images_out_folder\n",
        "    self._csvs_out_path = csvs_out_path\n",
        "    self._messages = []\n",
        "\n",
        "    # Create a temp dir to store the pose CSVs per class\n",
        "    self._csvs_out_folder_per_class = tempfile.mkdtemp()\n",
        " \n",
        "    # Get list of pose classes and print image statistics\n",
        "    self._pose_class_names = sorted(\n",
        "        [n for n in os.listdir(self._images_in_folder) if not n.startswith('.')]\n",
        "        )\n",
        "    \n",
        "  def process(self, per_pose_class_limit=None, detection_threshold=0.1):\n",
        "    \"\"\"Preprocesses images in the given folder.\n",
        "    Args:\n",
        "      per_pose_class_limit: Number of images to load. As preprocessing usually\n",
        "        takes time, this parameter can be specified to make the reduce of the\n",
        "        dataset for testing.\n",
        "      detection_threshold: Only keep images with all landmark confidence score\n",
        "        above this threshold.\n",
        "    \"\"\"\n",
        "    # Loop through the classes and preprocess its images\n",
        "    for pose_class_name in self._pose_class_names:\n",
        "      print('Preprocessing', pose_class_name, file=sys.stderr)\n",
        "\n",
        "      # Paths for the pose class.\n",
        "      images_in_folder = os.path.join(self._images_in_folder, pose_class_name)\n",
        "      images_out_folder = os.path.join(self._images_out_folder, pose_class_name)\n",
        "      csv_out_path = os.path.join(self._csvs_out_folder_per_class,\n",
        "                                  pose_class_name + '.csv')\n",
        "      if not os.path.exists(images_out_folder):\n",
        "        os.makedirs(images_out_folder)\n",
        " \n",
        "      # Detect landmarks in each image and write it to a CSV file\n",
        "      with open(csv_out_path, 'w') as csv_out_file:\n",
        "        csv_out_writer = csv.writer(csv_out_file, \n",
        "                                    delimiter=',', \n",
        "                                    quoting=csv.QUOTE_MINIMAL)\n",
        "        # Get list of images\n",
        "        image_names = sorted(\n",
        "            [n for n in os.listdir(images_in_folder) if not n.startswith('.')])\n",
        "        if per_pose_class_limit is not None:\n",
        "          image_names = image_names[:per_pose_class_limit]\n",
        "\n",
        "        valid_image_count = 0\n",
        " \n",
        "        # Detect pose landmarks from each image\n",
        "        for image_name in tqdm.tqdm(image_names):\n",
        "          image_path = os.path.join(images_in_folder, image_name)\n",
        "\n",
        "          try:\n",
        "            image = tf.io.read_file(image_path)\n",
        "            image = tf.io.decode_jpeg(image)\n",
        "          except:\n",
        "            self._messages.append('Skipped ' + image_path + '. Invalid image.')\n",
        "            continue\n",
        "          else:\n",
        "            image = tf.io.read_file(image_path)\n",
        "            image = tf.io.decode_jpeg(image)\n",
        "            image_height, image_width, channel = image.shape\n",
        "          \n",
        "          # Skip images that isn't RGB because Movenet requires RGB images\n",
        "          if channel != 3:\n",
        "            self._messages.append('Skipped ' + image_path +\n",
        "                                  '. Image isn\\'t in RGB format.')\n",
        "            continue\n",
        "          person = detect(image)\n",
        "          \n",
        "          # Save landmarks if all landmarks were detected\n",
        "          min_landmark_score = min(\n",
        "              [keypoint.score for keypoint in person.keypoints])\n",
        "          should_keep_image = min_landmark_score \u003e= detection_threshold\n",
        "          if not should_keep_image:\n",
        "            self._messages.append('Skipped ' + image_path +\n",
        "                                  '. No pose was confidentlly detected.')\n",
        "            continue\n",
        "\n",
        "          valid_image_count += 1\n",
        "\n",
        "          # Draw the prediction result on top of the image for debugging later\n",
        "          output_overlay = draw_prediction_on_image(\n",
        "              image.numpy().astype(np.uint8), person, \n",
        "              close_figure=True, keep_input_size=True)\n",
        "        \n",
        "          # Write detection result into an image file\n",
        "          output_frame = cv2.cvtColor(output_overlay, cv2.COLOR_RGB2BGR)\n",
        "          cv2.imwrite(os.path.join(images_out_folder, image_name), output_frame)\n",
        "        \n",
        "          # Get landmarks and scale it to the same size as the input image\n",
        "          pose_landmarks = np.array(\n",
        "              [[keypoint.coordinate.x, keypoint.coordinate.y, keypoint.score]\n",
        "                for keypoint in person.keypoints],\n",
        "              dtype=np.float32)\n",
        "\n",
        "          # Write the landmark coordinates to its per-class CSV file\n",
        "          coordinates = pose_landmarks.flatten().astype(np.str).tolist()\n",
        "          csv_out_writer.writerow([image_name] + coordinates)\n",
        "\n",
        "        if not valid_image_count:\n",
        "          raise RuntimeError(\n",
        "              'No valid images found for the \"{}\" class.'\n",
        "              .format(pose_class_name))\n",
        "      \n",
        "    # Print the error message collected during preprocessing.\n",
        "    print('\\n'.join(self._messages))\n",
        "\n",
        "    # Combine all per-class CSVs into a single output file\n",
        "    all_landmarks_df = self._all_landmarks_as_dataframe()\n",
        "    all_landmarks_df.to_csv(self._csvs_out_path, index=False)\n",
        "\n",
        "  def class_names(self):\n",
        "    \"\"\"List of classes found in the training dataset.\"\"\"\n",
        "    return self._pose_class_names\n",
        "  \n",
        "  def _all_landmarks_as_dataframe(self):\n",
        "    \"\"\"Merge all per-class CSVs into a single dataframe.\"\"\"\n",
        "    total_df = None\n",
        "    for class_index, class_name in enumerate(self._pose_class_names):\n",
        "      csv_out_path = os.path.join(self._csvs_out_folder_per_class,\n",
        "                                  class_name + '.csv')\n",
        "      per_class_df = pd.read_csv(csv_out_path, header=None)\n",
        "      \n",
        "      # Add the labels\n",
        "      per_class_df['class_no'] = [class_index]*len(per_class_df)\n",
        "      per_class_df['class_name'] = [class_name]*len(per_class_df)\n",
        "\n",
        "      # Append the folder name to the filename column (first column)\n",
        "      per_class_df[per_class_df.columns[0]] = (os.path.join(class_name, '') \n",
        "        + per_class_df[per_class_df.columns[0]].astype(str))\n",
        "\n",
        "      if total_df is None:\n",
        "        # For the first class, assign its data to the total dataframe\n",
        "        total_df = per_class_df\n",
        "      else:\n",
        "        # Concatenate each class's data into the total dataframe\n",
        "        total_df = pd.concat([total_df, per_class_df], axis=0)\n",
        " \n",
        "    list_name = [[bodypart.name + '_x', bodypart.name + '_y', \n",
        "                  bodypart.name + '_score'] for bodypart in BodyPart] \n",
        "    header_name = []\n",
        "    for columns_name in list_name:\n",
        "      header_name += columns_name\n",
        "    header_name = ['file_name'] + header_name\n",
        "    header_map = {total_df.columns[i]: header_name[i] \n",
        "                  for i in range(len(header_name))}\n",
        " \n",
        "    total_df.rename(header_map, axis=1, inplace=True)\n",
        "\n",
        "    return total_df"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "LB3QIVrdU108"
      },
      "outputs": [],
      "source": [
        "#@title (Optional) Code snippet to try out the Movenet pose estimation logic\n",
        "\n",
        "#@markdown You can download an image from the internet, run the pose estimation logic on it and plot the detected landmarks on top of the input image. \n",
        "\n",
        "#@markdown *Note: This code snippet is also useful for debugging when you encounter an image with bad pose classification accuracy. You can run pose estimation on the image and see if the detected landmarks look correct or not before investigating the pose classification logic.*\n",
        "\n",
        "test_image_url = \"https://cdn.pixabay.com/photo/2017/03/03/17/30/yoga-2114512_960_720.jpg\" #@param {type:\"string\"}\n",
        "!wget -O /tmp/image.jpeg {test_image_url}\n",
        "\n",
        "if len(test_image_url):\n",
        "  image = tf.io.read_file('/tmp/image.jpeg')\n",
        "  image = tf.io.decode_jpeg(image)\n",
        "  person = detect(image)\n",
        "  _ = draw_prediction_on_image(image.numpy(), person, crop_region=None, \n",
        "                               close_figure=False, keep_input_size=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "L24GWhgo4WAl"
      },
      "source": [
        "## Part 1: Preprocess the input images\n",
        "\n",
        "Because the input for our pose classifier is the *output* landmarks from the MoveNet model, we need to generate our training dataset by running labeled images through MoveNet and then capturing all the landmark data and ground truth labels into a CSV file.\n",
        "\n",
        "The dataset we've provided for this tutorial is a CG-generated yoga pose dataset. It contains images of multiple CG-generated models doing 5 different yoga poses. The directory is already split into a `train` dataset and a `test` dataset.\n",
        "\n",
        "So in this section, we'll download the yoga dataset and run it through MoveNet so we can capture all the landmarks into a CSV file... **However, it takes about 15 minutes to feed our yoga dataset to MoveNet and generate this CSV file**. So as an alternative, you can download a pre-existing CSV file for the yoga dataset by setting `is_skip_step_1` parameter below to **True**. That way, you'll skip this step and instead download the same CSV file that will be created in this preprocessing step.\n",
        "\n",
        "On the other hand, if you want to train the pose classifier with your own image dataset, you need to upload your images and run this preprocessing step (leave `is_skip_step_1` **False**)—follow the instructions below to upload your own pose dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "Kw6jwOFD40Fr"
      },
      "outputs": [],
      "source": [
        "is_skip_step_1 = False #@param [\"False\", \"True\"] {type:\"raw\"}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TJXSR2CQhm-z"
      },
      "source": [
        "### (Optional) Upload your own pose dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "iEnjgeKeS_VP"
      },
      "outputs": [],
      "source": [
        "use_custom_dataset = False #@param [\"False\", \"True\"] {type:\"raw\"}\n",
        "\n",
        "dataset_is_split = False #@param [\"False\", \"True\"] {type:\"raw\"}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YiqF3sRf3LLC"
      },
      "source": [
        "If you want to train the pose classifier with your own labeled poses (they can be any poses, not just yoga poses), follow these steps:\n",
        "\n",
        "1. Set the above `use_custom_dataset` option to **True**.\n",
        "\n",
        "2. Prepare an archive file (ZIP, TAR, or other) that includes a folder with your images dataset. The folder must include sorted images of your poses as follows.\n",
        "\n",
        "  If you've already split your dataset into train and test sets, then set `dataset_is_split` to **True**. That is, your images folder must include \"train\" and \"test\" directories like this:\n",
        "\n",
        "    ```\n",
        "    yoga_poses/\n",
        "    |__ train/\n",
        "        |__ downdog/\n",
        "            |______ 00000128.jpg\n",
        "            |______ ...\n",
        "    |__ test/\n",
        "        |__ downdog/\n",
        "            |______ 00000181.jpg\n",
        "            |______ ...\n",
        "    ```\n",
        "\n",
        "    Or, if your dataset is NOT split yet, then set\n",
        "    `dataset_is_split` to **False** and we'll split it up based\n",
        "    on a specified split fraction. That is, your uploaded images\n",
        "    folder should look like this:\n",
        "\n",
        "    ```\n",
        "    yoga_poses/\n",
        "    |__ downdog/\n",
        "        |______ 00000128.jpg\n",
        "        |______ 00000181.jpg\n",
        "        |______ ...\n",
        "    |__ goddess/\n",
        "        |______ 00000243.jpg\n",
        "        |______ 00000306.jpg\n",
        "        |______ ...\n",
        "    ```\n",
        "3. Click the **Files** tab on the left (folder icon) and then click **Upload to session storage** (file icon).\n",
        "4. Select your archive file and wait until it finishes uploading before you proceed.\n",
        "5. Edit the following code block to specify the name of your archive file and images directory. (By default, we expect a ZIP file, so you'll need to also modify that part if your archive is another format.)\n",
        "6. Now run the rest of the notebook."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "joAHy_r62dsI"
      },
      "outputs": [],
      "source": [
        "#@markdown Be sure you run this cell. It's hiding the `split_into_train_test()` function that's called in the next code block.\n",
        "\n",
        "import os\n",
        "import random\n",
        "import shutil\n",
        "\n",
        "def split_into_train_test(images_origin, images_dest, test_split):\n",
        "  \"\"\"Splits a directory of sorted images into training and test sets.\n",
        "\n",
        "  Args:\n",
        "    images_origin: Path to the directory with your images. This directory\n",
        "      must include subdirectories for each of your labeled classes. For example:\n",
        "      yoga_poses/\n",
        "      |__ downdog/\n",
        "          |______ 00000128.jpg\n",
        "          |______ 00000181.jpg\n",
        "          |______ ...\n",
        "      |__ goddess/\n",
        "          |______ 00000243.jpg\n",
        "          |______ 00000306.jpg\n",
        "          |______ ...\n",
        "      ...\n",
        "    images_dest: Path to a directory where you want the split dataset to be\n",
        "      saved. The results looks like this:\n",
        "      split_yoga_poses/\n",
        "      |__ train/\n",
        "          |__ downdog/\n",
        "              |______ 00000128.jpg\n",
        "              |______ ...\n",
        "      |__ test/\n",
        "          |__ downdog/\n",
        "              |______ 00000181.jpg\n",
        "              |______ ...\n",
        "    test_split: Fraction of data to reserve for test (float between 0 and 1).\n",
        "  \"\"\"\n",
        "  _, dirs, _ = next(os.walk(images_origin))\n",
        "\n",
        "  TRAIN_DIR = os.path.join(images_dest, 'train')\n",
        "  TEST_DIR = os.path.join(images_dest, 'test')\n",
        "  os.makedirs(TRAIN_DIR, exist_ok=True)\n",
        "  os.makedirs(TEST_DIR, exist_ok=True)\n",
        "\n",
        "  for dir in dirs:\n",
        "    # Get all filenames for this dir, filtered by filetype\n",
        "    filenames = os.listdir(os.path.join(images_origin, dir))\n",
        "    filenames = [os.path.join(images_origin, dir, f) for f in filenames if (\n",
        "        f.endswith('.png') or f.endswith('.jpg') or f.endswith('.jpeg') or f.endswith('.bmp'))]\n",
        "    # Shuffle the files, deterministically\n",
        "    filenames.sort()\n",
        "    random.seed(42)\n",
        "    random.shuffle(filenames)\n",
        "    # Divide them into train/test dirs\n",
        "    os.makedirs(os.path.join(TEST_DIR, dir), exist_ok=True)\n",
        "    os.makedirs(os.path.join(TRAIN_DIR, dir), exist_ok=True)\n",
        "    test_count = int(len(filenames) * test_split)\n",
        "    for i, file in enumerate(filenames):\n",
        "      if i \u003c test_count:\n",
        "        destination = os.path.join(TEST_DIR, dir, os.path.split(file)[1])\n",
        "      else:\n",
        "        destination = os.path.join(TRAIN_DIR, dir, os.path.split(file)[1])\n",
        "      shutil.copyfile(file, destination)\n",
        "    print(f'Moved {test_count} of {len(filenames)} from class \"{dir}\" into test.')\n",
        "  print(f'Your split dataset is in \"{images_dest}\"')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "IfpNIjAmR0lp"
      },
      "outputs": [],
      "source": [
        "if use_custom_dataset:\n",
        "  # ATTENTION:\n",
        "  # You must edit these two lines to match your archive and images folder name:\n",
        "  # !tar -xf YOUR_DATASET_ARCHIVE_NAME.tar\n",
        "  !unzip -q YOUR_DATASET_ARCHIVE_NAME.zip\n",
        "  dataset_in = 'YOUR_DATASET_DIR_NAME'\n",
        "\n",
        "  # You can leave the rest alone:\n",
        "  if not os.path.isdir(dataset_in):\n",
        "    raise Exception(\"dataset_in is not a valid directory\")\n",
        "  if dataset_is_split:\n",
        "    IMAGES_ROOT = dataset_in\n",
        "  else:\n",
        "    dataset_out = 'split_' + dataset_in\n",
        "    split_into_train_test(dataset_in, dataset_out, test_split=0.2)\n",
        "    IMAGES_ROOT = dataset_out"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IPkTA5-sNF7W"
      },
      "source": [
        "**Note:** If you're using `split_into_train_test()` to split the dataset, it expects all images to be PNG, JPEG, or BMP—it ignores other file types."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dcoak0QHW5d1"
      },
      "source": [
        "### Download the yoga dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "GVpOi5Hr4Xxt"
      },
      "outputs": [],
      "source": [
        "if not is_skip_step_1 and not use_custom_dataset:\n",
        "  !wget -O yoga_poses.zip http://download.tensorflow.org/data/pose_classification/yoga_poses.zip\n",
        "  !unzip -q yoga_poses.zip -d yoga_cg\n",
        "  IMAGES_ROOT = \"yoga_cg\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vxOkXvm-TvOZ"
      },
      "source": [
        "### Preprocess the `TRAIN` dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "OsdqxGfxTE2H"
      },
      "outputs": [],
      "source": [
        "if not is_skip_step_1:\n",
        "  images_in_train_folder = os.path.join(IMAGES_ROOT, 'train')\n",
        "  images_out_train_folder = 'poses_images_out_train'\n",
        "  csvs_out_train_path = 'train_data.csv'\n",
        "\n",
        "  preprocessor = MoveNetPreprocessor(\n",
        "      images_in_folder=images_in_train_folder,\n",
        "      images_out_folder=images_out_train_folder,\n",
        "      csvs_out_path=csvs_out_train_path,\n",
        "  )\n",
        "\n",
        "  preprocessor.process(per_pose_class_limit=None)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cQtgAeHVT0UE"
      },
      "source": [
        "### Preprocess the `TEST` dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "hddKVPjrTNbt"
      },
      "outputs": [],
      "source": [
        "if not is_skip_step_1:\n",
        "  images_in_test_folder = os.path.join(IMAGES_ROOT, 'test')\n",
        "  images_out_test_folder = 'poses_images_out_test'\n",
        "  csvs_out_test_path = 'test_data.csv'\n",
        "\n",
        "  preprocessor = MoveNetPreprocessor(\n",
        "      images_in_folder=images_in_test_folder,\n",
        "      images_out_folder=images_out_test_folder,\n",
        "      csvs_out_path=csvs_out_test_path,\n",
        "  )\n",
        "\n",
        "  preprocessor.process(per_pose_class_limit=None)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UevEKViRT_6J"
      },
      "source": [
        "## Part 2: Train a pose classification model that takes the landmark coordinates as input, and output the predicted labels.\n",
        "\n",
        "You'll build a TensorFlow model that takes the landmark coordinates and predicts the pose class that the person in the input image performs. The model consists of two submodels:\n",
        "\n",
        "* Submodel 1 calculates a pose embedding (a.k.a feature vector) from the detected landmark coordinates.\n",
        "* Submodel 2 feeds pose embedding through several `Dense` layer to predict the pose class.\n",
        "\n",
        "You'll then train the model based on the dataset that were preprocessed in part 1."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "E2D1czPJazvb"
      },
      "source": [
        "### (Optional) Download the preprocessed dataset if you didn't run part 1"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ShpOD7yb4MRp"
      },
      "outputs": [],
      "source": [
        "# Download the preprocessed CSV files which are the same as the output of step 1\n",
        "if is_skip_step_1:\n",
        "  !wget -O train_data.csv http://download.tensorflow.org/data/pose_classification/yoga_train_data.csv\n",
        "  !wget -O test_data.csv http://download.tensorflow.org/data/pose_classification/yoga_test_data.csv\n",
        "\n",
        "  csvs_out_train_path = 'train_data.csv'\n",
        "  csvs_out_test_path = 'test_data.csv'\n",
        "  is_skipped_step_1 = True"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iGMcoSwLwRSD"
      },
      "source": [
        "### Load the preprocessed CSVs into `TRAIN` and `TEST` datasets."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pOUcc8EL5rrj"
      },
      "outputs": [],
      "source": [
        "def load_pose_landmarks(csv_path):\n",
        "  \"\"\"Loads a CSV created by MoveNetPreprocessor.\n",
        "  \n",
        "  Returns:\n",
        "    X: Detected landmark coordinates and scores of shape (N, 17 * 3)\n",
        "    y: Ground truth labels of shape (N, label_count)\n",
        "    classes: The list of all class names found in the dataset\n",
        "    dataframe: The CSV loaded as a Pandas dataframe features (X) and ground\n",
        "      truth labels (y) to use later to train a pose classification model.\n",
        "  \"\"\"\n",
        "\n",
        "  # Load the CSV file\n",
        "  dataframe = pd.read_csv(csv_path)\n",
        "  df_to_process = dataframe.copy()\n",
        "\n",
        "  # Drop the file_name columns as you don't need it during training.\n",
        "  df_to_process.drop(columns=['file_name'], inplace=True)\n",
        "\n",
        "  # Extract the list of class names\n",
        "  classes = df_to_process.pop('class_name').unique()\n",
        "\n",
        "  # Extract the labels\n",
        "  y = df_to_process.pop('class_no')\n",
        "\n",
        "  # Convert the input features and labels into the correct format for training.\n",
        "  X = df_to_process.astype('float64')\n",
        "  y = keras.utils.to_categorical(y)\n",
        "\n",
        "  return X, y, classes, dataframe"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UMrLzfPz7E1U"
      },
      "source": [
        "Load and split the original `TRAIN` dataset into `TRAIN` (85% of the data) and `VALIDATE` (the remaining 15%)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xawmSDGXUUzW"
      },
      "outputs": [],
      "source": [
        "# Load the train data\n",
        "X, y, class_names, _ = load_pose_landmarks(csvs_out_train_path)\n",
        "\n",
        "# Split training data (X, y) into (X_train, y_train) and (X_val, y_val)\n",
        "X_train, X_val, y_train, y_val = train_test_split(X, y,\n",
        "                                                  test_size=0.15)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "R42kicUMaTX0"
      },
      "outputs": [],
      "source": [
        "# Load the test data\n",
        "X_test, y_test, _, df_test = load_pose_landmarks(csvs_out_test_path)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ydb-bd_UWXMq"
      },
      "source": [
        "### Define functions to convert the pose landmarks to a pose embedding (a.k.a. feature vector) for pose classification\n",
        "\n",
        "Next, convert the landmark coordinates to a feature vector by:\n",
        "1. Moving the pose center to the origin.\n",
        "2. Scaling the pose so that the pose size becomes 1\n",
        "3. Flattening these coordinates into a feature vector\n",
        "\n",
        "Then use this feature vector to train a neural-network based pose classifier."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "HgQMdfeT65Z5"
      },
      "outputs": [],
      "source": [
        "def get_center_point(landmarks, left_bodypart, right_bodypart):\n",
        "  \"\"\"Calculates the center point of the two given landmarks.\"\"\"\n",
        "\n",
        "  left = tf.gather(landmarks, left_bodypart.value, axis=1)\n",
        "  right = tf.gather(landmarks, right_bodypart.value, axis=1)\n",
        "  center = left * 0.5 + right * 0.5\n",
        "  return center\n",
        "\n",
        "\n",
        "def get_pose_size(landmarks, torso_size_multiplier=2.5):\n",
        "  \"\"\"Calculates pose size.\n",
        "\n",
        "  It is the maximum of two values:\n",
        "    * Torso size multiplied by `torso_size_multiplier`\n",
        "    * Maximum distance from pose center to any pose landmark\n",
        "  \"\"\"\n",
        "  # Hips center\n",
        "  hips_center = get_center_point(landmarks, BodyPart.LEFT_HIP, \n",
        "                                 BodyPart.RIGHT_HIP)\n",
        "\n",
        "  # Shoulders center\n",
        "  shoulders_center = get_center_point(landmarks, BodyPart.LEFT_SHOULDER,\n",
        "                                      BodyPart.RIGHT_SHOULDER)\n",
        "\n",
        "  # Torso size as the minimum body size\n",
        "  torso_size = tf.linalg.norm(shoulders_center - hips_center)\n",
        "\n",
        "  # Pose center\n",
        "  pose_center_new = get_center_point(landmarks, BodyPart.LEFT_HIP, \n",
        "                                     BodyPart.RIGHT_HIP)\n",
        "  pose_center_new = tf.expand_dims(pose_center_new, axis=1)\n",
        "  # Broadcast the pose center to the same size as the landmark vector to\n",
        "  # perform substraction\n",
        "  pose_center_new = tf.broadcast_to(pose_center_new,\n",
        "                                    [tf.size(landmarks) // (17*2), 17, 2])\n",
        "\n",
        "  # Dist to pose center\n",
        "  d = tf.gather(landmarks - pose_center_new, 0, axis=0,\n",
        "                name=\"dist_to_pose_center\")\n",
        "  # Max dist to pose center\n",
        "  max_dist = tf.reduce_max(tf.linalg.norm(d, axis=0))\n",
        "\n",
        "  # Normalize scale\n",
        "  pose_size = tf.maximum(torso_size * torso_size_multiplier, max_dist)\n",
        "\n",
        "  return pose_size\n",
        "\n",
        "\n",
        "def normalize_pose_landmarks(landmarks):\n",
        "  \"\"\"Normalizes the landmarks translation by moving the pose center to (0,0) and\n",
        "  scaling it to a constant pose size.\n",
        "  \"\"\"\n",
        "  # Move landmarks so that the pose center becomes (0,0)\n",
        "  pose_center = get_center_point(landmarks, BodyPart.LEFT_HIP, \n",
        "                                 BodyPart.RIGHT_HIP)\n",
        "  pose_center = tf.expand_dims(pose_center, axis=1)\n",
        "  # Broadcast the pose center to the same size as the landmark vector to perform\n",
        "  # substraction\n",
        "  pose_center = tf.broadcast_to(pose_center, \n",
        "                                [tf.size(landmarks) // (17*2), 17, 2])\n",
        "  landmarks = landmarks - pose_center\n",
        "\n",
        "  # Scale the landmarks to a constant pose size\n",
        "  pose_size = get_pose_size(landmarks)\n",
        "  landmarks /= pose_size\n",
        "\n",
        "  return landmarks\n",
        "\n",
        "\n",
        "def landmarks_to_embedding(landmarks_and_scores):\n",
        "  \"\"\"Converts the input landmarks into a pose embedding.\"\"\"\n",
        "  # Reshape the flat input into a matrix with shape=(17, 3)\n",
        "  reshaped_inputs = keras.layers.Reshape((17, 3))(landmarks_and_scores)\n",
        "\n",
        "  # Normalize landmarks 2D\n",
        "  landmarks = normalize_pose_landmarks(reshaped_inputs[:, :, :2])\n",
        "\n",
        "  # Flatten the normalized landmark coordinates into a vector\n",
        "  embedding = keras.layers.Flatten()(landmarks)\n",
        "\n",
        "  return embedding"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PI7Wb3Bagau3"
      },
      "source": [
        "### Define a Keras model for pose classification\n",
        "\n",
        "Our Keras model takes the detected pose landmarks, then calculates the pose embedding and predicts the pose class."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1Pte6b1bgWKv"
      },
      "outputs": [],
      "source": [
        "# Define the model\n",
        "inputs = tf.keras.Input(shape=(51))\n",
        "embedding = landmarks_to_embedding(inputs)\n",
        "\n",
        "layer = keras.layers.Dense(128, activation=tf.nn.relu6)(embedding)\n",
        "layer = keras.layers.Dropout(0.5)(layer)\n",
        "layer = keras.layers.Dense(64, activation=tf.nn.relu6)(layer)\n",
        "layer = keras.layers.Dropout(0.5)(layer)\n",
        "outputs = keras.layers.Dense(len(class_names), activation=\"softmax\")(layer)\n",
        "\n",
        "model = keras.Model(inputs, outputs)\n",
        "model.summary()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5ZuMwd7Ugtsa"
      },
      "outputs": [],
      "source": [
        "model.compile(\n",
        "    optimizer='adam',\n",
        "    loss='categorical_crossentropy',\n",
        "    metrics=['accuracy']\n",
        ")\n",
        "\n",
        "# Add a checkpoint callback to store the checkpoint that has the highest\n",
        "# validation accuracy.\n",
        "checkpoint_path = \"weights.best.hdf5\"\n",
        "checkpoint = keras.callbacks.ModelCheckpoint(checkpoint_path,\n",
        "                             monitor='val_accuracy',\n",
        "                             verbose=1,\n",
        "                             save_best_only=True,\n",
        "                             mode='max')\n",
        "earlystopping = keras.callbacks.EarlyStopping(monitor='val_accuracy', \n",
        "                                              patience=20)\n",
        "\n",
        "# Start training\n",
        "history = model.fit(X_train, y_train,\n",
        "                    epochs=200,\n",
        "                    batch_size=16,\n",
        "                    validation_data=(X_val, y_val),\n",
        "                    callbacks=[checkpoint, earlystopping])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pNVqmd2JO6Rp"
      },
      "outputs": [],
      "source": [
        "# Visualize the training history to see whether you're overfitting.\n",
        "plt.plot(history.history['accuracy'])\n",
        "plt.plot(history.history['val_accuracy'])\n",
        "plt.title('Model accuracy')\n",
        "plt.ylabel('accuracy')\n",
        "plt.xlabel('epoch')\n",
        "plt.legend(['TRAIN', 'VAL'], loc='lower right')\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "m_byMBVQgyQm"
      },
      "outputs": [],
      "source": [
        "# Evaluate the model using the TEST dataset\n",
        "loss, accuracy = model.evaluate(X_test, y_test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JPnPmwjn9452"
      },
      "source": [
        "### Draw the confusion matrix to better understand the model performance"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "CJuVw7gygyyd"
      },
      "outputs": [],
      "source": [
        "def plot_confusion_matrix(cm, classes,\n",
        "                          normalize=False,\n",
        "                          title='Confusion matrix',\n",
        "                          cmap=plt.cm.Blues):\n",
        "  \"\"\"Plots the confusion matrix.\"\"\"\n",
        "  if normalize:\n",
        "    cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n",
        "    print(\"Normalized confusion matrix\")\n",
        "  else:\n",
        "    print('Confusion matrix, without normalization')\n",
        "\n",
        "  plt.imshow(cm, interpolation='nearest', cmap=cmap)\n",
        "  plt.title(title)\n",
        "  plt.colorbar()\n",
        "  tick_marks = np.arange(len(classes))\n",
        "  plt.xticks(tick_marks, classes, rotation=55)\n",
        "  plt.yticks(tick_marks, classes)\n",
        "  fmt = '.2f' if normalize else 'd'\n",
        "  thresh = cm.max() / 2.\n",
        "  for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):\n",
        "    plt.text(j, i, format(cm[i, j], fmt),\n",
        "              horizontalalignment=\"center\",\n",
        "              color=\"white\" if cm[i, j] \u003e thresh else \"black\")\n",
        "\n",
        "  plt.ylabel('True label')\n",
        "  plt.xlabel('Predicted label')\n",
        "  plt.tight_layout()\n",
        "\n",
        "# Classify pose in the TEST dataset using the trained model\n",
        "y_pred = model.predict(X_test)\n",
        "\n",
        "# Convert the prediction result to class name\n",
        "y_pred_label = [class_names[i] for i in np.argmax(y_pred, axis=1)]\n",
        "y_true_label = [class_names[i] for i in np.argmax(y_test, axis=1)]\n",
        "\n",
        "# Plot the confusion matrix\n",
        "cm = confusion_matrix(np.argmax(y_test, axis=1), np.argmax(y_pred, axis=1))\n",
        "plot_confusion_matrix(cm,\n",
        "                      class_names,\n",
        "                      title ='Confusion Matrix of Pose Classification Model')\n",
        "\n",
        "# Print the classification report\n",
        "print('\\nClassification Report:\\n', classification_report(y_true_label,\n",
        "                                                          y_pred_label))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YPmtRf79GkCa"
      },
      "source": [
        "### (Optional) Investigate incorrect predictions\n",
        "\n",
        "You can look at the poses from the `TEST` dataset that were incorrectly predicted to see whether the model accuracy can be improved.\n",
        "\n",
        "Note: This only works if you have run step 1 because you need the pose image files on your local machine to display them."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bdJdwOkFGonK"
      },
      "outputs": [],
      "source": [
        "if is_skip_step_1:\n",
        "  raise RuntimeError('You must have run step 1 to run this cell.')\n",
        "\n",
        "# If step 1 was skipped, skip this step.\n",
        "IMAGE_PER_ROW = 3\n",
        "MAX_NO_OF_IMAGE_TO_PLOT = 30\n",
        "\n",
        "# Extract the list of incorrectly predicted poses\n",
        "false_predict = [id_in_df for id_in_df in range(len(y_test)) \\\n",
        "                if y_pred_label[id_in_df] != y_true_label[id_in_df]]\n",
        "if len(false_predict) \u003e MAX_NO_OF_IMAGE_TO_PLOT:\n",
        "  false_predict = false_predict[:MAX_NO_OF_IMAGE_TO_PLOT]\n",
        "\n",
        "# Plot the incorrectly predicted images\n",
        "row_count = len(false_predict) // IMAGE_PER_ROW + 1\n",
        "fig = plt.figure(figsize=(10 * IMAGE_PER_ROW, 10 * row_count))\n",
        "for i, id_in_df in enumerate(false_predict):\n",
        "  ax = fig.add_subplot(row_count, IMAGE_PER_ROW, i + 1)\n",
        "  image_path = os.path.join(images_out_test_folder,\n",
        "                            df_test.iloc[id_in_df]['file_name'])\n",
        "\n",
        "  image = cv2.imread(image_path)\n",
        "  plt.title(\"Predict: %s; Actual: %s\"\n",
        "            % (y_pred_label[id_in_df], y_true_label[id_in_df]))\n",
        "  plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uhY0VeDkFK7W"
      },
      "source": [
        "## Part 3: Convert the pose classification model to TensorFlow Lite\n",
        "\n",
        "You'll convert the Keras pose classification model to the TensorFlow Lite format so that you can deploy it to mobile apps, web browsers and edge devices. When converting the model, you'll apply [dynamic range quantization](https://www.tensorflow.org/lite/performance/post_training_quant) to reduce the pose classification TensorFlow Lite model size by about 4 times with insignificant accuracy loss.\n",
        "\n",
        "Note: TensorFlow Lite supports multiple quantization schemes. See the [documentation](https://www.tensorflow.org/lite/performance/model_optimization) if you are interested to learn more."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "FmwEAgi2Flb3"
      },
      "outputs": [],
      "source": [
        "converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
        "converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
        "tflite_model = converter.convert()\n",
        "\n",
        "print('Model size: %dKB' % (len(tflite_model) / 1024))\n",
        "\n",
        "with open('pose_classifier.tflite', 'wb') as f:\n",
        "  f.write(tflite_model)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XUOQwuBP6jMH"
      },
      "source": [
        "Then you'll write the label file which contains mapping from the class indexes to the human readable class names."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ZVW9j5vF6hBM"
      },
      "outputs": [],
      "source": [
        "with open('pose_labels.txt', 'w') as f:\n",
        "  f.write('\\n'.join(class_names))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "H4T0HFGCve-Y"
      },
      "source": [
        "As you've applied quantization to reduce the model size, let's evaluate the quantized TFLite model to check whether the accuracy drop is acceptable."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rv4fZFNcsN-1"
      },
      "outputs": [],
      "source": [
        "def evaluate_model(interpreter, X, y_true):\n",
        "  \"\"\"Evaluates the given TFLite model and return its accuracy.\"\"\"\n",
        "  input_index = interpreter.get_input_details()[0][\"index\"]\n",
        "  output_index = interpreter.get_output_details()[0][\"index\"]\n",
        "\n",
        "  # Run predictions on all given poses.\n",
        "  y_pred = []\n",
        "  for i in range(len(y_true)):\n",
        "    # Pre-processing: add batch dimension and convert to float32 to match with\n",
        "    # the model's input data format.\n",
        "    test_image = X[i: i + 1].astype('float32')\n",
        "    interpreter.set_tensor(input_index, test_image)\n",
        "\n",
        "    # Run inference.\n",
        "    interpreter.invoke()\n",
        "\n",
        "    # Post-processing: remove batch dimension and find the class with highest\n",
        "    # probability.\n",
        "    output = interpreter.tensor(output_index)\n",
        "    predicted_label = np.argmax(output()[0])\n",
        "    y_pred.append(predicted_label)\n",
        "\n",
        "  # Compare prediction results with ground truth labels to calculate accuracy.\n",
        "  y_pred = keras.utils.to_categorical(y_pred)\n",
        "  return accuracy_score(y_true, y_pred)\n",
        "\n",
        "# Evaluate the accuracy of the converted TFLite model\n",
        "classifier_interpreter = tf.lite.Interpreter(model_content=tflite_model)\n",
        "classifier_interpreter.allocate_tensors()\n",
        "print('Accuracy of TFLite model: %s' %\n",
        "      evaluate_model(classifier_interpreter, X_test, y_test))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-HWqcersePiY"
      },
      "source": [
        "Now you can download the TFLite model (`pose_classifier.tflite`) and the label file (`pose_labels.txt`) to classify custom poses. See the [Android](https://github.com/tensorflow/examples/tree/master/lite/examples/pose_estimation/android) and [Python/Raspberry Pi](https://github.com/tensorflow/examples/tree/master/lite/examples/pose_estimation/raspberry_pi) sample app for an end-to-end example of how to use the TFLite pose classification model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KvcM_LkApOT3"
      },
      "outputs": [],
      "source": [
        "!zip pose_classifier.zip pose_labels.txt pose_classifier.tflite"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VQ-i27VypI1u"
      },
      "outputs": [],
      "source": [
        "# Download the zip archive if running on Colab.\n",
        "try:\n",
        "  from google.colab import files\n",
        "  files.download('pose_classifier.zip')\n",
        "except:\n",
        "  pass"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "pose_classification.ipynb",
      "private_outputs": true,
      "provenance": [
        {
          "file_id": "1DghVqExBLSe8zgRaxRSxU04WGmGDM1L-",
          "timestamp": 1635321811242
        },
        {
          "file_id": "https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/tutorials/pose_classification.ipynb",
          "timestamp": 1635313965008
        }
      ],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}