Recommending Movies: Recommender Models in TFX

TFRS Tutorial Ported to TFX

This tutorial ports the basic TensorFlow Recommenders (TFRS) retrieval tutorial to TFX, to demonstrate how to use TFRS in a TFX pipeline. It mirrors the original tutorial step by step.

For context, real-world recommender systems are often composed of two stages:

The retrieval stage is responsible for selecting an initial set of hundreds of candidates from all possible candidates. The main objective of this model is to efficiently weed out all candidates that the user is not interested in. Because the retrieval model may be dealing with millions of candidates, it has to be computationally efficient.
The ranking stage takes the outputs of the retrieval model and fine-tunes them to select the best possible handful of recommendations. Its task is to narrow down the set of items the user may be interested in to a shortlist of likely candidates.

In this tutorial, we're going to focus on the first stage, retrieval. Retrieval models are often composed of two sub-models:

A query model computing the query representation (normally a fixed-dimensionality embedding vector) using query features.
A candidate model computing the candidate representation (an equally sized vector) using the candidate features.

The outputs of the two models are then multiplied together to give a query-candidate affinity score, with higher scores expressing a better match between the candidate and the query.
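As a minimal sketch of this scoring step, the snippet below assumes the "multiplication" is a dot product between the two embedding vectors, which is the usual choice for two-tower models. The embeddings, the movie names, and the helper functions `affinity` and `top_k` are made up for illustration and are not part of TFRS.

```python
from typing import Dict, List, Tuple


def affinity(query: List[float], candidate: List[float]) -> float:
    """Dot product of two equally sized embedding vectors."""
    if len(query) != len(candidate):
        raise ValueError("query and candidate embeddings must have the same size")
    return sum(q * c for q, c in zip(query, candidate))


def top_k(query: List[float],
          candidates: Dict[str, List[float]],
          k: int) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep the k best matches."""
    scored = [(name, affinity(query, emb)) for name, emb in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-dimensional embeddings, for illustration only.
user_embedding = [1.0, 0.0, 2.0]
movie_embeddings = {
    "Movie A": [0.5, 1.0, 1.0],   # 0.5 + 0.0 + 2.0 = 2.5
    "Movie B": [-1.0, 0.0, 0.5],  # -1.0 + 0.0 + 1.0 = 0.0
    "Movie C": [2.0, 3.0, 0.0],   # 2.0 + 0.0 + 0.0 = 2.0
}

print(top_k(user_embedding, movie_embeddings, k=2))
# → [('Movie A', 2.5), ('Movie C', 2.0)]
```

A real retrieval model scores against far more candidates, so production systems usually replace this exhaustive loop with an approximate nearest-neighbour index.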

In this tutorial, we're going to build and train such a two-tower model using the MovieLens dataset.

We're going to:

Ingest and inspect the MovieLens dataset.
Implement a retrieval model.
Train and export the model.
Make predictions.

The dataset

The MovieLens dataset is a classic dataset from the GroupLens research group at the University of Minnesota. It contains a set of ratings given to movies by a set of users, and is a workhorse of recommender system research.

The data can be treated in two ways:

It can be interpreted as expressing which movies the users watched (and rated), and which they did not. This is a form of implicit feedback, where users' watches tell us which things they prefer to see and which they'd rather not see.
It can also be seen as expressing how much the users liked the movies they did watch. This is a form of explicit feedback: given that a user watched a movie, we can tell roughly how much they liked it by looking at the rating they gave.

In this tutorial, we are focusing on a retrieval system: a model that predicts a set of movies from the catalogue that the user is likely to watch. Often, implicit data is more useful here, and so we are going to treat MovieLens as an implicit system. This means that every movie a user watched is a positive example, and every movie they have not seen is an implicit negative example.
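The implicit interpretation can be sketched in a few lines of plain Python. The rows, the user IDs, and the helper names `to_implicit_positives` and `implicit_negatives` below are hypothetical; the point is only that the rating value is discarded and the (user, movie) pair itself becomes the positive example.

```python
# Hypothetical rating rows, shaped like the fields this tutorial keeps.
ratings = [
    {"user_id": "u1", "movie_title": "Movie A", "user_rating": 4.0},
    {"user_id": "u1", "movie_title": "Movie B", "user_rating": 1.0},
    {"user_id": "u2", "movie_title": "Movie B", "user_rating": 3.0},
]
catalogue = ["Movie A", "Movie B", "Movie C"]


def to_implicit_positives(rows):
    """Keep only (user, movie) pairs; the rating value is dropped."""
    return {(row["user_id"], row["movie_title"]) for row in rows}


def implicit_negatives(positives, all_movies):
    """Every catalogue movie a known user has not watched is an implicit negative."""
    users = {user for user, _ in positives}
    return {(user, movie)
            for user in users
            for movie in all_movies
            if (user, movie) not in positives}


positives = to_implicit_positives(ratings)
negatives = implicit_negatives(positives, catalogue)
print(sorted(positives))
print(sorted(negatives))
```

Note that the low rating for "Movie B" still counts as a positive here, because under implicit feedback only the fact that the user watched the movie matters. In practice the negatives are rarely materialised like this; retrieval training typically draws them from the other examples in each batch.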

Imports

Let's first get our imports out of the way.

pip install -Uq tfx
pip install -Uq tensorflow-recommenders
pip install -Uq tensorflow-datasets

Did you restart the runtime?

If you are using Google Colab, the first time that you run the cell above, you must restart the runtime (Runtime > Restart runtime ...). This is because of the way that Colab loads packages.

import os
import absl
import json
import pprint
import tempfile

from typing import Any, Dict, List, Text

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs
import apache_beam as beam

from absl import logging

from tfx.components.example_gen.base_example_gen_executor import BaseExampleGenExecutor
from tfx.components.example_gen.component import FileBasedExampleGen
from tfx.components.example_gen import utils
from tfx.dsl.components.base import executor_spec

from tfx.types import artifact
from tfx.types import artifact_utils
from tfx.types import channel
from tfx.types import standard_artifacts
from tfx.types.standard_artifacts import Examples

from tfx.dsl.component.experimental.annotations import InputArtifact
from tfx.dsl.component.experimental.annotations import OutputArtifact
from tfx.dsl.component.experimental.annotations import Parameter
from tfx.dsl.component.experimental.decorators import component
from tfx.types.experimental.simple_artifacts import Dataset

from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

# Set up logging.
tf.get_logger().propagate = False
absl.logging.set_verbosity(absl.logging.INFO)
pp = pprint.PrettyPrinter()

print(f"TensorFlow version: {tf.__version__}")
print(f"TFX version: {tfx.__version__}")
print(f"TensorFlow Recommenders version: {tfrs.__version__}")

%load_ext tfx.orchestration.experimental.interactive.notebook_extensions.skip

2024-05-08 09:58:22.819698: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-08 09:58:22.819743: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-08 09:58:22.821388: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
TensorFlow version: 2.15.1
TFX version: 1.15.0
TensorFlow Recommenders version: v0.7.3

Create a TFDS ExampleGen

We create a custom ExampleGen component which we use to load a TensorFlow Datasets (TFDS) dataset. This uses a custom executor in a FileBasedExampleGen.

@beam.ptransform_fn
@beam.typehints.with_input_types(beam.Pipeline)
@beam.typehints.with_output_types(tf.train.Example)
def _TFDatasetToExample(  # pylint: disable=invalid-name
    pipeline: beam.Pipeline,
    exec_properties: Dict[str, Any],
    split_pattern: str
    ) -> beam.pvalue.PCollection:
    """Read a TensorFlow Dataset and create tf.Examples"""
    custom_config = json.loads(exec_properties['custom_config'])
    dataset_name = custom_config['dataset']
    split_name = custom_config['split']

    builder = tfds.builder(dataset_name)
    builder.download_and_prepare()

    return (pipeline
            | 'MakeExamples' >> tfds.beam.ReadFromTFDS(builder, split=split_name)
            | 'AsNumpy' >> beam.Map(tfds.as_numpy)
            | 'ToDict' >> beam.Map(dict)
            | 'ToTFExample' >> beam.Map(utils.dict_to_example)
            )

class TFDSExecutor(BaseExampleGenExecutor):
  def GetInputSourceToExamplePTransform(self) -> beam.PTransform:
    """Returns PTransform for TF Dataset to TF examples."""
    return _TFDatasetToExample

Init TFX Pipeline Context

context = InteractiveContext()

WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4 as root for pipeline outputs.
WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/metadata.sqlite.

Preparing the dataset

We will use our custom executor in a FileBasedExampleGen to load our datasets from TFDS. Since we have two datasets, we will create two ExampleGen components.

# Ratings data.
ratings_example_gen = FileBasedExampleGen(
    input_base='dummy',
    custom_config={'dataset':'movielens/100k-ratings', 'split':'train'},
    custom_executor_spec=executor_spec.ExecutorClassSpec(TFDSExecutor))
context.run(ratings_example_gen, enable_cache=True)

INFO:absl:Running driver for FileBasedExampleGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:Running executor for FileBasedExampleGen
INFO:absl:Generating examples.
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-ratings/0.1.1
INFO:absl:Reusing dataset movielens (gs://tensorflow-datasets/datasets/movielens/100k-ratings/0.1.1)
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-ratings/0.1.1
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-ratings/0.1.1
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-ratings/0.1.1
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-ratings/0.1.1
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-ratings/0.1.1
INFO:absl:Constructing tf.data.Dataset movielens for split train[0shard], from gs://tensorflow-datasets/datasets/movielens/100k-ratings/0.1.1
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
INFO:absl:Examples generated.
INFO:absl:Running publisher for FileBasedExampleGen
INFO:absl:MetadataStore with DB connection initialized

# Features of all the available movies.
movies_example_gen = FileBasedExampleGen(
    input_base='dummy',
    custom_config={'dataset':'movielens/100k-movies', 'split':'train'},
    custom_executor_spec=executor_spec.ExecutorClassSpec(TFDSExecutor))
context.run(movies_example_gen, enable_cache=True)

INFO:absl:Running driver for FileBasedExampleGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:Running executor for FileBasedExampleGen
INFO:absl:Generating examples.
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-movies/0.1.1
INFO:absl:Reusing dataset movielens (gs://tensorflow-datasets/datasets/movielens/100k-movies/0.1.1)
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-movies/0.1.1
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-movies/0.1.1
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-movies/0.1.1
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-movies/0.1.1
INFO:absl:Load dataset info from gs://tensorflow-datasets/datasets/movielens/100k-movies/0.1.1
INFO:absl:Constructing tf.data.Dataset movielens for split train[0shard], from gs://tensorflow-datasets/datasets/movielens/100k-movies/0.1.1
INFO:absl:Examples generated.
INFO:absl:Running publisher for FileBasedExampleGen
INFO:absl:MetadataStore with DB connection initialized

Create `inspect_examples` utility

We create a convenience utility to inspect datasets of tf.Example records. Each record in the ratings dataset contains the movie ID, user ID, the assigned rating, a timestamp, movie information, and user information:

def inspect_examples(component,
                     channel_name='examples',
                     split_name='train',
                     num_examples=1):
  # Get the URI of the output artifact, which is a directory
  full_split_name = 'Split-{}'.format(split_name)
  print('channel_name: {}, split_name: {} (\"{}\"), num_examples: {}\n'.format(
      channel_name, split_name, full_split_name, num_examples))
  train_uri = os.path.join(
      component.outputs[channel_name].get()[0].uri, full_split_name)

  # Get the list of files in this directory (all compressed TFRecord files)
  tfrecord_filenames = [os.path.join(train_uri, name)
                        for name in os.listdir(train_uri)]

  # Create a `TFRecordDataset` to read these files
  dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

  # Iterate over the records and print them
  for tfrecord in dataset.take(num_examples):
    serialized_example = tfrecord.numpy()
    example = tf.train.Example()
    example.ParseFromString(serialized_example)
    pp.pprint(example)

inspect_examples(ratings_example_gen)

channel_name: examples, split_name: train ("Split-train"), num_examples: 1

features {
  feature {
    key: "bucketized_user_age"
    value {
      float_list {
        value: 45.0
      }
    }
  }
  feature {
    key: "movie_genres"
    value {
      int64_list {
        value: 7
      }
    }
  }
  feature {
    key: "movie_id"
    value {
      bytes_list {
        value: "357"
      }
    }
  }
  feature {
    key: "movie_title"
    value {
      bytes_list {
        value: "One Flew Over the Cuckoo\'s Nest (1975)"
      }
    }
  }
  feature {
    key: "raw_user_age"
    value {
      float_list {
        value: 46.0
      }
    }
  }
  feature {
    key: "timestamp"
    value {
      int64_list {
        value: 879024327
      }
    }
  }
  feature {
    key: "user_gender"
    value {
      int64_list {
        value: 1
      }
    }
  }
  feature {
    key: "user_id"
    value {
      bytes_list {
        value: "138"
      }
    }
  }
  feature {
    key: "user_occupation_label"
    value {
      int64_list {
        value: 4
      }
    }
  }
  feature {
    key: "user_occupation_text"
    value {
      bytes_list {
        value: "doctor"
      }
    }
  }
  feature {
    key: "user_rating"
    value {
      float_list {
        value: 4.0
      }
    }
  }
  feature {
    key: "user_zip_code"
    value {
      bytes_list {
        value: "53211"
      }
    }
  }
}

The movies dataset contains the movie id, movie title, and data on what genres it belongs to. Note that the genres are encoded with integer labels.

inspect_examples(movies_example_gen)

channel_name: examples, split_name: train ("Split-train"), num_examples: 1

features {
  feature {
    key: "movie_genres"
    value {
      int64_list {
        value: 4
      }
    }
  }
  feature {
    key: "movie_id"
    value {
      bytes_list {
        value: "1681"
      }
    }
  }
  feature {
    key: "movie_title"
    value {
      bytes_list {
        value: "You So Crazy (1994)"
      }
    }
  }
}

ExampleGen did the split

When we ingested the MovieLens dataset, our ExampleGen component split the data into train and eval splits, which are stored as Split-train and Split-eval. By default the split uses a 2:1 ratio, so roughly two-thirds of the examples go to training and one-third to evaluation.
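To make the 2:1 ratio concrete, here is a rough sketch of hash-bucket splitting in plain Python. It is an illustration of the idea, not ExampleGen's implementation: the `assign_split` helper is hypothetical, and SHA-256 stands in for whatever fingerprint function TFX actually uses.

```python
import hashlib
from collections import Counter


def assign_split(record: bytes, buckets=(("train", 2), ("eval", 1))) -> str:
    """Deterministically map a serialized record to a split by hash bucket."""
    total = sum(size for _, size in buckets)
    digest = hashlib.sha256(record).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    # Walk the splits in order; the first `size` buckets belong to each one.
    for name, size in buckets:
        if bucket < size:
            return name
        bucket -= size
    raise AssertionError("unreachable: bucket is always below the total")


# Made-up records; real records would be serialized tf.Example protos.
counts = Counter(assign_split(f"record-{i}".encode()) for i in range(30000))
print(counts)
```

Because the assignment depends only on the record's bytes, the same record always lands in the same split, which keeps the split stable across reruns.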

Generate statistics for movies and ratings

For a TFX pipeline we need to generate statistics for the dataset. We do that by using a StatisticsGen component. These will be used by the SchemaGen component below when we generate a schema for our dataset. This is good practice anyway, because it's important to examine and analyze your data on an ongoing basis. Since we have two datasets we will create two StatisticsGen components.

movies_stats_gen = tfx.components.StatisticsGen(
    examples=movies_example_gen.outputs['examples'])
context.run(movies_stats_gen, enable_cache=True)

INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for StatisticsGen
INFO:absl:Generating statistics for split train.
INFO:absl:Statistics for split train written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/StatisticsGen/statistics/3/Split-train.
INFO:absl:Generating statistics for split eval.
INFO:absl:Statistics for split eval written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/StatisticsGen/statistics/3/Split-eval.
INFO:absl:Running publisher for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized

context.show(movies_stats_gen.outputs['statistics'])

ratings_stats_gen = tfx.components.StatisticsGen(
    examples=ratings_example_gen.outputs['examples'])
context.run(ratings_stats_gen, enable_cache=True)

INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for StatisticsGen
INFO:absl:Generating statistics for split train.
INFO:absl:Statistics for split train written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/StatisticsGen/statistics/4/Split-train.
INFO:absl:Generating statistics for split eval.
INFO:absl:Statistics for split eval written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/StatisticsGen/statistics/4/Split-eval.
INFO:absl:Running publisher for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized

context.show(ratings_stats_gen.outputs['statistics'])

Create schemas for movies and ratings

For a TFX pipeline we need to generate a data schema from our dataset. We do that by using a SchemaGen component. This will be used by the Transform component below to do our feature engineering in a way that is highly scalable to large datasets, and avoids training/serving skew. Since we have two datasets we will create two SchemaGen components.

movies_schema_gen = tfx.components.SchemaGen(
    statistics=movies_stats_gen.outputs['statistics'],
    infer_feature_shape=False)
context.run(movies_schema_gen, enable_cache=True)

INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for SchemaGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for SchemaGen
INFO:absl:Processing schema from statistics for split train.
INFO:absl:Processing schema from statistics for split eval.
INFO:absl:Schema written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/SchemaGen/schema/5/schema.pbtxt.
INFO:absl:Running publisher for SchemaGen
INFO:absl:MetadataStore with DB connection initialized

context.show(movies_schema_gen.outputs['schema'])

ratings_schema_gen = tfx.components.SchemaGen(
    statistics=ratings_stats_gen.outputs['statistics'],
    infer_feature_shape=False)
context.run(ratings_schema_gen, enable_cache=True)

INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for SchemaGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for SchemaGen
INFO:absl:Processing schema from statistics for split train.
INFO:absl:Processing schema from statistics for split eval.
INFO:absl:Schema written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/SchemaGen/schema/6/schema.pbtxt.
INFO:absl:Running publisher for SchemaGen
INFO:absl:MetadataStore with DB connection initialized

context.show(ratings_schema_gen.outputs['schema'])

Feature Engineering using Transform

For a structured and repeatable design of a TFX pipeline we will need a scalable approach to feature engineering. This allows us to handle the large datasets which are usually part of many recommender systems, and it also avoids training/serving skew. We will do that using the Transform component.

The Transform component uses a module file to supply the user code for the feature engineering that we want to do, so our first step is to create that module file. Since we have two datasets, we will create two of these module files and two Transform components.

One of the things that our recommender needs is vocabularies for the user_id and movie_title fields. In the basic_retrieval tutorial those are created with inline NumPy, but here we will use Transform.
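As a rough mental model of what such a vocabulary does, the sketch below builds one in plain Python and looks tokens up in it, including an out-of-vocabulary (OOV) bucket like the `num_oov_buckets` setting used in the ratings module. The helpers `build_vocabulary` and `lookup` are hypothetical, and the tie-breaking and OOV hashing here are assumptions for illustration; TensorFlow Transform's own behaviour may differ in those details.

```python
import zlib
from collections import Counter


def build_vocabulary(values):
    """Return distinct tokens, most frequent first (ties broken by token)."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ordered]


def lookup(token, vocab, num_oov_buckets=1):
    """Map a token to its vocabulary index, or to an OOV bucket if unseen."""
    index = {tok: i for i, tok in enumerate(vocab)}
    if token in index:
        return index[token]
    # Unseen tokens land in [len(vocab), len(vocab) + num_oov_buckets).
    return len(vocab) + zlib.crc32(token.encode()) % num_oov_buckets


# Made-up user IDs observed in the training data.
vocab = build_vocabulary(["138", "92", "138", "7"])
print(vocab)                      # → ['138', '7', '92']
print(lookup("138", vocab))       # → 0
print(lookup("unseen", vocab))    # → 3
```

With a single OOV bucket every unseen token shares one index, so an embedding table built on this vocabulary needs `len(vocab) + num_oov_buckets` rows.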

_movies_transform_module_file = 'movies_transform_module.py'

%%writefile {_movies_transform_module_file}

import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
  # We only want the movie title
  return {'movie_title':inputs['movie_title']}

Writing movies_transform_module.py

movies_transform = tfx.components.Transform(
    examples=movies_example_gen.outputs['examples'],
    schema=movies_schema_gen.outputs['schema'],
    module_file=os.path.abspath(_movies_transform_module_file))
context.run(movies_transform, enable_cache=True)

INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/movies_transform_module.py' (including modules: ['movies_transform_module']).
INFO:absl:User module package has hash fingerprint version 5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmpfs/tmp/tmpua66oe02/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmpfs/tmp/tmpxlh1lqpu', '--dist-dir', '/tmpfs/tmp/tmp35ghcwm0']
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
INFO:absl:Successfully built user code wheel distribution at '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl'; target user module is 'movies_transform_module'.
INFO:absl:Full user module path is 'movies_transform_module@/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl'
INFO:absl:Running driver for Transform
INFO:absl:MetadataStore with DB connection initialized
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying movies_transform_module.py -> build/lib
installing to /tmpfs/tmp/tmpxlh1lqpu
running install
running install_lib
copying build/lib/movies_transform_module.py -> /tmpfs/tmp/tmpxlh1lqpu
running install_egg_info
running egg_info
creating tfx_user_code_Transform.egg-info
writing tfx_user_code_Transform.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Transform.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Transform.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
Copying tfx_user_code_Transform.egg-info to /tmpfs/tmp/tmpxlh1lqpu/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3.9.egg-info
running install_scripts
creating /tmpfs/tmp/tmpxlh1lqpu/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204.dist-info/WHEEL
creating '/tmpfs/tmp/tmp35ghcwm0/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl' and adding '/tmpfs/tmp/tmpxlh1lqpu' to it
adding 'movies_transform_module.py'
adding 'tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204.dist-info/METADATA'
adding 'tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204.dist-info/WHEEL'
adding 'tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204.dist-info/top_level.txt'
adding 'tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204.dist-info/RECORD'
removing /tmpfs/tmp/tmpxlh1lqpu
INFO:absl:Running executor for Transform
INFO:absl:Analyze the 'train' split and transform all splits when splits_config is not set.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'movies_transform_module@/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl', 'preprocessing_fn': None} 'preprocessing_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmp5ezzs38e', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl']
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl'.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'movies_transform_module@/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl', 'stats_options_updater_fn': None} 'stats_options_updater_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmpp7f0540m', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl'.
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmpzwvbbd37', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204-py3-none-any.whl'.
INFO:absl:Feature movie_genres has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_genres has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_genres has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_genres has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_genres has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_genres has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+5eb30f0529e01ad72232bd9acba34fc83d7fa66b99898a3d3ee424fbdf388204
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_genres has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_genres has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Transform/transform_graph/7/.temp_path/tftransform_tmp/bc960e2610b049f3aed640b6ef094ce6/assets
INFO:absl:Writing fingerprint to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Transform/transform_graph/7/.temp_path/tftransform_tmp/bc960e2610b049f3aed640b6ef094ce6/fingerprint.pb
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:absl:Running publisher for Transform
INFO:absl:MetadataStore with DB connection initialized

context.show(movies_transform.outputs['post_transform_schema'])

inspect_examples(movies_transform, channel_name='transformed_examples')

channel_name: transformed_examples, split_name: train ("Split-train"), num_examples: 1

features {
  feature {
    key: "movie_title"
    value {
      bytes_list {
        value: "You So Crazy (1994)"
      }
    }
  }
}

_ratings_transform_module_file = 'ratings_transform_module.py'

%%writefile {_ratings_transform_module_file}

import tensorflow as tf
import tensorflow_transform as tft

NUM_OOV_BUCKETS = 1

def preprocessing_fn(inputs):
  # We only want the user ID and the movie title, but we also need vocabularies
  # for both of them.  The vocabularies aren't features, they're only used by
  # the lookup.
  outputs = {}
  outputs['user_id'] = tft.sparse_tensor_to_dense_with_shape(inputs['user_id'], [None, 1], '-1')
  outputs['movie_title'] = tft.sparse_tensor_to_dense_with_shape(inputs['movie_title'], [None, 1], '-1')

  tft.compute_and_apply_vocabulary(
      inputs['user_id'],
      num_oov_buckets=NUM_OOV_BUCKETS,
      vocab_filename='user_id_vocab')

  tft.compute_and_apply_vocabulary(
      inputs['movie_title'],
      num_oov_buckets=NUM_OOV_BUCKETS,
      vocab_filename='movie_title_vocab')

  return outputs

Writing ratings_transform_module.py
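
To make the role of the vocabularies more concrete, here is a minimal pure-Python sketch of what a vocabulary lookup with one out-of-vocabulary (OOV) bucket does conceptually. It does not use tf.Transform: `build_vocab` and `lookup` are hypothetical helper names, and the ordering and OOV bucketing shown here are simplifying assumptions rather than the exact behavior of `tft.compute_and_apply_vocabulary`.

```python
from collections import Counter

NUM_OOV_BUCKETS = 1  # same value as in the module file above


def build_vocab(values):
    """Map each distinct value to an integer index, most frequent first."""
    counts = Counter(values)
    # Sort by descending count, then by value, so the order is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: index for index, value in enumerate(ordered)}


def lookup(vocab, value, num_oov_buckets=NUM_OOV_BUCKETS):
    """Return the vocabulary index, or an OOV bucket index for unseen values."""
    if value in vocab:
        return vocab[value]
    # Unseen values fall into one of the buckets placed after the vocabulary.
    # With a single bucket, every unseen value maps to len(vocab).
    return len(vocab) + sum(value.encode()) % num_oov_buckets


user_ids = ["138", "42", "138", "7"]
vocab = build_vocab(user_ids)
known = lookup(vocab, "138")
unseen = lookup(vocab, "9999")
print(vocab, known, unseen)
```

The extra bucket is the reason the embedding tables in the Trainer module are sized as the vocabulary length plus one.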

ratings_transform = tfx.components.Transform(
    examples=ratings_example_gen.outputs['examples'],
    schema=ratings_schema_gen.outputs['schema'],
    module_file=os.path.abspath(_ratings_transform_module_file))
context.run(ratings_transform, enable_cache=True)

INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/ratings_transform_module.py' (including modules: ['ratings_transform_module', 'movies_transform_module']).
INFO:absl:User module package has hash fingerprint version 4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmpfs/tmp/tmpiju8qrq6/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmpfs/tmp/tmpasy1fnai', '--dist-dir', '/tmpfs/tmp/tmpl4x43rxa']
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
INFO:absl:Successfully built user code wheel distribution at '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl'; target user module is 'ratings_transform_module'.
INFO:absl:Full user module path is 'ratings_transform_module@/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl'
INFO:absl:Running driver for Transform
INFO:absl:MetadataStore with DB connection initialized
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying ratings_transform_module.py -> build/lib
copying movies_transform_module.py -> build/lib
installing to /tmpfs/tmp/tmpasy1fnai
running install
running install_lib
copying build/lib/ratings_transform_module.py -> /tmpfs/tmp/tmpasy1fnai
copying build/lib/movies_transform_module.py -> /tmpfs/tmp/tmpasy1fnai
running install_egg_info
running egg_info
creating tfx_user_code_Transform.egg-info
writing tfx_user_code_Transform.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Transform.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Transform.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
Copying tfx_user_code_Transform.egg-info to /tmpfs/tmp/tmpasy1fnai/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3.9.egg-info
running install_scripts
creating /tmpfs/tmp/tmpasy1fnai/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51.dist-info/WHEEL
creating '/tmpfs/tmp/tmpl4x43rxa/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl' and adding '/tmpfs/tmp/tmpasy1fnai' to it
adding 'movies_transform_module.py'
adding 'ratings_transform_module.py'
adding 'tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51.dist-info/METADATA'
adding 'tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51.dist-info/WHEEL'
adding 'tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51.dist-info/top_level.txt'
adding 'tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51.dist-info/RECORD'
removing /tmpfs/tmp/tmpasy1fnai
INFO:absl:Running executor for Transform
INFO:absl:Analyze the 'train' split and transform all splits when splits_config is not set.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'ratings_transform_module@/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl', 'preprocessing_fn': None} 'preprocessing_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmputh4lqb4', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl']
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl'.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'ratings_transform_module@/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl', 'stats_options_updater_fn': None} 'stats_options_updater_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmpljnq0ikd', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl'.
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmpimv8sdt4', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51-py3-none-any.whl'.
INFO:absl:Feature bucketized_user_age has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_genres has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature raw_user_age has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature timestamp has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_gender has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_occupation_label has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_occupation_text has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_rating has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_zip_code has no shape. Setting to varlen_sparse_tensor.
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+4a5113f0b8c14180b5cd46cfa8cc0e3d065b2031e1567b99a9df81abd4940b51
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_id has no shape. Setting to varlen_sparse_tensor.
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary/apply_vocab/text_file_init/InitializeTableFromTextFileV2
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary_1/apply_vocab/text_file_init/InitializeTableFromTextFileV2
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Transform/transform_graph/8/.temp_path/tftransform_tmp/2cbbc2eb7d4142b4aa762943eca65cf1/assets
INFO:absl:Writing fingerprint to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Transform/transform_graph/8/.temp_path/tftransform_tmp/2cbbc2eb7d4142b4aa762943eca65cf1/fingerprint.pb
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Transform/transform_graph/8/.temp_path/tftransform_tmp/04bbabd3468e4666bd299638171acef8/assets
INFO:absl:Writing fingerprint to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Transform/transform_graph/8/.temp_path/tftransform_tmp/04bbabd3468e4666bd299638171acef8/fingerprint.pb
INFO:absl:Feature movie_title has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature user_id has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:absl:Running publisher for Transform
INFO:absl:MetadataStore with DB connection initialized

context.show(ratings_transform.outputs['post_transform_schema'])

inspect_examples(ratings_transform, channel_name='transformed_examples')

channel_name: transformed_examples, split_name: train ("Split-train"), num_examples: 1

features {
  feature {
    key: "movie_title"
    value {
      bytes_list {
        value: "One Flew Over the Cuckoo\'s Nest (1975)"
      }
    }
  }
  feature {
    key: "user_id"
    value {
      bytes_list {
        value: "138"
      }
    }
  }
}

Implementing a model in TFX

In the basic_retrieval tutorial the model was created inline in the Python runtime. In a TFX pipeline, the model, metric, and loss are defined and trained in the module file for a pipeline component called Trainer. This makes the model, metric, and loss part of a repeatable process which can be automated and monitored.

TensorFlow Recommenders model architecture

We are going to build a two-tower retrieval model. "Two-tower" means we have a query tower that computes the user representation from user features, and a candidate tower that computes the movie representation from movie features. We build each tower separately (in the _build_user_model() and _build_movie_model() functions below) and then combine them in the final model (the MovielensModel class). MovielensModel is a subclass of the tfrs.Model base class, which streamlines building models: all we need to do is set up the components in the __init__ method and implement the compute_loss method, which takes in the raw features and returns a loss value.
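
As a rough illustration of how the two towers interact, the pure-Python sketch below scores user/movie pairs with a dot product and computes a softmax loss in which the other movies in the batch act as negatives. The embedding tables, the `dot` helper, and `in_batch_softmax_loss` are hypothetical stand-ins, and the loss follows our understanding of the default behavior of a retrieval task; it is not the TFRS implementation.

```python
import math

# Hypothetical 2-dimensional embedding tables standing in for the two towers.
USER_EMBEDDINGS = {"42": [1.0, 0.0], "138": [0.0, 1.0]}
MOVIE_EMBEDDINGS = {"Movie A": [2.0, 0.0], "Movie B": [0.0, 3.0]}


def dot(u, v):
    """Affinity score between a query embedding and a candidate embedding."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_softmax_loss(batch):
    """Sum over the batch of -log softmax(scores)[positive].

    batch is a list of (user_id, movie_title) pairs. Each user's own movie is
    the positive; the other movies in the batch serve as negatives.
    """
    users = [USER_EMBEDDINGS[u] for u, _ in batch]
    movies = [MOVIE_EMBEDDINGS[m] for _, m in batch]
    loss = 0.0
    for i, user in enumerate(users):
        scores = [dot(user, movie) for movie in movies]
        log_norm = math.log(sum(math.exp(s) for s in scores))
        loss += log_norm - scores[i]
    return loss


# Pairs that agree with the embeddings give a small loss ...
good = in_batch_softmax_loss([("42", "Movie A"), ("138", "Movie B")])
# ... while mismatched pairs give a large one.
bad = in_batch_softmax_loss([("42", "Movie B"), ("138", "Movie A")])
# A batch with a single pair has no negatives, so this quantity is zero.
single = in_batch_softmax_loss([("42", "Movie A")])
print(good, bad, single)
```

In this sketch the loss only carries a signal when a batch contains more than one pair, which is worth keeping in mind when choosing the batch size for the input function.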

# We're now going to create the module file for Trainer, which will include the
# code above with some modifications for TFX.

_trainer_module_file = 'trainer_module.py'

%%writefile {_trainer_module_file}

from typing import Dict, List, Text

import os
import absl
import datetime
import glob
import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_recommenders as tfrs

from absl import logging
from tfx.types import artifact_utils

from tfx import v1 as tfx
from tfx_bsl.coders import example_coder
from tfx_bsl.public import tfxio

absl.logging.set_verbosity(absl.logging.INFO)

EMBEDDING_DIMENSION = 32
INPUT_FN_BATCH_SIZE = 1


def extract_str_feature(dataset, feature_name):
  np_dataset = []
  for example in dataset:
    np_example = example_coder.ExampleToNumpyDict(example.numpy())
    np_dataset.append(np_example[feature_name][0].decode())
  return tf.data.Dataset.from_tensor_slices(np_dataset)


class MovielensModel(tfrs.Model):

  def __init__(self, user_model, movie_model, tf_transform_output, movies_uri):
    super().__init__()
    self.movie_model: tf.keras.Model = movie_model
    self.user_model: tf.keras.Model = user_model

    movies_artifact = movies_uri.get()[0]
    input_dir = artifact_utils.get_split_uri([movies_artifact], 'train')
    movie_files = glob.glob(os.path.join(input_dir, '*'))
    movies = tf.data.TFRecordDataset(movie_files, compression_type="GZIP")
    movies_dataset = extract_str_feature(movies, 'movie_title')

    loss_metrics = tfrs.metrics.FactorizedTopK(
        candidates=movies_dataset.batch(128).map(movie_model)
        )

    self.task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
        metrics=loss_metrics
        )


  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    try:
      # We pick out the user features and pass them into the user model.
      user_embeddings = tf.squeeze(self.user_model(features['user_id']), axis=1)
      # And pick out the movie features and pass them into the movie model,
      # getting embeddings back.
      positive_movie_embeddings = self.movie_model(features['movie_title'])

      # The task computes the loss and the metrics.
      return self.task(user_embeddings, positive_movie_embeddings)
    except BaseException as err:
      logging.error('######## ERROR IN compute_loss:\n{}\n###############'.format(err))
      # Re-raise so the original error is not masked by an unbound variable.
      raise


# This function will apply the same transform operation to training data
# and serving requests.
def _apply_preprocessing(raw_features, tft_layer):
  try:
    return tft_layer(raw_features)
  except BaseException as err:
    logging.error('######## ERROR IN _apply_preprocessing:\n{}\n###############'.format(err))
    # Re-raise so the original error is not masked by an unbound variable.
    raise


def _input_fn(file_pattern: List[Text],
              data_accessor: tfx.components.DataAccessor,
              tf_transform_output: tft.TFTransformOutput,
              batch_size: int = 200) -> tf.data.Dataset:
  """Generates a dataset of transformed features for training/evaluation.

  Args:
    file_pattern: List of paths or patterns of input tfrecord files.
    data_accessor: DataAccessor for converting input to RecordBatch.
    tf_transform_output: A TFTransformOutput.
    batch_size: The number of consecutive elements of the returned dataset to
      combine in a single batch.

  Returns:
    A dataset whose elements are dictionaries of feature Tensors. No label key
      is set, because the retrieval task works directly on the user/movie
      pairs in each batch.
  """
  try:
    return data_accessor.tf_dataset_factory(
      file_pattern,
      tfxio.TensorFlowDatasetOptions(
          batch_size=batch_size),
      tf_transform_output.transformed_metadata.schema)
  except BaseException as err:
    logging.error('######## ERROR IN _input_fn:\n{}\n###############'.format(err))

  return None


def _get_serve_tf_examples_fn(model, tf_transform_output):
  """Returns a function that parses a serialized tf.Example and applies TFT."""
  try:
    model.tft_layer = tf_transform_output.transform_features_layer()

    @tf.function
    def serve_tf_examples_fn(serialized_tf_examples):
      """Returns the output to be used in the serving signature."""
      try:
        feature_spec = tf_transform_output.raw_feature_spec()
        parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
        transformed_features = model.tft_layer(parsed_features)
        result = model(transformed_features)
      except BaseException as err:
        logging.error('######## ERROR IN serve_tf_examples_fn:\n{}\n###############'.format(err))
      return result
  except BaseException as err:
      logging.error('######## ERROR IN _get_serve_tf_examples_fn:\n{}\n###############'.format(err))

  return serve_tf_examples_fn


def _build_user_model(
    tf_transform_output: tft.TFTransformOutput, # Specific to ratings
    embedding_dimension: int = 32) -> tf.keras.Model:
  """Creates a Keras model for the query tower.

  Args:
    tf_transform_output: [tft.TFTransformOutput], the results of Transform
    embedding_dimension: [int], the dimensionality of the embedding space

  Returns:
    A keras Model.
  """
  try:
    unique_user_ids = tf_transform_output.vocabulary_by_name('user_id_vocab')
    users_vocab_str = [b.decode() for b in unique_user_ids]

    model = tf.keras.Sequential(
        [
         tf.keras.layers.StringLookup(
             vocabulary=users_vocab_str, mask_token=None),
         # We add an additional embedding to account for unknown tokens.
         tf.keras.layers.Embedding(len(users_vocab_str) + 1, embedding_dimension)
         ])
  except BaseException as err:
    logging.error('######## ERROR IN _build_user_model:\n{}\n###############'.format(err))
    # Re-raise: otherwise the return below would fail with an unbound `model`.
    raise

  return model


def _build_movie_model(
    tf_transform_output: tft.TFTransformOutput, # Specific to movies
    embedding_dimension: int = 32) -> tf.keras.Model:
  """Creates a Keras model for the candidate tower.

  Args:
    tf_transform_output: [tft.TFTransformOutput], the results of Transform
    embedding_dimension: [int], the dimensionality of the embedding space

  Returns:
    A keras Model.
  """
  try:
    unique_movie_titles = tf_transform_output.vocabulary_by_name('movie_title_vocab')
    titles_vocab_str = [b.decode() for b in unique_movie_titles]

    model = tf.keras.Sequential(
        [
         tf.keras.layers.StringLookup(
             vocabulary=titles_vocab_str, mask_token=None),
         # We add an additional embedding to account for unknown tokens.
         tf.keras.layers.Embedding(len(titles_vocab_str) + 1, embedding_dimension)
        ])
  except BaseException as err:
    logging.error('######## ERROR IN _build_movie_model:\n{}\n###############'.format(err))
    # Re-raise: otherwise the return below would fail with an unbound `model`.
    raise

  return model


# TFX Trainer will call this function.
def run_fn(fn_args: tfx.components.FnArgs):
  """Train the model based on given args.

  Args:
    fn_args: Holds args used to train the model as name/value pairs.
  """
  try:
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)

    train_dataset = _input_fn(fn_args.train_files, fn_args.data_accessor,
                              tf_transform_output, INPUT_FN_BATCH_SIZE)
    eval_dataset = _input_fn(fn_args.eval_files, fn_args.data_accessor,
                            tf_transform_output, INPUT_FN_BATCH_SIZE)

    model = MovielensModel(
        _build_user_model(tf_transform_output, EMBEDDING_DIMENSION),
        _build_movie_model(tf_transform_output, EMBEDDING_DIMENSION),
        tf_transform_output,
        fn_args.custom_config['movies']
        )

    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir=fn_args.model_run_dir, update_freq='batch')

    model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
  except BaseException as err:
    logging.error('######## ERROR IN run_fn before fit:\n{}\n###############'.format(err))

  try:
    model.fit(
        train_dataset,
        epochs=fn_args.custom_config['epochs'],
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps,
        callbacks=[tensorboard_callback])
  except BaseException as err:
      logging.error('######## ERROR IN run_fn during fit:\n{}\n###############'.format(err))

  try:
    index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)

    movies_artifact = fn_args.custom_config['movies'].get()[0]
    input_dir = artifact_utils.get_split_uri([movies_artifact], 'eval')
    movie_files = glob.glob(os.path.join(input_dir, '*'))
    movies = tf.data.TFRecordDataset(movie_files, compression_type="GZIP")

    movies_dataset = extract_str_feature(movies, 'movie_title')

    index.index_from_dataset(
      tf.data.Dataset.zip((
          movies_dataset.batch(100),
          movies_dataset.batch(100).map(model.movie_model))
      )
    )

    # Run once so that we can get the right signatures into SavedModel
    _, titles = index(tf.constant(["42"]))
    print(f"Recommendations for user 42: {titles[0, :3]}")

    signatures = {
        'serving_default':
            _get_serve_tf_examples_fn(index,
                                      tf_transform_output).get_concrete_function(
                                          tf.TensorSpec(
                                              shape=[None],
                                              dtype=tf.string,
                                              name='examples')),
    }
    index.save(fn_args.serving_model_dir, save_format='tf', signatures=signatures)

  except BaseException as err:
      logging.error('######## ERROR IN run_fn during export:\n{}\n###############'.format(err))

Writing trainer_module.py
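
The export step above builds a brute-force index over the movie embeddings. Conceptually, such an index scores a query embedding against every candidate and keeps the best few. The pure-Python sketch below shows that idea under simplified assumptions; `brute_force_top_k` and the toy embeddings are hypothetical and are not part of TFRS.

```python
import heapq


def brute_force_top_k(query_embedding, candidates, k=3):
    """Score every candidate against the query and return the k best.

    candidates is a list of (identifier, embedding) pairs.
    Returns (scores, identifiers), ordered from best to worst.
    """
    scored = [
        (sum(q * c for q, c in zip(query_embedding, embedding)), identifier)
        for identifier, embedding in candidates
    ]
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    scores = [score for score, _ in top]
    identifiers = [identifier for _, identifier in top]
    return scores, identifiers


candidates = [
    ("Movie A", [1.0, 0.0]),
    ("Movie B", [0.5, 0.5]),
    ("Movie C", [0.0, 1.0]),
    ("Movie D", [-1.0, 0.0]),
]
scores, titles = brute_force_top_k([2.0, 1.0], candidates, k=3)
print(scores, titles)
```

Like the index call in run_fn, the sketch returns the scores first and the identifiers second. Scoring every candidate is exact but scales linearly with the number of candidates, which is acceptable for a catalog of this size.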

Training the model

After defining the model, we can run the Trainer component to do the model training.

trainer = tfx.components.Trainer(
    module_file=os.path.abspath(_trainer_module_file),
    examples=ratings_transform.outputs['transformed_examples'],
    transform_graph=ratings_transform.outputs['transform_graph'],
    schema=ratings_transform.outputs['post_transform_schema'],
    train_args=tfx.proto.TrainArgs(num_steps=500),
    eval_args=tfx.proto.EvalArgs(num_steps=10),
    custom_config={
        'epochs': 5,
        'movies': movies_transform.outputs['transformed_examples'],
        'movie_schema': movies_transform.outputs['post_transform_schema'],
        'ratings': ratings_transform.outputs['transformed_examples'],
        'ratings_schema': ratings_transform.outputs['post_transform_schema']
        })

context.run(trainer, enable_cache=False)
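
As a back-of-the-envelope check on how much data this configuration processes, the short sketch below multiplies out the settings used above. It assumes that Keras runs `steps_per_epoch` steps in every epoch and that the input dataset yields enough batches; the variable names are only illustrative.

```python
# Values copied from the Trainer configuration and the module file above.
train_steps = 500  # tfx.proto.TrainArgs(num_steps=500)
epochs = 5         # custom_config['epochs']
batch_size = 1     # INPUT_FN_BATCH_SIZE in trainer_module.py

# Each epoch runs train_steps steps, and each step consumes one batch.
steps_total = train_steps * epochs
examples_seen = steps_total * batch_size
print(f"{steps_total} training steps, {examples_seen} examples processed")
```

Raising the batch size increases the number of examples seen per step without changing the step count.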

INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/trainer_module.py' (including modules: ['trainer_module', 'ratings_transform_module', 'movies_transform_module']).
INFO:absl:User module package has hash fingerprint version 4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmpfs/tmp/tmp7mnb83rn/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmpfs/tmp/tmpuz9j4qzr', '--dist-dir', '/tmpfs/tmp/tmpyyub9qtl']
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
INFO:absl:Successfully built user code wheel distribution at '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918-py3-none-any.whl'; target user module is 'trainer_module'.
INFO:absl:Full user module path is 'trainer_module@/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918-py3-none-any.whl'
INFO:absl:Running driver for Trainer
INFO:absl:MetadataStore with DB connection initialized
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying trainer_module.py -> build/lib
copying ratings_transform_module.py -> build/lib
copying movies_transform_module.py -> build/lib
installing to /tmpfs/tmp/tmpuz9j4qzr
running install
running install_lib
copying build/lib/trainer_module.py -> /tmpfs/tmp/tmpuz9j4qzr
copying build/lib/ratings_transform_module.py -> /tmpfs/tmp/tmpuz9j4qzr
copying build/lib/movies_transform_module.py -> /tmpfs/tmp/tmpuz9j4qzr
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmpfs/tmp/tmpuz9j4qzr/tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918-py3.9.egg-info
running install_scripts
creating /tmpfs/tmp/tmpuz9j4qzr/tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918.dist-info/WHEEL
creating '/tmpfs/tmp/tmpyyub9qtl/tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918-py3-none-any.whl' and adding '/tmpfs/tmp/tmpuz9j4qzr' to it
adding 'movies_transform_module.py'
adding 'ratings_transform_module.py'
adding 'trainer_module.py'
adding 'tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918.dist-info/METADATA'
adding 'tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918.dist-info/WHEEL'
adding 'tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918.dist-info/top_level.txt'
adding 'tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918.dist-info/RECORD'
removing /tmpfs/tmp/tmpuz9j4qzr
INFO:absl:Running executor for Trainer
INFO:absl:Train on the 'train' split when train_args.splits is not set.
INFO:absl:Evaluate on the 'eval' split when eval_args.splits is not set.
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
INFO:absl:udf_utils.get_fn {'train_args': '{\n  "num_steps": 500\n}', 'eval_args': '{\n  "num_steps": 10\n}', 'module_file': None, 'run_fn': None, 'trainer_fn': None, 'custom_config': '{"epochs": 5, "movie_schema": {"__class__": "OutputChannel", "__module__": "tfx.types.channel", "__tfx_object_type__": "jsonable", "additional_custom_properties": {}, "additional_properties": {}, "artifacts": [{"__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "Schema", "artifact": {"custom_properties": {"name": {"string_value": "post_transform_schema:2024-05-08T10:00:07.247622"}, "producer_component": {"string_value": "Transform"}, "tfx_version": {"string_value": "1.15.0"} }, "id": "12", "name": "post_transform_schema:2024-05-08T10:00:07.247622", "state": "LIVE", "type_id": "18", "uri": "/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Transform/post_transform_schema/7"}, "artifact_type": {"id": "18", "name": "Schema"} }], "output_key": "post_transform_schema", "producer_component_id": "Transform", "type": {"name": "Schema"} }, "movies": {"__class__": "OutputChannel", "__module__": "tfx.types.channel", "__tfx_object_type__": "jsonable", "additional_custom_properties": {}, "additional_properties": {}, "artifacts": [{"__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "Examples", "artifact": {"custom_properties": {"name": {"string_value": "transformed_examples:2024-05-08T10:00:07.247622"}, "producer_component": {"string_value": "Transform"}, "tfx_version": {"string_value": "1.15.0"} }, "id": "8", "name": "transformed_examples:2024-05-08T10:00:07.247622", "properties": {"split_names": {"string_value": "[\\"eval\\", \\"train\\"]"} }, "state": "LIVE", "type_id": "14", "uri": "/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Transform/transformed_examples/7"}, "artifact_type": {"base_type": "DATASET", "id": "14", "name": "Examples", "properties": {"span": "INT", "split_names": "STRING", 
"version": "INT"} } }], "output_key": "transformed_examples", "producer_component_id": "Transform", "type": {"base_type": "DATASET", "name": "Examples", "properties": {"span": "INT", "split_names": "STRING", "version": "INT"} } }, "ratings": {"__class__": "OutputChannel", "__module__": "tfx.types.channel", "__tfx_object_type__": "jsonable", "additional_custom_properties": {}, "additional_properties": {}, "artifacts": [{"__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "Examples", "artifact": {"custom_properties": {"name": {"string_value": "transformed_examples:2024-05-08T10:00:21.296257"}, "producer_component": {"string_value": "Transform"}, "tfx_version": {"string_value": "1.15.0"} }, "id": "16", "name": "transformed_examples:2024-05-08T10:00:21.296257", "properties": {"split_names": {"string_value": "[\\"eval\\", \\"train\\"]"} }, "state": "LIVE", "type_id": "14", "uri": "/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Transform/transformed_examples/8"}, "artifact_type": {"base_type": "DATASET", "id": "14", "name": "Examples", "properties": {"span": "INT", "split_names": "STRING", "version": "INT"} } }], "output_key": "transformed_examples", "producer_component_id": "Transform", "type": {"base_type": "DATASET", "name": "Examples", "properties": {"span": "INT", "split_names": "STRING", "version": "INT"} } }, "ratings_schema": {"__class__": "OutputChannel", "__module__": "tfx.types.channel", "__tfx_object_type__": "jsonable", "additional_custom_properties": {}, "additional_properties": {}, "artifacts": [{"__artifact_class_module__": "tfx.types.standard_artifacts", "__artifact_class_name__": "Schema", "artifact": {"custom_properties": {"name": {"string_value": "post_transform_schema:2024-05-08T10:00:21.296257"}, "producer_component": {"string_value": "Transform"}, "tfx_version": {"string_value": "1.15.0"} }, "id": "20", "name": "post_transform_schema:2024-05-08T10:00:21.296257", "state": "LIVE", "type_id": "18", 
"uri": "/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Transform/post_transform_schema/8"}, "artifact_type": {"id": "18", "name": "Schema"} }], "output_key": "post_transform_schema", "producer_component_id": "Transform", "type": {"name": "Schema"} } }', 'module_path': 'trainer_module@/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918-py3-none-any.whl'} 'run_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmp5zw5hh0w', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918-py3-none-any.whl']
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/_wheels/tfx_user_code_Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918-py3-none-any.whl'.
INFO:absl:Training model.
INFO:absl:Feature movie_title has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature user_id has a shape dim {
  size: 1
}
. Setting to DenseTensor.
Installing collected packages: tfx-user-code-Trainer
Successfully installed tfx-user-code-Trainer-0.0+4c202258fc2c517eea8b489d39d665ef0cf758328d1dec40e9e9f405bfb5b918
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tfx_bsl/tfxio/tf_example_record.py:343: parse_example_dataset (from tensorflow.python.data.experimental.ops.parsing_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(tf.io.parse_example(...))` instead.
INFO:absl:Feature movie_title has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature user_id has a shape dim {
  size: 1
}
. Setting to DenseTensor.
Epoch 1/5
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1260: SyntaxWarning: In loss categorical_crossentropy, expected y_pred.shape to be (batch_size, num_classes) with num_classes > 1. Received: y_pred.shape=(1, 1). Consider using 'binary_crossentropy' if you only have 2 classes.
  return dispatch_target(*args, **kwargs)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1715162456.460413   45879 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
500/500 [==============================] - 50s 95ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_5_categorical_accuracy: 0.0020 - factorized_top_k/top_10_categorical_accuracy: 0.0140 - factorized_top_k/top_50_categorical_accuracy: 0.0400 - factorized_top_k/top_100_categorical_accuracy: 0.0940 - loss: 0.0000e+00 - regularization_loss: 0.0000e+00 - total_loss: 0.0000e+00 - val_factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_5_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_10_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_50_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_100_categorical_accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_regularization_loss: 0.0000e+00 - val_total_loss: 0.0000e+00
Epoch 2/5
500/500 [==============================] - 48s 95ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0020 - factorized_top_k/top_5_categorical_accuracy: 0.0020 - factorized_top_k/top_10_categorical_accuracy: 0.0020 - factorized_top_k/top_50_categorical_accuracy: 0.0380 - factorized_top_k/top_100_categorical_accuracy: 0.0800 - loss: 0.0000e+00 - regularization_loss: 0.0000e+00 - total_loss: 0.0000e+00 - val_factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_5_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_10_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_50_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_100_categorical_accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_regularization_loss: 0.0000e+00 - val_total_loss: 0.0000e+00
Epoch 3/5
500/500 [==============================] - 47s 95ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_5_categorical_accuracy: 0.0020 - factorized_top_k/top_10_categorical_accuracy: 0.0080 - factorized_top_k/top_50_categorical_accuracy: 0.0320 - factorized_top_k/top_100_categorical_accuracy: 0.0700 - loss: 0.0000e+00 - regularization_loss: 0.0000e+00 - total_loss: 0.0000e+00 - val_factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_5_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_10_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_50_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_100_categorical_accuracy: 0.2000 - val_loss: 0.0000e+00 - val_regularization_loss: 0.0000e+00 - val_total_loss: 0.0000e+00
Epoch 4/5
500/500 [==============================] - 48s 95ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_5_categorical_accuracy: 0.0020 - factorized_top_k/top_10_categorical_accuracy: 0.0060 - factorized_top_k/top_50_categorical_accuracy: 0.0600 - factorized_top_k/top_100_categorical_accuracy: 0.1080 - loss: 0.0000e+00 - regularization_loss: 0.0000e+00 - total_loss: 0.0000e+00 - val_factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_5_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_10_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_50_categorical_accuracy: 0.1000 - val_factorized_top_k/top_100_categorical_accuracy: 0.1000 - val_loss: 0.0000e+00 - val_regularization_loss: 0.0000e+00 - val_total_loss: 0.0000e+00
Epoch 5/5
500/500 [==============================] - 48s 95ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_5_categorical_accuracy: 0.0060 - factorized_top_k/top_10_categorical_accuracy: 0.0100 - factorized_top_k/top_50_categorical_accuracy: 0.0420 - factorized_top_k/top_100_categorical_accuracy: 0.0740 - loss: 0.0000e+00 - regularization_loss: 0.0000e+00 - total_loss: 0.0000e+00 - val_factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_5_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_10_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_50_categorical_accuracy: 0.0000e+00 - val_factorized_top_k/top_100_categorical_accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_regularization_loss: 0.0000e+00 - val_total_loss: 0.0000e+00
Recommendations for user 42: [[b'Pulp Fiction (1994)' b'Nightmare Before Christmas, The (1993)'
  b'Believers, The (1987)' b'Scream of Stone (Schrei aus Stein) (1991)'
  b'Kansas City (1996)' b'Die Hard (1988)'
  b'Ma vie en rose (My Life in Pink) (1997)' b'Stalker (1979)'
  b'Traveller (1997)' b'Hot Shots! Part Deux (1993)']]
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:absl:Feature bucketized_user_age has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_genres has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature movie_title has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature raw_user_age has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature timestamp has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_gender has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_id has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_occupation_label has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_occupation_text has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_rating has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature user_zip_code has no shape. Setting to varlen_sparse_tensor.
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/keras/src/engine/functional.py:642: UserWarning: Input dict contained keys ['user_id', 'movie_title'] which did not match any model input. They will be ignored by the model.
  inputs = self._flatten_to_reference_inputs(inputs)
INFO:absl:Function `serve_tf_examples_fn` contains input name(s) 1485921, 1485931, table_handle, 1485943, resource with unsupported characters which will be renamed to transform_features_layer_1485921, transform_features_layer_1485931, brute_force_sequential_string_lookup_none_lookup_lookuptablefindv2_table_handle, brute_force_sequential_embedding_embedding_lookup_1485943, brute_force_gather_resource in the SavedModel.
WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.
WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.
INFO:absl:Found untraced functions such as query_with_exclusions while saving (showing 1 of 1). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Trainer/model/9/Format-Serving/assets
INFO:absl:Writing fingerprint to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Trainer/model/9/Format-Serving/fingerprint.pb
WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.
WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.
INFO:absl:Training complete. Model written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Trainer/model/9/Format-Serving. ModelRun written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Trainer/model_run/9
INFO:absl:Running publisher for Trainer
INFO:absl:MetadataStore with DB connection initialized
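
The factorized_top_k/top_K_categorical_accuracy values in the training log report how often the movie a user actually rated lands among the K highest-scoring candidates. The sketch below illustrates that idea in plain Python on toy two-dimensional embeddings. It is not the TFRS implementation, and the function names and data are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_accuracy(query_embeddings, true_indices, candidate_embeddings, k):
    """Fraction of queries whose true candidate is among the k best scores."""
    hits = 0
    for query, true_idx in zip(query_embeddings, true_indices):
        scores = [dot(query, c) for c in candidate_embeddings]
        # Indices of the k highest-scoring candidates for this query.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += true_idx in top_k
    return hits / len(true_indices)


# Toy data: three candidate (movie) embeddings and two query (user) embeddings.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[1.0, 0.1], [0.1, 1.0]]
true_items = [0, 2]  # index of the candidate each user actually interacted with

top_1 = top_k_accuracy(queries, true_items, candidates, k=1)
top_2 = top_k_accuracy(queries, true_items, candidates, k=2)
print(top_1, top_2)
```

Here the first user's true movie ranks first, while the second user's ranks second, so the accuracy rises from 0.5 at K=1 to 1.0 at K=2. The same monotone increase with K is visible across the top_1 to top_100 columns of the log.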

Exporting the model

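After training the model, we can use the Pusher component to export it to a serving directory. Because this pipeline has no Evaluator or InfraValidator component, the model is pushed without validation.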

_serving_model_dir = os.path.join(tempfile.mkdtemp(), 'serving_model/tfrs_retrieval')

pusher = tfx.components.Pusher(
    model=trainer.outputs['model'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory=_serving_model_dir)))
context.run(pusher, enable_cache=True)

INFO:absl:Running driver for Pusher
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for Pusher
WARNING:absl:Pusher is going to push the model without validation. Consider using Evaluator or InfraValidator in your pipeline.
INFO:absl:Model version: 1715162699
INFO:absl:Model written to serving path /tmpfs/tmp/tmpdg1a17qe/serving_model/tfrs_retrieval/1715162699.
INFO:absl:Model pushed to /tmpfs/tmp/tfx-interactive-2024-05-08T09_58_28.906690-q9pdz5m4/Pusher/pushed_model/10.
INFO:absl:Running publisher for Pusher
INFO:absl:MetadataStore with DB connection initialized
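
According to the log above, the Pusher writes the model into a subdirectory of the base directory named after a numeric model version (1715162699 in this run). A serving process typically wants the newest such version. The sketch below shows one way to find it, assuming every version directory has a purely numeric name. The helper latest_model_version is hypothetical, and the second version number in the demo is made up.

```python
import os
import tempfile


def latest_model_version(base_directory):
    """Return the path of the highest-numbered version directory."""
    versions = [
        name for name in os.listdir(base_directory)
        if name.isdigit() and os.path.isdir(os.path.join(base_directory, name))
    ]
    if not versions:
        raise FileNotFoundError(f"no model versions under {base_directory}")
    # Compare as integers so that, for example, 10 sorts after 9.
    return os.path.join(base_directory, max(versions, key=int))


# Demo on a throwaway directory that mimics the serving layout.
with tempfile.TemporaryDirectory() as base:
    for version in ("1715162699", "1715160000"):
        os.makedirs(os.path.join(base, version))
    latest_name = os.path.basename(latest_model_version(base))

print(latest_name)
```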

Make predictions

Now that the model has been pushed, we can load it back and make predictions for user 42.

loaded = tf.saved_model.load(pusher.outputs['pushed_model'].get()[0].uri)
scores, titles = loaded(["42"])

print(f"Recommendations: {titles[0][:3]}")

Recommendations: [b'Pulp Fiction (1994)' b'Nightmare Before Christmas, The (1993)'
 b'Believers, The (1987)']
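
The titles come back as byte strings, which is why each one is printed with a b'...' prefix. The sketch below shows one way to turn them into regular Python strings for display, assuming the titles are UTF-8 encoded. The helper decode_titles is hypothetical, and the list stands in for the values of a returned tensor (which would first need converting, for example with .numpy()).

```python
def decode_titles(raw_titles, limit=3):
    """Decode the first `limit` byte-string titles into regular strings."""
    return [title.decode("utf-8") for title in raw_titles[:limit]]


# Stand-in for one row of returned titles, copied from the output above.
raw = [
    b'Pulp Fiction (1994)',
    b'Nightmare Before Christmas, The (1993)',
    b'Believers, The (1987)',
    b'Scream of Stone (Schrei aus Stein) (1991)',
]

top_three = decode_titles(raw)
for rank, title in enumerate(top_three, start=1):
    print(f"{rank}. {title}")
```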

Next step

In this tutorial, you have learned how to implement a retrieval model with TensorFlow Recommenders and TFX. To expand on what is presented here, have a look at the TFRS ranking with TFX tutorial.