Recommandations TensorFlow : Démarrage rapide

Voir sur TensorFlow.org Exécuter dans Google Colab Voir la source sur GitHub Télécharger le cahier

Dans ce tutoriel, nous construisons un modèle simple matrice de factorisation en utilisant l' ensemble de données MovieLens 100K avec TFRS. Nous pouvons utiliser ce modèle pour recommander des films pour un utilisateur donné.

Importer TFRS

Tout d'abord, installez et importez TFRS :

pip install -q tensorflow-recommenders
pip install -q --upgrade tensorflow-datasets
from typing import Dict, Text

import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

Lire les données

# Ratings data.
ratings = tfds.load('movielens/100k-ratings', split="train")
# Features of all the available movies.
movies = tfds.load('movielens/100k-movies', split="train")

# Select the basic features.
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"]
})
movies = movies.map(lambda x: x["movie_title"])
2021-08-25 11:16:39.656990: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

Créez des vocabulaires pour convertir les identifiants des utilisateurs et les titres des films en indices entiers pour l'intégration des couches :

user_ids_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(ratings.map(lambda x: x["user_id"]))

movie_titles_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
movie_titles_vocabulary.adapt(movies)

Définir un modèle

Nous pouvons définir un modèle TFRS en héritant de tfrs.Model et mettre en œuvre la compute_loss méthode:

class MovieLensModel(tfrs.Model):
  # We derive from a custom base class to help reduce boilerplate. Under the hood,
  # these are still plain Keras Models.

  def __init__(
      self,
      user_model: tf.keras.Model,
      movie_model: tf.keras.Model,
      task: tfrs.tasks.Retrieval):
    super().__init__()

    # Set up user and movie representations.
    self.user_model = user_model
    self.movie_model = movie_model

    # Set up a retrieval task.
    self.task = task

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    # Define how the loss is computed.

    user_embeddings = self.user_model(features["user_id"])
    movie_embeddings = self.movie_model(features["movie_title"])

    return self.task(user_embeddings, movie_embeddings)

Définissez les deux modèles et la tâche de récupération.

# Define user and movie models.
user_model = tf.keras.Sequential([
    user_ids_vocabulary,
    tf.keras.layers.Embedding(user_ids_vocabulary.vocab_size(), 64)
])
movie_model = tf.keras.Sequential([
    movie_titles_vocabulary,
    tf.keras.layers.Embedding(movie_titles_vocabulary.vocab_size(), 64)
])

# Define your objectives.
task = tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(
    movies.batch(128).map(movie_model)
  )
)
WARNING:tensorflow:vocab_size is deprecated, please use vocabulary_size.
WARNING:tensorflow:vocab_size is deprecated, please use vocabulary_size.
WARNING:tensorflow:vocab_size is deprecated, please use vocabulary_size.
WARNING:tensorflow:vocab_size is deprecated, please use vocabulary_size.

Ajustez-le et évaluez-le.

Créez le modèle, entraînez-le et générez des prédictions :

# Create a retrieval model.
model = MovieLensModel(user_model, movie_model, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))

# Train for 3 epochs.
model.fit(ratings.batch(4096), epochs=3)

# Use brute-force search to set up retrieval using the trained representations.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    movies.batch(100).map(lambda title: (title, model.movie_model(title))))

# Get some recommendations.
_, titles = index(np.array(["42"]))
print(f"Top 3 recommendations for user 42: {titles[0, :3]}")
Epoch 1/3
25/25 [==============================] - 5s 156ms/step - factorized_top_k/top_1_categorical_accuracy: 7.0000e-05 - factorized_top_k/top_5_categorical_accuracy: 0.0016 - factorized_top_k/top_10_categorical_accuracy: 0.0047 - factorized_top_k/top_50_categorical_accuracy: 0.0439 - factorized_top_k/top_100_categorical_accuracy: 0.1000 - loss: 33088.3344 - regularization_loss: 0.0000e+00 - total_loss: 33088.3344
Epoch 2/3
25/25 [==============================] - 4s 155ms/step - factorized_top_k/top_1_categorical_accuracy: 1.1000e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0048 - factorized_top_k/top_10_categorical_accuracy: 0.0140 - factorized_top_k/top_50_categorical_accuracy: 0.1042 - factorized_top_k/top_100_categorical_accuracy: 0.2104 - loss: 31011.7002 - regularization_loss: 0.0000e+00 - total_loss: 31011.7002
Epoch 3/3
25/25 [==============================] - 4s 154ms/step - factorized_top_k/top_1_categorical_accuracy: 3.8000e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0080 - factorized_top_k/top_10_categorical_accuracy: 0.0219 - factorized_top_k/top_50_categorical_accuracy: 0.1440 - factorized_top_k/top_100_categorical_accuracy: 0.2671 - loss: 30417.0654 - regularization_loss: 0.0000e+00 - total_loss: 30417.0654
Top 3 recommendations for user 42: [b'Rent-a-Kid (1995)' b'House Arrest (1996)' b'Mirage (1995)']