Text classification with an RNN


This text classification tutorial trains a recurrent neural network on the IMDB large movie review dataset for sentiment analysis.

Setup

import numpy as np

import tensorflow_datasets as tfds
import tensorflow as tf

tfds.disable_progress_bar()

Import matplotlib and create a helper function to plot graphs:

import matplotlib.pyplot as plt


def plot_graphs(history, metric):
  plt.plot(history.history[metric])
  plt.plot(history.history['val_'+metric], '')
  plt.xlabel("Epochs")
  plt.ylabel(metric)
  plt.legend([metric, 'val_'+metric])

Set up the input pipeline

The IMDB large movie review dataset is a binary classification dataset: all the reviews have either a positive or a negative sentiment.

Download the dataset using TFDS. See the loading text tutorial for details about how to load this sort of data manually.

dataset, info = tfds.load('imdb_reviews', with_info=True,
                          as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

train_dataset.element_spec
2021-08-11 17:13:44.932351: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-08-11 17:13:45.580911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-11 17:13:45.581828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-08-11 17:13:45.581863: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-08-11 17:13:45.585229: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-08-11 17:13:45.585313: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-08-11 17:13:45.586503: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-08-11 17:13:45.586856: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-08-11 17:13:45.587873: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-08-11 17:13:45.588833: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-08-11 17:13:45.589011: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-08-11 17:13:45.589112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-11 17:13:45.590061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-11 17:13:45.590953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-08-11 17:13:45.591672: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-11 17:13:45.592263: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-11 17:13:45.593243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-08-11 17:13:45.593339: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-11 17:13:45.594320: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-11 17:13:45.595237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-08-11 17:13:45.595273: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-08-11 17:13:46.197066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-11 17:13:46.197100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-08-11 17:13:46.197108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-08-11 17:13:46.197324: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-11 17:13:46.198268: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-11 17:13:46.199187: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-11 17:13:46.200063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
(TensorSpec(shape=(), dtype=tf.string, name=None),
 TensorSpec(shape=(), dtype=tf.int64, name=None))

Initially this returns a dataset of (text, label) pairs:

for example, label in train_dataset.take(1):
  print('text: ', example.numpy())
  print('label: ', label.numpy())
text:  b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it."
label:  0

Next shuffle the data for training, and create batches of these (text, label) pairs:

BUFFER_SIZE = 10000
BATCH_SIZE = 64
train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
for example, label in train_dataset.take(1):
  print('texts: ', example.numpy()[:3])
  print()
  print('labels: ', label.numpy()[:3])
texts:  [b"Released as Zentropa in North America to avoid confusion with Agniezska Holland's own Holocaust film Europa Europa, this third theatrical feature by a filmmaker who never ceases to surprise, inspire or downright shock is a bizarre, nostalgic, elaborate film about a naive American in Germany shortly following the end of WWII. The American, named Leo, doesn't fully get what he's doing there. He has come to take part in fixing up the country since, in his mind, it's about time Germany was shown some charity. No matter how that sounds, he is not a Nazi sympathizer or so much as especially pro-German, merely mixed up. His uncle, who works on the railroad, gets Leo a job as a helmsman on a sleeping car, and he is increasingly enmeshed in a vortex of 1945 Germany's horrors and enigmas.<br /><br />This progression starts when Leo, played rather memorably by the calm yet restless actor Jean-Marc Barr, meets a sultry heiress on the train played by Barbara Sukowa, an actress with gentility on the surface but internal vigor. She seduces him and then takes him home to meet her family, which owns the company which manufactures the trains. These were the precise trains that took Jews to their deaths during the war, but now they run a drab day-to-day timetable, and the woman's Uncle Kessler postures as another one of those good Germans who were just doing their jobs. There is also Udo Kier, the tremendous actor who blew me away in Von Trier's shocking second film Epidemic, though here he is mere scenery.<br /><br />Another guest at the house is Eddie Constantine, an actor with a quiet strength, playing a somber American intelligence man. He can confirm that Uncle Kessler was a war criminal, though it is all completely baffling to Leo. Americans have been characterized as gullible rubes out of their element for decades, but little have they been more blithely unconcerned than Leo, who goes back to his job on what gradually looks like his own customized death train.<br /><br />The story is told in a purposely uncoordinated manner by the film's Danish director, Lars Von Trier, whose anchor is in the film's breathtaking editing and cinematography. He shoots in black and white and color, he uses double-exposures, optical effects and trick photography, having actors interact with rear-projected footage, he places his characters inside a richly shaded visceral world so that they sometimes feel like insects, caught between glass for our more precise survey.<br /><br />This Grand Jury Prize-winning surrealist work is allegorical, but maybe in a distinct tone for every viewer. I interpret it as a film about the last legs of Nazism, symbolized by the train, and the ethical accountability of Americans and others who appeared too late to salvage the martyrs of these trains and the camps where they distributed their condemned shiploads. During the time frame of the movie, and the Nazi state, and such significance to the train, are dead, but like decapitated chickens they persist in jolting through their reflexes.<br /><br />The characters, music, dialogue, and plot are deliberately hammy and almost satirically procured from film noir conventions. The most entrancing points in the movie are the entirely cinematographic ones. Two trains halting back and forth, Barr on one and Sukowa on another. An underwater shot of proliferating blood. An uncommonly expressive sequence on what it must be like to drown. 
And most metaphysically affecting of all, an anesthetic shot of train tracks, as Max von Sydow's voice allures us to hark back to Europe with him, and abandon our personal restraint."
 b'Kate Beckinsale is excellent as the manipulative and yet irresistibly charming Emma in this TV-adaptation of Jane Austen\xc2\xb4s novel. When I read that novel I was sometimes quite doubtful whether the protagonist really deserved to be considered the heroine of the story: for honestly, she is so terribly self-righteous and scheming that one is tempted to dislike her seriously. Kate Beckinsale\xc2\xb4s interpretation, however, saves Emma from herself so to speak: she is portrayed with all the innocence and generosity of her character in full view, and one can\xc2\xb4t help but give in and like (not to say love) her in spite of her less amiable qualities. Kate Beckinsale is the main, but not the only, reason why this TV-series is so delightful; Raymond Coulthard is perfect as Mr. Frank Churchill, expressing this character\xc2\xb4s personal magnetism to the full (which is all the more conspicuous because of this role being not very well handled by Ewan McGregor in the 1996-screen adaptation of Emma), and Mark Strong, Samantha Morton, Bernard Hepton, and Olivia Williams are all as they should be in their respective roles. This production is, in short, a great achievement and one to view many times with increasing pleasure.'
 b'If only Eddie Murphy were born 10 years later. Then we\'d all remember it. But even I was only 4 when it came out. If you haven\'t seen it yet, rent Dr. Dolittle, Showtime, I spy, Pluto Nash and all Eddie\'s family comedy movies - then watch this. Hands down, you\'ll laugh 90% of the time. The other 10% you\'ll be wiping the tears from your eyes.<br /><br />It really needs to be watched more then once to understand all the jokes. From crude humor to a joke for kids!(if you\'ve seen it you\'ll laugh here) - you\'ll love his stuff. If you can, (or are a big fan) try to download clips from Eddie\'s acts. Allot of the shows are different as you\'d imagine and he has even more funny jokes.<br /><br />But this is like the "best of" Eddie Murphy \'X-rated\' if you will.<br /><br />And all I can say is please don\'t watch Delirious if you don\'t like comedy, don\'t have a sense of humor or are not fun to hang out with. You will only put down this great Eddie Murphy classic and possibly make someone miss out on it.<br /><br />If you wanna know how Eddie got Beverly Hills Cop and got famous from it- Delirious is it.']

labels:  [1 1 1]

Create the text encoder

The raw text loaded by tfds needs to be processed before it can be used in a model. The simplest way to process text for training is to use the experimental.preprocessing.TextVectorization layer. This layer has many capabilities, but this tutorial sticks to the default behavior.

Create the layer, and pass the dataset's text to the layer's .adapt method:

VOCAB_SIZE = 1000
encoder = tf.keras.layers.experimental.preprocessing.TextVectorization(
    max_tokens=VOCAB_SIZE)
encoder.adapt(train_dataset.map(lambda text, label: text))

The .adapt method sets the layer's vocabulary. Here are the first 20 tokens. After the padding and unknown tokens they are sorted by frequency:

vocab = np.array(encoder.get_vocabulary())
vocab[:20]
array(['', '[UNK]', 'the', 'and', 'a', 'of', 'to', 'is', 'in', 'it', 'i',
       'this', 'that', 'br', 'was', 'as', 'for', 'with', 'movie', 'but'],
      dtype='<U14')

Once the vocabulary is set, the layer can encode text into indices. The tensors of indices are 0-padded to the longest sequence in the batch (unless you set a fixed output_sequence_length):

encoded_example = encoder(example)[:3].numpy()
encoded_example
array([[627,  15,   1, ..., 254, 925,   1],
       [  1,   1,   7, ...,   0,   0,   0],
       [ 45,  61,   1, ...,   0,   0,   0]])

With the default settings, the process is not completely reversible. There are two main reasons for that:

  1. The default value for preprocessing.TextVectorization's standardize argument is "lower_and_strip_punctuation".
  2. The limited vocabulary size and lack of character-based fallback result in some unknown tokens.

for n in range(3):
  print("Original: ", example[n].numpy())
  print("Round-trip: ", " ".join(vocab[encoded_example[n]]))
  print()
Original:  b"Released as Zentropa in North America to avoid confusion with Agniezska Holland's own Holocaust film Europa Europa, this third theatrical feature by a filmmaker who never ceases to surprise, inspire or downright shock is a bizarre, nostalgic, elaborate film about a naive American in Germany shortly following the end of WWII. The American, named Leo, doesn't fully get what he's doing there. He has come to take part in fixing up the country since, in his mind, it's about time Germany was shown some charity. No matter how that sounds, he is not a Nazi sympathizer or so much as especially pro-German, merely mixed up. His uncle, who works on the railroad, gets Leo a job as a helmsman on a sleeping car, and he is increasingly enmeshed in a vortex of 1945 Germany's horrors and enigmas.<br /><br />This progression starts when Leo, played rather memorably by the calm yet restless actor Jean-Marc Barr, meets a sultry heiress on the train played by Barbara Sukowa, an actress with gentility on the surface but internal vigor. She seduces him and then takes him home to meet her family, which owns the company which manufactures the trains. These were the precise trains that took Jews to their deaths during the war, but now they run a drab day-to-day timetable, and the woman's Uncle Kessler postures as another one of those good Germans who were just doing their jobs. There is also Udo Kier, the tremendous actor who blew me away in Von Trier's shocking second film Epidemic, though here he is mere scenery.<br /><br />Another guest at the house is Eddie Constantine, an actor with a quiet strength, playing a somber American intelligence man. He can confirm that Uncle Kessler was a war criminal, though it is all completely baffling to Leo. Americans have been characterized as gullible rubes out of their element for decades, but little have they been more blithely unconcerned than Leo, who goes back to his job on what gradually looks like his own customized death train.<br /><br />The story is told in a purposely uncoordinated manner by the film's Danish director, Lars Von Trier, whose anchor is in the film's breathtaking editing and cinematography. He shoots in black and white and color, he uses double-exposures, optical effects and trick photography, having actors interact with rear-projected footage, he places his characters inside a richly shaded visceral world so that they sometimes feel like insects, caught between glass for our more precise survey.<br /><br />This Grand Jury Prize-winning surrealist work is allegorical, but maybe in a distinct tone for every viewer. I interpret it as a film about the last legs of Nazism, symbolized by the train, and the ethical accountability of Americans and others who appeared too late to salvage the martyrs of these trains and the camps where they distributed their condemned shiploads. During the time frame of the movie, and the Nazi state, and such significance to the train, are dead, but like decapitated chickens they persist in jolting through their reflexes.<br /><br />The characters, music, dialogue, and plot are deliberately hammy and almost satirically procured from film noir conventions. The most entrancing points in the movie are the entirely cinematographic ones. Two trains halting back and forth, Barr on one and Sukowa on another. An underwater shot of proliferating blood. An uncommonly expressive sequence on what it must be like to drown. 
And most metaphysically affecting of all, an anesthetic shot of train tracks, as Max von Sydow's voice allures us to hark back to Europe with him, and abandon our personal restraint."
Round-trip:  released as [UNK] in [UNK] america to avoid [UNK] with [UNK] [UNK] own [UNK] film [UNK] [UNK] this third [UNK] feature by a [UNK] who never [UNK] to surprise [UNK] or [UNK] [UNK] is a [UNK] [UNK] [UNK] film about a [UNK] american in [UNK] [UNK] [UNK] the end of [UNK] the american named [UNK] doesnt [UNK] get what hes doing there he has come to take part in [UNK] up the country since in his mind its about time [UNK] was shown some [UNK] no matter how that sounds he is not a [UNK] [UNK] or so much as especially [UNK] [UNK] [UNK] up his [UNK] who works on the [UNK] gets [UNK] a job as a [UNK] on a [UNK] car and he is [UNK] [UNK] in a [UNK] of [UNK] [UNK] [UNK] and [UNK] br this [UNK] starts when [UNK] played rather [UNK] by the [UNK] yet [UNK] actor [UNK] [UNK] meets a [UNK] [UNK] on the [UNK] played by [UNK] [UNK] an actress with [UNK] on the [UNK] but [UNK] [UNK] she [UNK] him and then takes him home to meet her family which [UNK] the [UNK] which [UNK] the [UNK] these were the [UNK] [UNK] that took [UNK] to their [UNK] during the war but now they run a [UNK] [UNK] [UNK] and the [UNK] [UNK] [UNK] [UNK] as another one of those good [UNK] who were just doing their [UNK] there is also [UNK] [UNK] the [UNK] actor who [UNK] me away in [UNK] [UNK] [UNK] second film [UNK] though here he is [UNK] [UNK] br another [UNK] at the house is [UNK] [UNK] an actor with a [UNK] [UNK] playing a [UNK] american [UNK] man he can [UNK] that [UNK] [UNK] was a war [UNK] though it is all completely [UNK] to [UNK] [UNK] have been [UNK] as [UNK] [UNK] out of their [UNK] for [UNK] but little have they been more [UNK] [UNK] than [UNK] who goes back to his job on what [UNK] looks like his own [UNK] death [UNK] br the story is told in a [UNK] [UNK] [UNK] by the films [UNK] director [UNK] [UNK] [UNK] whose [UNK] is in the films [UNK] editing and cinematography he [UNK] in black and white and [UNK] he [UNK] [UNK] [UNK] effects and [UNK] [UNK] having actors [UNK] with [UNK] footage he [UNK] his characters inside a [UNK] [UNK] [UNK] world so that they sometimes feel like [UNK] [UNK] between [UNK] for our more [UNK] [UNK] br this [UNK] [UNK] [UNK] [UNK] work is [UNK] but maybe in a [UNK] [UNK] for every viewer i [UNK] it as a film about the last [UNK] of [UNK] [UNK] by the [UNK] and the [UNK] [UNK] of [UNK] and others who [UNK] too late to [UNK] the [UNK] of these [UNK] and the [UNK] where they [UNK] their [UNK] [UNK] during the time [UNK] of the movie and the [UNK] [UNK] and such [UNK] to the [UNK] are dead but like [UNK] [UNK] they [UNK] in [UNK] through their [UNK] br the characters music dialogue and plot are [UNK] [UNK] and almost [UNK] [UNK] from film [UNK] [UNK] the most [UNK] points in the movie are the [UNK] [UNK] ones two [UNK] [UNK] back and [UNK] [UNK] on one and [UNK] on another an [UNK] shot of [UNK] blood an [UNK] [UNK] sequence on what it must be like to [UNK] and most [UNK] [UNK] of all an [UNK] shot of [UNK] [UNK] as [UNK] [UNK] [UNK] voice [UNK] us to [UNK] back to [UNK] with him and [UNK] our personal [UNK]

Original:  b'Kate Beckinsale is excellent as the manipulative and yet irresistibly charming Emma in this TV-adaptation of Jane Austen\xc2\xb4s novel. When I read that novel I was sometimes quite doubtful whether the protagonist really deserved to be considered the heroine of the story: for honestly, she is so terribly self-righteous and scheming that one is tempted to dislike her seriously. Kate Beckinsale\xc2\xb4s interpretation, however, saves Emma from herself so to speak: she is portrayed with all the innocence and generosity of her character in full view, and one can\xc2\xb4t help but give in and like (not to say love) her in spite of her less amiable qualities. Kate Beckinsale is the main, but not the only, reason why this TV-series is so delightful; Raymond Coulthard is perfect as Mr. Frank Churchill, expressing this character\xc2\xb4s personal magnetism to the full (which is all the more conspicuous because of this role being not very well handled by Ewan McGregor in the 1996-screen adaptation of Emma), and Mark Strong, Samantha Morton, Bernard Hepton, and Olivia Williams are all as they should be in their respective roles. This production is, in short, a great achievement and one to view many times with increasing pleasure.'
Round-trip:  [UNK] [UNK] is excellent as the [UNK] and yet [UNK] [UNK] [UNK] in this [UNK] of jane [UNK] novel when i read that novel i was sometimes quite [UNK] whether the [UNK] really [UNK] to be [UNK] the [UNK] of the story for [UNK] she is so [UNK] [UNK] and [UNK] that one is [UNK] to [UNK] her seriously [UNK] [UNK] [UNK] however [UNK] [UNK] from herself so to [UNK] she is portrayed with all the [UNK] and [UNK] of her character in full view and one [UNK] help but give in and like not to say love her in [UNK] of her less [UNK] [UNK] [UNK] [UNK] is the main but not the only reason why this [UNK] is so [UNK] [UNK] [UNK] is perfect as mr [UNK] [UNK] [UNK] this [UNK] personal [UNK] to the full which is all the more [UNK] because of this role being not very well [UNK] by [UNK] [UNK] in the [UNK] [UNK] of [UNK] and mark strong [UNK] [UNK] [UNK] [UNK] and [UNK] [UNK] are all as they should be in their [UNK] roles this production is in short a great [UNK] and one to view many times with [UNK] [UNK]                                                                                                                                                                                                                                                                                                                                                                                                               

Original:  b'If only Eddie Murphy were born 10 years later. Then we\'d all remember it. But even I was only 4 when it came out. If you haven\'t seen it yet, rent Dr. Dolittle, Showtime, I spy, Pluto Nash and all Eddie\'s family comedy movies - then watch this. Hands down, you\'ll laugh 90% of the time. The other 10% you\'ll be wiping the tears from your eyes.<br /><br />It really needs to be watched more then once to understand all the jokes. From crude humor to a joke for kids!(if you\'ve seen it you\'ll laugh here) - you\'ll love his stuff. If you can, (or are a big fan) try to download clips from Eddie\'s acts. Allot of the shows are different as you\'d imagine and he has even more funny jokes.<br /><br />But this is like the "best of" Eddie Murphy \'X-rated\' if you will.<br /><br />And all I can say is please don\'t watch Delirious if you don\'t like comedy, don\'t have a sense of humor or are not fun to hang out with. You will only put down this great Eddie Murphy classic and possibly make someone miss out on it.<br /><br />If you wanna know how Eddie got Beverly Hills Cop and got famous from it- Delirious is it.'
Round-trip:  if only [UNK] [UNK] were [UNK] 10 years later then [UNK] all remember it but even i was only 4 when it came out if you havent seen it yet rent dr [UNK] [UNK] i [UNK] [UNK] [UNK] and all [UNK] family comedy movies then watch this hands down youll laugh [UNK] of the time the other 10 youll be [UNK] the [UNK] from your [UNK] br it really needs to be watched more then once to understand all the jokes from [UNK] humor to a joke for [UNK] youve seen it youll laugh here youll love his stuff if you can or are a big fan try to [UNK] [UNK] from [UNK] [UNK] [UNK] of the shows are different as [UNK] imagine and he has even more funny [UNK] br but this is like the best of [UNK] [UNK] [UNK] if you [UNK] br and all i can say is please dont watch [UNK] if you dont like comedy dont have a sense of humor or are not fun to [UNK] out with you will only put down this great [UNK] [UNK] classic and possibly make someone miss out on itbr br if you [UNK] know how [UNK] got [UNK] [UNK] [UNK] and got famous from it [UNK] is it
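
If a more faithful round trip matters for your use case, both causes above can be adjusted when creating the layer. As a sketch (with assumed settings that are not used in the rest of this tutorial), keeping punctuation and case and enlarging the vocabulary reduces, but does not eliminate, the information loss:

# Sketch only: keep case/punctuation and use a larger vocabulary so fewer
# tokens fall back to [UNK]. These settings are not used elsewhere here.
alt_encoder = tf.keras.layers.experimental.preprocessing.TextVectorization(
    standardize=None,     # keep case and punctuation
    max_tokens=20000)     # larger vocabulary -> fewer unknown tokens
alt_encoder.adapt(train_dataset.map(lambda text, label: text))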

Create the model

A drawing of the information flow in the model.

Above is a diagram of the model.

  1. This model can be built as a tf.keras.Sequential.

  2. The first layer is the encoder, which converts the text to a sequence of token indices.

  3. After the encoder is an embedding layer. An embedding layer stores one vector per word. When called, it converts the sequences of word indices to sequences of vectors. These vectors are trainable. After training (on enough data), words with similar meanings often have similar vectors.

    This index lookup is much more efficient than the equivalent operation of passing a one-hot encoded vector through a tf.keras.layers.Dense layer (see the short sketch after this list).

  4. A recurrent neural network (RNN) processes sequence input by iterating through the elements. RNNs pass the outputs from one timestep to their input on the next timestep.

    The tf.keras.layers.Bidirectional wrapper can also be used with an RNN layer. This propagates the input forward and backwards through the RNN layer and then concatenates the final output.

    • The main advantage of a bidirectional RNN is that the signal from the beginning of the input doesn't need to be processed all the way through every timestep to affect the output.

    • The main disadvantage of a bidirectional RNN is that you can't efficiently stream predictions as words are being added to the end.

  5. After the RNN has converted the sequence to a single vector, the two layers.Dense do some final processing and convert this vector representation to a single logit as the classification output.
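
To make the efficiency point in step 3 concrete, here is a small illustration (not part of the original tutorial): an embedding lookup gives the same result as passing a one-hot encoded vector through a Dense layer that reuses the embedding matrix, without ever materializing the large one-hot vectors.

# Illustration: an Embedding lookup equals one-hot encoding followed by a
# Dense layer that shares the same weight matrix.
vocab_size, embed_dim = 1000, 64
embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
token_ids = tf.constant([[2, 7, 42]])
looked_up = embedding(token_ids)                     # shape (1, 3, 64)

one_hot = tf.one_hot(token_ids, depth=vocab_size)    # shape (1, 3, 1000)
dense = tf.keras.layers.Dense(embed_dim, use_bias=False)
dense.build((None, None, vocab_size))
dense.set_weights(embedding.get_weights())           # reuse the embedding matrix
via_dense = dense(one_hot)                           # numerically equal to looked_up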

The code to implement this is below:

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

Note that the Keras sequential model is used here since all the layers in the model only have a single input and produce a single output. In case you want to use a stateful RNN layer, you might want to build your model with the Keras functional API or model subclassing so that you can retrieve and reuse the RNN layer states. Please check the Keras RNN guide for more details.
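
For example, here is a minimal sketch of an equivalent model built with the Keras functional API, using return_state=True so the final LSTM states are exposed and could be reused. This is an illustration only, not the model trained below:

# Sketch: functional API variant of the model above that also exposes the
# final forward/backward LSTM hidden and cell states.
inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
x = encoder(inputs)
x = tf.keras.layers.Embedding(len(encoder.get_vocabulary()), 64, mask_zero=True)(x)
lstm_out, f_h, f_c, b_h, b_c = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_state=True))(x)
x = tf.keras.layers.Dense(64, activation='relu')(lstm_out)
outputs = tf.keras.layers.Dense(1)(x)
functional_model = tf.keras.Model(inputs, outputs)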

The embedding layer uses masking to handle the varying sequence lengths. All the layers after the Embedding support masking:

print([layer.supports_masking for layer in model.layers])
[False, True, True, True, True]

To confirm that this works as expected, evaluate a sentence twice. First, alone so there is no padding to mask:

# predict on a sample text without padding.

sample_text = ('The movie was cool. The animation and the graphics '
               'were out of this world. I would recommend this movie.')
predictions = model.predict(np.array([sample_text]))
print(predictions[0])
[0.00120276]

Now, evaluate it again in a batch with a longer sentence. The result should be identical:

# predict on a sample text with padding

padding = "the " * 2000
predictions = model.predict(np.array([sample_text, padding]))
print(predictions[0])
[0.00120276]

Compile the Keras model to configure the training process:

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

Train the model

history = model.fit(train_dataset, epochs=10,
                    validation_data=test_dataset,
                    validation_steps=30)
Epoch 1/10
391/391 [==============================] - 41s 85ms/step - loss: 0.6308 - accuracy: 0.5788 - val_loss: 0.4698 - val_accuracy: 0.7927
Epoch 2/10
391/391 [==============================] - 32s 80ms/step - loss: 0.4226 - accuracy: 0.8127 - val_loss: 0.3699 - val_accuracy: 0.8370
Epoch 3/10
391/391 [==============================] - 32s 81ms/step - loss: 0.3429 - accuracy: 0.8519 - val_loss: 0.3456 - val_accuracy: 0.8516
Epoch 4/10
391/391 [==============================] - 32s 80ms/step - loss: 0.3224 - accuracy: 0.8621 - val_loss: 0.3357 - val_accuracy: 0.8589
Epoch 5/10
391/391 [==============================] - 32s 80ms/step - loss: 0.3149 - accuracy: 0.8647 - val_loss: 0.3406 - val_accuracy: 0.8594
Epoch 6/10
391/391 [==============================] - 34s 83ms/step - loss: 0.3073 - accuracy: 0.8702 - val_loss: 0.3276 - val_accuracy: 0.8615
Epoch 7/10
391/391 [==============================] - 32s 80ms/step - loss: 0.3039 - accuracy: 0.8706 - val_loss: 0.3344 - val_accuracy: 0.8417
Epoch 8/10
391/391 [==============================] - 32s 80ms/step - loss: 0.3001 - accuracy: 0.8728 - val_loss: 0.3267 - val_accuracy: 0.8469
Epoch 9/10
391/391 [==============================] - 32s 80ms/step - loss: 0.2994 - accuracy: 0.8739 - val_loss: 0.3287 - val_accuracy: 0.8599
Epoch 10/10
391/391 [==============================] - 32s 80ms/step - loss: 0.2968 - accuracy: 0.8729 - val_loss: 0.3197 - val_accuracy: 0.8536
test_loss, test_acc = model.evaluate(test_dataset)

print('Test Loss:', test_loss)
print('Test Accuracy:', test_acc)
391/391 [==============================] - 15s 38ms/step - loss: 0.3178 - accuracy: 0.8555
Test Loss: 0.31781235337257385
Test Accuracy: 0.8554800152778625
plt.figure(figsize=(16, 8))
plt.subplot(1, 2, 1)
plot_graphs(history, 'accuracy')
plt.ylim(None, 1)
plt.subplot(1, 2, 2)
plot_graphs(history, 'loss')
plt.ylim(0, None)
(0.0, 0.6475058257579803)

A plot of training and validation accuracy (left) and loss (right) across epochs.

Run a prediction on a new sentence:

If the prediction is >= 0.0, it is positive, else it is negative.

sample_text = ('The movie was cool. The animation and the graphics '
               'were out of this world. I would recommend this movie.')
predictions = model.predict(np.array([sample_text]))
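
To turn the raw logit into a probability and a label, you could apply a sigmoid and the >= 0.0 threshold described above (a small illustration, not part of the original code):

# Illustration: convert the logit to a probability and a sentiment label.
logit = float(predictions[0][0])
probability = tf.sigmoid(logit).numpy()
print('probability:', probability, '->', 'positive' if logit >= 0.0 else 'negative')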

Stack two or more LSTM layers

Keras recurrent layers have two available modes, controlled by the return_sequences constructor argument (a quick shape check follows this list):

  • If False it returns only the last output for each input sequence (a 2D tensor of shape (batch_size, output_features)). This is the default, used in the previous model.

  • If True the full sequences of successive outputs for each timestep are returned (a 3D tensor of shape (batch_size, timesteps, output_features)).
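
To make the difference concrete, here is a quick shape check (illustration only, not from the original tutorial):

# Compare the two modes on a random batch of shape (batch, timesteps, features).
demo_input = tf.random.normal([2, 5, 8])
print(tf.keras.layers.LSTM(4)(demo_input).shape)                         # (2, 4)
print(tf.keras.layers.LSTM(4, return_sequences=True)(demo_input).shape)  # (2, 5, 4)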

Here is what the flow of information looks like with return_sequences=True:

A diagram of the information flow in a layered bidirectional model.

The interesting thing about using an RNN with return_sequences=True is that the output still has 3 axes, like the input, so it can be passed to another RNN layer, like this:

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(len(encoder.get_vocabulary()), 64, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,  return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
history = model.fit(train_dataset, epochs=10,
                    validation_data=test_dataset,
                    validation_steps=30)
Epoch 1/10
391/391 [==============================] - 73s 149ms/step - loss: 0.6253 - accuracy: 0.5859 - val_loss: 0.4572 - val_accuracy: 0.7734
Epoch 2/10
391/391 [==============================] - 55s 138ms/step - loss: 0.3838 - accuracy: 0.8322 - val_loss: 0.3487 - val_accuracy: 0.8542
Epoch 3/10
391/391 [==============================] - 55s 138ms/step - loss: 0.3348 - accuracy: 0.8556 - val_loss: 0.3277 - val_accuracy: 0.8568
Epoch 4/10
391/391 [==============================] - 59s 148ms/step - loss: 0.3161 - accuracy: 0.8630 - val_loss: 0.3227 - val_accuracy: 0.8604
Epoch 5/10
391/391 [==============================] - 55s 139ms/step - loss: 0.3098 - accuracy: 0.8674 - val_loss: 0.3237 - val_accuracy: 0.8453
Epoch 6/10
391/391 [==============================] - 55s 138ms/step - loss: 0.3038 - accuracy: 0.8695 - val_loss: 0.3185 - val_accuracy: 0.8594
Epoch 7/10
391/391 [==============================] - 56s 139ms/step - loss: 0.3033 - accuracy: 0.8707 - val_loss: 0.3437 - val_accuracy: 0.8604
Epoch 8/10
391/391 [==============================] - 55s 139ms/step - loss: 0.3005 - accuracy: 0.8717 - val_loss: 0.3215 - val_accuracy: 0.8521
Epoch 9/10
391/391 [==============================] - 57s 139ms/step - loss: 0.2986 - accuracy: 0.8717 - val_loss: 0.3208 - val_accuracy: 0.8469
Epoch 10/10
391/391 [==============================] - 55s 138ms/step - loss: 0.2948 - accuracy: 0.8707 - val_loss: 0.3271 - val_accuracy: 0.8641
test_loss, test_acc = model.evaluate(test_dataset)

print('Test Loss:', test_loss)
print('Test Accuracy:', test_acc)
391/391 [==============================] - 26s 66ms/step - loss: 0.3226 - accuracy: 0.8630
Test Loss: 0.3225603401660919
Test Accuracy: 0.8629999756813049
# predict on a sample text without padding.

sample_text = ('The movie was not good. The animation and the graphics '
               'were terrible. I would not recommend this movie.')
predictions = model.predict(np.array([sample_text]))
print(predictions)
[[-1.6429266]]
plt.figure(figsize=(16, 6))
plt.subplot(1, 2, 1)
plot_graphs(history, 'accuracy')
plt.subplot(1, 2, 2)
plot_graphs(history, 'loss')

A plot of training and validation accuracy (left) and loss (right) across epochs for the stacked model.

Check out other existing recurrent layers, such as GRU layers.

If you're interested in building custom RNNs, see the Keras RNN Guide.