
Graph regularization for sentiment classification using synthesized graphs


Overview

This notebook classifies movie reviews as positive or negative using the text of the review. This is an example of binary classification, an important and widely applicable kind of machine learning problem.

We will demonstrate the use of graph regularization in this notebook by building a graph from the given input. The general recipe for building a graph-regularized model using the Neural Structured Learning (NSL) framework when the input does not contain an explicit graph is as follows:

  1. Create embeddings for each text sample in the input. This can be done using pre-trained models such as word2vec, Swivel, or BERT.
  2. Build a graph based on these embeddings using a similarity metric such as L2 distance or cosine distance. Nodes in the graph correspond to samples, and edges in the graph correspond to the similarity between pairs of samples.
  3. Generate training data from the synthesized graph and the sample features. The resulting training data will contain neighbor features in addition to the original node features.
  4. Create a neural network as a base model using the Keras Sequential, Functional, or Subclass API.
  5. Wrap the base model with the GraphRegularization wrapper class, which is provided by the NSL framework, to create a new graph Keras model. This new model will include a graph regularization loss as the regularization term in its training objective.
  6. Train and evaluate the graph Keras model.
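To make steps 1 and 2 concrete, here is a minimal, self-contained sketch of building similarity edges from embeddings with cosine similarity and a threshold. The toy embeddings, the threshold value, and the helper names below are illustrative only; in this notebook the actual graph is built at scale by nsl.tools.build_graph.

```python
import numpy as np

def cosine_similarity(u, v):
  """Cosine similarity between two 1-D embedding vectors."""
  return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def build_edges(embeddings, similarity_threshold):
  """Returns (i, j, similarity) tuples for sample pairs above the threshold."""
  edges = []
  for i in range(len(embeddings)):
    for j in range(i + 1, len(embeddings)):
      similarity = cosine_similarity(embeddings[i], embeddings[j])
      if similarity >= similarity_threshold:
        edges.append((i, j, similarity))
  return edges

# Toy 2-D embeddings: samples 0 and 1 point in nearly the same direction,
# while sample 2 is orthogonal to sample 0.
embeddings = [np.array([1.0, 0.0]), np.array([0.99, 0.1]), np.array([0.0, 1.0])]
edges = build_edges(embeddings, similarity_threshold=0.9)  # only pair (0, 1) survives
```

Raising the threshold discards more dissimilar pairs, which is exactly what the similarity_threshold argument to nsl.tools.build_graph controls later in this tutorial.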

Requirements

  1. Install the Neural Structured Learning package.
  2. Install tensorflow-hub.
pip install --quiet neural-structured-learning
pip install --quiet tensorflow-hub

Dependencies and imports

 import matplotlib.pyplot as plt
import numpy as np

import neural_structured_learning as nsl

import tensorflow as tf
import tensorflow_hub as hub

# Resets notebook state
tf.keras.backend.clear_session()

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print(
    "GPU is",
    "available" if tf.config.list_physical_devices("GPU") else "NOT AVAILABLE")
 
Version:  2.3.0
Eager mode:  True
Hub version:  0.8.0
GPU is NOT AVAILABLE

The IMDB dataset

The IMDB dataset contains the text of 50,000 movie reviews from the Internet Movie Database. These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are balanced, meaning they contain an equal number of positive and negative reviews.

In this tutorial, we will use a preprocessed version of the IMDB dataset.

Download the preprocessed IMDB dataset

The IMDB dataset comes packaged with TensorFlow. It has already been preprocessed so that the reviews (sequences of words) have been converted to sequences of integers, where each integer represents a specific word in a dictionary.

The following code downloads the IMDB dataset (or uses a cached copy if it has already been downloaded):

 imdb = tf.keras.datasets.imdb
(pp_train_data, pp_train_labels), (pp_test_data, pp_test_labels) = (
    imdb.load_data(num_words=10000))
 
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17465344/17464789 [==============================] - 0s 0us/step

The argument num_words=10000 keeps the 10,000 most frequently occurring words in the training data. The rare words are discarded to keep the size of the vocabulary manageable.

Explore the data

Let's take a moment to understand the format of the data. The dataset comes preprocessed: each example is an array of integers representing the words of the movie review. Each label is an integer value of either 0 or 1, where 0 is a negative review, and 1 is a positive review.

 print('Training entries: {}, labels: {}'.format(
    len(pp_train_data), len(pp_train_labels)))
training_samples_count = len(pp_train_data)
 
Training entries: 25000, labels: 25000

The text of reviews has been converted to integers, where each integer represents a specific word in a dictionary. Here's what the first review looks like:

 print(pp_train_data[0])
 
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]

Movie reviews may be different lengths. The code below shows the number of words in the first and second reviews. Since inputs to a neural network must be the same length, we'll need to resolve this later.

 len(pp_train_data[0]), len(pp_train_data[1])
 
(218, 189)

Convert the integers back to words

It may be useful to know how to convert integers back to the corresponding text. Here, we'll create a helper function to query a dictionary object that contains the integer-to-string mapping:

 def build_reverse_word_index():
  # A dictionary mapping words to an integer index
  word_index = imdb.get_word_index()

  # The first indices are reserved
  word_index = {k: (v + 3) for k, v in word_index.items()}
  word_index['<PAD>'] = 0
  word_index['<START>'] = 1
  word_index['<UNK>'] = 2  # unknown
  word_index['<UNUSED>'] = 3
  return dict((value, key) for (key, value) in word_index.items())

reverse_word_index = build_reverse_word_index()

def decode_review(text):
  return ' '.join([reverse_word_index.get(i, '?') for i in text])
 
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 0s 0us/step

Now we can use the decode_review function to display the text of the first review:

 decode_review(pp_train_data[0])
 
"<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for <UNK> and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also <UNK> to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the <UNK> list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"

Graph construction

Graph construction involves creating embeddings for text samples and then using a similarity function to compare the embeddings.

Before proceeding further, we first create a directory to store artifacts created by this tutorial.

mkdir -p /tmp/imdb

Create sample embeddings

We will use pretrained Swivel embeddings to create embeddings in the tf.train.Example format for each sample in the input. We will store the resulting embeddings in the TFRecord format along with an additional feature that represents the ID of each sample. This is important and will allow us to match sample embeddings with corresponding nodes in the graph later.

 pretrained_embedding = 'https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1'

hub_layer = hub.KerasLayer(
    pretrained_embedding, input_shape=[], dtype=tf.string, trainable=True)
 
 def _int64_feature(value):
  """Returns int64 tf.train.Feature."""
  return tf.train.Feature(int64_list=tf.train.Int64List(value=value.tolist()))


def _bytes_feature(value):
  """Returns bytes tf.train.Feature."""
  return tf.train.Feature(
      bytes_list=tf.train.BytesList(value=[value.encode('utf-8')]))


def _float_feature(value):
  """Returns float tf.train.Feature."""
  return tf.train.Feature(float_list=tf.train.FloatList(value=value.tolist()))


def create_embedding_example(word_vector, record_id):
  """Create tf.Example containing the sample's embedding and its ID."""

  text = decode_review(word_vector)

  # Shape = [batch_size,].
  sentence_embedding = hub_layer(tf.reshape(text, shape=[-1,]))

  # Flatten the sentence embedding back to 1-D.
  sentence_embedding = tf.reshape(sentence_embedding, shape=[-1])

  features = {
      'id': _bytes_feature(str(record_id)),
      'embedding': _float_feature(sentence_embedding.numpy())
  }
  return tf.train.Example(features=tf.train.Features(feature=features))


def create_embeddings(word_vectors, output_path, starting_record_id):
  record_id = int(starting_record_id)
  with tf.io.TFRecordWriter(output_path) as writer:
    for word_vector in word_vectors:
      example = create_embedding_example(word_vector, record_id)
      record_id = record_id + 1
      writer.write(example.SerializeToString())
  return record_id


# Persist TF.Example features containing embeddings for training data in
# TFRecord format.
create_embeddings(pp_train_data, '/tmp/imdb/embeddings.tfr', 0)
 
25000

Build a graph

Now that we have the sample embeddings, we will use them to build a similarity graph, i.e., nodes in this graph will correspond to samples and edges in this graph will correspond to the similarity between pairs of nodes.

Neural Structured Learning provides a graph-building library to build a graph based on sample embeddings. It uses cosine similarity as the similarity measure to compare embeddings and build edges between them. It also allows us to specify a similarity threshold, which can be used to discard dissimilar edges from the final graph. In this example, using 0.99 as the similarity threshold, we end up with a graph that has 445,327 bi-directional edges.

 nsl.tools.build_graph(['/tmp/imdb/embeddings.tfr'],
                      '/tmp/imdb/graph_99.tsv',
                      similarity_threshold=0.99)
 

Sample features

We create sample features for our problem using the tf.train.Example format and persist them in the TFRecord format. Each sample will include the following three features:

  1. id: The node ID of the sample.
  2. words: An int64 list containing word IDs.
  3. label: A singleton int64 identifying the target class of the review.
 def create_example(word_vector, label, record_id):
  """Create tf.Example containing the sample's word vector, label, and ID."""
  features = {
      'id': _bytes_feature(str(record_id)),
      'words': _int64_feature(np.asarray(word_vector)),
      'label': _int64_feature(np.asarray([label])),
  }
  return tf.train.Example(features=tf.train.Features(feature=features))

def create_records(word_vectors, labels, record_path, starting_record_id):
  record_id = int(starting_record_id)
  with tf.io.TFRecordWriter(record_path) as writer:
    for word_vector, label in zip(word_vectors, labels):
      example = create_example(word_vector, label, record_id)
      record_id = record_id + 1
      writer.write(example.SerializeToString())
  return record_id

# Persist TF.Example features (word vectors and labels) for training and test
# data in TFRecord format.
next_record_id = create_records(pp_train_data, pp_train_labels,
                                '/tmp/imdb/train_data.tfr', 0)
create_records(pp_test_data, pp_test_labels, '/tmp/imdb/test_data.tfr',
               next_record_id)
 
50000

Augment training data with graph neighbors

Since we have the sample features and the synthesized graph, we can generate the augmented training data for Neural Structured Learning. The NSL framework provides a library to combine the graph and the sample features to produce the final training data for graph regularization. The resulting training data will include the original sample features as well as the features of their corresponding neighbors.

In this tutorial, we consider undirected edges and use a maximum of 3 neighbors per sample to augment the training data with graph neighbors.

 nsl.tools.pack_nbrs(
    '/tmp/imdb/train_data.tfr',
    '',
    '/tmp/imdb/graph_99.tsv',
    '/tmp/imdb/nsl_train_data.tfr',
    add_undirected_edges=True,
    max_nbrs=3)
 

Base model

We are now ready to build a base model without graph regularization. To build this model, we can either use the embeddings that were used in building the graph, or we can learn new embeddings jointly with the classification task. For the purpose of this notebook, we will do the latter.

Global variables

 NBR_FEATURE_PREFIX = 'NL_nbr_'
NBR_WEIGHT_SUFFIX = '_weight'
 

Hyperparameters

We will use an instance of HParams to include various hyperparameters and constants used for training and evaluation. We briefly describe each of them below:

  • num_classes: There are 2 classes: positive and negative.

  • max_seq_length: This is the maximum number of words considered from each movie review in this example.

  • vocab_size: This is the size of the vocabulary considered for this example.

  • distance_type: This is the distance metric used to regularize the sample with its neighbors.

  • graph_regularization_multiplier: This controls the relative weight of the graph regularization term in the overall loss function.

  • num_neighbors: The number of neighbors used for graph regularization. This value has to be less than or equal to the max_nbrs argument used above when invoking nsl.tools.pack_nbrs.

  • num_fc_units: The number of units in the fully connected layer of the neural network.

  • train_epochs: The number of training epochs.

  • batch_size: Batch size used for training and evaluation.

  • eval_steps: The number of batches to process before deeming evaluation is complete. If set to None, all instances in the test set are evaluated.

 class HParams(object):
  """Hyperparameters used for training."""
  def __init__(self):
    ### dataset parameters
    self.num_classes = 2
    self.max_seq_length = 256
    self.vocab_size = 10000
    ### neural graph learning parameters
    self.distance_type = nsl.configs.DistanceType.L2
    self.graph_regularization_multiplier = 0.1
    self.num_neighbors = 2
    ### model architecture
    self.num_embedding_dims = 16
    self.num_lstm_dims = 64
    self.num_fc_units = 64
    ### training parameters
    self.train_epochs = 10
    self.batch_size = 128
    ### eval parameters
    self.eval_steps = None  # All instances in the test set are evaluated.

HPARAMS = HParams()
 

Prepare the data

The reviews (arrays of integers) must be converted to tensors before being fed into the neural network. This conversion can be done in a couple of ways:

  • Convert the arrays into vectors of 0s and 1s indicating word occurrence, similar to a one-hot encoding. For example, the sequence [3, 5] would become a 10000-dimensional vector that is all zeros except for indices 3 and 5, which are ones. Then, make this the first layer in our network, a Dense layer, that can handle floating point vector data. This approach is memory intensive, though, requiring a num_words * num_reviews size matrix.

  • Alternatively, we can pad the arrays so they all have the same length, then create an integer tensor of shape max_length * num_reviews. We can use an embedding layer capable of handling this shape as the first layer in our network.

In this tutorial, we will use the second approach.
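For contrast, the first (multi-hot) approach, which this notebook does not use, can be sketched with a small hypothetical helper:

```python
import numpy as np

def multi_hot_encode(sequences, vocab_size):
  """Converts integer word-ID sequences into vectors of 0s and 1s."""
  results = np.zeros((len(sequences), vocab_size), dtype=np.float32)
  for i, sequence in enumerate(sequences):
    results[i, sequence] = 1.0  # set the indices that occur in the review to 1
  return results

# The sequence [3, 5] becomes a vocab_size-dimensional vector that is all
# zeros except at indices 3 and 5, which are ones.
encoded = multi_hot_encode([[3, 5]], vocab_size=10)
```

Note how a single review of any length maps to one fixed-size vector of size vocab_size, which is why this approach needs a num_words * num_reviews matrix for the whole dataset.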

Since the movie reviews must be the same length, we will use the pad_sequence function defined below to standardize the lengths.

 def make_dataset(file_path, training=False):
  """Creates a `tf.data.TFRecordDataset`.

  Args:
    file_path: Name of the file in the `.tfrecord` format containing
      `tf.train.Example` objects.
    training: Boolean indicating if we are in training mode.

  Returns:
    An instance of `tf.data.TFRecordDataset` containing the `tf.train.Example`
    objects.
  """

  def pad_sequence(sequence, max_seq_length):
    """Pads the input sequence (a `tf.SparseTensor`) to `max_seq_length`."""
    pad_size = tf.maximum([0], max_seq_length - tf.shape(sequence)[0])
    padded = tf.concat(
        [sequence.values,
         tf.fill((pad_size), tf.cast(0, sequence.dtype))],
        axis=0)
    # The input sequence may be larger than max_seq_length. Truncate down if
    # necessary.
    return tf.slice(padded, [0], [max_seq_length])

  def parse_example(example_proto):
    """Extracts relevant fields from the `example_proto`.

    Args:
      example_proto: An instance of `tf.train.Example`.

    Returns:
      A pair whose first value is a dictionary containing relevant features
      and whose second value contains the ground truth labels.
    """
    # The 'words' feature is a variable length word ID vector.
    feature_spec = {
        'words': tf.io.VarLenFeature(tf.int64),
        'label': tf.io.FixedLenFeature((), tf.int64, default_value=-1),
    }
    # We also extract corresponding neighbor features in a similar manner to
    # the features above during training.
    if training:
      for i in range(HPARAMS.num_neighbors):
        nbr_feature_key = '{}{}_{}'.format(NBR_FEATURE_PREFIX, i, 'words')
        nbr_weight_key = '{}{}{}'.format(NBR_FEATURE_PREFIX, i,
                                         NBR_WEIGHT_SUFFIX)
        feature_spec[nbr_feature_key] = tf.io.VarLenFeature(tf.int64)

        # We assign a default value of 0.0 for the neighbor weight so that
        # graph regularization is done on samples based on their exact number
        # of neighbors. In other words, non-existent neighbors are discounted.
        feature_spec[nbr_weight_key] = tf.io.FixedLenFeature(
            [1], tf.float32, default_value=tf.constant([0.0]))

    features = tf.io.parse_single_example(example_proto, feature_spec)

    # Since the 'words' feature is a variable length word vector, we pad it to a
    # constant maximum length based on HPARAMS.max_seq_length
    features['words'] = pad_sequence(features['words'], HPARAMS.max_seq_length)
    if training:
      for i in range(HPARAMS.num_neighbors):
        nbr_feature_key = '{}{}_{}'.format(NBR_FEATURE_PREFIX, i, 'words')
        features[nbr_feature_key] = pad_sequence(features[nbr_feature_key],
                                                 HPARAMS.max_seq_length)

    labels = features.pop('label')
    return features, labels

  dataset = tf.data.TFRecordDataset([file_path])
  if training:
    dataset = dataset.shuffle(10000)
  dataset = dataset.map(parse_example)
  dataset = dataset.batch(HPARAMS.batch_size)
  return dataset


train_dataset = make_dataset('/tmp/imdb/nsl_train_data.tfr', True)
test_dataset = make_dataset('/tmp/imdb/test_data.tfr')
 

Build the model

A neural network is created by stacking layers. This requires two main architectural decisions:

  • How many layers to use in the model?
  • How many hidden units to use for each layer?

In this example, the input data consists of an array of word indices. The labels to predict are either 0 or 1.

We will use a bi-directional LSTM as our base model in this tutorial.

 # This function exists as an alternative to the bi-LSTM model used in this
# notebook.
def make_feed_forward_model():
  """Builds a simple 2 layer feed forward neural network."""
  inputs = tf.keras.Input(
      shape=(HPARAMS.max_seq_length,), dtype='int64', name='words')
  embedding_layer = tf.keras.layers.Embedding(HPARAMS.vocab_size, 16)(inputs)
  pooling_layer = tf.keras.layers.GlobalAveragePooling1D()(embedding_layer)
  dense_layer = tf.keras.layers.Dense(16, activation='relu')(pooling_layer)
  outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dense_layer)
  return tf.keras.Model(inputs=inputs, outputs=outputs)


def make_bilstm_model():
  """Builds a bi-directional LSTM model."""
  inputs = tf.keras.Input(
      shape=(HPARAMS.max_seq_length,), dtype='int64', name='words')
  embedding_layer = tf.keras.layers.Embedding(HPARAMS.vocab_size,
                                              HPARAMS.num_embedding_dims)(
                                                  inputs)
  lstm_layer = tf.keras.layers.Bidirectional(
      tf.keras.layers.LSTM(HPARAMS.num_lstm_dims))(
          embedding_layer)
  dense_layer = tf.keras.layers.Dense(
      HPARAMS.num_fc_units, activation='relu')(
          lstm_layer)
  outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dense_layer)
  return tf.keras.Model(inputs=inputs, outputs=outputs)


# Feel free to use an architecture of your choice.
model = make_bilstm_model()
model.summary()
 
Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
words (InputLayer)           [(None, 256)]             0         
_________________________________________________________________
embedding (Embedding)        (None, 256, 16)           160000    
_________________________________________________________________
bidirectional (Bidirectional (None, 128)               41472     
_________________________________________________________________
dense (Dense)                (None, 64)                8256      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
=================================================================
Total params: 209,793
Trainable params: 209,793
Non-trainable params: 0
_________________________________________________________________

The layers are effectively stacked sequentially to build the classifier:

  1. The first layer is an Input layer, which takes the integer-encoded vocabulary.
  2. The next layer is an Embedding layer, which takes the integer-encoded vocabulary and looks up the embedding vector for each word index. These vectors are learned as the model trains. The vectors add a dimension to the output array. The resulting dimensions are: (batch, sequence, embedding).
  3. Next, a bidirectional LSTM layer returns a fixed-length output vector for each example.
  4. This fixed-length output vector is piped through a fully connected (Dense) layer with 64 hidden units.
  5. The last layer is densely connected with a single output node. Using the sigmoid activation function, this value is a float between 0 and 1, representing a probability, or confidence level.

Hidden units

The above model has two intermediate or "hidden" layers between the input and output, excluding the Embedding layer. The number of outputs (units, nodes, or neurons) is the dimension of the representational space for the layer. In other words, the amount of freedom the network is allowed when learning an internal representation.

If a model has more hidden units (a higher-dimensional representation space) and/or more layers, then the network can learn more complex representations. However, it makes the network more computationally expensive and may lead to learning unwanted patterns: patterns that improve performance on the training data but not on the test data. This is called overfitting.

Loss function and optimizer

A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a probability (a single-unit layer with a sigmoid activation), we'll use the binary_crossentropy loss function.

 model.compile(
    optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
 

Create a validation set

When training, we want to check the accuracy of the model on data it hasn't seen before. Create a validation set by setting apart a fraction of the original training data. (Why not use the testing set now? Our goal is to develop and tune our model using only the training data, then use the test data just once to evaluate our accuracy.)

In this tutorial, we take roughly 10% of the initial training samples (10% of 25,000) as labeled data for training, and the rest as validation data. Since the initial train/test split was 50/50 (25,000 samples each), the effective train/validation/test split we now have is 5/45/50.

Note that 'train_dataset' has already been batched and shuffled.

 validation_fraction = 0.9
validation_size = int(validation_fraction *
                      int(training_samples_count / HPARAMS.batch_size))
print(validation_size)
validation_dataset = train_dataset.take(validation_size)
train_dataset = train_dataset.skip(validation_size)
 
175

Train the model

Train the model in mini-batches. While training, monitor the model's loss and accuracy on the validation set:

 history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=HPARAMS.train_epochs,
    verbose=1)
 
Epoch 1/10

/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/functional.py:543: UserWarning: Input dict contained keys ['NL_nbr_0_words', 'NL_nbr_1_words', 'NL_nbr_0_weight', 'NL_nbr_1_weight'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])

21/21 [==============================] - 19s 925ms/step - loss: 0.6930 - accuracy: 0.5092 - val_loss: 0.6924 - val_accuracy: 0.5006
Epoch 2/10
21/21 [==============================] - 19s 894ms/step - loss: 0.6890 - accuracy: 0.5465 - val_loss: 0.7294 - val_accuracy: 0.5698
Epoch 3/10
21/21 [==============================] - 19s 883ms/step - loss: 0.6785 - accuracy: 0.6208 - val_loss: 0.6489 - val_accuracy: 0.7043
Epoch 4/10
21/21 [==============================] - 19s 890ms/step - loss: 0.6592 - accuracy: 0.6400 - val_loss: 0.6523 - val_accuracy: 0.6866
Epoch 5/10
21/21 [==============================] - 19s 883ms/step - loss: 0.6413 - accuracy: 0.6923 - val_loss: 0.6335 - val_accuracy: 0.7004
Epoch 6/10
21/21 [==============================] - 21s 982ms/step - loss: 0.6053 - accuracy: 0.7188 - val_loss: 0.5716 - val_accuracy: 0.7183
Epoch 7/10
21/21 [==============================] - 18s 879ms/step - loss: 0.5204 - accuracy: 0.7619 - val_loss: 0.4511 - val_accuracy: 0.7930
Epoch 8/10
21/21 [==============================] - 19s 882ms/step - loss: 0.4719 - accuracy: 0.7758 - val_loss: 0.4244 - val_accuracy: 0.8094
Epoch 9/10
21/21 [==============================] - 18s 880ms/step - loss: 0.3695 - accuracy: 0.8431 - val_loss: 0.3567 - val_accuracy: 0.8487
Epoch 10/10
21/21 [==============================] - 19s 891ms/step - loss: 0.3504 - accuracy: 0.8500 - val_loss: 0.3219 - val_accuracy: 0.8652

Evaluate the model

Let's see how the model performs. Two values will be returned: loss (a number representing our error; lower values are better) and accuracy.

 results = model.evaluate(test_dataset, steps=HPARAMS.eval_steps)
print(results)
 
196/196 [==============================] - 17s 85ms/step - loss: 0.4116 - accuracy: 0.8221
[0.4116455018520355, 0.8221200108528137]

Create a graph of accuracy/loss over time

model.fit() returns a History object that contains a dictionary with everything that happened during training:

 history_dict = history.history
history_dict.keys()
 
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

There are four entries: one for each monitored metric during training and validation. We can use these to plot the training and validation loss for comparison, as well as the training and validation accuracy:

 acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(acc) + 1)

# "-r^" is for solid red line with triangle markers.
plt.plot(epochs, loss, '-r^', label='Training loss')
# "-bo" is for solid blue line with circle markers.
plt.plot(epochs, val_loss, '-bo', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc='best')

plt.show()
 

png

 plt.clf()   # clear figure

plt.plot(epochs, acc, '-r^', label='Training acc')
plt.plot(epochs, val_acc, '-bo', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='best')

plt.show()
 

png

Notice that the training loss decreases with each epoch and the training accuracy increases with each epoch. This is expected when using gradient descent optimization: it should minimize the desired quantity on every iteration.

Graph regularization

We are now ready to try out graph regularization using the base model that we built above. We will use the GraphRegularization wrapper class provided by the Neural Structured Learning framework to wrap the base (bi-LSTM) model to include graph regularization. The rest of the steps for training and evaluating the graph-regularized model are similar to those of the base model.

Create a graph-regularized model

To assess the incremental benefit of graph regularization, we will create a new base model instance. This is because model has already been trained for a few iterations, and reusing this trained model to create a graph-regularized model would not be a fair comparison for model.

 # Build a new base LSTM model.
base_reg_model = make_bilstm_model()
 
 # Wrap the base model with graph regularization.
graph_reg_config = nsl.configs.make_graph_reg_config(
    max_neighbors=HPARAMS.num_neighbors,
    multiplier=HPARAMS.graph_regularization_multiplier,
    distance_type=HPARAMS.distance_type,
    sum_over_axis=-1)
graph_reg_model = nsl.keras.GraphRegularization(base_reg_model,
                                                graph_reg_config)
graph_reg_model.compile(
    optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
 

Train the model

 graph_reg_history = graph_reg_model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=HPARAMS.train_epochs,
    verbose=1)
 
Epoch 1/10

/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/framework/indexed_slices.py:432: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

21/21 [==============================] - 22s 1s/step - loss: 0.6930 - accuracy: 0.5246 - scaled_graph_loss: 2.9800e-06 - val_loss: 0.6929 - val_accuracy: 0.4998
Epoch 2/10
21/21 [==============================] - 21s 988ms/step - loss: 0.6909 - accuracy: 0.5200 - scaled_graph_loss: 7.8452e-06 - val_loss: 0.6838 - val_accuracy: 0.5917
Epoch 3/10
21/21 [==============================] - 21s 980ms/step - loss: 0.6656 - accuracy: 0.6277 - scaled_graph_loss: 6.1205e-04 - val_loss: 0.6591 - val_accuracy: 0.6905
Epoch 4/10
21/21 [==============================] - 21s 981ms/step - loss: 0.6395 - accuracy: 0.6846 - scaled_graph_loss: 0.0016 - val_loss: 0.5860 - val_accuracy: 0.7171
Epoch 5/10
21/21 [==============================] - 21s 980ms/step - loss: 0.5388 - accuracy: 0.7573 - scaled_graph_loss: 0.0043 - val_loss: 0.4910 - val_accuracy: 0.7844
Epoch 6/10
21/21 [==============================] - 21s 989ms/step - loss: 0.4105 - accuracy: 0.8281 - scaled_graph_loss: 0.0146 - val_loss: 0.3353 - val_accuracy: 0.8612
Epoch 7/10
21/21 [==============================] - 21s 986ms/step - loss: 0.3416 - accuracy: 0.8681 - scaled_graph_loss: 0.0203 - val_loss: 0.4134 - val_accuracy: 0.8209
Epoch 8/10
21/21 [==============================] - 21s 981ms/step - loss: 0.4230 - accuracy: 0.8273 - scaled_graph_loss: 0.0144 - val_loss: 0.4755 - val_accuracy: 0.7696
Epoch 9/10
21/21 [==============================] - 22s 1s/step - loss: 0.4905 - accuracy: 0.7950 - scaled_graph_loss: 0.0080 - val_loss: 0.3862 - val_accuracy: 0.8382
Epoch 10/10
21/21 [==============================] - 21s 978ms/step - loss: 0.3384 - accuracy: 0.8754 - scaled_graph_loss: 0.0215 - val_loss: 0.3002 - val_accuracy: 0.8811

Evaluate the model

 graph_reg_results = graph_reg_model.evaluate(test_dataset, steps=HPARAMS.eval_steps)
print(graph_reg_results)
 
196/196 [==============================] - 16s 84ms/step - loss: 0.3852 - accuracy: 0.8301
[0.385225385427475, 0.830079972743988]

Create a graph of accuracy/loss over time

 graph_reg_history_dict = graph_reg_history.history
graph_reg_history_dict.keys()
 
dict_keys(['loss', 'accuracy', 'scaled_graph_loss', 'val_loss', 'val_accuracy'])

There are five entries in total in the dictionary: training loss, training accuracy, training graph loss, validation loss, and validation accuracy. We can plot them all together for comparison. Note that the graph loss is computed only during training.

 acc = graph_reg_history_dict['accuracy']
val_acc = graph_reg_history_dict['val_accuracy']
loss = graph_reg_history_dict['loss']
graph_loss = graph_reg_history_dict['scaled_graph_loss']
val_loss = graph_reg_history_dict['val_loss']

epochs = range(1, len(acc) + 1)

plt.clf()   # clear figure

# "-r^" is for solid red line with triangle markers.
plt.plot(epochs, loss, '-r^', label='Training loss')
# "-gD" is for solid green line with diamond markers.
plt.plot(epochs, graph_loss, '-gD', label='Training graph loss')
# "-bo" is for solid blue line with circle markers.
plt.plot(epochs, val_loss, '-bo', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc='best')

plt.show()
 

png

 plt.clf()   # clear figure

plt.plot(epochs, acc, '-r^', label='Training acc')
plt.plot(epochs, val_acc, '-bo', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='best')

plt.show()
 

png

The power of semi-supervised learning

Semi-supervised learning, and more specifically, graph regularization in the context of this tutorial, can be really powerful when the amount of training data is small. The lack of training data is compensated for by leveraging similarity among the training samples, which is not possible in traditional supervised learning.

We define the supervision ratio as the ratio of training samples to the total number of samples, which includes training, validation, and test samples. In this notebook, we have used a supervision ratio of 0.05 (i.e., 5% of the labeled data) for training both the base model and the graph-regularized model. We illustrate the impact of the supervision ratio on model accuracy in the cell below.
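As a quick sanity check of the numbers used here (roughly 2,500 labeled training samples out of 50,000 total samples):

```python
labeled_train_samples = 2500   # ~10% of the 25,000 original training reviews
total_samples = 50000          # training + validation + test samples combined
supervision_ratio = labeled_train_samples / total_samples  # 0.05, i.e. 5%
```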

 # Accuracy values for both the Bi-LSTM model and the feed forward NN model have
# been precomputed for the following supervision ratios.

supervision_ratios = [0.3, 0.15, 0.05, 0.03, 0.02, 0.01, 0.005]

model_tags = ['Bi-LSTM model', 'Feed Forward NN model']
base_model_accs = [[84, 84, 83, 80, 65, 52, 50], [87, 86, 76, 74, 67, 52, 51]]
graph_reg_model_accs = [[84, 84, 83, 83, 65, 63, 50],
                        [87, 86, 80, 75, 67, 52, 50]]

plt.clf()  # clear figure

fig, axes = plt.subplots(1, 2)
fig.set_size_inches((12, 5))

for ax, model_tag, base_model_acc, graph_reg_model_acc in zip(
    axes, model_tags, base_model_accs, graph_reg_model_accs):

  # "-r^" is for solid red line with triangle markers.
  ax.plot(base_model_acc, '-r^', label='Base model')
  # "-gD" is for solid green line with diamond markers.
  ax.plot(graph_reg_model_acc, '-gD', label='Graph-regularized model')
  ax.set_title(model_tag)
  ax.set_xlabel('Supervision ratio')
  ax.set_ylabel('Accuracy(%)')
  ax.set_ylim((25, 100))
  ax.set_xticks(range(len(supervision_ratios)))
  ax.set_xticklabels(supervision_ratios)
  ax.legend(loc='best')

plt.show()
 
<Figure size 432x288 with 0 Axes>

png

We can observe that as the supervision ratio decreases, model accuracy also decreases. This is true for both the base model and the graph-regularized model, regardless of the model architecture used. However, notice that the graph-regularized model performs better than the base model for both architectures. In particular, for the Bi-LSTM model, when the supervision ratio is 0.01, the accuracy of the graph-regularized model is ~20% higher than that of the base model. This is primarily because of the semi-supervised learning in the graph-regularized model, where structural similarity among training samples is used in addition to the training samples themselves.

Conclusion

We have demonstrated the use of graph regularization with the Neural Structured Learning (NSL) framework even when the input does not contain an explicit graph. We considered the task of sentiment classification of IMDB movie reviews, for which we synthesized a similarity graph based on review embeddings. We encourage users to experiment further by varying hyperparameters, the amount of supervision, and by using different model architectures.