TPUでBERTを使用してGLUEタスクを解決する

TensorFlow.orgで表示

GoogleColabで実行

GitHubで表示

ノートブックをダウンロード

TFハブモデルを参照してください

BERTは、自然言語処理における多くの問題を解決するために使用できます。あなたはどのようにより多くのタスクのための微調整BERTに学びますGLUEベンチマーク：

COLA （言語受容性のコーパス）：文は文法的に正しいですか？
SST-2 （スタンフォード感情ツリーバンク）：タスクは、所与の文の感情を予測することです。
MRPC （マイクロソフトリサーチ言い換えコーパス）：文のペアは意味的に等価であるかどうかを確認します。
QQP （Quoraの質問Pairs2）：質問のペアは意味的に等価であるかどうかを確認します。
MNLI （多ジャンル自然言語推論）：前提文と仮説文を考えると、タスクは、前提が仮説（含意）を必要とするかどうかを予測することであるという仮説（矛盾）が矛盾する、またはどちらも（ニュートラル）。
QNLI （質問応答自然言語推論）：タスクコンテキスト文は、質問への答えが含まれているかどうかを判断することです。
RTEは、（テキスト含意を認識）：文が与えられた仮説を伴うかどうかを判断します。
WNLI （ウィノグラード自然言語推論）：タスクが置換代名詞と文は原文にともなうされた場合に予測することです。

このチュートリアルには、TPUでこれらのモデルをトレーニングするための完全なエンドツーエンドのコードが含まれています。このノートブックは、1行変更することでGPU上で実行することもできます（以下で説明）。

このノートブックでは、次のことを行います。

TensorFlowハブからBERTモデルをロードする
GLUEタスクの1つを選択し、データセットをダウンロードします
テキストを前処理する
BERTを微調整します（シングルセンテンスおよびマルチセンテンスのデータセットの例を示します）
トレーニング済みモデルを保存して使用します

設定

別のモデルを使用してテキストを前処理してから、BERTを微調整します。このモデルは、に依存tensorflow /テキストあなたは以下のインストールされます。

pip install -q -U tensorflow-text

あなたはからAdamWオプティマイザを使用しますtensorflow /モデルあなたがよくとしてインストールされます微調整BERTに。

pip install -q -U tf-models-official

pip install -U tfds-nightly

import os
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import tensorflow_text as text  # A dependency of the preprocessing model
import tensorflow_addons as tfa
from official.nlp import optimization
import numpy as np

tf.get_logger().setLevel('ERROR')

/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/requests/__init__.py:104: RequestsDependencyWarning: urllib3 (1.26.7) or chardet (2.3.0)/charset_normalizer (2.0.7) doesn't match a supported version!
  RequestsDependencyWarning)

次に、TFHubのCloudStorageバケットからチェックポイントを直接読み取るようにTFHubを構成します。これは、TPUでTFHubモデルを実行する場合にのみ推奨されます。

この設定がないと、TFHubは圧縮ファイルをダウンロードし、チェックポイントをローカルに抽出します。これらのローカルファイルからロードしようとすると、次のエラーで失敗します。

InvalidArgumentError: Unimplemented: File system scheme '[local]' not implemented

これは、 TPUが唯一のクラウドストレージバケットから直接読み取ることができます。

os.environ["TFHUB_MODEL_LOAD_FORMAT"]="UNCOMPRESSED"

TPUワーカーに接続します

次のコードはTPUワーカーに接続し、TensorFlowのデフォルトデバイスをTPUワーカーのCPUデバイスに変更します。また、この1つのTPUワーカーで使用可能な8つの個別のTPUコアにモデルトレーニングを配布するために使用するTPU配布戦略も定義します。 TensorFlowの参照TPUガイドを詳細については。

import os

if os.environ['COLAB_TPU_ADDR']:
  cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
  tf.config.experimental_connect_to_cluster(cluster_resolver)
  tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
  strategy = tf.distribute.TPUStrategy(cluster_resolver)
  print('Using TPU')
elif tf.config.list_physical_devices('GPU'):
  strategy = tf.distribute.MirroredStrategy()
  print('Using GPU')
else:
  raise ValueError('Running on CPU is not recommended.')

Using TPU

TensorFlowハブからモデルを読み込んでいます

ここでは、TensorFlow HubからロードするBERTモデルを選択し、微調整できます。選択できるBERTモデルは複数あります。

BERT-ベース、 Uncasedおよび7つのより多くのモデル、元BERTの作者によって解放訓練された重みを持ちます。
小型のBERTは、あなたがスピード、サイズと品質の間のトレードオフを探ることができます同じ一般的なアーキテクチャが、少ないおよび/またはより小さなトランスブロックを、持っています。
ALBERT ：層間パラメータを共有することにより、モデルサイズ（ただし、計算時間）を減少させる「AライトBERT」の4種類のサイズ。
BERT専門家：8つのモデル全てのことは、BERTベースのアーキテクチャを持っていますが、対象タスクとより密接に整合するように、異なる事前研修ドメイン間の選択肢を提供します。
エレクトラは（3種類のサイズで）BERTと同じアーキテクチャを持っていますが、似ている生成的敵対ネットワーク（GAN）ことをセット・アップに弁別よう事前に訓練を受け取得します。
トーキング・ヘッドの注意とゲーテッドGELUとBERT [塩基、大型は】トランスアーキテクチャのコアに2つの改良を有しています。

詳細については、上記のリンク先のモデルドキュメントを参照してください。

このチュートリアルでは、BERTベースから始めます。より大きく、より新しいモデルを使用して精度を高めたり、より小さなモデルを使用してトレーニング時間を短縮したりできます。モデルを変更するには、1行のコードを切り替えるだけです（以下を参照）。すべての違いは、TensorFlowハブからダウンロードするSavedModelにカプセル化されています。

BERTモデルを選択して微調整します

bert_model_name = 'bert_en_uncased_L-12_H-768_A-12' 

map_name_to_handle = {
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3',
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-768_A-12/1',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_base/2',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_large/2',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_xlarge/2',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_xxlarge/2',
    'electra_small':
        'https://tfhub.dev/google/electra_small/2',
    'electra_base':
        'https://tfhub.dev/google/electra_base/2',
    'experts_pubmed':
        'https://tfhub.dev/google/experts/bert/pubmed/2',
    'experts_wiki_books':
        'https://tfhub.dev/google/experts/bert/wiki_books/2',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_large/1',
}

map_model_to_preprocess = {
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_preprocess/3',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'electra_small':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'electra_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_pubmed':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_wiki_books':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
}

tfhub_handle_encoder = map_name_to_handle[bert_model_name]
tfhub_handle_preprocess = map_model_to_preprocess[bert_model_name]

print('BERT model selected           :', tfhub_handle_encoder)
print('Preprocessing model auto-selected:', tfhub_handle_preprocess)

BERT model selected           : https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3
Preprocessing model auto-selected: https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3

テキストを前処理する

上BERTと分類テキストコラボ前処理モデルを直接BERTエンコーダが埋め込ま使用されます。

このチュートリアルでは、Dataset.mapを使用して、トレーニング用の入力パイプラインの一部として前処理を実行し、それを推論用にエクスポートされるモデルにマージする方法を示します。このように、TPU自体は数値入力を必要としますが、トレーニングと推論の両方が生のテキスト入力から機能します。

TPUの要件はさておき、それはパフォーマンスが（あなたがより多くを学ぶことができ、入力パイプラインに非同期で実行前処理していることができますtf.dataパフォーマンスガイド）。

このチュートリアルでは、多入力モデルを作成する方法と、BERTへの入力のシーケンス長を調整する方法も示します。

前処理モデルを示しましょう。

bert_preprocess = hub.load(tfhub_handle_preprocess)
tok = bert_preprocess.tokenize(tf.constant(['Hello TensorFlow!']))
print(tok)

<tf.RaggedTensor [[[7592], [23435, 12314], [999]]]>

各前処理モデルはまた、方法、提供.bert_pack_inputs(tensors, seq_length)同様にトークンのリストを受け取りtok上記）と系列長引数。これにより、入力がパックされ、BERTモデルで期待される形式のテンソルの辞書が作成されます。

text_preprocessed = bert_preprocess.bert_pack_inputs([tok, tok], tf.constant(20))

print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Shape Word Ids :  (1, 20)
Word Ids       :  tf.Tensor(
[  101  7592 23435 12314   999   102  7592 23435 12314   999   102     0
     0     0     0     0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 20)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 20)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)

注意すべき詳細は次のとおりです。

input_maskマスクは、コンテンツとパディングの間にきれいに区別するためにモデルを可能にします。マスクは、同じ形状持つinput_word_ids 、1つのどこ含まinput_word_idsパディングされていないことを。
input_type_ids同じ形状を有しているinput_maskが、非パディング領域の内側に、トークンがその一部である文を示す0又は1を含んでいます。

次に、このすべてのロジックをカプセル化する前処理モデルを作成します。モデルは文字列を入力として受け取り、BERTに渡すことができる適切にフォーマットされたオブジェクトを返します。

各BERTモデルには特定の前処理モデルがあります。必ず、BERTのモデルドキュメントに記載されている適切なモデルを使用してください。

def make_bert_preprocess_model(sentence_features, seq_length=128):
  """Returns Model mapping string features to BERT inputs.

  Args:
    sentence_features: a list with the names of string-valued features.
    seq_length: an integer that defines the sequence length of BERT inputs.

  Returns:
    A Keras Model that can be called on a list or dict of string Tensors
    (with the order or names, resp., given by sentence_features) and
    returns a dict of tensors for input to BERT.
  """

  input_segments = [
      tf.keras.layers.Input(shape=(), dtype=tf.string, name=ft)
      for ft in sentence_features]

  # Tokenize the text to word pieces.
  bert_preprocess = hub.load(tfhub_handle_preprocess)
  tokenizer = hub.KerasLayer(bert_preprocess.tokenize, name='tokenizer')
  segments = [tokenizer(s) for s in input_segments]

  # Optional: Trim segments in a smart way to fit seq_length.
  # Simple cases (like this example) can skip this step and let
  # the next step apply a default truncation to approximately equal lengths.
  truncated_segments = segments

  # Pack inputs. The details (start/end token ids, dict of output tensors)
  # are model-dependent, so this gets loaded from the SavedModel.
  packer = hub.KerasLayer(bert_preprocess.bert_pack_inputs,
                          arguments=dict(seq_length=seq_length),
                          name='packer')
  model_inputs = packer(truncated_segments)
  return tf.keras.Model(input_segments, model_inputs)

前処理モデルを示しましょう。 2つの文の入力（input1とinput2）を使用してテストを作成します。：出力はBERTモデルは、入力として期待するものであるinput_word_ids 、 input_masksとinput_type_ids 。

test_preprocess_model = make_bert_preprocess_model(['my_input1', 'my_input2'])
test_text = [np.array(['some random test sentence']),
             np.array(['another sentence'])]
text_preprocessed = test_preprocess_model(test_text)

print('Keys           : ', list(text_preprocessed.keys()))
print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Keys           :  ['input_word_ids', 'input_mask', 'input_type_ids']
Shape Word Ids :  (1, 128)
Word Ids       :  tf.Tensor(
[ 101 2070 6721 3231 6251  102 2178 6251  102    0    0    0    0    0
    0    0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 128)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 128)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)

定義した2つの入力に注目して、モデルの構造を見てみましょう。

tf.keras.utils.plot_model(test_preprocess_model, show_shapes=True, show_dtype=True)

('You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) ', 'for plot_model/model_to_dot to work.')

データセットからのすべての入力に前処理を適用するには、使用するmapデータセットから機能を。その結果はキャッシュされた性能。

AUTOTUNE = tf.data.AUTOTUNE


def load_dataset_from_tfds(in_memory_ds, info, split, batch_size,
                           bert_preprocess_model):
  is_training = split.startswith('train')
  dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[split])
  num_examples = info.splits[split].num_examples

  if is_training:
    dataset = dataset.shuffle(num_examples)
    dataset = dataset.repeat()
  dataset = dataset.batch(batch_size)
  dataset = dataset.map(lambda ex: (bert_preprocess_model(ex), ex['label']))
  dataset = dataset.cache().prefetch(buffer_size=AUTOTUNE)
  return dataset, num_examples

モデルを定義する

これで、前処理された入力をBERTエンコーダーに送り、線形分類器を上に配置し（または、必要に応じて他のレイヤーの配置）、正則化にドロップアウトを使用することで、文または文のペアの分類のモデルを定義する準備が整いました。

def build_classifier_model(num_classes):

  class Classifier(tf.keras.Model):
    def __init__(self, num_classes):
      super(Classifier, self).__init__(name="prediction")
      self.encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True)
      self.dropout = tf.keras.layers.Dropout(0.1)
      self.dense = tf.keras.layers.Dense(num_classes)

    def call(self, preprocessed_text):
      encoder_outputs = self.encoder(preprocessed_text)
      pooled_output = encoder_outputs["pooled_output"]
      x = self.dropout(pooled_output)
      x = self.dense(x)
      return x

  model = Classifier(num_classes)
  return model

いくつかの前処理された入力でモデルを実行してみましょう。

test_classifier_model = build_classifier_model(2)
bert_raw_result = test_classifier_model(text_preprocessed)
print(tf.sigmoid(bert_raw_result))

tf.Tensor([[0.29329836 0.44367802]], shape=(1, 2), dtype=float32)

GLUEからタスクを選択してください

あなたはからTensorFlow DataSetを使用しようとしているGLUEのベンチマークスイート。

Colabを使用すると、これらの小さなデータセットをローカルファイルシステムにダウンロードできます。別のTPUワーカーホストはcolabランタイムのローカルファイルシステムにアクセスできないため、以下のコードはそれらを完全にメモリに読み込みます。

大きなデータセットの場合、あなたはあなた自身の作成する必要がありますGoogleのクラウドストレージバケットをし、TPUの労働者がそこからデータを読みました。あなたにはもっと学ぶことができるTPUガイド。

CoLaデータセット（単一文の場合）またはMRPC（複数文の場合）から始めることをお勧めします。これらは小さく、微調整に時間がかからないためです。

tfds_name = 'glue/cola' 

tfds_info = tfds.builder(tfds_name).info

sentence_features = list(tfds_info.features.keys())
sentence_features.remove('idx')
sentence_features.remove('label')

available_splits = list(tfds_info.splits.keys())
train_split = 'train'
validation_split = 'validation'
test_split = 'test'
if tfds_name == 'glue/mnli':
  validation_split = 'validation_matched'
  test_split = 'test_matched'

num_classes = tfds_info.features['label'].num_classes
num_examples = tfds_info.splits.total_num_examples

print(f'Using {tfds_name} from TFDS')
print(f'This dataset has {num_examples} examples')
print(f'Number of classes: {num_classes}')
print(f'Features {sentence_features}')
print(f'Splits {available_splits}')

with tf.device('/job:localhost'):
  # batch_size=-1 is a way to load the dataset into memory
  in_memory_ds = tfds.load(tfds_name, batch_size=-1, shuffle_files=True)

# The code below is just to show some samples from the selected dataset
print(f'Here are some sample rows from {tfds_name} dataset')
sample_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[train_split])

labels_names = tfds_info.features['label'].names
print(labels_names)
print()

sample_i = 1
for sample_row in sample_dataset.take(5):
  samples = [sample_row[feature] for feature in sentence_features]
  print(f'sample row {sample_i}')
  for sample in samples:
    print(sample.numpy())
  sample_label = sample_row['label']

  print(f'label: {sample_label} ({labels_names[sample_label]})')
  print()
  sample_i += 1

Using glue/cola from TFDS
This dataset has 10657 examples
Number of classes: 2
Features ['sentence']
Splits ['train', 'validation', 'test']
Here are some sample rows from glue/cola dataset
['unacceptable', 'acceptable']

sample row 1
b'It is this hat that it is certain that he was wearing.'
label: 1 (acceptable)

sample row 2
b'Her efficient looking up of the answer pleased the boss.'
label: 1 (acceptable)

sample row 3
b'Both the workers will wear carnations.'
label: 1 (acceptable)

sample row 4
b'John enjoyed drawing trees for his syntax homework.'
label: 1 (acceptable)

sample row 5
b'We consider Leslie rather foolish, and Lou a complete idiot.'
label: 1 (acceptable)

データセットは、問題の種類（分類または回帰）とトレーニングに適した損失関数も決定します。

def get_configuration(glue_task):

  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  if glue_task == 'glue/cola':
    metrics = tfa.metrics.MatthewsCorrelationCoefficient(num_classes=2)
  else:
    metrics = tf.keras.metrics.SparseCategoricalAccuracy(
        'accuracy', dtype=tf.float32)

  return metrics, loss

モデルをトレーニングする

最後に、選択したデータセットでモデルをエンドツーエンドでトレーニングできます。

分布

colabランタイムを複数のTPUデバイスを備えたTPUワーカーに接続している、上部のセットアップコードを思い出してください。それらにトレーニングを配布するには、TPU配布戦略の範囲内でメインのKerasモデルを作成してコンパイルします。（詳細については、を参照してくださいKerasによる分散訓練を。）

一方、前処理はTPUではなくワーカーホストのCPUで実行されるため、前処理用のKerasモデルと、それにマッピングされたトレーニングおよび検証データセットは、配布戦略の範囲外で構築されます。呼び出しModel.fit()渡されたモデルのレプリカにデータセットを配布するの世話をします。

オプティマイザ

微調整がBERTからオプティマイザセットアップを次の（のようなトレーニングを事前BERTと分類テキスト）：これは、最初の上に線形のウォームアップ相が付いて、想定元本初期学習率の線形減衰とAdamWオプティマイザを使用していますトレーニングステップ（10％ num_warmup_steps ）。 BERTの論文に沿って、微調整の場合、初期学習率は低くなります（5e-5、3e-5、2e-5のベスト）。

epochs = 3
batch_size = 32
init_lr = 2e-5

print(f'Fine tuning {tfhub_handle_encoder} model')
bert_preprocess_model = make_bert_preprocess_model(sentence_features)

with strategy.scope():

  # metric have to be created inside the strategy scope
  metrics, loss = get_configuration(tfds_name)

  train_dataset, train_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, train_split, batch_size, bert_preprocess_model)
  steps_per_epoch = train_data_size // batch_size
  num_train_steps = steps_per_epoch * epochs
  num_warmup_steps = num_train_steps // 10

  validation_dataset, validation_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, validation_split, batch_size,
      bert_preprocess_model)
  validation_steps = validation_data_size // batch_size

  classifier_model = build_classifier_model(num_classes)

  optimizer = optimization.create_optimizer(
      init_lr=init_lr,
      num_train_steps=num_train_steps,
      num_warmup_steps=num_warmup_steps,
      optimizer_type='adamw')

  classifier_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

  classifier_model.fit(
      x=train_dataset,
      validation_data=validation_dataset,
      steps_per_epoch=steps_per_epoch,
      epochs=epochs,
      validation_steps=validation_steps)

Fine tuning https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3 model
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/keras/engine/functional.py:585: UserWarning: Input dict contained keys ['idx', 'label'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])
Epoch 1/3
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:1", shape=(None,), dtype=int32), values=Tensor("clip_by_global_norm/clip_by_global_norm/_0:0", dtype=float32), dense_shape=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:2", shape=(None,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
267/267 [==============================] - 86s 81ms/step - loss: 0.6092 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.4846 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 2/3
267/267 [==============================] - 14s 53ms/step - loss: 0.3774 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.5322 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 3/3
267/267 [==============================] - 14s 53ms/step - loss: 0.2623 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.6469 - val_MatthewsCorrelationCoefficient: 0.0000e+00

推論のためにエクスポート

前処理部分と、作成したばかりの微調整されたBERTを含む最終モデルを作成します。

推論時には、前処理をモデルの一部にする必要があります（それを行うトレーニングデータに関しては、別個の入力キューがなくなったため）。前処理は単なる計算ではありません。エクスポート用に保存されるKerasモデルにアタッチする必要がある独自のリソース（語彙テーブル）があります。この最終的なアセンブリが保存されます。

あなたがコラボでモデルを保存しようとしていると、後であなたが将来のためにそれを維持するためにダウンロードすることができます（表示- >目次- >ファイル）。

main_save_path = './my_models'
bert_type = tfhub_handle_encoder.split('/')[-2]
saved_model_name = f'{tfds_name.replace("/", "_")}_{bert_type}'

saved_model_path = os.path.join(main_save_path, saved_model_name)

preprocess_inputs = bert_preprocess_model.inputs
bert_encoder_inputs = bert_preprocess_model(preprocess_inputs)
bert_outputs = classifier_model(bert_encoder_inputs)
model_for_export = tf.keras.Model(preprocess_inputs, bert_outputs)

print('Saving', saved_model_path)

# Save everything on the Colab host (even the variables from TPU memory)
save_options = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
model_for_export.save(saved_model_path, include_optimizer=False,
                      options=save_options)

Saving ./my_models/glue_cola_bert_en_uncased_L-12_H-768_A-12
WARNING:absl:Found untraced functions such as restored_function_body, restored_function_body, restored_function_body, restored_function_body, restored_function_body while saving (showing 5 of 910). These functions will not be directly callable after loading.

モデルをテストする

最後のステップは、エクスポートされたモデルの結果をテストすることです。

比較のために、モデルをリロードし、データセットから分割されたテストからの入力を使用してテストしてみましょう。

with tf.device('/job:localhost'):
  reloaded_model = tf.saved_model.load(saved_model_path)

ユーティリティメソッド

def prepare(record):
  model_inputs = [[record[ft]] for ft in sentence_features]
  return model_inputs


def prepare_serving(record):
  model_inputs = {ft: record[ft] for ft in sentence_features}
  return model_inputs


def print_bert_results(test, bert_result, dataset_name):

  bert_result_class = tf.argmax(bert_result, axis=1)[0]

  if dataset_name == 'glue/cola':
    print('sentence:', test[0].numpy())
    if bert_result_class == 1:
      print('This sentence is acceptable')
    else:
      print('This sentence is unacceptable')

  elif dataset_name == 'glue/sst2':
    print('sentence:', test[0])
    if bert_result_class == 1:
      print('This sentence has POSITIVE sentiment')
    else:
      print('This sentence has NEGATIVE sentiment')

  elif dataset_name == 'glue/mrpc':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Are a paraphrase')
    else:
      print('Are NOT a paraphrase')

  elif dataset_name == 'glue/qqp':
    print('question1:', test[0])
    print('question2:', test[1])
    if bert_result_class == 1:
      print('Questions are similar')
    else:
      print('Questions are NOT similar')

  elif dataset_name == 'glue/mnli':
    print('premise   :', test[0])
    print('hypothesis:', test[1])
    if bert_result_class == 1:
      print('This premise is NEUTRAL to the hypothesis')
    elif bert_result_class == 2:
      print('This premise CONTRADICTS the hypothesis')
    else:
      print('This premise ENTAILS the hypothesis')

  elif dataset_name == 'glue/qnli':
    print('question:', test[0])
    print('sentence:', test[1])
    if bert_result_class == 1:
      print('The question is NOT answerable by the sentence')
    else:
      print('The question is answerable by the sentence')

  elif dataset_name == 'glue/rte':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Sentence1 DOES NOT entails sentence2')
    else:
      print('Sentence1 entails sentence2')

  elif dataset_name == 'glue/wnli':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Sentence1 DOES NOT entails sentence2')
    else:
      print('Sentence1 entails sentence2')

  print('BERT raw results:', bert_result[0])
  print()

テスト

with tf.device('/job:localhost'):
  test_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[test_split])
  for test_row in test_dataset.shuffle(1000).map(prepare).take(5):
    if len(sentence_features) == 1:
      result = reloaded_model(test_row[0])
    else:
      result = reloaded_model(list(test_row))

    print_bert_results(test_row, result, tfds_name)

sentence: [b'An old woman languished in the forest.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.7032353  3.3714833], shape=(2,), dtype=float32)

sentence: [b"I went to the movies and didn't pick up the shirts."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.73970896  1.0806316 ], shape=(2,), dtype=float32)

sentence: [b"Every essay that she's written and which I've read is on that pile."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.7034159  0.6236454], shape=(2,), dtype=float32)

sentence: [b'Either Bill ate the peaches, or Harry.']
This sentence is unacceptable
BERT raw results: tf.Tensor([ 0.05972151 -0.08620442], shape=(2,), dtype=float32)

sentence: [b'I ran into the baker from whom I bought these bagels.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6862067  3.285925 ], shape=(2,), dtype=float32)

あなたは上のモデルを使用したい場合はTFサービング、それはその名前の署名のうちの1つを介して、あなたのSavedModelを呼び出すことを覚えておいてください。入力にいくつかの小さな違いがあることに注意してください。 Pythonでは、次のようにテストできます。

with tf.device('/job:localhost'):
  serving_model = reloaded_model.signatures['serving_default']
  for test_row in test_dataset.shuffle(1000).map(prepare_serving).take(5):
    result = serving_model(**test_row)
    # The 'prediction' key is the classifier's defined model name.
    print_bert_results(list(test_row.values()), result['prediction'], tfds_name)

sentence: b'Everyone attended more than two seminars.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.5594155  2.862155 ], shape=(2,), dtype=float32)

sentence: b'Most columnists claim that a senior White House official has been briefing them.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6298996  3.3155093], shape=(2,), dtype=float32)

sentence: b"That my father, he's lived here all his life is well known to those cops."
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2048947  1.8589772], shape=(2,), dtype=float32)

sentence: b'Ourselves like us.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2723312  2.0494034], shape=(2,), dtype=float32)

sentence: b'John is clever.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6516167  3.3147635], shape=(2,), dtype=float32)

できたね！保存したモデルは、プロセスでの提供または単純な推論に使用でき、コードが少なく、保守が容易な、より単純なAPIを使用できます。

次のステップ

ベースBERTモデルの1つを試したので、他のモデルを試して、より高い精度を達成するか、より小さなモデルバージョンを使用することができます。

他のデータセットで試すこともできます。