Google uses AI technology to translate content into your preferred language. AI translations can contain errors.

Selesaikan tugas GLUE menggunakan BERT di TPU

Lihat di TensorFlow.org

Jalankan di Google Colab

Lihat di GitHub

Unduh buku catatan

Lihat model TF Hub

BERT dapat digunakan untuk memecahkan banyak masalah dalam pemrosesan bahasa alami. Anda akan belajar bagaimana untuk menyempurnakan BERT untuk banyak tugas dari patokan LEM :

COLA (Corpus Linguistik Akseptabilitas): Apakah kalimat tata bahasa yang benar?
SST-2 (Stanford Sentimen Treebank): Tugas kita adalah untuk memprediksi sentimen dari kalimat yang diberikan.
MRPC (Microsoft Research Paraphrase Corpus): Tentukan apakah sepasang kalimat semantik setara.
QQP (Quora Pertanyaan Pairs2): Tentukan apakah sepasang pertanyaan semantik setara.
MNLI (Multi-Genre Natural Bahasa Inference): Mengingat kalimat premis dan kalimat hipotesis, tugas ini adalah untuk memprediksi apakah premis memerlukan hipotesis (entailment), bertentangan dengan hipotesis (kontradiksi), atau tidak (netral).
QNLI (Pertanyaan-menjawab Alam Bahasa Inference): Tugas kita adalah untuk menentukan apakah kalimat konteks berisi jawaban atas pertanyaan.
RTE (Menyadari Tekstual Entailment): Menentukan apakah kalimat memerlukan hipotesis yang diberikan atau tidak.
WNLI (Winograd Alam Bahasa Inference): Tugas kita adalah untuk memprediksi jika kalimat dengan kata ganti diganti yang terkandung oleh kalimat aslinya.

Tutorial ini berisi kode end-to-end lengkap untuk melatih model-model ini pada TPU. Anda juga dapat menjalankan notebook ini pada GPU, dengan mengubah satu baris (dijelaskan di bawah).

Dalam buku catatan ini, Anda akan:

Muat model BERT dari TensorFlow Hub
Pilih salah satu tugas GLUE dan unduh kumpulan data
Pra-proses teks
Sempurnakan BERT (contoh diberikan untuk kumpulan data kalimat tunggal dan multi-kalimat)
Simpan model terlatih dan gunakan

Mempersiapkan

Anda akan menggunakan model terpisah untuk melakukan praproses teks sebelum menggunakannya untuk menyempurnakan BERT. Model ini tergantung pada tensorflow / teks , yang Anda akan menginstal di bawah ini.

pip install -q -U tensorflow-text

Anda akan menggunakan optimizer AdamW dari tensorflow / model untuk menyempurnakan BERT, mana Anda akan menginstal juga.

pip install -q -U tf-models-official

pip install -U tfds-nightly

import os
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import tensorflow_text as text  # A dependency of the preprocessing model
import tensorflow_addons as tfa
from official.nlp import optimization
import numpy as np

tf.get_logger().setLevel('ERROR')

/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/requests/__init__.py:104: RequestsDependencyWarning: urllib3 (1.26.7) or chardet (2.3.0)/charset_normalizer (2.0.7) doesn't match a supported version!
  RequestsDependencyWarning)

Selanjutnya, konfigurasikan TFHub untuk membaca pos pemeriksaan langsung dari bucket Cloud Storage TFHub. Ini hanya disarankan saat menjalankan model TFHub di TPU.

Tanpa pengaturan ini, TFHub akan mengunduh file terkompresi dan mengekstrak pos pemeriksaan secara lokal. Mencoba memuat dari file lokal ini akan gagal dengan kesalahan berikut:

InvalidArgumentError: Unimplemented: File system scheme '[local]' not implemented

Hal ini karena TPU hanya bisa membaca langsung dari ember Cloud Storage .

os.environ["TFHUB_MODEL_LOAD_FORMAT"]="UNCOMPRESSED"

Hubungkan ke pekerja TPU

Kode berikut terhubung ke pekerja TPU dan mengubah perangkat default TensorFlow ke perangkat CPU pada pekerja TPU. Ini juga menentukan strategi distribusi TPU yang akan Anda gunakan untuk mendistribusikan pelatihan model ke 8 inti TPU terpisah yang tersedia pada satu pekerja TPU ini. Lihat TensorFlow ini TPU panduan untuk informasi lebih lanjut.

import os

if os.environ['COLAB_TPU_ADDR']:
  cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
  tf.config.experimental_connect_to_cluster(cluster_resolver)
  tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
  strategy = tf.distribute.TPUStrategy(cluster_resolver)
  print('Using TPU')
elif tf.config.list_physical_devices('GPU'):
  strategy = tf.distribute.MirroredStrategy()
  print('Using GPU')
else:
  raise ValueError('Running on CPU is not recommended.')

Using TPU

Memuat model dari TensorFlow Hub

Di sini Anda dapat memilih model BERT mana yang akan Anda muat dari TensorFlow Hub dan menyempurnakannya. Ada beberapa model BERT yang tersedia untuk dipilih.

Bert-Base , Uncased dan tujuh model yang lebih dengan bobot dilatih dirilis oleh penulis Bert asli.
Berts kecil memiliki arsitektur yang sama umum tetapi lebih sedikit dan / atau blok Transformer kecil, yang memungkinkan Anda menjelajahi timbal balik antara kecepatan, ukuran dan kualitas.
ALBERT : empat ukuran yang berbeda dari "A Lite Bert" yang mengurangi ukuran Model (tapi tidak waktu komputasi) dengan berbagi parameter antara lapisan.
Bert Ahli : delapan model yang semua memiliki arsitektur Bert-dasar tetapi menawarkan pilihan antara domain pra-pelatihan yang berbeda, untuk menyelaraskan lebih dekat dengan tugas sasaran.
Electra memiliki arsitektur yang sama seperti Bert (dalam tiga ukuran yang berbeda), tapi akan pra-dilatih sebagai diskriminator dalam satu set-up yang menyerupai sebuah generatif Adversarial Network (GAN).
Bert dengan Talking-Kepala Perhatian dan Gated Gelu [ dasar , besar ] memiliki dua perbaikan inti dari arsitektur Transformer.

Lihat dokumentasi model yang ditautkan di atas untuk detail lebih lanjut.

Dalam tutorial ini, Anda akan mulai dengan BERT-base. Anda dapat menggunakan model yang lebih besar dan lebih baru untuk akurasi yang lebih tinggi, atau model yang lebih kecil untuk waktu pelatihan yang lebih cepat. Untuk mengubah model, Anda hanya perlu mengganti satu baris kode (ditampilkan di bawah). Semua perbedaan dirangkum dalam SavedModel yang akan Anda unduh dari TensorFlow Hub.

Pilih model BERT untuk menyempurnakan

bert_model_name = 'bert_en_uncased_L-12_H-768_A-12' 

map_name_to_handle = {
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3',
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-768_A-12/1',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_base/2',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_large/2',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_xlarge/2',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_xxlarge/2',
    'electra_small':
        'https://tfhub.dev/google/electra_small/2',
    'electra_base':
        'https://tfhub.dev/google/electra_base/2',
    'experts_pubmed':
        'https://tfhub.dev/google/experts/bert/pubmed/2',
    'experts_wiki_books':
        'https://tfhub.dev/google/experts/bert/wiki_books/2',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_large/1',
}

map_model_to_preprocess = {
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_preprocess/3',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'electra_small':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'electra_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_pubmed':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_wiki_books':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
}

tfhub_handle_encoder = map_name_to_handle[bert_model_name]
tfhub_handle_preprocess = map_model_to_preprocess[bert_model_name]

print('BERT model selected           :', tfhub_handle_encoder)
print('Preprocessing model auto-selected:', tfhub_handle_preprocess)

BERT model selected           : https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3
Preprocessing model auto-selected: https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3

Pra-proses teks

Di teks Klasifikasikan dengan Bert CoLab model preprocessing adalah digunakan langsung tertanam dengan encoder Bert.

Tutorial ini mendemonstrasikan cara melakukan prapemrosesan sebagai bagian dari jalur input Anda untuk pelatihan, menggunakan Dataset.map, lalu menggabungkannya ke dalam model yang diekspor untuk inferensi. Dengan begitu, pelatihan dan inferensi dapat bekerja dari input teks mentah, meskipun TPU itu sendiri membutuhkan input numerik.

Persyaratan TPU samping, itu dapat membantu kinerja memiliki preprocessing dilakukan asynchronously dalam pipa masukan (Anda dapat mempelajari lebih dalam panduan kinerja tf.data ).

Tutorial ini juga menunjukkan bagaimana membangun model multi-input, dan bagaimana menyesuaikan panjang urutan input ke BERT.

Mari kita tunjukkan model preprocessing.

bert_preprocess = hub.load(tfhub_handle_preprocess)
tok = bert_preprocess.tokenize(tf.constant(['Hello TensorFlow!']))
print(tok)

<tf.RaggedTensor [[[7592], [23435, 12314], [999]]]>

Setiap model preprocessing juga menyediakan metode, .bert_pack_inputs(tensors, seq_length) , yang mengambil daftar token (seperti tok atas) dan argumen urutan panjang. Ini mengemas input untuk membuat kamus tensor dalam format yang diharapkan oleh model BERT.

text_preprocessed = bert_preprocess.bert_pack_inputs([tok, tok], tf.constant(20))

print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Shape Word Ids :  (1, 20)
Word Ids       :  tf.Tensor(
[  101  7592 23435 12314   999   102  7592 23435 12314   999   102     0
     0     0     0     0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 20)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 20)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)

Berikut beberapa detail yang perlu diperhatikan:

input_mask topeng ini memungkinkan model untuk membedakan bersih antara konten dan padding. Topeng ini memiliki bentuk yang sama seperti input_word_ids , dan berisi 1 di mana saja input_word_ids tidak melakukan padding.
input_type_ids memiliki bentuk yang sama seperti input_mask , tapi di dalam wilayah non-empuk, berisi 0 atau 1 yang menunjukkan kalimat token adalah bagian dari.

Selanjutnya, Anda akan membuat model prapemrosesan yang merangkum semua logika ini. Model Anda akan mengambil string sebagai input, dan mengembalikan objek yang diformat dengan tepat yang dapat diteruskan ke BERT.

Setiap model BERT memiliki model preprocessing yang spesifik, pastikan untuk menggunakan model yang sesuai yang dijelaskan pada dokumentasi model BERT.

def make_bert_preprocess_model(sentence_features, seq_length=128):
  """Returns Model mapping string features to BERT inputs.

  Args:
    sentence_features: a list with the names of string-valued features.
    seq_length: an integer that defines the sequence length of BERT inputs.

  Returns:
    A Keras Model that can be called on a list or dict of string Tensors
    (with the order or names, resp., given by sentence_features) and
    returns a dict of tensors for input to BERT.
  """

  input_segments = [
      tf.keras.layers.Input(shape=(), dtype=tf.string, name=ft)
      for ft in sentence_features]

  # Tokenize the text to word pieces.
  bert_preprocess = hub.load(tfhub_handle_preprocess)
  tokenizer = hub.KerasLayer(bert_preprocess.tokenize, name='tokenizer')
  segments = [tokenizer(s) for s in input_segments]

  # Optional: Trim segments in a smart way to fit seq_length.
  # Simple cases (like this example) can skip this step and let
  # the next step apply a default truncation to approximately equal lengths.
  truncated_segments = segments

  # Pack inputs. The details (start/end token ids, dict of output tensors)
  # are model-dependent, so this gets loaded from the SavedModel.
  packer = hub.KerasLayer(bert_preprocess.bert_pack_inputs,
                          arguments=dict(seq_length=seq_length),
                          name='packer')
  model_inputs = packer(truncated_segments)
  return tf.keras.Model(input_segments, model_inputs)

Mari kita tunjukkan model preprocessing. Anda akan membuat tes dengan input dua kalimat (input1 dan input2). Outputnya adalah apa model Bert harapkan sebagai masukan: input_word_ids , input_masks dan input_type_ids .

test_preprocess_model = make_bert_preprocess_model(['my_input1', 'my_input2'])
test_text = [np.array(['some random test sentence']),
             np.array(['another sentence'])]
text_preprocessed = test_preprocess_model(test_text)

print('Keys           : ', list(text_preprocessed.keys()))
print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Keys           :  ['input_word_ids', 'input_mask', 'input_type_ids']
Shape Word Ids :  (1, 128)
Word Ids       :  tf.Tensor(
[ 101 2070 6721 3231 6251  102 2178 6251  102    0    0    0    0    0
    0    0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 128)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 128)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)

Mari kita lihat struktur model, dengan memperhatikan dua input yang baru saja Anda definisikan.

tf.keras.utils.plot_model(test_preprocess_model, show_shapes=True, show_dtype=True)

('You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) ', 'for plot_model/model_to_dot to work.')

Untuk menerapkan preprocessing di semua masukan dari dataset, Anda akan menggunakan map fungsi dari dataset. Hasilnya kemudian cache untuk kinerja .

AUTOTUNE = tf.data.AUTOTUNE


def load_dataset_from_tfds(in_memory_ds, info, split, batch_size,
                           bert_preprocess_model):
  is_training = split.startswith('train')
  dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[split])
  num_examples = info.splits[split].num_examples

  if is_training:
    dataset = dataset.shuffle(num_examples)
    dataset = dataset.repeat()
  dataset = dataset.batch(batch_size)
  dataset = dataset.map(lambda ex: (bert_preprocess_model(ex), ex['label']))
  dataset = dataset.cache().prefetch(buffer_size=AUTOTUNE)
  return dataset, num_examples

Tentukan model Anda

Anda sekarang siap untuk menentukan model Anda untuk klasifikasi kalimat atau pasangan kalimat dengan memasukkan input yang telah diproses sebelumnya melalui encoder BERT dan menempatkan pengklasifikasi linier di atas (atau susunan lapisan lain yang Anda inginkan), dan menggunakan dropout untuk regularisasi.

def build_classifier_model(num_classes):

  class Classifier(tf.keras.Model):
    def __init__(self, num_classes):
      super(Classifier, self).__init__(name="prediction")
      self.encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True)
      self.dropout = tf.keras.layers.Dropout(0.1)
      self.dense = tf.keras.layers.Dense(num_classes)

    def call(self, preprocessed_text):
      encoder_outputs = self.encoder(preprocessed_text)
      pooled_output = encoder_outputs["pooled_output"]
      x = self.dropout(pooled_output)
      x = self.dense(x)
      return x

  model = Classifier(num_classes)
  return model

Mari kita coba menjalankan model pada beberapa input yang telah diproses sebelumnya.

test_classifier_model = build_classifier_model(2)
bert_raw_result = test_classifier_model(text_preprocessed)
print(tf.sigmoid(bert_raw_result))

tf.Tensor([[0.29329836 0.44367802]], shape=(1, 2), dtype=float32)

Pilih tugas dari GLUE

Anda akan menggunakan TensorFlow DataSet dari LEM benchmark suite.

Colab memungkinkan Anda mendownload set data kecil ini ke sistem file lokal, dan kode di bawah membacanya seluruhnya ke dalam memori, karena host pekerja TPU yang terpisah tidak dapat mengakses sistem file lokal dari waktu proses colab.

Untuk dataset yang lebih besar, Anda harus membuat sendiri Google Cloud Storage ember dan memiliki pekerja TPU membaca data dari sana. Anda dapat mempelajari lebih lanjut di buku TPU .

Disarankan untuk memulai dengan kumpulan data CoLa (untuk satu kalimat) atau MRPC (untuk beberapa kalimat) karena ini kecil dan tidak membutuhkan waktu lama untuk menyempurnakannya.

tfds_name = 'glue/cola' 

tfds_info = tfds.builder(tfds_name).info

sentence_features = list(tfds_info.features.keys())
sentence_features.remove('idx')
sentence_features.remove('label')

available_splits = list(tfds_info.splits.keys())
train_split = 'train'
validation_split = 'validation'
test_split = 'test'
if tfds_name == 'glue/mnli':
  validation_split = 'validation_matched'
  test_split = 'test_matched'

num_classes = tfds_info.features['label'].num_classes
num_examples = tfds_info.splits.total_num_examples

print(f'Using {tfds_name} from TFDS')
print(f'This dataset has {num_examples} examples')
print(f'Number of classes: {num_classes}')
print(f'Features {sentence_features}')
print(f'Splits {available_splits}')

with tf.device('/job:localhost'):
  # batch_size=-1 is a way to load the dataset into memory
  in_memory_ds = tfds.load(tfds_name, batch_size=-1, shuffle_files=True)

# The code below is just to show some samples from the selected dataset
print(f'Here are some sample rows from {tfds_name} dataset')
sample_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[train_split])

labels_names = tfds_info.features['label'].names
print(labels_names)
print()

sample_i = 1
for sample_row in sample_dataset.take(5):
  samples = [sample_row[feature] for feature in sentence_features]
  print(f'sample row {sample_i}')
  for sample in samples:
    print(sample.numpy())
  sample_label = sample_row['label']

  print(f'label: {sample_label} ({labels_names[sample_label]})')
  print()
  sample_i += 1

Using glue/cola from TFDS
This dataset has 10657 examples
Number of classes: 2
Features ['sentence']
Splits ['train', 'validation', 'test']
Here are some sample rows from glue/cola dataset
['unacceptable', 'acceptable']

sample row 1
b'It is this hat that it is certain that he was wearing.'
label: 1 (acceptable)

sample row 2
b'Her efficient looking up of the answer pleased the boss.'
label: 1 (acceptable)

sample row 3
b'Both the workers will wear carnations.'
label: 1 (acceptable)

sample row 4
b'John enjoyed drawing trees for his syntax homework.'
label: 1 (acceptable)

sample row 5
b'We consider Leslie rather foolish, and Lou a complete idiot.'
label: 1 (acceptable)

Dataset juga menentukan jenis masalah (klasifikasi atau regresi) dan fungsi kerugian yang sesuai untuk pelatihan.

def get_configuration(glue_task):

  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  if glue_task == 'glue/cola':
    metrics = tfa.metrics.MatthewsCorrelationCoefficient(num_classes=2)
  else:
    metrics = tf.keras.metrics.SparseCategoricalAccuracy(
        'accuracy', dtype=tf.float32)

  return metrics, loss

Latih model Anda

Terakhir, Anda dapat melatih model end-to-end pada dataset yang Anda pilih.

Distribusi

Ingat kode penyiapan di bagian atas, yang telah menghubungkan runtime colab ke pekerja TPU dengan beberapa perangkat TPU. Untuk mendistribusikan pelatihan ke mereka, Anda akan membuat dan mengkompilasi model Keras utama Anda dalam lingkup strategi distribusi TPU. (Untuk rincian, lihat Terdistribusi pelatihan dengan Keras .)

Prapemrosesan, di sisi lain, berjalan pada CPU host pekerja, bukan TPU, sehingga model Keras untuk prapemrosesan serta kumpulan data pelatihan dan validasi yang dipetakan dengannya dibangun di luar cakupan strategi distribusi. Panggilan untuk Model.fit() akan mengurus mendistribusikan berlalu-dalam dataset dengan model replika.

Pengoptimal

Fine-tuning mengikuti optimizer set-up dari BERT pra-pelatihan (seperti dalam teks Klasifikasikan dengan Bert ): Ia menggunakan optimizer AdamW dengan pembusukan linear dari tingkat pembelajaran awal nosional, diawali dengan fase pemanasan linear selama pertama 10% dari langkah-langkah pelatihan ( num_warmup_steps ). Sejalan dengan makalah BERT, tingkat pembelajaran awal lebih kecil untuk fine-tuning (terbaik dari 5e-5, 3e-5, 2e-5).

epochs = 3
batch_size = 32
init_lr = 2e-5

print(f'Fine tuning {tfhub_handle_encoder} model')
bert_preprocess_model = make_bert_preprocess_model(sentence_features)

with strategy.scope():

  # metric have to be created inside the strategy scope
  metrics, loss = get_configuration(tfds_name)

  train_dataset, train_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, train_split, batch_size, bert_preprocess_model)
  steps_per_epoch = train_data_size // batch_size
  num_train_steps = steps_per_epoch * epochs
  num_warmup_steps = num_train_steps // 10

  validation_dataset, validation_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, validation_split, batch_size,
      bert_preprocess_model)
  validation_steps = validation_data_size // batch_size

  classifier_model = build_classifier_model(num_classes)

  optimizer = optimization.create_optimizer(
      init_lr=init_lr,
      num_train_steps=num_train_steps,
      num_warmup_steps=num_warmup_steps,
      optimizer_type='adamw')

  classifier_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

  classifier_model.fit(
      x=train_dataset,
      validation_data=validation_dataset,
      steps_per_epoch=steps_per_epoch,
      epochs=epochs,
      validation_steps=validation_steps)

Fine tuning https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3 model
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/keras/engine/functional.py:585: UserWarning: Input dict contained keys ['idx', 'label'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])
Epoch 1/3
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:1", shape=(None,), dtype=int32), values=Tensor("clip_by_global_norm/clip_by_global_norm/_0:0", dtype=float32), dense_shape=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:2", shape=(None,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
267/267 [==============================] - 86s 81ms/step - loss: 0.6092 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.4846 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 2/3
267/267 [==============================] - 14s 53ms/step - loss: 0.3774 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.5322 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 3/3
267/267 [==============================] - 14s 53ms/step - loss: 0.2623 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.6469 - val_MatthewsCorrelationCoefficient: 0.0000e+00

Ekspor untuk inferensi

Anda akan membuat model akhir yang memiliki bagian preprocessing dan BERT yang baru saja kita buat.

Pada waktu inferensi, prapemrosesan perlu menjadi bagian dari model (karena tidak ada lagi antrian input terpisah seperti untuk data pelatihan yang melakukannya). Pra-pemrosesan bukan hanya komputasi; ia memiliki sumber daya sendiri (tabel vocab) yang harus dilampirkan ke Model Keras yang disimpan untuk ekspor. Perakitan terakhir inilah yang akan disimpan.

Anda akan menyimpan model pada colab dan kemudian Anda dapat men-download untuk menyimpannya untuk masa depan (View -> Daftar isi -> Files).

main_save_path = './my_models'
bert_type = tfhub_handle_encoder.split('/')[-2]
saved_model_name = f'{tfds_name.replace("/", "_")}_{bert_type}'

saved_model_path = os.path.join(main_save_path, saved_model_name)

preprocess_inputs = bert_preprocess_model.inputs
bert_encoder_inputs = bert_preprocess_model(preprocess_inputs)
bert_outputs = classifier_model(bert_encoder_inputs)
model_for_export = tf.keras.Model(preprocess_inputs, bert_outputs)

print('Saving', saved_model_path)

# Save everything on the Colab host (even the variables from TPU memory)
save_options = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
model_for_export.save(saved_model_path, include_optimizer=False,
                      options=save_options)

Saving ./my_models/glue_cola_bert_en_uncased_L-12_H-768_A-12
WARNING:absl:Found untraced functions such as restored_function_body, restored_function_body, restored_function_body, restored_function_body, restored_function_body while saving (showing 5 of 910). These functions will not be directly callable after loading.

Uji modelnya

Langkah terakhir adalah menguji hasil model yang Anda ekspor.

Hanya untuk membuat beberapa perbandingan, mari muat ulang model dan uji menggunakan beberapa input dari pengujian split dari dataset.

with tf.device('/job:localhost'):
  reloaded_model = tf.saved_model.load(saved_model_path)

Metode utilitas

def prepare(record):
  model_inputs = [[record[ft]] for ft in sentence_features]
  return model_inputs


def prepare_serving(record):
  model_inputs = {ft: record[ft] for ft in sentence_features}
  return model_inputs


def print_bert_results(test, bert_result, dataset_name):

  bert_result_class = tf.argmax(bert_result, axis=1)[0]

  if dataset_name == 'glue/cola':
    print('sentence:', test[0].numpy())
    if bert_result_class == 1:
      print('This sentence is acceptable')
    else:
      print('This sentence is unacceptable')

  elif dataset_name == 'glue/sst2':
    print('sentence:', test[0])
    if bert_result_class == 1:
      print('This sentence has POSITIVE sentiment')
    else:
      print('This sentence has NEGATIVE sentiment')

  elif dataset_name == 'glue/mrpc':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Are a paraphrase')
    else:
      print('Are NOT a paraphrase')

  elif dataset_name == 'glue/qqp':
    print('question1:', test[0])
    print('question2:', test[1])
    if bert_result_class == 1:
      print('Questions are similar')
    else:
      print('Questions are NOT similar')

  elif dataset_name == 'glue/mnli':
    print('premise   :', test[0])
    print('hypothesis:', test[1])
    if bert_result_class == 1:
      print('This premise is NEUTRAL to the hypothesis')
    elif bert_result_class == 2:
      print('This premise CONTRADICTS the hypothesis')
    else:
      print('This premise ENTAILS the hypothesis')

  elif dataset_name == 'glue/qnli':
    print('question:', test[0])
    print('sentence:', test[1])
    if bert_result_class == 1:
      print('The question is NOT answerable by the sentence')
    else:
      print('The question is answerable by the sentence')

  elif dataset_name == 'glue/rte':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Sentence1 DOES NOT entails sentence2')
    else:
      print('Sentence1 entails sentence2')

  elif dataset_name == 'glue/wnli':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Sentence1 DOES NOT entails sentence2')
    else:
      print('Sentence1 entails sentence2')

  print('BERT raw results:', bert_result[0])
  print()

Tes

with tf.device('/job:localhost'):
  test_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[test_split])
  for test_row in test_dataset.shuffle(1000).map(prepare).take(5):
    if len(sentence_features) == 1:
      result = reloaded_model(test_row[0])
    else:
      result = reloaded_model(list(test_row))

    print_bert_results(test_row, result, tfds_name)

sentence: [b'An old woman languished in the forest.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.7032353  3.3714833], shape=(2,), dtype=float32)

sentence: [b"I went to the movies and didn't pick up the shirts."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.73970896  1.0806316 ], shape=(2,), dtype=float32)

sentence: [b"Every essay that she's written and which I've read is on that pile."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.7034159  0.6236454], shape=(2,), dtype=float32)

sentence: [b'Either Bill ate the peaches, or Harry.']
This sentence is unacceptable
BERT raw results: tf.Tensor([ 0.05972151 -0.08620442], shape=(2,), dtype=float32)

sentence: [b'I ran into the baker from whom I bought these bagels.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6862067  3.285925 ], shape=(2,), dtype=float32)

Jika Anda ingin menggunakan model Anda pada TF Melayani , ingatlah bahwa ia akan memanggil SavedModel Anda melalui salah satu tanda tangan bernama. Perhatikan ada beberapa perbedaan kecil dalam input. Dengan Python, Anda dapat mengujinya sebagai berikut:

with tf.device('/job:localhost'):
  serving_model = reloaded_model.signatures['serving_default']
  for test_row in test_dataset.shuffle(1000).map(prepare_serving).take(5):
    result = serving_model(**test_row)
    # The 'prediction' key is the classifier's defined model name.
    print_bert_results(list(test_row.values()), result['prediction'], tfds_name)

sentence: b'Everyone attended more than two seminars.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.5594155  2.862155 ], shape=(2,), dtype=float32)

sentence: b'Most columnists claim that a senior White House official has been briefing them.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6298996  3.3155093], shape=(2,), dtype=float32)

sentence: b"That my father, he's lived here all his life is well known to those cops."
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2048947  1.8589772], shape=(2,), dtype=float32)

sentence: b'Ourselves like us.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2723312  2.0494034], shape=(2,), dtype=float32)

sentence: b'John is clever.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6516167  3.3147635], shape=(2,), dtype=float32)

Anda melakukannya! Model yang Anda simpan dapat digunakan untuk penyajian atau inferensi sederhana dalam suatu proses, dengan api yang lebih sederhana dengan lebih sedikit kode dan lebih mudah untuk dipelihara.

Langkah selanjutnya

Sekarang setelah Anda mencoba salah satu model BERT dasar, Anda dapat mencoba yang lain untuk mencapai akurasi yang lebih tinggi atau mungkin dengan versi model yang lebih kecil.

Anda juga dapat mencoba di kumpulan data lain.