Solve GLUE tasks using BERT on TPU

BERT can be used to solve many problems in natural language processing. You will learn how to fine-tune BERT for many tasks from the GLUE benchmark:

CoLA (Corpus of Linguistic Acceptability): Is the sentence grammatically correct?
SST-2 (Stanford Sentiment Treebank): The task is to predict the sentiment of a given sentence.
MRPC (Microsoft Research Paraphrase Corpus): Determine whether a pair of sentences are semantically equivalent.
QQP (Quora Question Pairs2): Determine whether a pair of questions are semantically equivalent.
MNLI (Multi-Genre Natural Language Inference): Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral).
QNLI (Question-answering Natural Language Inference): The task is to determine whether the context sentence contains the answer to the question.
RTE (Recognizing Textual Entailment): Determine whether a sentence entails a given hypothesis or not.
WNLI (Winograd Natural Language Inference): The task is to predict whether the sentence with the pronoun substituted is entailed by the original sentence.
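These tasks fall into two groups: single-sentence problems and sentence-pair problems. The group determines how many text inputs the preprocessing model needs. The sketch below records that split in plain Python. The feature names are our reading of the TFDS glue/* builders, and they agree with the labels printed by print_bert_results later. The names GLUE_TASK_FEATURES and is_sentence_pair_task are hypothetical.

```python
# Hypothetical summary: TFDS text feature names per GLUE task (assumed from
# the glue/* builders; check tfds_info.features for the task you pick).
GLUE_TASK_FEATURES = {
    'glue/cola': ('sentence',),
    'glue/sst2': ('sentence',),
    'glue/mrpc': ('sentence1', 'sentence2'),
    'glue/qqp': ('question1', 'question2'),
    'glue/mnli': ('premise', 'hypothesis'),
    'glue/qnli': ('question', 'sentence'),
    'glue/rte': ('sentence1', 'sentence2'),
    'glue/wnli': ('sentence1', 'sentence2'),
}


def is_sentence_pair_task(task_name):
    """Returns True if the task feeds two text segments to BERT."""
    return len(GLUE_TASK_FEATURES[task_name]) == 2


single = [t for t in GLUE_TASK_FEATURES if not is_sentence_pair_task(t)]
print(single)  # ['glue/cola', 'glue/sst2']
```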

This tutorial contains complete end-to-end code to train these models on a TPU. You can also run this notebook on a GPU by changing one line (described below).

In this notebook, you will:

Load a BERT model from TensorFlow Hub
Choose one of the GLUE tasks and download the dataset
Preprocess the text
Fine-tune BERT (examples are given for single-sentence and multi-sentence datasets)
Save the trained model and use it

Setup

You will use a separate model to preprocess text before using it to fine-tune BERT. This model depends on tensorflow/text, which you will install below.

pip install -q -U tensorflow-text

You will use the AdamW optimizer from tensorflow/models to fine-tune BERT, so you will install that package as well.

pip install -q -U tf-models-official

pip install -U tfds-nightly

import os
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import tensorflow_text as text  # A dependency of the preprocessing model
import tensorflow_addons as tfa
from official.nlp import optimization
import numpy as np

tf.get_logger().setLevel('ERROR')

Next, configure TFHub to read checkpoints directly from TFHub's Cloud Storage buckets. This is only recommended when running TFHub models on TPU.

Without this setting, TFHub would download the compressed file and extract the checkpoint locally. Attempting to load from these local files will fail with the following error:

InvalidArgumentError: Unimplemented: File system scheme '[local]' not implemented

This is because the TPU can only read directly from Cloud Storage buckets.

os.environ["TFHUB_MODEL_LOAD_FORMAT"]="UNCOMPRESSED"
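The text recommends this setting only on TPU, so you may prefer to set it conditionally. Below is a minimal sketch. It assumes you already know whether you are on a TPU runtime, and configure_tfhub_load_format is a hypothetical helper, not part of TF Hub. It should run before you load any TF Hub models.

```python
import os


def configure_tfhub_load_format(running_on_tpu):
    """Hypothetical helper: read uncompressed checkpoints from GCS on TPU only."""
    if running_on_tpu:
        # The TPU worker cannot read files extracted on the Colab VM's disk.
        os.environ["TFHUB_MODEL_LOAD_FORMAT"] = "UNCOMPRESSED"
    else:
        # Remove the override so TF Hub uses its default download-and-extract.
        os.environ.pop("TFHUB_MODEL_LOAD_FORMAT", None)


configure_tfhub_load_format(running_on_tpu=True)
print(os.environ["TFHUB_MODEL_LOAD_FORMAT"])  # UNCOMPRESSED
```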

Connect to the TPU worker

The following code connects to the TPU worker and changes TensorFlow's default device to the CPU device on the TPU worker. It also defines a TPU distribution strategy. You will use this strategy to distribute model training onto the 8 separate TPU cores available on this one TPU worker. See TensorFlow's TPU guide for more information.

import os

if os.environ.get('COLAB_TPU_ADDR'):  # .get() avoids a KeyError off-TPU
  cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
  tf.config.experimental_connect_to_cluster(cluster_resolver)
  tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
  strategy = tf.distribute.TPUStrategy(cluster_resolver)
  print('Using TPU')
elif tf.config.list_physical_devices('GPU'):
  strategy = tf.distribute.MirroredStrategy()
  print('Using GPU')
else:
  raise ValueError('Running on CPU is not recommended.')

Using TPU
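You can test the branching above without TensorFlow by factoring it into a pure function. The sketch below uses choose_backend, a hypothetical helper rather than a TensorFlow API. It calls environ.get, so a missing COLAB_TPU_ADDR variable falls through to the GPU check instead of raising a KeyError. The TPU address in the example is made up.

```python
def choose_backend(environ, gpu_devices):
    """Hypothetical helper mirroring the strategy selection above.

    environ: a mapping such as os.environ.
    gpu_devices: e.g. the list returned by tf.config.list_physical_devices('GPU').
    """
    if environ.get('COLAB_TPU_ADDR'):
        return 'tpu'  # the notebook builds a TPUStrategy here
    if gpu_devices:
        return 'gpu'  # the notebook builds a MirroredStrategy here
    raise ValueError('Running on CPU is not recommended.')


print(choose_backend({'COLAB_TPU_ADDR': '10.0.0.2:8470'}, []))  # tpu
print(choose_backend({}, ['GPU:0']))                             # gpu
```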

Loading models from TensorFlow Hub

Here you can choose which BERT model you will load from TensorFlow Hub and fine-tune. There are multiple BERT models available to choose from.

BERT-Base, Uncased and seven more models with trained weights released by the original BERT authors.
Small BERTs have the same general architecture but fewer and/or smaller Transformer blocks, which lets you explore tradeoffs between speed, size and quality.
ALBERT: four different sizes of "A Lite BERT" that reduces model size (but not computation time) by sharing parameters between layers.
BERT Experts: eight models that all have the BERT-base architecture but offer a choice between different pre-training domains, to align more closely with the target task.
Electra has the same architecture as BERT (in three different sizes), but gets pre-trained as a discriminator in a set-up that resembles a Generative Adversarial Network (GAN).
BERT with Talking-Heads Attention and Gated GELU [base, large] has two improvements to the core of the Transformer architecture.

See the model documentation linked above for more details.

In this tutorial, you will start with BERT-base. You can use larger and more recent models for higher accuracy, or smaller models for faster training times. To change the model, you only need to change a single line of code (shown below). All the differences are encapsulated in the SavedModel you will download from TensorFlow Hub.
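Many of the model names in map_name_to_handle encode the encoder size. L is the number of Transformer layers, H is the hidden size, and A is the number of attention heads. The small sketch below parses these fields, which makes it easier to compare candidates on the speed/size tradeoff. The function describe_bert_name is hypothetical. Names such as albert_en_base do not follow this pattern, so the function returns None for them.

```python
import re


def describe_bert_name(model_name):
    """Parses L (layers), H (hidden size) and A (attention heads) from a name."""
    match = re.search(r'L-(\d+)_H-(\d+)_A-(\d+)', model_name)
    if match is None:
        return None  # e.g. 'albert_en_base' does not encode its size this way
    layers, hidden, heads = (int(g) for g in match.groups())
    return {'layers': layers, 'hidden_size': hidden,
            'attention_heads': heads, 'head_size': hidden // heads}


print(describe_bert_name('bert_en_uncased_L-12_H-768_A-12'))
# {'layers': 12, 'hidden_size': 768, 'attention_heads': 12, 'head_size': 64}
```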

Choose a BERT model to fine-tune

bert_model_name = 'bert_en_uncased_L-12_H-768_A-12' 

map_name_to_handle = {
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3',
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-768_A-12/1',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_base/2',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_large/2',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_xlarge/2',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_xxlarge/2',
    'electra_small':
        'https://tfhub.dev/google/electra_small/2',
    'electra_base':
        'https://tfhub.dev/google/electra_base/2',
    'experts_pubmed':
        'https://tfhub.dev/google/experts/bert/pubmed/2',
    'experts_wiki_books':
        'https://tfhub.dev/google/experts/bert/wiki_books/2',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_large/1',
}

map_model_to_preprocess = {
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_preprocess/3',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'electra_small':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'electra_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_pubmed':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_wiki_books':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
}

tfhub_handle_encoder = map_name_to_handle[bert_model_name]
tfhub_handle_preprocess = map_model_to_preprocess[bert_model_name]

print('BERT model selected           :', tfhub_handle_encoder)
print('Preprocessing model auto-selected:', tfhub_handle_preprocess)

BERT model selected           : https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3
Preprocessing model auto-selected: https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3

Preprocess the text

In the Classify text with BERT colab, the preprocessing model is embedded directly in the model, together with the BERT encoder.

This tutorial demonstrates how to do preprocessing as part of your input pipeline for training, using Dataset.map. It then shows how to merge preprocessing into the model that gets exported for inference. That way, both training and inference can work from raw text inputs, although the TPU itself requires numeric inputs.

TPU requirements aside, running preprocessing asynchronously in an input pipeline can improve performance (you can learn more in the tf.data performance guide).

This tutorial also demonstrates how to build multi-input models, and how to adjust the sequence length of the inputs to BERT.

Let's demonstrate the preprocessing model.

bert_preprocess = hub.load(tfhub_handle_preprocess)
tok = bert_preprocess.tokenize(tf.constant(['Hello TensorFlow!']))
print(tok)

<tf.RaggedTensor [[[7592], [23435, 12314], [999]]]>

Each preprocessing model also provides a method, .bert_pack_inputs(tensors, seq_length). It takes a list of tokens (like tok above) and a sequence length argument. It packs the inputs into a dictionary of tensors in the format expected by the BERT model.

text_preprocessed = bert_preprocess.bert_pack_inputs([tok, tok], tf.constant(20))

print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Shape Word Ids :  (1, 20)
Word Ids       :  tf.Tensor(
[  101  7592 23435 12314   999   102  7592 23435 12314   999   102     0
     0     0     0     0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 20)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 20)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)
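The pure-Python sketch below makes the packing format concrete by reproducing what bert_pack_inputs produces for already-tokenized segments. It assumes the [CLS]=101 and [SEP]=102 ids of the uncased vocabulary (both visible in the output above), and it flattens the ragged word pieces by hand. Its truncation rule, which trims the longest segment first, is a simplification and may not match the preprocessing model's own trimming. pack_inputs is a hypothetical name.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # ids seen in the uncased output above


def pack_inputs(segments, seq_length):
    """Hypothetical sketch of bert_pack_inputs for one or two token-id lists.

    Layout: [CLS] seg0 [SEP] (seg1 [SEP]) followed by 0-padding to seq_length.
    """
    if len(segments) not in (1, 2):
        raise ValueError('expected one or two segments')
    budget = seq_length - (len(segments) + 1)  # room left after special tokens
    if budget < 0:
        raise ValueError('seq_length is too small for the special tokens')
    segments = [list(s) for s in segments]
    # Simplified truncation: drop tokens from the longest segment first.
    while sum(len(s) for s in segments) > budget:
        max(segments, key=len).pop()

    word_ids, type_ids = [CLS_ID], [0]
    for segment_index, segment in enumerate(segments):
        word_ids += segment + [SEP_ID]
        type_ids += [segment_index] * (len(segment) + 1)

    padding = seq_length - len(word_ids)
    return {
        'input_word_ids': word_ids + [PAD_ID] * padding,
        'input_mask': [1] * len(word_ids) + [0] * padding,
        'input_type_ids': type_ids + [0] * padding,
    }


# Flattened word pieces of 'Hello TensorFlow!' from the tokenize() output above.
tok = [7592, 23435, 12314, 999]
packed = pack_inputs([tok, tok], seq_length=20)
print(packed['input_word_ids'][:16])  # should match "Word Ids" above
print(packed['input_type_ids'][:16])  # should match "Type Ids" above
```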

Here are some details to pay attention to:

input_mask: The mask allows the model to cleanly differentiate between the content and the padding. The mask has the same shape as input_word_ids, and contains a 1 anywhere input_word_ids is not padding.
input_type_ids: Has the same shape as input_mask, but inside the non-padded region it contains a 0 or a 1 indicating which sentence the token is part of.
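The two rules above are enough to recover the original segments from a packed example. First drop positions where input_mask is 0, then skip [CLS]/[SEP], and finally group the remaining tokens by input_type_ids. The sketch below does this using the values printed earlier. Padding positions also carry type id 0, which is why the mask is needed. split_segments is a hypothetical helper.

```python
def split_segments(word_ids, input_mask, type_ids, cls_id=101, sep_id=102):
    """Hypothetical helper: groups unmasked, non-special tokens by type id."""
    segments = {}
    for token, keep, segment in zip(word_ids, input_mask, type_ids):
        if keep == 0 or token in (cls_id, sep_id):
            continue  # padding or special token
        segments.setdefault(segment, []).append(token)
    return [segments[k] for k in sorted(segments)]


# First 16 positions copied from the printed output above.
word_ids = [101, 7592, 23435, 12314, 999, 102, 7592, 23435, 12314, 999, 102,
            0, 0, 0, 0, 0]
mask = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
type_ids = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(split_segments(word_ids, mask, type_ids))
# [[7592, 23435, 12314, 999], [7592, 23435, 12314, 999]]
```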

Next, you will create a preprocessing model that encapsulates all this logic. Your model will take strings as input and return appropriately formatted objects that can be passed to BERT.

Each BERT model has a specific preprocessing model. Make sure to use the proper one, as described in the BERT model's documentation.

def make_bert_preprocess_model(sentence_features, seq_length=128):
  """Returns Model mapping string features to BERT inputs.

  Args:
    sentence_features: a list with the names of string-valued features.
    seq_length: an integer that defines the sequence length of BERT inputs.

  Returns:
    A Keras Model that can be called on a list or dict of string Tensors
    (with the order or names, resp., given by sentence_features) and
    returns a dict of tensors for input to BERT.
  """

  input_segments = [
      tf.keras.layers.Input(shape=(), dtype=tf.string, name=ft)
      for ft in sentence_features]

  # Tokenize the text to word pieces.
  bert_preprocess = hub.load(tfhub_handle_preprocess)
  tokenizer = hub.KerasLayer(bert_preprocess.tokenize, name='tokenizer')
  segments = [tokenizer(s) for s in input_segments]

  # Optional: Trim segments in a smart way to fit seq_length.
  # Simple cases (like this example) can skip this step and let
  # the next step apply a default truncation to approximately equal lengths.
  truncated_segments = segments

  # Pack inputs. The details (start/end token ids, dict of output tensors)
  # are model-dependent, so this gets loaded from the SavedModel.
  packer = hub.KerasLayer(bert_preprocess.bert_pack_inputs,
                          arguments=dict(seq_length=seq_length),
                          name='packer')
  model_inputs = packer(truncated_segments)
  return tf.keras.Model(input_segments, model_inputs)

Let's demonstrate the output of the preprocessing model. You will create a test with two sentence inputs (my_input1 and my_input2). The output is what a BERT model expects as input: input_word_ids, input_mask and input_type_ids.

test_preprocess_model = make_bert_preprocess_model(['my_input1', 'my_input2'])
test_text = [np.array(['some random test sentence']),
             np.array(['another sentence'])]
text_preprocessed = test_preprocess_model(test_text)

print('Keys           : ', list(text_preprocessed.keys()))
print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Keys           :  ['input_word_ids', 'input_mask', 'input_type_ids']
Shape Word Ids :  (1, 128)
Word Ids       :  tf.Tensor(
[ 101 2070 6721 3231 6251  102 2178 6251  102    0    0    0    0    0
    0    0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 128)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 128)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)

Let's take a look at the model's structure, paying attention to the two inputs you just defined.

tf.keras.utils.plot_model(test_preprocess_model, show_shapes=True, show_dtype=True)

To apply the preprocessing to all the inputs from the dataset, you will use the dataset's map function. The result is then cached for performance.

AUTOTUNE = tf.data.AUTOTUNE


def load_dataset_from_tfds(in_memory_ds, info, split, batch_size,
                           bert_preprocess_model):
  is_training = split.startswith('train')
  dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[split])
  num_examples = info.splits[split].num_examples

  if is_training:
    dataset = dataset.shuffle(num_examples)
    dataset = dataset.repeat()
  dataset = dataset.batch(batch_size)
  dataset = dataset.map(lambda ex: (bert_preprocess_model(ex), ex['label']))
  dataset = dataset.cache().prefetch(buffer_size=AUTOTUNE)
  return dataset, num_examples
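The order of operations matters here. Shuffling and repeating happen on individual examples. Batching comes next, and only then does map call the preprocessing model, so the model always receives a whole batch of strings. The pure-Python generator below mimics that order without tf.data. It is an illustration, not a replacement, and sketch_pipeline is a hypothetical name. As with dataset.repeat() followed by batch(), training batches may span epoch boundaries, and the final validation batch may be smaller than batch_size.

```python
import itertools
import random


def sketch_pipeline(rows, batch_size, preprocess, is_training, seed=0):
    """Hypothetical pure-Python analogue of load_dataset_from_tfds.

    rows: list of dicts, each with a 'label' key plus text features.
    preprocess: callable applied to a whole batch (a list of rows); it stands
        in for bert_preprocess_model.
    """
    rng = random.Random(seed)

    def one_pass():
        order = list(rows)
        if is_training:
            rng.shuffle(order)  # like dataset.shuffle(num_examples)
        return order

    if is_training:
        # like dataset.repeat(): endless passes, each one reshuffled
        stream = itertools.chain.from_iterable(
            one_pass() for _ in itertools.count())
    else:
        stream = iter(one_pass())

    while True:
        batch = list(itertools.islice(stream, batch_size))  # like .batch()
        if not batch:
            return
        # like .map(): preprocessing sees the whole batch at once
        yield preprocess(batch), [row['label'] for row in batch]


def to_text(batch):
    return [row['sentence'] for row in batch]


rows = [{'sentence': f's{i}', 'label': i % 2} for i in range(5)]
print(list(sketch_pipeline(rows, 2, to_text, is_training=False)))
# [(['s0', 's1'], [0, 1]), (['s2', 's3'], [0, 1]), (['s4'], [0])]
```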

Define your model

You are now ready to define your model for sentence or sentence-pair classification. Feed the preprocessed inputs through the BERT encoder, put a linear classifier on top (or another arrangement of layers, as you prefer), and use dropout for regularization.

def build_classifier_model(num_classes):

  class Classifier(tf.keras.Model):
    def __init__(self, num_classes):
      super(Classifier, self).__init__(name="prediction")
      self.encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True)
      self.dropout = tf.keras.layers.Dropout(0.1)
      self.dense = tf.keras.layers.Dense(num_classes)

    def call(self, preprocessed_text):
      encoder_outputs = self.encoder(preprocessed_text)
      pooled_output = encoder_outputs["pooled_output"]
      x = self.dropout(pooled_output)
      x = self.dense(x)
      return x

  model = Classifier(num_classes)
  return model

Let's try running the model on some preprocessed inputs.

test_classifier_model = build_classifier_model(2)
bert_raw_result = test_classifier_model(text_preprocessed)
print(tf.sigmoid(bert_raw_result))

tf.Tensor([[0.29329836 0.44367802]], shape=(1, 2), dtype=float32)
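The classifier returns raw logits, because its final Dense layer has no activation. The loss used later, SparseCategoricalCrossentropy(from_logits=True), treats these logits as softmax inputs. The tf.sigmoid call above scores each logit independently, which is why the two printed values do not sum to 1. tf.nn.softmax would give class probabilities consistent with the loss. The plain-Python comparison below uses made-up logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


logits = [-0.5, 0.25]  # made-up logits for one example with two classes
print('sigmoid per logit:', [round(sigmoid(x), 4) for x in logits])
print('softmax over both:', [round(p, 4) for p in softmax(logits)])
```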

Choose a task from GLUE

You are going to use a TensorFlow Dataset from the GLUE benchmark suite.

Colab lets you download these small datasets to the local filesystem. The code below reads them entirely into memory, because the separate TPU worker host cannot access the local filesystem of the Colab runtime.

For bigger datasets, you'll need to create your own Google Cloud Storage bucket and have the TPU worker read the data from there. You can learn more in the TPU guide.

It's recommended to start with the CoLA dataset (for single sentence) or MRPC (for multi sentence), since these are small and don't take long to fine-tune.

tfds_name = 'glue/cola' 

tfds_info = tfds.builder(tfds_name).info

sentence_features = list(tfds_info.features.keys())
sentence_features.remove('idx')
sentence_features.remove('label')

available_splits = list(tfds_info.splits.keys())
train_split = 'train'
validation_split = 'validation'
test_split = 'test'
if tfds_name == 'glue/mnli':
  validation_split = 'validation_matched'
  test_split = 'test_matched'

num_classes = tfds_info.features['label'].num_classes
num_examples = tfds_info.splits.total_num_examples

print(f'Using {tfds_name} from TFDS')
print(f'This dataset has {num_examples} examples')
print(f'Number of classes: {num_classes}')
print(f'Features {sentence_features}')
print(f'Splits {available_splits}')

with tf.device('/job:localhost'):
  # batch_size=-1 is a way to load the dataset into memory
  in_memory_ds = tfds.load(tfds_name, batch_size=-1, shuffle_files=True)

# The code below is just to show some samples from the selected dataset
print(f'Here are some sample rows from {tfds_name} dataset')
sample_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[train_split])

labels_names = tfds_info.features['label'].names
print(labels_names)
print()

sample_i = 1
for sample_row in sample_dataset.take(5):
  samples = [sample_row[feature] for feature in sentence_features]
  print(f'sample row {sample_i}')
  for sample in samples:
    print(sample.numpy())
  sample_label = sample_row['label']

  print(f'label: {sample_label} ({labels_names[sample_label]})')
  print()
  sample_i += 1

Using glue/cola from TFDS
This dataset has 10657 examples
Number of classes: 2
Features ['sentence']
Splits ['train', 'validation', 'test']
Here are some sample rows from glue/cola dataset
['unacceptable', 'acceptable']

sample row 1
b'It is this hat that it is certain that he was wearing.'
label: 1 (acceptable)

sample row 2
b'Her efficient looking up of the answer pleased the boss.'
label: 1 (acceptable)

sample row 3
b'Both the workers will wear carnations.'
label: 1 (acceptable)

sample row 4
b'John enjoyed drawing trees for his syntax homework.'
label: 1 (acceptable)

sample row 5
b'We consider Leslie rather foolish, and Lou a complete idiot.'
label: 1 (acceptable)
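With batch_size=-1, each split in in_memory_ds is a dict of equal-length columns. Dataset.from_tensor_slices turns that dict into one dict per row. The code then keeps every feature except idx and label as a model input. Below is a plain-Python analogue with a made-up two-row split. The helpers are hypothetical, and the real columns are tensors, not lists.

```python
def select_sentence_features(feature_names):
    """Keeps only the text features, as the .remove() calls above do."""
    return [name for name in feature_names if name not in ('idx', 'label')]


def rows_from_columns(columns):
    """Hypothetical analogue of Dataset.from_tensor_slices on a column dict."""
    keys = list(columns)
    lengths = {len(columns[k]) for k in keys}
    if len(lengths) != 1:
        raise ValueError('all columns must have the same, non-empty set of keys')
    return [{k: columns[k][i] for k in keys} for i in range(lengths.pop())]


# Made-up split shaped like in_memory_ds['train'] for glue/cola.
split = {
    'idx': [0, 1],
    'label': [1, 0],
    'sentence': [b'First sentence.', b'Second sentence.'],
}
print(select_sentence_features(split))  # ['sentence']
print(rows_from_columns(split)[0])
# {'idx': 0, 'label': 1, 'sentence': b'First sentence.'}
```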

The dataset also determines the problem type (classification or regression) and the appropriate loss function for training.

def get_configuration(glue_task):

  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  if glue_task == 'glue/cola':
    metrics = tfa.metrics.MatthewsCorrelationCoefficient(num_classes=2)
  else:
    metrics = tf.keras.metrics.SparseCategoricalAccuracy(
        'accuracy', dtype=tf.float32)

  return metrics, loss
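For glue/cola, the metric is the Matthews correlation coefficient (MCC), which stays informative when labels are imbalanced. MCC is 1 for perfect predictions, -1 for perfectly inverted ones, and 0 for a classifier that always predicts the same class. The reference sketch below implements the binary formula, which you can use to sanity-check the tfa metric on small arrays. It treats a zero denominator as 0, a common convention that we assume here.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Binary MCC for 0/1 labels.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    returned as 0.0 when the denominator is 0 (assumed convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


print(matthews_corrcoef([1, 0, 1, 0], [1, 0, 1, 0]))  # 1.0
print(matthews_corrcoef([1, 0, 1, 0], [0, 1, 0, 1]))  # -1.0
print(matthews_corrcoef([1, 1, 0, 0], [1, 1, 1, 1]))  # 0.0 (constant predictions)
```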

Train your model

Finally, you can train the model end-to-end on the dataset you chose.

Distribution

Recall the set-up code at the top, which connected the Colab runtime to a TPU worker with multiple TPU devices. To distribute training onto them, you will create and compile your main Keras model within the scope of the TPU distribution strategy. (For details, see Distributed training with Keras.)

Preprocessing, on the other hand, runs on the CPU of the worker host, not on the TPUs. For that reason, the Keras model for preprocessing, as well as the training and validation datasets mapped with it, are built outside the distribution strategy scope. The call to Model.fit() will take care of distributing the passed-in dataset to the model replicas.
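Under the strategy, each batch of batch_size examples that Model.fit draws from the dataset is the global batch, and the strategy divides it among the replicas. The sketch below shows the arithmetic for this tutorial: 32 examples over 8 TPU cores. The contiguous slicing and the divisibility check are simplifying assumptions, because how TensorFlow assigns examples to replicas and handles partial batches is an implementation detail. split_global_batch is hypothetical.

```python
def split_global_batch(batch, num_replicas):
    """Hypothetical helper: splits one global batch into equal shards."""
    if len(batch) % num_replicas:
        raise ValueError('batch size must be divisible by num_replicas')
    shard = len(batch) // num_replicas
    return [batch[i * shard:(i + 1) * shard] for i in range(num_replicas)]


global_batch = list(range(32))  # batch_size = 32 in this tutorial
shards = split_global_batch(global_batch, num_replicas=8)  # 8 TPU cores
print(len(shards), [len(s) for s in shards])  # 8 [4, 4, 4, 4, 4, 4, 4, 4]
```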

Optimizer

Fine-tuning follows the optimizer set-up from BERT pre-training (as in Classify text with BERT). It uses the AdamW optimizer with a linear decay of a notional initial learning rate, prefixed with a linear warm-up phase over the first 10% of training steps (num_warmup_steps). In line with the BERT paper, the initial learning rate is smaller for fine-tuning (best of 5e-5, 3e-5, 2e-5).
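The schedule can be written out step by step. The sketch below assumes a linear warm-up from 0 to init_lr over num_warmup_steps, followed by a linear decay to 0 over num_train_steps counted from step 0. official.nlp.optimization may differ in details, so treat this as an approximation. The step counts use the 8551 training examples of glue/cola, which is consistent with the 267 steps per epoch in the training log.

```python
def bert_fine_tuning_schedule(step, init_lr, num_train_steps, num_warmup_steps):
    """Learning rate at a step: linear warm-up, then linear decay to 0.

    Assumption: the decay is measured from step 0 over num_train_steps
    (a power-1 polynomial decay wrapped in a warm-up phase).
    """
    if step < num_warmup_steps:
        return init_lr * step / num_warmup_steps
    return init_lr * max(0.0, 1.0 - step / num_train_steps)


# glue/cola: 8551 training examples, with the hyperparameters used below.
train_data_size, batch_size, epochs, init_lr = 8551, 32, 3, 2e-5
steps_per_epoch = train_data_size // batch_size   # 267
num_train_steps = steps_per_epoch * epochs        # 801
num_warmup_steps = num_train_steps // 10          # 80

for step in (0, 40, 80, 400, 801):
    lr = bert_fine_tuning_schedule(step, init_lr, num_train_steps,
                                   num_warmup_steps)
    print(f'step {step:>3}: lr = {lr:.2e}')
```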

epochs = 3
batch_size = 32
init_lr = 2e-5

print(f'Fine tuning {tfhub_handle_encoder} model')
bert_preprocess_model = make_bert_preprocess_model(sentence_features)

with strategy.scope():

  # Metrics have to be created inside the strategy scope.
  metrics, loss = get_configuration(tfds_name)

  train_dataset, train_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, train_split, batch_size, bert_preprocess_model)
  steps_per_epoch = train_data_size // batch_size
  num_train_steps = steps_per_epoch * epochs
  num_warmup_steps = num_train_steps // 10

  validation_dataset, validation_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, validation_split, batch_size,
      bert_preprocess_model)
  validation_steps = validation_data_size // batch_size

  classifier_model = build_classifier_model(num_classes)

  optimizer = optimization.create_optimizer(
      init_lr=init_lr,
      num_train_steps=num_train_steps,
      num_warmup_steps=num_warmup_steps,
      optimizer_type='adamw')

  classifier_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

  classifier_model.fit(
      x=train_dataset,
      validation_data=validation_dataset,
      steps_per_epoch=steps_per_epoch,
      epochs=epochs,
      validation_steps=validation_steps)

Fine tuning https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3 model
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/keras/engine/functional.py:585: UserWarning: Input dict contained keys ['idx', 'label'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])
Epoch 1/3
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:1", shape=(None,), dtype=int32), values=Tensor("clip_by_global_norm/clip_by_global_norm/_0:0", dtype=float32), dense_shape=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:2", shape=(None,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
267/267 [==============================] - 86s 81ms/step - loss: 0.6092 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.4846 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 2/3
267/267 [==============================] - 14s 53ms/step - loss: 0.3774 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.5322 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 3/3
267/267 [==============================] - 14s 53ms/step - loss: 0.2623 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.6469 - val_MatthewsCorrelationCoefficient: 0.0000e+00

Export for inference

You will create a final model that contains the preprocessing part and the fine-tuned BERT we've just created.

At inference time, preprocessing needs to be part of the model, because there is no longer a separate input queue doing it, as there was for the training data. Preprocessing is not just computation. It has its own resources (the vocab table) that must be attached to the Keras Model that is saved for export. This final assembly is what will be saved.

You are going to save the model on Colab. Later you can download it to keep for the future (View -> Table of contents -> Files).

main_save_path = './my_models'
bert_type = tfhub_handle_encoder.split('/')[-2]
saved_model_name = f'{tfds_name.replace("/", "_")}_{bert_type}'

saved_model_path = os.path.join(main_save_path, saved_model_name)

preprocess_inputs = bert_preprocess_model.inputs
bert_encoder_inputs = bert_preprocess_model(preprocess_inputs)
bert_outputs = classifier_model(bert_encoder_inputs)
model_for_export = tf.keras.Model(preprocess_inputs, bert_outputs)

print('Saving', saved_model_path)

# Save everything on the Colab host (even the variables from TPU memory)
save_options = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
model_for_export.save(saved_model_path, include_optimizer=False,
                      options=save_options)

Saving ./my_models/glue_cola_bert_en_uncased_L-12_H-768_A-12
WARNING:absl:Found untraced functions such as restored_function_body, restored_function_body, restored_function_body, restored_function_body, restored_function_body while saving (showing 5 of 910). These functions will not be directly callable after loading.
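The export path is built from the task name and the second-to-last segment of the TF Hub handle. Rebuilding the path in isolation reveals a side effect worth knowing about. Handles that differ only in a parent directory, such as small_bert/bert_en_uncased_L-12_H-768_A-12 and bert_en_uncased_L-12_H-768_A-12, map to the same directory, so exporting both for one task would write to the same place. saved_model_path_for is a hypothetical helper that reuses the expressions from the export code.

```python
import os


def saved_model_path_for(tfds_name, encoder_handle, main_save_path='./my_models'):
    """Hypothetical helper: rebuilds the export path from the task and handle."""
    bert_type = encoder_handle.split('/')[-2]  # second-to-last URL segment
    saved_model_name = f'{tfds_name.replace("/", "_")}_{bert_type}'
    return os.path.join(main_save_path, saved_model_name)


base = 'https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3'
small = 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-768_A-12/1'
print(saved_model_path_for('glue/cola', base))
# ./my_models/glue_cola_bert_en_uncased_L-12_H-768_A-12
print(saved_model_path_for('glue/cola', small) == saved_model_path_for('glue/cola', base))
# True
```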

Test the model

The final step is testing the results of your exported model.

For comparison, let's reload the model and test it using some inputs from the test split of the dataset.

with tf.device('/job:localhost'):
  reloaded_model = tf.saved_model.load(saved_model_path)

Utility methods

def prepare(record):
  model_inputs = [[record[ft]] for ft in sentence_features]
  return model_inputs


def prepare_serving(record):
  model_inputs = {ft: record[ft] for ft in sentence_features}
  return model_inputs


def print_bert_results(test, bert_result, dataset_name):

  bert_result_class = tf.argmax(bert_result, axis=1)[0]

  if dataset_name == 'glue/cola':
    print('sentence:', test[0].numpy())
    if bert_result_class == 1:
      print('This sentence is acceptable')
    else:
      print('This sentence is unacceptable')

  elif dataset_name == 'glue/sst2':
    print('sentence:', test[0].numpy())
    if bert_result_class == 1:
      print('This sentence has POSITIVE sentiment')
    else:
      print('This sentence has NEGATIVE sentiment')

  elif dataset_name == 'glue/mrpc':
    print('sentence1:', test[0].numpy())
    print('sentence2:', test[1].numpy())
    if bert_result_class == 1:
      print('They are paraphrases')
    else:
      print('They are NOT paraphrases')

  elif dataset_name == 'glue/qqp':
    print('question1:', test[0].numpy())
    print('question2:', test[1].numpy())
    if bert_result_class == 1:
      print('Questions are similar')
    else:
      print('Questions are NOT similar')

  elif dataset_name == 'glue/mnli':
    print('premise   :', test[0].numpy())
    print('hypothesis:', test[1].numpy())
    if bert_result_class == 1:
      print('This premise is NEUTRAL to the hypothesis')
    elif bert_result_class == 2:
      print('This premise CONTRADICTS the hypothesis')
    else:
      print('This premise ENTAILS the hypothesis')

  elif dataset_name == 'glue/qnli':
    print('question:', test[0].numpy())
    print('sentence:', test[1].numpy())
    if bert_result_class == 1:
      print('The question is NOT answerable by the sentence')
    else:
      print('The question is answerable by the sentence')

  elif dataset_name == 'glue/rte':
    print('sentence1:', test[0].numpy())
    print('sentence2:', test[1].numpy())
    if bert_result_class == 1:
      print('Sentence1 DOES NOT entail sentence2')
    else:
      print('Sentence1 entails sentence2')

  elif dataset_name == 'glue/wnli':
    print('sentence1:', test[0].numpy())
    print('sentence2:', test[1].numpy())
    # Unlike RTE, TFDS orders WNLI labels as ['not_entailment', 'entailment'].
    if bert_result_class == 1:
      print('Sentence1 entails sentence2')
    else:
      print('Sentence1 DOES NOT entail sentence2')

  print('BERT raw results:', bert_result[0])
  print()
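`prepare` and `prepare_serving` carry the same text in different containers: a positional list of one-element batches for calling the reloaded model directly, versus a dict keyed by feature name for the `serving_default` signature. The sketch below copies both helpers and applies them to a plain-Python record; the MRPC-style feature names and sentences are hypothetical stand-ins for the string tensors that `tf.data` yields in the notebook.

```python
# Hypothetical stand-ins: in the notebook, sentence_features depends on the chosen GLUE task
# and record is a dict of string tensors from tf.data; plain Python strings are used here.
sentence_features = ['sentence1', 'sentence2']
record = {'sentence1': 'The cat sat.', 'sentence2': 'A cat was sitting.', 'label': 1}


def prepare(record):
    # One single-element list (a batch of size 1) per text feature, in sentence_features order.
    return [[record[ft]] for ft in sentence_features]


def prepare_serving(record):
    # A dict keyed by feature name, passed as keyword arguments to the serving signature.
    return {ft: record[ft] for ft in sentence_features}


print(prepare(record))          # [['The cat sat.'], ['A cat was sitting.']]
print(prepare_serving(record))  # {'sentence1': 'The cat sat.', 'sentence2': 'A cat was sitting.'}
```

Both helpers drop the `label` field, since only the text features are model inputs.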

Test

with tf.device('/job:localhost'):
  test_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[test_split])
  for test_row in test_dataset.shuffle(1000).map(prepare).take(5):
    if len(sentence_features) == 1:
      result = reloaded_model(test_row[0])
    else:
      result = reloaded_model(list(test_row))

    print_bert_results(test_row, result, tfds_name)

sentence: [b'An old woman languished in the forest.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.7032353  3.3714833], shape=(2,), dtype=float32)

sentence: [b"I went to the movies and didn't pick up the shirts."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.73970896  1.0806316 ], shape=(2,), dtype=float32)

sentence: [b"Every essay that she's written and which I've read is on that pile."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.7034159  0.6236454], shape=(2,), dtype=float32)

sentence: [b'Either Bill ate the peaches, or Harry.']
This sentence is unacceptable
BERT raw results: tf.Tensor([ 0.05972151 -0.08620442], shape=(2,), dtype=float32)

sentence: [b'I ran into the baker from whom I bought these bagels.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6862067  3.285925 ], shape=(2,), dtype=float32)
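The `BERT raw results` above are unnormalized logits (they can be negative and do not sum to 1), and `print_bert_results` simply takes their argmax. Assuming the classifier head has no final softmax, you can convert them to probabilities yourself. The sketch below does this in plain Python for two of the CoLA outputs above; the label-name lists are assumed to follow the TFDS class order, so check them against the `names` of the dataset's `label` feature.

```python
import math

# Assumed label names in TFDS class-index order (verify against the dataset's label feature).
LABEL_NAMES = {
    'glue/cola': ['unacceptable', 'acceptable'],
    'glue/sst2': ['negative', 'positive'],
}


def softmax(logits):
    # Subtract the largest logit before exponentiating, for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, dataset_name):
    """Return (label_name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])  # same index tf.argmax picks
    return LABEL_NAMES[dataset_name][best], probs[best]


# Logits copied from the CoLA outputs above.
print(decode([-1.7032353, 3.3714833], 'glue/cola'))
print(decode([0.05972151, -0.08620442], 'glue/cola'))
```

The second call uses the near-tie logits of "Either Bill ate the peaches, or Harry."; it picks `unacceptable` with a probability only slightly above 0.5, which a bare argmax hides.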

If you want to use your model on TF Serving, remember that it will call your SavedModel through one of its named signatures. Notice that there are some small differences in the input. In Python, you can test them as follows:

with tf.device('/job:localhost'):
  serving_model = reloaded_model.signatures['serving_default']
  for test_row in test_dataset.shuffle(1000).map(prepare_serving).take(5):
    result = serving_model(**test_row)
    # The 'prediction' key is the classifier's defined model name.
    print_bert_results(list(test_row.values()), result['prediction'], tfds_name)

sentence: b'Everyone attended more than two seminars.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.5594155  2.862155 ], shape=(2,), dtype=float32)

sentence: b'Most columnists claim that a senior White House official has been briefing them.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6298996  3.3155093], shape=(2,), dtype=float32)

sentence: b"That my father, he's lived here all his life is well known to those cops."
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2048947  1.8589772], shape=(2,), dtype=float32)

sentence: b'Ourselves like us.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2723312  2.0494034], shape=(2,), dtype=float32)

sentence: b'John is clever.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6516167  3.3147635], shape=(2,), dtype=float32)
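When the model is deployed to TF Serving, its REST API accepts predict requests at `/v1/models/<model_name>:predict`. The sketch below only builds such a request with the standard library and does not send it. The model name `glue_cola_bert`, the default REST port 8501, and the assumption that the signature's input names match `sentence_features` (as `prepare_serving` relies on) are all assumptions to check against your own deployment, for example with `saved_model_cli show --dir <path> --all`.

```python
import json
import urllib.request


def build_predict_request(rows, feature_names, model_name,
                          host='localhost', port=8501,
                          signature_name='serving_default'):
    # Row format: one JSON object per example, keyed by the signature's input names.
    instances = [dict(zip(feature_names, row)) for row in rows]
    body = json.dumps({'signature_name': signature_name, 'instances': instances})
    url = f'http://{host}:{port}/v1/models/{model_name}:predict'
    return urllib.request.Request(url, data=body.encode('utf-8'),
                                  headers={'Content-Type': 'application/json'},
                                  method='POST')


# 'glue_cola_bert' is a hypothetical model name for a CoLA model served by TF Serving.
req = build_predict_request([('John is clever.',)], ['sentence'], 'glue_cola_bert')
print(req.full_url)
print(req.data.decode('utf-8'))
# Sending it requires a running TF Serving instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```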

You did it! Your saved model can be used for serving or for simple inference in a process, with a simpler API, less code, and easier maintenance.

Next steps

Now that you've tried one of the base BERT models, you can try other ones to achieve more accuracy, or perhaps try smaller model versions.

You can also try other datasets.
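Switching `tfds_name` to another GLUE task also changes which text features the model takes and how many classes the classifier needs. The lookup below is a sketch of that mapping; the feature names match the ones `print_bert_results` prints for each task, but the table itself is an assumption based on the TFDS `glue` configurations, so confirm it with `tfds.builder(tfds_name).info.features` before relying on it.

```python
# Text feature names and class counts per GLUE task (assumed from the TFDS 'glue' configs).
GLUE_TASKS = {
    'glue/cola': (['sentence'], 2),
    'glue/sst2': (['sentence'], 2),
    'glue/mrpc': (['sentence1', 'sentence2'], 2),
    'glue/qqp': (['question1', 'question2'], 2),
    'glue/mnli': (['premise', 'hypothesis'], 3),
    'glue/qnli': (['question', 'sentence'], 2),
    'glue/rte': (['sentence1', 'sentence2'], 2),
    'glue/wnli': (['sentence1', 'sentence2'], 2),
}


def task_config(tfds_name):
    """Return (sentence_features, num_classes) for a GLUE task name such as 'glue/mrpc'."""
    if tfds_name not in GLUE_TASKS:
        raise ValueError(f'Unsupported GLUE task: {tfds_name!r}')
    features, num_classes = GLUE_TASKS[tfds_name]
    return list(features), num_classes  # copy so callers cannot mutate the table


sentence_features, num_classes = task_config('glue/mnli')
print(sentence_features, num_classes)  # ['premise', 'hypothesis'] 3
```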