Google uses AI technology to translate content into your preferred language. AI translations can contain errors.

פתר משימות GLUE באמצעות BERT ב- TPU

הצג באתר TensorFlow.org

BERT יכול לשמש כדי לפתור בעיות רבות בעיבוד שפה טבעית. תלמד איך ברט מדוקדקת משימות רבות מן benchmark הדבק :

קולה (Corpus מקובלת לשוני): האם המשפט נכון מבחינה דקדוקית?
SST-2 (סנטימנט סטנפורד Treebank): המשימה היא לחזות את הסנטימנט של משפט נתון.
MRPC (Corpus פרפרזה המחקר Microsoft): יש לקבוע האם זוג המשפטים שקולה מבחינה סמנטית.
QQP (Quora השאלה Pairs2): יש לקבוע האם זוג השאלות שקולה מבחינה סמנטית.
MNLI (היקש שפה הטבעי Multi-ז'אנר): בהינתן משפט הנחה ו משפט שערה, המשימה היא לחזות אם ההנחה כרוך ההשערה (הַצרָכָה), סותרת את ההשערה (סתירה), או לא (ניטראלי).
QNLI (שאלה-תשובה הסקה שפה טבעית): המשימה היא לקבוע האם המשפט בהקשר מכיל את התשובה לשאלה.
RTE (הכרת גרירה טקסטואלית): קבע אם עונש כרוך שערה נתון או לא.
WNLI (היקש שפה טבעית וינוגרד): המשימה היא לחזות אם את המשפט עם כנוי להחליף שמתחייב במשפט המקורי.

מדריך זה מכיל קוד מלא מקצה לקצה כדי לאמן דגמים אלה על TPU. אתה יכול גם להפעיל מחברת זו על GPU, על ידי שינוי שורה אחת (מתואר להלן).

במחברת זו תוכלו:

טען דגם BERT מ- TensorFlow Hub
בחר אחת ממשימות GLUE והורד את מערך הנתונים
עבדו מראש את הטקסט
כוונן את BERT (דוגמאות ניתנות עבור מערכי נתונים של משפט בודד ורב-משפטים)
שמור את הדגם המיומן והשתמש בו

להכין

אתה תשתמש במודל נפרד כדי לעבד מראש טקסט לפני השימוש בו כדי לכוונן את BERT. מודל זה תלוי tensorflow / טקסט , אשר תוכלו להתקין בהמשך.

pip install -q -U tensorflow-text

תוכלו להשתמש בכלי למיטוב AdamW מן tensorflow / מודלים לברט לכוונן, אשר תוכלו להתקין גם כן.

pip install -q -U tf-models-official

pip install -U tfds-nightly

import os
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import tensorflow_text as text  # A dependency of the preprocessing model
import tensorflow_addons as tfa
from official.nlp import optimization
import numpy as np

tf.get_logger().setLevel('ERROR')

/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/requests/__init__.py:104: RequestsDependencyWarning: urllib3 (1.26.7) or chardet (2.3.0)/charset_normalizer (2.0.7) doesn't match a supported version!
  RequestsDependencyWarning)

לאחר מכן, הגדר את TFHub לקרוא נקודות ביקורת ישירות מהדליים של Cloud Storage של TFHub. זה מומלץ רק כאשר מריצים דגמי TFHub על TPU.

ללא הגדרה זו TFHub היה מוריד את הקובץ הדחוס ומחלץ את נקודת הבידוק באופן מקומי. ניסיון לטעון מקבצים מקומיים אלה ייכשל עם השגיאה הבאה:

InvalidArgumentError: Unimplemented: File system scheme '[local]' not implemented

הסיבה לכך היא TPU יכול לקרוא רק ישירות דליים לאחסון בענן .

os.environ["TFHUB_MODEL_LOAD_FORMAT"]="UNCOMPRESSED"

התחבר לעובד ה-TPU

הקוד הבא מתחבר לעובד ה-TPU ומשנה את התקן ברירת המחדל של TensorFlow להתקן ה-CPU ב-TPU. זה גם מגדיר אסטרטגיית הפצת TPU שבה תשתמש כדי להפיץ אימון מודלים על 8 ליבות ה-TPU הנפרדות הזמינות בעובד TPU אחד זה. ראה של TensorFlow מדריך TPU לקבלת מידע נוסף.

import os

if os.environ['COLAB_TPU_ADDR']:
  cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
  tf.config.experimental_connect_to_cluster(cluster_resolver)
  tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
  strategy = tf.distribute.TPUStrategy(cluster_resolver)
  print('Using TPU')
elif tf.config.list_physical_devices('GPU'):
  strategy = tf.distribute.MirroredStrategy()
  print('Using GPU')
else:
  raise ValueError('Running on CPU is not recommended.')

Using TPU

טוען דגמים מ- TensorFlow Hub

כאן תוכל לבחור איזה דגם BERT תטען מ- TensorFlow Hub ולכוון עדין. ישנם מספר דגמי BERT זמינים לבחירה.

ברט-Base , Uncased ו שבעה יותר דגמים עם משקולות מאומנות שפורסמו על ידי המחברים ברט המקוריים.
BERTs הקטן יש את אותה ארכיטקטורה כללית אך פחות ו / או בלוקי Transformer קטנים, אשר מאפשרת לך לחקור פשרות בין מהירות, גודל ואיכות.
אלברט : ארבעה גדלים שונים של "ברט לייט" מפחיתת גודל מודל (אבל לא זמן חישוב) על ידי שיתוף פרמטרים בין שכבות.
מומחי ברט : שמונה דגמים שלכול ארכיטקטורה ברט-הבסיס אלא להציע בחירה בין תחומים טרום הכשרה השונות, כדי ליישר באופן הדוק יותר עם משימת היעד.
אלקטרה יש את אותה ארכיטקטורה כמו ברט (בשלושה גדלים שונים), אך מקבלת מראש מאומן בתור מאבחן ב סט-אפ דומה לרשת תשובה: Generative (גן).
ברט עם תשומת לב Talking Heads-ו המגודר Gelu [ בסיס , גדול ] יש שני שיפורי הליבה של אדריכלות Transformer.

עיין בתיעוד הדגם המקושר למעלה לפרטים נוספים.

במדריך זה תתחיל עם BERT-base. אתה יכול להשתמש בדגמים גדולים יותר ועדכניים יותר לדיוק גבוה יותר, או בדגמים קטנים יותר לזמני אימון מהירים יותר. כדי לשנות את הדגם, עליך להחליף רק שורת קוד אחת (מוצג להלן). כל ההבדלים מובלעים ב- SavedModel שתורידו מ- TensorFlow Hub.

בחר דגם BERT כדי לכוונן

bert_model_name = 'bert_en_uncased_L-12_H-768_A-12' 

map_name_to_handle = {
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3',
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-768_A-12/1',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_base/2',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_large/2',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_xlarge/2',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_xxlarge/2',
    'electra_small':
        'https://tfhub.dev/google/electra_small/2',
    'electra_base':
        'https://tfhub.dev/google/electra_base/2',
    'experts_pubmed':
        'https://tfhub.dev/google/experts/bert/pubmed/2',
    'experts_wiki_books':
        'https://tfhub.dev/google/experts/bert/wiki_books/2',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_large/1',
}

map_model_to_preprocess = {
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_preprocess/3',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'electra_small':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'electra_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_pubmed':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_wiki_books':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
}

tfhub_handle_encoder = map_name_to_handle[bert_model_name]
tfhub_handle_preprocess = map_model_to_preprocess[bert_model_name]

print('BERT model selected           :', tfhub_handle_encoder)
print('Preprocessing model auto-selected:', tfhub_handle_preprocess)

BERT model selected           : https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3
Preprocessing model auto-selected: https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3

עבדו מראש את הטקסט

על הטקסט לסווג עם ברט colab המודל המקדים משמש מוטבע ישירות עם מקודד ברט.

מדריך זה מדגים כיצד לבצע עיבוד מקדים כחלק מצינור הקלט שלך לאימון, באמצעות Dataset.map, ולאחר מכן למזג אותו לתוך המודל שמיוצא להסקת מסקנות. כך, גם אימון וגם מסקנות יכולים לעבוד מקלט טקסט גולמי, למרות שה-TPU עצמו דורש קלט מספרי.

דרישות TPU הצידה, זה יכול לעזור להם ביצועים מקדימים נעשו באופן אסינכרוני בצינור הזנה (אתה יכול ללמוד יותר על מדריך ביצועי tf.data ).

מדריך זה גם מדגים כיצד לבנות מודלים מרובי קלט, וכיצד להתאים את אורך הרצף של הכניסות ל-BERT.

בואו נדגים את מודל העיבוד המקדים.

bert_preprocess = hub.load(tfhub_handle_preprocess)
tok = bert_preprocess.tokenize(tf.constant(['Hello TensorFlow!']))
print(tok)

<tf.RaggedTensor [[[7592], [23435, 12314], [999]]]>

כל דגם עיבוד מקדים גם מספק שיטה, .bert_pack_inputs(tensors, seq_length) , אשר לוקח רשימה של אסימונים (כמו tok לעיל) טיעון אורך רצף. זה אורז את התשומות כדי ליצור מילון של טנסורים בפורמט המצופה על ידי מודל BERT.

text_preprocessed = bert_preprocess.bert_pack_inputs([tok, tok], tf.constant(20))

print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Shape Word Ids :  (1, 20)
Word Ids       :  tf.Tensor(
[  101  7592 23435 12314   999   102  7592 23435 12314   999   102     0
     0     0     0     0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 20)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 20)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)

הנה כמה פרטים שכדאי לשים אליהם לב:

input_mask המסכה מאפשרת המודל להבדיל למשעי בין התוכן לבין הריפוד. המסכה יש אותה הצורה כמו input_word_ids , ומכילה מקום 1 input_word_ids לא ריפוד.
input_type_ids יש את אותה צורה כמו input_mask , אך בתוך האזור הלא מרופד, מכיל 0 או 1 מציין אשר משפט האסימון הוא חלק.

לאחר מכן, תיצור מודל עיבוד מקדים שמכיל את כל ההיגיון הזה. המודל שלך ייקח מחרוזות כקלט, ויחזיר אובייקטים בפורמט מתאים שניתן להעביר ל-BERT.

לכל דגם BERT יש מודל עיבוד מקדים ספציפי, הקפד להשתמש במודל המתאים המתואר בתיעוד המודל של BERT.

def make_bert_preprocess_model(sentence_features, seq_length=128):
  """Returns Model mapping string features to BERT inputs.

  Args:
    sentence_features: a list with the names of string-valued features.
    seq_length: an integer that defines the sequence length of BERT inputs.

  Returns:
    A Keras Model that can be called on a list or dict of string Tensors
    (with the order or names, resp., given by sentence_features) and
    returns a dict of tensors for input to BERT.
  """

  input_segments = [
      tf.keras.layers.Input(shape=(), dtype=tf.string, name=ft)
      for ft in sentence_features]

  # Tokenize the text to word pieces.
  bert_preprocess = hub.load(tfhub_handle_preprocess)
  tokenizer = hub.KerasLayer(bert_preprocess.tokenize, name='tokenizer')
  segments = [tokenizer(s) for s in input_segments]

  # Optional: Trim segments in a smart way to fit seq_length.
  # Simple cases (like this example) can skip this step and let
  # the next step apply a default truncation to approximately equal lengths.
  truncated_segments = segments

  # Pack inputs. The details (start/end token ids, dict of output tensors)
  # are model-dependent, so this gets loaded from the SavedModel.
  packer = hub.KerasLayer(bert_preprocess.bert_pack_inputs,
                          arguments=dict(seq_length=seq_length),
                          name='packer')
  model_inputs = packer(truncated_segments)
  return tf.keras.Model(input_segments, model_inputs)

בואו נדגים את מודל העיבוד המקדים. תיצור מבחן עם קלט שני משפטים (קלט1 וקלט2). הפלט הוא מה מודל ברט יכול לצפות כקלט: input_word_ids , input_masks ו input_type_ids .

test_preprocess_model = make_bert_preprocess_model(['my_input1', 'my_input2'])
test_text = [np.array(['some random test sentence']),
             np.array(['another sentence'])]
text_preprocessed = test_preprocess_model(test_text)

print('Keys           : ', list(text_preprocessed.keys()))
print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Keys           :  ['input_word_ids', 'input_mask', 'input_type_ids']
Shape Word Ids :  (1, 128)
Word Ids       :  tf.Tensor(
[ 101 2070 6721 3231 6251  102 2178 6251  102    0    0    0    0    0
    0    0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 128)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 128)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)

בואו נסתכל על מבנה הדגם, תוך שימת לב לשני הקלטים שהגדרתם זה עתה.

tf.keras.utils.plot_model(test_preprocess_model, show_shapes=True, show_dtype=True)

('You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) ', 'for plot_model/model_to_dot to work.')

כדי להחיל את העיבוד המקדים בכל התשומות מן הנתונים, תוכלו להשתמש map פונקציה מן הנתונים. התוצאה היא במטמון אז עבור ביצועים .

AUTOTUNE = tf.data.AUTOTUNE


def load_dataset_from_tfds(in_memory_ds, info, split, batch_size,
                           bert_preprocess_model):
  is_training = split.startswith('train')
  dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[split])
  num_examples = info.splits[split].num_examples

  if is_training:
    dataset = dataset.shuffle(num_examples)
    dataset = dataset.repeat()
  dataset = dataset.batch(batch_size)
  dataset = dataset.map(lambda ex: (bert_preprocess_model(ex), ex['label']))
  dataset = dataset.cache().prefetch(buffer_size=AUTOTUNE)
  return dataset, num_examples

הגדר את הדגם שלך

כעת אתה מוכן להגדיר את המודל שלך לסיווג צמד משפטים או משפטים על ידי הזנת התשומות המעובדות מראש דרך מקודד BERT והנחת מסווג ליניארי על גבי (או סידור אחר של שכבות כפי שאתה מעדיף), ושימוש בנשירה להסדרה.

def build_classifier_model(num_classes):

  class Classifier(tf.keras.Model):
    def __init__(self, num_classes):
      super(Classifier, self).__init__(name="prediction")
      self.encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True)
      self.dropout = tf.keras.layers.Dropout(0.1)
      self.dense = tf.keras.layers.Dense(num_classes)

    def call(self, preprocessed_text):
      encoder_outputs = self.encoder(preprocessed_text)
      pooled_output = encoder_outputs["pooled_output"]
      x = self.dropout(pooled_output)
      x = self.dense(x)
      return x

  model = Classifier(num_classes)
  return model

בואו ננסה להריץ את המודל על כמה תשומות מעובדות מראש.

test_classifier_model = build_classifier_model(2)
bert_raw_result = test_classifier_model(text_preprocessed)
print(tf.sigmoid(bert_raw_result))

tf.Tensor([[0.29329836 0.44367802]], shape=(1, 2), dtype=float32)

בחר משימה מתוך GLUE

אתה הולך להשתמש במערך TensorFlow מן דבק חבילת ביצועים.

Colab מאפשר לך להוריד מערכי נתונים קטנים אלה למערכת הקבצים המקומית, והקוד שלהלן קורא אותם במלואם לזיכרון, מכיוון שלמארח העובד הנפרד של TPU אין גישה למערכת הקבצים המקומית של זמן הריצה של colab.

עבור מערכי נתונים גדולים יותר, תצטרך ליצור משלך לאחסון בענן Google דלי ויש העובד TPU לקרוא את הנתונים משם. אתה יכול ללמוד יותר על מדריך TPU .

מומלץ להתחיל עם מערך הנתונים של CoLa (עבור משפט בודד) או MRPC (עבור ריבוי משפטים) מכיוון שהם קטנים ולא לוקח הרבה זמן לכוונן.

tfds_name = 'glue/cola' 

tfds_info = tfds.builder(tfds_name).info

sentence_features = list(tfds_info.features.keys())
sentence_features.remove('idx')
sentence_features.remove('label')

available_splits = list(tfds_info.splits.keys())
train_split = 'train'
validation_split = 'validation'
test_split = 'test'
if tfds_name == 'glue/mnli':
  validation_split = 'validation_matched'
  test_split = 'test_matched'

num_classes = tfds_info.features['label'].num_classes
num_examples = tfds_info.splits.total_num_examples

print(f'Using {tfds_name} from TFDS')
print(f'This dataset has {num_examples} examples')
print(f'Number of classes: {num_classes}')
print(f'Features {sentence_features}')
print(f'Splits {available_splits}')

with tf.device('/job:localhost'):
  # batch_size=-1 is a way to load the dataset into memory
  in_memory_ds = tfds.load(tfds_name, batch_size=-1, shuffle_files=True)

# The code below is just to show some samples from the selected dataset
print(f'Here are some sample rows from {tfds_name} dataset')
sample_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[train_split])

labels_names = tfds_info.features['label'].names
print(labels_names)
print()

sample_i = 1
for sample_row in sample_dataset.take(5):
  samples = [sample_row[feature] for feature in sentence_features]
  print(f'sample row {sample_i}')
  for sample in samples:
    print(sample.numpy())
  sample_label = sample_row['label']

  print(f'label: {sample_label} ({labels_names[sample_label]})')
  print()
  sample_i += 1

Using glue/cola from TFDS
This dataset has 10657 examples
Number of classes: 2
Features ['sentence']
Splits ['train', 'validation', 'test']
Here are some sample rows from glue/cola dataset
['unacceptable', 'acceptable']

sample row 1
b'It is this hat that it is certain that he was wearing.'
label: 1 (acceptable)

sample row 2
b'Her efficient looking up of the answer pleased the boss.'
label: 1 (acceptable)

sample row 3
b'Both the workers will wear carnations.'
label: 1 (acceptable)

sample row 4
b'John enjoyed drawing trees for his syntax homework.'
label: 1 (acceptable)

sample row 5
b'We consider Leslie rather foolish, and Lou a complete idiot.'
label: 1 (acceptable)

מערך הנתונים קובע גם את סוג הבעיה (סיווג או רגרסיה) ואת פונקציית האובדן המתאימה לאימון.

def get_configuration(glue_task):

  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  if glue_task == 'glue/cola':
    metrics = tfa.metrics.MatthewsCorrelationCoefficient(num_classes=2)
  else:
    metrics = tf.keras.metrics.SparseCategoricalAccuracy(
        'accuracy', dtype=tf.float32)

  return metrics, loss

אמן את הדגם שלך

לבסוף, אתה יכול לאמן את המודל מקצה לקצה במערך הנתונים שבחרת.

הפצה

זכור את קוד ההגדרה בחלק העליון, שחיבר את זמן הריצה של colab לעובד TPU עם מספר התקני TPU. כדי להפיץ אימונים עליהם, תיצור ותרכיב את מודל Keras הראשי שלך במסגרת אסטרטגיית ההפצה של TPU. (לפרטים, ראה מבוזרות אימונים עם Keras .)

עיבוד מקדים, לעומת זאת, פועל על ה-CPU של המארח של העובד, לא על ה-TPUs, כך שמודל Keras לעיבוד מקדים וכן מערכי ההדרכה והאימות הממופים איתו נבנים מחוץ לתחום אסטרטגיית ההפצה. קריאת Model.fit() תדאג הפצה עבר-שבמערך אל העתקי המודל.

מייעל

פיין-טיונינג מלווה את הגדרת האופטימיזציה ברט מראש אימונים (כמו טקסט לסווג עם ברט ): היא משתמשת האופטימיזציה AdamW עם דעיכה ליניארית של שיעור למידה ראשוני רעיוני, עם התחילית שלב ליניארי חימום הרחבה הראשון 10% של צעדי אימונים ( num_warmup_steps ). בהתאם למאמר BERT, קצב הלמידה הראשוני קטן יותר עבור כוונון עדין (המיטב של 5e-5, 3e-5, 2e-5).

epochs = 3
batch_size = 32
init_lr = 2e-5

print(f'Fine tuning {tfhub_handle_encoder} model')
bert_preprocess_model = make_bert_preprocess_model(sentence_features)

with strategy.scope():

  # metric have to be created inside the strategy scope
  metrics, loss = get_configuration(tfds_name)

  train_dataset, train_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, train_split, batch_size, bert_preprocess_model)
  steps_per_epoch = train_data_size // batch_size
  num_train_steps = steps_per_epoch * epochs
  num_warmup_steps = num_train_steps // 10

  validation_dataset, validation_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, validation_split, batch_size,
      bert_preprocess_model)
  validation_steps = validation_data_size // batch_size

  classifier_model = build_classifier_model(num_classes)

  optimizer = optimization.create_optimizer(
      init_lr=init_lr,
      num_train_steps=num_train_steps,
      num_warmup_steps=num_warmup_steps,
      optimizer_type='adamw')

  classifier_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

  classifier_model.fit(
      x=train_dataset,
      validation_data=validation_dataset,
      steps_per_epoch=steps_per_epoch,
      epochs=epochs,
      validation_steps=validation_steps)

Fine tuning https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3 model
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/keras/engine/functional.py:585: UserWarning: Input dict contained keys ['idx', 'label'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])
Epoch 1/3
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:1", shape=(None,), dtype=int32), values=Tensor("clip_by_global_norm/clip_by_global_norm/_0:0", dtype=float32), dense_shape=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:2", shape=(None,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
267/267 [==============================] - 86s 81ms/step - loss: 0.6092 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.4846 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 2/3
267/267 [==============================] - 14s 53ms/step - loss: 0.3774 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.5322 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 3/3
267/267 [==============================] - 14s 53ms/step - loss: 0.2623 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.6469 - val_MatthewsCorrelationCoefficient: 0.0000e+00

ייצא להסקת מסקנות

אתה תיצור מודל סופי הכולל את החלק הקדם-עיבוד ואת ה-BERT המכוונן עדין שיצרנו זה עתה.

בזמן מסקנות, עיבוד מקדים צריך להיות חלק מהמודל (מכיוון שאין עוד תור קלט נפרד באשר לנתוני אימון שעושים זאת). עיבוד מקדים אינו רק חישוב; יש לו משאבים משלו (טבלת המילים) שיש לצרף למודל Keras שנשמר לייצוא. ההרכבה הסופית הזו היא מה שיישמר.

אתה הולך להציל את המודל על colab ובהמשך תוכל להוריד כדי לשמור אותו לעתיד (View -> תוכן העניינים -> קבצים).

main_save_path = './my_models'
bert_type = tfhub_handle_encoder.split('/')[-2]
saved_model_name = f'{tfds_name.replace("/", "_")}_{bert_type}'

saved_model_path = os.path.join(main_save_path, saved_model_name)

preprocess_inputs = bert_preprocess_model.inputs
bert_encoder_inputs = bert_preprocess_model(preprocess_inputs)
bert_outputs = classifier_model(bert_encoder_inputs)
model_for_export = tf.keras.Model(preprocess_inputs, bert_outputs)

print('Saving', saved_model_path)

# Save everything on the Colab host (even the variables from TPU memory)
save_options = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
model_for_export.save(saved_model_path, include_optimizer=False,
                      options=save_options)

Saving ./my_models/glue_cola_bert_en_uncased_L-12_H-768_A-12
WARNING:absl:Found untraced functions such as restored_function_body, restored_function_body, restored_function_body, restored_function_body, restored_function_body while saving (showing 5 of 910). These functions will not be directly callable after loading.

בדוק את הדגם

השלב האחרון הוא בדיקת התוצאות של המודל המיוצא שלך.

רק כדי לערוך השוואה, בואו נטען מחדש את המודל ונבדוק אותו באמצעות כמה קלטים מפיצול הבדיקה ממערך הנתונים.

with tf.device('/job:localhost'):
  reloaded_model = tf.saved_model.load(saved_model_path)

שיטות שירות

def prepare(record):
  model_inputs = [[record[ft]] for ft in sentence_features]
  return model_inputs


def prepare_serving(record):
  model_inputs = {ft: record[ft] for ft in sentence_features}
  return model_inputs


def print_bert_results(test, bert_result, dataset_name):

  bert_result_class = tf.argmax(bert_result, axis=1)[0]

  if dataset_name == 'glue/cola':
    print('sentence:', test[0].numpy())
    if bert_result_class == 1:
      print('This sentence is acceptable')
    else:
      print('This sentence is unacceptable')

  elif dataset_name == 'glue/sst2':
    print('sentence:', test[0])
    if bert_result_class == 1:
      print('This sentence has POSITIVE sentiment')
    else:
      print('This sentence has NEGATIVE sentiment')

  elif dataset_name == 'glue/mrpc':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Are a paraphrase')
    else:
      print('Are NOT a paraphrase')

  elif dataset_name == 'glue/qqp':
    print('question1:', test[0])
    print('question2:', test[1])
    if bert_result_class == 1:
      print('Questions are similar')
    else:
      print('Questions are NOT similar')

  elif dataset_name == 'glue/mnli':
    print('premise   :', test[0])
    print('hypothesis:', test[1])
    if bert_result_class == 1:
      print('This premise is NEUTRAL to the hypothesis')
    elif bert_result_class == 2:
      print('This premise CONTRADICTS the hypothesis')
    else:
      print('This premise ENTAILS the hypothesis')

  elif dataset_name == 'glue/qnli':
    print('question:', test[0])
    print('sentence:', test[1])
    if bert_result_class == 1:
      print('The question is NOT answerable by the sentence')
    else:
      print('The question is answerable by the sentence')

  elif dataset_name == 'glue/rte':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Sentence1 DOES NOT entails sentence2')
    else:
      print('Sentence1 entails sentence2')

  elif dataset_name == 'glue/wnli':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Sentence1 DOES NOT entails sentence2')
    else:
      print('Sentence1 entails sentence2')

  print('BERT raw results:', bert_result[0])
  print()

מִבְחָן

with tf.device('/job:localhost'):
  test_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[test_split])
  for test_row in test_dataset.shuffle(1000).map(prepare).take(5):
    if len(sentence_features) == 1:
      result = reloaded_model(test_row[0])
    else:
      result = reloaded_model(list(test_row))

    print_bert_results(test_row, result, tfds_name)

sentence: [b'An old woman languished in the forest.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.7032353  3.3714833], shape=(2,), dtype=float32)

sentence: [b"I went to the movies and didn't pick up the shirts."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.73970896  1.0806316 ], shape=(2,), dtype=float32)

sentence: [b"Every essay that she's written and which I've read is on that pile."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.7034159  0.6236454], shape=(2,), dtype=float32)

sentence: [b'Either Bill ate the peaches, or Harry.']
This sentence is unacceptable
BERT raw results: tf.Tensor([ 0.05972151 -0.08620442], shape=(2,), dtype=float32)

sentence: [b'I ran into the baker from whom I bought these bagels.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6862067  3.285925 ], shape=(2,), dtype=float32)

אם אתה רוצה להשתמש במודל שלך על הצגת TF , לזכור שזה ייקרא SavedModel שלך דרך אחד החתימות בשם שלה. שימו לב שיש כמה הבדלים קטנים בקלט. ב-Python, אתה יכול לבדוק אותם באופן הבא:

with tf.device('/job:localhost'):
  serving_model = reloaded_model.signatures['serving_default']
  for test_row in test_dataset.shuffle(1000).map(prepare_serving).take(5):
    result = serving_model(**test_row)
    # The 'prediction' key is the classifier's defined model name.
    print_bert_results(list(test_row.values()), result['prediction'], tfds_name)

sentence: b'Everyone attended more than two seminars.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.5594155  2.862155 ], shape=(2,), dtype=float32)

sentence: b'Most columnists claim that a senior White House official has been briefing them.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6298996  3.3155093], shape=(2,), dtype=float32)

sentence: b"That my father, he's lived here all his life is well known to those cops."
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2048947  1.8589772], shape=(2,), dtype=float32)

sentence: b'Ourselves like us.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2723312  2.0494034], shape=(2,), dtype=float32)

sentence: b'John is clever.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6516167  3.3147635], shape=(2,), dtype=float32)

עשית את זה! המודל השמור שלך יכול לשמש להגשה או להסקה פשוטה בתהליך, עם API פשוט יותר עם פחות קוד וקל יותר לתחזוקה.

הצעדים הבאים

כעת, לאחר שניסית את אחד מדגמי הבסיס של BERT, אתה יכול לנסות אחרים כדי להשיג דיוק רב יותר או אולי עם גרסאות דגמים קטנות יותר.

אתה יכול גם לנסות במערך נתונים אחרים.