Google uses AI technology to translate content into your preferred language. AI translations can contain errors.

TPU에서 BERT를 사용하여 GLUE 작업 해결

TensorFlow.org에서 보기

BERT는 자연어 처리의 많은 문제를 해결하는 데 사용할 수 있습니다. 당신은 얼마나 많은 작업을위한 미세 조정 BERT에 배울 수 GLUE 벤치 마크 :

COLA (언어의 수용 코퍼스) : 문장은 문법적으로 정확합니까?
SST-2 (스탠포드 심리 Treebank는) : 작업은 주어진 문장의 감정을 예측하는 것입니다.
MRPC (마이크로 소프트 리서치 의역 코퍼스는) : 문장의 한 쌍의 의미 적으로 동일 여부를 결정합니다.
QQP (Quora의 질문 Pairs2는) : 질문 한 쌍의 의미 적으로 동일 여부를 결정합니다.
MNLI (멀티 장르 자연 언어 추론) : 전제 문장과 가설 문장을 감안할 때, 작업은, 전제는 가설 (함의)을 수반하는지 여부를 예측하는 가설 (모순) 모순, 또는도 (중립).
QNLI은 (자연 언어 추론을 질문 - 응답) : 태스크 상황에 맞는 문장이 질문에 대한 답이 들어 있는지 여부를 결정하는 것이다.
RTE는 (대한 텍스트 함의를 인식) : 문장이 주어진 가설을 수반 여부 있는지 확인합니다.
WNLI (Winograd 자연 언어 추론) : 작업은 대체 대명사로 문장이 원래 문장에 의해 수반되는 경우를 예측하는 것입니다.

이 자습서에는 TPU에서 이러한 모델을 학습시키기 위한 완전한 종단 간 코드가 포함되어 있습니다. 한 줄을 변경하여 GPU에서 이 노트북을 실행할 수도 있습니다(아래 설명 참조).

이 노트북에서 수행할 작업은 다음과 같습니다.

TensorFlow Hub에서 BERT 모델 로드
GLUE 작업 중 하나를 선택하고 데이터 세트를 다운로드합니다.
텍스트 전처리
BERT 미세 조정(단일 문장 및 다중 문장 데이터 세트에 대한 예가 제공됨)
학습된 모델을 저장하고 사용

설정

BERT를 미세 조정하는 데 사용하기 전에 별도의 모델을 사용하여 텍스트를 사전 처리합니다. 이 모델에 따라 tensorflow / 텍스트 아래 설치합니다.

pip install -q -U tensorflow-text

당신의 AdamW 최적화 사용 tensorflow / 모델 당신이 잘으로 설치됩니다 미세 조정 BERT에 있습니다.

pip install -q -U tf-models-official

pip install -U tfds-nightly

import os
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import tensorflow_text as text  # A dependency of the preprocessing model
import tensorflow_addons as tfa
from official.nlp import optimization
import numpy as np

tf.get_logger().setLevel('ERROR')

/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/requests/__init__.py:104: RequestsDependencyWarning: urllib3 (1.26.7) or chardet (2.3.0)/charset_normalizer (2.0.7) doesn't match a supported version!
  RequestsDependencyWarning)

다음으로 TFHub의 Cloud Storage 버킷에서 직접 체크포인트를 읽도록 TFHub를 구성합니다. 이것은 TPU에서 TFHub 모델을 실행할 때만 권장됩니다.

이 설정이 없으면 TFHub는 압축 파일을 다운로드하고 로컬에서 체크포인트를 추출합니다. 이러한 로컬 파일에서 로드를 시도하면 다음 오류와 함께 실패합니다.

InvalidArgumentError: Unimplemented: File system scheme '[local]' not implemented

때문이다 TPU는 클라우드 스토리지 버킷에서 직접 읽을 수 있습니다 .

os.environ["TFHUB_MODEL_LOAD_FORMAT"]="UNCOMPRESSED"

TPU 작업자에 연결

다음 코드는 TPU 작업자에 연결하고 TensorFlow의 기본 장치를 TPU 작업자의 CPU 장치로 변경합니다. 또한 이 TPU 작업자 한 명에서 사용할 수 있는 8개의 개별 TPU 코어에 모델 교육을 배포하는 데 사용할 TPU 배포 전략을 정의합니다. TensorFlow의 참조 TPU 가이드 자세한 내용은.

import os

if os.environ['COLAB_TPU_ADDR']:
  cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
  tf.config.experimental_connect_to_cluster(cluster_resolver)
  tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
  strategy = tf.distribute.TPUStrategy(cluster_resolver)
  print('Using TPU')
elif tf.config.list_physical_devices('GPU'):
  strategy = tf.distribute.MirroredStrategy()
  print('Using GPU')
else:
  raise ValueError('Running on CPU is not recommended.')

Using TPU

TensorFlow Hub에서 모델 로드

여기에서 TensorFlow Hub에서 로드할 BERT 모델을 선택하고 미세 조정할 수 있습니다. 선택할 수 있는 여러 BERT 모델이 있습니다.

BERT-자료 , Uncased 및 7 개 개의 모델 원래 BERT 저자가 발표 한 훈련 무게.
작은 BERTs은 당신이 속도, 크기와 품질 사이의 트레이드 오프를 탐색 할 수 있습니다 동일한 일반적인 아키텍처하지만 적은 및 / 또는 작은 변압기 블록을 가지고있다.
ALBERT : 층간 파라미터를 공유 모델 크기 (아니라 계산 시간)을 감소 "A 라이트 BERT"의 네 개의 다른 크기.
BERT 전문가 : 팔 개 모델 모두는 BERT 기반 아키텍처를 가지고 있지만 대상 작업과 더 밀접하게 정렬하기 위해, 다른 사전 교육 도메인 사이에서 선택을 제공합니다.
일렉트라는 (세 가지 다른 크기) BERT와 같은 구조를 가지고 있지만, 셋업에서 판별으로 사전 교육을받은 가도록 유사한 제너 적대적 네트워크 (GAN).
- 토킹 헤즈 주목하고 정문 겔루 [BERT와 베이스 , 대형는 상기 변압기의 코어 아키텍처 두 개선을 갖는다.

자세한 내용은 위에 링크된 모델 문서를 참조하십시오.

이 자습서에서는 BERT 기반으로 시작합니다. 더 높은 정확도를 위해 더 크고 최신 모델을 사용하거나 더 빠른 훈련 시간을 위해 더 작은 모델을 사용할 수 있습니다. 모델을 변경하려면 한 줄의 코드만 전환하면 됩니다(아래 참조). 모든 차이점은 TensorFlow Hub에서 다운로드할 SavedModel에 캡슐화됩니다.

미세 조정할 BERT 모델 선택

bert_model_name = 'bert_en_uncased_L-12_H-768_A-12' 

map_name_to_handle = {
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3',
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-768_A-12/1',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_base/2',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_large/2',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_xlarge/2',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_xxlarge/2',
    'electra_small':
        'https://tfhub.dev/google/electra_small/2',
    'electra_base':
        'https://tfhub.dev/google/electra_base/2',
    'experts_pubmed':
        'https://tfhub.dev/google/experts/bert/pubmed/2',
    'experts_wiki_books':
        'https://tfhub.dev/google/experts/bert/wiki_books/2',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_large/1',
}

map_model_to_preprocess = {
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_preprocess/3',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'electra_small':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'electra_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_pubmed':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_wiki_books':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
}

tfhub_handle_encoder = map_name_to_handle[bert_model_name]
tfhub_handle_preprocess = map_model_to_preprocess[bert_model_name]

print('BERT model selected           :', tfhub_handle_encoder)
print('Preprocessing model auto-selected:', tfhub_handle_preprocess)

BERT model selected           : https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3
Preprocessing model auto-selected: https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3

텍스트 전처리

온 BERT로 분류 텍스트 colab 전처리 모델이 직접 BERT 인코더 내장 사용됩니다.

이 자습서에서는 Dataset.map을 사용하여 훈련을 위한 입력 파이프라인의 일부로 사전 처리를 수행한 다음 추론을 위해 내보내는 모델에 병합하는 방법을 보여줍니다. 그렇게 하면 TPU 자체에 숫자 입력이 필요하지만 훈련과 추론 모두 원시 텍스트 입력에서 작동할 수 있습니다.

TPU 요구 사항은 제외하고, 그것은 성능 (당신이 더에서 배울 수있는 입력 파이프 라인에서 비동기 적으로 수행 전처리가 도움이 될 수 있습니다 tf.data 성능 가이드 ).

이 튜토리얼은 또한 다중 입력 모델을 구축하는 방법과 BERT에 대한 입력의 시퀀스 길이를 조정하는 방법을 보여줍니다.

전처리 모델을 보여드리겠습니다.

bert_preprocess = hub.load(tfhub_handle_preprocess)
tok = bert_preprocess.tokenize(tf.constant(['Hello TensorFlow!']))
print(tok)

<tf.RaggedTensor [[[7592], [23435, 12314], [999]]]>

각 모델은 또한 전처리 방법을 제공 .bert_pack_inputs(tensors, seq_length) , (같은 토큰들의리스트를 얻어 tok 위)와 시퀀스 길이 인수. 이것은 입력을 압축하여 BERT 모델에서 예상하는 형식으로 텐서 사전을 생성합니다.

text_preprocessed = bert_preprocess.bert_pack_inputs([tok, tok], tf.constant(20))

print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Shape Word Ids :  (1, 20)
Word Ids       :  tf.Tensor(
[  101  7592 23435 12314   999   102  7592 23435 12314   999   102     0
     0     0     0     0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 20)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 20)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)

주의해야 할 몇 가지 세부 정보는 다음과 같습니다.

input_mask 마스크는 내용과 패딩 사이에 명확하게 구분하기 위해 모델을 할 수 있습니다. 마스크는 동일한 형상 가진다 input_word_ids 하고 1 어딘가에 포함 input_word_ids 패딩되지 않습니다.
input_type_ids 동일한 형상 가진다 input_mask 하지만 비 패딩 영역 안에, 0 또는 1은 토큰의 일부인 문장 나타내는 포함한다.

다음으로 이 모든 논리를 캡슐화하는 전처리 모델을 만듭니다. 모델은 문자열을 입력으로 사용하고 BERT에 전달할 수 있는 적절한 형식의 개체를 반환합니다.

각 BERT 모델에는 특정 전처리 모델이 있으므로 BERT의 모델 문서에 설명된 적절한 모델을 사용해야 합니다.

def make_bert_preprocess_model(sentence_features, seq_length=128):
  """Returns Model mapping string features to BERT inputs.

  Args:
    sentence_features: a list with the names of string-valued features.
    seq_length: an integer that defines the sequence length of BERT inputs.

  Returns:
    A Keras Model that can be called on a list or dict of string Tensors
    (with the order or names, resp., given by sentence_features) and
    returns a dict of tensors for input to BERT.
  """

  input_segments = [
      tf.keras.layers.Input(shape=(), dtype=tf.string, name=ft)
      for ft in sentence_features]

  # Tokenize the text to word pieces.
  bert_preprocess = hub.load(tfhub_handle_preprocess)
  tokenizer = hub.KerasLayer(bert_preprocess.tokenize, name='tokenizer')
  segments = [tokenizer(s) for s in input_segments]

  # Optional: Trim segments in a smart way to fit seq_length.
  # Simple cases (like this example) can skip this step and let
  # the next step apply a default truncation to approximately equal lengths.
  truncated_segments = segments

  # Pack inputs. The details (start/end token ids, dict of output tensors)
  # are model-dependent, so this gets loaded from the SavedModel.
  packer = hub.KerasLayer(bert_preprocess.bert_pack_inputs,
                          arguments=dict(seq_length=seq_length),
                          name='packer')
  model_inputs = packer(truncated_segments)
  return tf.keras.Model(input_segments, model_inputs)

전처리 모델을 보여드리겠습니다. 두 개의 문장 입력(input1 및 input2)으로 테스트를 생성합니다. : 출력은 BERT 모델은 입력으로 기대하는 것입니다 input_word_ids , input_masks 및 input_type_ids .

test_preprocess_model = make_bert_preprocess_model(['my_input1', 'my_input2'])
test_text = [np.array(['some random test sentence']),
             np.array(['another sentence'])]
text_preprocessed = test_preprocess_model(test_text)

print('Keys           : ', list(text_preprocessed.keys()))
print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Keys           :  ['input_word_ids', 'input_mask', 'input_type_ids']
Shape Word Ids :  (1, 128)
Word Ids       :  tf.Tensor(
[ 101 2070 6721 3231 6251  102 2178 6251  102    0    0    0    0    0
    0    0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 128)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 128)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)

방금 정의한 두 입력에 주의하면서 모델의 구조를 살펴보겠습니다.

tf.keras.utils.plot_model(test_preprocess_model, show_shapes=True, show_dtype=True)

('You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) ', 'for plot_model/model_to_dot to work.')

데이터 세트에서 모든 입력의 전처리를 적용하려면 사용하는 map 데이터 세트에서 기능을. 결과는 다음에 대한 캐시 성능 .

AUTOTUNE = tf.data.AUTOTUNE


def load_dataset_from_tfds(in_memory_ds, info, split, batch_size,
                           bert_preprocess_model):
  is_training = split.startswith('train')
  dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[split])
  num_examples = info.splits[split].num_examples

  if is_training:
    dataset = dataset.shuffle(num_examples)
    dataset = dataset.repeat()
  dataset = dataset.batch(batch_size)
  dataset = dataset.map(lambda ex: (bert_preprocess_model(ex), ex['label']))
  dataset = dataset.cache().prefetch(buffer_size=AUTOTUNE)
  return dataset, num_examples

모델 정의

이제 BERT 인코더를 통해 전처리된 입력을 제공하고 선형 분류기를 맨 위에 놓고(또는 원하는 다른 레이어 배열), 정규화를 위해 드롭아웃을 사용하여 문장 또는 문장 쌍 분류를 위한 모델을 정의할 준비가 되었습니다.

def build_classifier_model(num_classes):

  class Classifier(tf.keras.Model):
    def __init__(self, num_classes):
      super(Classifier, self).__init__(name="prediction")
      self.encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True)
      self.dropout = tf.keras.layers.Dropout(0.1)
      self.dense = tf.keras.layers.Dense(num_classes)

    def call(self, preprocessed_text):
      encoder_outputs = self.encoder(preprocessed_text)
      pooled_output = encoder_outputs["pooled_output"]
      x = self.dropout(pooled_output)
      x = self.dense(x)
      return x

  model = Classifier(num_classes)
  return model

일부 사전 처리된 입력에서 모델을 실행해 보겠습니다.

test_classifier_model = build_classifier_model(2)
bert_raw_result = test_classifier_model(text_preprocessed)
print(tf.sigmoid(bert_raw_result))

tf.Tensor([[0.29329836 0.44367802]], shape=(1, 2), dtype=float32)

GLUE에서 작업 선택

당신은에서 TensorFlow 데이터 집합을 사용하려고 GLUE의 벤치 마크 스위트.

Colab을 사용하면 이러한 작은 데이터세트를 로컬 파일 시스템에 다운로드할 수 있으며, 별도의 TPU 작업자 호스트가 colab 런타임의 로컬 파일 시스템에 액세스할 수 없기 때문에 아래 코드는 이를 완전히 메모리로 읽습니다.

더 큰 데이터 세트의 경우, 당신은 당신의 자신 만들어야합니다 Google 클라우드 스토리지 버킷과 TPU 노동자는 거기에서 데이터를 읽을 수 있습니다. 당신은 더 배울 수있는 TPU 가이드 .

CoLa 데이터 세트(단일 문장의 경우) 또는 MRPC(다중 문장의 경우)는 작고 미세 조정하는 데 오래 걸리지 않기 때문에 시작하는 것이 좋습니다.

tfds_name = 'glue/cola' 

tfds_info = tfds.builder(tfds_name).info

sentence_features = list(tfds_info.features.keys())
sentence_features.remove('idx')
sentence_features.remove('label')

available_splits = list(tfds_info.splits.keys())
train_split = 'train'
validation_split = 'validation'
test_split = 'test'
if tfds_name == 'glue/mnli':
  validation_split = 'validation_matched'
  test_split = 'test_matched'

num_classes = tfds_info.features['label'].num_classes
num_examples = tfds_info.splits.total_num_examples

print(f'Using {tfds_name} from TFDS')
print(f'This dataset has {num_examples} examples')
print(f'Number of classes: {num_classes}')
print(f'Features {sentence_features}')
print(f'Splits {available_splits}')

with tf.device('/job:localhost'):
  # batch_size=-1 is a way to load the dataset into memory
  in_memory_ds = tfds.load(tfds_name, batch_size=-1, shuffle_files=True)

# The code below is just to show some samples from the selected dataset
print(f'Here are some sample rows from {tfds_name} dataset')
sample_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[train_split])

labels_names = tfds_info.features['label'].names
print(labels_names)
print()

sample_i = 1
for sample_row in sample_dataset.take(5):
  samples = [sample_row[feature] for feature in sentence_features]
  print(f'sample row {sample_i}')
  for sample in samples:
    print(sample.numpy())
  sample_label = sample_row['label']

  print(f'label: {sample_label} ({labels_names[sample_label]})')
  print()
  sample_i += 1

Using glue/cola from TFDS
This dataset has 10657 examples
Number of classes: 2
Features ['sentence']
Splits ['train', 'validation', 'test']
Here are some sample rows from glue/cola dataset
['unacceptable', 'acceptable']

sample row 1
b'It is this hat that it is certain that he was wearing.'
label: 1 (acceptable)

sample row 2
b'Her efficient looking up of the answer pleased the boss.'
label: 1 (acceptable)

sample row 3
b'Both the workers will wear carnations.'
label: 1 (acceptable)

sample row 4
b'John enjoyed drawing trees for his syntax homework.'
label: 1 (acceptable)

sample row 5
b'We consider Leslie rather foolish, and Lou a complete idiot.'
label: 1 (acceptable)

데이터 세트는 또한 문제 유형(분류 또는 회귀)과 훈련을 위한 적절한 손실 함수를 결정합니다.

def get_configuration(glue_task):

  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  if glue_task == 'glue/cola':
    metrics = tfa.metrics.MatthewsCorrelationCoefficient(num_classes=2)
  else:
    metrics = tf.keras.metrics.SparseCategoricalAccuracy(
        'accuracy', dtype=tf.float32)

  return metrics, loss

모델 학습

마지막으로 선택한 데이터 세트에 대해 모델을 종단 간 학습할 수 있습니다.

분포

여러 TPU 기기가 있는 TPU 작업자에 colab 런타임을 연결한 상단의 설정 코드를 상기하세요. 교육을 배포하기 위해 TPU 배포 전략 범위 내에서 기본 Keras 모델을 만들고 컴파일합니다. (자세한 내용은 참조하십시오 Keras와 교육을 분산 .)

반면에 전처리는 TPU가 아닌 작업자 호스트의 CPU에서 실행되므로 전처리를 위한 Keras 모델과 이에 매핑된 교육 및 검증 데이터 세트는 배포 전략 범위 외부에 구축됩니다. 를 호출 Model.fit() 분산 처리됩니다 전달 된 모델 복제본 데이터 세트.

옵티마이저

미세 조정은 최적화 셋업 BERT에서 사전 교육 (에서와 같이 다음과 BERT으로 분류 텍스트 그것은 개념적인 초기 학습 속도의 선형 부패와 AdamW 최적화를 사용하여 첫 번째 통해 선형 워밍업 단계로 시작을 :) 훈련 단계 (10 % num_warmup_steps ). BERT 논문에 따르면 초기 학습률은 미세 조정을 위해 더 작습니다(5e-5, 3e-5, 2e-5 중 최고).

epochs = 3
batch_size = 32
init_lr = 2e-5

print(f'Fine tuning {tfhub_handle_encoder} model')
bert_preprocess_model = make_bert_preprocess_model(sentence_features)

with strategy.scope():

  # metric have to be created inside the strategy scope
  metrics, loss = get_configuration(tfds_name)

  train_dataset, train_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, train_split, batch_size, bert_preprocess_model)
  steps_per_epoch = train_data_size // batch_size
  num_train_steps = steps_per_epoch * epochs
  num_warmup_steps = num_train_steps // 10

  validation_dataset, validation_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, validation_split, batch_size,
      bert_preprocess_model)
  validation_steps = validation_data_size // batch_size

  classifier_model = build_classifier_model(num_classes)

  optimizer = optimization.create_optimizer(
      init_lr=init_lr,
      num_train_steps=num_train_steps,
      num_warmup_steps=num_warmup_steps,
      optimizer_type='adamw')

  classifier_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

  classifier_model.fit(
      x=train_dataset,
      validation_data=validation_dataset,
      steps_per_epoch=steps_per_epoch,
      epochs=epochs,
      validation_steps=validation_steps)

Fine tuning https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3 model
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/keras/engine/functional.py:585: UserWarning: Input dict contained keys ['idx', 'label'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])
Epoch 1/3
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:1", shape=(None,), dtype=int32), values=Tensor("clip_by_global_norm/clip_by_global_norm/_0:0", dtype=float32), dense_shape=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:2", shape=(None,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
267/267 [==============================] - 86s 81ms/step - loss: 0.6092 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.4846 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 2/3
267/267 [==============================] - 14s 53ms/step - loss: 0.3774 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.5322 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 3/3
267/267 [==============================] - 14s 53ms/step - loss: 0.2623 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.6469 - val_MatthewsCorrelationCoefficient: 0.0000e+00

추론을 위해 내보내기

방금 만든 전처리 부분과 미세 조정된 BERT가 있는 최종 모델을 만듭니다.

추론할 때 전처리는 모델의 일부가 되어야 합니다(더 이상 이를 수행하는 훈련 데이터에 대해 별도의 입력 대기열이 없기 때문에). 전처리는 단순한 계산이 아닙니다. 내보내기를 위해 저장된 Keras 모델에 첨부되어야 하는 자체 리소스(어휘 테이블)가 있습니다. 이 최종 어셈블리가 저장될 것입니다.

당신은 colab에 모델을 저장하려고하고 나중에 미래를 위해 그것을 유지하기 위해 다운로드 할 수 있습니다 (보기 -> 목차 -> 파일).

main_save_path = './my_models'
bert_type = tfhub_handle_encoder.split('/')[-2]
saved_model_name = f'{tfds_name.replace("/", "_")}_{bert_type}'

saved_model_path = os.path.join(main_save_path, saved_model_name)

preprocess_inputs = bert_preprocess_model.inputs
bert_encoder_inputs = bert_preprocess_model(preprocess_inputs)
bert_outputs = classifier_model(bert_encoder_inputs)
model_for_export = tf.keras.Model(preprocess_inputs, bert_outputs)

print('Saving', saved_model_path)

# Save everything on the Colab host (even the variables from TPU memory)
save_options = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
model_for_export.save(saved_model_path, include_optimizer=False,
                      options=save_options)

Saving ./my_models/glue_cola_bert_en_uncased_L-12_H-768_A-12
WARNING:absl:Found untraced functions such as restored_function_body, restored_function_body, restored_function_body, restored_function_body, restored_function_body while saving (showing 5 of 910). These functions will not be directly callable after loading.

모델 테스트

마지막 단계는 내보낸 모델의 결과를 테스트하는 것입니다.

비교를 위해 모델을 다시 로드하고 데이터 세트에서 분할된 테스트의 일부 입력을 사용하여 테스트해 보겠습니다.

with tf.device('/job:localhost'):
  reloaded_model = tf.saved_model.load(saved_model_path)

유틸리티 방법

def prepare(record):
  model_inputs = [[record[ft]] for ft in sentence_features]
  return model_inputs


def prepare_serving(record):
  model_inputs = {ft: record[ft] for ft in sentence_features}
  return model_inputs


def print_bert_results(test, bert_result, dataset_name):

  bert_result_class = tf.argmax(bert_result, axis=1)[0]

  if dataset_name == 'glue/cola':
    print('sentence:', test[0].numpy())
    if bert_result_class == 1:
      print('This sentence is acceptable')
    else:
      print('This sentence is unacceptable')

  elif dataset_name == 'glue/sst2':
    print('sentence:', test[0])
    if bert_result_class == 1:
      print('This sentence has POSITIVE sentiment')
    else:
      print('This sentence has NEGATIVE sentiment')

  elif dataset_name == 'glue/mrpc':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Are a paraphrase')
    else:
      print('Are NOT a paraphrase')

  elif dataset_name == 'glue/qqp':
    print('question1:', test[0])
    print('question2:', test[1])
    if bert_result_class == 1:
      print('Questions are similar')
    else:
      print('Questions are NOT similar')

  elif dataset_name == 'glue/mnli':
    print('premise   :', test[0])
    print('hypothesis:', test[1])
    if bert_result_class == 1:
      print('This premise is NEUTRAL to the hypothesis')
    elif bert_result_class == 2:
      print('This premise CONTRADICTS the hypothesis')
    else:
      print('This premise ENTAILS the hypothesis')

  elif dataset_name == 'glue/qnli':
    print('question:', test[0])
    print('sentence:', test[1])
    if bert_result_class == 1:
      print('The question is NOT answerable by the sentence')
    else:
      print('The question is answerable by the sentence')

  elif dataset_name == 'glue/rte':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Sentence1 DOES NOT entails sentence2')
    else:
      print('Sentence1 entails sentence2')

  elif dataset_name == 'glue/wnli':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Sentence1 DOES NOT entails sentence2')
    else:
      print('Sentence1 entails sentence2')

  print('BERT raw results:', bert_result[0])
  print()

테스트

with tf.device('/job:localhost'):
  test_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[test_split])
  for test_row in test_dataset.shuffle(1000).map(prepare).take(5):
    if len(sentence_features) == 1:
      result = reloaded_model(test_row[0])
    else:
      result = reloaded_model(list(test_row))

    print_bert_results(test_row, result, tfds_name)

sentence: [b'An old woman languished in the forest.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.7032353  3.3714833], shape=(2,), dtype=float32)

sentence: [b"I went to the movies and didn't pick up the shirts."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.73970896  1.0806316 ], shape=(2,), dtype=float32)

sentence: [b"Every essay that she's written and which I've read is on that pile."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.7034159  0.6236454], shape=(2,), dtype=float32)

sentence: [b'Either Bill ate the peaches, or Harry.']
This sentence is unacceptable
BERT raw results: tf.Tensor([ 0.05972151 -0.08620442], shape=(2,), dtype=float32)

sentence: [b'I ran into the baker from whom I bought these bagels.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6862067  3.285925 ], shape=(2,), dtype=float32)

당신이 모델에 사용하려는 경우 TF 서빙을 , 그것의 이름을 서명 중 하나를 통해 SavedModel를 호출합니다 기억 해요. 입력에 약간의 차이가 있습니다. Python에서는 다음과 같이 테스트할 수 있습니다.

with tf.device('/job:localhost'):
  serving_model = reloaded_model.signatures['serving_default']
  for test_row in test_dataset.shuffle(1000).map(prepare_serving).take(5):
    result = serving_model(**test_row)
    # The 'prediction' key is the classifier's defined model name.
    print_bert_results(list(test_row.values()), result['prediction'], tfds_name)

sentence: b'Everyone attended more than two seminars.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.5594155  2.862155 ], shape=(2,), dtype=float32)

sentence: b'Most columnists claim that a senior White House official has been briefing them.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6298996  3.3155093], shape=(2,), dtype=float32)

sentence: b"That my father, he's lived here all his life is well known to those cops."
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2048947  1.8589772], shape=(2,), dtype=float32)

sentence: b'Ourselves like us.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2723312  2.0494034], shape=(2,), dtype=float32)

sentence: b'John is clever.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6516167  3.3147635], shape=(2,), dtype=float32)

훌륭해! 저장된 모델은 코드가 적고 유지 관리가 더 쉬운 더 간단한 API를 사용하여 프로세스에서 제공하거나 간단한 추론에 사용할 수 있습니다.

다음 단계

이제 기본 BERT 모델 중 하나를 시도했으므로 정확도를 높이거나 더 작은 모델 버전을 사용하여 다른 모델을 시도할 수 있습니다.

다른 데이터세트에서도 시도할 수 있습니다.