텍스트 생성을 위한 Federated Learning

참고: 이 Colab은 tensorflow_federated pip 패키지의 최신 릴리즈 버전에서 동작하는 것으로 확인되었지만, Tensorflow Federated 프로젝트는 아직 릴리즈 전 개발 중이며 master에서 동작하지 않을 수 있습니다.

이 튜토리얼은 이미지 분류를 위한 Federated Learning 튜토리얼의 개념을 기반으로 하며, 페더레이션 학습을 위한 몇 가지 유용한 접근 방식을 보여줍니다.

특히, 이전에 훈련된 Keras 모델을 로드하고 분산된 (시뮬레이션) 데이터세트에 대한 페더레이션 훈련을 사용하여 구체화합니다. 이 방법은 여러 가지 이유로 실질적으로 중요합니다. 직렬화된 모델을 사용하는 기능을 통해 페더레이션 학습을 다른 ML 접근 방식과 쉽게 혼합할 수 있습니다. 또한, 이를 통해 사전 훈련된 모델의 범위가 증가할 수 있습니다. 예를 들어, 사전 훈련된 수많은 모델이 현재 널리 사용 가능하기 때문에 처음부터 언어 모델을 훈련하는 것은 거의 필요하지 않습니다(예: TF Hub 참조). 대신, 사전 훈련된 모델에서 시작하여 특정 애플리케이션에 대한 분산 데이터의 특정 특성에 적응하여 Federated Learning을 사용하여 구체화하는 것이 더 합리적입니다.

이 튜토리얼에서는 ASCII 문자를 생성하는 RNN으로 시작하고 페더레이션 학습을 통해 구체화합니다. 또한, 최종 가중치를 원래 Keras 모델로 피드백하여 표준 도구를 사용하여 쉽게 평가하고 텍스트를 생성할 수 있는 방법을 보여줍니다.

!pip install --quiet --upgrade tensorflow_federated_nightly
!pip install --quiet --upgrade nest_asyncio

import nest_asyncio
nest_asyncio.apply()

import collections
import functools
import os
import time

import numpy as np
import tensorflow as tf
import tensorflow_federated as tff

np.random.seed(0)

# Test the TFF is working:
tff.federated_computation(lambda: 'Hello, World!')()

b'Hello, World!'

사전 훈련된 모델 로드하기

TensorFlow 튜토리얼 즉시 실행되는 RNN을 사용한 텍스트 생성에 따라 사전 훈련된 모델을 로드합니다. 하지만, 셰익스피어의 전체 작품에 대한 훈련 대신 Charles Dickens의 A Tale of Two Cities 및 A Christmas Carol의 텍스트에 대해 모델을 사전 훈련했습니다.

어휘 확장 이외에는 원래 튜토리얼을 수정하지 않았기 때문에 이 초기 모델은 최첨단이 아니지만, 합리적인 예측값을 생성하며, 튜토리얼 목적에는 충분합니다. 최종 모델은 tf.keras.models.save_model(include_optimizer=False)로 저장되었습니다.

이 튜토리얼에서는 TFF에서 제공하는 데이터의 페더레이션 버전을 사용하여 셰익스피어에 대한 이 모델을 미세 조정하는 데 페더레이션 학습을 사용할 것입니다.

어휘 조회 테이블 생성하기

# A fixed vocabularly of ASCII chars that occur in the works of Shakespeare and Dickens:
vocab = list('dhlptx@DHLPTX $(,048cgkoswCGKOSW[_#\'/37;?bfjnrvzBFJNRVZ"&amp;*.26:\naeimquyAEIMQUY]!%)-159\r')

# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

사전 훈련된 모델을 로드하고 일부 텍스트 생성하기

def load_model(batch_size):
  urls = {
      1: 'https://storage.googleapis.com/tff-models-public/dickens_rnn.batch1.kerasmodel',
      8: 'https://storage.googleapis.com/tff-models-public/dickens_rnn.batch8.kerasmodel'}
  assert batch_size in urls, 'batch_size must be in ' + str(urls.keys())
  url = urls[batch_size]
  local_file = tf.keras.utils.get_file(os.path.basename(url), origin=url)  
  return tf.keras.models.load_model(local_file, compile=False)

def generate_text(model, start_string):
  # From https://www.tensorflow.org/tutorials/sequences/text_generation
  num_generate = 200
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)
  text_generated = []
  temperature = 1.0

  model.reset_states()
  for i in range(num_generate):
    predictions = model(input_eval)
    predictions = tf.squeeze(predictions, 0)
    predictions = predictions / temperature
    predicted_id = tf.random.categorical(
        predictions, num_samples=1)[-1, 0].numpy()
    input_eval = tf.expand_dims([predicted_id], 0)
    text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

# Text generation requires a batch_size=1 model.
keras_model_batch1 = load_model(batch_size=1)
print(generate_text(keras_model_batch1, 'What of TensorFlow Federated, you ask? '))

Downloading data from https://storage.googleapis.com/tff-models-public/dickens_rnn.batch1.kerasmodel
16195584/16193984 [==============================] - 0s 0us/step
16203776/16193984 [==============================] - 0s 0us/step
What of TensorFlow Federated, you ask? Sall
yesterday. Received the Bailey."

"Mr. Lorry, grimmering himself, or low varked thends the winter, and the eyes of Monsieur
Defarge. "Let his mind, hon in his
life and message; four declare

페더레이션 셰익스피어 데이터 로드 및 전처리

tff.simulation.datasets 패키지는 "clients"로 분할된 다양한 데이터세트를 제공합니다. 여기서 각 클라이언트는 페더레이션 학습에 참여할 수 있는 특정 기기의 데이터세트에 해당합니다.

이들 데이터세트는 실제 분산된 데이터에 대한 훈련 문제를 시뮬레이션에서 복제하는 현실적인 비 IID 데이터 분산을 제공합니다. 이 데이터의 일부 전처리는 Leaf 프로젝트(github)의 도구를 사용하여 수행되었습니다.

train_data, test_data = tff.simulation.datasets.shakespeare.load_data()

shakespeare.load_data()에서 제공하는 데이터세트는 셰익스피어 연극의 특정 캐릭터가 말한 각 대사에 하나씩, 문자열 Tensors의 시퀀스로 구성됩니다. 클라이언트 키는 캐릭터의 이름과 결합된 연극의 이름으로 구성됩니다. 예를 들어, MUCH_ADO_ABOUT_NOTHING_OTHELLO는 연극 Much Ado About Nothing의 오델로 캐릭터 대사에 해당합니다. 실제 페더레이션 학습 시나리오에서 클라이언트는 ID로 식별되거나 추적되지 않지만, 시뮬레이션의 경우 키가 지정된 데이터세트로 작업하는 것이 유용합니다.

예를 들어, 여기에서 King Lear의 일부 데이터를 볼 수 있습니다.

# Here the play is "The Tragedy of King Lear" and the character is "King".
raw_example_dataset = train_data.create_tf_dataset_for_client(
    'THE_TRAGEDY_OF_KING_LEAR_KING')
# To allow for future extensions, each entry x
# is an OrderedDict with a single key 'snippets' which contains the text.
for x in raw_example_dataset.take(2):
  print(x['snippets'])

tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'What?', shape=(), dtype=string)

이제 tf.data.Dataset 변환을 사용하여 위에 로드된 문자 RNN을 훈련하기 위한 이 데이터를 준비합니다.

# Input pre-processing parameters
SEQ_LENGTH = 100
BATCH_SIZE = 8
BUFFER_SIZE = 100  # For dataset shuffling

# Construct a lookup table to map string chars to indexes,
# using the vocab loaded above:
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=vocab, values=tf.constant(list(range(len(vocab))),
                                       dtype=tf.int64)),
    default_value=0)


def to_ids(x):
  s = tf.reshape(x['snippets'], shape=[1])
  chars = tf.strings.bytes_split(s).values
  ids = table.lookup(chars)
  return ids


def split_input_target(chunk):
  input_text = tf.map_fn(lambda x: x[:-1], chunk)
  target_text = tf.map_fn(lambda x: x[1:], chunk)
  return (input_text, target_text)


def preprocess(dataset):
  return (
      # Map ASCII chars to int64 indexes using the vocab
      dataset.map(to_ids)
      # Split into individual chars
      .unbatch()
      # Form example sequences of SEQ_LENGTH +1
      .batch(SEQ_LENGTH + 1, drop_remainder=True)
      # Shuffle and form minibatches
      .shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
      # And finally split into (input, target) tuples,
      # each of length SEQ_LENGTH.
      .map(split_input_target))

원래 시퀀스의 형성과 위의 배치 형성에서는 단순성을 위해 drop_remainder=True를 사용합니다. 즉, 최소한 (SEQ_LENGTH + 1) * BATCH_SIZE 문자가 없는 모든 문자(클라이언트)는 빈 데이터세트를 갖게 됩니다. 이를 해결하기 위한 일반적인 접근 방식은 배치를 특수 토큰으로 채운 다음 해당 토큰을 고려하지 않도록 손실을 마스크하는 것입니다.

이로 인해 예제가 다소 복잡해지므로 이 튜토리얼에서는 표준 튜토리얼에서와 같이 전체 배치만 사용합니다. 그러나 페더레이션 설정에서는 많은 사용자가 작은 데이터세트를 가질 수 있으므로 이 문제가 더 중요합니다.

이제 raw_example_dataset를 전처리하고 유형을 확인할 수 있습니다.

example_dataset = preprocess(raw_example_dataset)
print(example_dataset.element_spec)

(TensorSpec(shape=(8, 100), dtype=tf.int64, name=None), TensorSpec(shape=(8, 100), dtype=tf.int64, name=None))

모델 컴파일 및 전처리된 데이터로 테스트하기

컴파일되지 않은 keras 모델을 로드했지만, keras_model.evaluate를 실행하려면 손실 및 메트릭을 사용하여 컴파일해야 합니다. 또한, 페더레이션 학습에서 기기 내 옵티마이저로 사용될 옵티마이저에서 컴파일할 것입니다.

원래 튜토리얼에는 문자 수준의 정확성은 없었습니다(올바른 다음 문자에 가장 높은 확률을 부여한 예측 비율). 이 정확성은 유용한 메트릭이므로 추가합니다. 그러나, 예측값은 순위 3(각 BATCH_SIZE * SEQ_LENGTH 예측값에 대한 로짓 벡터)이고 SparseCategoricalAccuracy는 순위 2 예측값만 기대하므로 정확성에 대한 새 메트릭 클래스를 정의해야 합니다.

class FlattenedCategoricalAccuracy(tf.keras.metrics.SparseCategoricalAccuracy):

  def __init__(self, name='accuracy', dtype=tf.float32):
    super().__init__(name, dtype=dtype)

  def update_state(self, y_true, y_pred, sample_weight=None):
    y_true = tf.reshape(y_true, [-1, 1])
    y_pred = tf.reshape(y_pred, [-1, len(vocab), 1])
    return super().update_state(y_true, y_pred, sample_weight)

이제 모델을 컴파일하고 example_dataset에서 평가할 수 있습니다.

BATCH_SIZE = 8  # The training and eval batch size for the rest of this tutorial.
keras_model = load_model(batch_size=BATCH_SIZE)
keras_model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[FlattenedCategoricalAccuracy()])

# Confirm that loss is much lower on Shakespeare than on random data
loss, accuracy = keras_model.evaluate(example_dataset.take(5), verbose=0)
print(
    'Evaluating on an example Shakespeare character: {a:3f}'.format(a=accuracy))

# As a sanity check, we can construct some completely random data, where we expect
# the accuracy to be essentially random:
random_guessed_accuracy = 1.0 / len(vocab)
print('Expected accuracy for random guessing: {a:.3f}'.format(
    a=random_guessed_accuracy))
random_indexes = np.random.randint(
    low=0, high=len(vocab), size=1 * BATCH_SIZE * (SEQ_LENGTH + 1))
data = collections.OrderedDict(
    snippets=tf.constant(
        ''.join(np.array(vocab)[random_indexes]), shape=[1, 1]))
random_dataset = preprocess(tf.data.Dataset.from_tensor_slices(data))
loss, accuracy = keras_model.evaluate(random_dataset, steps=10, verbose=0)
print('Evaluating on completely random data: {a:.3f}'.format(a=accuracy))

Downloading data from https://storage.googleapis.com/tff-models-public/dickens_rnn.batch8.kerasmodel
16195584/16193984 [==============================] - 0s 0us/step
16203776/16193984 [==============================] - 0s 0us/step
Evaluating on an example Shakespeare character: 0.402000
Expected accuracy for random guessing: 0.012
Evaluating on completely random data: 0.011

페더레이션 학습으로 모델 미세 조정하기

TFF는 모든 TensorFlow 계산을 직렬화하여 잠재적으로 Python이 아닌 환경에서 실행할 수 있습니다(현재로서는 Python으로 구현된 시뮬레이션 런타임만 사용할 수 있음). 즉시 모드(TF 2.0)에서 실행 중이지만, 현재 TFF는 "with tf.Graph.as_default()" 문의 컨텍스트 내에서 필요한 ops를 구성하여 TensorFlow 계산을 직렬화합니다. 따라서 TFF가 제어하는 그래프에 모델을 도입하는 데 사용할 수 있는 함수를 제공해야 합니다. 다음과 같이 수행합니다.

# Clone the keras_model inside `create_tff_model()`, which TFF will
# call to produce a new copy of the model inside the graph that it will 
# serialize. Note: we want to construct all the necessary objects we'll need 
# _inside_ this method.
def create_tff_model():
  # TFF uses an `input_spec` so it knows the types and shapes
  # that your model expects.
  input_spec = example_dataset.element_spec
  keras_model_clone = tf.keras.models.clone_model(keras_model)
  return tff.learning.from_keras_model(
      keras_model_clone,
      input_spec=input_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[FlattenedCategoricalAccuracy()])

이제 모델을 개선하는 데 사용할 Federated Averaging 반복 프로세스를 구성할 준비가 되었습니다(Federated Averaging 알고리즘에 대한 자세한 내용은 분산 데이터에서 딥 네트워크의 Communication-Efficient Learning 논문 참조).

컴파일된 Keras 모델을 사용하여 페더레이션 훈련의 각 라운드 후에 표준 (비 페더레이션) 평가를 수행합니다. 이는 시뮬레이션된 페더레이션 학습을 수행할 때 연구 목적으로 유용하며, 표준 테스트데이터 세트가 있습니다.

현실적인 운영 환경에서 같은 기술을 사용하여 페러레이션 학습으로 훈련 된 모델을 테스트 또는 품질 보증 목적으로 중앙 집중식 벤치마크 데이터세트에서 평가할 수 있습니다.

# This command builds all the TensorFlow graphs and serializes them: 
fed_avg = tff.learning.build_federated_averaging_process(
    model_fn=create_tff_model,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(lr=0.5))

다음은 단일 배치의 단일 클라이언트에서 한 라운드에 대해 페더레이션 평균화를 실행하는 가장 간단한 루프입니다.

state = fed_avg.initialize()
state, metrics = fed_avg.next(state, [example_dataset.take(5)])
print('loss={l:.3f}, accuracy={a:.3f}'.format(
    l=metrics.train.loss, a=metrics.train.accuracy))

loss=4.403, accuracy=0.132

이제 약간 더 흥미로운 훈련 및 평가 루프를 작성해 보겠습니다.

이 시뮬레이션이 여전히 상대적으로 빠르게 실행되도록 각 라운드에 대해 2개의 미니 배치만을 고려하여 같은 3개의 클라이언트에 대해 훈련합니다.

def data(client, source=train_data):
  return preprocess(source.create_tf_dataset_for_client(client)).take(5)


clients = [
    'ALL_S_WELL_THAT_ENDS_WELL_CELIA', 'MUCH_ADO_ABOUT_NOTHING_OTHELLO',
]

train_datasets = [data(client) for client in clients]

# We concatenate the test datasets for evaluation with Keras by creating a 
# Dataset of Datasets, and then identity flat mapping across all the examples.
test_dataset = tf.data.Dataset.from_tensor_slices(
    [data(client, test_data) for client in clients]).flat_map(lambda x: x)

clone_model()은 가중치를 복제하지 않으므로 fed_avg.initialize()에 의해 생성된 모델의 초기 상태는 로드된 가중치가 아니라 Keras 모델의 임의 이니셜라이저를 기반으로 합니다. 사전 훈련된 모델에서 훈련을 시작하기 위해 로드된 모델에서 직접 서버 상태의 모델 가중치를 설정합니다.

NUM_ROUNDS = 5

# The state of the FL server, containing the model and optimization state.
state = fed_avg.initialize()

# Load our pre-trained Keras model weights into the global model state.
state = tff.learning.state_with_new_model_weights(
    state,
    trainable_weights=[v.numpy() for v in keras_model.trainable_weights],
    non_trainable_weights=[
        v.numpy() for v in keras_model.non_trainable_weights
    ])


def keras_evaluate(state, round_num):
  # Take our global model weights and push them back into a Keras model to
  # use its standard `.evaluate()` method.
  keras_model = load_model(batch_size=BATCH_SIZE)
  keras_model.compile(
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[FlattenedCategoricalAccuracy()])
  state.model.assign_weights_to(keras_model)
  loss, accuracy = keras_model.evaluate(example_dataset, steps=2, verbose=0)
  print('\tEval: loss={l:.3f}, accuracy={a:.3f}'.format(l=loss, a=accuracy))


for round_num in range(NUM_ROUNDS):
  print('Round {r}'.format(r=round_num))
  keras_evaluate(state, round_num)
  state, metrics = fed_avg.next(state, train_datasets)
  train_metrics = metrics['train']
  print('\tTrain: loss={l:.3f}, accuracy={a:.3f}'.format(
      l=train_metrics['loss'], a=train_metrics['accuracy']))

print('Final evaluation')
keras_evaluate(state, NUM_ROUNDS + 1)

Round 0
    Eval: loss=3.324, accuracy=0.401
    Train: loss=4.360, accuracy=0.155
Round 1
    Eval: loss=4.361, accuracy=0.049
    Train: loss=4.235, accuracy=0.164
Round 2
    Eval: loss=4.219, accuracy=0.177
    Train: loss=4.081, accuracy=0.221
Round 3
    Eval: loss=4.080, accuracy=0.174
    Train: loss=3.940, accuracy=0.226
Round 4
    Eval: loss=3.991, accuracy=0.176
    Train: loss=3.840, accuracy=0.226
Final evaluation
    Eval: loss=3.909, accuracy=0.171

기본 변경으로 큰 차이를 만들 수 있는 충분한 훈련을 하지 않았지만, 더 많은 셰익스피어 데이터에 대해 더 오래 훈련하면 업데이트된 모델로 생성된 텍스트 스타일에서 차이를 볼 수 있습니다.

# Set our newly trained weights back in the originally created model.
keras_model_batch1.set_weights([v.numpy() for v in keras_model.weights])
# Text generation requires batch_size=1
print(generate_text(keras_model_batch1, 'What of TensorFlow Federated, you ask? '))

What of TensorFlow Federated, you ask? Shalways, I will call your
compet with any city brought their faces uncompany," besumed him. "When he
sticked Madame Defarge pushed the lamps.

"Have I often but no unison. She had probably come,

확장 제안

이 튜토리얼은 첫 단계에 불과합니다! 이 노트북을 확장하는 방법에 대한 몇 가지 아이디어는 다음과 같습니다.

무작위로 훈련할 클라이언트를 샘플링하는 보다 현실적인 훈련 루프를 작성합니다.
클라이언트 데이터세트에서 ".repeat(NUM_EPOCHS)"를 사용하여 여러 epoch의 로컬 학습을 시도합니다(예: McMahan et. al.). 이를 수행하는 이미지 분류를 위한 Federated Learning도 참조하세요.
compile() 명령을 변경하여 클라이언트에서 다른 최적화 알고리즘을 사용하여 실험합니다.
build_federated_averaging_process에 server_optimizer 인수를 사용하여 서버에 모델 업데이트를 적용하기 위한 다른 알고리즘을 시도합니다.
client_weight_fn 인수를 build_federated_averaging_process에 사용하여 클라이언트의 다른 가중치를 시도합니다. 기본 가중치는 클라이언트의 예제 수에 따라 업데이트되지만, 예를 들어 client_weight_fn=lambda _: tf.constant(1.0)를 수행할 수 있습니다.

텍스트 생성을 위한 Federated Learning 컬렉션을 사용해 정리하기 내 환경설정을 기준으로 콘텐츠를 저장하고 분류하세요.