Migrate evaluation

Evaluation is a critical part of measuring and benchmarking models.

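This guide demonstrates how to migrate evaluator tasks from TensorFlow 1 to TensorFlow 2. In TensorFlow 1, this functionality is implemented by tf.estimator.train_and_evaluate when the API runs in a distributed setting. In TensorFlow 2, you can use the built-in tf.keras.utils.SidecarEvaluator or a custom evaluation loop on the evaluator task.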

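Both TensorFlow 1 (tf.estimator.Estimator.evaluate) and TensorFlow 2 (Model.fit(..., validation_data=(...)) or Model.evaluate) offer simple serial evaluation options. A dedicated evaluator task is preferable when you don't want workers to switch between training and evaluation; the built-in evaluation in Model.fit is preferable when you want the evaluation itself to be distributed.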
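To make this trade-off concrete, here is a toy, framework-free sketch of serial evaluation, where a single worker pauses training to evaluate. The function name and the eval_every schedule are hypothetical and are not part of any TensorFlow API.

```python
# Toy sketch (not TensorFlow code) of *serial* evaluation: one worker
# alternates between training and evaluation, so training is paused
# while each evaluation runs. All names here are hypothetical.

def serial_train_and_evaluate(total_steps, eval_every):
    """Return the ordered list of events a single worker performs."""
    events = []
    for step in range(1, total_steps + 1):
        events.append(("train", step))
        if step % eval_every == 0:
            # The next training step cannot start until this evaluation ends.
            events.append(("evaluate", step))
    return events


timeline = serial_train_and_evaluate(total_steps=6, eval_every=3)
print(timeline)
```

With a dedicated evaluator task, the ("evaluate", step) events would instead run in a separate process, and the worker's timeline would contain only training steps.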

Setup

import tensorflow.compat.v1 as tf1
import tensorflow as tf
import numpy as np
import tempfile
import time
import os
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

TensorFlow 1: Evaluate using tf.estimator.train_and_evaluate

In TensorFlow 1, you can evaluate an Estimator during training by passing it to tf.estimator.train_and_evaluate.

In this example, start by defining a tf.estimator.Estimator (here, a canned DNNClassifier) and specifying the training and evaluation specs:

feature_columns = [tf1.feature_column.numeric_column("x", shape=[28, 28])]

classifier = tf1.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[256, 32],
    optimizer=tf1.train.AdamOptimizer(0.001),
    n_classes=10,
    dropout=0.2
)

train_input_fn = tf1.estimator.inputs.numpy_input_fn(
    x={"x": x_train},
    y=y_train.astype(np.int32),
    num_epochs=10,
    batch_size=50,
    shuffle=True,
)

test_input_fn = tf1.estimator.inputs.numpy_input_fn(
    x={"x": x_test},
    y=y_test.astype(np.int32),
    num_epochs=10,
    shuffle=False
)

train_spec = tf1.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10)
eval_spec = tf1.estimator.EvalSpec(input_fn=test_input_fn,
                                   steps=10,
                                   throttle_secs=0)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmpfs/tmp/tmp3wto3fy_
INFO:tensorflow:Using config: {'_model_dir': '/tmpfs/tmp/tmp3wto3fy_', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From /tmpfs/tmp/ipykernel_40809/122738158.py:11: The name tf.estimator.inputs is deprecated. Please use tf.compat.v1.estimator.inputs instead.

WARNING:tensorflow:From /tmpfs/tmp/ipykernel_40809/122738158.py:11: The name tf.estimator.inputs.numpy_input_fn is deprecated. Please use tf.compat.v1.estimator.inputs.numpy_input_fn instead.

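Then, train and evaluate the model. In this notebook, evaluation is limited to a local run that alternates with training, so it runs synchronously between training steps. However, if the Estimator is used in a distributed setting, the evaluator runs as a dedicated evaluator task. For more information, check the migration guide on distributed training.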

tf1.estimator.train_and_evaluate(estimator=classifier,
                                train_spec=train_spec,
                                eval_spec=eval_spec)
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/training_util.py:396: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/inputs/queues/feeding_queue_runner.py:60: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/inputs/queues/feeding_functions.py:491: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:910: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
2022-12-14 20:33:05.897008: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT64
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT32
    }
  }
}

    while inferring type of node 'dnn/zero_fraction/cond/output/_18'
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmpfs/tmp/tmp3wto3fy_/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 115.1364, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 10...
INFO:tensorflow:Saving checkpoints for 10 into /tmpfs/tmp/tmp3wto3fy_/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 10...
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2022-12-14T20:33:07
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tmp3wto3fy_/model.ckpt-10
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/10]
INFO:tensorflow:Evaluation [2/10]
INFO:tensorflow:Evaluation [3/10]
INFO:tensorflow:Evaluation [4/10]
INFO:tensorflow:Evaluation [5/10]
INFO:tensorflow:Evaluation [6/10]
INFO:tensorflow:Evaluation [7/10]
INFO:tensorflow:Evaluation [8/10]
INFO:tensorflow:Evaluation [9/10]
INFO:tensorflow:Evaluation [10/10]
INFO:tensorflow:Inference Time : 0.28638s
INFO:tensorflow:Finished evaluation at 2022-12-14-20:33:07
INFO:tensorflow:Saving dict for global step 10: accuracy = 0.4703125, average_loss = 1.8119497, global_step = 10, loss = 231.92957
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10: /tmpfs/tmp/tmp3wto3fy_/model.ckpt-10
INFO:tensorflow:Loss for final step: 87.83424.
({'accuracy': 0.4703125,
  'average_loss': 1.8119497,
  'loss': 231.92957,
  'global_step': 10},
 [])
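The numbers in this output can be cross-checked with a little arithmetic. The sketch below assumes that test_input_fn uses numpy_input_fn's default batch_size of 128 (the code above does not set it) and that the reported loss is a per-batch sum of per-example losses; the logged values are consistent with both assumptions.

```python
import math

# Values copied from the evaluation log above.
eval_steps = 10            # EvalSpec(steps=10)
eval_batch_size = 128      # assumption: numpy_input_fn's default batch_size
accuracy = 0.4703125
average_loss = 1.8119497   # mean loss per example
reported_loss = 231.92957  # mean over batches of the per-batch loss

# 10 evaluation steps of 128 examples each.
examples_evaluated = eval_steps * eval_batch_size
correct = accuracy * examples_evaluated

# If each batch's loss is the SUM of its per-example losses, the reported
# loss should be about average_loss * batch_size.
implied_loss = average_loss * eval_batch_size

# Sanity check for "loss = 115.1364, step = 0": with batch_size=50 and 10
# classes, a freshly initialized classifier predicts roughly uniform
# probabilities, so each example contributes about ln(10) to the summed loss.
initial_loss_estimate = 50 * math.log(10)

print(examples_evaluated, round(correct), round(implied_loss, 3),
      round(initial_loss_estimate, 2))
```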

TensorFlow 2: Evaluate a Keras model

In TensorFlow 2, if you use the Keras Model.fit API for training, you can evaluate the model with tf.keras.utils.SidecarEvaluator. You can also visualize the evaluation metrics in TensorBoard, which this guide does not cover.

To demonstrate this, first define and train a model:

def create_model():
  return tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
  ])

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model = create_model()
model.compile(optimizer='adam',
              loss=loss,
              metrics=['accuracy'],
              steps_per_execution=10,
              run_eagerly=True)

log_dir = tempfile.mkdtemp()
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=os.path.join(log_dir, 'ckpt-{epoch}'),
    save_weights_only=True)

model.fit(x=x_train,
          y=y_train,
          epochs=1,
          callbacks=[model_checkpoint])
1875/1875 [==============================] - 31s 17ms/step - loss: 0.2186 - accuracy: 0.9362
<keras.callbacks.History at 0x7fe51c0d4820>
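The 1875 steps in the progress bar and the size of create_model() both follow from simple arithmetic. This sketch assumes Keras's default Model.fit batch size of 32 (the call above does not pass batch_size) and counts each Dense layer as inputs * units weights plus one bias per unit.

```python
import math

# Model.fit above does not pass batch_size, so Keras falls back to 32.
train_examples = 60_000   # size of the MNIST training split
fit_batch_size = 32
steps_per_epoch = math.ceil(train_examples / fit_batch_size)

# Trainable parameters of create_model(): Flatten and Dropout have none;
# each Dense layer has (inputs * units) weights plus `units` biases.
flatten_out = 28 * 28
dense1 = flatten_out * 512 + 512   # Dense(512)
dense2 = 512 * 10 + 10             # Dense(10)
total_params = dense1 + dense2

print(steps_per_epoch, dense1, dense2, total_params)
```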

Then, evaluate the model using tf.keras.utils.SidecarEvaluator. In real training, it is recommended to run the evaluation as a separate job so that worker resources remain free for training.

data = tf.data.Dataset.from_tensor_slices((x_test, y_test))
data = data.batch(64)

tf.keras.utils.SidecarEvaluator(
    model=model,
    data=data,
    checkpoint_dir=log_dir,
    max_evaluations=1
).start()
INFO:tensorflow:Waiting for new checkpoint at /tmpfs/tmp/tmpb8crrwr_
INFO:tensorflow:Found new checkpoint at /tmpfs/tmp/tmpb8crrwr_/ckpt-1
INFO:tensorflow:Evaluation starts: Model weights loaded from latest checkpoint file /tmpfs/tmp/tmpb8crrwr_/ckpt-1
157/157 - 2s - loss: 0.1135 - accuracy: 0.9650 - 2s/epoch - 10ms/step
INFO:tensorflow:End of evaluation. Metrics: loss=0.11346454918384552 accuracy=0.9649999737739563
INFO:tensorflow:Last checkpoint evaluated. SidecarEvaluator stops.
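To make the evaluator's job a little more concrete, here is a simplified, framework-free sketch of the polling loop an evaluator task runs: wait for a new checkpoint, evaluate the latest one, and stop after max_evaluations. This is not the tf.keras.utils.SidecarEvaluator implementation. The helper names (latest_checkpoint, sidecar_loop, evaluate_fn), the timeout, and the rule that checkpoint "ckpt-N" counts as written once a "ckpt-N.index" file exists are all assumptions made for illustration.

```python
import os
import re
import tempfile
import time

# Hypothetical sketch of a sidecar evaluator's polling loop.
# Assumption: checkpoint prefix "ckpt-N" is ready once "ckpt-N.index" exists.
# `evaluate_fn` stands in for "load weights from this prefix, then evaluate".
CKPT_INDEX = re.compile(r"^ckpt-(\d+)\.index$")


def latest_checkpoint(checkpoint_dir):
    """Return (epoch, prefix_path) for the highest-numbered checkpoint, or None."""
    found = []
    for name in os.listdir(checkpoint_dir):
        match = CKPT_INDEX.match(name)
        if match:
            epoch = int(match.group(1))
            found.append((epoch, os.path.join(checkpoint_dir, f"ckpt-{epoch}")))
    return max(found) if found else None


def sidecar_loop(checkpoint_dir, evaluate_fn, max_evaluations,
                 poll_secs=0.01, timeout_secs=5.0):
    """Evaluate each newly seen latest checkpoint until max_evaluations is reached.

    Unlike a real evaluator, this sketch gives up after timeout_secs.
    """
    results = []
    last_epoch = None
    deadline = time.monotonic() + timeout_secs
    while len(results) < max_evaluations and time.monotonic() < deadline:
        latest = latest_checkpoint(checkpoint_dir)
        if latest is None or latest[0] == last_epoch:
            time.sleep(poll_secs)  # Nothing new yet: wait for the training job.
            continue
        last_epoch, prefix = latest
        results.append((last_epoch, evaluate_fn(prefix)))
    return results


# Usage: pretend the training job has already written checkpoints 1 and 2.
with tempfile.TemporaryDirectory() as ckpt_dir:
    for epoch in (1, 2):
        open(os.path.join(ckpt_dir, f"ckpt-{epoch}.index"), "w").close()
    results = sidecar_loop(ckpt_dir, evaluate_fn=os.path.basename,
                           max_evaluations=1)
print(results)
```

Because the sketch always picks the newest checkpoint, a slow evaluator skips intermediate checkpoints instead of falling behind, which is consistent with the "Model weights loaded from latest checkpoint file" message in the log above.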

Next steps

  • To learn more about sidecar evaluation, read the tf.keras.utils.SidecarEvaluator API documentation.
  • To alternate training and evaluation in Keras, read about the other built-in methods for training and evaluation.