Classification on imbalanced data


This tutorial demonstrates how to classify a highly imbalanced dataset in which the number of examples in one class vastly outnumbers the examples in the other. You will work with the Credit Card Fraud Detection dataset hosted on Kaggle. The aim is to detect just 492 fraudulent transactions out of 284,807 transactions in total. You will use Keras to define the model, and class weights to help the model learn from the imbalanced data.

This tutorial contains complete code to:

  • Load a CSV file using Pandas.
  • Create train, validation, and test sets.
  • Define and train a model using Keras (including setting class weights).
  • Evaluate the model using various metrics (including precision and recall).
  • Try common techniques for dealing with imbalanced data:
    • Class weighting
    • Oversampling

Setup

import tensorflow as tf
from tensorflow import keras

import os
import tempfile

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

import sklearn
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
mpl.rcParams['figure.figsize'] = (12, 10)
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

Data processing and exploration

Kaggle Credit Card Fraud dataset

Pandas is a Python library with many helpful utilities for loading and working with structured data. You can use it to download the CSV into a Pandas DataFrame.

Note: This dataset was collected and analyzed during a research collaboration between Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on the group's website and on the DefeatFraud project page.

raw_df = pd.read_csv('https://storage.googleapis.com/download.tensorflow.org/data/creditcard.csv')
raw_df.head()
raw_df[['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V26', 'V27', 'V28', 'Amount', 'Class']].describe()

Examine the class label imbalance

Let's look at the dataset imbalance:

neg, pos = np.bincount(raw_df['Class'])
total = neg + pos
print('Examples:\n    Total: {}\n    Positive: {} ({:.2f}% of total)\n'.format(
    total, pos, 100 * pos / total))
Examples:
    Total: 284807
    Positive: 492 (0.17% of total)

This shows the small fraction of positive samples.

Clean, split, and normalize the data

The raw data has a few issues. First, the Time and Amount columns are too variable to use directly. Drop the Time column (since it's not clear what it means) and take the log of the Amount column to reduce its range.

cleaned_df = raw_df.copy()

# You don't want the `Time` column.
cleaned_df.pop('Time')

# The `Amount` column covers a huge range. Convert to log-space.
eps = 0.001 # 0 => 0.1¢
cleaned_df['Log Ammount'] = np.log(cleaned_df.pop('Amount')+eps)

Split the dataset into train, validation, and test sets. The validation set is used during model fitting to evaluate the loss and any metrics, but the model is not fit on this data. The test set is completely unused during the training phase and is only used at the end to evaluate how well the model generalizes to new data. This is especially important with imbalanced datasets, where overfitting is a significant concern due to the lack of training data.

# Use a utility from sklearn to split and shuffle your dataset.
train_df, test_df = train_test_split(cleaned_df, test_size=0.2)
train_df, val_df = train_test_split(train_df, test_size=0.2)

# Form np arrays of labels and features.
train_labels = np.array(train_df.pop('Class'))
bool_train_labels = train_labels != 0
val_labels = np.array(val_df.pop('Class'))
test_labels = np.array(test_df.pop('Class'))

train_features = np.array(train_df)
val_features = np.array(val_df)
test_features = np.array(test_df)

Normalize the input features using the sklearn StandardScaler. This will set the mean to 0 and the standard deviation to 1.

Note: The StandardScaler is fit only on train_features, to be sure the model is not peeking at the validation or test sets.

scaler = StandardScaler()
train_features = scaler.fit_transform(train_features)

val_features = scaler.transform(val_features)
test_features = scaler.transform(test_features)

train_features = np.clip(train_features, -5, 5)
val_features = np.clip(val_features, -5, 5)
test_features = np.clip(test_features, -5, 5)
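As a quick sanity check on the normalization step above, here is a minimal NumPy sketch (toy numbers, not the credit card data) of why the statistics must come from the training rows only: the same training mean and standard deviation are applied unchanged to held-out rows, so no information leaks from the validation or test sets.

```python
import numpy as np

# Toy data, not the tutorial's dataset.
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[10.0]])

# Statistics computed on the training rows only.
mean, std = train.mean(axis=0), train.std(axis=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuses the train statistics: no leakage

print(float(train_scaled.mean()))  # ~0.0: the training data is centered
print(float(test_scaled[0, 0]))    # ~9.8: the held-out point is a clear outlier
```

This is also why clipping (as above) can matter after scaling: a held-out value far outside the training range produces an extreme standardized value.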


print('Training labels shape:', train_labels.shape)
print('Validation labels shape:', val_labels.shape)
print('Test labels shape:', test_labels.shape)

print('Training features shape:', train_features.shape)
print('Validation features shape:', val_features.shape)
print('Test features shape:', test_features.shape)
Training labels shape: (182276,)
Validation labels shape: (45569,)
Test labels shape: (56962,)
Training features shape: (182276, 29)
Validation features shape: (45569, 29)
Test features shape: (56962, 29)

Caution: If you want to deploy a model, it's critical that you preserve the preprocessing calculations. The easiest way is to implement them as layers and attach them to your model before export.

Look at the data distribution

Next, compare the distributions of the positive and negative examples over a few features. Good questions to ask yourself at this point are:

  • Do these distributions make sense?
    • Yes. You've normalized the input, and these are mostly concentrated in the +/- 2 range.
  • Can you see the difference between the distributions?
    • Yes, the positive examples contain a much higher rate of extreme values.
pos_df = pd.DataFrame(train_features[ bool_train_labels], columns=train_df.columns)
neg_df = pd.DataFrame(train_features[~bool_train_labels], columns=train_df.columns)

sns.jointplot(x='V5', y='V6', data=pos_df,
              kind='hex', xlim=(-5,5), ylim=(-5,5))
plt.suptitle("Positive distribution")

sns.jointplot(x='V5', y='V6', data=neg_df,
              kind='hex', xlim=(-5,5), ylim=(-5,5))
_ = plt.suptitle("Negative distribution")

png

png

Define the model and metrics

Define a function that creates a simple neural network with a densely connected hidden layer, a dropout layer to reduce overfitting, and an output sigmoid layer that returns the probability of a transaction being fraudulent:

METRICS = [
      keras.metrics.TruePositives(name='tp'),
      keras.metrics.FalsePositives(name='fp'),
      keras.metrics.TrueNegatives(name='tn'),
      keras.metrics.FalseNegatives(name='fn'), 
      keras.metrics.BinaryAccuracy(name='accuracy'),
      keras.metrics.Precision(name='precision'),
      keras.metrics.Recall(name='recall'),
      keras.metrics.AUC(name='auc'),
      keras.metrics.AUC(name='prc', curve='PR'), # precision-recall curve
]

def make_model(metrics=METRICS, output_bias=None):
  if output_bias is not None:
    output_bias = tf.keras.initializers.Constant(output_bias)
  model = keras.Sequential([
      keras.layers.Dense(
          16, activation='relu',
          input_shape=(train_features.shape[-1],)),
      keras.layers.Dropout(0.5),
      keras.layers.Dense(1, activation='sigmoid',
                         bias_initializer=output_bias),
  ])

  model.compile(
      optimizer=keras.optimizers.Adam(learning_rate=1e-3),
      loss=keras.losses.BinaryCrossentropy(),
      metrics=metrics)

  return model

Understanding useful metrics

Notice that a few of the metrics defined above can be computed by the model and will be helpful when evaluating its performance.

  • False negatives and false positives are samples that were incorrectly classified.
  • True negatives and true positives are samples that were correctly classified.
  • Accuracy is the percentage of examples correctly classified:

$\frac{\text{true samples} }{\text{total samples} }$

  • Precision is the percentage of predicted positives that were correctly classified:

$\frac{\text{true positives} }{\text{true positives + false positives} }$

  • Recall is the percentage of actual positives that were correctly classified:

$\frac{\text{true positives} }{\text{true positives + false negatives} }$

  • AUC refers to the Area Under the Curve of a Receiver Operating Characteristic curve (ROC-AUC). This metric is equal to the probability that a classifier will rank a random positive sample higher than a random negative sample.
  • AUPRC refers to the Area Under the Curve of the Precision-Recall curve. This metric computes precision-recall pairs for different probability thresholds.
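To make these formulas concrete, here is a small sketch that plugs hypothetical confusion-matrix counts (chosen to resemble the baseline results later in this tutorial) into each definition:

```python
# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn, tn = 60, 9, 34, 56859

precision = tp / (tp + fp)                  # of flagged transactions, how many were fraud
recall = tp / (tp + fn)                     # of actual fraud, how much was flagged
accuracy = (tp + tn) / (tp + fp + fn + tn)  # of all transactions, how many were right

print(round(precision, 4))  # 0.8696
print(round(recall, 4))     # 0.6383
print(round(accuracy, 4))   # 0.9992
```

Note how accuracy stays near 1.0 even though a third of the fraud was missed; this is the imbalance problem in miniature.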

Note: Accuracy is not a helpful metric for this task. A model that always predicts False would achieve 99.8%+ accuracy on this task.
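A one-line check of that claim, using the class counts from this dataset:

```python
# 284,807 total transactions, of which 492 are fraudulent (from the summary above).
total, pos = 284807, 492
neg = total - pos

# Accuracy of a degenerate model that predicts "not fraud" every time:
always_negative_accuracy = neg / total
print('{:.2f}%'.format(100 * always_negative_accuracy))  # 99.83%
```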


Baseline model

Build the model

Now create and train your model using the function defined earlier. Notice that the model is fit using a batch size of 2048, larger than the default. This is important to ensure that each batch has a decent chance of containing a few positive samples. If the batch size were too small, the model would likely have no fraudulent transactions to learn from.

Note: This model will not handle the class imbalance well. You will improve it later in this tutorial.

EPOCHS = 100
BATCH_SIZE = 2048

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_auc', 
    verbose=1,
    patience=10,
    mode='max',
    restore_best_weights=True)
model = make_model()
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 16)                480       
_________________________________________________________________
dropout (Dropout)            (None, 16)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 17        
=================================================================
Total params: 497
Trainable params: 497
Non-trainable params: 0
_________________________________________________________________

Test run the model:

model.predict(train_features[:10])
array([[0.24614692],
       [0.54844296],
       [0.2355674 ],
       [0.5151356 ],
       [0.39113   ],
       [0.386738  ],
       [0.32651168],
       [0.41346052],
       [0.11890455],
       [0.34378132]], dtype=float32)

Optional: Set the correct initial bias.

These initial guesses are not great. You know the dataset is imbalanced. Set the output layer's bias to reflect that (see A Recipe for Training Neural Networks: "init well"). This can help with initial convergence.

With the default bias initialization the loss should be about math.log(2) = 0.69314:

results = model.evaluate(train_features, train_labels, batch_size=BATCH_SIZE, verbose=0)
print("Loss: {:0.4f}".format(results[0]))
Loss: 0.4902

The correct bias to set can be derived from:

$$ p_0 = pos/(pos + neg) = 1/(1+e^{-b_0}) $$
$$ b_0 = -log_e(1/p_0 - 1) $$
$$ b_0 = log_e(pos/neg)$$
initial_bias = np.log([pos/neg])
initial_bias
array([-6.35935934])
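The derivation above can be verified numerically: applying the sigmoid to $b_0 = log_e(pos/neg)$ recovers exactly the positive-class fraction $p_0$.

```python
import numpy as np

# Class counts from the dataset summary above.
pos, total = 492, 284807
neg = total - pos

b0 = np.log(pos / neg)          # the derived initial bias
p0 = 1 / (1 + np.exp(-b0))      # sigmoid of that bias

print(round(float(b0), 5))      # -6.35936
print(np.isclose(p0, pos / total))  # True: sigmoid(b0) == pos/(pos+neg)
```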

Set that as the initial bias, and the model will give much more reasonable initial guesses.

It should be near: pos/total = 0.0018

model = make_model(output_bias=initial_bias)
model.predict(train_features[:10])
array([[0.0108717 ],
       [0.00171685],
       [0.0009581 ],
       [0.00057781],
       [0.00524487],
       [0.00081512],
       [0.00155083],
       [0.0006627 ],
       [0.00218084],
       [0.00183015]], dtype=float32)

With this initialization the initial loss should be approximately:

$$-p_0log(p_0)-(1-p_0)log(1-p_0) = 0.01317$$
results = model.evaluate(train_features, train_labels, batch_size=BATCH_SIZE, verbose=0)
print("Loss: {:0.4f}".format(results[0]))
Loss: 0.0145
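The expected value can be reproduced in a couple of lines of NumPy, using the rounded positive fraction p0 = 0.0018 from above:

```python
import numpy as np

p0 = 0.0018  # approximate fraction of positive examples (pos/total)

# Binary cross-entropy of a model that always predicts p0:
expected_loss = -p0 * np.log(p0) - (1 - p0) * np.log(1 - p0)
print(round(float(expected_loss), 5))  # 0.01317
```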

This initial loss is about 50 times less than it would have been with naive initialization.

This way the model doesn't need to spend the first few epochs just learning that positive examples are unlikely. It also makes it easier to read plots of the loss during training.

Checkpoint the initial weights

To make the various training runs more comparable, keep this initial model's weights in a checkpoint file and load them into each model before training:

initial_weights = os.path.join(tempfile.mkdtemp(), 'initial_weights')
model.save_weights(initial_weights)

Confirm that the bias fix helps

Before moving on, confirm quickly that the careful bias initialization actually helped.

Train the model for 20 epochs, with and without this careful initialization, and compare the losses:

model = make_model()
model.load_weights(initial_weights)
model.layers[-1].bias.assign([0.0])
zero_bias_history = model.fit(
    train_features,
    train_labels,
    batch_size=BATCH_SIZE,
    epochs=20,
    validation_data=(val_features, val_labels), 
    verbose=0)
model = make_model()
model.load_weights(initial_weights)
careful_bias_history = model.fit(
    train_features,
    train_labels,
    batch_size=BATCH_SIZE,
    epochs=20,
    validation_data=(val_features, val_labels), 
    verbose=0)
def plot_loss(history, label, n):
  # Use a log scale to show the wide range of values.
  plt.semilogy(history.epoch, history.history['loss'],
               color=colors[n], label='Train '+label)
  plt.semilogy(history.epoch, history.history['val_loss'],
          color=colors[n], label='Val '+label,
          linestyle="--")
  plt.xlabel('Epoch')
  plt.ylabel('Loss')

  plt.legend()
plot_loss(zero_bias_history, "Zero Bias", 0)
plot_loss(careful_bias_history, "Careful Bias", 1)

png

The above figure makes it clear: in terms of validation loss, on this problem, this careful initialization gives a clear advantage.

Train the model

model = make_model()
model.load_weights(initial_weights)
baseline_history = model.fit(
    train_features,
    train_labels,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    callbacks=[early_stopping],
    validation_data=(val_features, val_labels))
Epoch 1/100
90/90 [==============================] - 3s 14ms/step - loss: 0.0119 - tp: 144.0000 - fp: 49.0000 - tn: 227398.0000 - fn: 254.0000 - accuracy: 0.9987 - precision: 0.7461 - recall: 0.3618 - auc: 0.7577 - prc: 0.3668 - val_loss: 0.0063 - val_tp: 27.0000 - val_fp: 7.0000 - val_tn: 45477.0000 - val_fn: 58.0000 - val_accuracy: 0.9986 - val_precision: 0.7941 - val_recall: 0.3176 - val_auc: 0.8993 - val_prc: 0.6408
Epoch 2/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0080 - tp: 122.0000 - fp: 31.0000 - tn: 181932.0000 - fn: 191.0000 - accuracy: 0.9988 - precision: 0.7974 - recall: 0.3898 - auc: 0.8251 - prc: 0.4363 - val_loss: 0.0049 - val_tp: 40.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 45.0000 - val_accuracy: 0.9988 - val_precision: 0.8333 - val_recall: 0.4706 - val_auc: 0.9233 - val_prc: 0.7260
Epoch 3/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0072 - tp: 141.0000 - fp: 26.0000 - tn: 181937.0000 - fn: 172.0000 - accuracy: 0.9989 - precision: 0.8443 - recall: 0.4505 - auc: 0.8428 - prc: 0.5110 - val_loss: 0.0043 - val_tp: 48.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 37.0000 - val_accuracy: 0.9990 - val_precision: 0.8571 - val_recall: 0.5647 - val_auc: 0.9292 - val_prc: 0.7555
Epoch 4/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0063 - tp: 152.0000 - fp: 32.0000 - tn: 181931.0000 - fn: 161.0000 - accuracy: 0.9989 - precision: 0.8261 - recall: 0.4856 - auc: 0.8600 - prc: 0.5586 - val_loss: 0.0040 - val_tp: 53.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 32.0000 - val_accuracy: 0.9991 - val_precision: 0.8689 - val_recall: 0.6235 - val_auc: 0.9293 - val_prc: 0.7649
Epoch 5/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0060 - tp: 159.0000 - fp: 27.0000 - tn: 181936.0000 - fn: 154.0000 - accuracy: 0.9990 - precision: 0.8548 - recall: 0.5080 - auc: 0.8849 - prc: 0.6044 - val_loss: 0.0038 - val_tp: 57.0000 - val_fp: 10.0000 - val_tn: 45474.0000 - val_fn: 28.0000 - val_accuracy: 0.9992 - val_precision: 0.8507 - val_recall: 0.6706 - val_auc: 0.9292 - val_prc: 0.7664
Epoch 6/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0053 - tp: 165.0000 - fp: 26.0000 - tn: 181937.0000 - fn: 148.0000 - accuracy: 0.9990 - precision: 0.8639 - recall: 0.5272 - auc: 0.8947 - prc: 0.6455 - val_loss: 0.0037 - val_tp: 61.0000 - val_fp: 12.0000 - val_tn: 45472.0000 - val_fn: 24.0000 - val_accuracy: 0.9992 - val_precision: 0.8356 - val_recall: 0.7176 - val_auc: 0.9351 - val_prc: 0.7740
Epoch 7/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0050 - tp: 167.0000 - fp: 27.0000 - tn: 181936.0000 - fn: 146.0000 - accuracy: 0.9991 - precision: 0.8608 - recall: 0.5335 - auc: 0.8980 - prc: 0.6629 - val_loss: 0.0036 - val_tp: 62.0000 - val_fp: 11.0000 - val_tn: 45473.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8493 - val_recall: 0.7294 - val_auc: 0.9351 - val_prc: 0.7779
Epoch 8/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0053 - tp: 157.0000 - fp: 26.0000 - tn: 181937.0000 - fn: 156.0000 - accuracy: 0.9990 - precision: 0.8579 - recall: 0.5016 - auc: 0.8853 - prc: 0.6545 - val_loss: 0.0035 - val_tp: 63.0000 - val_fp: 12.0000 - val_tn: 45472.0000 - val_fn: 22.0000 - val_accuracy: 0.9993 - val_precision: 0.8400 - val_recall: 0.7412 - val_auc: 0.9351 - val_prc: 0.7758
Epoch 9/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0052 - tp: 161.0000 - fp: 29.0000 - tn: 181934.0000 - fn: 152.0000 - accuracy: 0.9990 - precision: 0.8474 - recall: 0.5144 - auc: 0.8901 - prc: 0.6550 - val_loss: 0.0034 - val_tp: 62.0000 - val_fp: 10.0000 - val_tn: 45474.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8611 - val_recall: 0.7294 - val_auc: 0.9351 - val_prc: 0.7911
Epoch 10/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0050 - tp: 167.0000 - fp: 32.0000 - tn: 181931.0000 - fn: 146.0000 - accuracy: 0.9990 - precision: 0.8392 - recall: 0.5335 - auc: 0.9095 - prc: 0.6694 - val_loss: 0.0033 - val_tp: 62.0000 - val_fp: 9.0000 - val_tn: 45475.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8732 - val_recall: 0.7294 - val_auc: 0.9410 - val_prc: 0.8032
Epoch 11/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0051 - tp: 168.0000 - fp: 25.0000 - tn: 181938.0000 - fn: 145.0000 - accuracy: 0.9991 - precision: 0.8705 - recall: 0.5367 - auc: 0.8886 - prc: 0.6526 - val_loss: 0.0033 - val_tp: 62.0000 - val_fp: 9.0000 - val_tn: 45475.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8732 - val_recall: 0.7294 - val_auc: 0.9409 - val_prc: 0.7966
Epoch 12/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0051 - tp: 163.0000 - fp: 26.0000 - tn: 181937.0000 - fn: 150.0000 - accuracy: 0.9990 - precision: 0.8624 - recall: 0.5208 - auc: 0.8870 - prc: 0.6505 - val_loss: 0.0033 - val_tp: 63.0000 - val_fp: 9.0000 - val_tn: 45475.0000 - val_fn: 22.0000 - val_accuracy: 0.9993 - val_precision: 0.8750 - val_recall: 0.7412 - val_auc: 0.9409 - val_prc: 0.7985
Epoch 13/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0047 - tp: 171.0000 - fp: 23.0000 - tn: 181940.0000 - fn: 142.0000 - accuracy: 0.9991 - precision: 0.8814 - recall: 0.5463 - auc: 0.9015 - prc: 0.6862 - val_loss: 0.0032 - val_tp: 65.0000 - val_fp: 9.0000 - val_tn: 45475.0000 - val_fn: 20.0000 - val_accuracy: 0.9994 - val_precision: 0.8784 - val_recall: 0.7647 - val_auc: 0.9409 - val_prc: 0.7963
Epoch 14/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0045 - tp: 174.0000 - fp: 24.0000 - tn: 181939.0000 - fn: 139.0000 - accuracy: 0.9991 - precision: 0.8788 - recall: 0.5559 - auc: 0.9128 - prc: 0.7033 - val_loss: 0.0031 - val_tp: 65.0000 - val_fp: 10.0000 - val_tn: 45474.0000 - val_fn: 20.0000 - val_accuracy: 0.9993 - val_precision: 0.8667 - val_recall: 0.7647 - val_auc: 0.9409 - val_prc: 0.8120
Epoch 15/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0042 - tp: 183.0000 - fp: 21.0000 - tn: 181942.0000 - fn: 130.0000 - accuracy: 0.9992 - precision: 0.8971 - recall: 0.5847 - auc: 0.8999 - prc: 0.7092 - val_loss: 0.0031 - val_tp: 66.0000 - val_fp: 10.0000 - val_tn: 45474.0000 - val_fn: 19.0000 - val_accuracy: 0.9994 - val_precision: 0.8684 - val_recall: 0.7765 - val_auc: 0.9409 - val_prc: 0.8082
Epoch 16/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0047 - tp: 181.0000 - fp: 30.0000 - tn: 181933.0000 - fn: 132.0000 - accuracy: 0.9991 - precision: 0.8578 - recall: 0.5783 - auc: 0.8984 - prc: 0.6598 - val_loss: 0.0031 - val_tp: 65.0000 - val_fp: 10.0000 - val_tn: 45474.0000 - val_fn: 20.0000 - val_accuracy: 0.9993 - val_precision: 0.8667 - val_recall: 0.7647 - val_auc: 0.9409 - val_prc: 0.8092
Epoch 17/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0047 - tp: 172.0000 - fp: 30.0000 - tn: 181933.0000 - fn: 141.0000 - accuracy: 0.9991 - precision: 0.8515 - recall: 0.5495 - auc: 0.9015 - prc: 0.6626 - val_loss: 0.0031 - val_tp: 63.0000 - val_fp: 9.0000 - val_tn: 45475.0000 - val_fn: 22.0000 - val_accuracy: 0.9993 - val_precision: 0.8750 - val_recall: 0.7412 - val_auc: 0.9409 - val_prc: 0.8096
Epoch 18/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0049 - tp: 158.0000 - fp: 19.0000 - tn: 181944.0000 - fn: 155.0000 - accuracy: 0.9990 - precision: 0.8927 - recall: 0.5048 - auc: 0.8839 - prc: 0.6544 - val_loss: 0.0032 - val_tp: 52.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 33.0000 - val_accuracy: 0.9991 - val_precision: 0.8667 - val_recall: 0.6118 - val_auc: 0.9410 - val_prc: 0.8229
Epoch 19/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0046 - tp: 153.0000 - fp: 19.0000 - tn: 181944.0000 - fn: 160.0000 - accuracy: 0.9990 - precision: 0.8895 - recall: 0.4888 - auc: 0.9016 - prc: 0.6858 - val_loss: 0.0031 - val_tp: 57.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 28.0000 - val_accuracy: 0.9992 - val_precision: 0.8769 - val_recall: 0.6706 - val_auc: 0.9410 - val_prc: 0.8210
Epoch 20/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0046 - tp: 167.0000 - fp: 19.0000 - tn: 181944.0000 - fn: 146.0000 - accuracy: 0.9991 - precision: 0.8978 - recall: 0.5335 - auc: 0.9048 - prc: 0.6831 - val_loss: 0.0030 - val_tp: 64.0000 - val_fp: 10.0000 - val_tn: 45474.0000 - val_fn: 21.0000 - val_accuracy: 0.9993 - val_precision: 0.8649 - val_recall: 0.7529 - val_auc: 0.9410 - val_prc: 0.8190
Epoch 21/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0047 - tp: 166.0000 - fp: 17.0000 - tn: 181946.0000 - fn: 147.0000 - accuracy: 0.9991 - precision: 0.9071 - recall: 0.5304 - auc: 0.8839 - prc: 0.6687 - val_loss: 0.0030 - val_tp: 66.0000 - val_fp: 10.0000 - val_tn: 45474.0000 - val_fn: 19.0000 - val_accuracy: 0.9994 - val_precision: 0.8684 - val_recall: 0.7765 - val_auc: 0.9409 - val_prc: 0.8193
Epoch 22/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0046 - tp: 174.0000 - fp: 19.0000 - tn: 181944.0000 - fn: 139.0000 - accuracy: 0.9991 - precision: 0.9016 - recall: 0.5559 - auc: 0.9015 - prc: 0.6660 - val_loss: 0.0030 - val_tp: 60.0000 - val_fp: 9.0000 - val_tn: 45475.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8696 - val_recall: 0.7059 - val_auc: 0.9409 - val_prc: 0.8208
Epoch 23/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0044 - tp: 175.0000 - fp: 30.0000 - tn: 181933.0000 - fn: 138.0000 - accuracy: 0.9991 - precision: 0.8537 - recall: 0.5591 - auc: 0.9032 - prc: 0.6874 - val_loss: 0.0030 - val_tp: 60.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8824 - val_recall: 0.7059 - val_auc: 0.9409 - val_prc: 0.8217
Epoch 24/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0048 - tp: 162.0000 - fp: 23.0000 - tn: 181940.0000 - fn: 151.0000 - accuracy: 0.9990 - precision: 0.8757 - recall: 0.5176 - auc: 0.8935 - prc: 0.6440 - val_loss: 0.0030 - val_tp: 59.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.8806 - val_recall: 0.6941 - val_auc: 0.9410 - val_prc: 0.8234
Epoch 25/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0044 - tp: 160.0000 - fp: 25.0000 - tn: 181938.0000 - fn: 153.0000 - accuracy: 0.9990 - precision: 0.8649 - recall: 0.5112 - auc: 0.9080 - prc: 0.6881 - val_loss: 0.0030 - val_tp: 61.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 24.0000 - val_accuracy: 0.9993 - val_precision: 0.8841 - val_recall: 0.7176 - val_auc: 0.9409 - val_prc: 0.8223
Epoch 26/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0043 - tp: 168.0000 - fp: 26.0000 - tn: 181937.0000 - fn: 145.0000 - accuracy: 0.9991 - precision: 0.8660 - recall: 0.5367 - auc: 0.9209 - prc: 0.7015 - val_loss: 0.0030 - val_tp: 60.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8824 - val_recall: 0.7059 - val_auc: 0.9410 - val_prc: 0.8223
Epoch 27/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0044 - tp: 166.0000 - fp: 20.0000 - tn: 181943.0000 - fn: 147.0000 - accuracy: 0.9991 - precision: 0.8925 - recall: 0.5304 - auc: 0.8904 - prc: 0.6831 - val_loss: 0.0030 - val_tp: 63.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 22.0000 - val_accuracy: 0.9993 - val_precision: 0.8873 - val_recall: 0.7412 - val_auc: 0.9410 - val_prc: 0.8216
Epoch 28/100
90/90 [==============================] - 1s 6ms/step - loss: 0.0047 - tp: 161.0000 - fp: 24.0000 - tn: 181939.0000 - fn: 152.0000 - accuracy: 0.9990 - precision: 0.8703 - recall: 0.5144 - auc: 0.8904 - prc: 0.6629 - val_loss: 0.0030 - val_tp: 62.0000 - val_fp: 8.0000 - val_tn: 45476.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8857 - val_recall: 0.7294 - val_auc: 0.9409 - val_prc: 0.8222
Restoring model weights from the end of the best epoch.
Epoch 00028: early stopping

Check training history

In this section, you will produce plots of your model's accuracy and loss on the training and validation sets. These are useful to check for overfitting, which you can learn more about in the Overfit and underfit tutorial.

Additionally, you can produce these plots for any of the metrics you created above. False negatives are included as an example.

def plot_metrics(history):
  metrics = ['loss', 'auc', 'precision', 'recall']
  for n, metric in enumerate(metrics):
    name = metric.replace("_"," ").capitalize()
    plt.subplot(2,2,n+1)
    plt.plot(history.epoch, history.history[metric], color=colors[0], label='Train')
    plt.plot(history.epoch, history.history['val_'+metric],
             color=colors[0], linestyle="--", label='Val')
    plt.xlabel('Epoch')
    plt.ylabel(name)
    if metric == 'loss':
      plt.ylim([0, plt.ylim()[1]])
    elif metric == 'auc':
      plt.ylim([0.8,1])
    else:
      plt.ylim([0,1])

    plt.legend()
plot_metrics(baseline_history)

png

Note: The validation curve generally performs better than the training curve. This is mainly caused by the fact that the dropout layer is not active when evaluating the model.

Evaluate metrics

You can use a confusion matrix to summarize the actual vs. predicted labels, where the X axis is the predicted label and the Y axis is the actual label:

train_predictions_baseline = model.predict(train_features, batch_size=BATCH_SIZE)
test_predictions_baseline = model.predict(test_features, batch_size=BATCH_SIZE)
def plot_cm(labels, predictions, p=0.5):
  cm = confusion_matrix(labels, predictions > p)
  plt.figure(figsize=(5,5))
  sns.heatmap(cm, annot=True, fmt="d")
  plt.title('Confusion matrix @{:.2f}'.format(p))
  plt.ylabel('Actual label')
  plt.xlabel('Predicted label')

  print('Legitimate Transactions Detected (True Negatives): ', cm[0][0])
  print('Legitimate Transactions Incorrectly Detected (False Positives): ', cm[0][1])
  print('Fraudulent Transactions Missed (False Negatives): ', cm[1][0])
  print('Fraudulent Transactions Detected (True Positives): ', cm[1][1])
  print('Total Fraudulent Transactions: ', np.sum(cm[1]))

Evaluate your model on the test dataset and display the results for the metrics you created above:

baseline_results = model.evaluate(test_features, test_labels,
                                  batch_size=BATCH_SIZE, verbose=0)
for name, value in zip(model.metrics_names, baseline_results):
  print(name, ': ', value)
print()

plot_cm(test_labels, test_predictions_baseline)
loss :  0.003450451185926795
tp :  60.0
fp :  9.0
tn :  56859.0
fn :  34.0
accuracy :  0.9992451071739197
precision :  0.8695651888847351
recall :  0.6382978558540344
auc :  0.9199696779251099
prc :  0.77667635679245

Legitimate Transactions Detected (True Negatives):  56859
Legitimate Transactions Incorrectly Detected (False Positives):  9
Fraudulent Transactions Missed (False Negatives):  34
Fraudulent Transactions Detected (True Positives):  60
Total Fraudulent Transactions:  94

png

If the model had predicted everything perfectly, this would be a diagonal matrix where values off the main diagonal, indicating incorrect predictions, would be zero. In this case, the matrix shows that you have relatively few false positives, meaning that relatively few legitimate transactions were incorrectly flagged. However, you would likely want even fewer false negatives despite the cost of increasing the number of false positives. This trade-off may be preferable because false negatives allow fraudulent transactions to go through, whereas false positives may simply cause an email to be sent to a customer asking them to verify their card activity.
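That trade-off comes down to moving the decision threshold. This sketch uses made-up scores (not the model's actual outputs) to show how lowering the threshold below the default 0.5 trades false negatives for false positives:

```python
import numpy as np

# Hypothetical fraud scores and true labels for six transactions.
scores = np.array([0.05, 0.35, 0.45, 0.55, 0.70, 0.90])
labels = np.array([0,    0,    1,    0,    1,    1])

for threshold in (0.5, 0.3):
    preds = scores > threshold
    fn = int(np.sum(~preds & (labels == 1)))  # missed fraud
    fp = int(np.sum(preds & (labels == 0)))   # legitimate transactions flagged
    print(threshold, 'false negatives:', fn, 'false positives:', fp)
# 0.5 false negatives: 1 false positives: 1
# 0.3 false negatives: 0 false positives: 2
```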

Plot the ROC

Now plot the ROC. This plot is useful because it shows, at a glance, the range of performance the model can reach just by tuning the output threshold.

def plot_roc(name, labels, predictions, **kwargs):
  fp, tp, _ = sklearn.metrics.roc_curve(labels, predictions)

  plt.plot(100*fp, 100*tp, label=name, linewidth=2, **kwargs)
  plt.xlabel('False positives [%]')
  plt.ylabel('True positives [%]')
  plt.xlim([-0.5,20])
  plt.ylim([80,100.5])
  plt.grid(True)
  ax = plt.gca()
  ax.set_aspect('equal')
plot_roc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_roc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')
plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7ff9fc3f8e90>

png

Plot the AUPRC

Now plot the AUPRC: the area under the interpolated precision-recall curve, obtained by plotting (recall, precision) points for different values of the classification threshold. Depending on how it's calculated, PR AUC may be equivalent to the average precision of the model.

def plot_prc(name, labels, predictions, **kwargs):
    precision, recall, _ = sklearn.metrics.precision_recall_curve(labels, predictions)

    plt.plot(recall, precision, label=name, linewidth=2, **kwargs)
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.grid(True)
    ax = plt.gca()
    ax.set_aspect('equal')
plot_prc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_prc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')
plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7ff9fc2aec90>

png

It looks like the precision is relatively high, but the recall and the area under the ROC curve (AUC) aren't as high as you might like. Classifiers often face challenges when trying to maximize both precision and recall, which is especially true when working with imbalanced datasets. It is important to consider the costs of different types of errors in the context of the problem you care about. In this example, a false negative (a fraudulent transaction is missed) may have a financial cost, while a false positive (a transaction is incorrectly flagged as fraudulent) may decrease user happiness.

Class weights

Calculate class weights

The goal is to identify fraudulent transactions, but you don't have very many of those positive samples to work with, so you would want the classifier to heavily weight the few examples that are available. You can do this by passing Keras weights for each class through a parameter. These will cause the model to "pay more attention" to examples from the under-represented class.

# Scaling by total/2 helps keep the loss to a similar magnitude.
# The sum of the weights of all examples stays the same.
weight_for_0 = (1 / neg) * (total / 2.0)
weight_for_1 = (1 / pos) * (total / 2.0)

class_weight = {0: weight_for_0, 1: weight_for_1}

print('Weight for class 0: {:.2f}'.format(weight_for_0))
print('Weight for class 1: {:.2f}'.format(weight_for_1))
Weight for class 0: 0.50
Weight for class 1: 289.44
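As the code comment says, scaling by total/2 keeps the overall sum of example weights unchanged: each class contributes total/2 weighted examples, so the weighted total equals the original example count. A quick check:

```python
# Class counts from the dataset summary above.
pos, total = 492, 284807
neg = total - pos

weight_for_0 = (1 / neg) * (total / 2.0)
weight_for_1 = (1 / pos) * (total / 2.0)

# Each class contributes total/2 weighted examples, so the sum is preserved.
weighted_total = neg * weight_for_0 + pos * weight_for_1
print('{:.2f}'.format(weight_for_0))  # 0.50
print('{:.2f}'.format(weight_for_1))  # 289.44
print(round(weighted_total, 6))       # 284807.0
```

Keeping the loss at a similar overall magnitude is what makes the weighted and unweighted training runs roughly comparable in scale, even though, as noted below, the total losses are still not directly comparable.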

Train a model with class weights

Now try re-training and evaluating the model with class weights to see how that affects the predictions.

Note: Using class_weights changes the range of the loss. This may affect the stability of training, depending on the optimizer. Optimizers whose step size depends on the magnitude of the gradient, like tf.keras.optimizers.SGD, may fail. The optimizer used here, tf.keras.optimizers.Adam, is unaffected by the scaling change. Also note that because of the weighting, the total losses are not comparable between the two models.

weighted_model = make_model()
weighted_model.load_weights(initial_weights)

weighted_history = weighted_model.fit(
    train_features,
    train_labels,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    callbacks=[early_stopping],
    validation_data=(val_features, val_labels),
    # The class weights go here
    class_weight=class_weight)
Epoch 1/100
90/90 [==============================] - 3s 14ms/step - loss: 2.4293 - tp: 140.0000 - fp: 175.0000 - tn: 238656.0000 - fn: 267.0000 - accuracy: 0.9982 - precision: 0.4444 - recall: 0.3440 - auc: 0.7575 - prc: 0.2604 - val_loss: 0.0065 - val_tp: 48.0000 - val_fp: 10.0000 - val_tn: 45474.0000 - val_fn: 37.0000 - val_accuracy: 0.9990 - val_precision: 0.8276 - val_recall: 0.5647 - val_auc: 0.9214 - val_prc: 0.6476
Epoch 2/100
90/90 [==============================] - 1s 6ms/step - loss: 1.2689 - tp: 165.0000 - fp: 337.0000 - tn: 181626.0000 - fn: 148.0000 - accuracy: 0.9973 - precision: 0.3287 - recall: 0.5272 - auc: 0.8348 - prc: 0.4089 - val_loss: 0.0073 - val_tp: 67.0000 - val_fp: 13.0000 - val_tn: 45471.0000 - val_fn: 18.0000 - val_accuracy: 0.9993 - val_precision: 0.8375 - val_recall: 0.7882 - val_auc: 0.9319 - val_prc: 0.6983
Epoch 3/100
90/90 [==============================] - 1s 6ms/step - loss: 1.0040 - tp: 186.0000 - fp: 588.0000 - tn: 181375.0000 - fn: 127.0000 - accuracy: 0.9961 - precision: 0.2403 - recall: 0.5942 - auc: 0.8646 - prc: 0.4305 - val_loss: 0.0091 - val_tp: 70.0000 - val_fp: 18.0000 - val_tn: 45466.0000 - val_fn: 15.0000 - val_accuracy: 0.9993 - val_precision: 0.7955 - val_recall: 0.8235 - val_auc: 0.9401 - val_prc: 0.7173
Epoch 4/100
90/90 [==============================] - 1s 6ms/step - loss: 0.8617 - tp: 205.0000 - fp: 853.0000 - tn: 181110.0000 - fn: 108.0000 - accuracy: 0.9947 - precision: 0.1938 - recall: 0.6550 - auc: 0.8763 - prc: 0.4641 - val_loss: 0.0111 - val_tp: 72.0000 - val_fp: 18.0000 - val_tn: 45466.0000 - val_fn: 13.0000 - val_accuracy: 0.9993 - val_precision: 0.8000 - val_recall: 0.8471 - val_auc: 0.9370 - val_prc: 0.7207
Epoch 5/100
90/90 [==============================] - 1s 6ms/step - loss: 0.7014 - tp: 221.0000 - fp: 1193.0000 - tn: 180770.0000 - fn: 92.0000 - accuracy: 0.9930 - precision: 0.1563 - recall: 0.7061 - auc: 0.8947 - prc: 0.4586 - val_loss: 0.0134 - val_tp: 73.0000 - val_fp: 23.0000 - val_tn: 45461.0000 - val_fn: 12.0000 - val_accuracy: 0.9992 - val_precision: 0.7604 - val_recall: 0.8588 - val_auc: 0.9335 - val_prc: 0.7322
Epoch 6/100
90/90 [==============================] - 1s 6ms/step - loss: 0.6461 - tp: 224.0000 - fp: 1677.0000 - tn: 180286.0000 - fn: 89.0000 - accuracy: 0.9903 - precision: 0.1178 - recall: 0.7157 - auc: 0.8966 - prc: 0.4675 - val_loss: 0.0162 - val_tp: 73.0000 - val_fp: 32.0000 - val_tn: 45452.0000 - val_fn: 12.0000 - val_accuracy: 0.9990 - val_precision: 0.6952 - val_recall: 0.8588 - val_auc: 0.9412 - val_prc: 0.7423
Epoch 7/100
90/90 [==============================] - 1s 6ms/step - loss: 0.5967 - tp: 235.0000 - fp: 2210.0000 - tn: 179753.0000 - fn: 78.0000 - accuracy: 0.9874 - precision: 0.0961 - recall: 0.7508 - auc: 0.8965 - prc: 0.4573 - val_loss: 0.0191 - val_tp: 74.0000 - val_fp: 51.0000 - val_tn: 45433.0000 - val_fn: 11.0000 - val_accuracy: 0.9986 - val_precision: 0.5920 - val_recall: 0.8706 - val_auc: 0.9476 - val_prc: 0.7464
Epoch 8/100
90/90 [==============================] - 1s 6ms/step - loss: 0.5490 - tp: 241.0000 - fp: 2695.0000 - tn: 179268.0000 - fn: 72.0000 - accuracy: 0.9848 - precision: 0.0821 - recall: 0.7700 - auc: 0.9069 - prc: 0.4449 - val_loss: 0.0231 - val_tp: 75.0000 - val_fp: 99.0000 - val_tn: 45385.0000 - val_fn: 10.0000 - val_accuracy: 0.9976 - val_precision: 0.4310 - val_recall: 0.8824 - val_auc: 0.9564 - val_prc: 0.7422
Epoch 9/100
90/90 [==============================] - 1s 6ms/step - loss: 0.5640 - tp: 238.0000 - fp: 3428.0000 - tn: 178535.0000 - fn: 75.0000 - accuracy: 0.9808 - precision: 0.0649 - recall: 0.7604 - auc: 0.9041 - prc: 0.3522 - val_loss: 0.0274 - val_tp: 75.0000 - val_fp: 148.0000 - val_tn: 45336.0000 - val_fn: 10.0000 - val_accuracy: 0.9965 - val_precision: 0.3363 - val_recall: 0.8824 - val_auc: 0.9559 - val_prc: 0.7441
Epoch 10/100
90/90 [==============================] - 1s 6ms/step - loss: 0.4189 - tp: 257.0000 - fp: 3817.0000 - tn: 178146.0000 - fn: 56.0000 - accuracy: 0.9788 - precision: 0.0631 - recall: 0.8211 - auc: 0.9283 - prc: 0.3663 - val_loss: 0.0309 - val_tp: 75.0000 - val_fp: 189.0000 - val_tn: 45295.0000 - val_fn: 10.0000 - val_accuracy: 0.9956 - val_precision: 0.2841 - val_recall: 0.8824 - val_auc: 0.9563 - val_prc: 0.7382
Epoch 11/100
90/90 [==============================] - 1s 6ms/step - loss: 0.4576 - tp: 249.0000 - fp: 4326.0000 - tn: 177637.0000 - fn: 64.0000 - accuracy: 0.9759 - precision: 0.0544 - recall: 0.7955 - auc: 0.9192 - prc: 0.3173 - val_loss: 0.0352 - val_tp: 75.0000 - val_fp: 239.0000 - val_tn: 45245.0000 - val_fn: 10.0000 - val_accuracy: 0.9945 - val_precision: 0.2389 - val_recall: 0.8824 - val_auc: 0.9622 - val_prc: 0.7301
Epoch 12/100
90/90 [==============================] - 1s 6ms/step - loss: 0.4404 - tp: 254.0000 - fp: 4855.0000 - tn: 177108.0000 - fn: 59.0000 - accuracy: 0.9730 - precision: 0.0497 - recall: 0.8115 - auc: 0.9158 - prc: 0.3063 - val_loss: 0.0391 - val_tp: 75.0000 - val_fp: 279.0000 - val_tn: 45205.0000 - val_fn: 10.0000 - val_accuracy: 0.9937 - val_precision: 0.2119 - val_recall: 0.8824 - val_auc: 0.9658 - val_prc: 0.7065
Epoch 13/100
90/90 [==============================] - 1s 6ms/step - loss: 0.4016 - tp: 263.0000 - fp: 5147.0000 - tn: 176816.0000 - fn: 50.0000 - accuracy: 0.9715 - precision: 0.0486 - recall: 0.8403 - auc: 0.9272 - prc: 0.2966 - val_loss: 0.0420 - val_tp: 75.0000 - val_fp: 325.0000 - val_tn: 45159.0000 - val_fn: 10.0000 - val_accuracy: 0.9926 - val_precision: 0.1875 - val_recall: 0.8824 - val_auc: 0.9660 - val_prc: 0.7221
Epoch 14/100
90/90 [==============================] - 1s 6ms/step - loss: 0.3577 - tp: 270.0000 - fp: 5094.0000 - tn: 176869.0000 - fn: 43.0000 - accuracy: 0.9718 - precision: 0.0503 - recall: 0.8626 - auc: 0.9347 - prc: 0.3200 - val_loss: 0.0423 - val_tp: 75.0000 - val_fp: 328.0000 - val_tn: 45156.0000 - val_fn: 10.0000 - val_accuracy: 0.9926 - val_precision: 0.1861 - val_recall: 0.8824 - val_auc: 0.9685 - val_prc: 0.7219
Epoch 15/100
90/90 [==============================] - 1s 6ms/step - loss: 0.3748 - tp: 262.0000 - fp: 5345.0000 - tn: 176618.0000 - fn: 51.0000 - accuracy: 0.9704 - precision: 0.0467 - recall: 0.8371 - auc: 0.9324 - prc: 0.2965 - val_loss: 0.0450 - val_tp: 75.0000 - val_fp: 368.0000 - val_tn: 45116.0000 - val_fn: 10.0000 - val_accuracy: 0.9917 - val_precision: 0.1693 - val_recall: 0.8824 - val_auc: 0.9694 - val_prc: 0.7221
Epoch 16/100
90/90 [==============================] - 1s 6ms/step - loss: 0.3665 - tp: 266.0000 - fp: 5481.0000 - tn: 176482.0000 - fn: 47.0000 - accuracy: 0.9697 - precision: 0.0463 - recall: 0.8498 - auc: 0.9321 - prc: 0.2964 - val_loss: 0.0477 - val_tp: 75.0000 - val_fp: 406.0000 - val_tn: 45078.0000 - val_fn: 10.0000 - val_accuracy: 0.9909 - val_precision: 0.1559 - val_recall: 0.8824 - val_auc: 0.9698 - val_prc: 0.6993
Epoch 17/100
90/90 [==============================] - 1s 6ms/step - loss: 0.3555 - tp: 265.0000 - fp: 5621.0000 - tn: 176342.0000 - fn: 48.0000 - accuracy: 0.9689 - precision: 0.0450 - recall: 0.8466 - auc: 0.9359 - prc: 0.2761 - val_loss: 0.0518 - val_tp: 75.0000 - val_fp: 474.0000 - val_tn: 45010.0000 - val_fn: 10.0000 - val_accuracy: 0.9894 - val_precision: 0.1366 - val_recall: 0.8824 - val_auc: 0.9723 - val_prc: 0.6728
Epoch 18/100
90/90 [==============================] - 1s 6ms/step - loss: 0.3410 - tp: 269.0000 - fp: 5786.0000 - tn: 176177.0000 - fn: 44.0000 - accuracy: 0.9680 - precision: 0.0444 - recall: 0.8594 - auc: 0.9369 - prc: 0.2622 - val_loss: 0.0538 - val_tp: 75.0000 - val_fp: 506.0000 - val_tn: 44978.0000 - val_fn: 10.0000 - val_accuracy: 0.9887 - val_precision: 0.1291 - val_recall: 0.8824 - val_auc: 0.9757 - val_prc: 0.6730
Epoch 19/100
90/90 [==============================] - 1s 6ms/step - loss: 0.3251 - tp: 265.0000 - fp: 5951.0000 - tn: 176012.0000 - fn: 48.0000 - accuracy: 0.9671 - precision: 0.0426 - recall: 0.8466 - auc: 0.9434 - prc: 0.2801 - val_loss: 0.0573 - val_tp: 75.0000 - val_fp: 550.0000 - val_tn: 44934.0000 - val_fn: 10.0000 - val_accuracy: 0.9877 - val_precision: 0.1200 - val_recall: 0.8824 - val_auc: 0.9760 - val_prc: 0.6599
Epoch 20/100
90/90 [==============================] - 1s 6ms/step - loss: 0.3349 - tp: 270.0000 - fp: 6076.0000 - tn: 175887.0000 - fn: 43.0000 - accuracy: 0.9664 - precision: 0.0425 - recall: 0.8626 - auc: 0.9346 - prc: 0.2738 - val_loss: 0.0591 - val_tp: 75.0000 - val_fp: 563.0000 - val_tn: 44921.0000 - val_fn: 10.0000 - val_accuracy: 0.9874 - val_precision: 0.1176 - val_recall: 0.8824 - val_auc: 0.9756 - val_prc: 0.6534
Epoch 21/100
90/90 [==============================] - 1s 6ms/step - loss: 0.3178 - tp: 269.0000 - fp: 6053.0000 - tn: 175910.0000 - fn: 44.0000 - accuracy: 0.9666 - precision: 0.0425 - recall: 0.8594 - auc: 0.9428 - prc: 0.2553 - val_loss: 0.0607 - val_tp: 76.0000 - val_fp: 595.0000 - val_tn: 44889.0000 - val_fn: 9.0000 - val_accuracy: 0.9867 - val_precision: 0.1133 - val_recall: 0.8941 - val_auc: 0.9773 - val_prc: 0.6539
Epoch 22/100
90/90 [==============================] - 1s 6ms/step - loss: 0.3422 - tp: 262.0000 - fp: 6095.0000 - tn: 175868.0000 - fn: 51.0000 - accuracy: 0.9663 - precision: 0.0412 - recall: 0.8371 - auc: 0.9365 - prc: 0.2578 - val_loss: 0.0621 - val_tp: 76.0000 - val_fp: 610.0000 - val_tn: 44874.0000 - val_fn: 9.0000 - val_accuracy: 0.9864 - val_precision: 0.1108 - val_recall: 0.8941 - val_auc: 0.9777 - val_prc: 0.6543
Epoch 23/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2819 - tp: 270.0000 - fp: 6322.0000 - tn: 175641.0000 - fn: 43.0000 - accuracy: 0.9651 - precision: 0.0410 - recall: 0.8626 - auc: 0.9529 - prc: 0.2552 - val_loss: 0.0645 - val_tp: 76.0000 - val_fp: 636.0000 - val_tn: 44848.0000 - val_fn: 9.0000 - val_accuracy: 0.9858 - val_precision: 0.1067 - val_recall: 0.8941 - val_auc: 0.9795 - val_prc: 0.6420
Epoch 24/100
90/90 [==============================] - 1s 6ms/step - loss: 0.3015 - tp: 270.0000 - fp: 6538.0000 - tn: 175425.0000 - fn: 43.0000 - accuracy: 0.9639 - precision: 0.0397 - recall: 0.8626 - auc: 0.9492 - prc: 0.2461 - val_loss: 0.0659 - val_tp: 78.0000 - val_fp: 635.0000 - val_tn: 44849.0000 - val_fn: 7.0000 - val_accuracy: 0.9859 - val_precision: 0.1094 - val_recall: 0.9176 - val_auc: 0.9801 - val_prc: 0.6484
Epoch 25/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2709 - tp: 272.0000 - fp: 6479.0000 - tn: 175484.0000 - fn: 41.0000 - accuracy: 0.9642 - precision: 0.0403 - recall: 0.8690 - auc: 0.9553 - prc: 0.2584 - val_loss: 0.0686 - val_tp: 78.0000 - val_fp: 671.0000 - val_tn: 44813.0000 - val_fn: 7.0000 - val_accuracy: 0.9851 - val_precision: 0.1041 - val_recall: 0.9176 - val_auc: 0.9792 - val_prc: 0.6126
Epoch 26/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2666 - tp: 269.0000 - fp: 6707.0000 - tn: 175256.0000 - fn: 44.0000 - accuracy: 0.9630 - precision: 0.0386 - recall: 0.8594 - auc: 0.9624 - prc: 0.2594 - val_loss: 0.0743 - val_tp: 78.0000 - val_fp: 736.0000 - val_tn: 44748.0000 - val_fn: 7.0000 - val_accuracy: 0.9837 - val_precision: 0.0958 - val_recall: 0.9176 - val_auc: 0.9804 - val_prc: 0.5962
Epoch 27/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2695 - tp: 276.0000 - fp: 6754.0000 - tn: 175209.0000 - fn: 37.0000 - accuracy: 0.9627 - precision: 0.0393 - recall: 0.8818 - auc: 0.9566 - prc: 0.2498 - val_loss: 0.0748 - val_tp: 78.0000 - val_fp: 739.0000 - val_tn: 44745.0000 - val_fn: 7.0000 - val_accuracy: 0.9836 - val_precision: 0.0955 - val_recall: 0.9176 - val_auc: 0.9803 - val_prc: 0.5961
Epoch 28/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2404 - tp: 278.0000 - fp: 6839.0000 - tn: 175124.0000 - fn: 35.0000 - accuracy: 0.9623 - precision: 0.0391 - recall: 0.8882 - auc: 0.9640 - prc: 0.2434 - val_loss: 0.0746 - val_tp: 78.0000 - val_fp: 739.0000 - val_tn: 44745.0000 - val_fn: 7.0000 - val_accuracy: 0.9836 - val_precision: 0.0955 - val_recall: 0.9176 - val_auc: 0.9794 - val_prc: 0.6014
Epoch 29/100
90/90 [==============================] - 1s 7ms/step - loss: 0.2311 - tp: 278.0000 - fp: 6711.0000 - tn: 175252.0000 - fn: 35.0000 - accuracy: 0.9630 - precision: 0.0398 - recall: 0.8882 - auc: 0.9663 - prc: 0.2511 - val_loss: 0.0750 - val_tp: 78.0000 - val_fp: 743.0000 - val_tn: 44741.0000 - val_fn: 7.0000 - val_accuracy: 0.9835 - val_precision: 0.0950 - val_recall: 0.9176 - val_auc: 0.9793 - val_prc: 0.5962
Epoch 30/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2593 - tp: 273.0000 - fp: 6607.0000 - tn: 175356.0000 - fn: 40.0000 - accuracy: 0.9635 - precision: 0.0397 - recall: 0.8722 - auc: 0.9607 - prc: 0.2507 - val_loss: 0.0788 - val_tp: 78.0000 - val_fp: 782.0000 - val_tn: 44702.0000 - val_fn: 7.0000 - val_accuracy: 0.9827 - val_precision: 0.0907 - val_recall: 0.9176 - val_auc: 0.9787 - val_prc: 0.5912
Epoch 31/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2304 - tp: 278.0000 - fp: 6633.0000 - tn: 175330.0000 - fn: 35.0000 - accuracy: 0.9634 - precision: 0.0402 - recall: 0.8882 - auc: 0.9664 - prc: 0.2656 - val_loss: 0.0788 - val_tp: 78.0000 - val_fp: 783.0000 - val_tn: 44701.0000 - val_fn: 7.0000 - val_accuracy: 0.9827 - val_precision: 0.0906 - val_recall: 0.9176 - val_auc: 0.9774 - val_prc: 0.5913
Epoch 32/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2012 - tp: 278.0000 - fp: 6660.0000 - tn: 175303.0000 - fn: 35.0000 - accuracy: 0.9633 - precision: 0.0401 - recall: 0.8882 - auc: 0.9764 - prc: 0.2447 - val_loss: 0.0806 - val_tp: 78.0000 - val_fp: 815.0000 - val_tn: 44669.0000 - val_fn: 7.0000 - val_accuracy: 0.9820 - val_precision: 0.0873 - val_recall: 0.9176 - val_auc: 0.9792 - val_prc: 0.5716
Epoch 33/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2660 - tp: 273.0000 - fp: 6680.0000 - tn: 175283.0000 - fn: 40.0000 - accuracy: 0.9631 - precision: 0.0393 - recall: 0.8722 - auc: 0.9563 - prc: 0.2345 - val_loss: 0.0805 - val_tp: 78.0000 - val_fp: 812.0000 - val_tn: 44672.0000 - val_fn: 7.0000 - val_accuracy: 0.9820 - val_precision: 0.0876 - val_recall: 0.9176 - val_auc: 0.9792 - val_prc: 0.5768
Epoch 34/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2343 - tp: 272.0000 - fp: 6456.0000 - tn: 175507.0000 - fn: 41.0000 - accuracy: 0.9644 - precision: 0.0404 - recall: 0.8690 - auc: 0.9700 - prc: 0.2448 - val_loss: 0.0835 - val_tp: 78.0000 - val_fp: 846.0000 - val_tn: 44638.0000 - val_fn: 7.0000 - val_accuracy: 0.9813 - val_precision: 0.0844 - val_recall: 0.9176 - val_auc: 0.9774 - val_prc: 0.5676
Epoch 35/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2054 - tp: 282.0000 - fp: 6597.0000 - tn: 175366.0000 - fn: 31.0000 - accuracy: 0.9636 - precision: 0.0410 - recall: 0.9010 - auc: 0.9722 - prc: 0.2551 - val_loss: 0.0806 - val_tp: 78.0000 - val_fp: 809.0000 - val_tn: 44675.0000 - val_fn: 7.0000 - val_accuracy: 0.9821 - val_precision: 0.0879 - val_recall: 0.9176 - val_auc: 0.9785 - val_prc: 0.5729
Epoch 36/100
90/90 [==============================] - 1s 6ms/step - loss: 0.2335 - tp: 280.0000 - fp: 6094.0000 - tn: 175869.0000 - fn: 33.0000 - accuracy: 0.9664 - precision: 0.0439 - recall: 0.8946 - auc: 0.9647 - prc: 0.2693 - val_loss: 0.0788 - val_tp: 78.0000 - val_fp: 784.0000 - val_tn: 44700.0000 - val_fn: 7.0000 - val_accuracy: 0.9826 - val_precision: 0.0905 - val_recall: 0.9176 - val_auc: 0.9785 - val_prc: 0.5779
Restoring model weights from the end of the best epoch.
Epoch 00036: early stopping

Check training history

plot_metrics(weighted_history)

png

Evaluate metrics

train_predictions_weighted = weighted_model.predict(train_features, batch_size=BATCH_SIZE)
test_predictions_weighted = weighted_model.predict(test_features, batch_size=BATCH_SIZE)
weighted_results = weighted_model.evaluate(test_features, test_labels,
                                           batch_size=BATCH_SIZE, verbose=0)
for name, value in zip(weighted_model.metrics_names, weighted_results):
  print(name, ': ', value)
print()

plot_cm(test_labels, test_predictions_weighted)
loss :  0.07310998439788818
tp :  82.0
fp :  970.0
tn :  55898.0
fn :  12.0
accuracy :  0.9827604293823242
precision :  0.07794676721096039
recall :  0.8723404407501221
auc :  0.9701933264732361
prc :  0.5456250905990601

Legitimate Transactions Detected (True Negatives):  55898
Legitimate Transactions Incorrectly Detected (False Positives):  970
Fraudulent Transactions Missed (False Negatives):  12
Fraudulent Transactions Detected (True Positives):  82
Total Fraudulent Transactions:  94

png

Here you can see that with class weights the accuracy and precision are lower because there are more false positives, but conversely the recall and AUC are higher because the model also found more true positives. Despite having lower accuracy, this model has higher recall (and identifies more fraudulent transactions). Of course, both types of error have a cost (you also wouldn't want to burden users by flagging too many legitimate transactions as fraudulent), so carefully consider the trade-offs between these different types of errors for your application.
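As a sanity check, the precision, recall, and accuracy printed above can be recomputed directly from the confusion-matrix counts (82 true positives, 970 false positives, 12 false negatives, 55898 true negatives). A minimal sketch:

```python
# Confusion-matrix counts reported by the weighted model above.
tp, fp, fn, tn = 82, 970, 12, 55898

precision = tp / (tp + fp)                  # fraction of flagged transactions that are fraud
recall = tp / (tp + fn)                     # fraction of fraud that was caught
accuracy = (tp + tn) / (tp + fp + fn + tn)  # fraction of all transactions labeled correctly

print(f"precision: {precision:.4f}")  # → 0.0779
print(f"recall:    {recall:.4f}")     # → 0.8723
print(f"accuracy:  {accuracy:.4f}")   # → 0.9828
```

These match the `evaluate` output above, which confirms how each metric is derived from the same four counts.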

Plot the ROC

plot_roc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_roc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')

plot_roc("Train Weighted", train_labels, train_predictions_weighted, color=colors[1])
plot_roc("Test Weighted", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')


plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7ff9ccdf8f10>

png

Plot the AUPRC

plot_prc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_prc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')

plot_prc("Train Weighted", train_labels, train_predictions_weighted, color=colors[1])
plot_prc("Test Weighted", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')


plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7ff9cc1f0110>

png
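The plot_prc helper is defined earlier in the tutorial; the underlying curve comes from sklearn.metrics.precision_recall_curve. A minimal sketch of the idea, using tiny made-up labels and scores (these arrays are illustrative stand-ins, not the tutorial's data):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

# Made-up labels and predicted scores, just to illustrate the API.
labels = np.array([0, 0, 1, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.2, 0.8, 0.9])

precision, recall, thresholds = precision_recall_curve(labels, scores)

# The curve always ends at (recall=0, precision=1).
print(recall[-1], precision[-1])  # → 0.0 1.0

# Area under the precision-recall curve (the "prc" metric tracked above).
print(auc(recall, precision))
```

Plotting precision against recall over all thresholds gives the curves shown above.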

Oversampling

Oversample the minority class

A related approach is to resample the dataset by oversampling the minority class.

pos_features = train_features[bool_train_labels]
neg_features = train_features[~bool_train_labels]

pos_labels = train_labels[bool_train_labels]
neg_labels = train_labels[~bool_train_labels]

Using NumPy

You can balance the dataset manually by choosing the right number of random indices from the positive examples:

ids = np.arange(len(pos_features))
choices = np.random.choice(ids, len(neg_features))

res_pos_features = pos_features[choices]
res_pos_labels = pos_labels[choices]

res_pos_features.shape
(181963, 29)
resampled_features = np.concatenate([res_pos_features, neg_features], axis=0)
resampled_labels = np.concatenate([res_pos_labels, neg_labels], axis=0)

order = np.arange(len(resampled_labels))
np.random.shuffle(order)
resampled_features = resampled_features[order]
resampled_labels = resampled_labels[order]

resampled_features.shape
(363926, 29)
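If scikit-learn is already in your environment, sklearn.utils.resample does the same with-replacement index sampling in one call. A sketch with toy arrays (pos_features and neg_features below are stand-ins, not the tutorial's variables):

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
pos_features = rng.normal(size=(5, 3))    # toy minority class
neg_features = rng.normal(size=(100, 3))  # toy majority class

# Sample the minority class with replacement up to the majority-class size.
res_pos_features = resample(pos_features, replace=True,
                            n_samples=len(neg_features), random_state=0)
print(res_pos_features.shape)  # → (100, 3)
```

Concatenating and shuffling the result then proceeds exactly as in the cells above.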

Using tf.data

If you're using tf.data, the easiest way to produce balanced examples is to start with a positive and a negative dataset and merge them. See the tf.data guide for more examples.

BUFFER_SIZE = 100000

def make_ds(features, labels):
  ds = tf.data.Dataset.from_tensor_slices((features, labels))#.cache()
  ds = ds.shuffle(BUFFER_SIZE).repeat()
  return ds

pos_ds = make_ds(pos_features, pos_labels)
neg_ds = make_ds(neg_features, neg_labels)

Each dataset provides (feature, label) pairs:

for features, label in pos_ds.take(1):
  print("Features:\n", features.numpy())
  print()
  print("Label: ", label.numpy())
Features:
 [-3.74818081  3.00481166 -5.          5.         -5.         -1.78734344
 -5.          3.88414884 -5.         -5.          5.         -5.
  0.55035681 -5.         -0.49442542 -5.         -5.         -5.
  4.95463284  0.93387724  2.95291341  0.04553031 -0.02054476  1.03199931
 -0.10099318  0.34178886  3.51642319  0.9441714  -0.25312576]

Label:  1

Merge the two together using tf.data.experimental.sample_from_datasets:

resampled_ds = tf.data.experimental.sample_from_datasets([pos_ds, neg_ds], weights=[0.5, 0.5])
resampled_ds = resampled_ds.batch(BATCH_SIZE).prefetch(2)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/data/experimental/ops/interleave_ops.py:260: RandomDataset.__init__ (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.random(...)`.
for features, label in resampled_ds.take(1):
  print(label.numpy().mean())
0.50244140625

To use this dataset, you'll need the number of steps per epoch.

The definition of "epoch" in this case is less clear. Say it's the number of batches required to see each negative example once:

resampled_steps_per_epoch = np.ceil(2.0*neg/BATCH_SIZE)
resampled_steps_per_epoch
278.0
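The arithmetic behind this value can be checked by hand. Both numbers below come from earlier in the notebook (neg from the np.bincount step, and the batch size — assumed here to be 2048, the value this tutorial uses):

```python
import numpy as np

neg = 284315       # negative examples counted with np.bincount above
BATCH_SIZE = 2048  # assumed batch size from earlier in the notebook

# The resampled stream is ~50% negative, so seeing every negative example
# once takes roughly 2*neg examples, i.e. ceil(2*neg / BATCH_SIZE) batches.
resampled_steps_per_epoch = np.ceil(2.0 * neg / BATCH_SIZE)
print(resampled_steps_per_epoch)  # → 278.0
```

This matches the 278 steps per epoch reported in the training log below.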

Train on the oversampled data

Now try training the model with the resampled dataset instead of using class weights to see how these methods compare.

Note: Because the data was balanced by replicating the positive examples, the total dataset size is larger, and each epoch runs for more training steps.

resampled_model = make_model()
resampled_model.load_weights(initial_weights)

# Reset the bias to zero, since this dataset is balanced.
output_layer = resampled_model.layers[-1] 
output_layer.bias.assign([0])

val_ds = tf.data.Dataset.from_tensor_slices((val_features, val_labels)).cache()
val_ds = val_ds.batch(BATCH_SIZE).prefetch(2) 

resampled_history = resampled_model.fit(
    resampled_ds,
    epochs=EPOCHS,
    steps_per_epoch=resampled_steps_per_epoch,
    callbacks=[early_stopping],
    validation_data=val_ds)
Epoch 1/100
278/278 [==============================] - 9s 28ms/step - loss: 0.5130 - tp: 215130.0000 - fp: 50880.0000 - tn: 290410.0000 - fn: 69886.0000 - accuracy: 0.8072 - precision: 0.8087 - recall: 0.7548 - auc: 0.8612 - prc: 0.8897 - val_loss: 0.2428 - val_tp: 75.0000 - val_fp: 683.0000 - val_tn: 44801.0000 - val_fn: 10.0000 - val_accuracy: 0.9848 - val_precision: 0.0989 - val_recall: 0.8824 - val_auc: 0.9576 - val_prc: 0.7392
Epoch 2/100
278/278 [==============================] - 7s 26ms/step - loss: 0.2451 - tp: 237128.0000 - fp: 13237.0000 - tn: 271244.0000 - fn: 47735.0000 - accuracy: 0.8929 - precision: 0.9471 - recall: 0.8324 - auc: 0.9581 - prc: 0.9672 - val_loss: 0.1324 - val_tp: 76.0000 - val_fp: 618.0000 - val_tn: 44866.0000 - val_fn: 9.0000 - val_accuracy: 0.9862 - val_precision: 0.1095 - val_recall: 0.8941 - val_auc: 0.9689 - val_prc: 0.7532
Epoch 3/100
278/278 [==============================] - 7s 26ms/step - loss: 0.1919 - tp: 240924.0000 - fp: 8887.0000 - tn: 275710.0000 - fn: 43823.0000 - accuracy: 0.9074 - precision: 0.9644 - recall: 0.8461 - auc: 0.9778 - prc: 0.9808 - val_loss: 0.0961 - val_tp: 77.0000 - val_fp: 679.0000 - val_tn: 44805.0000 - val_fn: 8.0000 - val_accuracy: 0.9849 - val_precision: 0.1019 - val_recall: 0.9059 - val_auc: 0.9722 - val_prc: 0.7557
Epoch 4/100
278/278 [==============================] - 7s 26ms/step - loss: 0.1658 - tp: 243922.0000 - fp: 7987.0000 - tn: 276860.0000 - fn: 40575.0000 - accuracy: 0.9147 - precision: 0.9683 - recall: 0.8574 - auc: 0.9848 - prc: 0.9862 - val_loss: 0.0791 - val_tp: 77.0000 - val_fp: 717.0000 - val_tn: 44767.0000 - val_fn: 8.0000 - val_accuracy: 0.9841 - val_precision: 0.0970 - val_recall: 0.9059 - val_auc: 0.9742 - val_prc: 0.7575
Epoch 5/100
278/278 [==============================] - 7s 25ms/step - loss: 0.1464 - tp: 248311.0000 - fp: 7782.0000 - tn: 277184.0000 - fn: 36067.0000 - accuracy: 0.9230 - precision: 0.9696 - recall: 0.8732 - auc: 0.9885 - prc: 0.9892 - val_loss: 0.0692 - val_tp: 77.0000 - val_fp: 737.0000 - val_tn: 44747.0000 - val_fn: 8.0000 - val_accuracy: 0.9837 - val_precision: 0.0946 - val_recall: 0.9059 - val_auc: 0.9732 - val_prc: 0.7505
Epoch 6/100
278/278 [==============================] - 7s 25ms/step - loss: 0.1335 - tp: 251223.0000 - fp: 7755.0000 - tn: 277108.0000 - fn: 33258.0000 - accuracy: 0.9280 - precision: 0.9701 - recall: 0.8831 - auc: 0.9904 - prc: 0.9910 - val_loss: 0.0633 - val_tp: 78.0000 - val_fp: 757.0000 - val_tn: 44727.0000 - val_fn: 7.0000 - val_accuracy: 0.9832 - val_precision: 0.0934 - val_recall: 0.9176 - val_auc: 0.9751 - val_prc: 0.7508
Epoch 7/100
278/278 [==============================] - 7s 25ms/step - loss: 0.1244 - tp: 264697.0000 - fp: 9195.0000 - tn: 275154.0000 - fn: 20298.0000 - accuracy: 0.9482 - precision: 0.9664 - recall: 0.9288 - auc: 0.9916 - prc: 0.9920 - val_loss: 0.0588 - val_tp: 78.0000 - val_fp: 764.0000 - val_tn: 44720.0000 - val_fn: 7.0000 - val_accuracy: 0.9831 - val_precision: 0.0926 - val_recall: 0.9176 - val_auc: 0.9762 - val_prc: 0.7510
Epoch 8/100
278/278 [==============================] - 7s 25ms/step - loss: 0.1156 - tp: 274614.0000 - fp: 10534.0000 - tn: 274470.0000 - fn: 9726.0000 - accuracy: 0.9644 - precision: 0.9631 - recall: 0.9658 - auc: 0.9928 - prc: 0.9930 - val_loss: 0.0543 - val_tp: 78.0000 - val_fp: 769.0000 - val_tn: 44715.0000 - val_fn: 7.0000 - val_accuracy: 0.9830 - val_precision: 0.0921 - val_recall: 0.9176 - val_auc: 0.9772 - val_prc: 0.7440
Epoch 9/100
278/278 [==============================] - 7s 25ms/step - loss: 0.1078 - tp: 276395.0000 - fp: 11090.0000 - tn: 274037.0000 - fn: 7822.0000 - accuracy: 0.9668 - precision: 0.9614 - recall: 0.9725 - auc: 0.9937 - prc: 0.9939 - val_loss: 0.0509 - val_tp: 78.0000 - val_fp: 763.0000 - val_tn: 44721.0000 - val_fn: 7.0000 - val_accuracy: 0.9831 - val_precision: 0.0927 - val_recall: 0.9176 - val_auc: 0.9779 - val_prc: 0.7454
Epoch 10/100
278/278 [==============================] - 7s 25ms/step - loss: 0.1015 - tp: 278413.0000 - fp: 11347.0000 - tn: 272984.0000 - fn: 6600.0000 - accuracy: 0.9685 - precision: 0.9608 - recall: 0.9768 - auc: 0.9944 - prc: 0.9945 - val_loss: 0.0475 - val_tp: 78.0000 - val_fp: 757.0000 - val_tn: 44727.0000 - val_fn: 7.0000 - val_accuracy: 0.9832 - val_precision: 0.0934 - val_recall: 0.9176 - val_auc: 0.9788 - val_prc: 0.7467
Epoch 11/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0953 - tp: 279347.0000 - fp: 11548.0000 - tn: 272895.0000 - fn: 5554.0000 - accuracy: 0.9700 - precision: 0.9603 - recall: 0.9805 - auc: 0.9951 - prc: 0.9950 - val_loss: 0.0437 - val_tp: 78.0000 - val_fp: 720.0000 - val_tn: 44764.0000 - val_fn: 7.0000 - val_accuracy: 0.9840 - val_precision: 0.0977 - val_recall: 0.9176 - val_auc: 0.9794 - val_prc: 0.7475
Epoch 12/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0890 - tp: 280148.0000 - fp: 11442.0000 - tn: 273063.0000 - fn: 4691.0000 - accuracy: 0.9717 - precision: 0.9608 - recall: 0.9835 - auc: 0.9957 - prc: 0.9956 - val_loss: 0.0406 - val_tp: 78.0000 - val_fp: 685.0000 - val_tn: 44799.0000 - val_fn: 7.0000 - val_accuracy: 0.9848 - val_precision: 0.1022 - val_recall: 0.9176 - val_auc: 0.9796 - val_prc: 0.7494
Epoch 13/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0836 - tp: 280576.0000 - fp: 11537.0000 - tn: 273402.0000 - fn: 3829.0000 - accuracy: 0.9730 - precision: 0.9605 - recall: 0.9865 - auc: 0.9962 - prc: 0.9959 - val_loss: 0.0377 - val_tp: 78.0000 - val_fp: 671.0000 - val_tn: 44813.0000 - val_fn: 7.0000 - val_accuracy: 0.9851 - val_precision: 0.1041 - val_recall: 0.9176 - val_auc: 0.9801 - val_prc: 0.7611
Epoch 14/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0779 - tp: 281854.0000 - fp: 11365.0000 - tn: 272825.0000 - fn: 3300.0000 - accuracy: 0.9742 - precision: 0.9612 - recall: 0.9884 - auc: 0.9966 - prc: 0.9964 - val_loss: 0.0351 - val_tp: 79.0000 - val_fp: 653.0000 - val_tn: 44831.0000 - val_fn: 6.0000 - val_accuracy: 0.9855 - val_precision: 0.1079 - val_recall: 0.9294 - val_auc: 0.9760 - val_prc: 0.7695
Epoch 15/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0731 - tp: 281649.0000 - fp: 11176.0000 - tn: 273622.0000 - fn: 2897.0000 - accuracy: 0.9753 - precision: 0.9618 - recall: 0.9898 - auc: 0.9969 - prc: 0.9966 - val_loss: 0.0321 - val_tp: 78.0000 - val_fp: 601.0000 - val_tn: 44883.0000 - val_fn: 7.0000 - val_accuracy: 0.9867 - val_precision: 0.1149 - val_recall: 0.9176 - val_auc: 0.9760 - val_prc: 0.7612
Epoch 16/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0688 - tp: 282499.0000 - fp: 11014.0000 - tn: 273773.0000 - fn: 2058.0000 - accuracy: 0.9770 - precision: 0.9625 - recall: 0.9928 - auc: 0.9971 - prc: 0.9968 - val_loss: 0.0301 - val_tp: 78.0000 - val_fp: 568.0000 - val_tn: 44916.0000 - val_fn: 7.0000 - val_accuracy: 0.9874 - val_precision: 0.1207 - val_recall: 0.9176 - val_auc: 0.9717 - val_prc: 0.7708
Epoch 17/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0658 - tp: 283489.0000 - fp: 10985.0000 - tn: 273198.0000 - fn: 1672.0000 - accuracy: 0.9778 - precision: 0.9627 - recall: 0.9941 - auc: 0.9973 - prc: 0.9970 - val_loss: 0.0286 - val_tp: 78.0000 - val_fp: 541.0000 - val_tn: 44943.0000 - val_fn: 7.0000 - val_accuracy: 0.9880 - val_precision: 0.1260 - val_recall: 0.9176 - val_auc: 0.9669 - val_prc: 0.7795
Epoch 18/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0619 - tp: 283650.0000 - fp: 10499.0000 - tn: 273724.0000 - fn: 1471.0000 - accuracy: 0.9790 - precision: 0.9643 - recall: 0.9948 - auc: 0.9975 - prc: 0.9972 - val_loss: 0.0271 - val_tp: 78.0000 - val_fp: 512.0000 - val_tn: 44972.0000 - val_fn: 7.0000 - val_accuracy: 0.9886 - val_precision: 0.1322 - val_recall: 0.9176 - val_auc: 0.9674 - val_prc: 0.7799
Epoch 19/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0596 - tp: 283345.0000 - fp: 10294.0000 - tn: 274533.0000 - fn: 1172.0000 - accuracy: 0.9799 - precision: 0.9649 - recall: 0.9959 - auc: 0.9975 - prc: 0.9972 - val_loss: 0.0255 - val_tp: 78.0000 - val_fp: 480.0000 - val_tn: 45004.0000 - val_fn: 7.0000 - val_accuracy: 0.9893 - val_precision: 0.1398 - val_recall: 0.9176 - val_auc: 0.9677 - val_prc: 0.7800
Epoch 20/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0576 - tp: 284070.0000 - fp: 10193.0000 - tn: 274197.0000 - fn: 884.0000 - accuracy: 0.9805 - precision: 0.9654 - recall: 0.9969 - auc: 0.9976 - prc: 0.9973 - val_loss: 0.0242 - val_tp: 78.0000 - val_fp: 455.0000 - val_tn: 45029.0000 - val_fn: 7.0000 - val_accuracy: 0.9899 - val_precision: 0.1463 - val_recall: 0.9176 - val_auc: 0.9680 - val_prc: 0.7817
Epoch 21/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0553 - tp: 283954.0000 - fp: 9994.0000 - tn: 274718.0000 - fn: 678.0000 - accuracy: 0.9813 - precision: 0.9660 - recall: 0.9976 - auc: 0.9977 - prc: 0.9974 - val_loss: 0.0229 - val_tp: 78.0000 - val_fp: 425.0000 - val_tn: 45059.0000 - val_fn: 7.0000 - val_accuracy: 0.9905 - val_precision: 0.1551 - val_recall: 0.9176 - val_auc: 0.9682 - val_prc: 0.7815
Epoch 22/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0537 - tp: 284026.0000 - fp: 9815.0000 - tn: 274976.0000 - fn: 527.0000 - accuracy: 0.9818 - precision: 0.9666 - recall: 0.9981 - auc: 0.9978 - prc: 0.9975 - val_loss: 0.0221 - val_tp: 78.0000 - val_fp: 397.0000 - val_tn: 45087.0000 - val_fn: 7.0000 - val_accuracy: 0.9911 - val_precision: 0.1642 - val_recall: 0.9176 - val_auc: 0.9684 - val_prc: 0.7809
Epoch 23/100
278/278 [==============================] - 7s 25ms/step - loss: 0.0523 - tp: 284345.0000 - fp: 9613.0000 - tn: 274941.0000 - fn: 445.0000 - accuracy: 0.9823 - precision: 0.9673 - recall: 0.9984 - auc: 0.9977 - prc: 0.9975 - val_loss: 0.0215 - val_tp: 78.0000 - val_fp: 373.0000 - val_tn: 45111.0000 - val_fn: 7.0000 - val_accuracy: 0.9917 - val_precision: 0.1729 - val_recall: 0.9176 - val_auc: 0.9686 - val_prc: 0.7817
Restoring model weights from the end of the best epoch.
Epoch 00023: early stopping

If the training process were considering the whole dataset on each gradient update, this oversampling would be basically identical to class weighting.

But when training the model batch-wise, as you did here, the oversampled data provides a smoother gradient signal: instead of each positive example being shown in one batch with a large weight, they're shown in many different batches each time with a small weight.

This smoother gradient signal makes it easier to train the model.
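The full-batch equivalence claimed above can be checked numerically. A toy sketch (a linear model with squared loss, not the tutorial's network): weighting each positive example by an integer w gives exactly the same full-batch mean gradient as duplicating that example w times in the dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(8, 2))
y = (X[:, 0] > 0).astype(float)  # toy binary labels

def mean_grad(X, y, theta, sample_weight=None):
    # Weighted mean gradient of the squared loss 0.5 * (X @ theta - y)**2.
    if sample_weight is None:
        sample_weight = np.ones(len(y))
    resid = X @ theta - y
    return (X * (resid * sample_weight)[:, None]).sum(axis=0) / sample_weight.sum()

theta = np.zeros(2)
w = np.where(y == 1, 3.0, 1.0)  # class weight 3 for positives

# Oversampled copy: every positive row repeated 3 times.
idx = np.concatenate([np.where(y == 0)[0]] + [np.where(y == 1)[0]] * 3)

g_weighted = mean_grad(X, y, theta, sample_weight=w)
g_oversampled = mean_grad(X[idx], y[idx], theta)
print(np.allclose(g_weighted, g_oversampled))  # → True
```

With mini-batches the two stop being identical step-for-step, which is exactly the smoothing effect described above.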

Check training history

Note that the distributions of the metrics will be different here, because the training data has a totally different distribution from the validation and test data.

plot_metrics(resampled_history)

png

Re-train

Because training is easier on the balanced data, the above training procedure may overfit quickly.

So break up the epochs to give tf.keras.callbacks.EarlyStopping finer control over when to stop training.

resampled_model = make_model()
resampled_model.load_weights(initial_weights)

# Reset the bias to zero, since this dataset is balanced.
output_layer = resampled_model.layers[-1] 
output_layer.bias.assign([0])

resampled_history = resampled_model.fit(
    resampled_ds,
    # These are not real epochs
    steps_per_epoch=20,
    epochs=10*EPOCHS,
    callbacks=[early_stopping],
    validation_data=(val_ds))
Epoch 1/1000
20/20 [==============================] - 3s 65ms/step - loss: 1.3354 - tp: 9534.0000 - fp: 6307.0000 - tn: 59758.0000 - fn: 10930.0000 - accuracy: 0.8008 - precision: 0.6019 - recall: 0.4659 - auc: 0.8067 - prc: 0.6533 - val_loss: 0.5710 - val_tp: 67.0000 - val_fp: 11408.0000 - val_tn: 34076.0000 - val_fn: 18.0000 - val_accuracy: 0.7493 - val_precision: 0.0058 - val_recall: 0.7882 - val_auc: 0.8237 - val_prc: 0.4174
Epoch 2/1000
20/20 [==============================] - 1s 28ms/step - loss: 0.8414 - tp: 12394.0000 - fp: 6307.0000 - tn: 14243.0000 - fn: 8016.0000 - accuracy: 0.6503 - precision: 0.6627 - recall: 0.6073 - auc: 0.6650 - prc: 0.7762 - val_loss: 0.5763 - val_tp: 72.0000 - val_fp: 11294.0000 - val_tn: 34190.0000 - val_fn: 13.0000 - val_accuracy: 0.7519 - val_precision: 0.0063 - val_recall: 0.8471 - val_auc: 0.8960 - val_prc: 0.5866
Epoch 3/1000
20/20 [==============================] - 1s 28ms/step - loss: 0.6527 - tp: 14108.0000 - fp: 6111.0000 - tn: 14509.0000 - fn: 6232.0000 - accuracy: 0.6987 - precision: 0.6978 - recall: 0.6936 - auc: 0.7508 - prc: 0.8306 - val_loss: 0.5511 - val_tp: 75.0000 - val_fp: 9861.0000 - val_tn: 35623.0000 - val_fn: 10.0000 - val_accuracy: 0.7834 - val_precision: 0.0075 - val_recall: 0.8824 - val_auc: 0.9157 - val_prc: 0.6609
Epoch 4/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.5601 - tp: 15105.0000 - fp: 5225.0000 - tn: 15335.0000 - fn: 5295.0000 - accuracy: 0.7432 - precision: 0.7430 - recall: 0.7404 - auc: 0.8027 - prc: 0.8655 - val_loss: 0.5121 - val_tp: 75.0000 - val_fp: 7915.0000 - val_tn: 37569.0000 - val_fn: 10.0000 - val_accuracy: 0.8261 - val_precision: 0.0094 - val_recall: 0.8824 - val_auc: 0.9234 - val_prc: 0.6841
Epoch 5/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.4919 - tp: 15943.0000 - fp: 4424.0000 - tn: 15953.0000 - fn: 4640.0000 - accuracy: 0.7787 - precision: 0.7828 - recall: 0.7746 - auc: 0.8404 - prc: 0.8915 - val_loss: 0.4714 - val_tp: 75.0000 - val_fp: 6078.0000 - val_tn: 39406.0000 - val_fn: 10.0000 - val_accuracy: 0.8664 - val_precision: 0.0122 - val_recall: 0.8824 - val_auc: 0.9306 - val_prc: 0.6901
Epoch 6/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.4573 - tp: 16315.0000 - fp: 3972.0000 - tn: 16333.0000 - fn: 4340.0000 - accuracy: 0.7971 - precision: 0.8042 - recall: 0.7899 - auc: 0.8594 - prc: 0.9049 - val_loss: 0.4329 - val_tp: 75.0000 - val_fp: 4581.0000 - val_tn: 40903.0000 - val_fn: 10.0000 - val_accuracy: 0.8993 - val_precision: 0.0161 - val_recall: 0.8824 - val_auc: 0.9385 - val_prc: 0.7017
Epoch 7/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.4157 - tp: 16291.0000 - fp: 3425.0000 - tn: 17072.0000 - fn: 4172.0000 - accuracy: 0.8145 - precision: 0.8263 - recall: 0.7961 - auc: 0.8780 - prc: 0.9159 - val_loss: 0.3968 - val_tp: 74.0000 - val_fp: 3282.0000 - val_tn: 42202.0000 - val_fn: 11.0000 - val_accuracy: 0.9277 - val_precision: 0.0221 - val_recall: 0.8706 - val_auc: 0.9441 - val_prc: 0.7128
Epoch 8/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.3922 - tp: 16459.0000 - fp: 2928.0000 - tn: 17567.0000 - fn: 4006.0000 - accuracy: 0.8307 - precision: 0.8490 - recall: 0.8043 - auc: 0.8918 - prc: 0.9246 - val_loss: 0.3646 - val_tp: 75.0000 - val_fp: 2264.0000 - val_tn: 43220.0000 - val_fn: 10.0000 - val_accuracy: 0.9501 - val_precision: 0.0321 - val_recall: 0.8824 - val_auc: 0.9489 - val_prc: 0.7206
Epoch 9/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.3696 - tp: 16715.0000 - fp: 2522.0000 - tn: 17846.0000 - fn: 3877.0000 - accuracy: 0.8438 - precision: 0.8689 - recall: 0.8117 - auc: 0.9015 - prc: 0.9320 - val_loss: 0.3369 - val_tp: 75.0000 - val_fp: 1646.0000 - val_tn: 43838.0000 - val_fn: 10.0000 - val_accuracy: 0.9637 - val_precision: 0.0436 - val_recall: 0.8824 - val_auc: 0.9524 - val_prc: 0.7248
Epoch 10/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.3523 - tp: 16615.0000 - fp: 2160.0000 - tn: 18372.0000 - fn: 3813.0000 - accuracy: 0.8542 - precision: 0.8850 - recall: 0.8133 - auc: 0.9098 - prc: 0.9365 - val_loss: 0.3125 - val_tp: 75.0000 - val_fp: 1251.0000 - val_tn: 44233.0000 - val_fn: 10.0000 - val_accuracy: 0.9723 - val_precision: 0.0566 - val_recall: 0.8824 - val_auc: 0.9541 - val_prc: 0.7291
Epoch 11/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.3335 - tp: 16729.0000 - fp: 2002.0000 - tn: 18580.0000 - fn: 3649.0000 - accuracy: 0.8620 - precision: 0.8931 - recall: 0.8209 - auc: 0.9196 - prc: 0.9425 - val_loss: 0.2899 - val_tp: 75.0000 - val_fp: 972.0000 - val_tn: 44512.0000 - val_fn: 10.0000 - val_accuracy: 0.9785 - val_precision: 0.0716 - val_recall: 0.8824 - val_auc: 0.9552 - val_prc: 0.7341
Epoch 12/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.3183 - tp: 16936.0000 - fp: 1703.0000 - tn: 18645.0000 - fn: 3676.0000 - accuracy: 0.8687 - precision: 0.9086 - recall: 0.8217 - auc: 0.9264 - prc: 0.9476 - val_loss: 0.2704 - val_tp: 75.0000 - val_fp: 824.0000 - val_tn: 44660.0000 - val_fn: 10.0000 - val_accuracy: 0.9817 - val_precision: 0.0834 - val_recall: 0.8824 - val_auc: 0.9561 - val_prc: 0.7374
Epoch 13/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.3034 - tp: 16839.0000 - fp: 1468.0000 - tn: 18994.0000 - fn: 3659.0000 - accuracy: 0.8748 - precision: 0.9198 - recall: 0.8215 - auc: 0.9322 - prc: 0.9511 - val_loss: 0.2537 - val_tp: 75.0000 - val_fp: 726.0000 - val_tn: 44758.0000 - val_fn: 10.0000 - val_accuracy: 0.9838 - val_precision: 0.0936 - val_recall: 0.8824 - val_auc: 0.9570 - val_prc: 0.7386
Epoch 14/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2930 - tp: 16939.0000 - fp: 1382.0000 - tn: 19116.0000 - fn: 3523.0000 - accuracy: 0.8802 - precision: 0.9246 - recall: 0.8278 - auc: 0.9375 - prc: 0.9543 - val_loss: 0.2387 - val_tp: 75.0000 - val_fp: 678.0000 - val_tn: 44806.0000 - val_fn: 10.0000 - val_accuracy: 0.9849 - val_precision: 0.0996 - val_recall: 0.8824 - val_auc: 0.9581 - val_prc: 0.7397
Epoch 15/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2876 - tp: 16798.0000 - fp: 1313.0000 - tn: 19188.0000 - fn: 3661.0000 - accuracy: 0.8786 - precision: 0.9275 - recall: 0.8211 - auc: 0.9406 - prc: 0.9557 - val_loss: 0.2255 - val_tp: 75.0000 - val_fp: 635.0000 - val_tn: 44849.0000 - val_fn: 10.0000 - val_accuracy: 0.9858 - val_precision: 0.1056 - val_recall: 0.8824 - val_auc: 0.9590 - val_prc: 0.7403
Epoch 16/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2747 - tp: 16776.0000 - fp: 1173.0000 - tn: 19496.0000 - fn: 3515.0000 - accuracy: 0.8855 - precision: 0.9346 - recall: 0.8268 - auc: 0.9454 - prc: 0.9587 - val_loss: 0.2132 - val_tp: 75.0000 - val_fp: 615.0000 - val_tn: 44869.0000 - val_fn: 10.0000 - val_accuracy: 0.9863 - val_precision: 0.1087 - val_recall: 0.8824 - val_auc: 0.9604 - val_prc: 0.7408
Epoch 17/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2683 - tp: 16904.0000 - fp: 1088.0000 - tn: 19357.0000 - fn: 3611.0000 - accuracy: 0.8853 - precision: 0.9395 - recall: 0.8240 - auc: 0.9486 - prc: 0.9612 - val_loss: 0.2021 - val_tp: 75.0000 - val_fp: 604.0000 - val_tn: 44880.0000 - val_fn: 10.0000 - val_accuracy: 0.9865 - val_precision: 0.1105 - val_recall: 0.8824 - val_auc: 0.9618 - val_prc: 0.7429
Epoch 18/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2598 - tp: 17033.0000 - fp: 1061.0000 - tn: 19453.0000 - fn: 3413.0000 - accuracy: 0.8908 - precision: 0.9414 - recall: 0.8331 - auc: 0.9520 - prc: 0.9632 - val_loss: 0.1917 - val_tp: 75.0000 - val_fp: 596.0000 - val_tn: 44888.0000 - val_fn: 10.0000 - val_accuracy: 0.9867 - val_precision: 0.1118 - val_recall: 0.8824 - val_auc: 0.9628 - val_prc: 0.7442
Epoch 19/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2543 - tp: 16959.0000 - fp: 976.0000 - tn: 19610.0000 - fn: 3415.0000 - accuracy: 0.8928 - precision: 0.9456 - recall: 0.8324 - auc: 0.9548 - prc: 0.9649 - val_loss: 0.1825 - val_tp: 75.0000 - val_fp: 596.0000 - val_tn: 44888.0000 - val_fn: 10.0000 - val_accuracy: 0.9867 - val_precision: 0.1118 - val_recall: 0.8824 - val_auc: 0.9638 - val_prc: 0.7446
Epoch 20/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2481 - tp: 16913.0000 - fp: 957.0000 - tn: 19627.0000 - fn: 3463.0000 - accuracy: 0.8921 - precision: 0.9464 - recall: 0.8300 - auc: 0.9573 - prc: 0.9663 - val_loss: 0.1743 - val_tp: 75.0000 - val_fp: 590.0000 - val_tn: 44894.0000 - val_fn: 10.0000 - val_accuracy: 0.9868 - val_precision: 0.1128 - val_recall: 0.8824 - val_auc: 0.9648 - val_prc: 0.7477
Epoch 21/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2424 - tp: 17151.0000 - fp: 920.0000 - tn: 19469.0000 - fn: 3420.0000 - accuracy: 0.8940 - precision: 0.9491 - recall: 0.8337 - auc: 0.9594 - prc: 0.9684 - val_loss: 0.1673 - val_tp: 75.0000 - val_fp: 601.0000 - val_tn: 44883.0000 - val_fn: 10.0000 - val_accuracy: 0.9866 - val_precision: 0.1109 - val_recall: 0.8824 - val_auc: 0.9659 - val_prc: 0.7495
Epoch 22/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2370 - tp: 17030.0000 - fp: 887.0000 - tn: 19615.0000 - fn: 3428.0000 - accuracy: 0.8947 - precision: 0.9505 - recall: 0.8324 - auc: 0.9610 - prc: 0.9692 - val_loss: 0.1603 - val_tp: 75.0000 - val_fp: 600.0000 - val_tn: 44884.0000 - val_fn: 10.0000 - val_accuracy: 0.9866 - val_precision: 0.1111 - val_recall: 0.8824 - val_auc: 0.9665 - val_prc: 0.7497
Epoch 23/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2325 - tp: 17029.0000 - fp: 852.0000 - tn: 19651.0000 - fn: 3428.0000 - accuracy: 0.8955 - precision: 0.9524 - recall: 0.8324 - auc: 0.9632 - prc: 0.9704 - val_loss: 0.1541 - val_tp: 75.0000 - val_fp: 601.0000 - val_tn: 44883.0000 - val_fn: 10.0000 - val_accuracy: 0.9866 - val_precision: 0.1109 - val_recall: 0.8824 - val_auc: 0.9675 - val_prc: 0.7530
Epoch 24/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2287 - tp: 17184.0000 - fp: 873.0000 - tn: 19479.0000 - fn: 3424.0000 - accuracy: 0.8951 - precision: 0.9517 - recall: 0.8339 - auc: 0.9647 - prc: 0.9719 - val_loss: 0.1484 - val_tp: 75.0000 - val_fp: 598.0000 - val_tn: 44886.0000 - val_fn: 10.0000 - val_accuracy: 0.9867 - val_precision: 0.1114 - val_recall: 0.8824 - val_auc: 0.9681 - val_prc: 0.7533
Epoch 25/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2229 - tp: 17161.0000 - fp: 794.0000 - tn: 19699.0000 - fn: 3306.0000 - accuracy: 0.8999 - precision: 0.9558 - recall: 0.8385 - auc: 0.9667 - prc: 0.9729 - val_loss: 0.1431 - val_tp: 75.0000 - val_fp: 596.0000 - val_tn: 44888.0000 - val_fn: 10.0000 - val_accuracy: 0.9867 - val_precision: 0.1118 - val_recall: 0.8824 - val_auc: 0.9685 - val_prc: 0.7533
Epoch 26/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2196 - tp: 17037.0000 - fp: 741.0000 - tn: 19824.0000 - fn: 3358.0000 - accuracy: 0.8999 - precision: 0.9583 - recall: 0.8354 - auc: 0.9679 - prc: 0.9735 - val_loss: 0.1382 - val_tp: 75.0000 - val_fp: 601.0000 - val_tn: 44883.0000 - val_fn: 10.0000 - val_accuracy: 0.9866 - val_precision: 0.1109 - val_recall: 0.8824 - val_auc: 0.9690 - val_prc: 0.7533
Epoch 27/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2121 - tp: 17149.0000 - fp: 723.0000 - tn: 19817.0000 - fn: 3271.0000 - accuracy: 0.9025 - precision: 0.9595 - recall: 0.8398 - auc: 0.9708 - prc: 0.9758 - val_loss: 0.1337 - val_tp: 76.0000 - val_fp: 604.0000 - val_tn: 44880.0000 - val_fn: 9.0000 - val_accuracy: 0.9865 - val_precision: 0.1118 - val_recall: 0.8941 - val_auc: 0.9698 - val_prc: 0.7533
Epoch 28/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2132 - tp: 17351.0000 - fp: 702.0000 - tn: 19627.0000 - fn: 3280.0000 - accuracy: 0.9028 - precision: 0.9611 - recall: 0.8410 - auc: 0.9705 - prc: 0.9757 - val_loss: 0.1297 - val_tp: 77.0000 - val_fp: 613.0000 - val_tn: 44871.0000 - val_fn: 8.0000 - val_accuracy: 0.9864 - val_precision: 0.1116 - val_recall: 0.9059 - val_auc: 0.9704 - val_prc: 0.7535
Epoch 29/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.2074 - tp: 17088.0000 - fp: 695.0000 - tn: 19865.0000 - fn: 3312.0000 - accuracy: 0.9022 - precision: 0.9609 - recall: 0.8376 - auc: 0.9723 - prc: 0.9768 - val_loss: 0.1259 - val_tp: 77.0000 - val_fp: 617.0000 - val_tn: 44867.0000 - val_fn: 8.0000 - val_accuracy: 0.9863 - val_precision: 0.1110 - val_recall: 0.9059 - val_auc: 0.9705 - val_prc: 0.7550
Epoch 30/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.2041 - tp: 17287.0000 - fp: 710.0000 - tn: 19696.0000 - fn: 3267.0000 - accuracy: 0.9029 - precision: 0.9605 - recall: 0.8411 - auc: 0.9735 - prc: 0.9779 - val_loss: 0.1224 - val_tp: 77.0000 - val_fp: 622.0000 - val_tn: 44862.0000 - val_fn: 8.0000 - val_accuracy: 0.9862 - val_precision: 0.1102 - val_recall: 0.9059 - val_auc: 0.9715 - val_prc: 0.7549
Epoch 31/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.1987 - tp: 17253.0000 - fp: 643.0000 - tn: 19863.0000 - fn: 3201.0000 - accuracy: 0.9062 - precision: 0.9641 - recall: 0.8435 - auc: 0.9756 - prc: 0.9793 - val_loss: 0.1189 - val_tp: 77.0000 - val_fp: 619.0000 - val_tn: 44865.0000 - val_fn: 8.0000 - val_accuracy: 0.9862 - val_precision: 0.1106 - val_recall: 0.9059 - val_auc: 0.9714 - val_prc: 0.7548
Epoch 32/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.2014 - tp: 17034.0000 - fp: 695.0000 - tn: 19951.0000 - fn: 3280.0000 - accuracy: 0.9030 - precision: 0.9608 - recall: 0.8385 - auc: 0.9752 - prc: 0.9783 - val_loss: 0.1156 - val_tp: 77.0000 - val_fp: 622.0000 - val_tn: 44862.0000 - val_fn: 8.0000 - val_accuracy: 0.9862 - val_precision: 0.1102 - val_recall: 0.9059 - val_auc: 0.9715 - val_prc: 0.7549
Epoch 33/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1952 - tp: 17170.0000 - fp: 597.0000 - tn: 19967.0000 - fn: 3226.0000 - accuracy: 0.9067 - precision: 0.9664 - recall: 0.8418 - auc: 0.9764 - prc: 0.9798 - val_loss: 0.1130 - val_tp: 77.0000 - val_fp: 632.0000 - val_tn: 44852.0000 - val_fn: 8.0000 - val_accuracy: 0.9860 - val_precision: 0.1086 - val_recall: 0.9059 - val_auc: 0.9720 - val_prc: 0.7562
Epoch 34/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1905 - tp: 17308.0000 - fp: 666.0000 - tn: 19789.0000 - fn: 3197.0000 - accuracy: 0.9057 - precision: 0.9629 - recall: 0.8441 - auc: 0.9780 - prc: 0.9811 - val_loss: 0.1106 - val_tp: 77.0000 - val_fp: 641.0000 - val_tn: 44843.0000 - val_fn: 8.0000 - val_accuracy: 0.9858 - val_precision: 0.1072 - val_recall: 0.9059 - val_auc: 0.9721 - val_prc: 0.7562
Epoch 35/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.1914 - tp: 17310.0000 - fp: 659.0000 - tn: 19850.0000 - fn: 3141.0000 - accuracy: 0.9072 - precision: 0.9633 - recall: 0.8464 - auc: 0.9778 - prc: 0.9806 - val_loss: 0.1082 - val_tp: 77.0000 - val_fp: 649.0000 - val_tn: 44835.0000 - val_fn: 8.0000 - val_accuracy: 0.9856 - val_precision: 0.1061 - val_recall: 0.9059 - val_auc: 0.9719 - val_prc: 0.7563
Epoch 36/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1909 - tp: 17200.0000 - fp: 615.0000 - tn: 19960.0000 - fn: 3185.0000 - accuracy: 0.9072 - precision: 0.9655 - recall: 0.8438 - auc: 0.9780 - prc: 0.9808 - val_loss: 0.1061 - val_tp: 77.0000 - val_fp: 660.0000 - val_tn: 44824.0000 - val_fn: 8.0000 - val_accuracy: 0.9853 - val_precision: 0.1045 - val_recall: 0.9059 - val_auc: 0.9719 - val_prc: 0.7561
Epoch 37/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1885 - tp: 17408.0000 - fp: 607.0000 - tn: 19809.0000 - fn: 3136.0000 - accuracy: 0.9086 - precision: 0.9663 - recall: 0.8474 - auc: 0.9791 - prc: 0.9817 - val_loss: 0.1040 - val_tp: 77.0000 - val_fp: 664.0000 - val_tn: 44820.0000 - val_fn: 8.0000 - val_accuracy: 0.9853 - val_precision: 0.1039 - val_recall: 0.9059 - val_auc: 0.9725 - val_prc: 0.7561
Epoch 38/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1873 - tp: 17341.0000 - fp: 645.0000 - tn: 19797.0000 - fn: 3177.0000 - accuracy: 0.9067 - precision: 0.9641 - recall: 0.8452 - auc: 0.9794 - prc: 0.9819 - val_loss: 0.1016 - val_tp: 77.0000 - val_fp: 659.0000 - val_tn: 44825.0000 - val_fn: 8.0000 - val_accuracy: 0.9854 - val_precision: 0.1046 - val_recall: 0.9059 - val_auc: 0.9726 - val_prc: 0.7560
Epoch 39/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1834 - tp: 17359.0000 - fp: 563.0000 - tn: 19972.0000 - fn: 3066.0000 - accuracy: 0.9114 - precision: 0.9686 - recall: 0.8499 - auc: 0.9804 - prc: 0.9827 - val_loss: 0.0997 - val_tp: 77.0000 - val_fp: 662.0000 - val_tn: 44822.0000 - val_fn: 8.0000 - val_accuracy: 0.9853 - val_precision: 0.1042 - val_recall: 0.9059 - val_auc: 0.9724 - val_prc: 0.7560
Epoch 40/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1835 - tp: 17403.0000 - fp: 614.0000 - tn: 19872.0000 - fn: 3071.0000 - accuracy: 0.9100 - precision: 0.9659 - recall: 0.8500 - auc: 0.9808 - prc: 0.9829 - val_loss: 0.0980 - val_tp: 77.0000 - val_fp: 669.0000 - val_tn: 44815.0000 - val_fn: 8.0000 - val_accuracy: 0.9851 - val_precision: 0.1032 - val_recall: 0.9059 - val_auc: 0.9729 - val_prc: 0.7561
Epoch 41/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1751 - tp: 17468.0000 - fp: 584.0000 - tn: 19932.0000 - fn: 2976.0000 - accuracy: 0.9131 - precision: 0.9676 - recall: 0.8544 - auc: 0.9825 - prc: 0.9845 - val_loss: 0.0963 - val_tp: 77.0000 - val_fp: 681.0000 - val_tn: 44803.0000 - val_fn: 8.0000 - val_accuracy: 0.9849 - val_precision: 0.1016 - val_recall: 0.9059 - val_auc: 0.9726 - val_prc: 0.7561
Epoch 42/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.1752 - tp: 17508.0000 - fp: 602.0000 - tn: 19893.0000 - fn: 2957.0000 - accuracy: 0.9131 - precision: 0.9668 - recall: 0.8555 - auc: 0.9821 - prc: 0.9843 - val_loss: 0.0951 - val_tp: 77.0000 - val_fp: 695.0000 - val_tn: 44789.0000 - val_fn: 8.0000 - val_accuracy: 0.9846 - val_precision: 0.0997 - val_recall: 0.9059 - val_auc: 0.9731 - val_prc: 0.7561
Epoch 43/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1732 - tp: 17402.0000 - fp: 591.0000 - tn: 20004.0000 - fn: 2963.0000 - accuracy: 0.9132 - precision: 0.9672 - recall: 0.8545 - auc: 0.9830 - prc: 0.9846 - val_loss: 0.0934 - val_tp: 77.0000 - val_fp: 698.0000 - val_tn: 44786.0000 - val_fn: 8.0000 - val_accuracy: 0.9845 - val_precision: 0.0994 - val_recall: 0.9059 - val_auc: 0.9732 - val_prc: 0.7560
Epoch 44/1000
20/20 [==============================] - 1s 28ms/step - loss: 0.1722 - tp: 17422.0000 - fp: 618.0000 - tn: 19934.0000 - fn: 2986.0000 - accuracy: 0.9120 - precision: 0.9657 - recall: 0.8537 - auc: 0.9829 - prc: 0.9847 - val_loss: 0.0920 - val_tp: 77.0000 - val_fp: 700.0000 - val_tn: 44784.0000 - val_fn: 8.0000 - val_accuracy: 0.9845 - val_precision: 0.0991 - val_recall: 0.9059 - val_auc: 0.9736 - val_prc: 0.7574
Epoch 45/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1699 - tp: 17571.0000 - fp: 540.0000 - tn: 19883.0000 - fn: 2966.0000 - accuracy: 0.9144 - precision: 0.9702 - recall: 0.8556 - auc: 0.9836 - prc: 0.9855 - val_loss: 0.0910 - val_tp: 77.0000 - val_fp: 717.0000 - val_tn: 44767.0000 - val_fn: 8.0000 - val_accuracy: 0.9841 - val_precision: 0.0970 - val_recall: 0.9059 - val_auc: 0.9738 - val_prc: 0.7574
Epoch 46/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1685 - tp: 17630.0000 - fp: 571.0000 - tn: 19845.0000 - fn: 2914.0000 - accuracy: 0.9149 - precision: 0.9686 - recall: 0.8582 - auc: 0.9839 - prc: 0.9858 - val_loss: 0.0902 - val_tp: 77.0000 - val_fp: 725.0000 - val_tn: 44759.0000 - val_fn: 8.0000 - val_accuracy: 0.9839 - val_precision: 0.0960 - val_recall: 0.9059 - val_auc: 0.9738 - val_prc: 0.7574
Epoch 47/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1690 - tp: 17542.0000 - fp: 601.0000 - tn: 19862.0000 - fn: 2955.0000 - accuracy: 0.9132 - precision: 0.9669 - recall: 0.8558 - auc: 0.9841 - prc: 0.9857 - val_loss: 0.0887 - val_tp: 77.0000 - val_fp: 723.0000 - val_tn: 44761.0000 - val_fn: 8.0000 - val_accuracy: 0.9840 - val_precision: 0.0962 - val_recall: 0.9059 - val_auc: 0.9742 - val_prc: 0.7573
Epoch 48/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1659 - tp: 17624.0000 - fp: 600.0000 - tn: 19835.0000 - fn: 2901.0000 - accuracy: 0.9145 - precision: 0.9671 - recall: 0.8587 - auc: 0.9846 - prc: 0.9859 - val_loss: 0.0874 - val_tp: 77.0000 - val_fp: 726.0000 - val_tn: 44758.0000 - val_fn: 8.0000 - val_accuracy: 0.9839 - val_precision: 0.0959 - val_recall: 0.9059 - val_auc: 0.9742 - val_prc: 0.7573
Epoch 49/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.1662 - tp: 17573.0000 - fp: 575.0000 - tn: 19915.0000 - fn: 2897.0000 - accuracy: 0.9152 - precision: 0.9683 - recall: 0.8585 - auc: 0.9845 - prc: 0.9859 - val_loss: 0.0860 - val_tp: 77.0000 - val_fp: 719.0000 - val_tn: 44765.0000 - val_fn: 8.0000 - val_accuracy: 0.9840 - val_precision: 0.0967 - val_recall: 0.9059 - val_auc: 0.9744 - val_prc: 0.7574
Epoch 50/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1642 - tp: 17583.0000 - fp: 545.0000 - tn: 19895.0000 - fn: 2937.0000 - accuracy: 0.9150 - precision: 0.9699 - recall: 0.8569 - auc: 0.9853 - prc: 0.9866 - val_loss: 0.0848 - val_tp: 77.0000 - val_fp: 722.0000 - val_tn: 44762.0000 - val_fn: 8.0000 - val_accuracy: 0.9840 - val_precision: 0.0964 - val_recall: 0.9059 - val_auc: 0.9744 - val_prc: 0.7574
Epoch 51/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1604 - tp: 17654.0000 - fp: 548.0000 - tn: 19880.0000 - fn: 2878.0000 - accuracy: 0.9164 - precision: 0.9699 - recall: 0.8598 - auc: 0.9861 - prc: 0.9873 - val_loss: 0.0841 - val_tp: 77.0000 - val_fp: 736.0000 - val_tn: 44748.0000 - val_fn: 8.0000 - val_accuracy: 0.9837 - val_precision: 0.0947 - val_recall: 0.9059 - val_auc: 0.9742 - val_prc: 0.7573
Epoch 52/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1608 - tp: 17634.0000 - fp: 536.0000 - tn: 19927.0000 - fn: 2863.0000 - accuracy: 0.9170 - precision: 0.9705 - recall: 0.8603 - auc: 0.9861 - prc: 0.9872 - val_loss: 0.0831 - val_tp: 77.0000 - val_fp: 740.0000 - val_tn: 44744.0000 - val_fn: 8.0000 - val_accuracy: 0.9836 - val_precision: 0.0942 - val_recall: 0.9059 - val_auc: 0.9739 - val_prc: 0.7576
Epoch 53/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1607 - tp: 17679.0000 - fp: 612.0000 - tn: 19845.0000 - fn: 2824.0000 - accuracy: 0.9161 - precision: 0.9665 - recall: 0.8623 - auc: 0.9858 - prc: 0.9869 - val_loss: 0.0817 - val_tp: 77.0000 - val_fp: 737.0000 - val_tn: 44747.0000 - val_fn: 8.0000 - val_accuracy: 0.9837 - val_precision: 0.0946 - val_recall: 0.9059 - val_auc: 0.9743 - val_prc: 0.7575
Epoch 54/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1568 - tp: 17632.0000 - fp: 568.0000 - tn: 19873.0000 - fn: 2887.0000 - accuracy: 0.9156 - precision: 0.9688 - recall: 0.8593 - auc: 0.9866 - prc: 0.9877 - val_loss: 0.0809 - val_tp: 77.0000 - val_fp: 738.0000 - val_tn: 44746.0000 - val_fn: 8.0000 - val_accuracy: 0.9836 - val_precision: 0.0945 - val_recall: 0.9059 - val_auc: 0.9745 - val_prc: 0.7576
Epoch 55/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1564 - tp: 17681.0000 - fp: 549.0000 - tn: 19927.0000 - fn: 2803.0000 - accuracy: 0.9182 - precision: 0.9699 - recall: 0.8632 - auc: 0.9869 - prc: 0.9878 - val_loss: 0.0801 - val_tp: 77.0000 - val_fp: 745.0000 - val_tn: 44739.0000 - val_fn: 8.0000 - val_accuracy: 0.9835 - val_precision: 0.0937 - val_recall: 0.9059 - val_auc: 0.9742 - val_prc: 0.7580
Epoch 56/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1540 - tp: 17695.0000 - fp: 544.0000 - tn: 19957.0000 - fn: 2764.0000 - accuracy: 0.9192 - precision: 0.9702 - recall: 0.8649 - auc: 0.9871 - prc: 0.9882 - val_loss: 0.0792 - val_tp: 77.0000 - val_fp: 745.0000 - val_tn: 44739.0000 - val_fn: 8.0000 - val_accuracy: 0.9835 - val_precision: 0.0937 - val_recall: 0.9059 - val_auc: 0.9745 - val_prc: 0.7581
Epoch 57/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.1536 - tp: 17757.0000 - fp: 549.0000 - tn: 19856.0000 - fn: 2798.0000 - accuracy: 0.9183 - precision: 0.9700 - recall: 0.8639 - auc: 0.9873 - prc: 0.9884 - val_loss: 0.0789 - val_tp: 77.0000 - val_fp: 761.0000 - val_tn: 44723.0000 - val_fn: 8.0000 - val_accuracy: 0.9831 - val_precision: 0.0919 - val_recall: 0.9059 - val_auc: 0.9737 - val_prc: 0.7581
Epoch 58/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.1534 - tp: 17586.0000 - fp: 560.0000 - tn: 20056.0000 - fn: 2758.0000 - accuracy: 0.9190 - precision: 0.9691 - recall: 0.8644 - auc: 0.9874 - prc: 0.9882 - val_loss: 0.0780 - val_tp: 77.0000 - val_fp: 764.0000 - val_tn: 44720.0000 - val_fn: 8.0000 - val_accuracy: 0.9831 - val_precision: 0.0916 - val_recall: 0.9059 - val_auc: 0.9741 - val_prc: 0.7581
Epoch 59/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1511 - tp: 17720.0000 - fp: 584.0000 - tn: 20010.0000 - fn: 2646.0000 - accuracy: 0.9211 - precision: 0.9681 - recall: 0.8701 - auc: 0.9874 - prc: 0.9883 - val_loss: 0.0773 - val_tp: 77.0000 - val_fp: 762.0000 - val_tn: 44722.0000 - val_fn: 8.0000 - val_accuracy: 0.9831 - val_precision: 0.0918 - val_recall: 0.9059 - val_auc: 0.9733 - val_prc: 0.7495
Epoch 60/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1492 - tp: 17780.0000 - fp: 558.0000 - tn: 20016.0000 - fn: 2606.0000 - accuracy: 0.9228 - precision: 0.9696 - recall: 0.8722 - auc: 0.9881 - prc: 0.9889 - val_loss: 0.0765 - val_tp: 77.0000 - val_fp: 764.0000 - val_tn: 44720.0000 - val_fn: 8.0000 - val_accuracy: 0.9831 - val_precision: 0.0916 - val_recall: 0.9059 - val_auc: 0.9734 - val_prc: 0.7495
Epoch 61/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1501 - tp: 17722.0000 - fp: 585.0000 - tn: 20016.0000 - fn: 2637.0000 - accuracy: 0.9213 - precision: 0.9680 - recall: 0.8705 - auc: 0.9879 - prc: 0.9885 - val_loss: 0.0756 - val_tp: 77.0000 - val_fp: 756.0000 - val_tn: 44728.0000 - val_fn: 8.0000 - val_accuracy: 0.9832 - val_precision: 0.0924 - val_recall: 0.9059 - val_auc: 0.9737 - val_prc: 0.7496
Epoch 62/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1446 - tp: 17841.0000 - fp: 551.0000 - tn: 20025.0000 - fn: 2543.0000 - accuracy: 0.9245 - precision: 0.9700 - recall: 0.8752 - auc: 0.9887 - prc: 0.9895 - val_loss: 0.0746 - val_tp: 77.0000 - val_fp: 749.0000 - val_tn: 44735.0000 - val_fn: 8.0000 - val_accuracy: 0.9834 - val_precision: 0.0932 - val_recall: 0.9059 - val_auc: 0.9740 - val_prc: 0.7496
Epoch 63/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1436 - tp: 17834.0000 - fp: 550.0000 - tn: 20035.0000 - fn: 2541.0000 - accuracy: 0.9245 - precision: 0.9701 - recall: 0.8753 - auc: 0.9890 - prc: 0.9896 - val_loss: 0.0738 - val_tp: 77.0000 - val_fp: 749.0000 - val_tn: 44735.0000 - val_fn: 8.0000 - val_accuracy: 0.9834 - val_precision: 0.0932 - val_recall: 0.9059 - val_auc: 0.9743 - val_prc: 0.7497
Epoch 64/1000
20/20 [==============================] - 1s 30ms/step - loss: 0.1450 - tp: 17965.0000 - fp: 591.0000 - tn: 19909.0000 - fn: 2495.0000 - accuracy: 0.9247 - precision: 0.9682 - recall: 0.8781 - auc: 0.9887 - prc: 0.9895 - val_loss: 0.0726 - val_tp: 77.0000 - val_fp: 745.0000 - val_tn: 44739.0000 - val_fn: 8.0000 - val_accuracy: 0.9835 - val_precision: 0.0937 - val_recall: 0.9059 - val_auc: 0.9722 - val_prc: 0.7499
Epoch 65/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1425 - tp: 17941.0000 - fp: 536.0000 - tn: 19921.0000 - fn: 2562.0000 - accuracy: 0.9244 - precision: 0.9710 - recall: 0.8750 - auc: 0.9894 - prc: 0.9900 - val_loss: 0.0718 - val_tp: 77.0000 - val_fp: 745.0000 - val_tn: 44739.0000 - val_fn: 8.0000 - val_accuracy: 0.9835 - val_precision: 0.0937 - val_recall: 0.9059 - val_auc: 0.9724 - val_prc: 0.7500
Epoch 66/1000
20/20 [==============================] - 1s 29ms/step - loss: 0.1415 - tp: 17838.0000 - fp: 567.0000 - tn: 20032.0000 - fn: 2523.0000 - accuracy: 0.9246 - precision: 0.9692 - recall: 0.8761 - auc: 0.9892 - prc: 0.9898 - val_loss: 0.0714 - val_tp: 77.0000 - val_fp: 744.0000 - val_tn: 44740.0000 - val_fn: 8.0000 - val_accuracy: 0.9835 - val_precision: 0.0938 - val_recall: 0.9059 - val_auc: 0.9726 - val_prc: 0.7499
Restoring model weights from the end of the best epoch.
Epoch 00066: early stopping
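Training stopped at epoch 66 rather than running all 1000 epochs because of an early-stopping callback, which also restored the weights from the best epoch. As a rough, self-contained sketch of that logic in plain Python (the patience value here is an assumption, not necessarily the tutorial's exact setting):

```python
def early_stop_index(metric_history, patience=10):
    """Return (stop_epoch, best_epoch) for a maximized metric such as val_prc.

    Mimics the behavior of an early-stopping callback with
    restore_best_weights=True: track the best value seen so far, and stop
    once `patience` epochs pass without improvement.
    """
    best_value = float('-inf')
    best_epoch = 0
    for epoch, value in enumerate(metric_history):
        if value > best_value:
            best_value = value
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # stop here; restore best_epoch weights
    return len(metric_history) - 1, best_epoch  # ran out of epochs
```

The real callback (`tf.keras.callbacks.EarlyStopping`) is configured earlier in the tutorial; this sketch only illustrates why the log ends with "Restoring model weights from the end of the best epoch."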

Re-check training history

plot_metrics(resampled_history)

(Figure: training and validation metrics for the resampled model, plotted over epochs)

Evaluate metrics

train_predictions_resampled = resampled_model.predict(train_features, batch_size=BATCH_SIZE)
test_predictions_resampled = resampled_model.predict(test_features, batch_size=BATCH_SIZE)
resampled_results = resampled_model.evaluate(test_features, test_labels,
                                             batch_size=BATCH_SIZE, verbose=0)
for name, value in zip(resampled_model.metrics_names, resampled_results):
  print(name, ': ', value)
print()

plot_cm(test_labels, test_predictions_resampled)
loss :  0.07805436849594116
tp :  82.0
fp :  923.0
tn :  55945.0
fn :  12.0
accuracy :  0.98358553647995
precision :  0.08159203827381134
recall :  0.8723404407501221
auc :  0.9706520438194275
prc :  0.698624849319458

Legitimate Transactions Detected (True Negatives):  55945
Legitimate Transactions Incorrectly Detected (False Positives):  923
Fraudulent Transactions Missed (False Negatives):  12
Fraudulent Transactions Detected (True Positives):  82
Total Fraudulent Transactions:  94

(Figure: confusion matrix for the resampled model on the test set)
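The printed precision and recall can be reproduced directly from the four confusion-matrix counts above, which is a useful sanity check when reading `model.evaluate` output:

```python
# Counts reported above for the resampled model on the test set.
tp, fp, tn, fn = 82, 923, 55945, 12

precision = tp / (tp + fp)                  # fraction of flagged transactions that are fraud
recall = tp / (tp + fn)                     # fraction of actual fraud that was caught
accuracy = (tp + tn) / (tp + fp + tn + fn)  # dominated by the majority class

print(f'precision: {precision:.4f}, recall: {recall:.4f}, accuracy: {accuracy:.4f}')
```

Note how accuracy stays above 98% despite 923 false positives: on data this imbalanced, accuracy alone says very little, which is why the tutorial tracks precision, recall, AUC, and AUPRC instead.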

Plot the ROC

plot_roc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_roc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')

plot_roc("Train Weighted", train_labels, train_predictions_weighted, color=colors[1])
plot_roc("Test Weighted", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')

plot_roc("Train Resampled", train_labels, train_predictions_resampled, color=colors[2])
plot_roc("Test Resampled", test_labels, test_predictions_resampled, color=colors[2], linestyle='--')
plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7ff9afdcf4d0>

(Figure: ROC curves for the baseline, class-weighted, and resampled models on train and test data)
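The `plot_roc` helper (defined earlier in the tutorial) is essentially a thin wrapper around `sklearn.metrics.roc_curve`. A minimal, self-contained sketch with made-up labels and scores, just to illustrate the API:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical labels and predicted scores, not the tutorial's data.
labels = np.array([0, 0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.30, 0.60, 0.40, 0.80, 0.90])

# roc_curve sweeps the decision threshold and returns the resulting
# false-positive and true-positive rates; auc integrates the curve.
fpr, tpr, thresholds = roc_curve(labels, scores)
roc_auc = auc(fpr, tpr)  # 1.0 here, since every positive outscores every negative
```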

Plot the AUPRC

plot_prc("Train Baseline", train_labels, train_predictions_baseline, color=colors[0])
plot_prc("Test Baseline", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')

plot_prc("Train Weighted", train_labels, train_predictions_weighted, color=colors[1])
plot_prc("Test Weighted", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')

plot_prc("Train Resampled", train_labels, train_predictions_resampled, color=colors[2])
plot_prc("Test Resampled", test_labels, test_predictions_resampled, color=colors[2], linestyle='--')
plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7ff9ccd52410>

(Figure: precision-recall curves for the baseline, class-weighted, and resampled models on train and test data)
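Likewise, `plot_prc` presumably builds on `sklearn.metrics.precision_recall_curve`. A small illustrative sketch (synthetic data, not the tutorial's); note that Keras's `prc` metric approximates the PR-AUC by thresholded interpolation, so it is close to but not identical to scikit-learn's average precision:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# Hypothetical labels and predicted scores, just to illustrate the API.
labels = np.array([0, 0, 1, 1, 0, 1, 0, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.15, 0.05])

# precision_recall_curve sweeps the threshold; average_precision_score
# summarizes the curve as a weighted mean of precisions at each recall step.
precision, recall, thresholds = precision_recall_curve(labels, scores)
ap = average_precision_score(labels, scores)
```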

Applying this tutorial to your problem

Imbalanced data classification is an inherently difficult task, since there are so few samples to learn from. You should always start with the data: do your best to collect as many samples as possible, and give substantial thought to which features may be relevant so the model can get the most out of your minority class. At some point your model may struggle to improve and yield the results you want, so it is important to keep in mind the context of your problem and the trade-offs between different types of errors.
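One concrete lever for that trade-off is the decision threshold: the models above output a fraud probability, and nothing forces you to cut it at 0.5. A short sketch with hypothetical scores shows how raising the threshold trades false positives against false negatives:

```python
import numpy as np

# Hypothetical fraud scores and true labels, to illustrate threshold tuning.
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
scores = np.array([0.1, 0.6, 0.2, 0.3, 0.7, 0.1, 0.55, 0.8, 0.9, 0.4])

for threshold in (0.3, 0.5, 0.7):
    preds = scores >= threshold
    fp = int(np.sum(preds & (labels == 0)))   # legitimate transactions flagged
    fn = int(np.sum(~preds & (labels == 1)))  # fraud missed
    print(f'threshold={threshold}: false positives={fp}, false negatives={fn}')
```

In a fraud setting, a missed fraud (false negative) usually costs far more than an extra manual review (false positive), so a lower threshold — higher recall at lower precision — is often the appropriate choice.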