Image classification

This tutorial shows how to classify images of flowers. It creates an image classifier using a keras.Sequential model, and loads data using preprocessing.image_dataset_from_directory. You will gain practical experience with the following concepts:

  • Efficiently loading a dataset off disk.
  • Identifying overfitting and applying techniques to mitigate it, including data augmentation and dropout.

This tutorial follows a basic machine learning workflow:

  1. Examine and understand data
  2. Build an input pipeline
  3. Build the model
  4. Train the model
  5. Test the model
  6. Improve the model and repeat the process
pip install -q tf-nightly

Import TensorFlow and other libraries

 import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
 

Download and explore the dataset

This tutorial uses a dataset of about 3,700 photos of flowers. The dataset contains five sub-directories, one per class:

 flower_photo/
  daisy/
  dandelion/
  roses/
  sunflowers/
  tulips/
 
 import pathlib
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
 
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
228818944/228813984 [==============================] - 4s 0us/step

After downloading, you should now have a copy of the dataset available. There are 3,670 total images:

 image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)
 
3670

Here are some roses:

 roses = list(data_dir.glob('roses/*'))
PIL.Image.open(str(roses[0]))
 


 PIL.Image.open(str(roses[1]))
 


And some tulips:

 tulips = list(data_dir.glob('tulips/*'))
PIL.Image.open(str(tulips[0]))
 


 PIL.Image.open(str(tulips[1]))
 


Load using keras.preprocessing

Let's load these images off disk using the helpful image_dataset_from_directory utility. This will take you from a directory of images on disk to a tf.data.Dataset in just a couple of lines of code. If you like, you can also write your own data loading code from scratch by visiting the load images tutorial.
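
If you do write your own loader, here is a minimal sketch of what it might look like (an illustrative addition, not part of the original tutorial; it assumes JPEG files and reads the class name from the parent directory, so it yields string labels rather than the integer labels used below):

 list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*.jpg'), shuffle=True)

def parse_image(file_path):
  # The class name is the second-to-last path component, e.g. .../roses/x.jpg
  label = tf.strings.split(file_path, os.path.sep)[-2]
  image = tf.io.decode_jpeg(tf.io.read_file(file_path), channels=3)
  image = tf.image.resize(image, [180, 180])  # same size as img_height/img_width below
  return image, label

manual_ds = list_ds.map(parse_image)
 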

Create a dataset

Define some parameters for the loader:

 batch_size = 32
img_height = 180
img_width = 180
 

It's good practice to use a validation split when developing your model. We will use 80% of the images for training and 20% for validation.

 train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
 
Found 3670 files belonging to 5 classes.
Using 2936 files for training.

 val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
 
Found 3670 files belonging to 5 classes.
Using 734 files for validation.

You can find the class names in the class_names attribute on these datasets. These correspond to the directory names in alphabetical order.

 class_names = train_ds.class_names
print(class_names)
 
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

Visualize the data

Here are the first 9 images from the training dataset.

 import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")
 


You will train a model using these datasets by passing them to model.fit in a moment. If you like, you can also manually iterate over the dataset and retrieve batches of images:

 for image_batch, labels_batch in train_ds:
  print(image_batch.shape)
  print(labels_batch.shape)
  break
 
(32, 180, 180, 3)
(32,)

The image_batch is a tensor of the shape (32, 180, 180, 3). This is a batch of 32 images of shape 180x180x3 (the last dimension refers to the RGB color channels). The label_batch is a tensor of the shape (32,); these are the corresponding labels for the 32 images.

You can call .numpy() on the image_batch and labels_batch tensors to convert them to a numpy.ndarray.
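
For example (a trivial illustrative snippet):

 images_np = image_batch.numpy()   # numpy.ndarray of shape (32, 180, 180, 3)
labels_np = labels_batch.numpy()  # numpy.ndarray of shape (32,)
 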

Configure the dataset for performance

Let's make sure to use buffered prefetching so we can yield data from disk without having I/O become blocking. These are two important methods you should use when loading data.

Dataset.cache() keeps the images in memory after they're loaded off disk during the first epoch. This will ensure the dataset does not become a bottleneck while training your model. If your dataset is too large to fit into memory, you can also use this method to create a performant on-disk cache.

Dataset.prefetch() overlaps data preprocessing and model execution while training.

Interested readers can learn more about both methods, as well as how to cache data to disk, in the data performance guide.
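
For the on-disk variant, Dataset.cache accepts a file path; here is a minimal sketch (the cache path and variable name are hypothetical, illustrative rather than part of this tutorial's pipeline):

 # Passing a filename writes the cache to disk instead of keeping it in memory.
# Delete the cache file whenever the preprocessing pipeline changes, since the
# cached data is reused on later runs.
train_ds_on_disk = train_ds.cache("/tmp/flowers_train.tfcache")
 

For this tutorial, the in-memory cache is sufficient: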

 AUTOTUNE = tf.data.experimental.AUTOTUNE

train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
 

Standardize the data

The RGB channel values are in the [0, 255] range. This is not ideal for a neural network; in general you should seek to make your input values small. Here, we will standardize the values to be in the [0, 1] range by using a Rescaling layer.

 normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)
 

There are two ways to use this layer. You can apply it to the dataset by calling map:

 normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice that the pixel values are now in `[0, 1]`.
print(np.min(first_image), np.max(first_image)) 
 
0.0 1.0

Or, you can include the layer inside your model definition, which can simplify deployment. We will use the second approach here.

Create the model

The model consists of three convolution blocks, each with a max pooling layer. On top there is a fully connected layer with 128 units that is activated by a relu activation function. This model has not been tuned for high accuracy; the goal of this tutorial is to show a standard approach.

 num_classes = 5

model = Sequential([
  layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])
 

Compile the model

For this tutorial, choose the optimizers.Adam optimizer and the losses.SparseCategoricalCrossentropy loss function. To view training and validation accuracy for each training epoch, pass the metrics argument.

 model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
 

Model summary

View all the layers of the network using the model's summary method:

 model.summary()
 
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
rescaling_1 (Rescaling)      (None, 180, 180, 3)       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 180, 180, 16)      448       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 90, 90, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 90, 90, 32)        4640      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 45, 45, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 45, 45, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 22, 22, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 30976)             0         
_________________________________________________________________
dense (Dense)                (None, 128)               3965056   
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 645       
=================================================================
Total params: 3,989,285
Trainable params: 3,989,285
Non-trainable params: 0
_________________________________________________________________

Train the model

 epochs=10
history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)
 
Epoch 1/10
92/92 [==============================] - 4s 47ms/step - loss: 1.9328 - accuracy: 0.2896 - val_loss: 1.1022 - val_accuracy: 0.5245
Epoch 2/10
92/92 [==============================] - 1s 10ms/step - loss: 1.0441 - accuracy: 0.5761 - val_loss: 1.0057 - val_accuracy: 0.5913
Epoch 3/10
92/92 [==============================] - 1s 10ms/step - loss: 0.8640 - accuracy: 0.6763 - val_loss: 0.8951 - val_accuracy: 0.6499
Epoch 4/10
92/92 [==============================] - 1s 10ms/step - loss: 0.7106 - accuracy: 0.7472 - val_loss: 0.8992 - val_accuracy: 0.6621
Epoch 5/10
92/92 [==============================] - 1s 10ms/step - loss: 0.4817 - accuracy: 0.8285 - val_loss: 0.8997 - val_accuracy: 0.6662
Epoch 6/10
92/92 [==============================] - 1s 10ms/step - loss: 0.3131 - accuracy: 0.8903 - val_loss: 1.0014 - val_accuracy: 0.6567
Epoch 7/10
92/92 [==============================] - 1s 10ms/step - loss: 0.1782 - accuracy: 0.9436 - val_loss: 1.2140 - val_accuracy: 0.6431
Epoch 8/10
92/92 [==============================] - 1s 10ms/step - loss: 0.1024 - accuracy: 0.9703 - val_loss: 1.5144 - val_accuracy: 0.6240
Epoch 9/10
92/92 [==============================] - 1s 10ms/step - loss: 0.0736 - accuracy: 0.9815 - val_loss: 1.7651 - val_accuracy: 0.5926
Epoch 10/10
92/92 [==============================] - 1s 10ms/step - loss: 0.0761 - accuracy: 0.9775 - val_loss: 2.0429 - val_accuracy: 0.5967

Visualize training results

Create plots of loss and accuracy on the training and validation sets.

 acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss=history.history['loss']
val_loss=history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
 


As you can see from the plots, training accuracy and validation accuracy are off by a large margin, and the model has achieved only around 60% accuracy on the validation set.

Let's look at what went wrong and try to increase the overall performance of the model.

Overfitting

In the plots above, the training accuracy is increasing linearly over time, whereas validation accuracy stalls around 60% in the training process. Also, the difference between training and validation accuracy is noticeable, a sign of overfitting.

When there are a small number of training examples, the model sometimes learns from noise or unwanted details in the training examples, to an extent that it negatively impacts the performance of the model on new examples. This phenomenon is known as overfitting. It means that the model will have a difficult time generalizing on a new dataset.
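
One quick way to see this numerically (a small illustrative addition, not part of the original tutorial) is to compare the final training and validation accuracy recorded in the History object:

 # The gap between the two curves at the last epoch quantifies the overfitting.
final_gap = history.history['accuracy'][-1] - history.history['val_accuracy'][-1]
print("Final train/val accuracy gap: {:.2%}".format(final_gap))
 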

There are multiple ways to fight overfitting in the training process. In this tutorial, you'll use data augmentation and add Dropout to your model.

Data augmentation

Overfitting generally occurs when there are a small number of training examples. Data augmentation takes the approach of generating additional training data from your existing examples by augmenting them with random transformations that yield believable-looking images. This helps expose the model to more aspects of the data and generalize better.

We will implement data augmentation using experimental Keras Preprocessing Layers. These can be included inside your model like other layers, and run on the GPU.

 data_augmentation = keras.Sequential(
  [
    layers.experimental.preprocessing.RandomFlip("horizontal", 
                                                 input_shape=(img_height, 
                                                              img_width,
                                                              3)),
    layers.experimental.preprocessing.RandomRotation(0.1),
    layers.experimental.preprocessing.RandomZoom(0.1),
  ]
)
 

Let's visualize what a few augmented examples look like by applying data augmentation to the same image several times:

 plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
  for i in range(9):
    augmented_images = data_augmentation(images)
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(augmented_images[0].numpy().astype("uint8"))
    plt.axis("off")
 


We will use data augmentation to train a model in a moment.

Dropout

Another technique to reduce overfitting is to introduce Dropout to the network, a form of regularization.

When you apply Dropout to a layer, it randomly drops out (by setting the activation to zero) a number of output units from the layer during the training process. Dropout takes a fractional number as its input value, in forms such as 0.1, 0.2, 0.4, etc. This means dropping out 10%, 20% or 40% of the output units at random from the applied layer.
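
To make this concrete, here is a tiny illustrative snippet (not part of the original tutorial) showing that a Dropout layer only zeroes units while training; at inference time it passes values through unchanged:

 dropout_demo = layers.Dropout(0.4)
data = tf.ones((1, 8))
# training=True: ~40% of units are zeroed; the rest are scaled by 1/(1 - 0.4).
print(dropout_demo(data, training=True))
# training=False: the input passes through unchanged.
print(dropout_demo(data, training=False))
 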

Let's create a new neural network using layers.Dropout, then train it using the augmented images.

 model = Sequential([
  data_augmentation,
  layers.experimental.preprocessing.Rescaling(1./255),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.2),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])
 

Compile and train the model

 model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
 
 model.summary()
 
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
sequential_1 (Sequential)    (None, 180, 180, 3)       0         
_________________________________________________________________
rescaling_2 (Rescaling)      (None, 180, 180, 3)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 180, 180, 16)      448       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 90, 90, 16)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 90, 90, 32)        4640      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 45, 45, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 45, 45, 64)        18496     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 22, 22, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 22, 22, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 30976)             0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               3965056   
_________________________________________________________________
dense_3 (Dense)              (None, 5)                 645       
=================================================================
Total params: 3,989,285
Trainable params: 3,989,285
Non-trainable params: 0
_________________________________________________________________

 epochs = 15
history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)
 
Epoch 1/15
92/92 [==============================] - 2s 20ms/step - loss: 1.4842 - accuracy: 0.3279 - val_loss: 1.0863 - val_accuracy: 0.5640
Epoch 2/15
92/92 [==============================] - 1s 12ms/step - loss: 1.1215 - accuracy: 0.5284 - val_loss: 1.0374 - val_accuracy: 0.6022
Epoch 3/15
92/92 [==============================] - 1s 12ms/step - loss: 0.9680 - accuracy: 0.6117 - val_loss: 0.9200 - val_accuracy: 0.6485
Epoch 4/15
92/92 [==============================] - 1s 12ms/step - loss: 0.8538 - accuracy: 0.6753 - val_loss: 0.9206 - val_accuracy: 0.6417
Epoch 5/15
92/92 [==============================] - 1s 12ms/step - loss: 0.7744 - accuracy: 0.6977 - val_loss: 0.8169 - val_accuracy: 0.6839
Epoch 6/15
92/92 [==============================] - 1s 13ms/step - loss: 0.7758 - accuracy: 0.7093 - val_loss: 0.7743 - val_accuracy: 0.6880
Epoch 7/15
92/92 [==============================] - 1s 13ms/step - loss: 0.6805 - accuracy: 0.7481 - val_loss: 0.8598 - val_accuracy: 0.6540
Epoch 8/15
92/92 [==============================] - 1s 13ms/step - loss: 0.7132 - accuracy: 0.7278 - val_loss: 0.7177 - val_accuracy: 0.7207
Epoch 9/15
92/92 [==============================] - 1s 13ms/step - loss: 0.6634 - accuracy: 0.7580 - val_loss: 0.7152 - val_accuracy: 0.7166
Epoch 10/15
92/92 [==============================] - 1s 13ms/step - loss: 0.6562 - accuracy: 0.7538 - val_loss: 0.7251 - val_accuracy: 0.7248
Epoch 11/15
92/92 [==============================] - 1s 13ms/step - loss: 0.5798 - accuracy: 0.7840 - val_loss: 0.7016 - val_accuracy: 0.7357
Epoch 12/15
92/92 [==============================] - 1s 13ms/step - loss: 0.5635 - accuracy: 0.7913 - val_loss: 0.7755 - val_accuracy: 0.7248
Epoch 13/15
92/92 [==============================] - 1s 13ms/step - loss: 0.5361 - accuracy: 0.7982 - val_loss: 0.7575 - val_accuracy: 0.7153
Epoch 14/15
92/92 [==============================] - 1s 13ms/step - loss: 0.5420 - accuracy: 0.8022 - val_loss: 0.7375 - val_accuracy: 0.7302
Epoch 15/15
92/92 [==============================] - 1s 12ms/step - loss: 0.5132 - accuracy: 0.8120 - val_loss: 0.7561 - val_accuracy: 0.7289

Visualize training results

After applying data augmentation and Dropout, there is less overfitting than before, and training and validation accuracy are more closely aligned.

 acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
 


Predict on new data

Finally, let's use our model to classify an image that wasn't included in the training or validation sets.

 sunflower_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/592px-Red_sunflower.jpg"
sunflower_path = tf.keras.utils.get_file('Red_sunflower', origin=sunflower_url)

img = keras.preprocessing.image.load_img(
    sunflower_path, target_size=(img_height, img_width)
)
img_array = keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0) # Create a batch

predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])

print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score)], 100 * np.max(score))
)
 
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/592px-Red_sunflower.jpg
122880/117948 [===============================] - 0s 0us/step
This image most likely belongs to sunflowers with a 89.39 percent confidence.
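
As a small optional addition (not in the original tutorial), you can also print the softmax probability for every class, since score holds one value per entry in class_names:

 for name, prob in zip(class_names, score.numpy()):
  print("{}: {:.2%}".format(name, prob))
 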