
Image classification


This tutorial shows how to classify images of flowers. It creates an image classifier using a keras.Sequential model, and loads data using preprocessing.image_dataset_from_directory. You will gain practical experience with the following concepts:

  • Efficiently loading a dataset off disk.
  • Identifying overfitting and applying techniques to mitigate it, including data augmentation and Dropout.

This tutorial follows a basic machine learning workflow:

  1. Examine and understand the data
  2. Build an input pipeline
  3. Build the model
  4. Train the model
  5. Test the model
  6. Improve the model and repeat the process
pip install -q tf-nightly

Import TensorFlow and other libraries

 import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
 

Download and explore the dataset

This tutorial uses a dataset of about 3,700 photos of flowers. The dataset contains 5 sub-directories, one per class:

 flower_photo/
  daisy/
  dandelion/
  roses/
  sunflowers/
  tulips/
 
 import pathlib
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
 
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
228818944/228813984 [==============================] - 4s 0us/step

After downloading, you should now have a copy of the dataset available. There are 3,670 total images:

 image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)
 
3670

Here are some roses:

 roses = list(data_dir.glob('roses/*'))
PIL.Image.open(str(roses[0]))
 


 PIL.Image.open(str(roses[1]))
 


And some tulips:

 tulips = list(data_dir.glob('tulips/*'))
PIL.Image.open(str(tulips[0]))
 


 PIL.Image.open(str(tulips[1]))
 


Load using keras.preprocessing

Let's load these images off disk using the helpful image_dataset_from_directory utility. This will take you from a directory of images on disk to a tf.data.Dataset in just a couple of lines of code. If you like, you can also write your own data loading code from scratch by visiting the load images tutorial.

Create a dataset

Define some parameters for the loader:

 batch_size = 32
img_height = 180
img_width = 180
 

It's good practice to use a validation split when developing your model. We will use 80% of the images for training and 20% for validation.

 train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
 
Found 3670 files belonging to 5 classes.
Using 2936 files for training.

 val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
 
Found 3670 files belonging to 5 classes.
Using 734 files for validation.

You can find the class names in the class_names attribute on these datasets. These correspond to the directory names, in alphabetical order.

 class_names = train_ds.class_names
print(class_names)
 
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

Visualize the data

Here are the first 9 images from the training dataset.

 import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")
 


You will train a model using these datasets by passing them to model.fit in a moment. If you like, you can also manually iterate over the dataset and retrieve batches of images:

 for image_batch, labels_batch in train_ds:
  print(image_batch.shape)
  print(labels_batch.shape)
  break
 
(32, 180, 180, 3)
(32,)

The image_batch is a tensor of the shape (32, 180, 180, 3). This is a batch of 32 images of shape 180x180x3 (the last dimension refers to the RGB color channels). The label_batch is a tensor of the shape (32,); these are the corresponding labels for the 32 images.

You can call .numpy() on the image_batch and labels_batch tensors to convert them to a numpy.ndarray.
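For example, a minimal sketch reusing the image_batch and labels_batch retrieved in the loop above:

 images_np = image_batch.numpy()   # numpy.ndarray of shape (32, 180, 180, 3)
 labels_np = labels_batch.numpy()  # numpy.ndarray of shape (32,)
 print(type(images_np), type(labels_np))
 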

Configure the dataset for performance

Let's make sure to use buffered prefetching so we can yield data from disk without having I/O become blocking. These are two important methods you should use when loading data.

Dataset.cache() keeps the images in memory after they're loaded off disk during the first epoch. This will ensure the dataset does not become a bottleneck while training your model. If your dataset is too large to fit into memory, you can also use this method to create a performant on-disk cache, as sketched after the code below.

Dataset.prefetch() overlaps data preprocessing and model execution while training.

Interested readers can learn more about both methods, as well as how to cache data to disk, in the data performance guide.

 AUTOTUNE = tf.data.experimental.AUTOTUNE

train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
 
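As a sketch of the on-disk variant mentioned above: Dataset.cache accepts a filename argument, in which case the cache is written to disk and persists across runs. A minimal example, assuming a fresh dataset and a hypothetical cache path:

 # Build a fresh dataset and cache it to a file on disk instead of in memory.
 # '/tmp/flowers.cache' is a hypothetical path, not part of the original tutorial.
 big_ds = tf.keras.preprocessing.image_dataset_from_directory(
     data_dir, image_size=(img_height, img_width), batch_size=batch_size)
 big_ds = big_ds.cache("/tmp/flowers.cache").prefetch(buffer_size=AUTOTUNE)
 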

Standardize the data

The RGB channel values are in the [0, 255] range. This is not ideal for a neural network; in general you should seek to make your input values small. Here, we will standardize values to be in the [0, 1] range by using a Rescaling layer.

 normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)
 

There are two ways to use this layer. You can apply it to the dataset by calling map:

 normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixels values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image)) 
 
0.0 1.0

Or, you can include the layer inside your model definition, which can simplify deployment. We will use the second approach here.

Create the model

The model consists of three convolution blocks with a max pool layer in each of them. There's a fully connected layer with 128 units on top of it that is activated by a relu activation function. This model has not been tuned for high accuracy; the goal of this tutorial is to show a standard approach.

 num_classes = 5

model = Sequential([
  layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])
 

Compile the model

For this tutorial, choose the optimizers.Adam optimizer and the losses.SparseCategoricalCrossentropy loss function. To view training and validation accuracy for each training epoch, pass the metrics argument.

 model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
 

Model summary

View all the layers of the network using the model's summary method:

 model.summary()
 
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
rescaling_1 (Rescaling)      (None, 180, 180, 3)       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 180, 180, 16)      448       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 90, 90, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 90, 90, 32)        4640      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 45, 45, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 45, 45, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 22, 22, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 30976)             0         
_________________________________________________________________
dense (Dense)                (None, 128)               3965056   
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 645       
=================================================================
Total params: 3,989,285
Trainable params: 3,989,285
Non-trainable params: 0
_________________________________________________________________

Train the model

 epochs=10
history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)
 
Epoch 1/10
92/92 [==============================] - 4s 47ms/step - loss: 1.9328 - accuracy: 0.2896 - val_loss: 1.1022 - val_accuracy: 0.5245
Epoch 2/10
92/92 [==============================] - 1s 10ms/step - loss: 1.0441 - accuracy: 0.5761 - val_loss: 1.0057 - val_accuracy: 0.5913
Epoch 3/10
92/92 [==============================] - 1s 10ms/step - loss: 0.8640 - accuracy: 0.6763 - val_loss: 0.8951 - val_accuracy: 0.6499
Epoch 4/10
92/92 [==============================] - 1s 10ms/step - loss: 0.7106 - accuracy: 0.7472 - val_loss: 0.8992 - val_accuracy: 0.6621
Epoch 5/10
92/92 [==============================] - 1s 10ms/step - loss: 0.4817 - accuracy: 0.8285 - val_loss: 0.8997 - val_accuracy: 0.6662
Epoch 6/10
92/92 [==============================] - 1s 10ms/step - loss: 0.3131 - accuracy: 0.8903 - val_loss: 1.0014 - val_accuracy: 0.6567
Epoch 7/10
92/92 [==============================] - 1s 10ms/step - loss: 0.1782 - accuracy: 0.9436 - val_loss: 1.2140 - val_accuracy: 0.6431
Epoch 8/10
92/92 [==============================] - 1s 10ms/step - loss: 0.1024 - accuracy: 0.9703 - val_loss: 1.5144 - val_accuracy: 0.6240
Epoch 9/10
92/92 [==============================] - 1s 10ms/step - loss: 0.0736 - accuracy: 0.9815 - val_loss: 1.7651 - val_accuracy: 0.5926
Epoch 10/10
92/92 [==============================] - 1s 10ms/step - loss: 0.0761 - accuracy: 0.9775 - val_loss: 2.0429 - val_accuracy: 0.5967

Visualize training results

Create plots of loss and accuracy on the training and validation sets.

 acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss=history.history['loss']
val_loss=history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
 


As you can see from the plots, training accuracy and validation accuracy are off by large margins, and the model has achieved only around 60% accuracy on the validation set.

Let's look at what went wrong and try to increase the overall performance of the model.

Overfitting

In the plots above, the training accuracy is increasing linearly over time, whereas validation accuracy stalls around 60% in the training process. Also, the difference in accuracy between training and validation accuracy is noticeable, a sign of overfitting.

When there are a small number of training examples, the model sometimes learns from noise or unwanted details in the training examples, to an extent that it negatively impacts the performance of the model on new examples. This phenomenon is known as overfitting. It means that the model will have a difficult time generalizing to a new dataset.

There are multiple ways to fight overfitting in the training process. In this tutorial, you'll use data augmentation and add Dropout to your model.

Data augmentation

Overfitting generally occurs when there are a small number of training examples. Data augmentation takes the approach of generating additional training data from your existing examples by augmenting them with random transformations that yield believable-looking images. This helps expose the model to more aspects of the data and generalize better.

We will implement data augmentation using experimental Keras Preprocessing Layers. These can be included inside your model like other layers, and run on the GPU.

 data_augmentation = keras.Sequential(
  [
    layers.experimental.preprocessing.RandomFlip("horizontal", 
                                                 input_shape=(img_height, 
                                                              img_width,
                                                              3)),
    layers.experimental.preprocessing.RandomRotation(0.1),
    layers.experimental.preprocessing.RandomZoom(0.1),
  ]
)
 

Let's visualize what a few augmented examples look like by applying data augmentation to the same image several times:

 plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
  for i in range(9):
    augmented_images = data_augmentation(images)
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(augmented_images[0].numpy().astype("uint8"))
    plt.axis("off")
 


We will use data augmentation to train a model in a moment.

Dropout

Another technique to reduce overfitting is to introduce Dropout to the network, a form of regularization.

When you apply Dropout to a layer, it randomly drops out (by setting the activation to zero) a number of output units from the layer during the training process. Dropout takes a fractional number as its input value, in forms such as 0.1, 0.2, 0.4, etc. This means dropping out 10%, 20% or 40% of the output units at random from the applied layer.
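For intuition, here is a minimal sketch (using a toy tensor that is not part of the original tutorial) of what a Dropout layer does to activations during training:

 # With rate 0.5, roughly half the units are zeroed at random during training,
 # and the surviving units are scaled by 1/(1 - 0.5) to keep the expected sum.
 demo = tf.reshape(tf.range(10, dtype=tf.float32), (1, 10))
 print(layers.Dropout(0.5)(demo, training=True).numpy())
 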

Let's create a new neural network using layers.Dropout, then train it using augmented images.

 model = Sequential([
  data_augmentation,
  layers.experimental.preprocessing.Rescaling(1./255),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.2),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])
 

Compile and train the model

 model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
 
 model.summary()
 
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
sequential_1 (Sequential)    (None, 180, 180, 3)       0         
_________________________________________________________________
rescaling_2 (Rescaling)      (None, 180, 180, 3)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 180, 180, 16)      448       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 90, 90, 16)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 90, 90, 32)        4640      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 45, 45, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 45, 45, 64)        18496     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 22, 22, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 22, 22, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 30976)             0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               3965056   
_________________________________________________________________
dense_3 (Dense)              (None, 5)                 645       
=================================================================
Total params: 3,989,285
Trainable params: 3,989,285
Non-trainable params: 0
_________________________________________________________________

 epochs = 15
history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)
 
Epoch 1/15
92/92 [==============================] - 2s 20ms/step - loss: 1.4842 - accuracy: 0.3279 - val_loss: 1.0863 - val_accuracy: 0.5640
Epoch 2/15
92/92 [==============================] - 1s 12ms/step - loss: 1.1215 - accuracy: 0.5284 - val_loss: 1.0374 - val_accuracy: 0.6022
Epoch 3/15
92/92 [==============================] - 1s 12ms/step - loss: 0.9680 - accuracy: 0.6117 - val_loss: 0.9200 - val_accuracy: 0.6485
Epoch 4/15
92/92 [==============================] - 1s 12ms/step - loss: 0.8538 - accuracy: 0.6753 - val_loss: 0.9206 - val_accuracy: 0.6417
Epoch 5/15
92/92 [==============================] - 1s 12ms/step - loss: 0.7744 - accuracy: 0.6977 - val_loss: 0.8169 - val_accuracy: 0.6839
Epoch 6/15
92/92 [==============================] - 1s 13ms/step - loss: 0.7758 - accuracy: 0.7093 - val_loss: 0.7743 - val_accuracy: 0.6880
Epoch 7/15
92/92 [==============================] - 1s 13ms/step - loss: 0.6805 - accuracy: 0.7481 - val_loss: 0.8598 - val_accuracy: 0.6540
Epoch 8/15
92/92 [==============================] - 1s 13ms/step - loss: 0.7132 - accuracy: 0.7278 - val_loss: 0.7177 - val_accuracy: 0.7207
Epoch 9/15
92/92 [==============================] - 1s 13ms/step - loss: 0.6634 - accuracy: 0.7580 - val_loss: 0.7152 - val_accuracy: 0.7166
Epoch 10/15
92/92 [==============================] - 1s 13ms/step - loss: 0.6562 - accuracy: 0.7538 - val_loss: 0.7251 - val_accuracy: 0.7248
Epoch 11/15
92/92 [==============================] - 1s 13ms/step - loss: 0.5798 - accuracy: 0.7840 - val_loss: 0.7016 - val_accuracy: 0.7357
Epoch 12/15
92/92 [==============================] - 1s 13ms/step - loss: 0.5635 - accuracy: 0.7913 - val_loss: 0.7755 - val_accuracy: 0.7248
Epoch 13/15
92/92 [==============================] - 1s 13ms/step - loss: 0.5361 - accuracy: 0.7982 - val_loss: 0.7575 - val_accuracy: 0.7153
Epoch 14/15
92/92 [==============================] - 1s 13ms/step - loss: 0.5420 - accuracy: 0.8022 - val_loss: 0.7375 - val_accuracy: 0.7302
Epoch 15/15
92/92 [==============================] - 1s 12ms/step - loss: 0.5132 - accuracy: 0.8120 - val_loss: 0.7561 - val_accuracy: 0.7289

Visualize training results

After applying data augmentation and Dropout, there is less overfitting than before, and training and validation accuracy are more closely aligned.

 acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
 


Predict on new data

Finally, let's use our model to classify an image that wasn't included in the training or validation sets.

 sunflower_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/592px-Red_sunflower.jpg"
sunflower_path = tf.keras.utils.get_file('Red_sunflower', origin=sunflower_url)

img = keras.preprocessing.image.load_img(
    sunflower_path, target_size=(img_height, img_width)
)
img_array = keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0) # Create a batch

predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])

print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score)], 100 * np.max(score))
)
 
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/592px-Red_sunflower.jpg
122880/117948 [===============================] - 0s 0us/step
This image most likely belongs to sunflowers with a 89.39 percent confidence.