Post-training float16 quantization

Overview

TensorFlow Lite now supports converting weights to 16-bit floating point values during model conversion from TensorFlow to TensorFlow Lite's flatbuffer format. This results in a 2x reduction in model size. Some hardware, such as GPUs, can compute natively in this reduced-precision arithmetic, realizing a speedup over traditional floating point execution. The TensorFlow Lite GPU delegate can be configured to run in this way. However, a model converted to float16 weights can still run on the CPU without additional modification: the float16 weights are upsampled to float32 prior to the first inference. This permits a significant reduction in model size in exchange for a minimal impact on latency and accuracy.
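The storage saving and the upcast step can be illustrated outside TensorFlow Lite. The following is a minimal NumPy sketch, not the converter's actual implementation: it casts a hypothetical float32 weight array to float16, checks that the byte count halves, and casts back to float32 to show the size of the rounding error. The array `weights_fp32` and its shape are made up for illustration.

```python
import numpy as np

# Hypothetical weights for illustration; not taken from the tutorial's model.
rng = np.random.default_rng(seed=0)
weights_fp32 = rng.uniform(-1.0, 1.0, size=(2028, 10)).astype(np.float32)

# Store the weights as float16: 2 bytes per value instead of 4.
weights_fp16 = weights_fp32.astype(np.float16)
size_ratio = weights_fp32.nbytes / weights_fp16.nbytes

# Cast back to float32, mirroring the upsampling done before CPU inference.
restored_fp32 = weights_fp16.astype(np.float32)
max_abs_error = float(np.max(np.abs(weights_fp32 - restored_fp32)))

print(f"size ratio: {size_ratio:.1f}x")
print(f"max absolute rounding error: {max_abs_error:.2e}")
```

For values in [-1, 1], float16 rounding changes each weight by at most 2^-12 (about 0.00024), which is one reason the accuracy impact is usually small.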

In this tutorial, you train an MNIST model from scratch, check its accuracy in TensorFlow, and then convert the model into a TensorFlow Lite flatbuffer with float16 quantization. Finally, you check the accuracy of the converted model and compare it to the original float32 model.

Build an MNIST model

Setup

import logging
logging.getLogger("tensorflow").setLevel(logging.DEBUG)

import tensorflow as tf
from tensorflow import keras
import numpy as np
import pathlib

Train and export the model

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input images so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_data=(test_images, test_labels)
)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
11501568/11490434 [==============================] - 0s 0us/step
1875/1875 [==============================] - 13s 2ms/step - loss: 0.2655 - accuracy: 0.9244 - val_loss: 0.1237 - val_accuracy: 0.9654
<keras.callbacks.History at 0x7f3f8428e6d0>

For this example, you trained the model for just a single epoch, so it only reaches about 96% accuracy.

Convert to a TensorFlow Lite model

Using the Python TFLiteConverter, you can now convert the trained model into a TensorFlow Lite model.

Now load the model using the TFLiteConverter:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

2021-12-14 12:18:07.073783: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: /tmp/tmpm1s3vkrd/assets
2021-12-14 12:18:07.876066: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:363] Ignored output_format.
2021-12-14 12:18:07.876112: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:366] Ignored drop_control_dependency.
WARNING:absl:Buffer deduplication procedure will be skipped when flatbuffer library is not properly loaded

Write it out to a .tflite file:

tflite_models_dir = pathlib.Path("/tmp/mnist_tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)

tflite_model_file = tflite_models_dir/"mnist_model.tflite"
tflite_model_file.write_bytes(tflite_model)
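As an optional sanity check on the file just written, a TensorFlow Lite flatbuffer carries the four-byte file identifier `TFL3` at byte offset 4. The helper below is a hypothetical convenience (the name `looks_like_tflite` is not part of any TensorFlow API) that reads the first eight bytes and checks for that identifier; it does not validate the rest of the model.

```python
import pathlib

TFLITE_IDENTIFIER = b"TFL3"  # file identifier declared in the TFLite schema


def looks_like_tflite(path):
    """Return True if bytes 4..7 of the file hold the TFLite identifier.

    Hypothetical helper for illustration; it only inspects the header.
    """
    with pathlib.Path(path).open("rb") as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] == TFLITE_IDENTIFIER
```

With the paths used in this tutorial, the call would be `looks_like_tflite(tflite_model_file)`.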

To instead quantize the model to float16 on export, first set the optimizations flag to use the default optimizations. Then specify that float16 is the supported type on the target platform:

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

Finally, convert the model as usual. Note that, by default, the converted model still uses float inputs and outputs for invocation convenience.

tflite_fp16_model = converter.convert()
tflite_model_fp16_file = tflite_models_dir/"mnist_model_quant_f16.tflite"
tflite_model_fp16_file.write_bytes(tflite_fp16_model)

INFO:tensorflow:Assets written to: /tmp/tmpvjt9l68i/assets
INFO:tensorflow:Assets written to: /tmp/tmpvjt9l68i/assets
2021-12-14 12:18:08.810262: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:363] Ignored output_format.
2021-12-14 12:18:08.810303: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:366] Ignored drop_control_dependency.
WARNING:absl:Buffer deduplication procedure will be skipped when flatbuffer library is not properly loaded
44384

Note how the resulting file is approximately half the size.

!ls -lh {tflite_models_dir}

total 128K
-rw-rw-r-- 1 kbuilder kbuilder 83K Dec 14 12:18 mnist_model.tflite
-rw-rw-r-- 1 kbuilder kbuilder 44K Dec 14 12:18 mnist_model_quant_f16.tflite
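The observed sizes are consistent with a back-of-the-envelope estimate from the parameter count. The sketch below counts the trainable parameters of the architecture defined earlier and multiplies by the bytes per value. It assumes every parameter is stored at 4 bytes (float32) or 2 bytes (float16) and ignores flatbuffer metadata, which is why the real files are a few kilobytes larger and the ratio is slightly under 2x.

```python
# Spatial size after Conv2D(12 filters, 3x3, default 'valid' padding) on 28x28x1.
conv_out = 28 - 3 + 1                      # 26
pooled = conv_out // 2                     # 13 after 2x2 max pooling
flat_features = pooled * pooled * 12       # 2028 values fed to the Dense layer

conv_params = 3 * 3 * 1 * 12 + 12          # kernels + biases
dense_params = flat_features * 10 + 10     # weights + biases
total_params = conv_params + dense_params

# Assumption: all parameters stored at one width, no file overhead counted.
fp32_bytes = total_params * 4
fp16_bytes = total_params * 2

print(f"parameters: {total_params}")
print(f"estimated weight bytes: float32={fp32_bytes}, float16={fp16_bytes}")
```

The float16 estimate of 40,820 bytes sits just below the 44,384 bytes reported when mnist_model_quant_f16.tflite was written.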

Run the TensorFlow Lite models

Run the TensorFlow Lite models using the Python TensorFlow Lite Interpreter.

Load the models into the interpreters

interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))
interpreter.allocate_tensors()

interpreter_fp16 = tf.lite.Interpreter(model_path=str(tflite_model_fp16_file))
interpreter_fp16.allocate_tensors()
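The conversion step noted that a float16-quantized model keeps floating-point inputs and outputs by default. One way to check this is to print the tensor details each interpreter reports. The helper `describe_io` below is a hypothetical name; it relies only on `get_input_details()` and `get_output_details()`, which return a list of dicts with keys such as `name`, `shape`, and `dtype`.

```python
def describe_io(interpreter):
    """Return (name, shape, dtype name) for each input and output tensor.

    Hypothetical helper; works with any object exposing the
    get_input_details() / get_output_details() methods of tf.lite.Interpreter.
    """
    def summarize(details):
        return [
            (d["name"], tuple(int(n) for n in d["shape"]), d["dtype"].__name__)
            for d in details
        ]

    return {
        "inputs": summarize(interpreter.get_input_details()),
        "outputs": summarize(interpreter.get_output_details()),
    }
```

Calling `describe_io(interpreter_fp16)` should report a float32 dtype for both the input and the output, even though the stored weights are float16.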

Test the models on one image

test_image = np.expand_dims(test_images[0], axis=0).astype(np.float32)

input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

interpreter.set_tensor(input_index, test_image)
interpreter.invoke()
predictions = interpreter.get_tensor(output_index)

import matplotlib.pyplot as plt

plt.imshow(test_images[0])
template = "True:{true}, predicted:{predict}"
_ = plt.title(template.format(true= str(test_labels[0]),
                              predict=str(np.argmax(predictions[0]))))
plt.grid(False)

test_image = np.expand_dims(test_images[0], axis=0).astype(np.float32)

input_index = interpreter_fp16.get_input_details()[0]["index"]
output_index = interpreter_fp16.get_output_details()[0]["index"]

interpreter_fp16.set_tensor(input_index, test_image)
interpreter_fp16.invoke()
predictions = interpreter_fp16.get_tensor(output_index)

plt.imshow(test_images[0])
template = "True:{true}, predicted:{predict}"
_ = plt.title(template.format(true= str(test_labels[0]),
                              predict=str(np.argmax(predictions[0]))))
plt.grid(False)
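Because the final Dense(10) layer has no activation and the loss was configured with from_logits=True, the `predictions` array holds unnormalized logits rather than probabilities. Taking `np.argmax` directly is still valid, since softmax preserves ordering. The sketch below shows this with a made-up logits vector (the values are illustrative, not actual model output).

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


# Made-up logits for one image with 10 classes; not real model output.
example_logits = np.array(
    [[-3.1, -7.4, 0.2, 1.5, -6.0, -2.2, -11.3, 9.8, -1.7, 0.9]],
    dtype=np.float32,
)
probabilities = softmax(example_logits)

# The predicted class is the same whether computed from logits or probabilities.
predicted_from_logits = int(np.argmax(example_logits[0]))
predicted_from_probs = int(np.argmax(probabilities[0]))
print(predicted_from_logits, predicted_from_probs)
```

Applying `softmax` is only needed when you want to report a confidence value alongside the predicted digit.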

Evaluate the models

# A helper function to evaluate the TF Lite model using "test" dataset.
def evaluate_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for test_image in test_images:
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # score (the model outputs logits, so no softmax is applied here).
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  # Compare prediction results with ground truth labels to calculate accuracy.
  accurate_count = 0
  for index in range(len(prediction_digits)):
    if prediction_digits[index] == test_labels[index]:
      accurate_count += 1
  accuracy = accurate_count * 1.0 / len(prediction_digits)

  return accuracy

print(evaluate_model(interpreter))

0.9654

Repeat the evaluation on the float16 quantized model to obtain:

# NOTE: Colab runs on server CPUs. At the time of writing this, TensorFlow Lite
# doesn't have super optimized server CPU kernels. For this reason this may be
# slower than the above float interpreter. But for mobile CPUs, considerable
# speedup can be observed.
print(evaluate_model(interpreter_fp16))

0.9654

In this example, you have quantized a model to float16 with no difference in accuracy.
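Equal aggregate accuracy does not by itself show that the two models agree on every image, so a finer comparison can look at the raw outputs side by side. The sketch below assumes you have collected the output logits of both interpreters into two arrays of shape (num_images, num_classes), for example by storing `output()[0]` inside a loop like the one in `evaluate_model`. The function name `compare_outputs` and the sample arrays are hypothetical.

```python
import numpy as np


def compare_outputs(logits_a, logits_b):
    """Compare two sets of logits produced for the same inputs.

    Returns the largest absolute difference and the fraction of rows
    on which both models predict the same class.
    """
    logits_a = np.asarray(logits_a, dtype=np.float32)
    logits_b = np.asarray(logits_b, dtype=np.float32)
    if logits_a.shape != logits_b.shape:
        raise ValueError("both arrays must have the same shape")
    max_abs_diff = float(np.max(np.abs(logits_a - logits_b)))
    same_class = np.argmax(logits_a, axis=-1) == np.argmax(logits_b, axis=-1)
    return max_abs_diff, float(np.mean(same_class))


# Made-up values standing in for float32 and float16 model outputs.
sample_fp32 = np.array([[2.0, -1.0, 0.5], [0.1, 0.3, 0.2]], dtype=np.float32)
sample_fp16 = np.array([[1.99, -1.01, 0.5], [0.1, 0.29, 0.2]], dtype=np.float32)
max_diff, agree = compare_outputs(sample_fp32, sample_fp16)
print(f"max |difference|: {max_diff:.3f}, agreement: {agree:.2%}")
```

An agreement below 100% with identical accuracy would mean the two models make different mistakes, which the single accuracy number hides.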

It is also possible to evaluate the fp16 quantized model on the GPU. To perform all arithmetic with the reduced-precision values, be sure to create the TfLiteGpuDelegateOptions struct in your app and set precision_loss_allowed to 1, like this:

// Prepare GPU delegate.
const TfLiteGpuDelegateOptions options = {
  .metadata = NULL,
  .compile_options = {
    .precision_loss_allowed = 1,  // FP16
    .preferred_gl_object_type = TFLITE_GL_OBJECT_TYPE_FASTEST,
    .dynamic_batch_enabled = 0,   // Not fully functional yet
  },
};

For details on the TFLite GPU delegate and how to use it in your application, see the TensorFlow Lite GPU delegate documentation.