Migrating feature_columns to TF2's Keras preprocessing layers

Training a model usually involves some feature preprocessing, particularly when working with structured data. When training a tf.estimator.Estimator in TF1, this preprocessing is usually done with the tf.feature_column API. In TF2, it can be done directly with Keras layers, called preprocessing layers.

In this migration guide, you will perform some common feature transformations using both feature columns and preprocessing layers, and then train a complete model with each API.

First, start with a couple of necessary imports:

import tensorflow as tf
import tensorflow.compat.v1 as tf1
import math

Next, add a utility for calling a feature column, for demonstration purposes:

def call_feature_columns(feature_columns, inputs):
  # This is a convenient way to call a `feature_column` outside of an estimator
  # to display its output.
  feature_layer = tf1.keras.layers.DenseFeatures(feature_columns)
  return feature_layer(inputs)

Input handling

To use feature columns with an estimator, model inputs are always expected to be a dictionary of tensors:

input_dict = {
  'foo': tf.constant([1]),
  'bar': tf.constant([0]),
  'baz': tf.constant([-1])
}

Each feature column needs to be created with a key to index into the source data. The outputs of all feature columns are concatenated and used by the estimator model.

columns = [
  tf1.feature_column.numeric_column('foo'),
  tf1.feature_column.numeric_column('bar'),
  tf1.feature_column.numeric_column('baz'),
]
call_feature_columns(columns, input_dict)

<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[ 0., -1.,  1.]], dtype=float32)>
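The output above is ordered bar, baz, foo rather than foo, bar, baz, because the feature layer sorts the columns by name before concatenating them. A minimal, framework-free sketch of that ordering, assuming one scalar per feature; `concat_sorted_by_name` is a hypothetical helper for illustration, not a TensorFlow function:

```python
def concat_sorted_by_name(features):
  """Concatenates per-feature scalars in alphabetical order of feature name."""
  row = []
  for name in sorted(features):
    row.append(float(features[name]))
  return row

input_row = {'foo': 1, 'bar': 0, 'baz': -1}
print(concat_sorted_by_name(input_row))  # [0.0, -1.0, 1.0]
```

Keep this ordering in mind when comparing outputs between the two APIs: the position of a feature in the concatenated tensor does not depend on the order of the `columns` list.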

In Keras, model input is much more flexible. A tf.keras.Model can handle a single tensor input, a list of tensor features, or a dictionary of tensor features. You can handle dictionary input by passing a dictionary of tf.keras.Input on model creation. Inputs are not concatenated automatically, which allows them to be used in much more flexible ways. They can be concatenated with tf.keras.layers.Concatenate.

inputs = {
  'foo': tf.keras.Input(shape=()),
  'bar': tf.keras.Input(shape=()),
  'baz': tf.keras.Input(shape=()),
}
# Inputs are typically transformed by preprocessing layers before concatenation.
outputs = tf.keras.layers.Concatenate()(inputs.values())
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model(input_dict)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 1.,  0., -1.], dtype=float32)>

One-hot encoding integer IDs

A common feature transformation is one-hot encoding integer inputs of a known range. Here is an example using feature columns:

categorical_col = tf1.feature_column.categorical_column_with_identity(
    'type', num_buckets=3)
indicator_col = tf1.feature_column.indicator_column(categorical_col)
call_feature_columns(indicator_col, {'type': [0, 1, 2]})

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)>

Using Keras preprocessing layers, these columns can be replaced by a single tf.keras.layers.CategoryEncoding layer with output_mode set to 'one_hot':

one_hot_layer = tf.keras.layers.CategoryEncoding(
    num_tokens=3, output_mode='one_hot')
one_hot_layer([0, 1, 2])

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)>
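Both versions compute the same thing: each integer ID becomes a vector of length `num_tokens` with a single 1 at the ID's position. A framework-free sketch of that transformation; `one_hot` is a hypothetical helper, and the explicit range check is a choice made for this sketch:

```python
def one_hot(index, num_tokens):
  """Returns a list of length num_tokens with 1.0 at position index."""
  if not 0 <= index < num_tokens:
    raise ValueError(f'index {index} is outside [0, {num_tokens})')
  return [1.0 if i == index else 0.0 for i in range(num_tokens)]

print([one_hot(i, 3) for i in [0, 1, 2]])
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```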

Normalizing numeric features

When handling continuous, floating-point features with feature columns, you need to use a tf.feature_column.numeric_column. When the input is already normalized, converting this to Keras is trivial: use a tf.keras.Input directly in your model, as shown above.

A numeric_column can also be used to normalize input:

def normalize(x):
  mean, variance = (2.0, 1.0)
  return (x - mean) / math.sqrt(variance)
numeric_col = tf1.feature_column.numeric_column('col', normalizer_fn=normalize)
call_feature_columns(numeric_col, {'col': tf.constant([[0.], [1.], [2.]])})

<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[-2.],
       [-1.],
       [ 0.]], dtype=float32)>

In contrast, with Keras, this normalization can be done with tf.keras.layers.Normalization.

normalization_layer = tf.keras.layers.Normalization(mean=2.0, variance=1.0)
normalization_layer(tf.constant([[0.], [1.], [2.]]))

<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[-2.],
       [-1.],
       [ 0.]], dtype=float32)>
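The mean and variance in both examples are hardcoded. As a rough sketch of where such constants might come from, the snippet below derives them from sample data with the standard library. It assumes population variance is the statistic you want, and `training_values` is made-up data used only for illustration:

```python
import math
import statistics

# Made-up sample of a numeric feature; replace with your own training data.
training_values = [1.0, 2.0, 3.0]

mean = statistics.mean(training_values)
# Population variance: the average squared deviation from the mean.
variance = statistics.pvariance(training_values, mu=mean)

def normalize(x):
  """Shifts by the mean and scales by the standard deviation."""
  return (x - mean) / math.sqrt(variance)

print(mean, normalize(2.0))  # 2.0 0.0
```

The resulting constants could then be passed as `mean` and `variance` to either API.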

Bucketizing and one-hot encoding numeric features

Another common transformation of continuous, floating-point inputs is to bucketize them into integers of a fixed range.

With feature columns, this can be achieved with a tf.feature_column.bucketized_column:

numeric_col = tf1.feature_column.numeric_column('col')
bucketized_col = tf1.feature_column.bucketized_column(numeric_col, [1, 4, 5])
call_feature_columns(bucketized_col, {'col': tf.constant([1., 2., 3., 4., 5.])})

<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]], dtype=float32)>

In Keras, this can be replaced by tf.keras.layers.Discretization, followed by a CategoryEncoding layer that one-hot encodes the bucket indices.

discretization_layer = tf.keras.layers.Discretization(bin_boundaries=[1, 4, 5])
one_hot_layer = tf.keras.layers.CategoryEncoding(
    num_tokens=4, output_mode='one_hot')
one_hot_layer(discretization_layer([1., 2., 3., 4., 5.]))

<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]], dtype=float32)>
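The outputs above imply that a value equal to a boundary falls into the bucket to its right, and that three boundaries produce four buckets (hence `num_tokens=4`). A framework-free sketch of that index computation using the standard library; `bucketize` is a hypothetical helper, not a TensorFlow function:

```python
import bisect

def bucketize(value, boundaries):
  """Returns the bucket index of value; boundaries must be sorted ascending."""
  # bisect_right counts the boundaries that are <= value.
  return bisect.bisect_right(boundaries, value)

boundaries = [1, 4, 5]
print([bucketize(v, boundaries) for v in [0.5, 1., 2., 3., 4., 5.]])
# [0, 1, 1, 1, 2, 3]
```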

One-hot encoding string data with a vocabulary

Handling string features often requires a vocabulary lookup to translate strings into indices. Here is an example using feature columns to look up strings and then one-hot encode the indices:

vocab_col = tf1.feature_column.categorical_column_with_vocabulary_list(
    'sizes',
    vocabulary_list=['small', 'medium', 'large'],
    num_oov_buckets=0)
indicator_col = tf1.feature_column.indicator_column(vocab_col)
call_feature_columns(indicator_col, {'sizes': ['small', 'medium', 'large']})

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)>

Using Keras preprocessing layers, use the tf.keras.layers.StringLookup layer with output_mode set to 'one_hot':

string_lookup_layer = tf.keras.layers.StringLookup(
    vocabulary=['small', 'medium', 'large'],
    num_oov_indices=0,
    output_mode='one_hot')
string_lookup_layer(['small', 'medium', 'large'])

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)>
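The two examples above disable out-of-vocabulary (OOV) handling, so their indices line up. With one OOV slot enabled, the two APIs number tokens differently: the vocabulary-list column with `num_oov_buckets=1` places the OOV ID after the vocabulary, while `StringLookup` with its default `num_oov_indices=1` reserves index 0 for it. A framework-free sketch of the two conventions; both helper names are hypothetical:

```python
VOCAB = ['small', 'medium', 'large']

def lookup_oov_last(token, vocab):
  """Feature-column style: known tokens keep their list position, OOV comes last."""
  return vocab.index(token) if token in vocab else len(vocab)

def lookup_oov_first(token, vocab):
  """StringLookup style: index 0 is reserved for OOV, known tokens shift up by one."""
  return vocab.index(token) + 1 if token in vocab else 0

for token in ['small', 'large', 'foo']:
  print(token, lookup_oov_last(token, VOCAB), lookup_oov_first(token, VOCAB))
# small 0 1
# large 2 3
# foo 3 0
```

When migrating a model that relies on OOV handling, downstream layers sized for one convention may need adjusting for the other.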

Embedding string data with a vocabulary

For larger vocabularies, an embedding is often needed for good performance. Here is an example of embedding a string feature using feature columns:

vocab_col = tf1.feature_column.categorical_column_with_vocabulary_list(
    'col',
    vocabulary_list=['small', 'medium', 'large'],
    num_oov_buckets=0)
embedding_col = tf1.feature_column.embedding_column(vocab_col, 4)
call_feature_columns(embedding_col, {'col': ['small', 'medium', 'large']})

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[-0.01798586, -0.2808677 ,  0.27639154,  0.06081508],
       [ 0.05771849,  0.02464074,  0.20080602,  0.50164527],
       [-0.9208247 , -0.40816694, -0.49132794,  0.9203153 ]],
      dtype=float32)>

Using Keras preprocessing layers, this can be achieved by combining a tf.keras.layers.StringLookup layer and a tf.keras.layers.Embedding layer. The default output of StringLookup is integer indices, which can be fed directly into an embedding.

string_lookup_layer = tf.keras.layers.StringLookup(
    vocabulary=['small', 'medium', 'large'], num_oov_indices=0)
embedding = tf.keras.layers.Embedding(3, 4)
embedding(string_lookup_layer(['small', 'medium', 'large']))

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[ 0.04838837, -0.04014301,  0.02001903, -0.01150769],
       [-0.04580117, -0.04319514,  0.03725603, -0.00572466],
       [-0.0401094 ,  0.00997342,  0.00111955,  0.00132702]],
      dtype=float32)>
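Conceptually, an embedding is a trainable table with one row per index, and the lookup selects rows from it. A framework-free sketch under that assumption; the table values are made up and `embed` is a hypothetical helper:

```python
vocab = ['small', 'medium', 'large']

# Hypothetical embedding table: one row of 4 values per vocabulary index.
table = [
    [0.1, 0.2, 0.3, 0.4],  # index 0: 'small'
    [0.5, 0.6, 0.7, 0.8],  # index 1: 'medium'
    [0.9, 1.0, 1.1, 1.2],  # index 2: 'large'
]

def embed(tokens):
  """Maps each token to its vocabulary index, then to the matching table row."""
  return [table[vocab.index(token)] for token in tokens]

print(embed(['large', 'small']))
# [[0.9, 1.0, 1.1, 1.2], [0.1, 0.2, 0.3, 0.4]]
```

The table needs a row for every index the lookup can emit. If OOV indices are enabled in `StringLookup`, the first argument of `Embedding` has to grow accordingly.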

Summing weighted categorical data

In some cases, you need to deal with categorical data where each occurrence of a category comes with an associated weight. In feature columns, this is handled with tf.feature_column.weighted_categorical_column. When paired with an indicator_column, this has the effect of summing weights per category.

ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])

categorical_col = tf1.feature_column.categorical_column_with_identity(
    'ids', num_buckets=20)
weighted_categorical_col = tf1.feature_column.weighted_categorical_column(
    categorical_col, 'weights')
indicator_col = tf1.feature_column.indicator_column(weighted_categorical_col)
call_feature_columns(indicator_col, {'ids': ids, 'weights': weights})

WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/feature_column/feature_column_v2.py:4203: sparse_merge (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
No similar op available at this time.
<tf.Tensor: shape=(1, 20), dtype=float32, numpy=
array([[0. , 0. , 0. , 0. , 0. , 1.2, 0. , 0. , 0. , 0. , 0. , 1.5, 0. ,
        0. , 0. , 0. , 0. , 2. , 0. , 0. ]], dtype=float32)>

In Keras, this can be done by passing a count_weights input to tf.keras.layers.CategoryEncoding with output_mode='count'.

ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])

# Using sparse output is more efficient when `num_tokens` is large.
count_layer = tf.keras.layers.CategoryEncoding(
    num_tokens=20, output_mode='count', sparse=True)
tf.sparse.to_dense(count_layer(ids, count_weights=weights))

<tf.Tensor: shape=(1, 20), dtype=float32, numpy=
array([[0. , 0. , 0. , 0. , 0. , 1.2, 0. , 0. , 0. , 0. , 0. , 1.5, 0. ,
        0. , 0. , 0. , 0. , 2. , 0. , 0. ]], dtype=float32)>

Embedding weighted categorical data

You might alternatively want to embed weighted categorical inputs. In feature columns, the embedding_column takes a combiner argument. If any sample contains multiple entries for a category, they are combined according to the argument setting (by default 'mean').

ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])

categorical_col = tf1.feature_column.categorical_column_with_identity(
    'ids', num_buckets=20)
weighted_categorical_col = tf1.feature_column.weighted_categorical_column(
    categorical_col, 'weights')
embedding_col = tf1.feature_column.embedding_column(
    weighted_categorical_col, 4, combiner='mean')
call_feature_columns(embedding_col, {'ids': ids, 'weights': weights})

<tf.Tensor: shape=(1, 4), dtype=float32, numpy=
array([[ 0.02666993,  0.289671  ,  0.18065728, -0.21045178]],
      dtype=float32)>

In Keras, tf.keras.layers.Embedding has no combiner option, but you can achieve the same effect with tf.keras.layers.Dense. The embedding_column above is simply linearly combining embedding vectors according to category weight. Though not obvious at first, this is exactly equivalent to representing your categorical inputs as a sparse weight vector of size (num_tokens) and multiplying it by a Dense kernel of shape (num_tokens, embedding_size).

ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])

# For `combiner='mean'`, normalize your weights to sum to 1. Removing this line
# would be equivalent to an `embedding_column` with `combiner='sum'`.
weights = weights / tf.reduce_sum(weights, axis=-1, keepdims=True)

count_layer = tf.keras.layers.CategoryEncoding(
    num_tokens=20, output_mode='count', sparse=True)
embedding_layer = tf.keras.layers.Dense(4, use_bias=False)
embedding_layer(count_layer(ids, count_weights=weights))

<tf.Tensor: shape=(1, 4), dtype=float32, numpy=
array([[-0.03897291, -0.27131438,  0.09332469,  0.04333957]],
      dtype=float32)>
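The outputs of the two versions differ only because each one initializes its own random weights. To see why the computations are equivalent, the framework-free sketch below compares a weighted mean of embedding rows with a normalized weight vector multiplied by the same table. The table values are made up and both function names are hypothetical:

```python
import math

# Hypothetical 20 x 4 embedding table; it doubles as the Dense kernel.
table = [[(t + 1) * (d + 1) / 100 for d in range(4)] for t in range(20)]
ids = [5, 11, 5, 17, 17]
weights = [0.5, 1.5, 0.7, 1.8, 0.2]

def combine_mean(ids, weights, table):
  """Combiner view: weighted average of the rows selected by ids."""
  total = sum(weights)
  return [sum(w * table[i][d] for i, w in zip(ids, weights)) / total
          for d in range(len(table[0]))]

def dense_on_weight_vector(ids, weights, table):
  """Dense view: normalized weight vector of size num_tokens times the kernel."""
  total = sum(weights)
  vector = [0.0] * len(table)
  for i, w in zip(ids, weights):
    vector[i] += w / total
  return [sum(vector[t] * table[t][d] for t in range(len(table)))
          for d in range(len(table[0]))]

a = combine_mean(ids, weights, table)
b = dense_on_weight_vector(ids, weights, table)
print(all(math.isclose(x, y) for x, y in zip(a, b)))  # True
```

Dropping the division by `total` in both functions gives the corresponding check for a sum combiner.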

Complete training example

To show a complete training workflow, first prepare some data with three features of different types:

features = {
    'type': [0, 1, 1],
    'size': ['small', 'small', 'medium'],
    'weight': [2.7, 1.8, 1.6],
}
labels = [1, 1, 0]
predict_features = {'type': [0], 'size': ['foo'], 'weight': [-0.7]}

Define some common constants for both the TF1 and TF2 workflows:

vocab = ['small', 'medium', 'large']
one_hot_dims = 3
embedding_dims = 4
weight_mean = 2.0
weight_variance = 1.0

With feature columns

Feature columns must be passed as a list to the estimator on creation, and are called implicitly during training.

categorical_col = tf1.feature_column.categorical_column_with_identity(
    'type', num_buckets=one_hot_dims)
# Convert index to one-hot; e.g. [2] -> [0,0,1].
indicator_col = tf1.feature_column.indicator_column(categorical_col)

# Convert strings to indices; e.g. ['small'] -> [0].
vocab_col = tf1.feature_column.categorical_column_with_vocabulary_list(
    'size', vocabulary_list=vocab, num_oov_buckets=1)
# Embed the indices.
embedding_col = tf1.feature_column.embedding_column(vocab_col, embedding_dims)

normalizer_fn = lambda x: (x - weight_mean) / math.sqrt(weight_variance)
# Normalize the numeric inputs; e.g. [2.0] -> [0.0].
numeric_col = tf1.feature_column.numeric_column(
    'weight', normalizer_fn=normalizer_fn)

estimator = tf1.estimator.DNNClassifier(
    feature_columns=[indicator_col, embedding_col, numeric_col],
    hidden_units=[1])

def _input_fn():
  return tf1.data.Dataset.from_tensor_slices((features, labels)).batch(1)

estimator.train(_input_fn)

INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp8lwbuor2
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp8lwbuor2', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/adagrad.py:77: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp8lwbuor2/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.54634213, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3...
INFO:tensorflow:Saving checkpoints for 3 into /tmp/tmp8lwbuor2/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3...
INFO:tensorflow:Loss for final step: 0.7308526.
<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x7f90685d53d0>

The feature columns are also used to transform input data when running inference on the model.

def _predict_fn():
  return tf1.data.Dataset.from_tensor_slices(predict_features).batch(1)

next(estimator.predict(_predict_fn))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp8lwbuor2/model.ckpt-3
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
{'logits': array([0.5172372], dtype=float32),
 'logistic': array([0.6265015], dtype=float32),
 'probabilities': array([0.37349847, 0.6265015 ], dtype=float32),
 'class_ids': array([1]),
 'classes': array([b'1'], dtype=object),
 'all_class_ids': array([0, 1], dtype=int32),
 'all_classes': array([b'0', b'1'], dtype=object)}

With Keras preprocessing layers

Keras preprocessing layers are more flexible in where they can be called. A layer can be applied directly to tensors, used inside a tf.data input pipeline, or built directly into a trainable Keras model.

In this example, you will apply preprocessing layers inside a tf.data input pipeline. To do this, define a separate tf.keras.Model to preprocess your input features. This model is not trainable, but it is a convenient way to group preprocessing layers.

inputs = {
  'type': tf.keras.Input(shape=(), dtype='int64'),
  'size': tf.keras.Input(shape=(), dtype='string'),
  'weight': tf.keras.Input(shape=(), dtype='float32'),
}
# Convert index to one-hot; e.g. [2] -> [0,0,1].
type_output = tf.keras.layers.CategoryEncoding(
      one_hot_dims, output_mode='one_hot')(inputs['type'])
# Convert size strings to indices; e.g. ['small'] -> [1].
size_output = tf.keras.layers.StringLookup(vocabulary=vocab)(inputs['size'])
# Normalize the numeric inputs; e.g. [2.0] -> [0.0].
weight_output = tf.keras.layers.Normalization(
      axis=None, mean=weight_mean, variance=weight_variance)(inputs['weight'])
outputs = {
  'type': type_output,
  'size': size_output,
  'weight': weight_output,
}
preprocessing_model = tf.keras.Model(inputs, outputs)

You can now apply this model inside a call to tf.data.Dataset.map. Note that the function passed to map is automatically converted into a tf.function, so the usual caveats for writing tf.function code apply (no side effects).

# Apply the preprocessing in tf.data.Dataset.map.
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(1)
dataset = dataset.map(lambda x, y: (preprocessing_model(x), y),
                      num_parallel_calls=tf.data.AUTOTUNE)
# Display a preprocessed input sample.
next(dataset.take(1).as_numpy_iterator())

({'type': array([[1., 0., 0.]], dtype=float32),
  'size': array([1]),
  'weight': array([0.70000005], dtype=float32)},
 array([1], dtype=int32))
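To make the displayed sample easier to check by hand, here is a framework-free sketch that applies the same three transformations to a single raw record. It assumes index 0 is reserved for out-of-vocabulary strings, which matches the 'size' value of 1 for 'small' shown above; `preprocess` is a hypothetical helper, not part of the Keras model:

```python
import math

vocab = ['small', 'medium', 'large']
one_hot_dims = 3
weight_mean, weight_variance = 2.0, 1.0

def preprocess(record):
  """Applies the one-hot, lookup and normalization steps to one raw record."""
  type_one_hot = [1.0 if i == record['type'] else 0.0
                  for i in range(one_hot_dims)]
  # Assumption: index 0 is reserved for out-of-vocabulary strings.
  size = record['size']
  size_index = vocab.index(size) + 1 if size in vocab else 0
  weight = (record['weight'] - weight_mean) / math.sqrt(weight_variance)
  return {'type': type_one_hot, 'size': size_index, 'weight': weight}

# First training record: one-hot [1, 0, 0], index 1, weight close to 0.7.
print(preprocess({'type': 0, 'size': 'small', 'weight': 2.7}))
# Unknown size string 'foo' falls into the out-of-vocabulary index 0.
print(preprocess({'type': 0, 'size': 'foo', 'weight': -0.7}))
```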

Next, define a separate Model containing the trainable layers. Note how the inputs to this model now reflect the types and shapes of the preprocessed features.

inputs = {
  'type': tf.keras.Input(shape=(one_hot_dims,), dtype='float32'),
  'size': tf.keras.Input(shape=(), dtype='int64'),
  'weight': tf.keras.Input(shape=(), dtype='float32'),
}
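# Since the embedding is trainable, it needs to be part of the training model.
# StringLookup reserves index 0 for out-of-vocabulary strings by default, so
# the embedding table needs len(vocab) + 1 rows.
embedding = tf.keras.layers.Embedding(len(vocab) + 1, embedding_dims)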
outputs = tf.keras.layers.Concatenate()([
  inputs['type'],
  embedding(inputs['size']),
  tf.expand_dims(inputs['weight'], -1),
])
outputs = tf.keras.layers.Dense(1)(outputs)
training_model = tf.keras.Model(inputs, outputs)

You can now train the training_model with tf.keras.Model.fit.

# Train on the preprocessed data.
training_model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
training_model.fit(dataset)

3/3 [==============================] - 0s 3ms/step - loss: 0.7248
<keras.callbacks.History at 0x7f9041a294d0>

Finally, at inference time, it can be useful to combine these separate stages into a single model that handles raw feature inputs.

inputs = preprocessing_model.input
outputs = training_model(preprocessing_model(inputs))
inference_model = tf.keras.Model(inputs, outputs)

predict_dataset = tf.data.Dataset.from_tensor_slices(predict_features).batch(1)
inference_model.predict(predict_dataset)

array([[0.936637]], dtype=float32)

This composed model can be saved as a SavedModel for later use.

inference_model.save('model')
restored_model = tf.keras.models.load_model('model')
restored_model.predict(predict_dataset)

WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
2021-10-27 01:23:25.649967: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: model/assets
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
array([[0.936637]], dtype=float32)

Note: Preprocessing layers are not trainable, which allows you to apply them asynchronously using tf.data. This has performance benefits, as you can both prefetch preprocessed batches and free up any accelerators to focus on the differentiable parts of the model. As this guide shows, separating preprocessing during training and composing it during inference is a flexible way to take advantage of these performance gains. However, if your model is small or preprocessing time is negligible, it may be simpler to build preprocessing into a complete model from the start. To do this, build a single model starting with tf.keras.Input, followed by preprocessing layers, followed by trainable layers.

Feature column equivalence table

For reference, here is an approximate correspondence between feature columns and preprocessing layers:

Feature column	Keras layer
`feature_column.bucketized_column`	`layers.Discretization`
`feature_column.categorical_column_with_hash_bucket`	`layers.Hashing`
`feature_column.categorical_column_with_identity`	`layers.CategoryEncoding`
`feature_column.categorical_column_with_vocabulary_file`	`layers.StringLookup` or `layers.IntegerLookup`
`feature_column.categorical_column_with_vocabulary_list`	`layers.StringLookup` or `layers.IntegerLookup`
`feature_column.crossed_column`	Not implemented.
`feature_column.embedding_column`	`layers.Embedding`
`feature_column.indicator_column`	`output_mode='one_hot'` or `output_mode='multi_hot'` *
`feature_column.numeric_column`	`layers.Normalization`
`feature_column.sequence_categorical_column_with_hash_bucket`	`layers.Hashing`
`feature_column.sequence_categorical_column_with_identity`	`layers.CategoryEncoding`
`feature_column.sequence_categorical_column_with_vocabulary_file`	`layers.StringLookup`, `layers.IntegerLookup`, or `layers.TextVectorization`
`feature_column.sequence_categorical_column_with_vocabulary_list`	`layers.StringLookup`, `layers.IntegerLookup`, or `layers.TextVectorization`
`feature_column.sequence_numeric_column`	`layers.Normalization`
`feature_column.weighted_categorical_column`	`layers.CategoryEncoding`

* `output_mode` can be passed to `layers.CategoryEncoding`, `layers.StringLookup`, `layers.IntegerLookup`, and `layers.TextVectorization`.

`layers.TextVectorization` can handle freeform text input directly (for example, entire sentences or paragraphs). This is not a one-to-one replacement for categorical sequence handling in TF1, but it may offer a convenient replacement for ad hoc text preprocessing.

Next steps

For more information on Keras preprocessing layers, see the guide to preprocessing layers.
For a more in-depth example of applying preprocessing layers to structured data, see the structured data tutorial.