Migrating feature_columns to TF2's Keras preprocessing layers

Training a model usually involves some feature preprocessing, particularly when dealing with structured data. When training a tf.estimator.Estimator in TF1, this feature preprocessing is usually done with the tf.feature_column API. In TF2, the same preprocessing can be done directly with Keras layers, called preprocessing layers.

In this migration guide, you will perform some common feature transformations using both feature columns and preprocessing layers, and then train a complete model with each API.

First, start with a couple of necessary imports:

import tensorflow as tf
import tensorflow.compat.v1 as tf1
import math

Then add a utility for calling a feature column for demonstration:

def call_feature_columns(feature_columns, inputs):
  # This is a convenient way to call a `feature_column` outside of an estimator
  # to display its output.
  feature_layer = tf1.keras.layers.DenseFeatures(feature_columns)
  return feature_layer(inputs)

Input handling

To use feature columns with an estimator, model inputs are always expected to be a dictionary of tensors:

input_dict = {
  'foo': tf.constant([1]),
  'bar': tf.constant([0]),
  'baz': tf.constant([-1])
}

Each feature column needs to be created with a key to index into the source data. The outputs of all feature columns are concatenated and used by the estimator model.

columns = [
  tf1.feature_column.numeric_column('foo'),
  tf1.feature_column.numeric_column('bar'),
  tf1.feature_column.numeric_column('baz'),
]
call_feature_columns(columns, input_dict)

<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[ 0., -1.,  1.]], dtype=float32)>

In Keras, model input is much more flexible. A tf.keras.Model can handle a single tensor input, a list of tensor features, or a dictionary of tensor features. You can handle dictionary input by passing a dictionary of tf.keras.Input on model creation. Inputs will not be concatenated automatically, which allows them to be used in much more flexible ways. They can be concatenated with tf.keras.layers.Concatenate.

inputs = {
  'foo': tf.keras.Input(shape=()),
  'bar': tf.keras.Input(shape=()),
  'baz': tf.keras.Input(shape=()),
}
# Inputs are typically transformed by preprocessing layers before concatenation.
outputs = tf.keras.layers.Concatenate()(inputs.values())
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model(input_dict)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 1.,  0., -1.], dtype=float32)>
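
The two outputs above differ in ordering: the feature-column result lines up with the column names in sorted order (bar, baz, foo), while the Keras result follows the order of the values handed to Concatenate. The following framework-free sketch mimics only that ordering difference with plain Python lists. The helper names are hypothetical and are not part of TensorFlow.

```python
# Plain-Python illustration of the ordering seen in the two outputs above.
# Helper names are hypothetical; no TensorFlow API is used here.
input_dict = {'foo': [1.0], 'bar': [0.0], 'baz': [-1.0]}


def concat_sorted_by_name(features):
    """Concatenate feature values in sorted key order."""
    return [value for name in sorted(features) for value in features[name]]


def concat_in_given_order(features):
    """Concatenate feature values in dictionary insertion order."""
    return [value for values in features.values() for value in values]


sorted_result = concat_sorted_by_name(input_dict)
given_result = concat_in_given_order(input_dict)
print(sorted_result)  # [0.0, -1.0, 1.0]
print(given_result)   # [1.0, 0.0, -1.0]
```

If downstream layers depend on feature position, it is probably safer to fix the order explicitly than to rely on either default.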

One-hot encoding integer IDs

A common feature transformation is one-hot encoding integer inputs of a known range. Here is an example using feature columns:

categorical_col = tf1.feature_column.categorical_column_with_identity(
    'type', num_buckets=3)
indicator_col = tf1.feature_column.indicator_column(categorical_col)
call_feature_columns(indicator_col, {'type': [0, 1, 2]})

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)>

Using Keras preprocessing layers, these columns can be replaced by a single tf.keras.layers.CategoryEncoding layer with output_mode set to 'one_hot':

one_hot_layer = tf.keras.layers.CategoryEncoding(
    num_tokens=3, output_mode='one_hot')
one_hot_layer([0, 1, 2])

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)>
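
As a rough mental model of the one-hot encoding above, each integer ID selects the position of the single 1 in a vector of length num_tokens. The sketch below is plain Python rather than the Keras implementation. The function name is hypothetical, and raising on out-of-range IDs is a choice made for this sketch.

```python
def one_hot(index, num_tokens):
    """Return a list of length num_tokens with a 1.0 at position index."""
    if not 0 <= index < num_tokens:
        # Design choice of this sketch: reject IDs outside the known range.
        raise ValueError(f'index {index} is outside [0, {num_tokens})')
    return [1.0 if position == index else 0.0 for position in range(num_tokens)]


encoded = [one_hot(i, num_tokens=3) for i in [0, 1, 2]]
for row in encoded:
    print(row)
# [1.0, 0.0, 0.0]
# [0.0, 1.0, 0.0]
# [0.0, 0.0, 1.0]
```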

Normalizing numeric features

When handling continuous, floating-point features with feature columns, you need to use a tf.feature_column.numeric_column. If the input is already normalized, converting this to Keras is trivial: you can simply use a tf.keras.Input directly in your model, as shown above.

A numeric_column can also be used to normalize input:

def normalize(x):
  mean, variance = (2.0, 1.0)
  return (x - mean) / math.sqrt(variance)
numeric_col = tf1.feature_column.numeric_column('col', normalizer_fn=normalize)
call_feature_columns(numeric_col, {'col': tf.constant([[0.], [1.], [2.]])})

<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[-2.],
       [-1.],
       [ 0.]], dtype=float32)>

In contrast, with Keras, this normalization can be done with tf.keras.layers.Normalization.

normalization_layer = tf.keras.layers.Normalization(mean=2.0, variance=1.0)
normalization_layer(tf.constant([[0.], [1.], [2.]]))

<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[-2.],
       [-1.],
       [ 0.]], dtype=float32)>

Bucketizing and one-hot encoding numeric features

Another common transformation of continuous, floating-point inputs is to bucketize them into integers of a fixed range.

With feature columns, this can be achieved with a tf.feature_column.bucketized_column:

numeric_col = tf1.feature_column.numeric_column('col')
bucketized_col = tf1.feature_column.bucketized_column(numeric_col, [1, 4, 5])
call_feature_columns(bucketized_col, {'col': tf.constant([1., 2., 3., 4., 5.])})

<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]], dtype=float32)>

In Keras, this can be replaced by tf.keras.layers.Discretization, followed by tf.keras.layers.CategoryEncoding to one-hot encode the bucket indices:

discretization_layer = tf.keras.layers.Discretization(bin_boundaries=[1, 4, 5])
one_hot_layer = tf.keras.layers.CategoryEncoding(
    num_tokens=4, output_mode='one_hot')
one_hot_layer(discretization_layer([1., 2., 3., 4., 5.]))

<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]], dtype=float32)>
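
The outputs above are consistent with a simple rule: three boundaries split the number line into four buckets, and a value equal to a boundary falls into the upper bucket. The sketch below reproduces that rule with the standard-library bisect module. It is an illustration under that assumption, not the Keras implementation, and the helper name is hypothetical.

```python
import bisect


def bucketize(value, boundaries):
    """Return the bucket index of value for sorted boundaries."""
    # bisect_right puts a value equal to a boundary into the upper bucket,
    # which matches the outputs shown above (4.0 lands in bucket 2).
    return bisect.bisect_right(boundaries, value)


boundaries = [1, 4, 5]
bucket_ids = [bucketize(v, boundaries) for v in [1.0, 2.0, 3.0, 4.0, 5.0]]
print(bucket_ids)  # [1, 1, 1, 2, 3]

# N boundaries yield N + 1 buckets, hence num_tokens=4 in the one-hot step.
num_buckets = len(boundaries) + 1
one_hot_rows = [
    [1.0 if position == bucket else 0.0 for position in range(num_buckets)]
    for bucket in bucket_ids
]
print(one_hot_rows[0])  # [0.0, 1.0, 0.0, 0.0]
```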

One-hot encoding string data with a vocabulary

Handling string features often requires a vocabulary lookup to translate strings into indices. Here is an example using feature columns to look up strings and then one-hot encode the indices:

vocab_col = tf1.feature_column.categorical_column_with_vocabulary_list(
    'sizes',
    vocabulary_list=['small', 'medium', 'large'],
    num_oov_buckets=0)
indicator_col = tf1.feature_column.indicator_column(vocab_col)
call_feature_columns(indicator_col, {'sizes': ['small', 'medium', 'large']})

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)>

Using Keras preprocessing layers, use the tf.keras.layers.StringLookup layer with output_mode set to 'one_hot':

string_lookup_layer = tf.keras.layers.StringLookup(
    vocabulary=['small', 'medium', 'large'],
    num_oov_indices=0,
    output_mode='one_hot')
string_lookup_layer(['small', 'medium', 'large'])

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)>
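
The vocabulary lookup above can be pictured as a mapping from string to position. One detail that matters when migrating is where out-of-vocabulary (OOV) slots sit: the feature column assigns OOV buckets IDs after the vocabulary, whereas StringLookup reserves its OOV indices before the vocabulary entries. The sketch below is plain Python with hypothetical helper names. It supports at most one OOV slot, and it raises KeyError when no OOV slot is configured, which is a simplification rather than a description of either library's behavior.

```python
vocab = ['small', 'medium', 'large']


def lookup_oov_last(token, vocabulary, num_oov_buckets=0):
    """Feature-column style: vocabulary IDs start at 0, OOV ID comes last."""
    if token in vocabulary:
        return vocabulary.index(token)
    if num_oov_buckets == 1:
        return len(vocabulary)
    raise KeyError(token)  # simplification made by this sketch


def lookup_oov_first(token, vocabulary, num_oov_indices=1):
    """StringLookup style: OOV index comes first, vocabulary IDs are shifted."""
    if token in vocabulary:
        return vocabulary.index(token) + num_oov_indices
    if num_oov_indices == 1:
        return 0
    raise KeyError(token)  # simplification made by this sketch


print(lookup_oov_last('small', vocab, num_oov_buckets=1))   # 0
print(lookup_oov_first('small', vocab, num_oov_indices=1))  # 1
print(lookup_oov_last('foo', vocab, num_oov_buckets=1))     # 3
print(lookup_oov_first('foo', vocab, num_oov_indices=1))    # 0
```

With no OOV slots on either side, as in the one-hot example above, the two conventions produce the same indices.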

Embedding string data with a vocabulary

For larger vocabularies, an embedding is often needed for good performance. Here is an example of embedding a string feature using feature columns:

vocab_col = tf1.feature_column.categorical_column_with_vocabulary_list(
    'col',
    vocabulary_list=['small', 'medium', 'large'],
    num_oov_buckets=0)
embedding_col = tf1.feature_column.embedding_column(vocab_col, 4)
call_feature_columns(embedding_col, {'col': ['small', 'medium', 'large']})

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[-0.01798586, -0.2808677 ,  0.27639154,  0.06081508],
       [ 0.05771849,  0.02464074,  0.20080602,  0.50164527],
       [-0.9208247 , -0.40816694, -0.49132794,  0.9203153 ]],
      dtype=float32)>

Using Keras preprocessing layers, this can be achieved by combining a tf.keras.layers.StringLookup layer and a tf.keras.layers.Embedding layer. The default output of StringLookup is integer indices, which can be fed directly into an embedding.

string_lookup_layer = tf.keras.layers.StringLookup(
    vocabulary=['small', 'medium', 'large'], num_oov_indices=0)
embedding = tf.keras.layers.Embedding(3, 4)
embedding(string_lookup_layer(['small', 'medium', 'large']))

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[ 0.04838837, -0.04014301,  0.02001903, -0.01150769],
       [-0.04580117, -0.04319514,  0.03725603, -0.00572466],
       [-0.0401094 ,  0.00997342,  0.00111955,  0.00132702]],
      dtype=float32)>

Summing weighted categorical data

In some cases, you need to deal with categorical data where each occurrence of a category comes with an associated weight. With feature columns, this is handled by tf.feature_column.weighted_categorical_column. When paired with an indicator_column, this has the effect of summing weights per category.

ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])

categorical_col = tf1.feature_column.categorical_column_with_identity(
    'ids', num_buckets=20)
weighted_categorical_col = tf1.feature_column.weighted_categorical_column(
    categorical_col, 'weights')
indicator_col = tf1.feature_column.indicator_column(weighted_categorical_col)
call_feature_columns(indicator_col, {'ids': ids, 'weights': weights})

WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/feature_column/feature_column_v2.py:4203: sparse_merge (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
No similar op available at this time.
<tf.Tensor: shape=(1, 20), dtype=float32, numpy=
array([[0. , 0. , 0. , 0. , 0. , 1.2, 0. , 0. , 0. , 0. , 0. , 1.5, 0. ,
        0. , 0. , 0. , 0. , 2. , 0. , 0. ]], dtype=float32)>

In Keras, this can be done by passing a count_weights input to tf.keras.layers.CategoryEncoding with output_mode='count'.

ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])

# Using sparse output is more efficient when `num_tokens` is large.
count_layer = tf.keras.layers.CategoryEncoding(
    num_tokens=20, output_mode='count', sparse=True)
tf.sparse.to_dense(count_layer(ids, count_weights=weights))

<tf.Tensor: shape=(1, 20), dtype=float32, numpy=
array([[0. , 0. , 0. , 0. , 0. , 1.2, 0. , 0. , 0. , 0. , 0. , 1.5, 0. ,
        0. , 0. , 0. , 0. , 2. , 0. , 0. ]], dtype=float32)>
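
The arithmetic behind both outputs above is a per-category sum of weights: ID 5 appears with weights 0.5 and 0.7, ID 11 with 1.5, and ID 17 with 1.8 and 0.2. A minimal plain-Python sketch of that sum follows. The function name is hypothetical and the dense list output is chosen for readability only.

```python
ids = [5, 11, 5, 17, 17]
weights = [0.5, 1.5, 0.7, 1.8, 0.2]


def weighted_count(ids, weights, num_tokens):
    """Sum the weight of every occurrence of each category ID."""
    totals = [0.0] * num_tokens
    for token_id, weight in zip(ids, weights):
        totals[token_id] += weight
    return totals


totals = weighted_count(ids, weights, num_tokens=20)
# Show only the non-zero categories, rounded to hide float noise.
print({i: round(t, 6) for i, t in enumerate(totals) if t})
# {5: 1.2, 11: 1.5, 17: 2.0}
```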

Embedding weighted categorical data

Alternatively, you might want to embed weighted categorical inputs. With feature columns, the embedding_column takes a combiner argument. If any sample contains multiple entries for a category, they will be combined according to the argument setting (by default 'mean').

ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])

categorical_col = tf1.feature_column.categorical_column_with_identity(
    'ids', num_buckets=20)
weighted_categorical_col = tf1.feature_column.weighted_categorical_column(
    categorical_col, 'weights')
embedding_col = tf1.feature_column.embedding_column(
    weighted_categorical_col, 4, combiner='mean')
call_feature_columns(embedding_col, {'ids': ids, 'weights': weights})

<tf.Tensor: shape=(1, 4), dtype=float32, numpy=
array([[ 0.02666993,  0.289671  ,  0.18065728, -0.21045178]],
      dtype=float32)>

In Keras, tf.keras.layers.Embedding has no combiner option, but you can achieve the same effect with tf.keras.layers.Dense. The embedding_column above simply combines embedding vectors linearly according to category weight. Though not obvious at first, this is exactly equivalent to representing your categorical inputs as a sparse weight vector of size (num_tokens) and multiplying it by a Dense kernel of shape (num_tokens, embedding_size).

ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])

# For `combiner='mean'`, normalize your weights to sum to 1. Removing this line
# would be equivalent to an `embedding_column` with `combiner='sum'`.
weights = weights / tf.reduce_sum(weights, axis=-1, keepdims=True)

count_layer = tf.keras.layers.CategoryEncoding(
    num_tokens=20, output_mode='count', sparse=True)
embedding_layer = tf.keras.layers.Dense(4, use_bias=False)
embedding_layer(count_layer(ids, count_weights=weights))

<tf.Tensor: shape=(1, 4), dtype=float32, numpy=
array([[-0.03897291, -0.27131438,  0.09332469,  0.04333957]],
      dtype=float32)>
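
Because the two runs above use different random initial weights, their printed values cannot be compared directly. The equivalence can still be checked by hand with a fixed table: the weighted mean of the selected embedding rows equals the normalized weight vector multiplied by the same table. The sketch below does this in plain Python. The table values are made up for illustration and stand in for trained weights, and the helper names are hypothetical.

```python
import math

ids = [5, 11, 5, 17, 17]
weights = [0.5, 1.5, 0.7, 1.8, 0.2]
num_tokens, embedding_size = 20, 4

# Hypothetical, deterministic table with one row per token.
table = [[0.1 * token + 0.01 * dim for dim in range(embedding_size)]
         for token in range(num_tokens)]


def weighted_mean_of_rows(ids, weights, table):
    """Look up each row, scale it by its weight, and divide by total weight."""
    total = sum(weights)
    combined = [0.0] * len(table[0])
    for token_id, weight in zip(ids, weights):
        for dim, value in enumerate(table[token_id]):
            combined[dim] += weight * value
    return [value / total for value in combined]


def dense_on_weight_vector(ids, weights, table):
    """Build a normalized weight vector and multiply it by the table."""
    total = sum(weights)
    weight_vector = [0.0] * len(table)
    for token_id, weight in zip(ids, weights):
        weight_vector[token_id] += weight / total
    # Vector-matrix product: (num_tokens,) x (num_tokens, embedding_size).
    return [sum(weight_vector[t] * table[t][dim] for t in range(len(table)))
            for dim in range(len(table[0]))]


lookup_result = weighted_mean_of_rows(ids, weights, table)
dense_result = dense_on_weight_vector(ids, weights, table)
results_match = all(
    math.isclose(a, b) for a, b in zip(lookup_result, dense_result))
print(results_match)  # True
```

Skipping the division by the total weight in both helpers would give the 'sum' variant instead of the 'mean' variant.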

Complete training example

To show a complete training workflow, first prepare some data with three features of different types:

features = {
    'type': [0, 1, 1],
    'size': ['small', 'small', 'medium'],
    'weight': [2.7, 1.8, 1.6],
}
labels = [1, 1, 0]
predict_features = {'type': [0], 'size': ['foo'], 'weight': [-0.7]}

Define some common constants for both the TF1 and TF2 workflows:

vocab = ['small', 'medium', 'large']
one_hot_dims = 3
embedding_dims = 4
weight_mean = 2.0
weight_variance = 1.0

With feature columns

Feature columns must be passed as a list to the estimator on creation, and will be called implicitly during training.

categorical_col = tf1.feature_column.categorical_column_with_identity(
    'type', num_buckets=one_hot_dims)
# Convert index to one-hot; e.g. [2] -> [0,0,1].
indicator_col = tf1.feature_column.indicator_column(categorical_col)

# Convert strings to indices; e.g. ['small'] -> [0] (OOV buckets come last).
vocab_col = tf1.feature_column.categorical_column_with_vocabulary_list(
    'size', vocabulary_list=vocab, num_oov_buckets=1)
# Embed the indices.
embedding_col = tf1.feature_column.embedding_column(vocab_col, embedding_dims)

normalizer_fn = lambda x: (x - weight_mean) / math.sqrt(weight_variance)
# Normalize the numeric inputs; e.g. [2.0] -> [0.0].
numeric_col = tf1.feature_column.numeric_column(
    'weight', normalizer_fn=normalizer_fn)

estimator = tf1.estimator.DNNClassifier(
    feature_columns=[indicator_col, embedding_col, numeric_col],
    hidden_units=[1])

def _input_fn():
  return tf1.data.Dataset.from_tensor_slices((features, labels)).batch(1)

estimator.train(_input_fn)

INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp8lwbuor2
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp8lwbuor2', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/adagrad.py:77: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp8lwbuor2/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.54634213, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3...
INFO:tensorflow:Saving checkpoints for 3 into /tmp/tmp8lwbuor2/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3...
INFO:tensorflow:Loss for final step: 0.7308526.
<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x7f90685d53d0>

The feature columns will also be used to transform input data when running inference on the model.

def _predict_fn():
  return tf1.data.Dataset.from_tensor_slices(predict_features).batch(1)

next(estimator.predict(_predict_fn))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp8lwbuor2/model.ckpt-3
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
{'logits': array([0.5172372], dtype=float32),
 'logistic': array([0.6265015], dtype=float32),
 'probabilities': array([0.37349847, 0.6265015 ], dtype=float32),
 'class_ids': array([1]),
 'classes': array([b'1'], dtype=object),
 'all_class_ids': array([0, 1], dtype=int32),
 'all_classes': array([b'0', b'1'], dtype=object)}

With Keras preprocessing layers

Keras preprocessing layers are more flexible in where they can be called. A layer can be applied directly to tensors, used inside a tf.data input pipeline, or built directly into a trainable Keras model.

In this example, you will apply preprocessing layers inside a tf.data input pipeline. To do this, you can define a separate tf.keras.Model to preprocess your input features. This model is not trainable, but it is a convenient way to group preprocessing layers.

inputs = {
  'type': tf.keras.Input(shape=(), dtype='int64'),
  'size': tf.keras.Input(shape=(), dtype='string'),
  'weight': tf.keras.Input(shape=(), dtype='float32'),
}
# Convert index to one-hot; e.g. [2] -> [0,0,1].
type_output = tf.keras.layers.CategoryEncoding(
      one_hot_dims, output_mode='one_hot')(inputs['type'])
# Convert size strings to indices; e.g. ['small'] -> [1].
size_output = tf.keras.layers.StringLookup(vocabulary=vocab)(inputs['size'])
# Normalize the numeric inputs; e.g. [2.0] -> [0.0].
weight_output = tf.keras.layers.Normalization(
      axis=None, mean=weight_mean, variance=weight_variance)(inputs['weight'])
outputs = {
  'type': type_output,
  'size': size_output,
  'weight': weight_output,
}
preprocessing_model = tf.keras.Model(inputs, outputs)

You can now apply this model inside a call to tf.data.Dataset.map. Note that the function passed to map will automatically be converted into a tf.function, so the usual caveats for writing tf.function code apply (no side effects).

# Apply the preprocessing in tf.data.Dataset.map.
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(1)
dataset = dataset.map(lambda x, y: (preprocessing_model(x), y),
                      num_parallel_calls=tf.data.AUTOTUNE)
# Display a preprocessed input sample.
next(dataset.take(1).as_numpy_iterator())

({'type': array([[1., 0., 0.]], dtype=float32),
  'size': array([1]),
  'weight': array([0.70000005], dtype=float32)},
 array([1], dtype=int32))
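
To see where the values in the sample above come from, the three transformations can be worked through by hand for the first training example (type 0, size 'small', weight 2.7). The sketch below is plain Python that mirrors the arithmetic only. It assumes a single out-of-vocabulary slot at index 0, which is consistent with 'small' mapping to 1 in the output above, and the function name is hypothetical.

```python
import math

vocab = ['small', 'medium', 'large']
one_hot_dims = 3
weight_mean = 2.0
weight_variance = 1.0


def preprocess(example):
    """Apply the three per-feature transformations to one raw example."""
    type_one_hot = [1.0 if position == example['type'] else 0.0
                    for position in range(one_hot_dims)]
    # Assumption: index 0 is reserved for out-of-vocabulary strings,
    # so vocabulary entries start at index 1.
    if example['size'] in vocab:
        size_index = vocab.index(example['size']) + 1
    else:
        size_index = 0
    weight = (example['weight'] - weight_mean) / math.sqrt(weight_variance)
    return {'type': type_one_hot, 'size': size_index, 'weight': weight}


first = preprocess({'type': 0, 'size': 'small', 'weight': 2.7})
print(first['type'], first['size'])  # [1.0, 0.0, 0.0] 1

# The prediction example uses a string that is not in the vocabulary.
unseen = preprocess({'type': 0, 'size': 'foo', 'weight': -0.7})
print(unseen['size'])  # 0
```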

Next, you can define a separate Model containing the trainable layers. Note how the inputs to this model now reflect the preprocessed feature types and shapes.

inputs = {
  'type': tf.keras.Input(shape=(one_hot_dims,), dtype='float32'),
  'size': tf.keras.Input(shape=(), dtype='int64'),
  'weight': tf.keras.Input(shape=(), dtype='float32'),
}
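# Since the embedding is trainable, it needs to be part of the training model.
# StringLookup reserves index 0 for out-of-vocabulary strings by default, so the
# embedding needs len(vocab) + 1 rows to cover 'large' (index 3).
embedding = tf.keras.layers.Embedding(len(vocab) + 1, embedding_dims)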
outputs = tf.keras.layers.Concatenate()([
  inputs['type'],
  embedding(inputs['size']),
  tf.expand_dims(inputs['weight'], -1),
])
outputs = tf.keras.layers.Dense(1)(outputs)
training_model = tf.keras.Model(inputs, outputs)

You can now train the training_model with tf.keras.Model.fit.

# Train on the preprocessed data.
training_model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
training_model.fit(dataset)

3/3 [==============================] - 0s 3ms/step - loss: 0.7248
<keras.callbacks.History at 0x7f9041a294d0>

Finally, at inference time, it can be useful to combine these separate stages into a single model that handles raw feature inputs.

inputs = preprocessing_model.input
outputs = training_model(preprocessing_model(inputs))
inference_model = tf.keras.Model(inputs, outputs)

predict_dataset = tf.data.Dataset.from_tensor_slices(predict_features).batch(1)
inference_model.predict(predict_dataset)

array([[0.936637]], dtype=float32)

This composed model can be saved as a SavedModel for later use.

inference_model.save('model')
restored_model = tf.keras.models.load_model('model')
restored_model.predict(predict_dataset)

WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
2021-10-27 01:23:25.649967: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: model/assets
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
array([[0.936637]], dtype=float32)

Note: Preprocessing layers are not trainable, which allows you to apply them asynchronously using tf.data. This has performance benefits, because you can both prefetch preprocessed batches and free up any accelerators to focus on the differentiable parts of the model. As this guide shows, separating preprocessing during training and composing it back in during inference is a flexible way to take advantage of these performance gains. However, if your model is small or preprocessing time is negligible, it may be simpler to build preprocessing into a complete model from the start. To do this, you can build a single model starting with tf.keras.Input, followed by preprocessing layers, followed by trainable layers.

Feature column equivalence table

For reference, here is an approximate correspondence between feature columns and preprocessing layers:

Feature column	Keras layer
`feature_column.bucketized_column`	`layers.Discretization`
`feature_column.categorical_column_with_hash_bucket`	`layers.Hashing`
`feature_column.categorical_column_with_identity`	`layers.CategoryEncoding`
`feature_column.categorical_column_with_vocabulary_file`	`layers.StringLookup` or `layers.IntegerLookup`
`feature_column.categorical_column_with_vocabulary_list`	`layers.StringLookup` or `layers.IntegerLookup`
`feature_column.crossed_column`	Not implemented.
`feature_column.embedding_column`	`layers.Embedding`
`feature_column.indicator_column`	`output_mode='one_hot'` or `output_mode='multi_hot'` *
`feature_column.numeric_column`	`layers.Normalization`
`feature_column.sequence_categorical_column_with_hash_bucket`	`layers.Hashing`
`feature_column.sequence_categorical_column_with_identity`	`layers.CategoryEncoding`
`feature_column.sequence_categorical_column_with_vocabulary_file`	`layers.StringLookup`, `layers.IntegerLookup`, or `layers.TextVectorization` †
`feature_column.sequence_categorical_column_with_vocabulary_list`	`layers.StringLookup`, `layers.IntegerLookup`, or `layers.TextVectorization` †
`feature_column.sequence_numeric_column`	`layers.Normalization`
`feature_column.weighted_categorical_column`	`layers.CategoryEncoding`

* The output_mode argument can be passed to layers.CategoryEncoding, layers.StringLookup, layers.IntegerLookup, and layers.TextVectorization.

† layers.TextVectorization can handle freeform text input directly (e.g. entire sentences or paragraphs). This is not a one-to-one replacement for categorical sequence handling in TF1, but it may offer a convenient replacement for ad-hoc text preprocessing.

Next steps

For more information on Keras preprocessing layers, see the guide to preprocessing layers.
For a more in-depth example of applying preprocessing layers to structured data, see the structured data tutorial.