Migration examples: Canned Estimators

Canned (or premade) Estimators have traditionally been used in TensorFlow 1 as quick and simple ways to train models for a variety of typical use cases. TensorFlow 2 provides straightforward approximate substitutes for a number of them in the form of Keras models. For those canned Estimators that do not have built-in TensorFlow 2 substitutes, you can still build your own replacement fairly easily.

This guide walks through a few examples of direct equivalents and custom substitutions to demonstrate how TensorFlow 1's tf.estimator-derived models can be migrated to TensorFlow 2 with Keras.

Namely, this guide includes examples for migrating:

  • From TensorFlow 1's tf.estimator LinearEstimator, LinearClassifier, or LinearRegressor to the Keras tf.compat.v1.keras.models.LinearModel in TensorFlow 2
  • From TensorFlow 1's tf.estimator DNNEstimator, DNNClassifier, or DNNRegressor to a custom Keras DNN model in TensorFlow 2
  • From TensorFlow 1's tf.estimator DNNLinearCombinedEstimator, DNNLinearCombinedClassifier, or DNNLinearCombinedRegressor to tf.compat.v1.keras.models.WideDeepModel in TensorFlow 2
  • From TensorFlow 1's tf.estimator BoostedTreesEstimator, BoostedTreesClassifier, or BoostedTreesRegressor to tfdf.keras.GradientBoostedTreesModel in TensorFlow 2

A common precursor to training a model is feature preprocessing, which is done for TensorFlow 1 Estimator models with tf.feature_column. For more information on feature preprocessing in TensorFlow 2, check out this guide on migrating from feature columns to the Keras preprocessing layers API.
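
As a minimal sketch, assuming you only need to normalize a single numeric column such as age from the Titanic data used later in this guide, a Keras preprocessing layer could stand in for tf1.feature_column.numeric_column; tf.keras.layers.Normalization and its adapt method are the standard Keras APIs used here, and the choice of column is purely illustrative.

import pandas as pd
import tensorflow as tf

# Sketch: a Keras preprocessing layer standing in for
# tf1.feature_column.numeric_column for one numeric feature.
titanic = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')

age_normalizer = tf.keras.layers.Normalization(axis=None)
age_normalizer.adapt(titanic['age'].values)  # learn mean and variance from the data

# The adapted layer can be called directly or placed inside a Keras model.
print(age_normalizer(titanic['age'].values[:5]))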

Setup

Start with a couple of necessary TensorFlow imports:

pip install tensorflow_decision_forests
import keras
import pandas as pd
import tensorflow as tf
import tensorflow.compat.v1 as tf1
import tensorflow_decision_forests as tfdf
2022-12-14 20:53:33.238375: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-12-14 20:53:33.238495: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-12-14 20:53:33.238506: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Prepare some simple data for demonstration from the standard Titanic dataset:

x_train = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
x_eval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')
x_train['sex'].replace(('male', 'female'), (0, 1), inplace=True)
x_eval['sex'].replace(('male', 'female'), (0, 1), inplace=True)

x_train['alone'].replace(('n', 'y'), (0, 1), inplace=True)
x_eval['alone'].replace(('n', 'y'), (0, 1), inplace=True)

x_train['class'].replace(('First', 'Second', 'Third'), (1, 2, 3), inplace=True)
x_eval['class'].replace(('First', 'Second', 'Third'), (1, 2, 3), inplace=True)

x_train.drop(['embark_town', 'deck'], axis=1, inplace=True)
x_eval.drop(['embark_town', 'deck'], axis=1, inplace=True)

y_train = x_train.pop('survived')
y_eval = x_eval.pop('survived')
# Data setup for TensorFlow 1 with `tf.estimator`
def _input_fn():
  return tf1.data.Dataset.from_tensor_slices((dict(x_train), y_train)).batch(32)


def _eval_input_fn():
  return tf1.data.Dataset.from_tensor_slices((dict(x_eval), y_eval)).batch(32)


FEATURE_NAMES = [
    'age', 'fare', 'sex', 'n_siblings_spouses', 'parch', 'class', 'alone'
]

feature_columns = []
for fn in FEATURE_NAMES:
  feat_col = tf1.feature_column.numeric_column(fn, dtype=tf.float32)
  feature_columns.append(feat_col)

Then, create a method to instantiate a simplistic sample optimizer to use with the various TensorFlow 1 Estimator and TensorFlow 2 Keras models.

def create_sample_optimizer(tf_version):
  if tf_version == 'tf1':
    optimizer = lambda: tf.keras.optimizers.legacy.Ftrl(
        l1_regularization_strength=0.001,
        learning_rate=tf1.train.exponential_decay(
            learning_rate=0.1,
            global_step=tf1.train.get_global_step(),
            decay_steps=10000,
            decay_rate=0.9))
  elif tf_version == 'tf2':
    optimizer = tf.keras.optimizers.legacy.Ftrl(
        l1_regularization_strength=0.001,
        learning_rate=tf.keras.optimizers.schedules.ExponentialDecay(
            initial_learning_rate=0.1, decay_steps=10000, decay_rate=0.9))
  return optimizer

Example 1: Migrating from LinearEstimator

TensorFlow 1: Using LinearEstimator

In TensorFlow 1, you could use tf.estimator.LinearEstimator to create a baseline linear model for regression and classification problems.

linear_estimator = tf.estimator.LinearEstimator(
    head=tf.estimator.BinaryClassHead(),
    feature_columns=feature_columns,
    optimizer=create_sample_optimizer('tf1'))
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmpfs/tmp/tmpqld_ko1v
INFO:tensorflow:Using config: {'_model_dir': '/tmpfs/tmp/tmpqld_ko1v', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
linear_estimator.train(input_fn=_input_fn, steps=100)
linear_estimator.evaluate(input_fn=_eval_input_fn, steps=10)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/training_util.py:396: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/ftrl.py:173: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmpfs/tmp/tmpqld_ko1v/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.6931472, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 20...
INFO:tensorflow:Saving checkpoints for 20 into /tmpfs/tmp/tmpqld_ko1v/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 20...
INFO:tensorflow:Loss for final step: 0.552688.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2022-12-14T20:53:40
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tmpqld_ko1v/model.ckpt-20
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/10]
INFO:tensorflow:Evaluation [2/10]
INFO:tensorflow:Evaluation [3/10]
INFO:tensorflow:Evaluation [4/10]
INFO:tensorflow:Evaluation [5/10]
INFO:tensorflow:Evaluation [6/10]
INFO:tensorflow:Evaluation [7/10]
INFO:tensorflow:Evaluation [8/10]
INFO:tensorflow:Evaluation [9/10]
INFO:tensorflow:Inference Time : 0.51476s
INFO:tensorflow:Finished evaluation at 2022-12-14-20:53:41
INFO:tensorflow:Saving dict for global step 20: accuracy = 0.70075756, accuracy_baseline = 0.625, auc = 0.75472915, auc_precision_recall = 0.65362054, average_loss = 0.5759378, global_step = 20, label/mean = 0.375, loss = 0.5704811, precision = 0.6388889, prediction/mean = 0.41331065, recall = 0.46464646
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 20: /tmpfs/tmp/tmpqld_ko1v/model.ckpt-20
{'accuracy': 0.70075756,
 'accuracy_baseline': 0.625,
 'auc': 0.75472915,
 'auc_precision_recall': 0.65362054,
 'average_loss': 0.5759378,
 'label/mean': 0.375,
 'loss': 0.5704811,
 'precision': 0.6388889,
 'prediction/mean': 0.41331065,
 'recall': 0.46464646,
 'global_step': 20}

TensorFlow 2: Using Keras LinearModel

In TensorFlow 2, you can create an instance of the Keras tf.compat.v1.keras.models.LinearModel, which is the substitute for tf.estimator.LinearEstimator. The tf.compat.v1.keras path is used to signify that the premade model exists for compatibility.

linear_model = tf.compat.v1.keras.experimental.LinearModel()
linear_model.compile(loss='mse', optimizer=create_sample_optimizer('tf2'), metrics=['accuracy'])
linear_model.fit(x_train, y_train, epochs=10)
linear_model.evaluate(x_eval, y_eval, return_dict=True)
Epoch 1/10
20/20 [==============================] - 0s 2ms/step - loss: 7.8412 - accuracy: 0.6316
Epoch 2/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1941 - accuracy: 0.7065
Epoch 3/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1793 - accuracy: 0.7448
Epoch 4/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1689 - accuracy: 0.7831
Epoch 5/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1627 - accuracy: 0.7959
Epoch 6/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1587 - accuracy: 0.8022
Epoch 7/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1581 - accuracy: 0.8102
Epoch 8/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1580 - accuracy: 0.8038
Epoch 9/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1560 - accuracy: 0.8038
Epoch 10/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1573 - accuracy: 0.8086
9/9 [==============================] - 0s 2ms/step - loss: 0.1914 - accuracy: 0.7424
{'loss': 0.19140790402889252, 'accuracy': 0.7424242496490479}

Example 2: Migrating from DNNEstimator

TensorFlow 1: Using DNNEstimator

In TensorFlow 1, you could use tf.estimator.DNNEstimator to create a baseline deep neural network (DNN) model for regression and classification problems.

dnn_estimator = tf.estimator.DNNEstimator(
    head=tf.estimator.BinaryClassHead(),
    feature_columns=feature_columns,
    hidden_units=[128],
    activation_fn=tf.nn.relu,
    optimizer=create_sample_optimizer('tf1'))
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmpfs/tmp/tmpsv1l4m2i
INFO:tensorflow:Using config: {'_model_dir': '/tmpfs/tmp/tmpsv1l4m2i', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
dnn_estimator.train(input_fn=_input_fn, steps=100)
dnn_estimator.evaluate(input_fn=_eval_input_fn, steps=10)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2022-12-14 20:53:43.081312: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT64
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT32
    }
  }
}

    while inferring type of node 'dnn/zero_fraction/cond/output/_18'
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmpfs/tmp/tmpsv1l4m2i/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 2.3800273, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 20...
INFO:tensorflow:Saving checkpoints for 20 into /tmpfs/tmp/tmpsv1l4m2i/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 20...
INFO:tensorflow:Loss for final step: 0.58426255.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2022-12-14T20:53:44
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tmpsv1l4m2i/model.ckpt-20
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/10]
INFO:tensorflow:Evaluation [2/10]
INFO:tensorflow:Evaluation [3/10]
INFO:tensorflow:Evaluation [4/10]
INFO:tensorflow:Evaluation [5/10]
INFO:tensorflow:Evaluation [6/10]
INFO:tensorflow:Evaluation [7/10]
INFO:tensorflow:Evaluation [8/10]
INFO:tensorflow:Evaluation [9/10]
INFO:tensorflow:Inference Time : 0.46536s
INFO:tensorflow:Finished evaluation at 2022-12-14-20:53:45
INFO:tensorflow:Saving dict for global step 20: accuracy = 0.70075756, accuracy_baseline = 0.625, auc = 0.70645857, auc_precision_recall = 0.6163224, average_loss = 0.5989076, global_step = 20, label/mean = 0.375, loss = 0.5935677, precision = 0.6388889, prediction/mean = 0.40254602, recall = 0.46464646
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 20: /tmpfs/tmp/tmpsv1l4m2i/model.ckpt-20
{'accuracy': 0.70075756,
 'accuracy_baseline': 0.625,
 'auc': 0.70645857,
 'auc_precision_recall': 0.6163224,
 'average_loss': 0.5989076,
 'label/mean': 0.375,
 'loss': 0.5935677,
 'precision': 0.6388889,
 'prediction/mean': 0.40254602,
 'recall': 0.46464646,
 'global_step': 20}

TensorFlow 2: Using Keras to create a custom DNN model

In TensorFlow 2, you can create a custom DNN model to substitute for the one generated by tf.estimator.DNNEstimator, with similar levels of user-specified customization (for instance, as in the previous example, the ability to customize a chosen model optimizer).

A similar workflow can be used to replace tf.estimator.experimental.RNNEstimator with a Keras recurrent neural network (RNN) model, as sketched below. Keras provides a number of built-in, customizable choices by way of tf.keras.layers.RNN, tf.keras.layers.LSTM, and tf.keras.layers.GRU. To learn more, check out the Built-in RNN layers: a simple example section of the RNN with Keras guide.
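
As a minimal sketch, assuming hypothetical sequence-shaped inputs of shape (timesteps=20, features=8) rather than the tabular Titanic data used here, such a replacement could pair a tf.keras.layers.LSTM layer with a dense output head:

# Sketch: a Keras RNN model standing in for tf.estimator.experimental.RNNEstimator.
# The (20, 8) input shape is a hypothetical (timesteps, features) example and is
# not derived from the Titanic data used elsewhere in this guide.
rnn_model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(20, 8)),
    tf.keras.layers.LSTM(32),  # tf.keras.layers.GRU could be swapped in here
    tf.keras.layers.Dense(1, activation='sigmoid')
])
rnn_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Returning to the DNN migration, define, train, and evaluate the custom Keras model: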

dnn_model = tf.keras.models.Sequential(
    [tf.keras.layers.Dense(128, activation='relu'),
     tf.keras.layers.Dense(1)])

dnn_model.compile(loss='mse', optimizer=create_sample_optimizer('tf2'), metrics=['accuracy'])
dnn_model.fit(x_train, y_train, epochs=10)
dnn_model.evaluate(x_eval, y_eval, return_dict=True)
Epoch 1/10
20/20 [==============================] - 0s 2ms/step - loss: 775.9051 - accuracy: 0.6236
Epoch 2/10
20/20 [==============================] - 0s 2ms/step - loss: 0.2534 - accuracy: 0.6762
Epoch 3/10
20/20 [==============================] - 0s 2ms/step - loss: 0.2125 - accuracy: 0.6970
Epoch 4/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1981 - accuracy: 0.7161
Epoch 5/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1814 - accuracy: 0.7432
Epoch 6/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1782 - accuracy: 0.7480
Epoch 7/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1738 - accuracy: 0.7560
Epoch 8/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1693 - accuracy: 0.7592
Epoch 9/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1689 - accuracy: 0.7608
Epoch 10/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1625 - accuracy: 0.7735
9/9 [==============================] - 0s 2ms/step - loss: 0.1939 - accuracy: 0.7121
{'loss': 0.19385609030723572, 'accuracy': 0.7121211886405945}

Example 3: Migrating from DNNLinearCombinedEstimator

TensorFlow 1: Using DNNLinearCombinedEstimator

In TensorFlow 1, you could use tf.estimator.DNNLinearCombinedEstimator to create a baseline combined model for regression and classification problems, with customization capability for both its linear and DNN components.

optimizer = create_sample_optimizer('tf1')

combined_estimator = tf.estimator.DNNLinearCombinedEstimator(
    head=tf.estimator.BinaryClassHead(),
    # Wide settings
    linear_feature_columns=feature_columns,
    linear_optimizer=optimizer,
    # Deep settings
    dnn_feature_columns=feature_columns,
    dnn_hidden_units=[128],
    dnn_optimizer=optimizer)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmpfs/tmp/tmp2ent7mpp
INFO:tensorflow:Using config: {'_model_dir': '/tmpfs/tmp/tmp2ent7mpp', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
combined_estimator.train(input_fn=_input_fn, steps=100)
combined_estimator.evaluate(input_fn=_eval_input_fn, steps=10)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmpfs/tmp/tmp2ent7mpp/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 4.2519217, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 20...
INFO:tensorflow:Saving checkpoints for 20 into /tmpfs/tmp/tmp2ent7mpp/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 20...
INFO:tensorflow:Loss for final step: 0.55690926.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2022-12-14T20:53:49
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tmp2ent7mpp/model.ckpt-20
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/10]
INFO:tensorflow:Evaluation [2/10]
INFO:tensorflow:Evaluation [3/10]
INFO:tensorflow:Evaluation [4/10]
INFO:tensorflow:Evaluation [5/10]
INFO:tensorflow:Evaluation [6/10]
INFO:tensorflow:Evaluation [7/10]
INFO:tensorflow:Evaluation [8/10]
INFO:tensorflow:Evaluation [9/10]
INFO:tensorflow:Inference Time : 0.54224s
INFO:tensorflow:Finished evaluation at 2022-12-14-20:53:49
INFO:tensorflow:Saving dict for global step 20: accuracy = 0.7083333, accuracy_baseline = 0.625, auc = 0.76424855, auc_precision_recall = 0.64961433, average_loss = 0.59428364, global_step = 20, label/mean = 0.375, loss = 0.5884412, precision = 0.6617647, prediction/mean = 0.41718265, recall = 0.45454547
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 20: /tmpfs/tmp/tmp2ent7mpp/model.ckpt-20
{'accuracy': 0.7083333,
 'accuracy_baseline': 0.625,
 'auc': 0.76424855,
 'auc_precision_recall': 0.64961433,
 'average_loss': 0.59428364,
 'label/mean': 0.375,
 'loss': 0.5884412,
 'precision': 0.6617647,
 'prediction/mean': 0.41718265,
 'recall': 0.45454547,
 'global_step': 20}

TensorFlow 2: Using Keras WideDeepModel

In TensorFlow 2, you can create an instance of the Keras tf.compat.v1.keras.models.WideDeepModel to substitute for the one generated by tf.estimator.DNNLinearCombinedEstimator, with similar levels of user-specified customization (for instance, as in the previous example, the ability to customize a chosen model optimizer).

This WideDeepModel is constructed on the basis of a constituent LinearModel and a custom DNN model, both of which were discussed in the previous two examples. A custom linear model can also be used in place of the built-in Keras LinearModel if desired.

If you would like to build your own model instead of using a canned Estimator, check out the Keras Sequential model guide. For more information about custom training and optimizers, check out the Custom training: walkthrough guide.

# Create LinearModel and DNN Model as in Examples 1 and 2
optimizer = create_sample_optimizer('tf2')

linear_model = tf.compat.v1.keras.experimental.LinearModel()
linear_model.compile(loss='mse', optimizer=optimizer, metrics=['accuracy'])
linear_model.fit(x_train, y_train, epochs=10, verbose=0)

dnn_model = tf.keras.models.Sequential(
    [tf.keras.layers.Dense(128, activation='relu'),
     tf.keras.layers.Dense(1)])
dnn_model.compile(loss='mse', optimizer=optimizer, metrics=['accuracy'])
combined_model = tf.compat.v1.keras.experimental.WideDeepModel(linear_model,
                                                               dnn_model)
combined_model.compile(
    optimizer=[optimizer, optimizer], loss='mse', metrics=['accuracy'])
combined_model.fit([x_train, x_train], y_train, epochs=10)
combined_model.evaluate(x_eval, y_eval, return_dict=True)
Epoch 1/10
20/20 [==============================] - 0s 3ms/step - loss: 2211.5945 - accuracy: 0.7225
Epoch 2/10
20/20 [==============================] - 0s 2ms/step - loss: 0.2073 - accuracy: 0.7799
Epoch 3/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1835 - accuracy: 0.7974
Epoch 4/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1768 - accuracy: 0.7943
Epoch 5/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1649 - accuracy: 0.8038
Epoch 6/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1564 - accuracy: 0.7927
Epoch 7/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1567 - accuracy: 0.7974
Epoch 8/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1545 - accuracy: 0.8022
Epoch 9/10
20/20 [==============================] - 0s 3ms/step - loss: 0.1525 - accuracy: 0.7927
Epoch 10/10
20/20 [==============================] - 0s 2ms/step - loss: 0.1477 - accuracy: 0.8022
9/9 [==============================] - 0s 2ms/step - loss: 0.1784 - accuracy: 0.7462
{'loss': 0.17842726409435272, 'accuracy': 0.7462121248245239}

Example 4: Migrating from BoostedTreesEstimator

TensorFlow 1: Using BoostedTreesEstimator

In TensorFlow 1, you could use tf.estimator.BoostedTreesEstimator to create a baseline gradient boosting model using an ensemble of decision trees for regression and classification problems. This functionality is no longer included in TensorFlow 2.

bt_estimator = tf1.estimator.BoostedTreesEstimator(
    head=tf.estimator.BinaryClassHead(),
    n_batches_per_layer=1,
    max_depth=10,
    n_trees=1000,
    feature_columns=feature_columns)
bt_estimator.train(input_fn=_input_fn, steps=1000)
bt_estimator.evaluate(input_fn=_eval_input_fn, steps=100)

TensorFlow 2: Using TensorFlow Decision Forests

In TensorFlow 2, tf.estimator.BoostedTreesEstimator is replaced by tfdf.keras.GradientBoostedTreesModel from the TensorFlow Decision Forests package.

TensorFlow Decision Forests offers various advantages over tf.estimator.BoostedTreesEstimator, notably in regards to quality, speed, ease of use, and flexibility. To learn about TensorFlow Decision Forests, start with the beginner colab.

The following example shows how to train a Gradient Boosted Trees model using TensorFlow 2:

Install TensorFlow Decision Forests.

pip install tensorflow_decision_forests

Create a TensorFlow dataset. Note that Decision Forests natively support many types of features and do not need preprocessing.

train_dataframe = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
eval_dataframe = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')

# Convert the Pandas Dataframes into TensorFlow datasets.
train_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(train_dataframe, label="survived")
eval_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(eval_dataframe, label="survived")

train_dataset 数据集上训练模型。

# Use the default hyper-parameters of the model.
gbt_model = tfdf.keras.GradientBoostedTreesModel()
gbt_model.fit(train_dataset)
Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.
WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.
Use /tmpfs/tmp/tmpqnq3zd2b as temporary training directory
Reading training dataset...
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
Training dataset read in 0:00:03.150387. Found 627 examples.
Training model...
2022-12-14 20:53:56.760607: W external/ydf/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.cc:1765] Subsample hyperparameter given but sampling method does not match.
2022-12-14 20:53:56.760651: W external/ydf/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.cc:1778] GOSS alpha hyperparameter given but GOSS is disabled.
2022-12-14 20:53:56.760659: W external/ydf/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.cc:1787] GOSS beta hyperparameter given but GOSS is disabled.
2022-12-14 20:53:56.760665: W external/ydf/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.cc:1799] SelGB ratio hyperparameter given but SelGB is disabled.
Model trained in 0:00:00.221226
Compiling model...
[INFO 2022-12-14T20:53:56.972436202+00:00 kernel.cc:1175] Loading model from path /tmpfs/tmp/tmpqnq3zd2b/model/ with prefix b4d3be4844094a05
[INFO 2022-12-14T20:53:56.97594625+00:00 abstract_model.cc:1306] Engine "GradientBoostedTreesQuickScorerExtended" built
[INFO 2022-12-14T20:53:56.97598354+00:00 kernel.cc:1021] Use fast generic engine
WARNING:tensorflow:AutoGraph could not transform <function simple_ml_inference_op_with_handle at 0x7fbf28914d30> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function simple_ml_inference_op_with_handle at 0x7fbf28914d30> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function simple_ml_inference_op_with_handle at 0x7fbf28914d30> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Model compiled.
<keras.callbacks.History at 0x7fbe6864d0d0>

eval_dataset 数据集上评估模型的质量。

gbt_model.compile(metrics=['accuracy'])
gbt_evaluation = gbt_model.evaluate(eval_dataset, return_dict=True)
print(gbt_evaluation)
1/1 [==============================] - 0s 286ms/step - loss: 0.0000e+00 - accuracy: 0.8295
{'loss': 0.0, 'accuracy': 0.8295454382896423}

Gradient Boosted Trees is just one of the many decision forest algorithms available in TensorFlow Decision Forests. For example, Random Forests (available as tfdf.keras.RandomForestModel) are very resistant to overfitting, while CART (available as tfdf.keras.CartModel) is great for model interpretation.

In the next example, train and evaluate a Random Forest model.

# Train a Random Forest model
rf_model = tfdf.keras.RandomForestModel()
rf_model.fit(train_dataset)

# Evaluate the Random Forest model
rf_model.compile(metrics=['accuracy'])
rf_evaluation = rf_model.evaluate(eval_dataset, return_dict=True)
print(rf_evaluation)
Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.
WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.
Use /tmpfs/tmp/tmpkqitzwv6 as temporary training directory
Reading training dataset...
Training dataset read in 0:00:00.318851. Found 627 examples.
Training model...
Model trained in 0:00:00.193806
Compiling model...
[INFO 2022-12-14T20:53:59.044871325+00:00 kernel.cc:1175] Loading model from path /tmpfs/tmp/tmpkqitzwv6/model/ with prefix f5ecc502707143df
[INFO 2022-12-14T20:53:59.144414078+00:00 kernel.cc:1021] Use fast generic engine
Model compiled.
1/1 [==============================] - 0s 128ms/step - loss: 0.0000e+00 - accuracy: 0.8333
{'loss': 0.0, 'accuracy': 0.8333333134651184}

In the last example, train a CART model and plot it.

# Train a CART model
cart_model = tfdf.keras.CartModel()
cart_model.fit(train_dataset)

# Plot the CART model
tfdf.model_plotter.plot_model_in_colab(cart_model, max_depth=2)
Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.
WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.
Use /tmpfs/tmp/tmpw1db3ktf as temporary training directory
Reading training dataset...
Training dataset read in 0:00:00.178616. Found 627 examples.
Training model...
2022-12-14 20:53:59.640631: W external/ydf/yggdrasil_decision_forests/model/random_forest/random_forest.cc:607] ValidationEvaluation requires OOB evaluation enabled.Random Forest models should be trained with compute_oob_performances:true. CART models do not support OOB evaluation.
Model trained in 0:00:00.017503
Compiling model...
Model compiled.
[INFO 2022-12-14T20:53:59.652667984+00:00 kernel.cc:1175] Loading model from path /tmpfs/tmp/tmpw1db3ktf/model/ with prefix 9cf0c013f7a4462d
[INFO 2022-12-14T20:53:59.653026898+00:00 kernel.cc:1021] Use fast generic engine