Migrate your TensorFlow 1 code to TensorFlow 2


This guide is intended for users of the low-level TensorFlow APIs. If you are using the high-level APIs (tf.keras), there may be little or no action you need to take to make your code fully TensorFlow 2.x compatible.

It is still possible to run 1.x code, unmodified (except for contrib), in TensorFlow 2.x:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

However, this does not let you take advantage of many of the improvements made in TensorFlow 2.x. This guide will help you upgrade your code, making it simpler, more performant, and easier to maintain.

Automatic conversion script

The first step, before attempting to implement the changes described in this guide, is to try running the upgrade script.

This will do an initial pass at upgrading your code to TensorFlow 2.x, but it cannot make your code idiomatic to v2. Your code may still make use of tf.compat.v1 endpoints to access placeholders, sessions, collections, and other 1.x-style functionality.
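For reference, a typical invocation of the upgrade script looks like the following (the file names here are placeholders):

tf_upgrade_v2 --infile my_model.py --outfile my_model_upgraded.py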

High-level behavioral changes

If your code works in TensorFlow 2.x using tf.compat.v1.disable_v2_behavior, there are still global behavioral changes you may need to address. The major changes are:

  • Eager execution, v1.enable_eager_execution(): Any code that implicitly uses a tf.Graph will fail. Be sure to wrap that code in a with tf.Graph().as_default() context.

  • Resource variables, v1.enable_resource_variables(): Some code may depend on the non-deterministic behaviors enabled by TensorFlow reference variables. Resource variables are locked while being written to, and so provide more intuitive consistency guarantees.

    • This may change behavior in edge cases.
    • This may create extra copies and can result in higher memory usage.
    • This can be disabled by passing use_resource=False to the tf.Variable constructor.
  • Tensor shapes, v1.enable_v2_tensorshape(): TensorFlow 2.x simplifies the behavior of tensor shapes. Instead of t.shape[0].value you can say t.shape[0]. These changes should be small, and it makes sense to fix them right away; see the sketch after this list and the TensorShape section for examples.

  • Control flow, v1.enable_control_flow_v2(): The TensorFlow 2.x control flow implementation has been simplified, and so produces different graph representations. Please file bugs for any issues.
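As a minimal sketch of the tensor shape change (assuming the v2 shape behavior is enabled):

t = tf.zeros([4, 8])
# TensorFlow 1.x: t.shape[0].value
# TensorFlow 2.x: indexing a TensorShape returns a plain Python int
print(t.shape[0])  # 4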

Make your code TensorFlow 2.x-native

This guide walks through several examples of converting TensorFlow 1.x code to TensorFlow 2.x. These changes will let your code take advantage of performance optimizations and simplified API calls.

In each case, the pattern is:

1. Replace v1.Session.run calls

Every v1.Session.run call should be replaced by a Python function.

  • The feed_dict and v1.placeholders become function arguments.
  • The fetches become the function's return value.
  • During conversion, eager execution allows easy debugging with standard Python tools like pdb.

After that, add a tf.function decorator to make it run efficiently in graph mode. Check out the AutoGraph guide for more on how this works.

Note that:

  • Unlike v1.Session.run, a tf.function has a fixed return signature and always returns all outputs. If this causes performance problems, create two separate functions.

  • There is no need for tf.control_dependencies or similar operations: a tf.function behaves as if it were run in the order written. tf.Variable assignments and tf.asserts, for example, are executed automatically.

The converting models section contains a working example of this conversion process; the sketch below shows the shape of the pattern.
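A minimal sketch of the pattern (the function name, body, and inputs here are hypothetical):

# TensorFlow 1.x: outputs = session.run(fetches, feed_dict={placeholder: value})
# TensorFlow 2.x: feeds become arguments and fetches become return values.
@tf.function
def forward_pass(x):
  return x ** 2

outputs = forward_pass(tf.constant([1.0, 2.0]))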

2. Use Python objects to track variables and losses

All name-based variable tracking is strongly discouraged in TensorFlow 2.x. Use Python objects to track variables.

Use tf.Variable instead of v1.get_variable.

Every v1.variable_scope should be converted to a Python object. Typically this will be one of:

  • tf.keras.layers.Layer
  • tf.keras.Model
  • tf.Module

If you need to aggregate lists of variables (like tf.Graph.get_collection(tf.GraphKeys.VARIABLES)), use the .variables and .trainable_variables attributes of the Layer and Model objects.

These Layer and Model classes implement several other properties that remove the need for global collections. Their .losses property can be a replacement for using the tf.GraphKeys.LOSSES collection.

Refer to the Keras guides for details.
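A brief sketch of this object-based tracking (the layer size and regularizer here are arbitrary):

layer = tf.keras.layers.Dense(4, kernel_regularizer=tf.keras.regularizers.l2(0.04))
_ = layer(tf.zeros([1, 2]))             # variables are created on first call
print(len(layer.trainable_variables))   # 2: the kernel and the bias
print(layer.losses)                     # the l2 regularization loss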

3. Upgrade your training loops

Use the highest-level API that works for your use case. Prefer tf.keras.Model.fit over building your own training loops.

These high-level functions manage a lot of the low-level details that might be easy to miss if you write your own training loop. For example, they automatically collect the regularization losses and set the training=True argument when calling the model.
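For instance, a minimal sketch, assuming model and train_dataset are already defined:

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# fit() collects model.losses and passes training=True for you.
model.fit(train_dataset, epochs=3)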

4. Upgrade your data input pipelines

Use tf.data datasets for data input. These objects are efficient, expressive, and integrate well with TensorFlow.

They can be passed directly to the tf.keras.Model.fit method.

model.fit(dataset, epochs=5)

They can be iterated over directly in standard Python:

for example_batch, label_batch in dataset:
    break
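A quick sketch of building such a dataset from in-memory arrays (the arrays here are random placeholder data):

import numpy as np

features = np.random.rand(100, 8).astype("float32")
labels = np.random.randint(0, 2, size=(100,))

# Slice the arrays into examples, then shuffle and batch them.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(100).batch(32)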

5. Migrate off compat.v1 symbols

The tf.compat.v1 module contains the complete TensorFlow 1.x API, with its original semantics.

The TensorFlow 2.x upgrade script will convert symbols to their v2 equivalents if such a conversion is safe, that is, if it can determine that the behavior of the TensorFlow 2.x version is exactly equivalent (for instance, it will rename v1.arg_max to tf.argmax, since they are the same function).

After the upgrade script is done with a piece of code, it is likely there will be many mentions of compat.v1. It is worth going through the code and converting these manually to the v2 equivalent (it should be mentioned in the log, if there is one).

Converting models

Low-level variables and operator execution

Examples of low-level API use include:

  • Using variable scopes to control reuse.
  • Creating variables with v1.get_variable.
  • Accessing collections explicitly.
  • Accessing collections implicitly with methods like v1.global_variables or v1.losses.get_regularization_loss.
  • Using v1.placeholder to set up graph inputs.
  • Executing graphs with Session.run.
  • Initializing variables manually.

Before converting

Here is what these patterns may look like in code using TensorFlow 1.x:

import tensorflow as tf
import tensorflow.compat.v1 as v1

import tensorflow_datasets as tfds
g = v1.Graph()

with g.as_default():
  in_a = v1.placeholder(dtype=v1.float32, shape=(2))
  in_b = v1.placeholder(dtype=v1.float32, shape=(2))

  def forward(x):
    with v1.variable_scope("matmul", reuse=v1.AUTO_REUSE):
      W = v1.get_variable("W", initializer=v1.ones(shape=(2,2)),
                          regularizer=lambda x:tf.reduce_mean(x**2))
      b = v1.get_variable("b", initializer=v1.zeros(shape=(2)))
      return W * x + b

  out_a = forward(in_a)
  out_b = forward(in_b)
  reg_loss=v1.losses.get_regularization_loss(scope="matmul")

with v1.Session(graph=g) as sess:
  sess.run(v1.global_variables_initializer())
  outs = sess.run([out_a, out_b, reg_loss],
                feed_dict={in_a: [1, 0], in_b: [0, 1]})

print(outs[0])
print()
print(outs[1])
print()
print(outs[2])
[[1. 0.]
 [1. 0.]]

[[0. 1.]
 [0. 1.]]

1.0

After converting

In the converted code:

  • The variables are local Python objects.
  • The forward function still defines the calculation.
  • The Session.run call is replaced with a call to forward.
  • The optional tf.function decorator can be added for performance.
  • The regularizations are calculated manually, without referring to any global collection.
  • There is no use of sessions or placeholders.
W = tf.Variable(tf.ones(shape=(2,2)), name="W")
b = tf.Variable(tf.zeros(shape=(2)), name="b")

@tf.function
def forward(x):
  return W * x + b

out_a = forward([1,0])
print(out_a)
tf.Tensor(
[[1. 0.]
 [1. 0.]], shape=(2, 2), dtype=float32)
out_b = forward([0,1])

regularizer = tf.keras.regularizers.l2(0.04)
reg_loss=regularizer(W)

Models based on tf.layers

The v1.layers module is used to contain layer functions that relied on v1.variable_scope to define and reuse variables.

Before converting

def model(x, training, scope='model'):
  with v1.variable_scope(scope, reuse=v1.AUTO_REUSE):
    x = v1.layers.conv2d(x, 32, 3, activation=v1.nn.relu,
          kernel_regularizer=lambda x:0.004*tf.reduce_mean(x**2))
    x = v1.layers.max_pooling2d(x, (2, 2), 1)
    x = v1.layers.flatten(x)
    x = v1.layers.dropout(x, 0.1, training=training)
    x = v1.layers.dense(x, 64, activation=v1.nn.relu)
    x = v1.layers.batch_normalization(x, training=training)
    x = v1.layers.dense(x, 10)
    return x
train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

print(train_out)
print()
print(test_out)
tf.Tensor([[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]], shape=(1, 10), dtype=float32)

tf.Tensor(
[[ 0.04853132 -0.08974641 -0.32679698  0.07017353  0.12982666 -0.2153313
  -0.09793851  0.10957378  0.01823931  0.00898573]], shape=(1, 10), dtype=float32)

After converting

Most of the arguments stayed the same. But note the differences:

  • The training argument is passed to each layer by the model when it runs.
  • The first argument to the original model function (the input x) is gone. This is because object layers separate building the model from calling the model.

Also note that:

  • If you were using regularizers or initializers from tf.contrib, these have more argument changes than others.
  • The code no longer writes to collections, so functions like v1.losses.get_regularization_loss will no longer return these values, which could break your training loops.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.04),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
])

train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))
train_out = model(train_data, training=True)
print(train_out)
tf.Tensor([[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]], shape=(1, 10), dtype=float32)
test_out = model(test_data, training=False)
print(test_out)
tf.Tensor(
[[-0.06252427  0.30122417 -0.18610534 -0.04890637 -0.01496555  0.41607457
   0.24905115  0.014429   -0.12719882 -0.22354674]], shape=(1, 10), dtype=float32)
# Here are all the trainable variables
len(model.trainable_variables)
8
# Here is the regularization loss
model.losses
[<tf.Tensor: shape=(), dtype=float32, numpy=0.07443664>]

Mixed variables and v1.layers

Existing code often mixes lower-level TensorFlow 1.x variables and operations with higher-level v1.layers.

Before converting

def model(x, training, scope='model'):
  with v1.variable_scope(scope, reuse=v1.AUTO_REUSE):
    W = v1.get_variable(
      "W", dtype=v1.float32,
      initializer=v1.ones(shape=x.shape),
      regularizer=lambda x:0.004*tf.reduce_mean(x**2),
      trainable=True)
    if training:
      x = x + W
    else:
      x = x + W * 0.5
    x = v1.layers.conv2d(x, 32, 3, activation=tf.nn.relu)
    x = v1.layers.max_pooling2d(x, (2, 2), 1)
    x = v1.layers.flatten(x)
    return x

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

After converting

To convert this code, follow the pattern of mapping layers to layers as in the previous example.

The general pattern is:

  • Collect layer parameters in __init__.
  • Build the variables in build.
  • Execute the calculations in call, and return the result.

The v1.variable_scope is essentially a layer of its own. So rewrite it as a tf.keras.layers.Layer. Check out the Making new layers and models via subclassing guide for details.

# Create a custom layer for part of the model
class CustomLayer(tf.keras.layers.Layer):
  def __init__(self, *args, **kwargs):
    super(CustomLayer, self).__init__(*args, **kwargs)

  def build(self, input_shape):
    self.w = self.add_weight(
        shape=input_shape[1:],
        dtype=tf.float32,
        initializer=tf.keras.initializers.ones(),
        regularizer=tf.keras.regularizers.l2(0.02),
        trainable=True)

  # Call method will sometimes get used in graph mode,
  # training will get turned into a tensor
  @tf.function
  def call(self, inputs, training=None):
    if training:
      return inputs + self.w
    else:
      return inputs + self.w * 0.5
custom_layer = CustomLayer()
print(custom_layer([1]).numpy())
print(custom_layer([1], training=True).numpy())
[1.5]
[2.]
train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))

# Build the model including the custom layer
model = tf.keras.Sequential([
    CustomLayer(input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
])

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

Some things to note:

  • Subclassed Keras models and layers need to run in both v1 graphs (with no automatic control dependencies) and in eager mode:

    • Wrap the call in a tf.function to get AutoGraph and automatic control dependencies.
  • Don't forget to accept a training argument to call:

    • Sometimes it is a tf.Tensor
    • Sometimes it is a Python boolean
  • Create model variables in the constructor or Model.build using self.add_weight:

    • In Model.build you have access to the input shape, so you can create weights with matching shapes
    • Using tf.keras.layers.Layer.add_weight allows Keras to track variables and regularization losses
  • Don't keep tf.Tensors in your objects (see the sketch after this list):

    • They might get created either in a tf.function or in the eager context, and these tensors behave differently
    • Use tf.Variables for state; they are always usable from both contexts
    • tf.Tensors are only for intermediate values
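A minimal sketch of keeping state in a tf.Variable rather than a tf.Tensor (the Counter class here is hypothetical):

class Counter(tf.Module):
  def __init__(self):
    # A tf.Variable is usable from both eager code and tf.functions.
    self.count = tf.Variable(0)

  @tf.function
  def increment(self):
    self.count.assign_add(1)  # intermediate tf.Tensors stay local
    return self.count

counter = Counter()
print(counter.increment().numpy())  # 1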

A note on Slim and contrib.layers

A lot of older TensorFlow 1.x code uses the Slim library, which was packaged with TensorFlow 1.x as tf.contrib.layers. As a contrib module, it is no longer available in TensorFlow 2.x, even in tf.compat.v1. Converting code that uses Slim to TensorFlow 2.x is more involved than converting repositories that use v1.layers. In fact, it may make sense to convert your Slim code to v1.layers first, and then convert that to Keras.

  • Remove arg_scopes; all arguments need to be explicit.
  • If you use them, split normalizer_fn and activation_fn into their own layers.
  • Separable conv layers map to one or more different Keras layers (depthwise, pointwise, and separable Keras layers).
  • Slim and v1.layers have different argument names and default values.
  • Some arguments have different scales.
  • If you use Slim pre-trained models, try out Keras's pre-trained models from tf.keras.applications, or TF Hub's TensorFlow 2.x SavedModels exported from the original Slim code (see the sketch below).
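For example, a minimal sketch of loading one of those Keras pre-trained models:

# Downloads ImageNet weights on first use.
model = tf.keras.applications.MobileNetV2(weights="imagenet")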

Some tf.contrib layers might not have been moved to core TensorFlow, but have instead been moved to the TensorFlow Addons package.

Training

There are many ways to feed data to a tf.keras model. They will accept Python generators and NumPy arrays as input.

The recommended way to feed data to a model is to use the tf.data package, which contains a collection of high-performance classes for manipulating data.

If you are still using tf.queue, these are now only supported as data structures, not as input pipelines.

Using TensorFlow Datasets

The TensorFlow Datasets package (tfds) contains utilities for loading predefined datasets as tf.data.Dataset objects.

For this example, you can load the MNIST dataset using tfds:

datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']

Then, prepare the data for training:

  • Re-scale each image.
  • Shuffle the order of the examples.
  • Collect batches of images and labels.
BUFFER_SIZE = 10 # Use a much larger value for real code
BATCH_SIZE = 64
NUM_EPOCHS = 5


def scale(image, label):
  image = tf.cast(image, tf.float32)
  image /= 255

  return image, label

To keep the example short, trim the dataset to only return 5 batches:

train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
test_data = mnist_test.map(scale).batch(BATCH_SIZE)

STEPS_PER_EPOCH = 5

train_data = train_data.take(STEPS_PER_EPOCH)
test_data = test_data.take(STEPS_PER_EPOCH)
image_batch, label_batch = next(iter(train_data))

Use Keras training loops

If you don't need low-level control of your training process, using Keras's built-in fit, evaluate, and predict methods is recommended. These methods provide a uniform interface to train the model regardless of the implementation (sequential, functional, or subclassed).

The advantages of these methods include:

  • They accept NumPy arrays, Python generators, and tf.data.Datasets.
  • They apply regularization and activation losses automatically.
  • They support tf.distribute for multi-device training.
  • They support arbitrary callables as losses and metrics.
  • They support callbacks like tf.keras.callbacks.TensorBoard, and custom callbacks.
  • They are performant, automatically using TensorFlow graphs.

Here is an example of training a model using a Dataset. (For details of how this works, check out the tutorials section.)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
])

# Model is the full model w/o custom layers
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_data, epochs=NUM_EPOCHS)
loss, acc = model.evaluate(test_data)

print("Loss {}, Accuracy {}".format(loss, acc))
Epoch 1/5
5/5 [==============================] - 2s 8ms/step - loss: 1.5874 - accuracy: 0.4719
Epoch 2/5
5/5 [==============================] - 0s 5ms/step - loss: 0.4435 - accuracy: 0.9094
Epoch 3/5
5/5 [==============================] - 0s 6ms/step - loss: 0.2764 - accuracy: 0.9594
Epoch 4/5
5/5 [==============================] - 0s 5ms/step - loss: 0.1889 - accuracy: 0.9844
Epoch 5/5
5/5 [==============================] - 1s 6ms/step - loss: 0.1504 - accuracy: 0.9906
5/5 [==============================] - 1s 3ms/step - loss: 1.6299 - accuracy: 0.7031
Loss 1.6299388408660889, Accuracy 0.703125

Write your own loop

If the Keras model's training step works for you, but you need more control outside that step, consider using the tf.keras.Model.train_on_batch method in your own data-iteration loop.

Remember: many things can be implemented as a tf.keras.callbacks.Callback.

This method has many of the advantages of the methods mentioned in the previous section, but gives the user control of the outer loop.

You can also use tf.keras.Model.test_on_batch or tf.keras.Model.evaluate to check performance during training.

To continue training the model above:

# Model is the full model w/o custom layers
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

for epoch in range(NUM_EPOCHS):
  # Reset the metric accumulators
  model.reset_metrics()

  for image_batch, label_batch in train_data:
    result = model.train_on_batch(image_batch, label_batch)
    metrics_names = model.metrics_names
    print("train: ",
          "{}: {:.3f}".format(metrics_names[0], result[0]),
          "{}: {:.3f}".format(metrics_names[1], result[1]))
  for image_batch, label_batch in test_data:
    result = model.test_on_batch(image_batch, label_batch,
                                 # Return accumulated metrics
                                 reset_metrics=False)
  metrics_names = model.metrics_names
  print("\neval: ",
        "{}: {:.3f}".format(metrics_names[0], result[0]),
        "{}: {:.3f}".format(metrics_names[1], result[1]))
train:  loss: 0.131 accuracy: 1.000
train:  loss: 0.179 accuracy: 0.969
train:  loss: 0.117 accuracy: 0.984
train:  loss: 0.187 accuracy: 0.969
train:  loss: 0.168 accuracy: 0.969

eval:  loss: 1.655 accuracy: 0.703
train:  loss: 0.083 accuracy: 1.000
train:  loss: 0.080 accuracy: 1.000
train:  loss: 0.099 accuracy: 0.984
train:  loss: 0.088 accuracy: 1.000
train:  loss: 0.084 accuracy: 1.000

eval:  loss: 1.645 accuracy: 0.759
train:  loss: 0.066 accuracy: 1.000
train:  loss: 0.070 accuracy: 1.000
train:  loss: 0.062 accuracy: 1.000
train:  loss: 0.067 accuracy: 1.000
train:  loss: 0.061 accuracy: 1.000

eval:  loss: 1.609 accuracy: 0.819
train:  loss: 0.056 accuracy: 1.000
train:  loss: 0.053 accuracy: 1.000
train:  loss: 0.048 accuracy: 1.000
train:  loss: 0.057 accuracy: 1.000
train:  loss: 0.069 accuracy: 0.984

eval:  loss: 1.568 accuracy: 0.825
train:  loss: 0.048 accuracy: 1.000
train:  loss: 0.048 accuracy: 1.000
train:  loss: 0.044 accuracy: 1.000
train:  loss: 0.045 accuracy: 1.000
train:  loss: 0.045 accuracy: 1.000

eval:  loss: 1.531 accuracy: 0.841

Customize the training step

If you need more flexibility and control, you can have it by implementing your own training loop. There are three steps:

  1. Iterate over a Python generator or tf.data.Dataset to get batches of examples.
  2. Use tf.GradientTape to collect gradients.
  3. Use one of the tf.keras.optimizers to apply weight updates to the model's variables.

Remember:

  • Always include a training argument on the call method of subclassed layers and models.
  • Make sure to call the model with the training argument set correctly.
  • Depending on usage, model variables may not exist until the model is run on a batch of data.
  • You need to manually handle things like regularization losses for the model.

Note the simplifications relative to v1:

  • There is no need to run variable initializers. Variables are initialized on creation.
  • There is no need to add manual control dependencies. Even in tf.function, operations act as in eager mode.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
])

optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(inputs, labels):
  with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    regularization_loss=tf.math.add_n(model.losses)
    pred_loss=loss_fn(labels, predictions)
    total_loss=pred_loss + regularization_loss

  gradients = tape.gradient(total_loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

for epoch in range(NUM_EPOCHS):
  for inputs, labels in train_data:
    train_step(inputs, labels)
  print("Finished epoch", epoch)
Finished epoch 0
Finished epoch 1
Finished epoch 2
Finished epoch 3
Finished epoch 4

New-style metrics and losses

In TensorFlow 2.x, metrics and losses are objects. These work both eagerly and in tf.functions.

A loss object is callable, and expects (y_true, y_pred) as arguments:

cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
cce([[1, 0]], [[-1.0,3.0]]).numpy()
4.01815

A metric object has the following methods:

  • Metric.update_state(): add new observations.
  • Metric.result(): get the current result of the metric, given the observed values.
  • Metric.reset_states(): clear all observations.

The object itself is callable. Calling it updates the state with new observations, as with update_state, and returns the new result of the metric.

You don't have to manually initialize a metric's variables, and because TensorFlow 2.x has automatic control dependencies, you don't need to worry about those either.
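A short sketch of this API:

m = tf.keras.metrics.Mean()
m.update_state([1.0, 3.0])  # add observations
print(m.result().numpy())   # 2.0, the running mean so far
m.reset_states()            # clear all observations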

The code below uses a metric to track the mean loss observed within a custom training loop.

# Create the metrics
loss_metric = tf.keras.metrics.Mean(name='train_loss')
accuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

@tf.function
def train_step(inputs, labels):
  with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    regularization_loss=tf.math.add_n(model.losses)
    pred_loss=loss_fn(labels, predictions)
    total_loss=pred_loss + regularization_loss

  gradients = tape.gradient(total_loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
  # Update the metrics
  loss_metric.update_state(total_loss)
  accuracy_metric.update_state(labels, predictions)


for epoch in range(NUM_EPOCHS):
  # Reset the metrics
  loss_metric.reset_states()
  accuracy_metric.reset_states()

  for inputs, labels in train_data:
    train_step(inputs, labels)
  # Get the metric results
  mean_loss=loss_metric.result()
  mean_accuracy = accuracy_metric.result()

  print('Epoch: ', epoch)
  print('  loss:     {:.3f}'.format(mean_loss))
  print('  accuracy: {:.3f}'.format(mean_accuracy))
Epoch:  0
  loss:     0.172
  accuracy: 0.988
Epoch:  1
  loss:     0.143
  accuracy: 0.997
Epoch:  2
  loss:     0.126
  accuracy: 0.997
Epoch:  3
  loss:     0.109
  accuracy: 1.000
Epoch:  4
  loss:     0.092
  accuracy: 1.000

Keras metric names

In TensorFlow 2.x, Keras models are more consistent about handling metric names.

Now when you pass a string in the list of metrics, that exact string is used as the metric's name. These names are visible in the history object returned by model.fit, and in the logs passed to keras.callbacks.

model.compile(
    optimizer = tf.keras.optimizers.Adam(0.001),
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics = ['acc', 'accuracy', tf.keras.metrics.SparseCategoricalAccuracy(name="my_accuracy")])
history = model.fit(train_data)
5/5 [==============================] - 1s 6ms/step - loss: 0.1042 - acc: 0.9969 - accuracy: 0.9969 - my_accuracy: 0.9969
history.history.keys()
dict_keys(['loss', 'acc', 'accuracy', 'my_accuracy'])

This differs from previous versions, where passing metrics=["accuracy"] would result in dict_keys(['loss', 'acc']).

Keras optimizers

The optimizers in v1.train, such as v1.train.AdamOptimizer and v1.train.GradientDescentOptimizer, have equivalents in tf.keras.optimizers.

Converting v1.train to keras.optimizers

There are a few things to keep in mind when converting your optimizers; a sketch of one common conversion follows.
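A hedged sketch of one such conversion (note that Adam's beta1 and beta2 arguments were renamed beta_1 and beta_2):

# TensorFlow 1.x: v1.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)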

New defaults for some tf.keras.optimizers

There are no changes for optimizers.SGD, optimizers.Adam, or optimizers.RMSprop.

The following default learning rates have changed:

  • optimizers.Adagrad from 0.01 to 0.001
  • optimizers.Adadelta from 1.0 to 0.001
  • optimizers.Adamax from 0.002 to 0.001
  • optimizers.Nadam from 0.002 to 0.001

TensorBoard

TensorFlow 2.x includes significant changes to the tf.summary API used to write summary data for visualization in TensorBoard. For a general introduction to the new tf.summary, there are several tutorials available that use the TensorFlow 2.x API, including a TensorBoard TensorFlow 2.x migration guide.
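As a minimal sketch of the new API (the log directory here is arbitrary):

writer = tf.summary.create_file_writer("/tmp/logs")
with writer.as_default():
  tf.summary.scalar("loss", 0.5, step=1)  # one scalar point at step 1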

Saving and loading

Checkpoint compatibility

TensorFlow 2.x uses object-based checkpoints.

Old-style name-based checkpoints can still be loaded, if you are careful. The code conversion process may result in variable name changes, but there are workarounds.

The simplest approach is to line up the names of the new model with the names in the checkpoint:

  • Variables still all have a name argument you can set.
  • Keras models also take a name argument, which they set as the prefix for their variables.
  • The v1.name_scope function can be used to set variable name prefixes. This is very different from tf.variable_scope: it only affects names, and doesn't track variables or reuse.

If that doesn't work for your use case, try the v1.train.init_from_checkpoint function. It takes an assignment_map argument, which specifies the mapping from old names to new names.
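A sketch of what that call can look like (the checkpoint path and scope names are placeholders):

tf.compat.v1.train.init_from_checkpoint(
    '/path/to/old_checkpoint',
    # Map every variable under "old_scope/" to the same name under "new_scope/".
    assignment_map={'old_scope/': 'new_scope/'})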

The TensorFlow Estimator repository includes a conversion tool to upgrade the checkpoints for premade estimators from TensorFlow 1.x to 2.0. It may serve as an example of how to build a tool for a similar use case.

Saved model compatibility

There are no significant compatibility concerns for saved models.

  • TensorFlow 1.x saved models work in TensorFlow 2.x.
  • TensorFlow 2.x saved models work in TensorFlow 1.x if all the ops are supported.

A Graph.pb or Graph.pbtxt

There is no straightforward way to upgrade a raw Graph.pb file to TensorFlow 2.x. Your best bet is to upgrade the code that generated the file.

But, if you have a "frozen graph" (a tf.Graph where the variables have been turned into constants), then it is possible to convert it to a concrete_function using v1.wrap_function:

def wrap_frozen_graph(graph_def, inputs, outputs):
  def _imports_graph_def():
    tf.compat.v1.import_graph_def(graph_def, name="")
  wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
  import_graph = wrapped_import.graph
  return wrapped_import.prune(
      tf.nest.map_structure(import_graph.as_graph_element, inputs),
      tf.nest.map_structure(import_graph.as_graph_element, outputs))

For example, here is a frozen graph for Inception v1, from 2016:

path = tf.keras.utils.get_file(
    'inception_v1_2016_08_28_frozen.pb',
    'http://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz',
    untar=True)
Downloading data from http://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz
24698880/24695710 [==============================] - 1s 0us/step

Load the tf.GraphDef:

graph_def = tf.compat.v1.GraphDef()
loaded = graph_def.ParseFromString(open(path,'rb').read())

Wrap it in a concrete_function:

inception_func = wrap_frozen_graph(
    graph_def, inputs='input:0',
    outputs='InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu:0')

Pass it a tensor as input:

input_img = tf.ones([1,224,224,3], dtype=tf.float32)
inception_func(input_img).shape
TensorShape([1, 28, 28, 96])

Estimators

Training with estimators

Estimators are supported in TensorFlow 2.x.

When you use estimators, you can use input_fn, tf.estimator.TrainSpec, and tf.estimator.EvalSpec from TensorFlow 1.x.

Here is an example using input_fn with train and evaluate specs.

Creating the input_fn and train/eval specs

# Define the estimator's input_fn
def input_fn():
  datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
  mnist_train, mnist_test = datasets['train'], datasets['test']

  BUFFER_SIZE = 10000
  BATCH_SIZE = 64

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255

    return image, label[..., tf.newaxis]

  train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
  return train_data.repeat()

# Define train and eval specs
train_spec = tf.estimator.TrainSpec(input_fn=input_fn,
                                    max_steps=STEPS_PER_EPOCH * NUM_EPOCHS)
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn,
                                  steps=STEPS_PER_EPOCH)

Using a Keras model definition

There are some differences in how to construct your estimators in TensorFlow 2.x.

It is recommended that you define your model using Keras, then use the tf.keras.estimator.model_to_estimator utility to turn your model into an estimator. The code below shows how to use this utility when creating and training an estimator.

def make_model():
  return tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
  ])
model = make_model()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

estimator = tf.keras.estimator.model_to_estimator(keras_model=model)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpbhtumut0
INFO:tensorflow:Using the Keras model provided.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting from: /tmp/tmpbhtumut0/keras/keras_model.ckpt
INFO:tensorflow:Warm-started 8 variables.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpbhtumut0/model.ckpt.
INFO:tensorflow:loss = 3.1193407, step = 0
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmpbhtumut0/model.ckpt.
INFO:tensorflow:Starting evaluation at 2021-07-19T23:37:42
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Inference Time : 1.02146s
INFO:tensorflow:Finished evaluation at 2021-07-19-23:37:43
INFO:tensorflow:Saving dict for global step 25: accuracy = 0.634375, global_step = 25, loss = 1.493957
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmpbhtumut0/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.37796202.
({'accuracy': 0.634375, 'loss': 1.493957, 'global_step': 25}, [])

Using a custom model_fn

If you have an existing custom estimator model_fn that you need to maintain, you can convert your model_fn to use a Keras model.

However, for compatibility reasons, a custom model_fn will still run in 1.x-style graph mode. This means there is no eager execution and no automatic control dependencies.

Custom model_fn with minimal changes

To make your custom model_fn work in TensorFlow 2.x with minimal changes to your existing code, tf.compat.v1 symbols such as optimizers and metrics can be used.

Using a Keras model in a custom model_fn is similar to using it in a custom training loop:

  • Set the training phase appropriately, based on the mode argument.
  • Explicitly pass the model's trainable_variables to the optimizer.

But there are important differences, relative to a custom loop:

  • Instead of using Model.losses, extract the losses using Model.get_losses_for.
  • Extract the model's updates using Model.get_updates_for. (Updates are changes that need to be applied to a model after each batch, such as the moving averages of mean and variance in a tf.keras.layers.BatchNormalization layer.)

The following code creates an estimator from a custom model_fn, illustrating all of these concerns.

def my_model_fn(features, labels, mode):
  model = make_model()

  optimizer = tf.compat.v1.train.AdamOptimizer()
  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  predictions = model(features, training=training)

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  # Get both the unconditional losses (the None part)
  # and the input-conditional losses (the features part).
  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss = loss_fn(labels, predictions) + tf.math.add_n(reg_losses)

  accuracy = tf.compat.v1.metrics.accuracy(labels=labels,
                                           predictions=tf.math.argmax(predictions, axis=1),
                                           name='acc_op')

  update_ops = model.get_updates_for(None) + model.get_updates_for(features)
  minimize_op = optimizer.minimize(
      total_loss,
      var_list=model.trainable_variables,
      global_step=tf.compat.v1.train.get_or_create_global_step())
  train_op = tf.group(minimize_op, update_ops)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op, eval_metric_ops={'accuracy': accuracy})

# Create the Estimator & Train
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpqiom6a5s
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpqiom6a5s/model.ckpt.
INFO:tensorflow:loss = 2.9167266, step = 0
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmpqiom6a5s/model.ckpt.
INFO:tensorflow:Starting evaluation at 2021-07-19T23:37:49
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Inference Time : 1.38362s
INFO:tensorflow:Finished evaluation at 2021-07-19-23:37:50
INFO:tensorflow:Saving dict for global step 25: accuracy = 0.70625, global_step = 25, loss = 1.6135181
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmpqiom6a5s/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.60315084.
({'accuracy': 0.70625, 'loss': 1.6135181, 'global_step': 25}, [])

Custom model_fn with TensorFlow 2.x symbols

If you want to get rid of all TensorFlow 1.x symbols and upgrade your custom model_fn to TensorFlow 2.x, you need to update the optimizer and metrics to tf.keras.optimizers and tf.keras.metrics.

In the custom model_fn, in addition to the above changes, further upgrades need to be made:

  • Use tf.keras.optimizers instead of v1.train.Optimizer.
  • Explicitly pass the model's trainable_variables to the tf.keras.optimizers.
  • To compute the train_op/minimize_op:
    • Use Optimizer.get_updates if the loss is a scalar loss Tensor (not a callable). The first element in the returned list is the desired train_op/minimize_op.
    • Use Optimizer.minimize if the loss is a callable (such as a function).
  • Use tf.keras.metrics instead of tf.compat.v1.metrics for evaluation.

For the above example of my_model_fn, the migrated code with TensorFlow 2.x symbols is shown below:

def my_model_fn(features, labels, mode):
  model = make_model()

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  loss_obj = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
  predictions = model(features, training=training)

  # Get both the unconditional losses (the None part)
  # and the input-conditional losses (the features part).
  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss = loss_obj(labels, predictions) + tf.math.add_n(reg_losses)

  # Upgrade to tf.keras.metrics.
  accuracy_obj = tf.keras.metrics.Accuracy(name='acc_obj')
  accuracy = accuracy_obj.update_state(
      y_true=labels, y_pred=tf.math.argmax(predictions, axis=1))

  train_op = None
  if training:
    # Upgrade to tf.keras.optimizers.
    optimizer = tf.keras.optimizers.Adam()
    # Manually assign the tf.compat.v1 global step variable to
    # optimizer.iterations so that tf.compat.v1.train.global_step increases
    # correctly. This assignment is a must for any `tf.train.SessionRunHook`
    # specified in the estimator, as SessionRunHooks rely on the global step.
    optimizer.iterations = tf.compat.v1.train.get_or_create_global_step()
    # Get both the unconditional updates (the None part)
    # and the input-conditional updates (the features part).
    update_ops = model.get_updates_for(None) + model.get_updates_for(features)
    # Compute the minimize_op.
    minimize_op = optimizer.get_updates(
        total_loss,
        model.trainable_variables)[0]
    train_op = tf.group(minimize_op, *update_ops)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op,
    eval_metric_ops={'Accuracy': accuracy_obj})

# Create the Estimator and train.
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpomveromc
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpomveromc/model.ckpt.
INFO:tensorflow:loss = 2.874814, step = 0
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmpomveromc/model.ckpt.
INFO:tensorflow:Starting evaluation at 2021-07-19T23:37:56
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Inference Time : 1.04574s
INFO:tensorflow:Finished evaluation at 2021-07-19-23:37:57
INFO:tensorflow:Saving dict for global step 25: Accuracy = 0.790625, global_step = 25, loss = 1.4257433
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmpomveromc/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.42627147.
({'Accuracy': 0.790625, 'loss': 1.4257433, 'global_step': 25}, [])

Premade Estimators

Premade estimators in the families of tf.estimator.DNN*, tf.estimator.Linear* and tf.estimator.DNNLinearCombined* are still supported in the TensorFlow 2.x API. However, some arguments have changed:

  1. input_layer_partitioner: Removed in v2.
  2. loss_reduction: Updated to tf.keras.losses.Reduction instead of tf.compat.v1.losses.Reduction. Its default value has also changed, to tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE from tf.compat.v1.losses.Reduction.SUM.
  3. optimizer, dnn_optimizer and linear_optimizer: these arguments have been updated to tf.keras.optimizers instead of tf.compat.v1.train.Optimizer.

To migrate the above changes:

  1. No migration is needed for input_layer_partitioner, since Distribution Strategy will handle it automatically in TensorFlow 2.x.
  2. For loss_reduction, check tf.keras.losses.Reduction for the supported options.
  3. For the optimizer arguments (see the sketch after this list):
    • If you do not 1) pass in the optimizer, dnn_optimizer or linear_optimizer argument, or 2) specify the optimizer argument as a string in your code, then you don't need to change anything, because tf.keras.optimizers is used by default.
    • Otherwise, you need to update it from tf.compat.v1.train.Optimizer to its corresponding tf.keras.optimizers.
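
As an illustration of the optimizer and loss_reduction points above, here is a minimal sketch of constructing a premade estimator; the feature column, layer sizes, and learning rate are hypothetical placeholders, not values from this guide:

# Hypothetical feature column and hyperparameters, for illustration only.
feature_columns = [tf.feature_column.numeric_column('x', shape=(784,))]

dnn_estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64, 32],
    n_classes=10,
    # A tf.keras.optimizers instance instead of tf.compat.v1.train.AdamOptimizer.
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    # A tf.keras.losses.Reduction option instead of tf.compat.v1.losses.Reduction.
    loss_reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE)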

Checkpoint converter

The migration to keras.optimizers will break checkpoints saved using TensorFlow 1.x, as tf.keras.optimizers generates a different set of variables to be saved in checkpoints. To make an old checkpoint reusable after your migration to TensorFlow 2.x, try the checkpoint converter tool.

 curl -O https://raw.githubusercontent.com/tensorflow/estimator/master/tensorflow_estimator/python/estimator/tools/checkpoint_converter.py
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14889  100 14889    0     0  60771      0 --:--:-- --:--:-- --:--:-- 60771

The tool has built-in help:

 python checkpoint_converter.py -h
usage: checkpoint_converter.py [-h]
                               {dnn,linear,combined} source_checkpoint
                               source_graph target_checkpoint

positional arguments:
  {dnn,linear,combined}
                        The type of estimator to be converted. So far, the
                        checkpoint converter only supports Canned Estimator.
                        So the allowed types include linear, dnn and combined.
  source_checkpoint     Path to source checkpoint file to be read in.
  source_graph          Path to source graph file to be read in.
  target_checkpoint     Path to checkpoint file to be written out.

optional arguments:
  -h, --help            show this help message and exit

TensorShape

This class was simplified to hold ints, instead of tf.compat.v1.Dimension objects. So there is no need to call .value to get an int.

Individual tf.compat.v1.Dimension objects are still accessible from tf.TensorShape.dims.

The following demonstrates the differences between TensorFlow 1.x and TensorFlow 2.x.

# Create a shape and choose an index
i = 0
shape = tf.TensorShape([16, None, 256])
shape
TensorShape([16, None, 256])

If you had this in TensorFlow 1.x:

value = shape[i].value

Then do this in TensorFlow 2.x:

value = shape[i]
value
16

If you had this in TensorFlow 1.x:

for dim in shape:
    value = dim.value
    print(value)

Then do this in TensorFlow 2.x:

for value in shape:
  print(value)
16
None
256

If you had this in TensorFlow 1.x (or used any other dimension method):

dim = shape[i]
dim.assert_is_compatible_with(other_dim)

Then do this in TensorFlow 2.x:

other_dim = 16
Dimension = tf.compat.v1.Dimension

if shape.rank is None:
  dim = Dimension(None)
else:
  dim = shape.dims[i]
dim.is_compatible_with(other_dim) # or any other dimension method
True
shape = tf.TensorShape(None)

if shape:
  dim = shape.dims[i]
  dim.is_compatible_with(other_dim) # or any other dimension method

The boolean value of a tf.TensorShape is True if the rank is known, False otherwise.

print(bool(tf.TensorShape([])))      # Scalar
print(bool(tf.TensorShape([0])))     # 0-length vector
print(bool(tf.TensorShape([1])))     # 1-length vector
print(bool(tf.TensorShape([None])))  # Unknown-length vector
print(bool(tf.TensorShape([1, 10, 100])))       # 3D tensor
print(bool(tf.TensorShape([None, None, None]))) # 3D tensor with no known dimensions
print()
print(bool(tf.TensorShape(None)))  # A tensor with unknown rank.
True
True
True
True
True
True

False

Other changes

  • Remove tf.colocate_with: TensorFlow's device placement algorithms have improved significantly, so this should no longer be necessary. If removing it causes a performance degradation, please file a bug.

  • Replace v1.ConfigProto usage with the equivalent functions from tf.config, as sketched below.
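
For example, a few common ConfigProto options map onto tf.config calls as in the following minimal sketch; the thread counts are arbitrary placeholders:

# TensorFlow 1.x (for reference):
#   config = tf.compat.v1.ConfigProto(allow_soft_placement=True,
#                                     inter_op_parallelism_threads=2,
#                                     intra_op_parallelism_threads=4)
#   sess = tf.compat.v1.Session(config=config)

# TensorFlow 2.x equivalents; call these before any other TensorFlow work.
tf.config.set_soft_device_placement(True)
tf.config.threading.set_inter_op_parallelism_threads(2)
tf.config.threading.set_intra_op_parallelism_threads(4)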

Conclusions

The overall process is:

  1. Run the upgrade script.
  2. Remove contrib symbols.
  3. Switch your models to an object-oriented style (Keras).
  4. Use tf.keras or tf.estimator training and evaluation loops where you can.
  5. Otherwise, use custom loops, but be sure to avoid sessions and collections.

It takes a little work to convert code to idiomatic TensorFlow 2.x, but every change results in:

  • Fewer lines of code.
  • Increased clarity and simplicity.
  • Easier debugging.