Migrate your TensorFlow 1 code to TensorFlow 2


This guide is for users of low-level TensorFlow APIs. If you are using the high-level APIs (tf.keras), you may need to take little or no action to make your code fully TensorFlow 2.x compatible.

You can still run unmodified 1.x code (except for contrib) in TensorFlow 2.x:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

However, this does not let you take advantage of many of the improvements made in TensorFlow 2.x. This guide will help you upgrade your code, making it simpler, more performant, and easier to maintain.

Automatic conversion script

The first step, before attempting to implement the changes described in this guide, is to try running the upgrade script.

This performs an initial pass at upgrading your code to TensorFlow 2.x, but it cannot make your code idiomatic to v2. Your code may still use tf.compat.v1 endpoints to access placeholders, sessions, collections, and other 1.x-style functionality.

Top-level behavioral changes

If your code works in TensorFlow 2.x using tf.compat.v1.disable_v2_behavior, there are still global behavioral changes you may need to address. The major changes are:

  • Eager execution, v1.enable_eager_execution(): Any code that implicitly uses a tf.Graph will fail. Be sure to wrap this code in a with tf.Graph().as_default() context.

  • Resource variables, v1.enable_resource_variables(): Some code may depend on non-deterministic behaviors enabled by TensorFlow reference variables. Resource variables are locked while being written to, so they provide more intuitive consistency guarantees.

    • This may change behavior in edge cases.
    • This may create extra copies and can increase memory usage.
    • This can be disabled by passing use_resource=False to the tf.Variable constructor.
  • Tensor shapes, v1.enable_v2_tensorshape(): TensorFlow 2.x simplifies the behavior of tensor shapes. Instead of t.shape[0].value you can write t.shape[0]. These changes should be small, and it makes sense to fix them right away. Refer to the TensorShape section for examples.

  • Control flow, v1.enable_control_flow_v2(): The TensorFlow 2.x control-flow implementation has been simplified, so it produces different graph representations. Please file bugs for any issues.
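The following is a minimal sketch of two of these changes, assuming TensorFlow 2.x with its default (v2) behavior. Graph-building code is wrapped explicitly in a tf.Graph context, and indexing a shape returns a plain Python int (or None) rather than a Dimension object. The variable names are illustrative only.

```python
import tensorflow as tf

# Eager execution is the default in TensorFlow 2.x.
eager_by_default = tf.executing_eagerly()

# Code that relied on the implicit default graph should build one explicitly.
g = tf.Graph()
with g.as_default():
    # Inside this context, ops are added to `g` instead of running eagerly.
    c = tf.constant([1.0, 2.0])
    eager_inside_graph = tf.executing_eagerly()

# With v2 TensorShape behavior, t.shape[0] is already a Python int,
# so the v1-style t.shape[0].value is no longer needed.
t = tf.zeros([2, 3])
first_dim = t.shape[0]

# Unknown dimensions come back as None.
unknown_dim = tf.TensorShape([None, 3])[0]
```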

Make the code TensorFlow 2.x-native

This guide walks through several examples of converting TensorFlow 1.x code to TensorFlow 2.x. These changes let your code take advantage of performance optimizations and simplified API calls.

In each case, the pattern is:

1. Replace v1.Session.run calls

Every v1.Session.run call should be replaced by a Python function.

  • The feed_dict and v1.placeholders become function arguments.
  • The fetches become the function's return value.
  • During conversion, eager execution allows easy debugging with standard Python tools like pdb.

After that, add a tf.function decorator to make it run efficiently in graph mode. Check out the Autograph guide for more on how this works.
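The pattern can be sketched with a toy computation. The function name compute and its inputs are illustrative assumptions and are not part of any API. The placeholders become arguments, the fetched tensor becomes the return value, and tf.function is applied afterwards.

```python
import tensorflow as tf

# TensorFlow 1.x version, for comparison only (not executed here):
#   x = v1.placeholder(tf.float32)
#   y = v1.placeholder(tf.float32)
#   total = x * 2.0 + y
#   result = sess.run(total, feed_dict={x: 3.0, y: 1.0})

# TensorFlow 2.x version: placeholders -> arguments, fetches -> return value.
def compute(x, y):
    return x * 2.0 + y

# Runs eagerly, so it can be inspected with print() or pdb.
eager_result = compute(tf.constant(3.0), tf.constant(1.0))

# Wrapping with tf.function traces the same Python code into a graph.
compute_graph = tf.function(compute)
graph_result = compute_graph(tf.constant(3.0), tf.constant(1.0))
```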

Note that:

  • Unlike v1.Session.run, a tf.function has a fixed return signature and always returns all outputs. If this causes performance problems, create two separate functions.

  • There is no need for tf.control_dependencies or similar operations: a tf.function behaves as if it were run in the order written. tf.Variable assignments and tf.asserts, for example, are executed automatically.
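Here is a small illustration of automatic control dependencies. The counter variable and the function name are hypothetical. Two assignments inside a tf.function run in the order they are written, and a later read observes both, without any tf.control_dependencies block.

```python
import tensorflow as tf

counter = tf.Variable(0)

@tf.function
def increment_twice():
    # No tf.control_dependencies needed: inside a tf.function,
    # stateful ops run in program order.
    counter.assign_add(1)
    counter.assign_add(1)
    # This read is ordered after both assignments.
    return counter.read_value()

result = increment_twice()
```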

The Converting models section contains a working example of this conversion process.

2. Use Python objects to track variables and losses

All name-based variable tracking is strongly discouraged in TensorFlow 2.x. Use Python objects to track variables.

Use tf.Variable instead of v1.get_variable.

Every v1.variable_scope should be converted to a Python object. Typically this will be a tf.keras.layers.Layer, a tf.keras.Model, or a tf.Module.

If you need to aggregate lists of variables (like tf.Graph.get_collection(tf.GraphKeys.VARIABLES)), use the .variables and .trainable_variables attributes of the Layer and Model objects.

These Layer and Model classes implement several other properties that remove the need for global collections. Their .losses property can replace the tf.GraphKeys.LOSSES collection.

Refer to the Keras guides for details.
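Here is a sketch of object-based tracking. It assumes a hypothetical Linear module standing in for a v1.variable_scope("linear") block. Variables are attributes of a Python object and are collected through .variables and .trainable_variables, while a Keras layer exposes its regularization losses through .losses. The layer sizes and the L2 factor are arbitrary.

```python
import tensorflow as tf


class Linear(tf.Module):
    """Hypothetical replacement for a v1.variable_scope("linear") block."""

    def __init__(self, in_features, out_features, name=None):
        super().__init__(name=name)
        # Variables are ordinary Python attributes; tf.Module tracks them.
        self.w = tf.Variable(tf.ones([in_features, out_features]), name="w")
        # A non-trainable variable still appears in .variables.
        self.b = tf.Variable(tf.zeros([out_features]), trainable=False, name="b")

    def __call__(self, x):
        return tf.matmul(x, self.w) + self.b


linear = Linear(2, 3)
y = linear(tf.ones([1, 2]))
n_all = len(linear.variables)                  # w and b
n_trainable = len(linear.trainable_variables)  # only w

# Keras layers expose regularization losses on .losses,
# replacing the tf.GraphKeys.LOSSES collection.
dense = tf.keras.layers.Dense(3, kernel_regularizer=tf.keras.regularizers.l2(0.01))
_ = dense(tf.ones([1, 2]))
n_losses = len(dense.losses)
```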

3. Upgrade your training loops

Use the highest-level API that works for your use case. Prefer tf.keras.Model.fit over building your own training loops.

These high-level functions manage many low-level details that are easy to miss if you write your own training loop. For example, they automatically collect the regularization losses and set the training=True argument when calling the model.
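To make those details concrete, here is a minimal custom training step written by hand. It assumes a small toy model and random data, and all names and sizes are illustrative. The step has to pass training=True and add model.losses explicitly, two things Model.fit does on its own.

```python
import tensorflow as tf

# A toy model with one regularized layer and one dropout layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(3),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()


@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        # Model.fit sets training=True for you; here it must be passed
        # explicitly so that Dropout is active.
        logits = model(x, training=True)
        # Model.fit also adds the regularization losses; here they are summed by hand.
        loss = loss_fn(y, logits) + tf.add_n(model.losses)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss


x = tf.random.normal([16, 4])
y = tf.random.uniform([16], maxval=3, dtype=tf.int32)
loss_value = train_step(x, y)
```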

4. Upgrade your data input pipelines

Use tf.data datasets for data input. These objects are efficient, expressive, and integrate well with TensorFlow.

They can be passed directly to the tf.keras.Model.fit method.

model.fit(dataset, epochs=5)

They can be iterated over directly in standard Python:

for example_batch, label_batch in dataset:
    break
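Here is a self-contained sketch of such a pipeline. It uses random in-memory tensors in place of real data, so the feature and label values are placeholders. from_tensor_slices creates one element per row, map applies per-example preprocessing, and batch groups elements. The last batch is smaller because drop_remainder defaults to False.

```python
import tensorflow as tf

features = tf.random.normal([10, 3])
labels = tf.range(10)

dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .map(lambda x, y: (x * 2.0, y))  # per-example preprocessing
           .shuffle(buffer_size=10)
           .batch(4))

# Plain Python iteration over the dataset.
batch_sizes = [int(label_batch.shape[0]) for _, label_batch in dataset]
```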

5. Migrate off compat.v1 symbols

The tf.compat.v1 module contains the complete TensorFlow 1.x API, with its original semantics.

The TensorFlow 2.x upgrade script converts symbols to their v2 equivalents only when such a conversion is safe. That is, it must be able to determine that the behavior of the TensorFlow 2.x version is exactly equivalent. For instance, it renames v1.arg_max to tf.argmax, since those are the same function.

After the upgrade script has processed a piece of code, there are likely to be many remaining references to compat.v1. It is worth going through the code and converting these manually to their v2 equivalents. If a v2 equivalent exists, it should be mentioned in the log.
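For example, the rename mentioned above can be checked directly. This assumes tf.compat.v1.arg_max is still present in your TensorFlow version, where it is kept only for compatibility. It returns the same indices as tf.argmax, so replacing it is safe.

```python
import tensorflow as tf

x = tf.constant([[1, 5, 2],
                 [7, 0, 3]])

# Legacy 1.x symbol, still reachable through tf.compat.v1.
old_indices = tf.compat.v1.arg_max(x, 1)

# TensorFlow 2.x equivalent that the upgrade script substitutes.
new_indices = tf.argmax(x, axis=1)
```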

Converting models

Low-level variables and operator execution

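Examples of low-level API use include using variable scopes to control reuse, creating variables with v1.get_variable, and accessing collections implicitly with functions like v1.losses.get_regularization_loss. Others are using v1.placeholder to set up graph inputs, executing graphs with Session.run, and initializing variables manually with v1.global_variables_initializer.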

Before converting

Here is what these patterns may look like in code using TensorFlow 1.x.

import tensorflow as tf
import tensorflow.compat.v1 as v1

import tensorflow_datasets as tfds
g = v1.Graph()

with g.as_default():
  in_a = v1.placeholder(dtype=v1.float32, shape=(2))
  in_b = v1.placeholder(dtype=v1.float32, shape=(2))

  def forward(x):
    with v1.variable_scope("matmul", reuse=v1.AUTO_REUSE):
      W = v1.get_variable("W", initializer=v1.ones(shape=(2,2)),
                          regularizer=lambda x:tf.reduce_mean(x**2))
      b = v1.get_variable("b", initializer=v1.zeros(shape=(2)))
      return W * x + b

  out_a = forward(in_a)
  out_b = forward(in_b)
  reg_loss=v1.losses.get_regularization_loss(scope="matmul")

with v1.Session(graph=g) as sess:
  sess.run(v1.global_variables_initializer())
  outs = sess.run([out_a, out_b, reg_loss],
                feed_dict={in_a: [1, 0], in_b: [0, 1]})

print(outs[0])
print()
print(outs[1])
print()
print(outs[2])
[[1. 0.]
 [1. 0.]]

[[0. 1.]
 [0. 1.]]

1.0

After converting

In the converted code:

  • The variables are local Python objects.
  • The forward function still defines the calculation.
  • The Session.run call is replaced with a call to forward.
  • The optional tf.function decorator can be added for performance.
  • The regularizations are calculated manually, without referring to any global collection.
  • There are no sessions or placeholders.
W = tf.Variable(tf.ones(shape=(2,2)), name="W")
b = tf.Variable(tf.zeros(shape=(2)), name="b")

@tf.function
def forward(x):
  return W * x + b

out_a = forward([1,0])
print(out_a)
tf.Tensor(
[[1. 0.]
 [1. 0.]], shape=(2, 2), dtype=float32)
out_b = forward([0,1])

regularizer = tf.keras.regularizers.l2(0.04)
reg_loss=regularizer(W)

Models based on tf.layers

The v1.layers module contained layer functions that relied on v1.variable_scope to define and reuse variables.

Before converting

def model(x, training, scope='model'):
  with v1.variable_scope(scope, reuse=v1.AUTO_REUSE):
    x = v1.layers.conv2d(x, 32, 3, activation=v1.nn.relu,
          kernel_regularizer=lambda x:0.004*tf.reduce_mean(x**2))
    x = v1.layers.max_pooling2d(x, (2, 2), 1)
    x = v1.layers.flatten(x)
    x = v1.layers.dropout(x, 0.1, training=training)
    x = v1.layers.dense(x, 64, activation=v1.nn.relu)
    x = v1.layers.batch_normalization(x, training=training)
    x = v1.layers.dense(x, 10)
    return x
train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

print(train_out)
print()
print(test_out)
tf.Tensor([[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]], shape=(1, 10), dtype=float32)

tf.Tensor(
[[ 0.04853132 -0.08974641 -0.32679698  0.07017353  0.12982666 -0.2153313
  -0.09793851  0.10957378  0.01823931  0.00898573]], shape=(1, 10), dtype=float32)

After converting

Most arguments stayed the same, but note the differences:

  • The model passes the training argument to each layer when it runs.
  • The first argument to the original model function (the input x) is gone. This is because object layers separate building the model from calling the model.

Also note that:

  • If you were using regularizers or initializers from tf.contrib, these have more argument changes than others.
  • The code no longer writes to collections, so functions like v1.losses.get_regularization_loss will no longer return these values, potentially breaking your training loops.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.04),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
])

train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))
train_out = model(train_data, training=True)
print(train_out)
tf.Tensor([[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]], shape=(1, 10), dtype=float32)
test_out = model(test_data, training=False)
print(test_out)
tf.Tensor(
[[-0.06252427  0.30122417 -0.18610534 -0.04890637 -0.01496555  0.41607457
   0.24905115  0.014429   -0.12719882 -0.22354674]], shape=(1, 10), dtype=float32)
# Here are all the trainable variables
len(model.trainable_variables)
8
# Here is the regularization loss
model.losses
[<tf.Tensor: shape=(), dtype=float32, numpy=0.07443664>]

Mixed variables & v1.layers

Existing code often mixes lower-level TensorFlow 1.x variables and operations with higher-level v1.layers.

Before converting

def model(x, training, scope='model'):
  with v1.variable_scope(scope, reuse=v1.AUTO_REUSE):
    W = v1.get_variable(
      "W", dtype=v1.float32,
      initializer=v1.ones(shape=x.shape),
      regularizer=lambda x:0.004*tf.reduce_mean(x**2),
      trainable=True)
    if training:
      x = x + W
    else:
      x = x + W * 0.5
    x = v1.layers.conv2d(x, 32, 3, activation=tf.nn.relu)
    x = v1.layers.max_pooling2d(x, (2, 2), 1)
    x = v1.layers.flatten(x)
    return x

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

After converting

To convert this code, follow the pattern of mapping layers to layers, as in the previous example.

The general pattern is:

  • Collect layer parameters in __init__.
  • Build the variables in build.
  • Execute the calculations in call, and return the result.

The v1.variable_scope is essentially a layer of its own, so rewrite it as a tf.keras.layers.Layer. Check out the Making new Layers and Models via subclassing guide for details.

# Create a custom layer for part of the model
class CustomLayer(tf.keras.layers.Layer):
  def __init__(self, *args, **kwargs):
    super(CustomLayer, self).__init__(*args, **kwargs)

  def build(self, input_shape):
    self.w = self.add_weight(
        shape=input_shape[1:],
        dtype=tf.float32,
        initializer=tf.keras.initializers.ones(),
        regularizer=tf.keras.regularizers.l2(0.02),
        trainable=True)

  # Call method will sometimes get used in graph mode,
  # training will get turned into a tensor
  @tf.function
  def call(self, inputs, training=None):
    if training:
      return inputs + self.w
    else:
      return inputs + self.w * 0.5
custom_layer = CustomLayer()
print(custom_layer([1]).numpy())
print(custom_layer([1], training=True).numpy())
[1.5]
[2.]
train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))

# Build the model including the custom layer
model = tf.keras.Sequential([
    CustomLayer(input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
])

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

A few things to note:

  • Subclassed Keras models and layers need to run both in v1 graphs (no automatic control dependencies) and in eager mode:

    • Wrap the call in a tf.function to get autograph and automatic control dependencies.
  • Don't forget to accept a training argument to call:

    • Sometimes it is a tf.Tensor
    • Sometimes it is a Python boolean
  • Create model variables in the constructor or in Model.build using self.add_weight:

    • In Model.build you have access to the input shape, so you can create weights with a matching shape
    • Using tf.keras.layers.Layer.add_weight allows Keras to track variables and regularization losses
  • Don't keep tf.Tensors in your objects:

    • They might get created either in a tf.function or in the eager context, and these tensors behave differently
    • Use tf.Variables for state; they are always usable from both contexts
    • tf.Tensors are only for intermediate values
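The last group of points can be sketched with a hypothetical RunningSum layer that accumulates the sum of its inputs across calls. The state is a non-trainable variable created with add_weight, so the same layer works when called eagerly and when called inside a tf.function. The per-call sum is only an intermediate tensor and is never stored on the object. The layer name and the accumulation logic are illustrative assumptions.

```python
import tensorflow as tf


class RunningSum(tf.keras.layers.Layer):
    """Hypothetical layer that accumulates the sum of its inputs across calls."""

    def build(self, input_shape):
        # State lives in a non-trainable variable, not in a tf.Tensor attribute.
        self.total = self.add_weight(
            name="total", shape=(), initializer="zeros", trainable=False)

    def call(self, inputs):
        # `batch_sum` is an intermediate tensor; it is not stored on the object.
        batch_sum = tf.reduce_sum(inputs)
        self.total.assign_add(batch_sum)
        return inputs


layer = RunningSum()
layer(tf.ones([3]))  # eager call: builds the layer, total becomes 3.0


@tf.function
def traced_call(x):
    return layer(x)


traced_call(tf.ones([2]))  # call inside a tf.function: total becomes 5.0
```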

A note on Slim and contrib.layers

A large amount of older TensorFlow 1.x code uses the Slim library, which was packaged with TensorFlow 1.x as tf.contrib.layers. As a contrib module, it is no longer available in TensorFlow 2.x, not even in tf.compat.v1. Converting code that uses Slim to TensorFlow 2.x is more involved than converting repositories that use v1.layers. In fact, it may make sense to convert your Slim code to v1.layers first, then convert to Keras.

  • Remove arg_scopes; all args need to be explicit.
  • If you use them, split normalizer_fn and activation_fn into their own layers.
  • Separable conv layers map to one or more different Keras layers (depthwise, pointwise, and separable Keras layers).
  • Slim and v1.layers have different argument names and default values.
  • Some args have different scales.
  • If you use Slim pre-trained models, try Keras's pre-trained models from tf.keras.applications, or TF Hub's TensorFlow 2.x SavedModels exported from the original Slim code.
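Here is a sketch of the first two points for a single layer. It assumes a Slim conv2d call with a batch-norm normalizer_fn and a ReLU activation_fn; the Slim call appears only as a comment, since tf.contrib no longer exists. Slim omits the bias when a normalizer_fn is given, hence use_bias=False here. Default hyperparameters differ between Slim and Keras, such as the batch-norm momentum and whether a scale parameter is learned. Check them against your original code.

```python
import tensorflow as tf

# Slim-style call (TensorFlow 1.x only, shown for comparison):
#   with slim.arg_scope([slim.conv2d], padding="SAME"):
#       net = slim.conv2d(images, 32, [3, 3],
#                         normalizer_fn=slim.batch_norm,
#                         activation_fn=tf.nn.relu)

# Keras sketch: arguments from arg_scope are written out explicitly,
# and normalizer_fn / activation_fn become separate layers.
block = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, padding="same", use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
])
out = block(tf.ones([1, 28, 28, 1]), training=False)
```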

Some tf.contrib layers might not have been moved to core TensorFlow, but have instead been moved to the TensorFlow Addons package.

Training

There are many ways to feed data to a tf.keras model. Keras models accept Python generators and NumPy arrays as input.

The recommended way to feed data to a model is the tf.data package, which contains a collection of high-performance classes for manipulating data.
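As a sketch, both NumPy arrays and a Python generator can be turned into tf.data datasets. Random arrays stand in for real images and labels, and the generator name gen is hypothetical. For from_generator, output_signature declares the shape and dtype of each element.

```python
import numpy as np
import tensorflow as tf

images = np.random.rand(6, 28, 28, 1).astype("float32")
labels = np.arange(6, dtype="int64")

# From in-memory NumPy arrays.
ds_from_arrays = tf.data.Dataset.from_tensor_slices((images, labels)).batch(3)


# From a Python generator; output_signature describes each yielded element.
def gen():
    for img, lbl in zip(images, labels):
        yield img, lbl


ds_from_gen = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(28, 28, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
).batch(3)

n_batches_arrays = sum(1 for _ in ds_from_arrays)
n_batches_gen = sum(1 for _ in ds_from_gen)
```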

If you are still using tf.queue, queues are now supported only as data structures, not as input pipelines.

Using TensorFlow Datasets

The TensorFlow Datasets package (tfds) contains utilities for loading predefined datasets as tf.data.Dataset objects.

For this example, you can load the MNIST dataset using tfds:

datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']

Then prepare the data for training:

  • Rescale each image.
  • Shuffle the order of the examples.
  • Collect batches of images and labels.
BUFFER_SIZE = 10 # Use a much larger value for real code
BATCH_SIZE = 64
NUM_EPOCHS = 5


def scale(image, label):
  image = tf.cast(image, tf.float32)
  image /= 255

  return image, label

To keep the example short, trim the dataset to return only 5 batches:

train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
test_data = mnist_test.map(scale).batch(BATCH_SIZE)

STEPS_PER_EPOCH = 5

train_data = train_data.take(STEPS_PER_EPOCH)
test_data = test_data.take(STEPS_PER_EPOCH)
image_batch, label_batch = next(iter(train_data))

Use Keras training loops

If you don't need low-level control of your training process, using Keras's built-in fit, evaluate, and predict methods is recommended. These methods provide a uniform interface to train the model regardless of the implementation (sequential, functional, or subclassed).

The advantages of these methods include:

  • They accept NumPy arrays, Python generators, and tf.data.Datasets.
  • They apply regularization and activation losses automatically.
  • They support tf.distribute for multi-device training.
  • They support arbitrary callables as losses and metrics.
  • They support callbacks like tf.keras.callbacks.TensorBoard, as well as custom callbacks.
  • They are performant, automatically using TensorFlow graphs.
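Two of the listed points, arbitrary callables as losses and custom callbacks, can be sketched on random data. LossHistory and mean_abs_error are hypothetical names, and the data exists only to make the example run.

```python
import numpy as np
import tensorflow as tf


class LossHistory(tf.keras.callbacks.Callback):
    """Hypothetical custom callback that records the training loss per epoch."""

    def on_train_begin(self, logs=None):
        self.epoch_losses = []

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_losses.append(float(logs["loss"]))


def mean_abs_error(y_true, y_pred):
    """An arbitrary callable used as the loss."""
    return tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)


x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss=mean_abs_error)

history_cb = LossHistory()
model.fit(x, y, epochs=3, batch_size=8, callbacks=[history_cb], verbose=0)
```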

Here is an example of training a model using a Dataset. (For details on how this works, check out the tutorials section.)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
])

# Model is the full model w/o custom layers
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_data, epochs=NUM_EPOCHS)
loss, acc = model.evaluate(test_data)

print("Loss {}, Accuracy {}".format(loss, acc))
Epoch 1/5
5/5 [==============================] - 2s 8ms/step - loss: 1.5874 - accuracy: 0.4719
Epoch 2/5
5/5 [==============================] - 0s 5ms/step - loss: 0.4435 - accuracy: 0.9094
Epoch 3/5
5/5 [==============================] - 0s 6ms/step - loss: 0.2764 - accuracy: 0.9594
Epoch 4/5
5/5 [==============================] - 0s 5ms/step - loss: 0.1889 - accuracy: 0.9844
Epoch 5/5
5/5 [==============================] - 1s 6ms/step - loss: 0.1504 - accuracy: 0.9906
5/5 [==============================] - 1s 3ms/step - loss: 1.6299 - accuracy: 0.7031
Loss 1.6299388408660889, Accuracy 0.703125

Write your own loop

If the Keras model's training step works for you, but you need more control outside that step, consider using the tf.keras.Model.train_on_batch method in your own data-iteration loop.

Remember: many things can be implemented as a tf.keras.callbacks.Callback.
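The following is a minimal sketch of the callback approach. The hypothetical EpochLossLogger records the loss that Keras reports at the end of each epoch. The small model, the random data, and all names here are illustrative assumptions and are not part of the original example.

```python
import numpy as np
import tensorflow as tf

class EpochLossLogger(tf.keras.callbacks.Callback):
  """Hypothetical callback: stores the training loss reported at each epoch end."""

  def __init__(self):
    super().__init__()
    self.epoch_losses = []

  def on_epoch_end(self, epoch, logs=None):
    # `logs` holds the metric values Keras computed for this epoch.
    logs = logs or {}
    self.epoch_losses.append(float(logs["loss"]))

# Small random dataset, used only to exercise the callback.
features = np.random.rand(32, 4).astype("float32")
targets = np.random.randint(0, 2, size=(32,))

cb_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2),
])
cb_model.compile(optimizer="adam",
                 loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

loss_logger = EpochLossLogger()
cb_model.fit(features, targets, epochs=3, batch_size=8, verbose=0,
             callbacks=[loss_logger])
print(loss_logger.epoch_losses)  # one float per epoch
```

The same hook mechanism (on_epoch_end, on_train_batch_end, and so on) often covers custom logging or early stopping without writing a loop at all.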

Using train_on_batch this way keeps many of the benefits of the methods described in the previous section, while giving you control of the outer loop.

You can also use tf.keras.Model.test_on_batch or tf.keras.Model.evaluate to check performance during training.

To continue training the model above:

# Model is the full model w/o custom layers
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

for epoch in range(NUM_EPOCHS):
  # Reset the metric accumulators
  model.reset_metrics()

  for image_batch, label_batch in train_data:
    result = model.train_on_batch(image_batch, label_batch)
    metrics_names = model.metrics_names
    print("train: ",
          "{}: {:.3f}".format(metrics_names[0], result[0]),
          "{}: {:.3f}".format(metrics_names[1], result[1]))
  for image_batch, label_batch in test_data:
    result = model.test_on_batch(image_batch, label_batch,
                                 # Return accumulated metrics
                                 reset_metrics=False)
  metrics_names = model.metrics_names
  print("\neval: ",
        "{}: {:.3f}".format(metrics_names[0], result[0]),
        "{}: {:.3f}".format(metrics_names[1], result[1]))
train:  loss: 0.131 accuracy: 1.000
train:  loss: 0.179 accuracy: 0.969
train:  loss: 0.117 accuracy: 0.984
train:  loss: 0.187 accuracy: 0.969
train:  loss: 0.168 accuracy: 0.969
eval:  loss: 1.655 accuracy: 0.703
train:  loss: 0.083 accuracy: 1.000
train:  loss: 0.080 accuracy: 1.000
train:  loss: 0.099 accuracy: 0.984
train:  loss: 0.088 accuracy: 1.000
train:  loss: 0.084 accuracy: 1.000
eval:  loss: 1.645 accuracy: 0.759
train:  loss: 0.066 accuracy: 1.000
train:  loss: 0.070 accuracy: 1.000
train:  loss: 0.062 accuracy: 1.000
train:  loss: 0.067 accuracy: 1.000
train:  loss: 0.061 accuracy: 1.000
eval:  loss: 1.609 accuracy: 0.819
train:  loss: 0.056 accuracy: 1.000
train:  loss: 0.053 accuracy: 1.000
train:  loss: 0.048 accuracy: 1.000
train:  loss: 0.057 accuracy: 1.000
train:  loss: 0.069 accuracy: 0.984
eval:  loss: 1.568 accuracy: 0.825
train:  loss: 0.048 accuracy: 1.000
train:  loss: 0.048 accuracy: 1.000
train:  loss: 0.044 accuracy: 1.000
train:  loss: 0.045 accuracy: 1.000
train:  loss: 0.045 accuracy: 1.000
eval:  loss: 1.531 accuracy: 0.841

Customize the training step

If you need more flexibility and control, you can get it by implementing your own training loop. There are three steps:

  1. Iterate over a Python generator or a tf.data.Dataset to get batches of examples.
  2. Use tf.GradientTape to collect gradients.
  3. Use one of the tf.keras.optimizers to apply weight updates to the model's variables.

Remember:

  • Always include a training argument on the call method of subclassed layers and models.
  • Make sure to call the model with the training argument set correctly.
  • Depending on usage, model variables may not exist until the model is run on a batch of data.
  • You have to manually handle things like regularization losses for the model.
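The points about the training argument and lazily created variables can be illustrated with a small subclassed layer. NoisyDense is a hypothetical layer used only for this sketch. It adds Gaussian noise when training=True and is deterministic otherwise. Its weights exist only after the first call.

```python
import tensorflow as tf

class NoisyDense(tf.keras.layers.Layer):
  """Hypothetical layer: adds Gaussian noise to its output only when training=True."""

  def __init__(self, units, stddev=0.1, **kwargs):
    super().__init__(**kwargs)
    self.dense = tf.keras.layers.Dense(units)
    self.stddev = stddev

  def build(self, input_shape):
    # Weights are created here, once the input shape is known.
    self.dense.build(input_shape)

  def call(self, inputs, training=False):
    outputs = self.dense(inputs)
    if training:
      # Training-only behavior, similar in spirit to Dropout.
      outputs = outputs + tf.random.normal(tf.shape(outputs), stddev=self.stddev)
    return outputs

layer = NoisyDense(3)
n_vars_before = len(layer.trainable_variables)  # 0: nothing has been built yet

x = tf.ones([2, 4])
infer_a = layer(x, training=False)
infer_b = layer(x, training=False)
train_out = layer(x, training=True)
n_vars_after = len(layer.trainable_variables)   # 2: kernel and bias

print(n_vars_before, n_vars_after)
```

Calling the layer without training=True gives repeatable outputs. Forgetting to pass training=True in a custom loop would therefore silently disable the noise.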

Note the simplifications relative to v1:

  • There is no need to run variable initializers. Variables are initialized on creation.
  • There is no need to add manual control dependencies. Even inside tf.function, operations behave as they do in eager mode.
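Both simplifications can be shown in a short sketch that uses a standalone tf.Variable named counter, a hypothetical name. The variable can be used right after creation. Inside tf.function, the read sees the preceding assignment without any explicit control dependency.

```python
import tensorflow as tf

# The variable holds its initial value immediately; no initializer op to run.
counter = tf.Variable(1.0)
print(counter.numpy())  # 1.0

@tf.function
def double_then_read():
  # Stateful ops run in program order, as in eager mode, so the read
  # observes the assignment without tf.control_dependencies.
  counter.assign(counter * 2.0)
  return counter.read_value()

print(double_then_read().numpy())  # 2.0
print(double_then_read().numpy())  # 4.0
```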
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
])

optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(inputs, labels):
  with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    regularization_loss=tf.math.add_n(model.losses)
    pred_loss=loss_fn(labels, predictions)
    total_loss=pred_loss + regularization_loss

  gradients = tape.gradient(total_loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

for epoch in range(NUM_EPOCHS):
  for inputs, labels in train_data:
    train_step(inputs, labels)
  print("Finished epoch", epoch)
Finished epoch 0
Finished epoch 1
Finished epoch 2
Finished epoch 3
Finished epoch 4

New-style metrics and losses

In TensorFlow 2.x, metrics and losses are objects. They work both eagerly and inside tf.function.

A loss object is callable and expects (y_true, y_pred) as arguments:

cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
cce([[1, 0]], [[-1.0,3.0]]).numpy()
4.01815
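As an extra check, and not part of the original guide, the same value can be obtained in three more ways. SparseCategoricalCrossentropy gives it with an integer label. Computing the negative log-softmax of the true class's logit by hand gives it too, and so does calling the loss object inside a tf.function.

```python
import tensorflow as tf

# Same example as above, but with an integer class label instead of a one-hot vector.
scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
sparse_loss = scce([0], [[-1.0, 3.0]]).numpy()

# Manual check: negative log-softmax of the logit belonging to the true class (index 0).
manual_loss = -tf.nn.log_softmax([[-1.0, 3.0]])[0, 0].numpy()

@tf.function
def loss_in_graph(y_true, y_pred):
  # Loss objects can be called inside tf.function as well.
  return scce(y_true, y_pred)

graph_loss = loss_in_graph(tf.constant([0]), tf.constant([[-1.0, 3.0]])).numpy()
print(sparse_loss, manual_loss, graph_loss)
```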

A metric object has the following methods:

  • Metric.update_state(): add new observations.
  • Metric.result(): get the current result of the metric, given the observed values.
  • Metric.reset_states(): clear all observations.

The object itself is callable. Calling it updates the state with new observations, as update_state does, and returns the new result of the metric.

You don't need to initialize a metric's variables manually. Because TensorFlow 2.x has automatic control dependencies, you don't need to worry about those either.
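Here is a minimal sketch of these methods using tf.keras.metrics.Mean. It assumes a recent TensorFlow release. In those releases the reset method is called reset_state, and reset_states is the older name used in this guide.

```python
import tensorflow as tf

mean_metric = tf.keras.metrics.Mean()

# Add two observations: mean of [1, 2].
mean_metric.update_state([1.0, 2.0])
after_update = mean_metric.result().numpy()   # 1.5

# Calling the object updates the state and returns the new result: mean of [1, 2, 3].
after_call = mean_metric(3.0).numpy()         # 2.0

# Clear all observations; an empty Mean reports 0.0.
mean_metric.reset_state()
after_reset = mean_metric.result().numpy()    # 0.0

print(after_update, after_call, after_reset)
```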

The following code uses a metric to keep track of the mean loss observed within a custom training loop.

# Create the metrics
loss_metric = tf.keras.metrics.Mean(name='train_loss')
accuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

@tf.function
def train_step(inputs, labels):
  with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    regularization_loss=tf.math.add_n(model.losses)
    pred_loss=loss_fn(labels, predictions)
    total_loss=pred_loss + regularization_loss

  gradients = tape.gradient(total_loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
  # Update the metrics
  loss_metric.update_state(total_loss)
  accuracy_metric.update_state(labels, predictions)


for epoch in range(NUM_EPOCHS):
  # Reset the metrics
  loss_metric.reset_states()
  accuracy_metric.reset_states()

  for inputs, labels in train_data:
    train_step(inputs, labels)
  # Get the metric results
  mean_loss=loss_metric.result()
  mean_accuracy = accuracy_metric.result()

  print('Epoch: ', epoch)
  print('  loss:     {:.3f}'.format(mean_loss))
  print('  accuracy: {:.3f}'.format(mean_accuracy))
Epoch:  0
  loss:     0.172
  accuracy: 0.988
Epoch:  1
  loss:     0.143
  accuracy: 0.997
Epoch:  2
  loss:     0.126
  accuracy: 0.997
Epoch:  3
  loss:     0.109
  accuracy: 1.000
Epoch:  4
  loss:     0.092
  accuracy: 1.000

Keras metric names

In TensorFlow 2.x, Keras models are more consistent about handling metric names.

Now, when you pass a string in the list of metrics, that exact string is used as the metric's name. These names are visible in the history object returned by model.fit and in the logs passed to keras.callbacks.

model.compile(
    optimizer = tf.keras.optimizers.Adam(0.001),
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics = ['acc', 'accuracy', tf.keras.metrics.SparseCategoricalAccuracy(name="my_accuracy")])
history = model.fit(train_data)
5/5 [==============================] - 1s 6ms/step - loss: 0.1042 - acc: 0.9969 - accuracy: 0.9969 - my_accuracy: 0.9969
history.history.keys()
dict_keys(['loss', 'acc', 'accuracy', 'my_accuracy'])

This differs from previous versions, where passing metrics=["accuracy"] would result in dict_keys(['loss', 'acc']).

Keras optimizers

The optimizers in v1.train, such as v1.train.AdamOptimizer and v1.train.GradientDescentOptimizer, have equivalents in tf.keras.optimizers.
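The sketch below shows a few direct mappings from v1.train to tf.keras.optimizers. It then takes one SGD step on the toy loss w**2 to show a Keras optimizer consuming gradients from tf.GradientTape. The variable, the loss, and the hyperparameter values are hypothetical.

```python
import tensorflow as tf

# TF1: tf.compat.v1.train.GradientDescentOptimizer(learning_rate=0.1)
sgd = tf.keras.optimizers.SGD(learning_rate=0.1)

# TF1: tf.compat.v1.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9)
momentum_sgd = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)

# TF1: tf.compat.v1.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999)
# The Keras arguments are spelled beta_1 and beta_2.
adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)

# One plain SGD step on toy_loss = w**2, starting at w = 1.0.
w = tf.Variable(1.0)
with tf.GradientTape() as tape:
  toy_loss = w * w
grads = tape.gradient(toy_loss, [w])   # d(w**2)/dw = 2.0 at w = 1.0
sgd.apply_gradients(zip(grads, [w]))
print(w.numpy())                       # 1.0 - 0.1 * 2.0 = 0.8
```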

Convert v1.train to keras.optimizers

Here are some things to keep in mind when converting your optimizers:

New defaults for some tf.keras.optimizers

There are no changes for optimizers.SGD, optimizers.Adam, or optimizers.RMSprop.

The following default learning rates have changed:

TensorBoard

TensorFlow 2.x includes significant changes to the tf.summary API used to write summary data for visualization in TensorBoard. For a general introduction to the new tf.summary, several tutorials use the TensorFlow 2.x API, including a TensorBoard TensorFlow 2.x migration guide.
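Here is a minimal sketch of writing scalar summaries with the TF 2.x tf.summary API. It assumes a throwaway temporary log directory. Pointing TensorBoard at that directory would display the logged values.

```python
import os
import tempfile
import tensorflow as tf

logdir = tempfile.mkdtemp()  # throwaway log directory for this example
writer = tf.summary.create_file_writer(logdir)

with writer.as_default():
  for step in range(3):
    # In TF 2.x the step is passed explicitly
    # (or set globally with tf.summary.experimental.set_step).
    tf.summary.scalar("loss", 1.0 / (step + 1), step=step)
writer.flush()

print(os.listdir(logdir))  # contains an events.out.tfevents.* file
```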

Saving and loading

Checkpoint compatibility

TensorFlow 2.x uses object-based checkpoints.
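Here is a minimal sketch of an object-based checkpoint, assuming a temporary directory. Variables are matched by their attribute path in the object graph (w here), not by their name argument. Restoring into a variable with a different name therefore still works.

```python
import os
import tempfile
import tensorflow as tf

ckpt_dir = tempfile.mkdtemp()

# The variable is tracked as the attribute `w`; its `name` is irrelevant for matching.
source = tf.train.Checkpoint(w=tf.Variable([1.0, 2.0], name="anything"))
save_path = source.save(os.path.join(ckpt_dir, "ckpt"))

# Stored keys follow the object graph, e.g. "w/.ATTRIBUTES/VARIABLE_VALUE".
stored_keys = [name for name, shape in tf.train.list_variables(save_path)]
print(stored_keys)

# Restore into a differently named variable attached under the same attribute.
target_var = tf.Variable([0.0, 0.0], name="something_else")
tf.train.Checkpoint(w=target_var).restore(save_path)
print(target_var.numpy())  # [1. 2.]
```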

Old-style name-based checkpoints can still be loaded, if you're careful. The code conversion process may change variable names, but there are workarounds.

The simplest approach is to line up the names of the new model with the names in the checkpoint:

  • Variables still all have a name argument you can set.
  • Keras models also take a name argument, which they set as the prefix for their variables.
  • The v1.name_scope function can be used to set variable name prefixes. This is very different from tf.variable_scope: it only affects names, and doesn't track variables or reuse.

If that does not work for your use case, try the v1.train.init_from_checkpoint function. It takes an assignment_map argument, which specifies the mapping from old names to new names.
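The sketch below uses v1.train.init_from_checkpoint with an assignment_map. It assumes a TF1-style checkpoint that stores a variable named old_scope/w, while the rewritten code calls it new_scope/w. Both names are hypothetical, and the first graph exists only to produce such a checkpoint.

```python
import os
import tempfile
import tensorflow as tf

ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# 1. Write a TF1-style, name-based checkpoint containing "old_scope/w".
with tf.Graph().as_default():
  with tf.compat.v1.variable_scope("old_scope"):
    tf.compat.v1.get_variable("w", initializer=tf.constant([1.0, 2.0]))
  saver = tf.compat.v1.train.Saver()
  with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    saver.save(sess, ckpt_prefix)

# 2. In the new code the same weights live in "new_scope/w".
#    assignment_map keys are checkpoint names; values are current variable names.
with tf.Graph().as_default():
  with tf.compat.v1.variable_scope("new_scope"):
    w_new = tf.compat.v1.get_variable(
        "w", shape=[2], initializer=tf.compat.v1.zeros_initializer())
  tf.compat.v1.train.init_from_checkpoint(ckpt_prefix, {"old_scope/w": "new_scope/w"})
  with tf.compat.v1.Session() as sess:
    # init_from_checkpoint replaced w_new's initializer, so this loads the saved values.
    sess.run(tf.compat.v1.global_variables_initializer())
    restored = sess.run(w_new)

print(restored)  # [1. 2.]
```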

The TensorFlow Estimator repository includes a conversion tool that upgrades the checkpoints of premade Estimators from TensorFlow 1.x to 2.0. It may serve as an example of how to build a tool for a similar use case.

SavedModel compatibility

There are no significant compatibility concerns for SavedModels.

  • TensorFlow 1.x SavedModels work in TensorFlow 2.x.
  • TensorFlow 2.x SavedModels work in TensorFlow 1.x if all the ops are supported.
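Here is a sketch of the TF 2.x side only, using a hypothetical tf.Module named AddOne. It exports a SavedModel with tf.saved_model.save and loads it back with tf.saved_model.load. Loading the export in TensorFlow 1.x is not shown.

```python
import tempfile
import tensorflow as tf

class AddOne(tf.Module):
  """Hypothetical minimal module exported as a SavedModel."""

  @tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.float32)])
  def __call__(self, x):
    return x + 1.0

export_dir = tempfile.mkdtemp()
tf.saved_model.save(AddOne(), export_dir)

loaded_module = tf.saved_model.load(export_dir)
print(loaded_module(tf.constant(2.0)).numpy())  # 3.0
```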

A Graph.pb or Graph.pbtxt

There is no straightforward way to upgrade a raw Graph.pb file to TensorFlow 2.x. Your best bet is to upgrade the code that generated the file.

However, if you have a "frozen graph" (a tf.Graph where the variables have been turned into constants), you can convert it to a concrete_function using v1.wrap_function:

def wrap_frozen_graph(graph_def, inputs, outputs):
  # Import the GraphDef into the graph of a v1-style wrapped function.
  def _imports_graph_def():
    tf.compat.v1.import_graph_def(graph_def, name="")
  wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
  import_graph = wrapped_import.graph
  # Prune to a concrete function that maps the named input tensors
  # to the named output tensors.
  return wrapped_import.prune(
      tf.nest.map_structure(import_graph.as_graph_element, inputs),
      tf.nest.map_structure(import_graph.as_graph_element, outputs))

For example, here is a frozen graph for Inception v1, from 2016:

path = tf.keras.utils.get_file(
    'inception_v1_2016_08_28_frozen.pb',
    'http://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz',
    untar=True)
Downloading data from http://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz
24698880/24695710 [==============================] - 1s 0us/step

Load the tf.GraphDef:

graph_def = tf.compat.v1.GraphDef()
with open(path, 'rb') as f:
  graph_def.ParseFromString(f.read())

Wrap it into a concrete_function:

inception_func = wrap_frozen_graph(
    graph_def, inputs='input:0',
    outputs='InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu:0')

Pass it a tensor as input:

input_img = tf.ones([1,224,224,3], dtype=tf.float32)
inception_func(input_img).shape
TensorShape([1, 28, 28, 96])
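To try the same wrapping without downloading Inception, you can build a tiny TF1-style graph with no variables, which is therefore already frozen, and wrap it. The helper is repeated so the block runs on its own. The tensor names x:0 and y:0 are assumptions of this sketch.

```python
import tensorflow as tf

def wrap_frozen_graph(graph_def, inputs, outputs):
  # Same helper as above, repeated so this block is self-contained.
  def _imports_graph_def():
    tf.compat.v1.import_graph_def(graph_def, name="")
  wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
  import_graph = wrapped_import.graph
  return wrapped_import.prune(
      tf.nest.map_structure(import_graph.as_graph_element, inputs),
      tf.nest.map_structure(import_graph.as_graph_element, outputs))

# Build a tiny graph in TF1 style: y = 2 * x, with no variables.
with tf.Graph().as_default() as g:
  x = tf.compat.v1.placeholder(tf.float32, shape=[None], name="x")
  tf.multiply(x, 2.0, name="y")
tiny_graph_def = g.as_graph_def()

double_fn = wrap_frozen_graph(tiny_graph_def, inputs="x:0", outputs="y:0")
print(double_fn(tf.constant([1.0, 2.0])).numpy())  # [2. 4.]
```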

Estimators

Training with Estimators

Estimators are supported in TensorFlow 2.x.

When you use Estimators, you can use input_fn, tf.estimator.TrainSpec, and tf.estimator.EvalSpec from TensorFlow 1.x.

Here is an example that uses input_fn with train and evaluate specs.

Creating the input_fn and the train/eval specs

# Define the estimator's input_fn
def input_fn():
  datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
  mnist_train, mnist_test = datasets['train'], datasets['test']

  BUFFER_SIZE = 10000
  BATCH_SIZE = 64

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255

    return image, label[..., tf.newaxis]

  train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
  return train_data.repeat()

# Define train and eval specs
train_spec = tf.estimator.TrainSpec(input_fn=input_fn,
                                    max_steps=STEPS_PER_EPOCH * NUM_EPOCHS)
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn,
                                  steps=STEPS_PER_EPOCH)

Using a Keras model definition

There are some differences in how you construct your Estimators in TensorFlow 2.x.

It is recommended that you define your model using Keras, then use the tf.keras.estimator.model_to_estimator utility to turn the model into an Estimator. The code below shows how to use this utility when creating and training an Estimator.

def make_model():
  return tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
  ])
model = make_model()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

estimator = tf.keras.estimator.model_to_estimator(
  keras_model = model
)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpbhtumut0
INFO:tensorflow:Using the Keras model provided.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/keras/layers/normalization.py:534: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/keras/backend.py:435: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpbhtumut0', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
2021-07-19 23:37:36.453946: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:36.454330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-19 23:37:36.454461: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:36.454737: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:36.454977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-19 23:37:36.455020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-19 23:37:36.455027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-19 23:37:36.455033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-19 23:37:36.455126: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:36.455479: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:36.455779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmpbhtumut0/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmpbhtumut0/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: /tmp/tmpbhtumut0/keras/keras_model.ckpt
INFO:tensorflow:Warm-starting from: /tmp/tmpbhtumut0/keras/keras_model.ckpt
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-started 8 variables.
INFO:tensorflow:Warm-started 8 variables.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Graph was finalized.
2021-07-19 23:37:39.175917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:39.176299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-19 23:37:39.176424: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:39.176729: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:39.176999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-19 23:37:39.177042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-19 23:37:39.177050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-19 23:37:39.177057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-19 23:37:39.177159: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:39.177481: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:39.177761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpbhtumut0/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 3.1193407, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 25...
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmpbhtumut0/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 25...
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:2426: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
INFO:tensorflow:Starting evaluation at 2021-07-19T23:37:42
INFO:tensorflow:Graph was finalized.
2021-07-19 23:37:42.476830: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:42.477207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-19 23:37:42.477339: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:42.477648: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:42.477910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-19 23:37:42.477955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-19 23:37:42.477963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-19 23:37:42.477969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-19 23:37:42.478058: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:42.478332: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
INFO:tensorflow:Restoring parameters from /tmp/tmpbhtumut0/model.ckpt-25
2021-07-19 23:37:42.478592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/5]
INFO:tensorflow:Evaluation [2/5]
INFO:tensorflow:Evaluation [3/5]
INFO:tensorflow:Evaluation [4/5]
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Inference Time : 1.02146s
2021-07-19 23:37:43.437293: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
INFO:tensorflow:Finished evaluation at 2021-07-19-23:37:43
INFO:tensorflow:Saving dict for global step 25: accuracy = 0.634375, global_step = 25, loss = 1.493957
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmpbhtumut0/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.37796202.
2021-07-19 23:37:43.510911: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
({'accuracy': 0.634375, 'loss': 1.493957, 'global_step': 25}, [])

Using a custom model_fn

If you have an existing custom Estimator model_fn that you need to maintain, you can convert your model_fn to use a Keras model.

However, for compatibility reasons, a custom model_fn still runs in 1.x-style graph mode. This means there is no eager execution and there are no automatic control dependencies.
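To see what graph mode means in practice, the following minimal sketch builds a tf.Graph by hand, as a simplified stand-in for the graph an Estimator creates before calling model_fn. Inside it, tf.executing_eagerly() is False, and a variable read only picks up an earlier update because the dependency is declared explicitly with tf.control_dependencies. The variable v and the values used are purely illustrative.

```python
import tensorflow as tf

# Outside any graph context, TensorFlow 2.x executes operations eagerly.
eager_outside = tf.executing_eagerly()

# Simplified stand-in for the graph in which an Estimator calls model_fn.
graph = tf.Graph()
with graph.as_default():
    # Inside the graph, ops only add nodes; nothing runs yet.
    eager_inside = tf.executing_eagerly()

    v = tf.Variable(1.0)  # illustrative variable
    # With read_value=False, assign_add returns an Operation in graph mode.
    increment = v.assign_add(1.0, read_value=False)
    # No automatic control dependencies: the read is tied to the increment
    # only because the dependency is declared here.
    with tf.control_dependencies([increment]):
        value_after = v.read_value()

with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(v.initializer)
    # Fetching value_after also runs increment, so the result is 2.0.
    result = sess.run(value_after)
```

Without the tf.control_dependencies block, fetching the read alone would not run the increment at all, which is the kind of ordering that TensorFlow 2.x functions handle automatically.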

Custom model_fn with minimal changes

To make your custom model_fn work in TensorFlow 2.x with minimal changes to the existing code, you can keep using tf.compat.v1 symbols such as optimizers and metrics.
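As a sketch of how these tf.compat.v1 symbols behave in graph mode, the snippet below uses a toy scalar loss and hypothetical constant labels and predictions. It shows that a tf.compat.v1 optimizer's minimize increments the global step when one is passed, and that tf.compat.v1.metrics.accuracy returns a (value, update_op) pair whose state lives in local variables that must be initialized. GradientDescentOptimizer is used here instead of Adam only so that the numbers are easy to check by hand.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Toy scalar loss w**2, minimized at w = 0 (illustrative only).
    w = tf.compat.v1.Variable(3.0)
    loss = tf.square(w)
    global_step = tf.compat.v1.train.get_or_create_global_step()
    # A compat.v1 optimizer takes a loss Tensor; passing global_step makes
    # each run of train_op increment it.
    train_op = tf.compat.v1.train.GradientDescentOptimizer(0.1).minimize(
        loss, var_list=[w], global_step=global_step)

    # Hypothetical labels and predicted class ids: 3 of 4 match.
    labels = tf.constant([0, 1, 2, 1])
    predictions = tf.constant([0, 1, 1, 1])
    # compat.v1 metrics return (value, update_op) backed by local variables.
    acc, acc_update = tf.compat.v1.metrics.accuracy(
        labels=labels, predictions=predictions)

    init_ops = [tf.compat.v1.global_variables_initializer(),
                tf.compat.v1.local_variables_initializer()]

with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init_ops)
    sess.run(train_op)          # w: 3.0 - 0.1 * (2 * 3.0) = 2.4
    w_value, step_value = sess.run([w, global_step])
    sess.run(acc_update)        # accumulate one batch into the metric state
    acc_value = sess.run(acc)   # 3 / 4 = 0.75
```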

Using a Keras model in a custom model_fn is similar to using it in a custom training loop:

  • Set the training phase appropriately, based on the mode argument.
  • Explicitly pass the model's trainable_variables to the optimizer.
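Both points can be sketched in an eager custom training loop, the setting this section compares against. The toy model (a hypothetical stand-in for make_model), the step function and the random data are illustrative assumptions; the strings "train", "eval" and "infer" are the values of tf.estimator.ModeKeys.TRAIN, EVAL and PREDICT, written out so the sketch does not depend on tf.estimator being installed.

```python
import tensorflow as tf

# Values of tf.estimator.ModeKeys.TRAIN / EVAL / PREDICT.
TRAIN, EVAL, PREDICT = "train", "eval", "infer"

# Hypothetical stand-in for make_model(); Dropout behaves differently per phase.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(3),
])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def step(features, labels, mode):
    # Set the training phase from the mode: only TRAIN enables dropout.
    training = (mode == TRAIN)
    with tf.GradientTape() as tape:
        logits = model(features, training=training)
        loss = loss_fn(labels, logits)
    if training:
        # Pass the model's trainable_variables to the optimizer explicitly.
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return logits

features = tf.random.normal((5, 4))
labels = tf.constant([0, 1, 2, 0, 1])
step(features, labels, TRAIN)
# With dropout disabled, two EVAL passes give identical outputs.
eval_a = step(features, labels, EVAL)
eval_b = step(features, labels, EVAL)
```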

But there are important differences relative to a custom loop:

  • Instead of using Model.losses, extract the losses using Model.get_losses_for.
  • Extract the model's updates using Model.get_updates_for.
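For comparison, an eager custom loop reads the regularization losses from Model.losses and adds them to the data loss, much as my_model_fn does with the get_losses_for results. One caveat is worth guarding against: tf.math.add_n raises a ValueError on an empty list, so a model with no regularizers would make a bare tf.math.add_n(reg_losses) fail. The model, the L2 factor of 0.01 and the total_loss helper in this sketch are hypothetical.

```python
import tensorflow as tf

# Hypothetical model whose kernel carries an L2 penalty (one regularization loss).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, kernel_regularizer=tf.keras.regularizers.L2(0.01)),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def total_loss(features, labels, training=True):
    logits = model(features, training=training)
    data_loss = loss_fn(labels, logits)
    # In an eager custom loop the regularization losses come from Model.losses.
    reg_losses = model.losses
    # tf.math.add_n([]) raises ValueError, so fall back to 0 when there are none.
    reg_total = tf.math.add_n(reg_losses) if reg_losses else tf.constant(0.0)
    return data_loss + reg_total, data_loss, reg_total

features = tf.random.normal((2, 4))
labels = tf.constant([0, 2])
total, data_loss, reg_total = total_loss(features, labels)
```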

The following code creates an Estimator from a custom model_fn, illustrating all of these concerns.

def my_model_fn(features, labels, mode):
  model = make_model()

  # tf.compat.v1 optimizer and a Keras loss object.
  optimizer = tf.compat.v1.train.AdamOptimizer()
  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  # Set the training phase from the Estimator mode.
  training = (mode == tf.estimator.ModeKeys.TRAIN)
  predictions = model(features, training=training)

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  # Unconditional (None) plus input-conditional (features) losses.
  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss = loss_fn(labels, predictions) + tf.math.add_n(reg_losses)

  # tf.compat.v1 metrics return a (value, update_op) pair.
  accuracy = tf.compat.v1.metrics.accuracy(labels=labels,
                                           predictions=tf.math.argmax(predictions, axis=1),
                                           name='acc_op')

  # Unconditional plus input-conditional updates (for example, moving statistics).
  update_ops = model.get_updates_for(None) + model.get_updates_for(features)
  # Pass the model's trainable_variables to the optimizer explicitly.
  minimize_op = optimizer.minimize(
      total_loss,
      var_list=model.trainable_variables,
      global_step=tf.compat.v1.train.get_or_create_global_step())
  train_op = tf.group(minimize_op, *update_ops)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op, eval_metric_ops={'accuracy': accuracy})

# Create the Estimator & Train
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpqiom6a5s
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpqiom6a5s', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2021-07-19 23:37:46.140692: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:46.141065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-19 23:37:46.141220: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:46.141517: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:46.141765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-19 23:37:46.141807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-19 23:37:46.141814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-19 23:37:46.141820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-19 23:37:46.141907: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:46.142234: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:46.142497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpqiom6a5s/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 2.9167266, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 25...
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmpqiom6a5s/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 25...
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-07-19T23:37:49
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpqiom6a5s/model.ckpt-25
2021-07-19 23:37:49.640699: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:49.641091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-19 23:37:49.641238: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:49.641580: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:49.641848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-19 23:37:49.641893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-19 23:37:49.641901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-19 23:37:49.641910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-19 23:37:49.642029: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:49.642355: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:49.642657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/5]
INFO:tensorflow:Evaluation [2/5]
INFO:tensorflow:Evaluation [3/5]
INFO:tensorflow:Evaluation [4/5]
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Inference Time : 1.38362s
2021-07-19 23:37:50.924973: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
INFO:tensorflow:Finished evaluation at 2021-07-19-23:37:50
INFO:tensorflow:Saving dict for global step 25: accuracy = 0.70625, global_step = 25, loss = 1.6135181
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmpqiom6a5s/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.60315084.
2021-07-19 23:37:51.035953: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
({'accuracy': 0.70625, 'loss': 1.6135181, 'global_step': 25}, [])

Custom model_fn with TensorFlow 2.x symbols

If you want to get rid of all TensorFlow 1.x symbols and upgrade your custom model_fn to TensorFlow 2.x, you need to update the optimizer and metrics to tf.keras.optimizers and tf.keras.metrics.
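A minimal eager sketch of the two tf.keras replacements follows. A tf.keras.metrics.Accuracy object accumulates state across update_state calls and reports it through result(), and a tf.keras.optimizers optimizer counts the steps it applies in optimizer.iterations, the counter that the migrated my_model_fn ties to the global step. The logits, labels and the single-weight Dense layer are hypothetical.

```python
import tensorflow as tf

# Stateful Keras metric replacing tf.compat.v1.metrics.accuracy.
accuracy_obj = tf.keras.metrics.Accuracy(name="acc_obj")

# Hypothetical batch 1: argmax gives [0, 1, 0, 2]; 3 of 4 match the labels.
logits_1 = tf.constant([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.3, 0.2, 0.1],
                        [0.1, 0.2, 3.0]])
labels_1 = tf.constant([0, 1, 2, 2])
accuracy_obj.update_state(y_true=labels_1, y_pred=tf.math.argmax(logits_1, axis=1))

# Hypothetical batch 2: argmax gives [1], which does not match the label 2.
logits_2 = tf.constant([[0.1, 0.9, 0.0]])
labels_2 = tf.constant([2])
accuracy_obj.update_state(y_true=labels_2, y_pred=tf.math.argmax(logits_2, axis=1))

# The metric accumulates over both batches: 3 correct out of 5 = 0.6.
running_accuracy = float(accuracy_obj.result())

# Keras optimizer: every apply_gradients call increments optimizer.iterations.
layer = tf.keras.layers.Dense(1, use_bias=False, kernel_initializer="ones")
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
x, y = tf.constant([[1.0]]), tf.constant([[3.0]])
for _ in range(2):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(layer(x) - y))
    grads = tape.gradient(loss, layer.trainable_variables)
    optimizer.apply_gradients(zip(grads, layer.trainable_variables))
steps_taken = int(optimizer.iterations.numpy())  # 2
```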

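In the custom model_fn, besides the changes above, a few more updates are needed:

  • Use tf.keras.optimizers instead of the tf.compat.v1.train optimizers.
  • Explicitly pass the model's trainable_variables to the tf.keras optimizer.
  • To compute the minimize_op/train_op, use Optimizer.get_updates when the loss is a scalar loss Tensor; the first element of the returned list is the minimize_op.
  • Assign tf.compat.v1.train.get_or_create_global_step() to optimizer.iterations so that the global step, which SessionRunHooks rely on, is incremented correctly.
  • Use tf.keras.metrics instead of tf.compat.v1.metrics for evaluation.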

For the my_model_fn example above, the code migrated to TensorFlow 2.x symbols is:

def my_model_fn(features, labels, mode):
  model = make_model()

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  loss_obj = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
  predictions = model(features, training=training)

  # Get both the unconditional losses (the None part)
  # and the input-conditional losses (the features part).
  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss=loss_obj(labels, predictions) + tf.math.add_n(reg_losses)

  # Upgrade to tf.keras.metrics.
  accuracy_obj = tf.keras.metrics.Accuracy(name='acc_obj')
  accuracy = accuracy_obj.update_state(
      y_true=labels, y_pred=tf.math.argmax(predictions, axis=1))

  train_op = None
  if training:
    # Upgrade to tf.keras.optimizers.
    optimizer = tf.keras.optimizers.Adam()
    # Manually assign tf.compat.v1.global_step variable to optimizer.iterations
    # to make tf.compat.v1.train.global_step increased correctly.
    # This assignment is a must for any `tf.train.SessionRunHook` specified in
    # estimator, as SessionRunHooks rely on global step.
    optimizer.iterations = tf.compat.v1.train.get_or_create_global_step()
    # Get both the unconditional updates (the None part)
    # and the input-conditional updates (the features part).
    update_ops = model.get_updates_for(None) + model.get_updates_for(features)
    # Compute the minimize_op.
    minimize_op = optimizer.get_updates(
        total_loss,
        model.trainable_variables)[0]
    train_op = tf.group(minimize_op, *update_ops)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op,
    eval_metric_ops={'Accuracy': accuracy_obj})

# Create the Estimator and train.
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpomveromc
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpomveromc', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2021-07-19 23:37:53.371110: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:53.371633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-19 23:37:53.371845: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:53.372311: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:53.372679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-19 23:37:53.372742: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-19 23:37:53.372779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-19 23:37:53.372790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-19 23:37:53.372939: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:53.373380: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:53.373693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpomveromc/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 2.874814, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 25...
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmpomveromc/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 25...
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-07-19T23:37:56
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpomveromc/model.ckpt-25
2021-07-19 23:37:56.884303: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:56.884746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-19 23:37:56.884934: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:56.885330: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:56.885640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-19 23:37:56.885696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-19 23:37:56.885711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-19 23:37:56.885720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-19 23:37:56.885861: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:56.886386: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-19 23:37:56.886729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/5]
INFO:tensorflow:Evaluation [2/5]
INFO:tensorflow:Evaluation [3/5]
INFO:tensorflow:Evaluation [4/5]
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Inference Time : 1.04574s
2021-07-19 23:37:57.852422: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
INFO:tensorflow:Finished evaluation at 2021-07-19-23:37:57
INFO:tensorflow:Saving dict for global step 25: Accuracy = 0.790625, global_step = 25, loss = 1.4257433
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmpomveromc/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.42627147.
2021-07-19 23:37:57.941217: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
({'Accuracy': 0.790625, 'loss': 1.4257433, 'global_step': 25}, [])

Premade Estimators

Premade estimators in the tf.estimator.DNN*, tf.estimator.Linear* and tf.estimator.DNNLinearCombined* families are still supported in the TensorFlow 2.x APIs. However, some arguments have changed:

  1. input_layer_partitioner: removed in v2.
  2. loss_reduction: updated to tf.keras.losses.Reduction instead of tf.compat.v1.losses.Reduction. Its default value has also changed, from tf.compat.v1.losses.Reduction.SUM to tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE.
  3. optimizer, dnn_optimizer and linear_optimizer: these arguments have been updated to tf.keras.optimizers instead of tf.compat.v1.train.Optimizer.

To migrate the above changes:

  1. No migration is needed for input_layer_partitioner, since Distribution Strategy handles it automatically in TensorFlow 2.x.
  2. For loss_reduction, check tf.keras.losses.Reduction for the supported options.
  3. For the optimizer arguments:
    • If you either 1) do not pass the optimizer, dnn_optimizer or linear_optimizer argument, or 2) specify the optimizer argument as a string in your code, then you don't need to change anything, because tf.keras.optimizers is used by default.
    • Otherwise, you need to update it from tf.compat.v1.train.Optimizer to its corresponding tf.keras.optimizers class.
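As a sketch of the changed arguments, a premade DNNClassifier can receive a Keras optimizer instance and a Keras loss reduction directly. This assumes a TensorFlow 2.x release that still ships tf.estimator; the feature name "x", its shape, the layer sizes and the learning rate are hypothetical. The estimator is only constructed here, not trained.

```python
import tensorflow as tf

# Hypothetical input: one numeric feature "x" with 4 values per example.
feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

classifier = tf.estimator.DNNClassifier(
    hidden_units=[16, 8],
    feature_columns=feature_columns,
    n_classes=3,
    # TF 1.x style was tf.compat.v1.train.AdagradOptimizer(learning_rate=0.05);
    # in 2.x the argument takes a tf.keras.optimizers instance instead.
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.05),
    # Spelled out for clarity; SUM_OVER_BATCH_SIZE is already the 2.x default.
    loss_reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE,
)
```

Passing a string such as optimizer="Adagrad" needs no change at all. If a later 2.x release rejects the new-style optimizer instance at training time, the classes under tf.keras.optimizers.legacy may be required; treat that as an assumption to check against your installed version.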

Checkpoint converter

Migrating to keras.optimizers will break checkpoints saved using TensorFlow 1.x, because tf.keras.optimizers generates a different set of variables to be saved in checkpoints. To make an old checkpoint reusable after your migration to TensorFlow 2.x, try the checkpoint converter tool.

 curl -O https://raw.githubusercontent.com/tensorflow/estimator/master/tensorflow_estimator/python/estimator/tools/checkpoint_converter.py
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14889  100 14889    0     0  60771      0 --:--:-- --:--:-- --:--:-- 60771

The tool has built-in help:

 python checkpoint_converter.py -h
2021-07-19 23:37:58.805973: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
usage: checkpoint_converter.py [-h]
                               {dnn,linear,combined} source_checkpoint
                               source_graph target_checkpoint

positional arguments:
  {dnn,linear,combined}
                        The type of estimator to be converted. So far, the
                        checkpoint converter only supports Canned Estimator.
                        So the allowed types include linear, dnn and combined.
  source_checkpoint     Path to source checkpoint file to be read in.
  source_graph          Path to source graph file to be read in.
  target_checkpoint     Path to checkpoint file to be written out.

optional arguments:
  -h, --help            show this help message and exit
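The positional arguments in the help output can also be assembled from Python, for example inside a larger migration script. This is a minimal sketch: build_converter_command and convert_checkpoint are hypothetical helper names, the paths in the commented call are placeholders, and it assumes checkpoint_converter.py sits in the current working directory, as it does after the curl -O download above.

```python
import subprocess
import sys

# Estimator types accepted by checkpoint_converter.py, per its -h output.
SUPPORTED_TYPES = ("dnn", "linear", "combined")


def build_converter_command(estimator_type, source_checkpoint, source_graph,
                            target_checkpoint, script="checkpoint_converter.py"):
    """Hypothetical helper that builds the argv for checkpoint_converter.py.

    The argument order follows the usage line printed by -h:
    {dnn,linear,combined} source_checkpoint source_graph target_checkpoint
    """
    if estimator_type not in SUPPORTED_TYPES:
        raise ValueError(
            f"estimator_type must be one of {SUPPORTED_TYPES}, got {estimator_type!r}")
    return [sys.executable, script, estimator_type,
            source_checkpoint, source_graph, target_checkpoint]


def convert_checkpoint(*args, **kwargs):
    """Runs the converter; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_converter_command(*args, **kwargs), check=True)


# Example call (placeholder paths, not real files):
# convert_checkpoint("dnn",
#                    "/tmp/old_model/model.ckpt-100",
#                    "/tmp/old_model/graph.pbtxt",
#                    "/tmp/new_model/model.ckpt")
```

Validating the estimator type before spawning the process gives a clearer error than the argparse usage message, but the script itself remains the authority on which inputs it accepts.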

TensorShape

This class was simplified to hold ints instead of tf.compat.v1.Dimension objects, so there is no need to call .value to get an int.

Individual tf.compat.v1.Dimension objects are still accessible from tf.TensorShape.dims.

The following examples demonstrate the differences between TensorFlow 1.x and TensorFlow 2.x.

# Create a shape and choose an index
i = 0
shape = tf.TensorShape([16, None, 256])
shape
TensorShape([16, None, 256])

If you had this in TensorFlow 1.x:

value = shape[i].value

Then do this in TensorFlow 2.x:

value = shape[i]
value
16

If you had this in TensorFlow 1.x:

for dim in shape:
    value = dim.value
    print(value)

Then do this in TensorFlow 2.x:

for value in shape:
  print(value)
16
None
256

If you had this in TensorFlow 1.x (or used any other dimension method):

dim = shape[i]
dim.assert_is_compatible_with(other_dim)

Then do this in TensorFlow 2.x:

other_dim = 16
Dimension = tf.compat.v1.Dimension

if shape.rank is None:
  dim = Dimension(None)
else:
  dim = shape.dims[i]
dim.is_compatible_with(other_dim) # or any other dimension method
True
shape = tf.TensorShape(None)

if shape:
  dim = shape.dims[i]
  dim.is_compatible_with(other_dim) # or any other dimension method
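The rank check above is also packaged as a helper in the tf.compat module. A minimal sketch, assuming a TensorFlow 2.x install: tf.compat.dimension_at_index returns Dimension(None) for an unknown rank and shape.dims[index] otherwise, while tf.compat.dimension_value accepts either a Dimension or a plain int/None, which helps in code paths that may still receive 1.x-style values.

```python
import tensorflow as tf

shape = tf.TensorShape([16, None, 256])
unknown = tf.TensorShape(None)

# Same logic as the if/else above: Dimension(None) when the rank is unknown,
# otherwise the Dimension object at that index.
dim = tf.compat.dimension_at_index(shape, 0)
dim_unknown = tf.compat.dimension_at_index(unknown, 0)

print(dim.is_compatible_with(16))          # True
print(dim_unknown.is_compatible_with(16))  # True: an unknown dimension matches anything

# dimension_value unwraps a Dimension and passes ints/None through unchanged.
print(tf.compat.dimension_value(dim))       # 16
print(tf.compat.dimension_value(shape[0]))  # 16
print(tf.compat.dimension_value(shape[1]))  # None
```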

The boolean value of a tf.TensorShape is True if the rank is known, and False otherwise.

print(bool(tf.TensorShape([])))      # Scalar
print(bool(tf.TensorShape([0])))     # 0-length vector
print(bool(tf.TensorShape([1])))     # 1-length vector
print(bool(tf.TensorShape([None])))  # Unknown-length vector
print(bool(tf.TensorShape([1, 10, 100])))       # 3D tensor
print(bool(tf.TensorShape([None, None, None]))) # 3D tensor with no known dimensions
print()
print(bool(tf.TensorShape(None)))  # A tensor with unknown rank.
True
True
True
True
True
True

False
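When replacing the truthiness check with an explicit rank test, compare shape.rank against None: a scalar has rank 0, so `if shape.rank:` would wrongly treat its fully known shape as unknown. A minimal sketch; the helper name static_dims is hypothetical. It relies on as_list(), which raises ValueError for an unknown rank, hence the guard.

```python
import tensorflow as tf


def static_dims(shape):
    """Hypothetical helper: list of ints/None, or None if the rank is unknown."""
    # Explicit None check: `if shape.rank:` would also be False for scalars.
    if shape.rank is None:
        return None
    # Safe here: as_list() only fails when the rank is unknown.
    return shape.as_list()


print(static_dims(tf.TensorShape([])))              # []
print(static_dims(tf.TensorShape([16, None, 256]))) # [16, None, 256]
print(static_dims(tf.TensorShape(None)))            # None
```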

Other changes

  • Remove tf.colocate_with: TensorFlow's device placement algorithms have improved significantly, so this should no longer be necessary. If removing it causes a performance degradation, please file a bug.

  • Replace v1.ConfigProto usage with the equivalent functions from tf.config.
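As an illustration of the tf.config replacement, here is a sketch for two common ConfigProto settings, thread-pool sizes and GPU allow_growth. The thread counts are arbitrary example values. These calls should run before the TensorFlow runtime is initialized (before any op executes); calling them afterwards may raise a RuntimeError.

```python
import tensorflow as tf

# TF 1.x equivalent, for reference only:
#   config = tf.compat.v1.ConfigProto(
#       intra_op_parallelism_threads=2,
#       inter_op_parallelism_threads=2,
#       gpu_options=tf.compat.v1.GPUOptions(allow_growth=True))
#   sess = tf.compat.v1.Session(config=config)

# Thread-pool sizes (example values).
tf.config.threading.set_intra_op_parallelism_threads(2)
tf.config.threading.set_inter_op_parallelism_threads(2)

# allow_growth=True maps to enabling memory growth on each visible GPU.
# On a CPU-only machine the list is empty and the loop does nothing.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```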

Conclusioni

The overall process is:

  1. Run the upgrade script.
  2. Remove contrib symbols.
  3. Switch your models to an object-oriented style (Keras).
  4. Use tf.keras or tf.estimator training and evaluation loops where you can.
  5. Otherwise, use custom loops, but be sure to avoid sessions and collections.
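For step 5, a custom loop in TensorFlow 2.x style can be sketched as follows: a Keras model owns its variables, and a tf.function-compiled training step uses tf.GradientTape, with no tf.compat.v1.Session and no graph collections. The toy data (y = 3x + 2), model size, learning rate and step count are all hypothetical choices for illustration.

```python
import tensorflow as tf

# Toy regression data (hypothetical): y = 3x + 2 on 32 evenly spaced points.
x = tf.reshape(tf.linspace(-1.0, 1.0, 32), (-1, 1))
y = 3.0 * x + 2.0

# Object-oriented model: the Dense layer creates and tracks its own variables.
model = tf.keras.Sequential([tf.keras.Input(shape=(1,)), tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)


@tf.function  # traced into a graph; no Session or collections involved
def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        loss = tf.reduce_mean(tf.square(predictions - targets))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss  # loss computed before this step's update


losses = [float(train_step(x, y)) for _ in range(200)]
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```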

It takes a little work to convert code to idiomatic TensorFlow 2.x, but every change results in:

  • Fewer lines of code.
  • Increased clarity and simplicity.
  • Easier debugging.