This tutorial describes graph regularization from the Neural Structured Learning framework and demonstrates an end-to-end workflow for sentiment classification in a TFX pipeline.
Overview
This notebook classifies movie reviews as positive or negative using the text of the review. This is an example of binary classification, an important and widely applicable kind of machine learning problem.
We will demonstrate the use of graph regularization in this notebook by building a graph from the given input. The general recipe for building a graph-regularized model using the Neural Structured Learning (NSL) framework when the input does not contain an explicit graph is as follows:
- Create embeddings for each text sample in the input. This can be done using pre-trained models such as word2vec, Swivel, BERT, etc.
- Build a graph based on these embeddings by using a similarity metric such as the 'L2' distance, the 'cosine' distance, etc. Nodes in the graph correspond to samples and edges in the graph correspond to similarity between pairs of samples.
- Generate training data from the synthesized graph above and the sample features. The resulting training data will contain neighbor features in addition to the original node features.
- Create a neural network as a base model using Estimators.
- Wrap the base model with the add_graph_regularization wrapper function, which is provided by the NSL framework, to create a new graph Estimator model. This new model will include a graph regularization loss as the regularization term in its training objective.
- Train and evaluate the graph Estimator model.
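To make the regularization term in the last steps more concrete, here is a minimal sketch in plain Python of the kind of objective a graph-regularized model minimizes: the usual supervised loss plus a weighted penalty on the distance between a sample's representation and those of its graph neighbors. The function names, the squared L2 distance, the averaging over neighbors, and the `multiplier` parameter are illustrative assumptions, not the NSL implementation.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def graph_regularized_loss(supervised_loss, sample_repr, neighbors, multiplier=0.1):
    """Supervised loss plus a graph regularization term.

    `neighbors` is a list of (neighbor_repr, edge_weight) pairs. All names here
    are hypothetical; NSL computes its own version of this term inside the
    wrapper it provides.
    """
    if not neighbors:
        # A sample without neighbors contributes only its supervised loss.
        return supervised_loss
    penalty = sum(w * squared_l2(sample_repr, n) for n, w in neighbors)
    return supervised_loss + multiplier * penalty / len(neighbors)


loss = graph_regularized_loss(
    supervised_loss=0.5,
    sample_repr=[1.0, 0.0],
    neighbors=[([1.0, 1.0], 1.0), ([0.0, 0.0], 0.5)],
    multiplier=0.1,
)
print(loss)
```

The penalty grows when a sample's representation drifts away from strongly connected neighbors, which is what pushes similar reviews toward similar predictions.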
In this tutorial, we integrate the above workflow in a TFX pipeline using several custom TFX components as well as a custom graph-regularized trainer component.
Below is the schematic for our TFX pipeline. Orange boxes represent off-the-shelf TFX components and pink boxes represent custom TFX components.
Upgrade Pip
To avoid upgrading Pip in a system when running locally, check to make sure that we are running in Colab. Local systems can of course be upgraded separately.
try:
  import colab
  !pip install --upgrade pip
except:
  pass
Install required packages
!pip install -q -U \
tfx==1.2.0 \
neural-structured-learning \
tensorflow-hub \
tensorflow-datasets
Did you restart the runtime?
If you are using Google Colab, the first time that you run the cell above, you must restart the runtime (Runtime > Restart runtime...). This is because of the way that Colab loads packages.
Dependencies and imports
import apache_beam as beam
import gzip as gzip_lib
import numpy as np
import os
import pprint
import shutil
import tempfile
import urllib
import uuid
pp = pprint.PrettyPrinter()
import tensorflow as tf
import neural_structured_learning as nsl
import tfx
from tfx.components.evaluator.component import Evaluator
from tfx.components.example_gen.import_example_gen.component import ImportExampleGen
from tfx.components.example_validator.component import ExampleValidator
from tfx.components.model_validator.component import ModelValidator
from tfx.components.pusher.component import Pusher
from tfx.components.schema_gen.component import SchemaGen
from tfx.components.statistics_gen.component import StatisticsGen
from tfx.components.trainer import executor as trainer_executor
from tfx.components.trainer.component import Trainer
from tfx.components.transform.component import Transform
from tfx.dsl.components.base import executor_spec
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
from tfx.proto import evaluator_pb2
from tfx.proto import example_gen_pb2
from tfx.proto import pusher_pb2
from tfx.proto import trainer_pb2
from tfx.types import artifact
from tfx.types import artifact_utils
from tfx.types import channel
from tfx.types import standard_artifacts
from tfx.types.standard_artifacts import Examples
from tfx.dsl.component.experimental.annotations import InputArtifact
from tfx.dsl.component.experimental.annotations import OutputArtifact
from tfx.dsl.component.experimental.annotations import Parameter
from tfx.dsl.component.experimental.decorators import component
from tensorflow_metadata.proto.v0 import anomalies_pb2
from tensorflow_metadata.proto.v0 import schema_pb2
from tensorflow_metadata.proto.v0 import statistics_pb2
import tensorflow_data_validation as tfdv
import tensorflow_transform as tft
import tensorflow_model_analysis as tfma
import tensorflow_hub as hub
import tensorflow_datasets as tfds
print("TF Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print(
"GPU is",
"available" if tf.config.list_physical_devices("GPU") else "NOT AVAILABLE")
print("NSL Version: ", nsl.__version__)
print("TFX Version: ", tfx.__version__)
print("TFDV version: ", tfdv.__version__)
print("TFT version: ", tft.__version__)
print("TFMA version: ", tfma.__version__)
print("Hub version: ", hub.__version__)
print("Beam version: ", beam.__version__)
TF Version: 2.5.2
Eager mode: True
GPU is available
NSL Version: 1.3.1
TFX Version: 1.2.0
TFDV version: 1.2.0
TFT version: 1.2.0
TFMA version: 0.33.0
Hub version: 0.12.0
Beam version: 2.34.0
The IMDB dataset
The IMDB dataset contains the text of 50,000 movie reviews from the Internet Movie Database. These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are balanced, meaning they contain an equal number of positive and negative reviews. In addition, there are 50,000 unlabeled movie reviews.
Download the preprocessed IMDB dataset
The following code downloads the IMDB dataset (or uses a cached copy if it has already been downloaded) using TFDS. To speed up this notebook, we will use only 10,000 labeled reviews and 10,000 unlabeled reviews for training, and 10,000 test reviews for evaluation.
train_set, eval_set = tfds.load(
"imdb_reviews:1.0.0",
split=["train[:10000]+unsupervised[:10000]", "test[:10000]"],
shuffle_files=False)
Let's look at a few reviews from the training set:
for tfrecord in train_set.take(4):
  print("Review: {}".format(tfrecord["text"].numpy().decode("utf-8")[:300]))
  print("Label: {}\n".format(tfrecord["label"].numpy()))
Review: This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda pi
Label: 0

Review: I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot development was constant. Cons
Label: 0

Review: Mann photographs the Alberta Rocky Mountains in a superb fashion, and Jimmy Stewart and Walter Brennan give enjoyable performances as they always seem to do. <br /><br />But come on Hollywood - a Mountie telling the people of Dawson City, Yukon to elect themselves a marshal (yes a marshal!) and to e
Label: 0

Review: This is the kind of film for a snowy Sunday afternoon when the rest of the world can go ahead with its own business as you descend into a big arm-chair and mellow for a couple of hours. Wonderful performances from Cher and Nicolas Cage (as always) gently row the plot along. There are no rapids to cr
Label: 1
def _dict_to_example(instance):
  """Decoded CSV to tf example."""
  feature = {}
  # np.issubdtype replaces the deprecated `value.dtype == np.integer` check.
  for key, value in instance.items():
    if value is None:
      feature[key] = tf.train.Feature()
    elif np.issubdtype(value.dtype, np.integer):
      feature[key] = tf.train.Feature(
          int64_list=tf.train.Int64List(value=value.tolist()))
    elif value.dtype == np.float32:
      feature[key] = tf.train.Feature(
          float_list=tf.train.FloatList(value=value.tolist()))
    else:
      feature[key] = tf.train.Feature(
          bytes_list=tf.train.BytesList(value=value.tolist()))
  return tf.train.Example(features=tf.train.Features(feature=feature))


examples_path = tempfile.mkdtemp(prefix="tfx-data")
train_path = os.path.join(examples_path, "train.tfrecord")
eval_path = os.path.join(examples_path, "eval.tfrecord")

for path, dataset in [(train_path, train_set), (eval_path, eval_set)]:
  with tf.io.TFRecordWriter(path) as writer:
    for example in dataset:
      writer.write(
          _dict_to_example({
              "label": np.array([example["label"].numpy()]),
              "text": np.array([example["text"].numpy()]),
          }).SerializeToString())
Run TFX components interactively
In the cells that follow, you will construct TFX components and run each one interactively within the InteractiveContext to obtain ExecutionResult objects. This mirrors the process of an orchestrator running components in a TFX DAG based on when the dependencies for each component are met.
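As a rough illustration of dependency-driven execution, the following sketch orders the components built in this tutorial with a plain topological sort. The dependency map is read off the component wiring used in this notebook; `execution_order` is a hypothetical helper, not part of TFX, and a real orchestrator also handles concerns such as caching and metadata tracking.

```python
from collections import deque

# Upstream dependencies of each component, as wired in this notebook.
DEPENDENCIES = {
    "ImportExampleGen": [],
    "IdentifyExamples": ["ImportExampleGen"],
    "StatisticsGen": ["IdentifyExamples"],
    "SchemaGen": ["StatisticsGen"],
    "ExampleValidator": ["StatisticsGen", "SchemaGen"],
    "SynthesizeGraph": ["IdentifyExamples"],
    "Transform": ["IdentifyExamples", "SchemaGen"],
}


def execution_order(dependencies):
    """Return one valid run order using Kahn's topological sort."""
    # Number of unmet dependencies per component.
    remaining = {name: len(deps) for name, deps in dependencies.items()}
    # Reverse map: which components consume each component's outputs.
    downstream = {name: [] for name in dependencies}
    for name, deps in dependencies.items():
        for dep in deps:
            downstream[dep].append(name)

    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in downstream[current]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(order) != len(dependencies):
        raise ValueError("dependency cycle detected")
    return order


order = execution_order(DEPENDENCIES)
print(order)
```

In the interactive setting, calling context.run() cell by cell plays the role of this ordering: each component is run only after the components whose outputs it consumes.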
context = InteractiveContext()
WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9 as root for pipeline outputs. WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/metadata.sqlite.
The ExampleGen component
In any ML development process, the first step when starting code development is to ingest the training and test datasets. The ExampleGen component brings data into the TFX pipeline.
Create an ExampleGen component and run it.
input_config = example_gen_pb2.Input(splits=[
example_gen_pb2.Input.Split(name='train', pattern='train.tfrecord'),
example_gen_pb2.Input.Split(name='eval', pattern='eval.tfrecord')
])
example_gen = ImportExampleGen(input_base=examples_path, input_config=input_config)
context.run(example_gen, enable_cache=True)
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features. WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter. WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
for artifact in example_gen.outputs['examples'].get():
  print(artifact)
print('\nexample_gen.outputs is a {}'.format(type(example_gen.outputs)))
print(example_gen.outputs)
print(example_gen.outputs['examples'].get()[0].split_names)
Artifact(artifact: id: 1 type_id: 14 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/ImportExampleGen/examples/1" properties { key: "split_names" value { string_value: "[\"train\", \"eval\"]" } } custom_properties { key: "file_format" value { string_value: "tfrecords_gzip" } } custom_properties { key: "input_fingerprint" value { string_value: "split:train,num_files:1,total_bytes:27706811,xor_checksum:1638618106,sum_checksum:1638618106\nsplit:eval,num_files:1,total_bytes:13374744,xor_checksum:1638618111,sum_checksum:1638618111" } } custom_properties { key: "payload_format" value { string_value: "FORMAT_TF_EXAMPLE" } } custom_properties { key: "span" value { int_value: 0 } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: id: 14 name: "Examples" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } properties { key: "version" value: INT } ) example_gen.outputs is a <class 'dict'> {'examples': Channel( type_name: Examples artifacts: [Artifact(artifact: id: 1 type_id: 14 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/ImportExampleGen/examples/1" properties { key: "split_names" value { string_value: "[\"train\", \"eval\"]" } } custom_properties { key: "file_format" value { string_value: "tfrecords_gzip" } } custom_properties { key: "input_fingerprint" value { string_value: "split:train,num_files:1,total_bytes:27706811,xor_checksum:1638618106,sum_checksum:1638618106\nsplit:eval,num_files:1,total_bytes:13374744,xor_checksum:1638618111,sum_checksum:1638618111" } } custom_properties { key: "payload_format" value { string_value: "FORMAT_TF_EXAMPLE" } } custom_properties { key: "span" value { int_value: 0 } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: id: 14 name: 
"Examples" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } properties { key: "version" value: INT } )] additional_properties: {} additional_custom_properties: {} )} ["train", "eval"]
The component's output is an Examples artifact with 2 splits:
- the training examples (10,000 labeled reviews + 10,000 unlabeled reviews)
- the eval examples (10,000 labeled reviews)
The IdentifyExamples custom component
To use NSL, we will need each instance to have a unique ID. We create a custom component that adds such a unique ID to all instances across all splits. We leverage Apache Beam so that the component can easily scale to large datasets if needed.
def make_example_with_unique_id(example, id_feature_name):
  """Adds a unique ID to the given `tf.train.Example` proto.

  This function uses Python's 'uuid' module to generate a universally unique
  identifier for each example.

  Args:
    example: An instance of a `tf.train.Example` proto.
    id_feature_name: The name of the feature in the resulting `tf.train.Example`
      that will contain the unique identifier.

  Returns:
    A new `tf.train.Example` proto that includes a unique identifier as an
    additional feature.
  """
  result = tf.train.Example()
  result.CopyFrom(example)
  unique_id = uuid.uuid4()
  result.features.feature.get_or_create(
      id_feature_name).bytes_list.MergeFrom(
          tf.train.BytesList(value=[str(unique_id).encode('utf-8')]))
  return result


@component
def IdentifyExamples(orig_examples: InputArtifact[Examples],
                     identified_examples: OutputArtifact[Examples],
                     id_feature_name: Parameter[str],
                     component_name: Parameter[str]) -> None:

  # Get a list of the splits in input_data
  splits_list = artifact_utils.decode_split_names(
      split_names=orig_examples.split_names)
  # For completeness, encode the splits names and payload_format.
  # We could also just use input_data.split_names.
  identified_examples.split_names = artifact_utils.encode_split_names(
      splits=splits_list)
  # TODO(b/168616829): Remove populating payload_format after tfx 0.25.0.
  identified_examples.set_string_custom_property(
      "payload_format",
      orig_examples.get_string_custom_property("payload_format"))

  for split in splits_list:
    input_dir = artifact_utils.get_split_uri([orig_examples], split)
    output_dir = artifact_utils.get_split_uri([identified_examples], split)
    os.mkdir(output_dir)
    with beam.Pipeline() as pipeline:
      (pipeline
       | 'ReadExamples' >> beam.io.ReadFromTFRecord(
           os.path.join(input_dir, '*'),
           coder=beam.coders.coders.ProtoCoder(tf.train.Example))
       | 'AddUniqueId' >> beam.Map(make_example_with_unique_id, id_feature_name)
       | 'WriteIdentifiedExamples' >> beam.io.WriteToTFRecord(
           file_path_prefix=os.path.join(output_dir, 'data_tfrecord'),
           coder=beam.coders.coders.ProtoCoder(tf.train.Example),
           file_name_suffix='.gz'))
  return


identify_examples = IdentifyExamples(
    orig_examples=example_gen.outputs['examples'],
    component_name=u'IdentifyExamples',
    id_feature_name=u'id')
context.run(identify_examples, enable_cache=False)
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter. WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
The StatisticsGen component
The StatisticsGen component computes descriptive statistics for your dataset. The statistics that it generates can be visualized for review, and are used for example validation and to infer a schema.
Create a StatisticsGen component and run it.
# Computes statistics over data for visualization and example validation.
statistics_gen = StatisticsGen(
examples=identify_examples.outputs["identified_examples"])
context.run(statistics_gen, enable_cache=True)
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
The SchemaGen component
The SchemaGen component generates a schema for your data based on the statistics from StatisticsGen. It tries to infer the data type of each of your features, and the ranges of legal values for categorical features.
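The idea of schema inference can be sketched in a few lines of plain Python. The helper below is a hypothetical illustration, not how SchemaGen or TFDV is implemented: it works directly on raw examples rather than on precomputed statistics, and the `max_categories` cutoff for treating a string feature as categorical is an assumption made for this sketch.

```python
def infer_schema(examples, max_categories=10):
    """Infer a per-feature type and, for low-cardinality strings, a value domain.

    `examples` is a list of dicts mapping feature names to values (or None).
    """
    values_by_feature = {}
    for example in examples:
        for name, value in example.items():
            if value is not None:
                values_by_feature.setdefault(name, []).append(value)

    schema = {}
    for name, values in values_by_feature.items():
        kinds = {type(v).__name__ for v in values}
        entry = {"type": kinds.pop() if len(kinds) == 1 else "mixed"}
        distinct = set(values)
        # Only string features with few distinct values get a domain.
        if entry["type"] == "str" and len(distinct) <= max_categories:
            entry["domain"] = sorted(distinct)
        schema[name] = entry
    return schema


toy_examples = [
    {"text": "Great film", "genre": "drama", "label": 1},
    {"text": "Dull plot", "genre": "comedy", "label": 0},
    {"text": "Loved it", "genre": None, "label": 1},
]
schema_sketch = infer_schema(toy_examples, max_categories=2)
print(schema_sketch)
```

Here the free-text feature has too many distinct values to be treated as categorical, so only its type is recorded, while the hypothetical `genre` feature also receives a domain of legal values.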
Create a SchemaGen component and run it.
# Generates schema based on statistics files.
schema_gen = SchemaGen(
statistics=statistics_gen.outputs['statistics'], infer_feature_shape=False)
context.run(schema_gen, enable_cache=True)
WARNING: Logging before InitGoogleLogging() is written to STDERR I1204 11:42:13.777263 6839 rdbms_metadata_access_object.cc:686] No property is defined for the Type
The generated artifact is just a schema.pbtxt containing a text representation of a schema_pb2.Schema protobuf:
train_uri = schema_gen.outputs['schema'].get()[0].uri
schema_filename = os.path.join(train_uri, 'schema.pbtxt')
schema = tfx.utils.io_utils.parse_pbtxt_file(
file_name=schema_filename, message=schema_pb2.Schema())
It can be visualized using tfdv.display_schema() (we will look at this in more detail in a subsequent lab):
tfdv.display_schema(schema)
The ExampleValidator component
The ExampleValidator performs anomaly detection, based on the statistics from StatisticsGen and the schema from SchemaGen. It looks for problems such as missing values, values of the wrong type, or categorical values outside of the domain of acceptable values.
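The three kinds of problems listed above can be illustrated with a small, self-contained check. This is only a sketch under stated assumptions: the `SCHEMA` dictionary and the `find_anomalies` helper are hypothetical, and the real component compares dataset statistics against the schema protobuf rather than inspecting examples one by one.

```python
# Hypothetical hand-written schema: expected Python type and, optionally, the
# set of allowed values for a categorical feature.
SCHEMA = {
    "text": {"type": str},
    "label": {"type": int, "domain": {0, 1}},
}


def find_anomalies(example, schema):
    """Return a list of human-readable problems found in one example."""
    problems = []
    for name, spec in schema.items():
        value = example.get(name)
        if value is None:
            problems.append(f"{name}: missing value")
        elif not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
        elif "domain" in spec and value not in spec["domain"]:
            problems.append(f"{name}: value {value!r} outside domain")
    return problems


clean = find_anomalies({"text": "Great film", "label": 1}, SCHEMA)
broken = find_anomalies({"text": None, "label": 7}, SCHEMA)
wrong_type = find_anomalies({"text": "ok", "label": "1"}, SCHEMA)
print(clean, broken, wrong_type)
```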
Create an ExampleValidator component and run it.
# Performs anomaly detection based on statistics and data schema.
validate_stats = ExampleValidator(
statistics=statistics_gen.outputs['statistics'],
schema=schema_gen.outputs['schema'])
context.run(validate_stats, enable_cache=False)
The SynthesizeGraph component
Graph construction involves creating embeddings for text samples and then using a similarity function to compare the embeddings.
We will use pretrained Swivel embeddings to create embeddings in the tf.train.Example format for each sample in the input. We will store the resulting embeddings in the TFRecord format along with the sample's ID. This is important and will allow us to match sample embeddings with corresponding nodes in the graph later.
Once we have the sample embeddings, we will use them to build a similarity graph, i.e., nodes in this graph will correspond to samples and edges in this graph will correspond to similarity between pairs of nodes.
Neural Structured Learning provides a graph building library to build a graph based on sample embeddings. It uses cosine similarity as the similarity measure to compare embeddings and build edges between them. It also allows us to specify a similarity threshold, which can be used to discard dissimilar edges from the final graph. In the following example, using 0.99 as the similarity threshold, we end up with a graph that has 111,066 bi-directional edges.
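To show what thresholded cosine similarity produces, here is a brute-force sketch in plain Python that compares every pair of embeddings and emits each kept edge in both directions. It is not the NSL graph builder: the function names are hypothetical, keeping edges at or above the threshold is an assumption, and an all-pairs comparison would not scale to large datasets.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_similarity_graph(embeddings, similarity_threshold):
    """Compare every pair and keep edges at or above the threshold.

    `embeddings` maps a sample ID to its embedding vector. Each kept pair is
    emitted in both directions, as (source, target, weight) rows.
    """
    ids = sorted(embeddings)
    edges = []
    for i, source in enumerate(ids):
        for target in ids[i + 1:]:
            weight = cosine_similarity(embeddings[source], embeddings[target])
            if weight >= similarity_threshold:
                edges.append((source, target, weight))
                edges.append((target, source, weight))
    return edges


toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.99, 0.05],
    "c": [0.0, 1.0],
}
toy_edges = build_similarity_graph(toy_embeddings, similarity_threshold=0.99)
for source, target, weight in toy_edges:
    print(f"{source}\t{target}\t{weight:.6f}")
```

Only the nearly parallel pair survives the 0.99 threshold, and it appears twice, once per direction. This mirrors the layout of the graph file produced in this tutorial, where each bi-directional edge occupies two rows.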
swivel_url = 'https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1'
hub_layer = hub.KerasLayer(swivel_url, input_shape=[], dtype=tf.string)
def _bytes_feature(value):
  """Returns a bytes_list from a string / byte."""
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))


def _float_feature(value):
  """Returns a float_list from a float / double."""
  return tf.train.Feature(float_list=tf.train.FloatList(value=value))


def create_embedding_example(example):
  """Create tf.Example containing the sample's embedding and its ID."""
  sentence_embedding = hub_layer(tf.sparse.to_dense(example['text']))

  # Flatten the sentence embedding back to 1-D.
  sentence_embedding = tf.reshape(sentence_embedding, shape=[-1])

  feature_dict = {
      'id': _bytes_feature(tf.sparse.to_dense(example['id']).numpy()),
      'embedding': _float_feature(sentence_embedding.numpy().tolist())
  }

  return tf.train.Example(features=tf.train.Features(feature=feature_dict))


def create_dataset(uri):
  tfrecord_filenames = [os.path.join(uri, name) for name in os.listdir(uri)]
  return tf.data.TFRecordDataset(tfrecord_filenames, compression_type='GZIP')


def create_embeddings(train_path, output_path):
  dataset = create_dataset(train_path)
  embeddings_path = os.path.join(output_path, 'embeddings.tfr')

  feature_map = {
      'label': tf.io.FixedLenFeature([], tf.int64),
      'id': tf.io.VarLenFeature(tf.string),
      'text': tf.io.VarLenFeature(tf.string)
  }

  with tf.io.TFRecordWriter(embeddings_path) as writer:
    for tfrecord in dataset:
      tensor_dict = tf.io.parse_single_example(tfrecord, feature_map)
      embedding_example = create_embedding_example(tensor_dict)
      writer.write(embedding_example.SerializeToString())


def build_graph(output_path, similarity_threshold):
  embeddings_path = os.path.join(output_path, 'embeddings.tfr')
  graph_path = os.path.join(output_path, 'graph.tsv')
  graph_builder_config = nsl.configs.GraphBuilderConfig(
      similarity_threshold=similarity_threshold,
      lsh_splits=32,
      lsh_rounds=15,
      random_seed=12345)
  nsl.tools.build_graph_from_config([embeddings_path], graph_path,
                                    graph_builder_config)


"""Custom Artifact type"""
class SynthesizedGraph(tfx.types.artifact.Artifact):
  """Output artifact of the SynthesizeGraph component"""
  TYPE_NAME = 'SynthesizedGraphPath'
  PROPERTIES = {
      'span': standard_artifacts.SPAN_PROPERTY,
      'split_names': standard_artifacts.SPLIT_NAMES_PROPERTY,
  }


@component
def SynthesizeGraph(identified_examples: InputArtifact[Examples],
                    synthesized_graph: OutputArtifact[SynthesizedGraph],
                    similarity_threshold: Parameter[float],
                    component_name: Parameter[str]) -> None:

  # Get a list of the splits in input_data
  splits_list = artifact_utils.decode_split_names(
      split_names=identified_examples.split_names)

  # We build a graph only based on the 'Split-train' split which includes both
  # labeled and unlabeled examples.
  train_input_examples_uri = os.path.join(identified_examples.uri,
                                          'Split-train')
  output_graph_uri = os.path.join(synthesized_graph.uri, 'Split-train')
  os.mkdir(output_graph_uri)

  print('Creating embeddings...')
  create_embeddings(train_input_examples_uri, output_graph_uri)

  print('Synthesizing graph...')
  build_graph(output_graph_uri, similarity_threshold)

  synthesized_graph.split_names = artifact_utils.encode_split_names(
      splits=['Split-train'])

  return


synthesize_graph = SynthesizeGraph(
    identified_examples=identify_examples.outputs['identified_examples'],
    component_name=u'SynthesizeGraph',
    similarity_threshold=0.99)
context.run(synthesize_graph, enable_cache=False)
Creating embeddings...
Synthesizing graph...
train_uri = synthesize_graph.outputs["synthesized_graph"].get()[0].uri
os.listdir(train_uri)
['Split-train']
graph_path = os.path.join(train_uri, "Split-train", "graph.tsv")
print("node 1\t\t\t\t\tnode 2\t\t\t\t\tsimilarity")
!head {graph_path}
print("...")
!tail {graph_path}
node 1                                node 2                                similarity
8c4f4c09-3dfa-4b8f-b3eb-e1596f7509ed  638cfa94-ebb5-4182-bb18-a8f4cc332131  0.990838
638cfa94-ebb5-4182-bb18-a8f4cc332131  8c4f4c09-3dfa-4b8f-b3eb-e1596f7509ed  0.990838
8c4f4c09-3dfa-4b8f-b3eb-e1596f7509ed  1f9023b1-d312-4fc5-b87f-52636c7b0ea8  0.990184
1f9023b1-d312-4fc5-b87f-52636c7b0ea8  8c4f4c09-3dfa-4b8f-b3eb-e1596f7509ed  0.990184
292e3cc8-7c6b-4463-98d8-5dbfa88a75f9  1ec31309-2b4a-4a4c-9f72-083f201d54a7  0.992471
1ec31309-2b4a-4a4c-9f72-083f201d54a7  292e3cc8-7c6b-4463-98d8-5dbfa88a75f9  0.992471
d5560e01-40d9-4cc0-9cd0-23355c7378f2  b78d8ee6-e404-44bf-a5bc-977b883d1913  0.992505
b78d8ee6-e404-44bf-a5bc-977b883d1913  d5560e01-40d9-4cc0-9cd0-23355c7378f2  0.992505
e138ef2e-4fe4-44b0-a4dc-8b01266b7ae6  b78d8ee6-e404-44bf-a5bc-977b883d1913  0.992823
b78d8ee6-e404-44bf-a5bc-977b883d1913  e138ef2e-4fe4-44b0-a4dc-8b01266b7ae6  0.992823
...
11f44e7c-8393-4d17-8810-ca1f5e60e692  029e39a4-cd35-4e33-bdb1-8547f56a1ca7  0.991879
029e39a4-cd35-4e33-bdb1-8547f56a1ca7  11f44e7c-8393-4d17-8810-ca1f5e60e692  0.991879
4bdebeac-2f54-47a2-889c-3c2cf190e2dd  5eb7cfca-1f3d-4a32-9746-ebcca805b1d0  0.991046
5eb7cfca-1f3d-4a32-9746-ebcca805b1d0  4bdebeac-2f54-47a2-889c-3c2cf190e2dd  0.991046
e75e90af-8093-484a-883f-9f545a126208  3b2258ce-d8d7-40d5-ba1f-d771e7ddc56f  0.991198
3b2258ce-d8d7-40d5-ba1f-d771e7ddc56f  e75e90af-8093-484a-883f-9f545a126208  0.991198
ce73d577-0f4d-4919-aaee-bbf8aadb12ec  ba933752-a08b-4615-9b90-0731c8bfc23d  0.990260
ba933752-a08b-4615-9b90-0731c8bfc23d  ce73d577-0f4d-4919-aaee-bbf8aadb12ec  0.990260
d20d75c6-eb13-41f5-865c-e6e54725fe13  648ff28d-6860-4e8f-a411-d1577a1d78ca  0.991317
648ff28d-6860-4e8f-a411-d1577a1d78ca  d20d75c6-eb13-41f5-865c-e6e54725fe13  0.991317
!wc -l {graph_path}
222132 /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/SynthesizeGraph/synthesized_graph/6/Split-train/graph.tsv
The Transform component
The Transform component performs data transformations and feature engineering. The results include an input TensorFlow graph which is used during both training and serving to preprocess the data before training or inference. This graph becomes part of the SavedModel that is the result of model training. Since the same input graph is used for both training and serving, the preprocessing will always be the same, and only needs to be written once.
The Transform component requires more code than many other components because of the arbitrary complexity of the feature engineering that you may need for the data and/or model that you are working with. It requires code files to be available which define the processing needed.
Each sample will include the following three features:
- id: The node ID of the sample.
- text_xf: An int64 list containing word IDs.
- label_xf: A singleton int64 identifying the target class of the review: 0 = negative, 1 = positive.
Let's define a module containing the preprocessing_fn() function that we will pass to the Transform component:
_transform_module_file = 'imdb_transform.py'
%%writefile {_transform_module_file}

import tensorflow as tf

import tensorflow_transform as tft

SEQUENCE_LENGTH = 100
VOCAB_SIZE = 10000
OOV_SIZE = 100

def tokenize_reviews(reviews, sequence_length=SEQUENCE_LENGTH):
  reviews = tf.strings.lower(reviews)
  reviews = tf.strings.regex_replace(reviews, r" '| '|^'|'$", " ")
  reviews = tf.strings.regex_replace(reviews, "[^a-z' ]", " ")
  tokens = tf.strings.split(reviews)[:, :sequence_length]
  start_tokens = tf.fill([tf.shape(reviews)[0], 1], "<START>")
  end_tokens = tf.fill([tf.shape(reviews)[0], 1], "<END>")
  tokens = tf.concat([start_tokens, tokens, end_tokens], axis=1)
  tokens = tokens[:, :sequence_length]
  tokens = tokens.to_tensor(default_value="<PAD>")
  pad = sequence_length - tf.shape(tokens)[1]
  tokens = tf.pad(tokens, [[0, 0], [0, pad]], constant_values="<PAD>")
  return tf.reshape(tokens, [-1, sequence_length])

def preprocessing_fn(inputs):
  """tf.transform's callback function for preprocessing inputs.

  Args:
    inputs: map from feature keys to raw not-yet-transformed features.

  Returns:
    Map from string feature key to transformed feature operations.
  """
  outputs = {}
  outputs["id"] = inputs["id"]
  tokens = tokenize_reviews(_fill_in_missing(inputs["text"], ''))
  outputs["text_xf"] = tft.compute_and_apply_vocabulary(
      tokens,
      top_k=VOCAB_SIZE,
      num_oov_buckets=OOV_SIZE)
  outputs["label_xf"] = _fill_in_missing(inputs["label"], -1)
  return outputs

def _fill_in_missing(x, default_value):
  """Replace missing values in a SparseTensor.

  Fills in missing values of `x` with the default_value.

  Args:
    x: A `SparseTensor` of rank 2. Its dense shape should have size at most 1
      in the second dimension.
    default_value: the value with which to replace the missing values.

  Returns:
    A rank 1 tensor where missing values of `x` have been filled in.
  """
  if not isinstance(x, tf.sparse.SparseTensor):
    return x
  return tf.squeeze(
      tf.sparse.to_dense(
          tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
          default_value),
      axis=1)
Writing imdb_transform.py
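To see what the tokenization in the module above produces, the following pure-Python sketch is intended to mirror tokenize_reviews for a single string. It is a hypothetical stand-in written for illustration, using Python's re module instead of TensorFlow string ops, and it shortens the sequence length to 8 so the result is easy to read.

```python
import re

SEQUENCE_LENGTH = 8  # shortened from the module's 100 for readability


def tokenize_review(review, sequence_length=SEQUENCE_LENGTH):
    """Pure-Python imitation of `tokenize_reviews` for one string."""
    review = review.lower()
    # Remove apostrophes that follow a space or sit at the start or end of
    # the string; apostrophes inside words are kept.
    review = re.sub(r" '| '|^'|'$", " ", review)
    # Replace everything except lowercase letters, apostrophes and spaces.
    review = re.sub(r"[^a-z' ]", " ", review)
    tokens = review.split()[:sequence_length]
    # Add the markers, then truncate again so the total length is bounded.
    tokens = (["<START>"] + tokens + ["<END>"])[:sequence_length]
    # Right-pad short reviews up to the fixed length.
    tokens += ["<PAD>"] * (sequence_length - len(tokens))
    return tokens


short_tokens = tokenize_review("It's GREAT, really!")
long_tokens = tokenize_review("a b c d e f g h i j")
print(short_tokens)
print(long_tokens)
```

Note that for a long review the second truncation drops the end marker along with the trailing words, so only short reviews end with an end marker followed by padding.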
Create and run the Transform component, referring to the files that were created above.
# Performs transformations and feature engineering in training and serving.
transform = Transform(
examples=identify_examples.outputs['identified_examples'],
schema=schema_gen.outputs['schema'],
module_file=_transform_module_file)
context.run(transform, enable_cache=True)
The Transform component has 2 types of outputs:
- transform_graph is the graph that can perform the preprocessing operations (this graph will be included in the serving and evaluation models).
- transformed_examples represents the preprocessed training and evaluation data.
transform.outputs
{'transform_graph': Channel( type_name: TransformGraph artifacts: [Artifact(artifact: id: 7 type_id: 25 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/transform_graph/7" custom_properties { key: "name" value { string_value: "transform_graph" } } custom_properties { key: "producer_component" value { string_value: "Transform" } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: id: 25 name: "TransformGraph" )] additional_properties: {} additional_custom_properties: {} ), 'transformed_examples': Channel( type_name: Examples artifacts: [Artifact(artifact: id: 8 type_id: 14 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/transformed_examples/7" properties { key: "split_names" value { string_value: "[\"train\", \"eval\"]" } } custom_properties { key: "name" value { string_value: "transformed_examples" } } custom_properties { key: "producer_component" value { string_value: "Transform" } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: id: 14 name: "Examples" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } properties { key: "version" value: INT } )] additional_properties: {} additional_custom_properties: {} ), 'updated_analyzer_cache': Channel( type_name: TransformCache artifacts: [Artifact(artifact: id: 9 type_id: 26 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/updated_analyzer_cache/7" custom_properties { key: "name" value { string_value: "updated_analyzer_cache" } } custom_properties { key: "producer_component" value { string_value: "Transform" } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , 
artifact_type: id: 26 name: "TransformCache" )] additional_properties: {} additional_custom_properties: {} ), 'pre_transform_schema': Channel( type_name: Schema artifacts: [Artifact(artifact: id: 10 type_id: 19 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/pre_transform_schema/7" custom_properties { key: "name" value { string_value: "pre_transform_schema" } } custom_properties { key: "producer_component" value { string_value: "Transform" } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: id: 19 name: "Schema" )] additional_properties: {} additional_custom_properties: {} ), 'pre_transform_stats': Channel( type_name: ExampleStatistics artifacts: [Artifact(artifact: id: 11 type_id: 17 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/pre_transform_stats/7" custom_properties { key: "name" value { string_value: "pre_transform_stats" } } custom_properties { key: "producer_component" value { string_value: "Transform" } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: id: 17 name: "ExampleStatistics" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } )] additional_properties: {} additional_custom_properties: {} ), 'post_transform_schema': Channel( type_name: Schema artifacts: [Artifact(artifact: id: 12 type_id: 19 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/post_transform_schema/7" custom_properties { key: "name" value { string_value: "post_transform_schema" } } custom_properties { key: "producer_component" value { string_value: "Transform" } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: 
id: 19 name: "Schema" )] additional_properties: {} additional_custom_properties: {} ), 'post_transform_stats': Channel( type_name: ExampleStatistics artifacts: [Artifact(artifact: id: 13 type_id: 17 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/post_transform_stats/7" custom_properties { key: "name" value { string_value: "post_transform_stats" } } custom_properties { key: "producer_component" value { string_value: "Transform" } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: id: 17 name: "ExampleStatistics" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } )] additional_properties: {} additional_custom_properties: {} ), 'post_transform_anomalies': Channel( type_name: ExampleAnomalies artifacts: [Artifact(artifact: id: 14 type_id: 21 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/post_transform_anomalies/7" custom_properties { key: "name" value { string_value: "post_transform_anomalies" } } custom_properties { key: "producer_component" value { string_value: "Transform" } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: id: 21 name: "ExampleAnomalies" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } )] additional_properties: {} additional_custom_properties: {} )}
Take a peek at the transform_graph artifact: it points to a directory containing 3 subdirectories:
train_uri = transform.outputs['transform_graph'].get()[0].uri
os.listdir(train_uri)
['transform_fn', 'transformed_metadata', 'metadata']
The transform_fn subdirectory contains the actual preprocessing graph. The metadata subdirectory contains the schema of the original data. The transformed_metadata subdirectory contains the schema of the preprocessed data.
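As a quick sanity check, the layout described above can be verified programmatically. The sketch below is illustrative only: the helper missing_subdirs and the EXPECTED_SUBDIRS table are hypothetical names (they are not part of TFX), and a temporary directory stands in for the real artifact URI.

```python
import os
import tempfile

# Hypothetical lookup table: subdirectory name -> what the tutorial says it holds.
EXPECTED_SUBDIRS = {
    'transform_fn': 'the actual preprocessing graph',
    'metadata': 'the schema of the original data',
    'transformed_metadata': 'the schema of the preprocessed data',
}


def missing_subdirs(artifact_uri):
  """Returns the sorted names of expected subdirectories absent from artifact_uri."""
  present = {
      name for name in os.listdir(artifact_uri)
      if os.path.isdir(os.path.join(artifact_uri, name))
  }
  return sorted(set(EXPECTED_SUBDIRS) - present)


# Demonstration on a throwaway directory that lacks one subdirectory.
with tempfile.TemporaryDirectory() as demo_uri:
  for name in ('transform_fn', 'metadata'):
    os.mkdir(os.path.join(demo_uri, name))
  demo_missing = missing_subdirs(demo_uri)

print(demo_missing)  # → ['transformed_metadata']
```

In the notebook, the same helper could be pointed at the URI of the transform_graph artifact instead of the temporary directory.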
Take a look at some of the transformed examples and check that they are indeed processed as intended.
def pprint_examples(artifact, n_examples=3):
  print("artifact:", artifact)
  uri = os.path.join(artifact.uri, "Split-train")
  print("uri:", uri)
  tfrecord_filenames = [os.path.join(uri, name) for name in os.listdir(uri)]
  print("tfrecord_filenames:", tfrecord_filenames)
  dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")
  for tfrecord in dataset.take(n_examples):
    serialized_example = tfrecord.numpy()
    example = tf.train.Example.FromString(serialized_example)
    pp.pprint(example)

pprint_examples(transform.outputs['transformed_examples'].get()[0])
artifact: Artifact(artifact: id: 8 type_id: 14 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/transformed_examples/7" properties { key: "split_names" value { string_value: "[\"train\", \"eval\"]" } } custom_properties { key: "name" value { string_value: "transformed_examples" } } custom_properties { key: "producer_component" value { string_value: "Transform" } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: id: 14 name: "Examples" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } properties { key: "version" value: INT } ) uri: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/transformed_examples/7/Split-train tfrecord_filenames: ['/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Transform/transformed_examples/7/Split-train/transformed_examples-00000-of-00001.gz'] features { feature { key: "id" value { bytes_list { value: "d62bf114-6f90-4a7d-ad60-559924f2582b" } } } feature { key: "label_xf" value { int64_list { value: 0 } } } feature { key: "text_xf" value { int64_list { value: 13 value: 8 value: 14 value: 32 value: 338 value: 310 value: 15 value: 95 value: 27 value: 10001 value: 9 value: 31 value: 1173 value: 3153 value: 43 value: 495 value: 10060 value: 214 value: 26 value: 71 value: 142 value: 19 value: 8 value: 204 value: 339 value: 27 value: 74 value: 181 value: 238 value: 9 value: 440 value: 67 value: 74 value: 71 value: 94 value: 100 value: 22 value: 5442 value: 8 value: 1573 value: 607 value: 530 value: 8 value: 15 value: 6 value: 32 value: 378 value: 6292 value: 207 value: 2276 value: 388 value: 0 value: 84 value: 1023 value: 154 value: 65 value: 155 value: 52 value: 0 value: 10080 value: 7871 value: 65 value: 250 value: 74 value: 3202 value: 20 value: 10000 value: 3720 value: 10020 value: 10008 value: 1282 value: 3862 value: 3 value: 53 value: 
3952 value: 110 value: 1879 value: 17 value: 3153 value: 14 value: 166 value: 19 value: 2 value: 1023 value: 1007 value: 9405 value: 9 value: 2 value: 15 value: 12 value: 14 value: 4504 value: 4 value: 109 value: 158 value: 1202 value: 7 value: 174 value: 505 value: 12 } } } } features { feature { key: "id" value { bytes_list { value: "bc94a341-63f1-417c-8a1b-c723a29e67e4" } } } feature { key: "label_xf" value { int64_list { value: 0 } } } feature { key: "text_xf" value { int64_list { value: 13 value: 7 value: 23 value: 75 value: 494 value: 5 value: 748 value: 2155 value: 307 value: 91 value: 19 value: 8 value: 6 value: 499 value: 763 value: 5 value: 2 value: 1690 value: 4 value: 200 value: 593 value: 57 value: 1244 value: 120 value: 2364 value: 3 value: 4407 value: 21 value: 0 value: 10081 value: 3 value: 263 value: 42 value: 6947 value: 2 value: 169 value: 185 value: 21 value: 8 value: 5143 value: 7 value: 1339 value: 2155 value: 81 value: 0 value: 18 value: 14 value: 1468 value: 0 value: 86 value: 986 value: 14 value: 2259 value: 1790 value: 562 value: 3 value: 284 value: 200 value: 401 value: 5 value: 668 value: 19 value: 17 value: 58 value: 1934 value: 4 value: 45 value: 14 value: 4212 value: 113 value: 43 value: 135 value: 7 value: 753 value: 7 value: 224 value: 23 value: 1155 value: 179 value: 4 value: 0 value: 18 value: 19 value: 7 value: 191 value: 0 value: 2047 value: 4 value: 10 value: 3 value: 283 value: 42 value: 401 value: 5 value: 668 value: 4 value: 90 value: 234 value: 10023 value: 227 } } } } features { feature { key: "id" value { bytes_list { value: "bc8a0f68-45eb-4993-b757-52d92db1cd5a" } } } feature { key: "label_xf" value { int64_list { value: 0 } } } feature { key: "text_xf" value { int64_list { value: 13 value: 4577 value: 7158 value: 0 value: 10047 value: 3778 value: 3346 value: 9 value: 2 value: 758 value: 1915 value: 3 value: 2280 value: 1511 value: 3 value: 2003 value: 10020 value: 225 value: 786 value: 382 value: 16 value: 39 value: 203 
value: 361 value: 5 value: 93 value: 11 value: 11 value: 19 value: 220 value: 21 value: 341 value: 2 value: 10000 value: 966 value: 0 value: 77 value: 4 value: 6677 value: 464 value: 10071 value: 5 value: 10042 value: 630 value: 2 value: 10044 value: 404 value: 2 value: 10044 value: 3 value: 5 value: 10008 value: 0 value: 1259 value: 630 value: 106 value: 10042 value: 6721 value: 10 value: 49 value: 21 value: 0 value: 2071 value: 20 value: 1292 value: 4 value: 0 value: 431 value: 11 value: 11 value: 166 value: 67 value: 2342 value: 5815 value: 12 value: 575 value: 21 value: 0 value: 1691 value: 537 value: 4 value: 0 value: 3605 value: 307 value: 0 value: 10054 value: 1563 value: 3115 value: 467 value: 4577 value: 3 value: 1069 value: 1158 value: 5 value: 23 value: 4279 value: 6677 value: 464 value: 20 value: 10004 } } } }
The GraphAugmentation Component
Since we have the sample features and the synthesized graph, we can generate the augmented training data for Neural Structured Learning. The NSL framework provides a library to combine the graph and the sample features to produce the final training data for graph regularization. The resulting training data will include the original sample features as well as the features of their corresponding neighbors.
In this tutorial, we consider undirected edges and use a maximum of 3 neighbors per sample to augment the training data with graph neighbors.
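To make the augmentation step concrete, here is a simplified, pure-Python sketch of the idea behind neighbor packing. It is not the implementation of nsl.tools.pack_nbrs: the function pack_neighbors, its in-memory dict inputs, and the tie-breaking rule are assumptions made for illustration. The generated feature names follow the NL_nbr_ prefix, _weight suffix, and NL_num_nbrs key that appear in this tutorial's constants and outputs.

```python
from collections import defaultdict


def pack_neighbors(examples, edges, max_nbrs=3, add_undirected_edges=True):
  """Attaches up to max_nbrs neighbors' features to every example.

  Args:
    examples: dict mapping an example id to a dict of its features.
    edges: iterable of (source_id, target_id, weight) tuples.
    max_nbrs: maximum number of neighbors kept per example.
    add_undirected_edges: if True, every edge is also usable in reverse.

  Returns:
    dict mapping each example id to its features plus neighbor features.
  """
  # Adjacency map: node id -> {neighbor id: edge weight}.
  adjacency = defaultdict(dict)
  for source, target, weight in edges:
    adjacency[source][target] = weight
    if add_undirected_edges:
      # Mirror the edge unless the reverse direction is already present.
      adjacency[target].setdefault(source, weight)

  augmented = {}
  for example_id, features in examples.items():
    # Strongest edges first; ties are broken by neighbor id (an assumption).
    ranked = sorted(
        adjacency.get(example_id, {}).items(),
        key=lambda item: (-item[1], item[0]))
    # Only neighbors whose features are known can be packed.
    ranked = [(nbr, w) for nbr, w in ranked if nbr in examples][:max_nbrs]

    merged = dict(features)
    for index, (nbr_id, weight) in enumerate(ranked):
      for name, value in examples[nbr_id].items():
        merged['NL_nbr_{}_{}'.format(index, name)] = value
      merged['NL_nbr_{}_weight'.format(index)] = weight
    merged['NL_num_nbrs'] = len(ranked)
    augmented[example_id] = merged
  return augmented


# Toy data: three samples and two directed edges.
examples = {
    'a': {'text_xf': [13, 8]},
    'b': {'text_xf': [13, 7]},
    'c': {'text_xf': [13, 4]},
}
edges = [('a', 'b', 0.9), ('c', 'a', 0.8)]
packed = pack_neighbors(examples, edges, max_nbrs=1)
print(packed['a']['NL_num_nbrs'])       # → 1
print(packed['a']['NL_nbr_0_text_xf'])  # → [13, 7]
```

With undirected edges enabled, sample 'a' can reach both 'b' and 'c', but only the highest-weight neighbor is kept because max_nbrs is 1 in this toy call.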
def split_train_and_unsup(input_uri):
  'Separate the labeled and unlabeled instances.'

  tmp_dir = tempfile.mkdtemp(prefix='tfx-data')
  tfrecord_filenames = [
      os.path.join(input_uri, filename) for filename in os.listdir(input_uri)
  ]
  train_path = os.path.join(tmp_dir, 'train.tfrecord')
  unsup_path = os.path.join(tmp_dir, 'unsup.tfrecord')
  with tf.io.TFRecordWriter(train_path) as train_writer, \
       tf.io.TFRecordWriter(unsup_path) as unsup_writer:
    for tfrecord in tf.data.TFRecordDataset(
        tfrecord_filenames, compression_type='GZIP'):
      example = tf.train.Example()
      example.ParseFromString(tfrecord.numpy())
      if ('label_xf' not in example.features.feature or
          example.features.feature['label_xf'].int64_list.value[0] == -1):
        writer = unsup_writer
      else:
        writer = train_writer
      writer.write(tfrecord.numpy())
  return train_path, unsup_path


def gzip(filepath):
  with open(filepath, 'rb') as f_in:
    with gzip_lib.open(filepath + '.gz', 'wb') as f_out:
      shutil.copyfileobj(f_in, f_out)
  os.remove(filepath)


def copy_tfrecords(input_uri, output_uri):
  for filename in os.listdir(input_uri):
    input_filename = os.path.join(input_uri, filename)
    output_filename = os.path.join(output_uri, filename)
    shutil.copyfile(input_filename, output_filename)

@component
def GraphAugmentation(identified_examples: InputArtifact[Examples],
                      synthesized_graph: InputArtifact[SynthesizedGraph],
                      augmented_examples: OutputArtifact[Examples],
                      num_neighbors: Parameter[int],
                      component_name: Parameter[str]) -> None:

  # Get a list of the splits in input_data
  splits_list = artifact_utils.decode_split_names(
      split_names=identified_examples.split_names)

  train_input_uri = os.path.join(identified_examples.uri, 'Split-train')
  eval_input_uri = os.path.join(identified_examples.uri, 'Split-eval')
  train_graph_uri = os.path.join(synthesized_graph.uri, 'Split-train')
  train_output_uri = os.path.join(augmented_examples.uri, 'Split-train')
  eval_output_uri = os.path.join(augmented_examples.uri, 'Split-eval')

  os.mkdir(train_output_uri)
  os.mkdir(eval_output_uri)

  # Separate the labeled and unlabeled examples from the 'Split-train' split.
  train_path, unsup_path = split_train_and_unsup(train_input_uri)

  output_path = os.path.join(train_output_uri, 'nsl_train_data.tfr')
  pack_nbrs_args = dict(
      labeled_examples_path=train_path,
      unlabeled_examples_path=unsup_path,
      graph_path=os.path.join(train_graph_uri, 'graph.tsv'),
      output_training_data_path=output_path,
      add_undirected_edges=True,
      max_nbrs=num_neighbors)
  print('nsl.tools.pack_nbrs arguments:', pack_nbrs_args)
  nsl.tools.pack_nbrs(**pack_nbrs_args)

  # Downstream components expect gzip'ed TFRecords.
  gzip(output_path)

  # The test examples are left untouched and are simply copied over.
  copy_tfrecords(eval_input_uri, eval_output_uri)

  augmented_examples.split_names = identified_examples.split_names

  return

# Augments training data with graph neighbors.
graph_augmentation = GraphAugmentation(
    identified_examples=transform.outputs['transformed_examples'],
    synthesized_graph=synthesize_graph.outputs['synthesized_graph'],
    component_name=u'GraphAugmentation',
    num_neighbors=3)
context.run(graph_augmentation, enable_cache=False)
nsl.tools.pack_nbrs arguments: {'labeled_examples_path': '/tmp/tfx-datajju3fxrq/train.tfrecord', 'unlabeled_examples_path': '/tmp/tfx-datajju3fxrq/unsup.tfrecord', 'graph_path': '/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/SynthesizeGraph/synthesized_graph/6/Split-train/graph.tsv', 'output_training_data_path': '/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/GraphAugmentation/augmented_examples/8/Split-train/nsl_train_data.tfr', 'add_undirected_edges': True, 'max_nbrs': 3}
pprint_examples(graph_augmentation.outputs['augmented_examples'].get()[0], 6)
artifact: Artifact(artifact: id: 15 type_id: 14 uri: "/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/GraphAugmentation/augmented_examples/8" properties { key: "split_names" value { string_value: "[\"train\", \"eval\"]" } } custom_properties { key: "name" value { string_value: "augmented_examples" } } custom_properties { key: "producer_component" value { string_value: "GraphAugmentation" } } custom_properties { key: "state" value { string_value: "published" } } custom_properties { key: "tfx_version" value { string_value: "1.2.0" } } state: LIVE , artifact_type: id: 14 name: "Examples" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } properties { key: "version" value: INT } ) uri: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/GraphAugmentation/augmented_examples/8/Split-train tfrecord_filenames: ['/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/GraphAugmentation/augmented_examples/8/Split-train/nsl_train_data.tfr.gz'] features { feature { key: "NL_num_nbrs" value { int64_list { value: 0 } } } feature { key: "id" value { bytes_list { value: "d62bf114-6f90-4a7d-ad60-559924f2582b" } } } feature { key: "label_xf" value { int64_list { value: 0 } } } feature { key: "text_xf" value { int64_list { value: 13 value: 8 value: 14 value: 32 value: 338 value: 310 value: 15 value: 95 value: 27 value: 10001 value: 9 value: 31 value: 1173 value: 3153 value: 43 value: 495 value: 10060 value: 214 value: 26 value: 71 value: 142 value: 19 value: 8 value: 204 value: 339 value: 27 value: 74 value: 181 value: 238 value: 9 value: 440 value: 67 value: 74 value: 71 value: 94 value: 100 value: 22 value: 5442 value: 8 value: 1573 value: 607 value: 530 value: 8 value: 15 value: 6 value: 32 value: 378 value: 6292 value: 207 value: 2276 value: 388 value: 0 value: 84 value: 1023 value: 154 value: 65 value: 155 value: 52 value: 0 value: 10080 value: 7871 value: 65 value: 250 value: 74 value: 3202 value: 20 value: 10000 value: 3720 
value: 10020 value: 10008 value: 1282 value: 3862 value: 3 value: 53 value: 3952 value: 110 value: 1879 value: 17 value: 3153 value: 14 value: 166 value: 19 value: 2 value: 1023 value: 1007 value: 9405 value: 9 value: 2 value: 15 value: 12 value: 14 value: 4504 value: 4 value: 109 value: 158 value: 1202 value: 7 value: 174 value: 505 value: 12 } } } } features { feature { key: "NL_num_nbrs" value { int64_list { value: 0 } } } feature { key: "id" value { bytes_list { value: "bc94a341-63f1-417c-8a1b-c723a29e67e4" } } } feature { key: "label_xf" value { int64_list { value: 0 } } } feature { key: "text_xf" value { int64_list { value: 13 value: 7 value: 23 value: 75 value: 494 value: 5 value: 748 value: 2155 value: 307 value: 91 value: 19 value: 8 value: 6 value: 499 value: 763 value: 5 value: 2 value: 1690 value: 4 value: 200 value: 593 value: 57 value: 1244 value: 120 value: 2364 value: 3 value: 4407 value: 21 value: 0 value: 10081 value: 3 value: 263 value: 42 value: 6947 value: 2 value: 169 value: 185 value: 21 value: 8 value: 5143 value: 7 value: 1339 value: 2155 value: 81 value: 0 value: 18 value: 14 value: 1468 value: 0 value: 86 value: 986 value: 14 value: 2259 value: 1790 value: 562 value: 3 value: 284 value: 200 value: 401 value: 5 value: 668 value: 19 value: 17 value: 58 value: 1934 value: 4 value: 45 value: 14 value: 4212 value: 113 value: 43 value: 135 value: 7 value: 753 value: 7 value: 224 value: 23 value: 1155 value: 179 value: 4 value: 0 value: 18 value: 19 value: 7 value: 191 value: 0 value: 2047 value: 4 value: 10 value: 3 value: 283 value: 42 value: 401 value: 5 value: 668 value: 4 value: 90 value: 234 value: 10023 value: 227 } } } } features { feature { key: "NL_num_nbrs" value { int64_list { value: 0 } } } feature { key: "id" value { bytes_list { value: "bc8a0f68-45eb-4993-b757-52d92db1cd5a" } } } feature { key: "label_xf" value { int64_list { value: 0 } } } feature { key: "text_xf" value { int64_list { value: 13 value: 4577 value: 7158 value: 0 
value: 10047 value: 3778 value: 3346 value: 9 value: 2 value: 758 value: 1915 value: 3 value: 2280 value: 1511 value: 3 value: 2003 value: 10020 value: 225 value: 786 value: 382 value: 16 value: 39 value: 203 value: 361 value: 5 value: 93 value: 11 value: 11 value: 19 value: 220 value: 21 value: 341 value: 2 value: 10000 value: 966 value: 0 value: 77 value: 4 value: 6677 value: 464 value: 10071 value: 5 value: 10042 value: 630 value: 2 value: 10044 value: 404 value: 2 value: 10044 value: 3 value: 5 value: 10008 value: 0 value: 1259 value: 630 value: 106 value: 10042 value: 6721 value: 10 value: 49 value: 21 value: 0 value: 2071 value: 20 value: 1292 value: 4 value: 0 value: 431 value: 11 value: 11 value: 166 value: 67 value: 2342 value: 5815 value: 12 value: 575 value: 21 value: 0 value: 1691 value: 537 value: 4 value: 0 value: 3605 value: 307 value: 0 value: 10054 value: 1563 value: 3115 value: 467 value: 4577 value: 3 value: 1069 value: 1158 value: 5 value: 23 value: 4279 value: 6677 value: 464 value: 20 value: 10004 } } } } features { feature { key: "NL_num_nbrs" value { int64_list { value: 0 } } } feature { key: "id" value { bytes_list { value: "06e86044-477f-47b9-babf-825fbb5af70c" } } } feature { key: "label_xf" value { int64_list { value: 1 } } } feature { key: "text_xf" value { int64_list { value: 13 value: 8 value: 6 value: 0 value: 251 value: 4 value: 18 value: 20 value: 2 value: 6783 value: 2295 value: 2338 value: 52 value: 0 value: 468 value: 4 value: 0 value: 189 value: 73 value: 153 value: 1294 value: 17 value: 90 value: 234 value: 935 value: 16 value: 25 value: 10024 value: 92 value: 2 value: 192 value: 4218 value: 3317 value: 3 value: 10098 value: 20 value: 2 value: 356 value: 4 value: 565 value: 334 value: 382 value: 36 value: 6989 value: 3 value: 6065 value: 2510 value: 16 value: 203 value: 7264 value: 2849 value: 0 value: 86 value: 346 value: 50 value: 26 value: 58 value: 10020 value: 5 value: 1464 value: 58 value: 2081 value: 2969 value: 42 
value: 2 value: 2364 value: 3 value: 1402 value: 10062 value: 138 value: 147 value: 614 value: 115 value: 29 value: 90 value: 105 value: 2 value: 223 value: 18 value: 9 value: 160 value: 324 value: 3 value: 24 value: 12 value: 1252 value: 0 value: 2142 value: 10 value: 1832 value: 111 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 } } } } features { feature { key: "NL_num_nbrs" value { int64_list { value: 0 } } } feature { key: "id" value { bytes_list { value: "072dc782-850b-4286-8f4f-2f6f527db6cf" } } } feature { key: "label_xf" value { int64_list { value: 1 } } } feature { key: "text_xf" value { int64_list { value: 13 value: 16 value: 423 value: 23 value: 1367 value: 30 value: 0 value: 363 value: 12 value: 153 value: 3174 value: 9 value: 8 value: 18 value: 26 value: 667 value: 338 value: 1372 value: 0 value: 86 value: 46 value: 9200 value: 282 value: 0 value: 10091 value: 4 value: 0 value: 694 value: 10028 value: 52 value: 362 value: 26 value: 202 value: 39 value: 216 value: 5 value: 27 value: 5822 value: 19 value: 52 value: 58 value: 362 value: 26 value: 202 value: 39 value: 474 value: 0 value: 10029 value: 4 value: 2 value: 243 value: 143 value: 386 value: 3 value: 0 value: 386 value: 579 value: 2 value: 132 value: 57 value: 725 value: 88 value: 140 value: 30 value: 27 value: 33 value: 1359 value: 29 value: 8 value: 567 value: 35 value: 106 value: 230 value: 60 value: 0 value: 3041 value: 5 value: 7879 value: 28 value: 281 value: 110 value: 111 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 value: 1 } } } } features { feature { key: "NL_num_nbrs" value { int64_list { value: 0 } } } feature { key: "id" value { bytes_list { value: "27da61c0-3dff-46e9-8588-d12176b3798f" } } } feature { key: "label_xf" value { int64_list { value: 1 } } } feature { key: "text_xf" value { int64_list { value: 13 value: 8 value: 6 value: 2 value: 
18 value: 69 value: 140 value: 27 value: 83 value: 31 value: 1877 value: 905 value: 9 value: 10057 value: 31 value: 43 value: 2115 value: 36 value: 32 value: 2057 value: 6133 value: 10 value: 6 value: 32 value: 2474 value: 1614 value: 3 value: 2707 value: 990 value: 4 value: 10067 value: 9 value: 2 value: 1532 value: 242 value: 90 value: 3757 value: 3 value: 90 value: 10026 value: 0 value: 242 value: 6 value: 260 value: 31 value: 24 value: 4 value: 0 value: 84 value: 497 value: 177 value: 1151 value: 777 value: 9 value: 397 value: 552 value: 7726 value: 10051 value: 34 value: 14 value: 379 value: 33 value: 1829 value: 9 value: 123 value: 0 value: 916 value: 10028 value: 7 value: 64 value: 571 value: 12 value: 8 value: 18 value: 27 value: 687 value: 9 value: 30 value: 5609 value: 16 value: 25 value: 99 value: 117 value: 66 value: 2 value: 130 value: 21 value: 8 value: 842 value: 7726 value: 10051 value: 6 value: 338 value: 1107 value: 3 value: 24 value: 10020 value: 29 value: 53 value: 1476 } } } }
The Trainer Component
The Trainer component trains models using TensorFlow.
Create a Python module containing a trainer_fn function, which must return an estimator. If you prefer creating a Keras model, you can do so and then convert it to an estimator using keras.model_to_estimator().
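The model trained by this component minimizes a graph-regularized objective: the usual supervised loss plus a term that penalizes the distance between a sample's representation and the representations of its neighbors, scaled by a multiplier. The following is a minimal numeric sketch of that idea for a single sample. It assumes a squared L2 distance and a binary cross-entropy loss, and all function names are hypothetical. NSL's actual computation (reductions, normalization, batching) may differ.

```python
import math


def squared_l2(u, v):
  """Squared Euclidean distance between two equal-length vectors."""
  return sum((a - b) ** 2 for a, b in zip(u, v))


def binary_cross_entropy(label, probability, eps=1e-7):
  """Cross-entropy of a 0/1 label against a predicted probability."""
  p = min(max(probability, eps), 1.0 - eps)  # Clamp to avoid log(0).
  return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def graph_regularized_loss(label, probability, sample_repr, nbr_reprs,
                           nbr_weights, multiplier=0.1):
  """Supervised loss plus a weighted neighbor-distance penalty.

  The `multiplier` argument plays the role of the tutorial's
  graph_regularization_multiplier hyperparameter (an assumed correspondence).
  """
  supervised = binary_cross_entropy(label, probability)
  graph_term = sum(
      weight * squared_l2(sample_repr, nbr_repr)
      for nbr_repr, weight in zip(nbr_reprs, nbr_weights))
  return supervised + multiplier * graph_term


# One sample with a single neighbor whose representation differs from its own.
loss_without_nbrs = graph_regularized_loss(1, 0.5, [0.0, 0.0], [], [])
loss_with_nbr = graph_regularized_loss(
    1, 0.5, [0.0, 0.0], [[1.0, 1.0]], [0.5], multiplier=0.1)
print(loss_with_nbr > loss_without_nbrs)  # → True
```

The penalty grows as a sample's representation drifts away from those of its neighbors, which is what encourages similar reviews to receive similar internal representations.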
# Setup paths.
_trainer_module_file = 'imdb_trainer.py'
%%writefile {_trainer_module_file}
import neural_structured_learning as nsl
import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import schema_utils
NBR_FEATURE_PREFIX = 'NL_nbr_'
NBR_WEIGHT_SUFFIX = '_weight'
LABEL_KEY = 'label'
ID_FEATURE_KEY = 'id'
def _transformed_name(key):
  return key + '_xf'


def _transformed_names(keys):
  return [_transformed_name(key) for key in keys]


# Hyperparameters:
#
# We will use an instance of `HParams` to include various hyperparameters and
# constants used for training and evaluation. We briefly describe each of them
# below:
#
# - max_seq_length: This is the maximum number of words considered from each
#     movie review in this example.
# - vocab_size: This is the size of the vocabulary considered for this
#     example.
# - oov_size: This is the out-of-vocabulary size considered for this example.
# - distance_type: This is the distance metric used to regularize the sample
#     with its neighbors.
# - graph_regularization_multiplier: This controls the relative weight of the
#     graph regularization term in the overall loss function.
# - num_neighbors: The number of neighbors used for graph regularization. This
#     value has to be less than or equal to the `num_neighbors` argument used
#     above in the GraphAugmentation component when invoking
#     `nsl.tools.pack_nbrs`.
# - num_fc_units: The number of units in the fully connected layer of the
#     neural network.
class HParams(object):
  """Hyperparameters used for training."""

  def __init__(self):
    ### dataset parameters
    # The following 3 values should match those defined in the Transform
    # Component.
    self.max_seq_length = 100
    self.vocab_size = 10000
    self.oov_size = 100
    ### Neural Graph Learning parameters
    self.distance_type = nsl.configs.DistanceType.L2
    self.graph_regularization_multiplier = 0.1
    # The following value has to be at most the value of 'num_neighbors' used
    # in the GraphAugmentation component.
    self.num_neighbors = 1
    ### Model Architecture
    self.num_embedding_dims = 16
    self.num_fc_units = 64


HPARAMS = HParams()


def optimizer_fn():
  """Returns an instance of `tf.Optimizer`."""
  return tf.compat.v1.train.RMSPropOptimizer(
      learning_rate=0.0001, decay=1e-6)


def build_train_op(loss, global_step):
  """Builds a train op to optimize the given loss using gradient descent."""
  with tf.name_scope('train'):
    optimizer = optimizer_fn()
    train_op = optimizer.minimize(loss=loss, global_step=global_step)
  return train_op


# Building the model:
#
# A neural network is created by stacking layers—this requires two main
# architectural decisions:
# * How many layers to use in the model?
# * How many *hidden units* to use for each layer?
#
# In this example, the input data consists of an array of word-indices. The
# labels to predict are either 0 or 1. We will use a feed-forward neural network
# as our base model in this tutorial.
def feed_forward_model(features, is_training, reuse=tf.compat.v1.AUTO_REUSE):
  """Builds a simple 2 layer feed forward neural network.

  The layers are effectively stacked sequentially to build the classifier. The
  first layer is an Embedding layer, which takes the integer-encoded vocabulary
  and looks up the embedding vector for each word-index. These vectors are
  learned as the model trains. The vectors add a dimension to the output array.
  The resulting dimensions are: (batch, sequence, embedding). Next is a global
  average pooling 1D layer, which reduces the dimensionality of its inputs from
  3D to 2D. This fixed-length output vector is piped through a fully-connected
  (Dense) layer with 16 hidden units. The last layer is densely connected with a
  single output node. Using the sigmoid activation function, this value is a
  float between 0 and 1, representing a probability, or confidence level.

  Args:
    features: A dictionary containing batch features returned from the
      `input_fn`, that include sample features, corresponding neighbor features,
      and neighbor weights.
    is_training: a Python Boolean value or a Boolean scalar Tensor, indicating
      whether to apply dropout.
    reuse: a Python Boolean value for reusing variable scope.

  Returns:
    logits: Tensor of shape [batch_size, 1].
    representations: Tensor of shape [batch_size, _] for graph regularization.
      This is the representation of each example at the graph regularization
      layer.
  """

  with tf.compat.v1.variable_scope('ff', reuse=reuse):
    inputs = features[_transformed_name('text')]
    embeddings = tf.compat.v1.get_variable(
        'embeddings',
        shape=[
            HPARAMS.vocab_size + HPARAMS.oov_size, HPARAMS.num_embedding_dims
        ])
    embedding_layer = tf.nn.embedding_lookup(embeddings, inputs)

    pooling_layer = tf.compat.v1.layers.AveragePooling1D(
        pool_size=HPARAMS.max_seq_length, strides=HPARAMS.max_seq_length)(
            embedding_layer)
    # Shape of pooling_layer is now [batch_size, 1, HPARAMS.num_embedding_dims]
    pooling_layer = tf.reshape(pooling_layer, [-1, HPARAMS.num_embedding_dims])

    dense_layer = tf.compat.v1.layers.Dense(
        16, activation='relu')(
            pooling_layer)

    output_layer = tf.compat.v1.layers.Dense(
        1, activation='sigmoid')(
            dense_layer)

    # Graph regularization will be done on the penultimate (dense) layer
    # because the output layer is a single floating point number.
    return output_layer, dense_layer


# A note on hidden units:
#
# The above model has two intermediate or "hidden" layers, between the input and
# output, and excluding the Embedding layer. The number of outputs (units,
# nodes, or neurons) is the dimension of the representational space for the
# layer. In other words, the amount of freedom the network is allowed when
# learning an internal representation. If a model has more hidden units
# (a higher-dimensional representation space), and/or more layers, then the
# network can learn more complex representations. However, it makes the network
# more computationally expensive and may lead to learning unwanted
# patterns, that is, patterns that improve performance on training data but not
# on the test data. This is called overfitting.


# This function will be used to generate the embeddings for samples and their
# corresponding neighbors, which will then be used for graph regularization.
def embedding_fn(features, mode):
  """Returns the embedding corresponding to the given features.

  Args:
    features: A dictionary containing batch features returned from the
      `input_fn`, that include sample features, corresponding neighbor features,
      and neighbor weights.
    mode: Specifies if this is training, evaluation, or prediction. See
      tf.estimator.ModeKeys.

  Returns:
    The embedding that will be used for graph regularization.
  """
  is_training = (mode == tf.estimator.ModeKeys.TRAIN)
  _, embedding = feed_forward_model(features, is_training)
  return embedding
def feed_forward_model_fn(features, labels, mode, params, config):
  """Implementation of the model_fn for the base feed-forward model.

  Args:
    features: This is the first item returned from the `input_fn` passed to
      `train`, `evaluate`, and `predict`. This should be a single `Tensor` or
      `dict` of same.
    labels: This is the second item returned from the `input_fn` passed to
      `train`, `evaluate`, and `predict`. This should be a single `Tensor` or
      `dict` of same (for multi-head models). If mode is `ModeKeys.PREDICT`,
      `labels=None` will be passed. If the `model_fn`'s signature does not
      accept `mode`, the `model_fn` must still be able to handle `labels=None`.
    mode: Optional. Specifies if this is training, evaluation or prediction. See
      `ModeKeys`.
    params: The `HParams` instance passed to the Estimator through its `params`
      argument. Unused currently; the module-level `HPARAMS` is read instead.
    config: Optional configuration object. Will receive what is passed to
      Estimator in `config` parameter, or the default `config`. Allows updating
      things in your model_fn based on configuration such as `num_ps_replicas`,
      or `model_dir`. Unused currently.

  Returns:
     A `tf.estimator.EstimatorSpec` for the base feed-forward model. This does
     not include graph-based regularization.
  """
  is_training = mode == tf.estimator.ModeKeys.TRAIN

  # Build the computation graph.
  probabilities, _ = feed_forward_model(features, is_training)
  predictions = tf.round(probabilities)

  if mode == tf.estimator.ModeKeys.PREDICT:
    # labels will be None, and no loss to compute.
    cross_entropy_loss = None
    eval_metric_ops = None
  else:
    # Loss is required in train and eval modes.
    # Flatten 'probabilities' to 1-D.
    probabilities = tf.reshape(probabilities, shape=[-1])
    cross_entropy_loss = tf.compat.v1.keras.losses.binary_crossentropy(
        labels, probabilities)
    eval_metric_ops = {
        'accuracy': tf.compat.v1.metrics.accuracy(labels, predictions)
    }

  if is_training:
    global_step = tf.compat.v1.train.get_or_create_global_step()
    train_op = build_train_op(cross_entropy_loss, global_step)
  else:
    train_op = None

  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions={
          'probabilities': probabilities,
          'predictions': predictions
      },
      loss=cross_entropy_loss,
      train_op=train_op,
      eval_metric_ops=eval_metric_ops)
# tf.Transform considers these features as "raw"
def _get_raw_feature_spec(schema):
  return schema_utils.schema_as_feature_spec(schema).feature_spec


def _gzip_reader_fn(filenames):
  """Small utility returning a record reader that can read gzip'ed files."""
  return tf.data.TFRecordDataset(
      filenames,
      compression_type='GZIP')


def _example_serving_receiver_fn(tf_transform_output, schema):
  """Build the serving inputs.

  Args:
    tf_transform_output: A TFTransformOutput.
    schema: the schema of the input data.

  Returns:
    Tensorflow graph which parses examples, applying tf-transform to them.
  """
  raw_feature_spec = _get_raw_feature_spec(schema)
  raw_feature_spec.pop(LABEL_KEY)

  # We don't need the ID feature for serving.
  raw_feature_spec.pop(ID_FEATURE_KEY)

  raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
      raw_feature_spec, default_batch_size=None)
  serving_input_receiver = raw_input_fn()

  transformed_features = tf_transform_output.transform_raw_features(
      serving_input_receiver.features)

  # Even though LABEL_KEY was removed from 'raw_feature_spec', the transform
  # operation would have injected the transformed LABEL_KEY feature with a
  # default value.
  transformed_features.pop(_transformed_name(LABEL_KEY))
  return tf.estimator.export.ServingInputReceiver(
      transformed_features, serving_input_receiver.receiver_tensors)
def _eval_input_receiver_fn(tf_transform_output, schema):
  """Build everything needed for the tf-model-analysis to run the model.

  Args:
    tf_transform_output: A TFTransformOutput.
    schema: the schema of the input data.

  Returns:
    EvalInputReceiver function, which contains:
      - Tensorflow graph which parses raw untransformed features, applies the
        tf-transform preprocessing operators.
      - Set of raw, untransformed features.
      - Label against which predictions will be compared.
  """
  # Notice that the inputs are raw features, not transformed features here.
  raw_feature_spec = _get_raw_feature_spec(schema)

  # We don't need the ID feature for TFMA.
  raw_feature_spec.pop(ID_FEATURE_KEY)

  raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
      raw_feature_spec, default_batch_size=None)
  serving_input_receiver = raw_input_fn()

  transformed_features = tf_transform_output.transform_raw_features(
      serving_input_receiver.features)

  labels = transformed_features.pop(_transformed_name(LABEL_KEY))
  return tfma.export.EvalInputReceiver(
      features=transformed_features,
      receiver_tensors=serving_input_receiver.receiver_tensors,
      labels=labels)
def _augment_feature_spec(feature_spec, num_neighbors):
  """Augments `feature_spec` to include neighbor features.

  Args:
    feature_spec: Dictionary of feature keys mapping to TF feature types.
    num_neighbors: Number of neighbors to use for feature key augmentation.

  Returns:
    An augmented `feature_spec` that includes neighbor feature keys.
  """
  for i in range(num_neighbors):
    feature_spec['{}{}_{}'.format(NBR_FEATURE_PREFIX, i, 'id')] = \
        tf.io.VarLenFeature(dtype=tf.string)
    # We don't care about the neighbor features corresponding to
    # _transformed_name(LABEL_KEY) because the LABEL_KEY feature will be
    # removed from the feature spec during training/evaluation.
    feature_spec['{}{}_{}'.format(NBR_FEATURE_PREFIX, i, 'text_xf')] = \
        tf.io.FixedLenFeature(shape=[HPARAMS.max_seq_length], dtype=tf.int64,
                              default_value=tf.constant(0, dtype=tf.int64,
                                                        shape=[HPARAMS.max_seq_length]))
    # The 'NL_num_nbrs' feature is currently not used.

  # Set the neighbor weight feature keys.
  for i in range(num_neighbors):
    feature_spec['{}{}{}'.format(NBR_FEATURE_PREFIX, i, NBR_WEIGHT_SUFFIX)] = \
        tf.io.FixedLenFeature(shape=[1], dtype=tf.float32, default_value=[0.0])

  return feature_spec
def _input_fn(filenames, tf_transform_output, is_training, batch_size=200):
  """Generates features and labels for training or evaluation.

  Args:
    filenames: [str] list of gzipped TFRecord files to read data from.
    tf_transform_output: A TFTransformOutput.
    is_training: Boolean indicating if we are in training mode.
    batch_size: int First dimension size of the Tensors returned by input_fn

  Returns:
    A (features, indices) tuple where features is a dictionary of
      Tensors, and indices is a single Tensor of label indices.
  """
  transformed_feature_spec = (
      tf_transform_output.transformed_feature_spec().copy())

  # During training, NSL uses augmented training data (which includes features
  # from graph neighbors). So, update the feature spec accordingly. This needs
  # to be done because we are using different schemas for NSL training and eval,
  # but the Trainer Component only accepts a single schema.
  if is_training:
    transformed_feature_spec = _augment_feature_spec(transformed_feature_spec,
                                                     HPARAMS.num_neighbors)

  dataset = tf.data.experimental.make_batched_features_dataset(
      filenames, batch_size, transformed_feature_spec, reader=_gzip_reader_fn)

  transformed_features = tf.compat.v1.data.make_one_shot_iterator(
      dataset).get_next()
  # We pop the label because we do not want to use it as a feature while we're
  # training.
  return transformed_features, transformed_features.pop(
      _transformed_name(LABEL_KEY))
# TFX will call this function
def trainer_fn(hparams, schema):
  """Build the estimator using the high level API.

  Args:
    hparams: Holds hyperparameters used to train the model as name/value pairs.
    schema: Holds the schema of the training examples.

  Returns:
    A dict of the following:
      - estimator: The estimator that will be used for training and eval.
      - train_spec: Spec for training.
      - eval_spec: Spec for eval.
      - eval_input_receiver_fn: Input function for eval.
  """
  train_batch_size = 40
  eval_batch_size = 40

  tf_transform_output = tft.TFTransformOutput(hparams.transform_output)

  train_input_fn = lambda: _input_fn(
      hparams.train_files,
      tf_transform_output,
      is_training=True,
      batch_size=train_batch_size)

  eval_input_fn = lambda: _input_fn(
      hparams.eval_files,
      tf_transform_output,
      is_training=False,
      batch_size=eval_batch_size)

  train_spec = tf.estimator.TrainSpec(
      train_input_fn,
      max_steps=hparams.train_steps)

  serving_receiver_fn = lambda: _example_serving_receiver_fn(
      tf_transform_output, schema)

  exporter = tf.estimator.FinalExporter('imdb', serving_receiver_fn)
  eval_spec = tf.estimator.EvalSpec(
      eval_input_fn,
      steps=hparams.eval_steps,
      exporters=[exporter],
      name='imdb-eval')

  run_config = tf.estimator.RunConfig(
      save_checkpoints_steps=999, keep_checkpoint_max=1)

  run_config = run_config.replace(model_dir=hparams.serving_model_dir)

  estimator = tf.estimator.Estimator(
      model_fn=feed_forward_model_fn, config=run_config, params=HPARAMS)

  # Create a graph regularization config.
  graph_reg_config = nsl.configs.make_graph_reg_config(
      max_neighbors=HPARAMS.num_neighbors,
      multiplier=HPARAMS.graph_regularization_multiplier,
      distance_type=HPARAMS.distance_type,
      sum_over_axis=-1)

  # Invoke the Graph Regularization Estimator wrapper to incorporate
  # graph-based regularization for training.
  graph_nsl_estimator = nsl.estimator.add_graph_regularization(
      estimator,
      embedding_fn,
      optimizer_fn=optimizer_fn,
      graph_reg_config=graph_reg_config)

  # Create an input receiver for TFMA processing
  receiver_fn = lambda: _eval_input_receiver_fn(
      tf_transform_output, schema)

  return {
      'estimator': graph_nsl_estimator,
      'train_spec': train_spec,
      'eval_spec': eval_spec,
      'eval_input_receiver_fn': receiver_fn
  }
Writing imdb_trainer.py
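To make the effect of the graph regularization config more concrete, here is a minimal NumPy sketch of the kind of objective that graph regularization adds on top of the supervised loss. It is not NSL's implementation: the function name `graph_regularized_loss`, the use of a squared L2 distance, and the averaging over neighbors with nonzero weight are simplifying assumptions made for illustration only.

```python
import numpy as np


def graph_regularized_loss(supervised_loss, sample_emb, nbr_embs, nbr_weights,
                           multiplier):
  """Adds a weighted neighbor-distance penalty to a supervised loss.

  Hypothetical helper for illustration; not part of the NSL API.

  Args:
    supervised_loss: float, e.g. the cross-entropy loss of the base model.
    sample_emb: [batch, dim] representations of the labeled samples.
    nbr_embs: [batch, num_neighbors, dim] representations of their neighbors.
    nbr_weights: [batch, num_neighbors] edge weights; 0.0 marks a missing
      neighbor.
    multiplier: float, strength of the graph regularization term.
  """
  # Squared L2 distance between each sample and each of its neighbors.
  sq_dist = np.sum((nbr_embs - sample_emb[:, None, :]) ** 2, axis=-1)
  # Scale each distance by its edge weight; missing neighbors contribute 0.
  weighted = nbr_weights * sq_dist
  # Average over the neighbors that actually exist (assumed reduction).
  num_real = max(int(np.count_nonzero(nbr_weights)), 1)
  graph_loss = weighted.sum() / num_real
  return supervised_loss + multiplier * graph_loss


sample_emb = np.array([[0.0, 0.0], [1.0, 1.0]])
nbr_embs = np.array([[[0.0, 0.0], [3.0, 4.0]],
                     [[1.0, 1.0], [0.0, 0.0]]])
nbr_weights = np.array([[1.0, 0.5],
                        [1.0, 0.0]])  # the second sample has one neighbor

total = graph_regularized_loss(0.69, sample_emb, nbr_embs, nbr_weights, 0.1)
print(total)
```

The penalty is zero when every neighbor has the same representation as its sample, and it grows as the representations of connected samples drift apart, which is the behaviour the regularizer is meant to discourage.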
Create and run the Trainer component, passing it the file that we created above.
# Uses user-provided Python function that implements a model using TensorFlow's
# Estimators API.
trainer = Trainer(
    module_file=_trainer_module_file,
    custom_executor_spec=executor_spec.ExecutorClassSpec(
        trainer_executor.Executor),
    transformed_examples=graph_augmentation.outputs['augmented_examples'],
    schema=schema_gen.outputs['schema'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000))
context.run(trainer)
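The trainer module sets save_checkpoints_steps=999, and the Trainer above runs for num_steps=10000. Assuming a checkpoint is written at step 0 and then every 999 steps, with one more at the final step, the checkpoint schedule seen in the training log can be sketched in plain Python. `checkpoint_steps` is a hypothetical helper written for this illustration; it is not part of TFX or TensorFlow.

```python
def checkpoint_steps(max_steps, save_every):
  """Lists the global steps at which a periodic checkpoint would be saved."""
  steps = list(range(0, max_steps + 1, save_every))
  # Assumption: a final checkpoint is also written when training stops.
  if steps[-1] != max_steps:
    steps.append(max_steps)
  return steps


steps = checkpoint_steps(max_steps=10000, save_every=999)
print(steps[:5])
```

Evaluation is attempted after each checkpoint, but the log reports that evaluations after the first one are skipped because of the 600 second throttle, so only the checkpoint at step 999 is evaluated in this part of the run.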
WARNING:absl:`custom_executor_spec` is deprecated. Please customize component directly.
WARNING:absl:`transformed_examples` is deprecated. Please use `examples` instead.
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  setuptools.SetuptoolsDeprecationWarning,
listing git files failed - pretending there aren't any
I1204 11:44:36.000404 6839 rdbms_metadata_access_object.cc:686] No property is defined for the Type
I1204 11:44:36.003713 6839 rdbms_metadata_access_object.cc:686] No property is defined for the Type
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying imdb_trainer.py -> build/lib
copying imdb_transform.py -> build/lib
installing to /tmp/tmpyr89v7kz
running install
running install_lib
copying build/lib/imdb_trainer.py -> /tmp/tmpyr89v7kz
copying build/lib/imdb_transform.py -> /tmp/tmpyr89v7kz
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmp/tmpyr89v7kz/tfx_user_code_Trainer-0.0+b990a2c6a4f23081880867efa3bd3c38db9d7bd0a87a0c9b277ae63714defc8d-py3.7.egg-info
running install_scripts
creating /tmp/tmpyr89v7kz/tfx_user_code_Trainer-0.0+b990a2c6a4f23081880867efa3bd3c38db9d7bd0a87a0c9b277ae63714defc8d.dist-info/WHEEL
creating '/tmp/tmpl71r0gnq/tfx_user_code_Trainer-0.0+b990a2c6a4f23081880867efa3bd3c38db9d7bd0a87a0c9b277ae63714defc8d-py3-none-any.whl' and adding '/tmp/tmpyr89v7kz' to it
adding 'imdb_trainer.py'
adding 'imdb_transform.py'
adding 'tfx_user_code_Trainer-0.0+b990a2c6a4f23081880867efa3bd3c38db9d7bd0a87a0c9b277ae63714defc8d.dist-info/METADATA'
adding 'tfx_user_code_Trainer-0.0+b990a2c6a4f23081880867efa3bd3c38db9d7bd0a87a0c9b277ae63714defc8d.dist-info/WHEEL'
adding 'tfx_user_code_Trainer-0.0+b990a2c6a4f23081880867efa3bd3c38db9d7bd0a87a0c9b277ae63714defc8d.dist-info/top_level.txt'
adding 'tfx_user_code_Trainer-0.0+b990a2c6a4f23081880867efa3bd3c38db9d7bd0a87a0c9b277ae63714defc8d.dist-info/RECORD'
removing /tmp/tmpyr89v7kz
Processing /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/_wheels/tfx_user_code_Trainer-0.0+b990a2c6a4f23081880867efa3bd3c38db9d7bd0a87a0c9b277ae63714defc8d-py3-none-any.whl
Installing collected packages: tfx-user-code-Trainer
Successfully installed tfx-user-code-Trainer-0.0+b990a2c6a4f23081880867efa3bd3c38db9d7bd0a87a0c9b277ae63714defc8d
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 999, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 999 or save_checkpoints_secs None.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version. Instructions for updating: Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/rmsprop.py:123: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version. Instructions for updating: The `validate_indices` argument has no effect. Indices are always validated on CPU and never validated on GPU.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.6933001, step = 0
INFO:tensorflow:global_step/sec: 220.68
INFO:tensorflow:loss = 0.69297814, step = 100 (0.454 sec)
INFO:tensorflow:global_step/sec: 288.754
INFO:tensorflow:loss = 0.6923192, step = 200 (0.347 sec)
INFO:tensorflow:global_step/sec: 286.927
INFO:tensorflow:loss = 0.6908457, step = 300 (0.348 sec)
INFO:tensorflow:global_step/sec: 286.211
INFO:tensorflow:loss = 0.6921471, step = 400 (0.350 sec)
INFO:tensorflow:global_step/sec: 282.252
INFO:tensorflow:loss = 0.69014025, step = 500 (0.354 sec)
INFO:tensorflow:global_step/sec: 288.814
INFO:tensorflow:loss = 0.6904064, step = 600 (0.346 sec)
INFO:tensorflow:global_step/sec: 275.969
INFO:tensorflow:loss = 0.6891232, step = 700 (0.363 sec)
INFO:tensorflow:global_step/sec: 280.819
INFO:tensorflow:loss = 0.69049495, step = 800 (0.356 sec)
INFO:tensorflow:global_step/sec: 278.558
INFO:tensorflow:loss = 0.68652004, step = 900 (0.359 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 999...
INFO:tensorflow:Saving checkpoints for 999 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/saver.py:971: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to delete files with this prefix.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 999...
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-12-04T11:44:45
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt-999
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 5.56428s
INFO:tensorflow:Finished evaluation at 2021-12-04-11:44:51
INFO:tensorflow:Saving dict for global step 999: accuracy = 0.7047, global_step = 999, loss = 0.68605316
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 999: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt-999
INFO:tensorflow:global_step/sec: 16.1827
INFO:tensorflow:loss = 0.68512, step = 1000 (6.179 sec)
INFO:tensorflow:global_step/sec: 278.496
INFO:tensorflow:loss = 0.6872438, step = 1100 (0.360 sec)
INFO:tensorflow:global_step/sec: 276.552
INFO:tensorflow:loss = 0.6817854, step = 1200 (0.361 sec)
INFO:tensorflow:global_step/sec: 271.064
INFO:tensorflow:loss = 0.6696973, step = 1300 (0.369 sec)
INFO:tensorflow:global_step/sec: 275.856
INFO:tensorflow:loss = 0.6826827, step = 1400 (0.362 sec)
INFO:tensorflow:global_step/sec: 270.879
INFO:tensorflow:loss = 0.6712682, step = 1500 (0.369 sec)
INFO:tensorflow:global_step/sec: 277.073
INFO:tensorflow:loss = 0.67981917, step = 1600 (0.361 sec)
INFO:tensorflow:global_step/sec: 270.234
INFO:tensorflow:loss = 0.67373323, step = 1700 (0.370 sec)
INFO:tensorflow:global_step/sec: 279.658
INFO:tensorflow:loss = 0.66337496, step = 1800 (0.358 sec)
INFO:tensorflow:global_step/sec: 279.271
INFO:tensorflow:loss = 0.6738259, step = 1900 (0.358 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1998...
INFO:tensorflow:Saving checkpoints for 1998 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1998...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 237.544
INFO:tensorflow:loss = 0.66583055, step = 2000 (0.421 sec)
INFO:tensorflow:global_step/sec: 277.133
INFO:tensorflow:loss = 0.6637004, step = 2100 (0.361 sec)
INFO:tensorflow:global_step/sec: 272.248
INFO:tensorflow:loss = 0.6696273, step = 2200 (0.367 sec)
INFO:tensorflow:global_step/sec: 277.247
INFO:tensorflow:loss = 0.6513475, step = 2300 (0.361 sec)
INFO:tensorflow:global_step/sec: 276.598
INFO:tensorflow:loss = 0.6662655, step = 2400 (0.362 sec)
INFO:tensorflow:global_step/sec: 272.004
INFO:tensorflow:loss = 0.6493275, step = 2500 (0.368 sec)
INFO:tensorflow:global_step/sec: 279.613
INFO:tensorflow:loss = 0.64058864, step = 2600 (0.358 sec)
INFO:tensorflow:global_step/sec: 279.725
INFO:tensorflow:loss = 0.6401115, step = 2700 (0.357 sec)
INFO:tensorflow:global_step/sec: 275.868
INFO:tensorflow:loss = 0.66073626, step = 2800 (0.363 sec)
INFO:tensorflow:global_step/sec: 279.9
INFO:tensorflow:loss = 0.61275744, step = 2900 (0.357 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 2997...
INFO:tensorflow:Saving checkpoints for 2997 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 2997...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 239.223
INFO:tensorflow:loss = 0.6508343, step = 3000 (0.418 sec)
INFO:tensorflow:global_step/sec: 278.547
INFO:tensorflow:loss = 0.65112776, step = 3100 (0.359 sec)
INFO:tensorflow:global_step/sec: 279.487
INFO:tensorflow:loss = 0.63657844, step = 3200 (0.358 sec)
INFO:tensorflow:global_step/sec: 277.617
INFO:tensorflow:loss = 0.6216135, step = 3300 (0.360 sec)
INFO:tensorflow:global_step/sec: 279.256
INFO:tensorflow:loss = 0.64972967, step = 3400 (0.358 sec)
INFO:tensorflow:global_step/sec: 281.028
INFO:tensorflow:loss = 0.6309604, step = 3500 (0.356 sec)
INFO:tensorflow:global_step/sec: 282.144
INFO:tensorflow:loss = 0.59252113, step = 3600 (0.355 sec)
INFO:tensorflow:global_step/sec: 275.802
INFO:tensorflow:loss = 0.5944205, step = 3700 (0.363 sec)
INFO:tensorflow:global_step/sec: 273.658
INFO:tensorflow:loss = 0.63925326, step = 3800 (0.365 sec)
INFO:tensorflow:global_step/sec: 274.902
INFO:tensorflow:loss = 0.6255677, step = 3900 (0.365 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3996...
INFO:tensorflow:Saving checkpoints for 3996 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3996...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 235.472
INFO:tensorflow:loss = 0.5732498, step = 4000 (0.424 sec)
INFO:tensorflow:global_step/sec: 277.885
INFO:tensorflow:loss = 0.59263897, step = 4100 (0.360 sec)
INFO:tensorflow:global_step/sec: 272.498
INFO:tensorflow:loss = 0.6244205, step = 4200 (0.367 sec)
INFO:tensorflow:global_step/sec: 273.911
INFO:tensorflow:loss = 0.5709779, step = 4300 (0.365 sec)
INFO:tensorflow:global_step/sec: 272.385
INFO:tensorflow:loss = 0.57497543, step = 4400 (0.367 sec)
INFO:tensorflow:global_step/sec: 277.073
INFO:tensorflow:loss = 0.62753403, step = 4500 (0.361 sec)
INFO:tensorflow:global_step/sec: 279.972 INFO:tensorflow:global_step/sec: 279.972 INFO:tensorflow:loss = 0.5253285, step = 4600 (0.357 sec) INFO:tensorflow:loss = 0.5253285, step = 4600 (0.357 sec) INFO:tensorflow:global_step/sec: 283.916 INFO:tensorflow:global_step/sec: 283.916 INFO:tensorflow:loss = 0.5570012, step = 4700 (0.353 sec) INFO:tensorflow:loss = 0.5570012, step = 4700 (0.353 sec) INFO:tensorflow:global_step/sec: 286.699 INFO:tensorflow:global_step/sec: 286.699 INFO:tensorflow:loss = 0.54549825, step = 4800 (0.348 sec) INFO:tensorflow:loss = 0.54549825, step = 4800 (0.348 sec) INFO:tensorflow:global_step/sec: 287.171 INFO:tensorflow:global_step/sec: 287.171 INFO:tensorflow:loss = 0.58005756, step = 4900 (0.348 sec) INFO:tensorflow:loss = 0.58005756, step = 4900 (0.348 sec) INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4995... INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4995... INFO:tensorflow:Saving checkpoints for 4995 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Saving checkpoints for 4995 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4995... INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4995... INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). 
INFO:tensorflow:global_step/sec: 246.035 INFO:tensorflow:global_step/sec: 246.035 INFO:tensorflow:loss = 0.55126476, step = 5000 (0.406 sec) INFO:tensorflow:loss = 0.55126476, step = 5000 (0.406 sec) INFO:tensorflow:global_step/sec: 286.048 INFO:tensorflow:global_step/sec: 286.048 INFO:tensorflow:loss = 0.5440348, step = 5100 (0.350 sec) INFO:tensorflow:loss = 0.5440348, step = 5100 (0.350 sec) INFO:tensorflow:global_step/sec: 288.158 INFO:tensorflow:global_step/sec: 288.158 INFO:tensorflow:loss = 0.530152, step = 5200 (0.347 sec) INFO:tensorflow:loss = 0.530152, step = 5200 (0.347 sec) INFO:tensorflow:global_step/sec: 282.667 INFO:tensorflow:global_step/sec: 282.667 INFO:tensorflow:loss = 0.61745214, step = 5300 (0.354 sec) INFO:tensorflow:loss = 0.61745214, step = 5300 (0.354 sec) INFO:tensorflow:global_step/sec: 283.025 INFO:tensorflow:global_step/sec: 283.025 INFO:tensorflow:loss = 0.5531441, step = 5400 (0.354 sec) INFO:tensorflow:loss = 0.5531441, step = 5400 (0.354 sec) INFO:tensorflow:global_step/sec: 284.596 INFO:tensorflow:global_step/sec: 284.596 INFO:tensorflow:loss = 0.55586976, step = 5500 (0.351 sec) INFO:tensorflow:loss = 0.55586976, step = 5500 (0.351 sec) INFO:tensorflow:global_step/sec: 283.212 INFO:tensorflow:global_step/sec: 283.212 INFO:tensorflow:loss = 0.5627943, step = 5600 (0.353 sec) INFO:tensorflow:loss = 0.5627943, step = 5600 (0.353 sec) INFO:tensorflow:global_step/sec: 281.121 INFO:tensorflow:global_step/sec: 281.121 INFO:tensorflow:loss = 0.45171082, step = 5700 (0.356 sec) INFO:tensorflow:loss = 0.45171082, step = 5700 (0.356 sec) INFO:tensorflow:global_step/sec: 281.568 INFO:tensorflow:global_step/sec: 281.568 INFO:tensorflow:loss = 0.51796657, step = 5800 (0.355 sec) INFO:tensorflow:loss = 0.51796657, step = 5800 (0.355 sec) INFO:tensorflow:global_step/sec: 272.14 INFO:tensorflow:global_step/sec: 272.14 INFO:tensorflow:loss = 0.570162, step = 5900 (0.368 sec) INFO:tensorflow:loss = 0.570162, step = 5900 (0.368 sec) 
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 5994... INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 5994... INFO:tensorflow:Saving checkpoints for 5994 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Saving checkpoints for 5994 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 5994... INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 5994... INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:global_step/sec: 234.957 INFO:tensorflow:global_step/sec: 234.957 INFO:tensorflow:loss = 0.5400977, step = 6000 (0.425 sec) INFO:tensorflow:loss = 0.5400977, step = 6000 (0.425 sec) INFO:tensorflow:global_step/sec: 259.23 INFO:tensorflow:global_step/sec: 259.23 INFO:tensorflow:loss = 0.4981569, step = 6100 (0.386 sec) INFO:tensorflow:loss = 0.4981569, step = 6100 (0.386 sec) INFO:tensorflow:global_step/sec: 267.697 INFO:tensorflow:global_step/sec: 267.697 INFO:tensorflow:loss = 0.5613683, step = 6200 (0.373 sec) INFO:tensorflow:loss = 0.5613683, step = 6200 (0.373 sec) INFO:tensorflow:global_step/sec: 266.623 INFO:tensorflow:global_step/sec: 266.623 INFO:tensorflow:loss = 0.48216385, step = 6300 (0.375 sec) INFO:tensorflow:loss = 0.48216385, step = 6300 (0.375 sec) INFO:tensorflow:global_step/sec: 266.123 INFO:tensorflow:global_step/sec: 266.123 INFO:tensorflow:loss = 0.4599746, step = 6400 (0.376 sec) INFO:tensorflow:loss = 0.4599746, step = 6400 (0.376 sec) INFO:tensorflow:global_step/sec: 269.688 INFO:tensorflow:global_step/sec: 269.688 INFO:tensorflow:loss = 0.4796008, step = 6500 (0.371 sec) INFO:tensorflow:loss = 0.4796008, step = 6500 (0.371 sec) 
INFO:tensorflow:global_step/sec: 258.906 INFO:tensorflow:global_step/sec: 258.906 INFO:tensorflow:loss = 0.5626136, step = 6600 (0.386 sec) INFO:tensorflow:loss = 0.5626136, step = 6600 (0.386 sec) INFO:tensorflow:global_step/sec: 261.596 INFO:tensorflow:global_step/sec: 261.596 INFO:tensorflow:loss = 0.5001174, step = 6700 (0.382 sec) INFO:tensorflow:loss = 0.5001174, step = 6700 (0.382 sec) INFO:tensorflow:global_step/sec: 266.467 INFO:tensorflow:global_step/sec: 266.467 INFO:tensorflow:loss = 0.44604325, step = 6800 (0.376 sec) INFO:tensorflow:loss = 0.44604325, step = 6800 (0.376 sec) INFO:tensorflow:global_step/sec: 267.785 INFO:tensorflow:global_step/sec: 267.785 INFO:tensorflow:loss = 0.4936733, step = 6900 (0.373 sec) INFO:tensorflow:loss = 0.4936733, step = 6900 (0.373 sec) INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 6993... INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 6993... INFO:tensorflow:Saving checkpoints for 6993 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Saving checkpoints for 6993 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 6993... INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 6993... INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). 
INFO:tensorflow:global_step/sec: 231.159 INFO:tensorflow:global_step/sec: 231.159 INFO:tensorflow:loss = 0.44407076, step = 7000 (0.433 sec) INFO:tensorflow:loss = 0.44407076, step = 7000 (0.433 sec) INFO:tensorflow:global_step/sec: 259.935 INFO:tensorflow:global_step/sec: 259.935 INFO:tensorflow:loss = 0.4649738, step = 7100 (0.385 sec) INFO:tensorflow:loss = 0.4649738, step = 7100 (0.385 sec) INFO:tensorflow:global_step/sec: 261.497 INFO:tensorflow:global_step/sec: 261.497 INFO:tensorflow:loss = 0.48575532, step = 7200 (0.382 sec) INFO:tensorflow:loss = 0.48575532, step = 7200 (0.382 sec) INFO:tensorflow:global_step/sec: 264.401 INFO:tensorflow:global_step/sec: 264.401 INFO:tensorflow:loss = 0.5566124, step = 7300 (0.378 sec) INFO:tensorflow:loss = 0.5566124, step = 7300 (0.378 sec) INFO:tensorflow:global_step/sec: 263.189 INFO:tensorflow:global_step/sec: 263.189 INFO:tensorflow:loss = 0.485472, step = 7400 (0.380 sec) INFO:tensorflow:loss = 0.485472, step = 7400 (0.380 sec) INFO:tensorflow:global_step/sec: 262.158 INFO:tensorflow:global_step/sec: 262.158 INFO:tensorflow:loss = 0.39120063, step = 7500 (0.381 sec) INFO:tensorflow:loss = 0.39120063, step = 7500 (0.381 sec) INFO:tensorflow:global_step/sec: 266.983 INFO:tensorflow:global_step/sec: 266.983 INFO:tensorflow:loss = 0.35777277, step = 7600 (0.374 sec) INFO:tensorflow:loss = 0.35777277, step = 7600 (0.374 sec) INFO:tensorflow:global_step/sec: 267.642 INFO:tensorflow:global_step/sec: 267.642 INFO:tensorflow:loss = 0.5350034, step = 7700 (0.374 sec) INFO:tensorflow:loss = 0.5350034, step = 7700 (0.374 sec) INFO:tensorflow:global_step/sec: 269.459 INFO:tensorflow:global_step/sec: 269.459 INFO:tensorflow:loss = 0.42015103, step = 7800 (0.371 sec) INFO:tensorflow:loss = 0.42015103, step = 7800 (0.371 sec) INFO:tensorflow:global_step/sec: 267.026 INFO:tensorflow:global_step/sec: 267.026 INFO:tensorflow:loss = 0.54285204, step = 7900 (0.375 sec) INFO:tensorflow:loss = 0.54285204, step = 7900 (0.375 sec) 
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 7992... INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 7992... INFO:tensorflow:Saving checkpoints for 7992 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Saving checkpoints for 7992 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 7992... INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 7992... INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:global_step/sec: 226.498 INFO:tensorflow:global_step/sec: 226.498 INFO:tensorflow:loss = 0.36296645, step = 8000 (0.441 sec) INFO:tensorflow:loss = 0.36296645, step = 8000 (0.441 sec) INFO:tensorflow:global_step/sec: 262.545 INFO:tensorflow:global_step/sec: 262.545 INFO:tensorflow:loss = 0.5328135, step = 8100 (0.381 sec) INFO:tensorflow:loss = 0.5328135, step = 8100 (0.381 sec) INFO:tensorflow:global_step/sec: 264.823 INFO:tensorflow:global_step/sec: 264.823 INFO:tensorflow:loss = 0.42400876, step = 8200 (0.377 sec) INFO:tensorflow:loss = 0.42400876, step = 8200 (0.377 sec) INFO:tensorflow:global_step/sec: 270.946 INFO:tensorflow:global_step/sec: 270.946 INFO:tensorflow:loss = 0.4334933, step = 8300 (0.369 sec) INFO:tensorflow:loss = 0.4334933, step = 8300 (0.369 sec) INFO:tensorflow:global_step/sec: 271.252 INFO:tensorflow:global_step/sec: 271.252 INFO:tensorflow:loss = 0.44592458, step = 8400 (0.369 sec) INFO:tensorflow:loss = 0.44592458, step = 8400 (0.369 sec) INFO:tensorflow:global_step/sec: 272.492 INFO:tensorflow:global_step/sec: 272.492 INFO:tensorflow:loss = 0.44213057, step = 8500 (0.367 sec) INFO:tensorflow:loss = 0.44213057, step = 8500 (0.367 sec) 
INFO:tensorflow:global_step/sec: 273.226 INFO:tensorflow:global_step/sec: 273.226 INFO:tensorflow:loss = 0.46779203, step = 8600 (0.366 sec) INFO:tensorflow:loss = 0.46779203, step = 8600 (0.366 sec) INFO:tensorflow:global_step/sec: 261.518 INFO:tensorflow:global_step/sec: 261.518 INFO:tensorflow:loss = 0.5460389, step = 8700 (0.382 sec) INFO:tensorflow:loss = 0.5460389, step = 8700 (0.382 sec) INFO:tensorflow:global_step/sec: 277.202 INFO:tensorflow:global_step/sec: 277.202 INFO:tensorflow:loss = 0.5019726, step = 8800 (0.361 sec) INFO:tensorflow:loss = 0.5019726, step = 8800 (0.361 sec) INFO:tensorflow:global_step/sec: 276.724 INFO:tensorflow:global_step/sec: 276.724 INFO:tensorflow:loss = 0.45209432, step = 8900 (0.361 sec) INFO:tensorflow:loss = 0.45209432, step = 8900 (0.361 sec) INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 8991... INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 8991... INFO:tensorflow:Saving checkpoints for 8991 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Saving checkpoints for 8991 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 8991... INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 8991... INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). 
INFO:tensorflow:global_step/sec: 226.305 INFO:tensorflow:global_step/sec: 226.305 INFO:tensorflow:loss = 0.34912163, step = 9000 (0.442 sec) INFO:tensorflow:loss = 0.34912163, step = 9000 (0.442 sec) INFO:tensorflow:global_step/sec: 271.186 INFO:tensorflow:global_step/sec: 271.186 INFO:tensorflow:loss = 0.5445255, step = 9100 (0.369 sec) INFO:tensorflow:loss = 0.5445255, step = 9100 (0.369 sec) INFO:tensorflow:global_step/sec: 267.761 INFO:tensorflow:global_step/sec: 267.761 INFO:tensorflow:loss = 0.35654712, step = 9200 (0.373 sec) INFO:tensorflow:loss = 0.35654712, step = 9200 (0.373 sec) INFO:tensorflow:global_step/sec: 262.439 INFO:tensorflow:global_step/sec: 262.439 INFO:tensorflow:loss = 0.42294815, step = 9300 (0.381 sec) INFO:tensorflow:loss = 0.42294815, step = 9300 (0.381 sec) INFO:tensorflow:global_step/sec: 262.881 INFO:tensorflow:global_step/sec: 262.881 INFO:tensorflow:loss = 0.45307142, step = 9400 (0.380 sec) INFO:tensorflow:loss = 0.45307142, step = 9400 (0.380 sec) INFO:tensorflow:global_step/sec: 264.643 INFO:tensorflow:global_step/sec: 264.643 INFO:tensorflow:loss = 0.43050554, step = 9500 (0.378 sec) INFO:tensorflow:loss = 0.43050554, step = 9500 (0.378 sec) INFO:tensorflow:global_step/sec: 270.757 INFO:tensorflow:global_step/sec: 270.757 INFO:tensorflow:loss = 0.40443382, step = 9600 (0.369 sec) INFO:tensorflow:loss = 0.40443382, step = 9600 (0.369 sec) INFO:tensorflow:global_step/sec: 268.755 INFO:tensorflow:global_step/sec: 268.755 INFO:tensorflow:loss = 0.37255523, step = 9700 (0.372 sec) INFO:tensorflow:loss = 0.37255523, step = 9700 (0.372 sec) INFO:tensorflow:global_step/sec: 264.603 INFO:tensorflow:global_step/sec: 264.603 INFO:tensorflow:loss = 0.4721123, step = 9800 (0.378 sec) INFO:tensorflow:loss = 0.4721123, step = 9800 (0.378 sec) INFO:tensorflow:global_step/sec: 273.682 INFO:tensorflow:global_step/sec: 273.682 INFO:tensorflow:loss = 0.52799636, step = 9900 (0.365 sec) INFO:tensorflow:loss = 0.52799636, step = 9900 (0.365 sec) 
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 9990... INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 9990... INFO:tensorflow:Saving checkpoints for 9990 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Saving checkpoints for 9990 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 9990... INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 9990... INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 10000... INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 10000... INFO:tensorflow:Saving checkpoints for 10000 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Saving checkpoints for 10000 into /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 10000... INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 10000... INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs). INFO:tensorflow:Calling model_fn. INFO:tensorflow:Calling model_fn. INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Starting evaluation at 2021-12-04T11:45:25 INFO:tensorflow:Starting evaluation at 2021-12-04T11:45:25 INFO:tensorflow:Graph was finalized. INFO:tensorflow:Graph was finalized. 
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt-10000 INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt-10000 INFO:tensorflow:Running local_init_op. INFO:tensorflow:Running local_init_op. INFO:tensorflow:Done running local_init_op. INFO:tensorflow:Done running local_init_op. INFO:tensorflow:Evaluation [500/5000] INFO:tensorflow:Evaluation [500/5000] INFO:tensorflow:Evaluation [1000/5000] INFO:tensorflow:Evaluation [1000/5000] INFO:tensorflow:Evaluation [1500/5000] INFO:tensorflow:Evaluation [1500/5000] INFO:tensorflow:Evaluation [2000/5000] INFO:tensorflow:Evaluation [2000/5000] INFO:tensorflow:Evaluation [2500/5000] INFO:tensorflow:Evaluation [2500/5000] INFO:tensorflow:Evaluation [3000/5000] INFO:tensorflow:Evaluation [3000/5000] INFO:tensorflow:Evaluation [3500/5000] INFO:tensorflow:Evaluation [3500/5000] INFO:tensorflow:Evaluation [4000/5000] INFO:tensorflow:Evaluation [4000/5000] INFO:tensorflow:Evaluation [4500/5000] INFO:tensorflow:Evaluation [4500/5000] INFO:tensorflow:Evaluation [5000/5000] INFO:tensorflow:Evaluation [5000/5000] INFO:tensorflow:Inference Time : 5.60779s INFO:tensorflow:Inference Time : 5.60779s INFO:tensorflow:Finished evaluation at 2021-12-04-11:45:30 INFO:tensorflow:Finished evaluation at 2021-12-04-11:45:30 INFO:tensorflow:Saving dict for global step 10000: accuracy = 0.8008, global_step = 10000, loss = 0.4497029 INFO:tensorflow:Saving dict for global step 10000: accuracy = 0.8008, global_step = 10000, loss = 0.4497029 INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt-10000 INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: 
/tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt-10000 INFO:tensorflow:Performing the final export in the end of training. INFO:tensorflow:Performing the final export in the end of training. WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled. WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled. INFO:tensorflow:Calling model_fn. INFO:tensorflow:Calling model_fn. INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Done calling model_fn. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version. Instructions for updating: This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version. Instructions for updating: This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info. 
INFO:tensorflow:Signatures INCLUDED in export for Classify: None INFO:tensorflow:Signatures INCLUDED in export for Classify: None INFO:tensorflow:Signatures INCLUDED in export for Regress: None INFO:tensorflow:Signatures INCLUDED in export for Regress: None INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default'] INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default'] INFO:tensorflow:Signatures INCLUDED in export for Train: None INFO:tensorflow:Signatures INCLUDED in export for Train: None INFO:tensorflow:Signatures INCLUDED in export for Eval: None INFO:tensorflow:Signatures INCLUDED in export for Eval: None INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt-10000 INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt-10000 INFO:tensorflow:Assets added to graph. INFO:tensorflow:Assets added to graph. INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/export/imdb/temp-1638618330/assets INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/export/imdb/temp-1638618330/assets INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/export/imdb/temp-1638618330/saved_model.pb INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/export/imdb/temp-1638618330/saved_model.pb INFO:tensorflow:Loss for final step: 0.43356365. INFO:tensorflow:Loss for final step: 0.43356365. WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled. WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled. INFO:tensorflow:Calling model_fn. 
INFO:tensorflow:Calling model_fn. INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Signatures INCLUDED in export for Classify: None INFO:tensorflow:Signatures INCLUDED in export for Classify: None INFO:tensorflow:Signatures INCLUDED in export for Regress: None INFO:tensorflow:Signatures INCLUDED in export for Regress: None INFO:tensorflow:Signatures INCLUDED in export for Predict: None INFO:tensorflow:Signatures INCLUDED in export for Predict: None INFO:tensorflow:Signatures INCLUDED in export for Train: None INFO:tensorflow:Signatures INCLUDED in export for Train: None INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval'] INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval'] WARNING:tensorflow:Export includes no default signature! WARNING:tensorflow:Export includes no default signature! INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt-10000 INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-Serving/model.ckpt-10000 INFO:tensorflow:Assets added to graph. INFO:tensorflow:Assets added to graph. INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-TFMA/temp-1638618332/assets INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-TFMA/temp-1638618332/assets INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-TFMA/temp-1638618332/saved_model.pb INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2021-12-04T11_41_51.482724-py59cet9/Trainer/model_run/9/Format-TFMA/temp-1638618332/saved_model.pb WARNING:absl:Support for estimator-based executor and model export will be deprecated soon. 
Please use export structure <ModelExportPath>/serving_model_dir/saved_model.pb" WARNING:absl:Support for estimator-based executor and model export will be deprecated soon. Please use export structure <ModelExportPath>/eval_model_dir/saved_model.pb"
Take a look at the trained model that was exported from the Trainer.
train_uri = trainer.outputs['model'].get()[0].uri
serving_model_path = os.path.join(train_uri, 'Format-Serving')
exported_model = tf.saved_model.load(serving_model_path)
exported_model.graph.get_operations()[:10] + ["..."]
[<tf.Operation 'global_step/Initializer/zeros' type=Const>, <tf.Operation 'global_step' type=VarHandleOp>, <tf.Operation 'global_step/IsInitialized/VarIsInitializedOp' type=VarIsInitializedOp>, <tf.Operation 'global_step/Assign' type=AssignVariableOp>, <tf.Operation 'global_step/Read/ReadVariableOp' type=ReadVariableOp>, <tf.Operation 'input_example_tensor' type=Placeholder>, <tf.Operation 'ParseExample/ParseExampleV2/names' type=Const>, <tf.Operation 'ParseExample/ParseExampleV2/sparse_keys' type=Const>, <tf.Operation 'ParseExample/ParseExampleV2/dense_keys' type=Const>, <tf.Operation 'ParseExample/ParseExampleV2/ragged_keys' type=Const>, '...']
Let's visualize the model's metrics using TensorBoard.
#docs_infra: no_execute
# Get the URI of the output artifact representing the training logs,
# which is a directory
model_run_dir = trainer.outputs['model_run'].get()[0].uri
%load_ext tensorboard
%tensorboard --logdir {model_run_dir}
Model Serving
Graph regularization affects only the training workflow, by adding a regularization term to the loss function. As a result, the model evaluation and serving workflows remain unchanged. For the same reason, we have also omitted the downstream TFX components that typically come after the Trainer component, such as the Evaluator and the Pusher.
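To make the idea of "adding a regularization term to the loss function" concrete, here is a minimal, framework-free sketch. It is not the NSL implementation: the function names, the squared L2 distance, the weighted averaging over edges, and the `multiplier` value are all assumptions chosen for illustration. It only shows why the term matters at training time and is absent at evaluation and serving time, where no neighbors are supplied.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def graph_regularized_loss(supervised_loss, embeddings, neighbors,
                           multiplier=0.1):
    """Hypothetical helper: supervised loss plus a neighbor-distance penalty.

    embeddings: one vector per sample in the batch.
    neighbors:  for each sample, a list of (neighbor_vector, edge_weight).
    multiplier: assumed scaling factor for the graph term.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for emb, nbrs in zip(embeddings, neighbors):
        for nbr_emb, weight in nbrs:
            weighted_sum += weight * squared_l2(emb, nbr_emb)
            total_weight += weight
    if total_weight == 0:
        # No neighbors (as in evaluation or serving): the loss is unchanged.
        return supervised_loss
    return supervised_loss + multiplier * weighted_sum / total_weight


# Two samples, one neighbor each; the second neighbor coincides with its sample.
embeddings = [[0.0, 0.0], [1.0, 1.0]]
neighbors = [[([1.0, 0.0], 1.0)], [([1.0, 1.0], 1.0)]]
total = graph_regularized_loss(0.5, embeddings, neighbors, multiplier=0.1)
no_graph = graph_regularized_loss(0.5, embeddings, [[], []])
```

With an empty neighbor list the function returns the supervised loss untouched, which mirrors why the evaluation and serving paths need no graph-specific handling.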
Conclusion
We have demonstrated the use of graph regularization with the Neural Structured Learning (NSL) framework in a TFX pipeline, even when the input does not contain an explicit graph. We considered the task of sentiment classification of IMDB movie reviews, for which we synthesized a similarity graph based on review embeddings. We encourage users to experiment further by using different embeddings for graph construction, varying the hyperparameters, changing the amount of supervision, and defining different model architectures.