
Preprocessing data with TensorFlow Transform

The feature engineering component of TensorFlow Extended (TFX)

This example colab notebook provides a somewhat more advanced example of how TensorFlow Transform (tf.Transform) can be used to preprocess data using exactly the same code both for training a model and for serving inferences in production.

TensorFlow Transform is a library for preprocessing input data for TensorFlow, including creating features that require a full pass over the training dataset. For example, using TensorFlow Transform you could:

  • Normalize an input value by using the mean and standard deviation
  • Convert strings to integers by generating a vocabulary over all of the input values
  • Convert floats to integers by assigning them to buckets, based on the observed data distribution

TensorFlow has built-in support for manipulations on a single example or a batch of examples. tf.Transform extends those capabilities to support full passes over the entire training dataset.
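As a plain-Python sketch (not the tf.Transform API), the difference looks like this: a full-pass "analyzer" runs once over the whole training set to produce constants, and the per-example transform then uses only those constants.

```python
# Illustrative sketch in plain Python, not the tf.Transform API: a
# full-pass "analyzer" computes dataset-wide constants once, while the
# per-example "transform" runs for every example.

def analyze(dataset):
    # One full pass over the training data (the analyzer phase).
    n = len(dataset)
    mean = sum(dataset) / n
    variance = sum((x - mean) ** 2 for x in dataset) / n
    return mean, variance ** 0.5

def transform(x, mean, std):
    # Per-example op: uses only the constants produced by the analyzer.
    return (x - mean) / std

train = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, std = analyze(train)                         # runs once
scaled = [transform(x, mean, std) for x in train]  # runs per example
```

At serving time the same `transform` is applied with the constants computed during training, which is why tf.Transform bakes analyzer results into the exported graph as constants.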

The output of tf.Transform is exported as a TensorFlow graph which you can use for both training and serving. Using the same graph for both training and serving can prevent skew, since the same transformations are applied in both stages.

What we're doing in this example

In this example we'll be processing a widely used dataset that contains census data, and training a model to do classification. Along the way we'll be transforming the data using tf.Transform.

Upgrade Pip

To avoid upgrading Pip in a system when running locally, check to make sure that we're running in Colab. Local systems can of course be upgraded separately.

try:
  import colab
  !pip install --upgrade pip
except ImportError:
  pass

Install TensorFlow Transform

pip install tensorflow-transform

Python check, imports, and globals

First we'll make sure that we're using Python 3, and then import the things we need.

import sys

# Confirm that we're using Python 3
assert sys.version_info.major == 3, 'Oops, not running Python 3. Use Runtime > Change runtime type'
import math
import os
import pprint
import tempfile  # used below by transform_data

import tensorflow as tf
print('TF: {}'.format(tf.__version__))

import apache_beam as beam
print('Beam: {}'.format(beam.__version__))

import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
print('Transform: {}'.format(tft.__version__))

from tfx_bsl.public import tfxio
from tfx_bsl.coders.example_coder import RecordBatchToExamples

!wget https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.data
!wget https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.test

train = './adult.data'
test = './adult.test'
TF: 2.6.2
Beam: 2.33.0
Transform: 1.3.0
--2021-11-09 11:18:34--  https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.data
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.8.128, 74.125.204.128, 64.233.189.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.8.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3974305 (3.8M) [application/octet-stream]
Saving to: ‘adult.data’

adult.data          100%[===================>]   3.79M  --.-KB/s    in 0.03s   

2021-11-09 11:18:34 (122 MB/s) - ‘adult.data’ saved [3974305/3974305]

--2021-11-09 11:18:34--  https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.test
Resolving storage.googleapis.com (storage.googleapis.com)... 108.177.125.128, 64.233.189.128, 74.125.204.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|108.177.125.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2003153 (1.9M) [application/octet-stream]
Saving to: ‘adult.test’

adult.test          100%[===================>]   1.91M  --.-KB/s    in 0.02s   

2021-11-09 11:18:34 (109 MB/s) - ‘adult.test’ saved [2003153/2003153]

Name our columns

We'll create some handy lists for referencing the columns in our dataset.

CATEGORICAL_FEATURE_KEYS = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country',
]
NUMERIC_FEATURE_KEYS = [
    'age',
    'capital-gain',
    'capital-loss',
    'hours-per-week',
]
OPTIONAL_NUMERIC_FEATURE_KEYS = [
    'education-num',
]
ORDERED_CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education-num',
    'marital-status', 'occupation', 'relationship', 'race', 'sex',
    'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'label'
]
LABEL_KEY = 'label'

Define our features and schema

Let's define a schema based on what types the columns are in our input. Among other things, this will help with importing them correctly.

RAW_DATA_FEATURE_SPEC = dict(
    [(name, tf.io.FixedLenFeature([], tf.string))
     for name in CATEGORICAL_FEATURE_KEYS] +
    [(name, tf.io.FixedLenFeature([], tf.float32))
     for name in NUMERIC_FEATURE_KEYS] +
    [(name, tf.io.VarLenFeature(tf.float32))
     for name in OPTIONAL_NUMERIC_FEATURE_KEYS] +
    [(LABEL_KEY, tf.io.FixedLenFeature([], tf.string))]
)

SCHEMA = tft.tf_metadata.dataset_metadata.DatasetMetadata(
    tft.tf_metadata.schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC)).schema

Setting hyperparameters and basic housekeeping

Constants and hyperparameters used for training. The bucket size includes all the categories listed in the dataset description, plus one extra for "?" which represents unknown.

testing = os.getenv("WEB_TEST_BROWSER", False)
NUM_OOV_BUCKETS = 1
if testing:
  TRAIN_NUM_EPOCHS = 1
  NUM_TRAIN_INSTANCES = 1
  TRAIN_BATCH_SIZE = 1
  NUM_TEST_INSTANCES = 1
else:
  TRAIN_NUM_EPOCHS = 16
  NUM_TRAIN_INSTANCES = 32561
  TRAIN_BATCH_SIZE = 128
  NUM_TEST_INSTANCES = 16281

# Names of temp files
TRANSFORMED_TRAIN_DATA_FILEBASE = 'train_transformed'
TRANSFORMED_TEST_DATA_FILEBASE = 'test_transformed'
EXPORTED_MODEL_DIR = 'exported_model_dir'
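The out-of-vocabulary bucketing described above can be sketched in plain Python. This mirrors the idea behind `tft.compute_and_apply_vocabulary` with `num_oov_buckets=1` (used later in this notebook); the category names here are hypothetical:

```python
# Plain-Python sketch of vocabulary lookup with one out-of-vocabulary
# (OOV) bucket: known strings map to their vocabulary index, any unseen
# string (such as '?') falls into the extra bucket after the vocabulary.

NUM_OOV_BUCKETS = 1
vocab = ['Private', 'Self-emp', 'Federal-gov']  # hypothetical vocabulary

def lookup(token):
    if token in vocab:
        return vocab.index(token)
    # All unseen tokens share the OOV bucket(s) placed after the
    # vocabulary, hence the total bucket size is len(vocab) + 1 here.
    return len(vocab) + hash(token) % NUM_OOV_BUCKETS

ids = [lookup(t) for t in ['Private', 'Federal-gov', '?']]  # [0, 2, 3]
```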

Preprocessing with tf.Transform

Create a tf.Transform preprocessing_fn

The preprocessing function is the most important concept of tf.Transform. A preprocessing function is where the transformation of the dataset really happens. It accepts and returns a dictionary of tensors, where a tensor means a Tensor or SparseTensor. There are two main groups of API calls that typically form the heart of a preprocessing function:

  1. TensorFlow Ops: Any function that accepts and returns tensors, which usually means TensorFlow ops. These add TensorFlow operations to the graph that transforms raw data into transformed data one feature vector at a time. They will run for every example, during both training and serving.
  2. TensorFlow Transform Analyzers: Any of the analyzers provided by tf.Transform. Analyzers also accept and return tensors, but unlike TensorFlow ops they only run once, during training, and typically make a full pass over the entire training dataset. They create tensor constants, which are added to your graph. For example, tft.min computes the minimum of a tensor over the training dataset. tf.Transform provides a fixed set of analyzers, but this will be extended in future versions.
def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""
  # Since we are modifying some features and leaving others unchanged, we
  # start by setting `outputs` to a copy of `inputs`.
  outputs = inputs.copy()

  # Scale numeric columns to have range [0, 1].
  for key in NUMERIC_FEATURE_KEYS:
    outputs[key] = tft.scale_to_0_1(inputs[key])

  for key in OPTIONAL_NUMERIC_FEATURE_KEYS:
    # This is a SparseTensor because it is optional. Here we fill in a default
    # value when it is missing.
    sparse = tf.sparse.SparseTensor(inputs[key].indices, inputs[key].values,
                                    [inputs[key].dense_shape[0], 1])
    dense = tf.sparse.to_dense(sp_input=sparse, default_value=0.)
    # Reshaping from a batch of vectors of size 1 to a batch of scalars.
    dense = tf.squeeze(dense, axis=1)
    outputs[key] = tft.scale_to_0_1(dense)

  # For all categorical columns except the label column, we generate a
  # vocabulary but do not modify the feature.  This vocabulary is instead
  # used in the trainer, by means of a feature column, to convert the feature
  # from a string to an integer id.
  for key in CATEGORICAL_FEATURE_KEYS:
    outputs[key] = tft.compute_and_apply_vocabulary(
        tf.strings.strip(inputs[key]),
        num_oov_buckets=NUM_OOV_BUCKETS,
        vocab_filename=key)

  # For the label column we provide the mapping from string to index.
  table_keys = ['>50K', '<=50K']
  with tf.init_scope():
    initializer = tf.lookup.KeyValueTensorInitializer(
        keys=table_keys,
        values=tf.cast(tf.range(len(table_keys)), tf.int64),
        key_dtype=tf.string,
        value_dtype=tf.int64)
    table = tf.lookup.StaticHashTable(initializer, default_value=-1)
  # Remove trailing periods for test data when the data is read with tf.data.
  label_str = tf.strings.regex_replace(inputs[LABEL_KEY], r'\.', '')
  label_str = tf.strings.strip(label_str)
  data_labels = table.lookup(label_str)
  transformed_label = tf.one_hot(
      indices=data_labels, depth=len(table_keys), on_value=1.0, off_value=0.0)
  outputs[LABEL_KEY] = tf.reshape(transformed_label, [-1, len(table_keys)])

  return outputs

Transform the data

Now we're ready to start transforming our data in an Apache Beam pipeline.

  1. Read in the data using the CSV reader
  2. Transform it using a preprocessing pipeline that scales numeric data and converts categorical data from strings to int64 value indices, creating a vocabulary for each category
  3. Write out the result as a TFRecord of Example protos, which we will use for training a model later
def transform_data(train_data_file, test_data_file, working_dir):
  """Transform the data and write out as a TFRecord of Example protos.

  Read in the data using the CSV reader, and transform it using a
  preprocessing pipeline that scales numeric data and converts categorical data
  from strings to int64 values indices, by creating a vocabulary for each
  category.

  Args:
    train_data_file: File containing training data
    test_data_file: File containing test data
    working_dir: Directory to write transformed data and metadata to
  """

  # The "with" block will create a pipeline, and run that pipeline at the exit
  # of the block.
  with beam.Pipeline() as pipeline:
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
      # Create a TFXIO to read the census data with the schema. To do this we
      # need to list all columns in order since the schema doesn't specify the
      # order of columns in the csv.
      # We first read CSV files and use BeamRecordCsvTFXIO whose .BeamSource()
      # accepts a PCollection[bytes] because we need to patch the records first
      # (see "FixCommasTrainData" below). Otherwise, tfxio.CsvTFXIO can be used
      # to both read the CSV files and parse them to TFT inputs:
      # csv_tfxio = tfxio.CsvTFXIO(...)
      # raw_data = (pipeline | 'ToRecordBatches' >> csv_tfxio.BeamSource())
      csv_tfxio = tfxio.BeamRecordCsvTFXIO(
          physical_format='text',
          column_names=ORDERED_CSV_COLUMNS,
          schema=SCHEMA)

      # Read in raw data and convert using CSV TFXIO.  Note that we apply
      # some Beam transformations here, which will not be encoded in the TF
      # graph since we don't do them from within tf.Transform's methods
      # (AnalyzeDataset, TransformDataset etc.).  These transformations are just
      # to get data into a format that the CSV TFXIO can read, in particular
      # removing spaces after commas.
      raw_data = (
          pipeline
          | 'ReadTrainData' >> beam.io.ReadFromText(
              train_data_file, coder=beam.coders.BytesCoder())
          | 'FixCommasTrainData' >> beam.Map(
              lambda line: line.replace(b', ', b','))
          | 'DecodeTrainData' >> csv_tfxio.BeamSource())

      # Combine data and schema into a dataset tuple.  Note that we already used
      # the schema to read the CSV data, but we also need it to interpret
      # raw_data.
      raw_dataset = (raw_data, csv_tfxio.TensorAdapterConfig())

      # The TFXIO output format is chosen for improved performance.
      transformed_dataset, transform_fn = (
          raw_dataset | tft_beam.AnalyzeAndTransformDataset(
              preprocessing_fn, output_record_batches=True))

      # Transformed metadata is not necessary for encoding.
      transformed_data, _ = transformed_dataset

      # Extract transformed RecordBatches, encode and write them to the given
      # directory.
      _ = (
          transformed_data
          | 'EncodeTrainData' >>
          beam.FlatMapTuple(lambda batch, _: RecordBatchToExamples(batch))
          | 'WriteTrainData' >> beam.io.WriteToTFRecord(
              os.path.join(working_dir, TRANSFORMED_TRAIN_DATA_FILEBASE)))

      # Now apply transform function to test data.  In this case we remove the
      # trailing period at the end of each line, and also ignore the header line
      # that is present in the test data file.
      raw_test_data = (
          pipeline
          | 'ReadTestData' >> beam.io.ReadFromText(
              test_data_file, skip_header_lines=1,
              coder=beam.coders.BytesCoder())
          | 'FixCommasTestData' >> beam.Map(
              lambda line: line.replace(b', ', b','))
          | 'RemoveTrailingPeriodsTestData' >> beam.Map(lambda line: line[:-1])
          | 'DecodeTestData' >> csv_tfxio.BeamSource())

      raw_test_dataset = (raw_test_data, csv_tfxio.TensorAdapterConfig())

      # The TFXIO output format is chosen for improved performance.
      transformed_test_dataset = (
          (raw_test_dataset, transform_fn)
          | tft_beam.TransformDataset(output_record_batches=True))

      # Transformed metadata is not necessary for encoding.
      transformed_test_data, _ = transformed_test_dataset

      # Extract transformed RecordBatches, encode and write them to the given
      # directory.
      _ = (
          transformed_test_data
          | 'EncodeTestData' >>
          beam.FlatMapTuple(lambda batch, _: RecordBatchToExamples(batch))
          | 'WriteTestData' >> beam.io.WriteToTFRecord(
              os.path.join(working_dir, TRANSFORMED_TEST_DATA_FILEBASE)))

      # Will write a SavedModel and metadata to working_dir, which can then
      # be read by the tft.TFTransformOutput class.
      _ = (
          transform_fn
          | 'WriteTransformFn' >> tft_beam.WriteTransformFn(working_dir))

Using our preprocessed data to train a model with tf.keras

To show how tf.Transform enables us to use the same code for both training and serving, and thus prevent skew, we're going to train a model. To train our model and prepare the trained model for production we need to create input functions. The main difference between our training input function and our serving input function is that training data contains the labels, and production data does not. The arguments and return values are also somewhat different.

Create an input function for training

def _make_training_input_fn(tf_transform_output, transformed_examples,
                            batch_size):
  """An input function reading from transformed data, converting to model input.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    transformed_examples: Base filename of examples.
    batch_size: Batch size.

  Returns:
    The input data for training or eval, as a `tf.data.Dataset` of
    (features, labels) tuples.
  """
  def input_fn():
    return tf.data.experimental.make_batched_features_dataset(
        file_pattern=transformed_examples,
        batch_size=batch_size,
        features=tf_transform_output.transformed_feature_spec(),
        reader=tf.data.TFRecordDataset,
        label_key=LABEL_KEY,
        shuffle=True).prefetch(tf.data.experimental.AUTOTUNE)

  return input_fn

Create an input function for serving

Let's create an input function that we could use in production, and prepare our trained model for serving.

def _make_serving_input_fn(tf_transform_output, raw_examples, batch_size):
  """An input function reading from raw data, converting to model input.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    raw_examples: Base filename of examples.
    batch_size: Batch size.

  Returns:
    The input data for training or eval, as a `tf.data.Dataset` of
    (features, labels) tuples.
  """

  def get_ordered_raw_data_dtypes():
    result = []
    for col in ORDERED_CSV_COLUMNS:
      if col not in RAW_DATA_FEATURE_SPEC:
        result.append(0.0)
        continue
      spec = RAW_DATA_FEATURE_SPEC[col]
      if isinstance(spec, tf.io.FixedLenFeature):
        result.append(spec.dtype)
      else:
        result.append(0.0)
    return result

  def input_fn():
    dataset = tf.data.experimental.make_csv_dataset(
        file_pattern=raw_examples,
        batch_size=batch_size,
        column_names=ORDERED_CSV_COLUMNS,
        column_defaults=get_ordered_raw_data_dtypes(),
        prefetch_buffer_size=0,
        ignore_errors=True)

    tft_layer = tf_transform_output.transform_features_layer()

    def transform_dataset(data):
      raw_features = {}
      for key, val in data.items():
        if key not in RAW_DATA_FEATURE_SPEC:
          continue
        if isinstance(RAW_DATA_FEATURE_SPEC[key], tf.io.VarLenFeature):
          raw_features[key] = tf.RaggedTensor.from_tensor(
              tf.expand_dims(val, -1)).to_sparse()
          continue
        raw_features[key] = val
      transformed_features = tft_layer(raw_features)
      data_labels = transformed_features.pop(LABEL_KEY)
      return (transformed_features, data_labels)

    return dataset.map(
        transform_dataset,
        num_parallel_calls=tf.data.experimental.AUTOTUNE).prefetch(
            tf.data.experimental.AUTOTUNE)

  return input_fn

Train, evaluate, and export our model

def export_serving_model(tf_transform_output, model, output_dir):
  """Exports a keras model for serving.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    model: A keras model to export for serving.
    output_dir: A directory where the model will be exported to.
  """
  # The layer has to be saved to the model for keras tracking purposes.
  model.tft_layer = tf_transform_output.transform_features_layer()

  @tf.function
  def serve_tf_examples_fn(serialized_tf_examples):
    """Serving tf.function model wrapper."""
    feature_spec = RAW_DATA_FEATURE_SPEC.copy()
    feature_spec.pop(LABEL_KEY)
    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
    transformed_features = model.tft_layer(parsed_features)
    outputs = model(transformed_features)
    classes_names = tf.constant([['0', '1']])
    classes = tf.tile(classes_names, [tf.shape(outputs)[0], 1])
    return {'classes': classes, 'scores': outputs}

  concrete_serving_fn = serve_tf_examples_fn.get_concrete_function(
      tf.TensorSpec(shape=[None], dtype=tf.string, name='inputs'))
  signatures = {'serving_default': concrete_serving_fn}

  # This is required in order to make this model servable with model_server.
  versioned_output_dir = os.path.join(output_dir, '1')
  model.save(versioned_output_dir, save_format='tf', signatures=signatures)
def train_and_evaluate(working_dir,
                       num_train_instances=NUM_TRAIN_INSTANCES,
                       num_test_instances=NUM_TEST_INSTANCES):
  """Train the model on training data and evaluate on test data.

  Args:
    working_dir: The location of the Transform output.
    num_train_instances: Number of instances in train set
    num_test_instances: Number of instances in test set

  Returns:
    The results from the model's 'evaluate' method
  """
  train_data_path_pattern = os.path.join(working_dir,
                                 TRANSFORMED_TRAIN_DATA_FILEBASE + '*')
  eval_data_path_pattern = os.path.join(working_dir,
                            TRANSFORMED_TEST_DATA_FILEBASE + '*')
  tf_transform_output = tft.TFTransformOutput(working_dir)

  train_input_fn = _make_training_input_fn(
      tf_transform_output, train_data_path_pattern, batch_size=TRAIN_BATCH_SIZE)
  train_dataset = train_input_fn()

  # Evaluate model on test dataset.
  eval_input_fn = _make_training_input_fn(
      tf_transform_output, eval_data_path_pattern, batch_size=TRAIN_BATCH_SIZE)
  validation_dataset = eval_input_fn()

  feature_spec = tf_transform_output.transformed_feature_spec().copy()
  feature_spec.pop(LABEL_KEY)

  inputs = {}
  for key, spec in feature_spec.items():
    if isinstance(spec, tf.io.VarLenFeature):
      inputs[key] = tf.keras.layers.Input(
          shape=[None], name=key, dtype=spec.dtype, sparse=True)
    elif isinstance(spec, tf.io.FixedLenFeature):
      inputs[key] = tf.keras.layers.Input(
          shape=spec.shape, name=key, dtype=spec.dtype)
    else:
      raise ValueError('Spec type is not supported: ', key, spec)

  encoded_inputs = {}
  for key in inputs:
    feature = tf.expand_dims(inputs[key], -1)
    if key in CATEGORICAL_FEATURE_KEYS:
      num_buckets = tf_transform_output.num_buckets_for_transformed_feature(key)
      encoding_layer = (
          tf.keras.layers.experimental.preprocessing.CategoryEncoding(
              max_tokens=num_buckets, output_mode='binary', sparse=False))
      encoded_inputs[key] = encoding_layer(feature)
    else:
      encoded_inputs[key] = feature

  stacked_inputs = tf.concat(tf.nest.flatten(encoded_inputs), axis=1)
  output = tf.keras.layers.Dense(100, activation='relu')(stacked_inputs)
  output = tf.keras.layers.Dense(70, activation='relu')(output)
  output = tf.keras.layers.Dense(50, activation='relu')(output)
  output = tf.keras.layers.Dense(20, activation='relu')(output)
  output = tf.keras.layers.Dense(2, activation='sigmoid')(output)
  model = tf.keras.Model(inputs=inputs, outputs=output)

  model.compile(optimizer='adam',
                loss='binary_crossentropy',
                metrics=['accuracy'])
  pprint.pprint(model.summary())

  model.fit(train_dataset, validation_data=validation_dataset,
            epochs=TRAIN_NUM_EPOCHS,
            steps_per_epoch=math.ceil(num_train_instances / TRAIN_BATCH_SIZE),
            validation_steps=math.ceil(num_test_instances / TRAIN_BATCH_SIZE))

  # Export the model.
  exported_model_dir = os.path.join(working_dir, EXPORTED_MODEL_DIR)
  export_serving_model(tf_transform_output, model, exported_model_dir)

  metrics_values = model.evaluate(validation_dataset, steps=num_test_instances)
  metrics_labels = model.metrics_names
  return {l: v for l, v in zip(metrics_labels, metrics_values)}

Put it all together

We've created everything we need to preprocess our census data, train a model, and prepare it for serving. So far we've just been getting things ready. It's time to start running!

import tempfile
temp = os.path.join(tempfile.gettempdir(), 'keras')

transform_data(train, test, temp)
results = train_and_evaluate(temp)
pprint.pprint(results)
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_transform/tf_utils.py:261: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
2021-11-09 11:18:42.727956: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: /tmp/tmpf6qd1ln_/tftransform_tmp/e5b60ec95e6d46d4bb2f9abfe93ea02f/assets
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:Assets written to: /tmp/tmpf6qd1ln_/tftransform_tmp/12125850cbb24457a10c019a79f929b8/assets
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
WARNING:tensorflow:max_tokens is deprecated, please use num_tokens instead.
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
education (InputLayer)          [(None,)]            0                                            
__________________________________________________________________________________________________
marital-status (InputLayer)     [(None,)]            0                                            
__________________________________________________________________________________________________
native-country (InputLayer)     [(None,)]            0                                            
__________________________________________________________________________________________________
occupation (InputLayer)         [(None,)]            0                                            
__________________________________________________________________________________________________
race (InputLayer)               [(None,)]            0                                            
__________________________________________________________________________________________________
relationship (InputLayer)       [(None,)]            0                                            
__________________________________________________________________________________________________
sex (InputLayer)                [(None,)]            0                                            
__________________________________________________________________________________________________
workclass (InputLayer)          [(None,)]            0                                            
__________________________________________________________________________________________________
age (InputLayer)                [(None,)]            0                                            
__________________________________________________________________________________________________
capital-gain (InputLayer)       [(None,)]            0                                            
__________________________________________________________________________________________________
capital-loss (InputLayer)       [(None,)]            0                                            
__________________________________________________________________________________________________
tf.expand_dims_3 (TFOpLambda)   (None, 1)            0           education[0][0]                  
__________________________________________________________________________________________________
education-num (InputLayer)      [(None,)]            0                                            
__________________________________________________________________________________________________
hours-per-week (InputLayer)     [(None,)]            0                                            
__________________________________________________________________________________________________
tf.expand_dims_6 (TFOpLambda)   (None, 1)            0           marital-status[0][0]             
__________________________________________________________________________________________________
tf.expand_dims_7 (TFOpLambda)   (None, 1)            0           native-country[0][0]             
__________________________________________________________________________________________________
tf.expand_dims_8 (TFOpLambda)   (None, 1)            0           occupation[0][0]                 
__________________________________________________________________________________________________
tf.expand_dims_9 (TFOpLambda)   (None, 1)            0           race[0][0]                       
__________________________________________________________________________________________________
tf.expand_dims_10 (TFOpLambda)  (None, 1)            0           relationship[0][0]               
__________________________________________________________________________________________________
tf.expand_dims_11 (TFOpLambda)  (None, 1)            0           sex[0][0]                        
__________________________________________________________________________________________________
tf.expand_dims_12 (TFOpLambda)  (None, 1)            0           workclass[0][0]                  
__________________________________________________________________________________________________
tf.expand_dims (TFOpLambda)     (None, 1)            0           age[0][0]                        
__________________________________________________________________________________________________
tf.expand_dims_1 (TFOpLambda)   (None, 1)            0           capital-gain[0][0]               
__________________________________________________________________________________________________
tf.expand_dims_2 (TFOpLambda)   (None, 1)            0           capital-loss[0][0]               
__________________________________________________________________________________________________
category_encoding (CategoryEnco (None, 17)           0           tf.expand_dims_3[0][0]           
__________________________________________________________________________________________________
tf.expand_dims_4 (TFOpLambda)   (None, 1)            0           education-num[0][0]              
__________________________________________________________________________________________________
tf.expand_dims_5 (TFOpLambda)   (None, 1)            0           hours-per-week[0][0]             
__________________________________________________________________________________________________
category_encoding_1 (CategoryEn (None, 8)            0           tf.expand_dims_6[0][0]           
__________________________________________________________________________________________________
category_encoding_2 (CategoryEn (None, 43)           0           tf.expand_dims_7[0][0]           
__________________________________________________________________________________________________
category_encoding_3 (CategoryEn (None, 16)           0           tf.expand_dims_8[0][0]           
__________________________________________________________________________________________________
category_encoding_4 (CategoryEn (None, 6)            0           tf.expand_dims_9[0][0]           
__________________________________________________________________________________________________
category_encoding_5 (CategoryEn (None, 7)            0           tf.expand_dims_10[0][0]          
__________________________________________________________________________________________________
category_encoding_6 (CategoryEn (None, 3)            0           tf.expand_dims_11[0][0]          
__________________________________________________________________________________________________
category_encoding_7 (CategoryEn (None, 10)           0           tf.expand_dims_12[0][0]          
__________________________________________________________________________________________________
tf.concat (TFOpLambda)          (None, 115)          0           tf.expand_dims[0][0]             
                                                                 tf.expand_dims_1[0][0]           
                                                                 tf.expand_dims_2[0][0]           
                                                                 category_encoding[0][0]          
                                                                 tf.expand_dims_4[0][0]           
                                                                 tf.expand_dims_5[0][0]           
                                                                 category_encoding_1[0][0]        
                                                                 category_encoding_2[0][0]        
                                                                 category_encoding_3[0][0]        
                                                                 category_encoding_4[0][0]        
                                                                 category_encoding_5[0][0]        
                                                                 category_encoding_6[0][0]        
                                                                 category_encoding_7[0][0]        
__________________________________________________________________________________________________
dense (Dense)                   (None, 100)          11600       tf.concat[0][0]                  
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 70)           7070        dense[0][0]                      
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 50)           3550        dense_1[0][0]                    
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 20)           1020        dense_2[0][0]                    
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 2)            42          dense_3[0][0]                    
==================================================================================================
Total params: 23,282
Trainable params: 23,282
Non-trainable params: 0
__________________________________________________________________________________________________
None
Epoch 1/16
255/255 [==============================] - 3s 8ms/step - loss: 0.3889 - accuracy: 0.8141 - val_loss: 0.3401 - val_accuracy: 0.8409
Epoch 2/16
255/255 [==============================] - 2s 7ms/step - loss: 0.3351 - accuracy: 0.8442 - val_loss: 0.3392 - val_accuracy: 0.8429
Epoch 3/16
255/255 [==============================] - 2s 6ms/step - loss: 0.3230 - accuracy: 0.8486 - val_loss: 0.3343 - val_accuracy: 0.8410
Epoch 4/16
255/255 [==============================] - 2s 7ms/step - loss: 0.3160 - accuracy: 0.8513 - val_loss: 0.3211 - val_accuracy: 0.8509
Epoch 5/16
255/255 [==============================] - 2s 6ms/step - loss: 0.3081 - accuracy: 0.8551 - val_loss: 0.3215 - val_accuracy: 0.8461
Epoch 6/16
255/255 [==============================] - 2s 7ms/step - loss: 0.3046 - accuracy: 0.8577 - val_loss: 0.3290 - val_accuracy: 0.8436
Epoch 7/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2996 - accuracy: 0.8594 - val_loss: 0.3260 - val_accuracy: 0.8494
Epoch 8/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2946 - accuracy: 0.8620 - val_loss: 0.3284 - val_accuracy: 0.8479
Epoch 9/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2915 - accuracy: 0.8626 - val_loss: 0.3238 - val_accuracy: 0.8489
Epoch 10/16
255/255 [==============================] - 2s 7ms/step - loss: 0.2884 - accuracy: 0.8639 - val_loss: 0.3269 - val_accuracy: 0.8497
Epoch 11/16
255/255 [==============================] - 2s 7ms/step - loss: 0.2836 - accuracy: 0.8669 - val_loss: 0.3364 - val_accuracy: 0.8474
Epoch 12/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2817 - accuracy: 0.8680 - val_loss: 0.3375 - val_accuracy: 0.8444
Epoch 13/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2786 - accuracy: 0.8701 - val_loss: 0.3392 - val_accuracy: 0.8481
Epoch 14/16
255/255 [==============================] - 2s 7ms/step - loss: 0.2743 - accuracy: 0.8723 - val_loss: 0.3402 - val_accuracy: 0.8467
Epoch 15/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2734 - accuracy: 0.8718 - val_loss: 0.3442 - val_accuracy: 0.8438
Epoch 16/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2694 - accuracy: 0.8734 - val_loss: 0.3466 - val_accuracy: 0.8456
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:Assets written to: /tmp/keras/exported_model_dir/1/assets
16281/16281 [==============================] - 68s 4ms/step - loss: 0.3470 - accuracy: 0.8455
{'accuracy': 0.8454640507698059, 'loss': 0.34704914689064026}

(Optional) Using our preprocessed data to train a model with tf.estimator

If you would rather use an Estimator model than a Keras model, the code in this section shows how to do that.

Create an input function for training

def _make_training_input_fn(tf_transform_output, transformed_examples,
                            batch_size):
  """Creates an input function reading from transformed data.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    transformed_examples: Base filename of examples.
    batch_size: Batch size.

  Returns:
    The input function for training or eval.
  """
  def input_fn():
    """Input function for training and eval."""
    dataset = tf.data.experimental.make_batched_features_dataset(
        file_pattern=transformed_examples,
        batch_size=batch_size,
        features=tf_transform_output.transformed_feature_spec(),
        reader=tf.data.TFRecordDataset,
        shuffle=True)

    transformed_features = tf.compat.v1.data.make_one_shot_iterator(
        dataset).get_next()

    # Extract features and label from the transformed tensors.
    transformed_labels = tf.where(
        tf.equal(transformed_features.pop(LABEL_KEY), 1))

    return transformed_features, transformed_labels[:,1]

  return input_fn
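The label handling above is subtle: the transformed label arrives one-hot encoded with shape `(batch, 2)`, `tf.where(tf.equal(..., 1))` returns the `(row, column)` coordinates of every `1`, and the `[:, 1]` slice keeps only the column index, which is exactly the class id. A minimal pure-Python sketch of the same transformation (no TensorFlow required, just to illustrate the trick):

```python
# Mimic tf.where(tf.equal(one_hot, 1))[:, 1] on a batch of one-hot labels.
# Each row is a one-hot label vector; the class id is the column of the 1.

def one_hot_to_class_ids(one_hot_batch):
    """Return the index of the 1 in each one-hot row, like the tf.where trick."""
    coords = [(row, col)
              for row, vec in enumerate(one_hot_batch)
              for col, value in enumerate(vec)
              if value == 1]            # tf.where(tf.equal(..., 1))
    return [col for _, col in coords]   # the [:, 1] slice

batch = [[1, 0], [0, 1], [0, 1], [1, 0]]
print(one_hot_to_class_ids(batch))  # -> [0, 1, 1, 0]
```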

Create an input function for serving

Let's create an input function that we can use in production, and prepare our trained model for serving.

def _make_serving_input_fn(tf_transform_output):
  """Creates an input function reading from raw data.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.

  Returns:
    The serving input function.
  """
  raw_feature_spec = RAW_DATA_FEATURE_SPEC.copy()
  # Remove label since it is not available during serving.
  raw_feature_spec.pop(LABEL_KEY)

  def serving_input_fn():
    """Input function for serving."""
    # Get raw features by generating the basic serving input_fn and calling it.
    # Here we generate an input_fn that expects a parsed Example proto to be fed
    # to the model at serving time.  See also
    # tf.estimator.export.build_raw_serving_input_receiver_fn.
    raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
        raw_feature_spec, default_batch_size=None)
    serving_input_receiver = raw_input_fn()

    # Apply the transform function that was used to generate the materialized
    # data.
    raw_features = serving_input_receiver.features
    transformed_features = tf_transform_output.transform_raw_features(
        raw_features)

    return tf.estimator.export.ServingInputReceiver(
        transformed_features, serving_input_receiver.receiver_tensors)

  return serving_input_fn

Wrap our input data in FeatureColumns

Our model expects our data as TensorFlow FeatureColumns.

def get_feature_columns(tf_transform_output):
  """Returns the FeatureColumns for the model.

  Args:
    tf_transform_output: A `TFTransformOutput` object.

  Returns:
    A list of FeatureColumns.
  """
  # Wrap scalars as real valued columns.
  real_valued_columns = [tf.feature_column.numeric_column(key, shape=())
                         for key in NUMERIC_FEATURE_KEYS]

  # Wrap categorical columns.
  one_hot_columns = [
      tf.feature_column.indicator_column(
          tf.feature_column.categorical_column_with_identity(
              key=key,
              num_buckets=(NUM_OOV_BUCKETS +
                  tf_transform_output.vocabulary_size_by_name(
                      vocab_filename=key))))
      for key in CATEGORICAL_FEATURE_KEYS]

  return real_valued_columns + one_hot_columns
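As a sanity check on the sizes involved: each indicator column is one vocabulary plus its OOV buckets wide, and the model's input width is the number of numeric features plus the sum of those one-hot widths. The widths below are read off the `category_encoding` layers in the Keras model summary earlier in this notebook, so we can verify they add up to the `(None, 115)` shape of the `tf.concat` layer:

```python
# Widths of the one-hot (indicator) columns, as shown by the Keras model
# summary above: vocabulary size + OOV buckets for each categorical feature.
one_hot_widths = [17, 8, 43, 16, 6, 7, 3, 10]

# The five numeric features: age, capital-gain, capital-loss,
# education-num, hours-per-week.
num_numeric = 5

input_width = num_numeric + sum(one_hot_widths)
print(input_width)  # -> 115, matching the (None, 115) tf.concat layer
```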

Train, evaluate, and export our model

def train_and_evaluate(working_dir, num_train_instances=NUM_TRAIN_INSTANCES,
                       num_test_instances=NUM_TEST_INSTANCES):
  """Train the model on training data and evaluate on test data.

  Args:
    working_dir: Directory to read transformed data and metadata from and to
        write exported model to.
    num_train_instances: Number of instances in train set
    num_test_instances: Number of instances in test set

  Returns:
    The results from the estimator's 'evaluate' method
  """
  tf_transform_output = tft.TFTransformOutput(working_dir)

  run_config = tf.estimator.RunConfig()

  estimator = tf.estimator.LinearClassifier(
      feature_columns=get_feature_columns(tf_transform_output),
      config=run_config,
      loss_reduction=tf.losses.Reduction.SUM)

  # Fit the model using the default optimizer.
  train_input_fn = _make_training_input_fn(
      tf_transform_output,
      os.path.join(working_dir, TRANSFORMED_TRAIN_DATA_FILEBASE + '*'),
      batch_size=TRAIN_BATCH_SIZE)
  estimator.train(
      input_fn=train_input_fn,
      max_steps=TRAIN_NUM_EPOCHS * num_train_instances / TRAIN_BATCH_SIZE)

  # Evaluate model on test dataset.
  eval_input_fn = _make_training_input_fn(
      tf_transform_output,
      os.path.join(working_dir, TRANSFORMED_TEST_DATA_FILEBASE + '*'),
      batch_size=1)

  # Export the model.
  serving_input_fn = _make_serving_input_fn(tf_transform_output)
  exported_model_dir = os.path.join(working_dir, EXPORTED_MODEL_DIR)
  estimator.export_saved_model(exported_model_dir, serving_input_fn)

  return estimator.evaluate(input_fn=eval_input_fn, steps=num_test_instances)
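A quick check on the `max_steps` arithmetic in `estimator.train` above: the step count is epochs times training instances divided by batch size. The constants below are not defined in this section; they are the values this tutorial's census setup conventionally uses (16 epochs, 32,561 training rows, batch size 128), shown here only to illustrate why the training log further down checkpoints near step 4071:

```python
import math

# Hypothetical constants, assumed to match the tutorial's census setup
# (they are defined elsewhere in the notebook, not in this section).
TRAIN_NUM_EPOCHS = 16
NUM_TRAIN_INSTANCES = 32561
TRAIN_BATCH_SIZE = 128

max_steps = TRAIN_NUM_EPOCHS * NUM_TRAIN_INSTANCES / TRAIN_BATCH_SIZE
print(max_steps)             # -> 4070.125
print(math.ceil(max_steps))  # -> 4071
```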

Put it all together

We've created everything we need to preprocess our census data, train a model, and prepare it for serving. So far we've just been setting things up. It's time to start running!

import tempfile
temp = os.path.join(tempfile.gettempdir(), 'estimator')

transform_data(train, test, temp)
results = train_and_evaluate(temp)
pprint.pprint(results)
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
INFO:tensorflow:Assets written to: /tmp/tmp42ffgsto/tftransform_tmp/e8c76b6dcd7045a69109320a422446fa/assets
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:Assets written to: /tmp/tmp42ffgsto/tftransform_tmp/38266367d8a44318a5b671b0fd9953e1/assets
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmphpcnvj_9
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmphpcnvj_9', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/keras/engine/base_layer_v1.py:1684: UserWarning: `layer.add_variable` is deprecated and will be removed in a future version. Please use `layer.add_weight` method instead.
  warnings.warn('`layer.add_variable` is deprecated and '
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/keras/optimizer_v2/ftrl.py:147: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmphpcnvj_9/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 88.72284, step = 0
INFO:tensorflow:global_step/sec: 175.414
INFO:tensorflow:loss = 48.07448, step = 100 (0.571 sec)
INFO:tensorflow:global_step/sec: 229.456
INFO:tensorflow:loss = 61.178864, step = 200 (0.436 sec)
INFO:tensorflow:global_step/sec: 224.899
INFO:tensorflow:loss = 48.286705, step = 300 (0.445 sec)
INFO:tensorflow:global_step/sec: 225.226
INFO:tensorflow:loss = 51.9139, step = 400 (0.444 sec)
INFO:tensorflow:global_step/sec: 228.107
INFO:tensorflow:loss = 44.438698, step = 500 (0.438 sec)
INFO:tensorflow:global_step/sec: 225.519
INFO:tensorflow:loss = 39.813446, step = 600 (0.443 sec)
INFO:tensorflow:global_step/sec: 226.471
INFO:tensorflow:loss = 48.06566, step = 700 (0.442 sec)
INFO:tensorflow:global_step/sec: 226.182
INFO:tensorflow:loss = 39.054085, step = 800 (0.442 sec)
INFO:tensorflow:global_step/sec: 229.466
INFO:tensorflow:loss = 41.87681, step = 900 (0.436 sec)
INFO:tensorflow:global_step/sec: 225.932
INFO:tensorflow:loss = 37.37454, step = 1000 (0.442 sec)
INFO:tensorflow:global_step/sec: 223.176
INFO:tensorflow:loss = 41.804867, step = 1100 (0.448 sec)
INFO:tensorflow:global_step/sec: 219.86
INFO:tensorflow:loss = 34.930386, step = 1200 (0.455 sec)
INFO:tensorflow:global_step/sec: 215.812
INFO:tensorflow:loss = 46.14614, step = 1300 (0.464 sec)
INFO:tensorflow:global_step/sec: 219.062
INFO:tensorflow:loss = 44.350525, step = 1400 (0.456 sec)
INFO:tensorflow:global_step/sec: 225.859
INFO:tensorflow:loss = 41.62947, step = 1500 (0.443 sec)
INFO:tensorflow:global_step/sec: 222.791
INFO:tensorflow:loss = 39.155415, step = 1600 (0.449 sec)
INFO:tensorflow:global_step/sec: 218.216
INFO:tensorflow:loss = 48.676804, step = 1700 (0.458 sec)
INFO:tensorflow:global_step/sec: 221.741
INFO:tensorflow:loss = 41.099533, step = 1800 (0.451 sec)
INFO:tensorflow:global_step/sec: 215.495
INFO:tensorflow:loss = 40.689064, step = 1900 (0.464 sec)
INFO:tensorflow:global_step/sec: 225.078
INFO:tensorflow:loss = 41.96339, step = 2000 (0.445 sec)
INFO:tensorflow:global_step/sec: 224.698
INFO:tensorflow:loss = 36.897514, step = 2100 (0.445 sec)
INFO:tensorflow:global_step/sec: 226.767
INFO:tensorflow:loss = 40.899315, step = 2200 (0.441 sec)
INFO:tensorflow:global_step/sec: 227.046
INFO:tensorflow:loss = 60.495663, step = 2300 (0.440 sec)
INFO:tensorflow:global_step/sec: 222.482
INFO:tensorflow:loss = 53.929543, step = 2400 (0.450 sec)
INFO:tensorflow:global_step/sec: 223.815
INFO:tensorflow:loss = 38.190765, step = 2500 (0.447 sec)
INFO:tensorflow:global_step/sec: 224.088
INFO:tensorflow:loss = 39.904915, step = 2600 (0.446 sec)
INFO:tensorflow:global_step/sec: 223.104
INFO:tensorflow:loss = 41.107674, step = 2700 (0.448 sec)
INFO:tensorflow:global_step/sec: 218.155
INFO:tensorflow:loss = 41.644638, step = 2800 (0.459 sec)
INFO:tensorflow:global_step/sec: 218.99
INFO:tensorflow:loss = 38.121563, step = 2900 (0.456 sec)
INFO:tensorflow:global_step/sec: 221.771
INFO:tensorflow:loss = 36.85429, step = 3000 (0.451 sec)
INFO:tensorflow:global_step/sec: 216.171
INFO:tensorflow:loss = 38.48166, step = 3100 (0.463 sec)
INFO:tensorflow:global_step/sec: 219.535
INFO:tensorflow:loss = 45.735847, step = 3200 (0.455 sec)
INFO:tensorflow:global_step/sec: 222.691
INFO:tensorflow:loss = 43.371204, step = 3300 (0.449 sec)
INFO:tensorflow:global_step/sec: 221.861
INFO:tensorflow:loss = 45.63005, step = 3400 (0.451 sec)
INFO:tensorflow:global_step/sec: 216.52
INFO:tensorflow:loss = 45.134335, step = 3500 (0.462 sec)
INFO:tensorflow:global_step/sec: 220.787
INFO:tensorflow:loss = 41.1521, step = 3600 (0.453 sec)
INFO:tensorflow:global_step/sec: 219.394
INFO:tensorflow:loss = 47.715237, step = 3700 (0.456 sec)
INFO:tensorflow:global_step/sec: 218.352
INFO:tensorflow:loss = 52.373795, step = 3800 (0.458 sec)
INFO:tensorflow:global_step/sec: 213.061
INFO:tensorflow:loss = 39.63704, step = 3900 (0.470 sec)
INFO:tensorflow:global_step/sec: 213.278
INFO:tensorflow:loss = 37.945107, step = 4000 (0.469 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4071...
INFO:tensorflow:Saving checkpoints for 4071 into /tmp/tmphpcnvj_9/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4071...
INFO:tensorflow:Loss for final step: 38.98066.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:145: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Restoring parameters from /tmp/tmphpcnvj_9/model.ckpt-4071
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/estimator/exported_model_dir/temp-1636456878/assets
INFO:tensorflow:SavedModel written to: /tmp/estimator/exported_model_dir/temp-1636456878/saved_model.pb
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-11-09T11:21:20
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmphpcnvj_9/model.ckpt-4071
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1628/16281]
INFO:tensorflow:Evaluation [3256/16281]
INFO:tensorflow:Evaluation [4884/16281]
INFO:tensorflow:Evaluation [6512/16281]
INFO:tensorflow:Evaluation [8140/16281]
INFO:tensorflow:Evaluation [9768/16281]
INFO:tensorflow:Evaluation [11396/16281]
INFO:tensorflow:Evaluation [13024/16281]
INFO:tensorflow:Evaluation [14652/16281]
INFO:tensorflow:Evaluation [16280/16281]
INFO:tensorflow:Evaluation [16281/16281]
INFO:tensorflow:Inference Time : 68.77847s
INFO:tensorflow:Finished evaluation at 2021-11-09-11:22:29
INFO:tensorflow:Saving dict for global step 4071: accuracy = 0.850562, accuracy_baseline = 0.76377374, auc = 0.9020801, auc_precision_recall = 0.96727455, average_loss = 0.32358122, global_step = 4071, label/mean = 0.76377374, loss = 0.32358122, precision = 0.87629795, prediction/mean = 0.7685369, recall = 0.9365501
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4071: /tmp/tmphpcnvj_9/model.ckpt-4071
{'accuracy': 0.850562,
 'accuracy_baseline': 0.76377374,
 'auc': 0.9020801,
 'auc_precision_recall': 0.96727455,
 'average_loss': 0.32358122,
 'global_step': 4071,
 'label/mean': 0.76377374,
 'loss': 0.32358122,
 'precision': 0.87629795,
 'prediction/mean': 0.7685369,
 'recall': 0.9365501}

What we did

In this example we used tf.Transform to preprocess a census dataset and train a model on the cleaned, transformed data. We also created an input function that we could use when deploying our trained model to a production environment to perform inference. By using the same code for both training and inference, we avoid any issues with training/serving skew. Along the way, we learned how to create an Apache Beam transform to perform the cleaning that the data required. We also saw how to use the transformed data to train a model with either tf.keras or tf.estimator. This is just a small part of what TensorFlow Transform can do! We encourage you to dive into tf.Transform and see what it can do for you.
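The core idea behind avoiding training/serving skew can be sketched without any TensorFlow at all: compute statistics in one full pass over the training data (what tf.Transform's analyzers do), then apply the *same* transform function, with those frozen constants, at both training and serving time. The function names below (`analyze`, `transform`) are illustrative, not tf.Transform APIs; in tft this would be `tft.scale_to_z_score` inside a `preprocessing_fn`.

```python
# A minimal, framework-free sketch of the tf.Transform idea:
# one full pass over the training set to compute statistics,
# then the identical transform at training and serving time.
import statistics

def analyze(values):
    """Full pass over training data (analogous to tf.Transform analyzers)."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}

def transform(value, stats):
    """Applied identically at training and serving time, so there is no skew."""
    return (value - stats["mean"]) / stats["std"]

train_ages = [25.0, 35.0, 45.0, 55.0]
stats = analyze(train_ages)                          # computed once, from training data only
train_scaled = [transform(v, stats) for v in train_ages]
serve_scaled = transform(40.0, stats)                # same function, same frozen constants
```

Because the exported SavedModel embeds the transform graph with these constants baked in, the serving path cannot drift from the training path.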