
Preprocessing data with TensorFlow Transform

The Feature Engineering Component of TensorFlow Extended (TFX)

This Colab notebook provides a somewhat more advanced example of how TensorFlow Transform (tf.Transform) can be used to preprocess data using exactly the same code for both training a model and serving inferences in production.

TensorFlow Transform is a library for preprocessing input data for TensorFlow, including creating features that require a full pass over the training dataset. For example, using TensorFlow Transform you could:

  • Normalize an input value by using the mean and standard deviation
  • Convert strings to integers by generating a vocabulary over all of the input values
  • Convert floats to integers by assigning them to buckets, based on the observed data distribution

TensorFlow has built-in support for manipulations on a single example or a batch of examples. tf.Transform extends these capabilities to support full passes over the entire training dataset.

The output of tf.Transform is exported as a TensorFlow graph which you can use for both training and serving. Using the same graph for both training and serving can prevent skew, since the same transformations are applied in both stages.
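
The split between a full-pass analysis step and a per-example transform step can be sketched in plain Python. This is only an illustration of the idea, not tf.Transform's implementation: the function names `analyze` and `transform`, the toy rows, and the choice to map unseen strings to -1 are all assumptions made for this sketch.

```python
import math


def analyze(training_rows):
  """Full pass over the training data: compute the constants we need."""
  ages = [row['age'] for row in training_rows]
  mean = sum(ages) / float(len(ages))
  variance = sum((a - mean) ** 2 for a in ages) / float(len(ages))
  # Vocabulary: every distinct string, sorted, mapped to an integer id.
  vocab = sorted(set(row['workclass'] for row in training_rows))
  return {
      'age_mean': mean,
      'age_std': math.sqrt(variance),
      'workclass_ids': dict((term, i) for i, term in enumerate(vocab)),
  }


def transform(row, constants):
  """Per-example step: uses only the row and the frozen constants."""
  return {
      'age': (row['age'] - constants['age_mean']) / constants['age_std'],
      # Unseen strings map to -1 in this sketch (an assumption).
      'workclass': constants['workclass_ids'].get(row['workclass'], -1),
  }


training_rows = [
    {'age': 20.0, 'workclass': 'Private'},
    {'age': 40.0, 'workclass': 'State-gov'},
    {'age': 60.0, 'workclass': 'Private'},
]
constants = analyze(training_rows)

# Training and serving call the same function with the same constants.
train_out = [transform(row, constants) for row in training_rows]
serve_out = transform({'age': 40.0, 'workclass': 'Never-worked'}, constants)
```

Because `transform` never looks at the dataset again, the serving path cannot drift from the training path unless the constants themselves change.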

What we're doing in this example

In this example we'll be processing a widely used dataset containing census data, and training a model to do classification. Along the way we'll be transforming the data using tf.Transform.

Python check, imports, and globals

First we'll make sure that we're using Python 2, and then go ahead and install and import the stuff we need.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import sys

# Confirm that we're using Python 2
assert sys.version_info.major == 2, 'Oops, not running Python 2'

import argparse
import os
import pprint
import tempfile
import urllib
import zipfile

temp = tempfile.gettempdir()
# Download the census archive and extract it into the temp directory.
zip_path, headers = urllib.urlretrieve('https://storage.googleapis.com/tfx-colab-datasets/census.zip')
with zipfile.ZipFile(zip_path) as census_zip:
  census_zip.extractall(temp)
urllib.urlcleanup()

train = os.path.join(temp, 'census/adult.data')
test = os.path.join(temp, 'census/adult.test')

try:
  import tensorflow_transform as tft
  import apache_beam as beam
except ImportError:
  print('Installing TensorFlow Transform.  This will take a minute, ignore the warnings')
  !pip install -q tensorflow_transform
  print('Installing Apache Beam.  This will take a minute, ignore the warnings')
  !pip install -q apache_beam
  import tensorflow_transform as tft
  import apache_beam as beam

import tensorflow as tf
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import dataset_schema
Installing TensorFlow Transform.  This will take a minute, ignore the warnings
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Installing Apache Beam.  This will take a minute, ignore the warnings
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.

Name our columns

We'll create some handy lists for referencing the columns in our dataset.

CATEGORICAL_FEATURE_KEYS = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country',
]
NUMERIC_FEATURE_KEYS = [
    'age',
    'capital-gain',
    'capital-loss',
    'hours-per-week',
]
OPTIONAL_NUMERIC_FEATURE_KEYS = [
    'education-num',
]
LABEL_KEY = 'label'

Define our features and schema

Let's define a schema based on what types the columns are in our input. Among other things this will help with importing them correctly.

RAW_DATA_FEATURE_SPEC = dict(
    [(name, tf.FixedLenFeature([], tf.string))
     for name in CATEGORICAL_FEATURE_KEYS] +
    [(name, tf.FixedLenFeature([], tf.float32))
     for name in NUMERIC_FEATURE_KEYS] +
    [(name, tf.VarLenFeature(tf.float32))
     for name in OPTIONAL_NUMERIC_FEATURE_KEYS] +
    [(LABEL_KEY, tf.FixedLenFeature([], tf.string))]
)

RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(
    dataset_schema.from_feature_spec(RAW_DATA_FEATURE_SPEC))
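
To make the role of the feature spec concrete, here is a plain-Python sketch of how a spec can drive parsing. It does not use TensorFlow: the three kind markers, `toy_spec`, and `parse_row` are hypothetical stand-ins for the fixed-length string, fixed-length float, and variable-length float entries defined above.

```python
# Hypothetical stand-ins for the three kinds of entry in the feature spec.
REQUIRED_STRING, REQUIRED_FLOAT, OPTIONAL_FLOAT = 'string', 'float', 'optional'

toy_spec = {
    'workclass': REQUIRED_STRING,
    'age': REQUIRED_FLOAT,
    'education-num': OPTIONAL_FLOAT,
    'label': REQUIRED_STRING,
}


def parse_row(raw, spec):
  """Converts a dict of raw strings into typed values according to spec."""
  parsed = {}
  for name, kind in spec.items():
    value = raw.get(name, '')
    if kind == OPTIONAL_FLOAT:
      # Variable-length: an empty list when missing, one value otherwise.
      parsed[name] = [float(value)] if value else []
    elif value == '':
      raise ValueError('missing required feature: ' + name)
    elif kind == REQUIRED_FLOAT:
      parsed[name] = float(value)
    else:
      parsed[name] = value
  return parsed


full = parse_row({'workclass': 'Private', 'age': '39',
                  'education-num': '13', 'label': '>50K'}, toy_spec)
sparse = parse_row({'workclass': 'Private', 'age': '39', 'label': '>50K'},
                   toy_spec)
```

In this sketch the optional column comes back as a list of zero or one values, which mirrors why `education-num` has to be densified later in the preprocessing function.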

Setting hyperparameters and basic housekeeping

Constants and hyperparameters used for training. The bucket size includes all listed categories in the dataset description as well as one extra for "?" which represents unknown.

testing = False
if testing:
  TRAIN_NUM_EPOCHS = 1
  NUM_TRAIN_INSTANCES = 1
  TRAIN_BATCH_SIZE = 1
  NUM_TEST_INSTANCES = 1
else:
  TRAIN_NUM_EPOCHS = 16
  NUM_TRAIN_INSTANCES = 32561
  TRAIN_BATCH_SIZE = 128
  NUM_TEST_INSTANCES = 16281

# Names of temp files
TRANSFORMED_TRAIN_DATA_FILEBASE = 'train_transformed'
TRANSFORMED_TEST_DATA_FILEBASE = 'test_transformed'
EXPORTED_MODEL_DIR = 'exported_model_dir'

Cleaning

Create a Beam Transform for cleaning our input data

We'll create a Beam Transform by creating a subclass of Apache Beam's PTransform class and overriding the expand method to specify the actual processing logic. A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as input, performs a processing function that you provide on the elements of that PCollection, and produces zero or more output PCollection objects.

Our transform class will apply Beam's ParDo on the input PCollection containing our census dataset, producing clean data in an output PCollection.

class MapAndFilterErrors(beam.PTransform):
  """Like beam.Map but filters out errors in the map_fn."""

  class _MapAndFilterErrorsDoFn(beam.DoFn):
    """Count the bad examples using a beam metric."""

    def __init__(self, fn):
      self._fn = fn
      # Create a counter to measure number of bad elements.
      self._bad_elements_counter = beam.metrics.Metrics.counter(
          'census_example', 'bad_elements')

    def process(self, element):
      try:
        yield self._fn(element)
      except Exception:  # pylint: disable=broad-except
        # Catch any exception raised by the call above and count it.
        self._bad_elements_counter.inc(1)

  def __init__(self, fn):
    self._fn = fn

  def expand(self, pcoll):
    return pcoll | beam.ParDo(self._MapAndFilterErrorsDoFn(self._fn))
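
The behaviour of the class above can be sketched without Beam as a generator that maps a function over elements, drops the ones that raise, and counts them. The function name `map_and_filter_errors` and the plain `counters` dict are assumptions for this sketch; Beam's metrics and distributed execution are not modelled.

```python
def map_and_filter_errors(fn, elements, counters):
  """Yields fn(element) for each element, skipping the ones that raise."""
  for element in elements:
    try:
      result = fn(element)
    except Exception:  # Deliberately broad, as in the DoFn above.
      counters['bad_elements'] = counters.get('bad_elements', 0) + 1
      continue
    yield result


counters = {}
lines = ['39', '50', '', '38']  # The empty string mimics a trailing blank line.
ages = list(map_and_filter_errors(float, lines, counters))
```

Here `float('')` raises, so the blank entry is dropped and counted once while the other three values pass through.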

Preprocessing with tf.Transform

Create a tf.Transform preprocessing_fn

The preprocessing function is the most important concept of tf.Transform. A preprocessing function is where the transformation of the dataset really happens. It accepts and returns a dictionary of tensors, where a tensor means a Tensor or SparseTensor. There are two main groups of API calls that typically form the heart of a preprocessing function:

  1. TensorFlow Ops: Any function that accepts and returns tensors, which usually means TensorFlow ops. These add TensorFlow operations to the graph that transforms raw data into transformed data one feature vector at a time. These will run for every example, during both training and serving.
  2. TensorFlow Transform Analyzers: Any of the analyzers provided by tf.Transform. Analyzers also accept and return tensors, but unlike TensorFlow ops they only run once, during training, and typically make a full pass over the entire training dataset. They create tensor constants, which are added to your graph. For example, tft.min computes the minimum of a tensor over the training dataset. tf.Transform provides a fixed set of analyzers, but this will be extended in future versions.
def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""
  # Since we are modifying some features and leaving others unchanged, we
  # start by setting `outputs` to a copy of `inputs`.
  outputs = inputs.copy()

  # Scale numeric columns to have range [0, 1].
  for key in NUMERIC_FEATURE_KEYS:
    outputs[key] = tft.scale_to_0_1(outputs[key])

  for key in OPTIONAL_NUMERIC_FEATURE_KEYS:
    # This is a SparseTensor because it is optional. Here we fill in a default
    # value when it is missing.
    dense = tf.sparse_to_dense(outputs[key].indices,
                               [outputs[key].dense_shape[0], 1],
                               outputs[key].values, default_value=0.)
    # Reshaping from a batch of vectors of size 1 to a batch of scalars.
    dense = tf.squeeze(dense, axis=1)
    outputs[key] = tft.scale_to_0_1(dense)

  # For all categorical columns except the label column, we generate a
  # vocabulary but do not modify the feature.  This vocabulary is instead
  # used in the trainer, by means of a feature column, to convert the feature
  # from a string to an integer id.
  for key in CATEGORICAL_FEATURE_KEYS:
    tft.vocabulary(inputs[key], vocab_filename=key)

  # For the label column we provide the mapping from string to index.
  table = tf.contrib.lookup.index_table_from_tensor(['>50K', '<=50K'])
  outputs[LABEL_KEY] = table.lookup(outputs[LABEL_KEY])

  return outputs
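
As a rough mental model of the analyzer and op split used in the function above, the sketch below separates a full-pass min/max computation from the per-example scaling, and shows a default fill for an optional value. The helper names are hypothetical, the code does not call tf.Transform, and it does not handle a constant column where the maximum equals the minimum.

```python
def analyze_min_max(values):
  """Analyzer step: one full pass over the training column."""
  return min(values), max(values)


def scale_to_0_1(value, min_value, max_value):
  """Per-example step: uses the constants found by the analyzer."""
  return (value - min_value) / (max_value - min_value)


def fill_missing(optional_value, default=0.0):
  """Turns a zero-or-one element list into a scalar, using a default."""
  return optional_value[0] if optional_value else default


train_hours = [20.0, 40.0, 60.0]
lo, hi = analyze_min_max(train_hours)
scaled_train = [scale_to_0_1(h, lo, hi) for h in train_hours]
# In this sketch, a value outside the training range lands outside [0, 1].
scaled_serving = scale_to_0_1(80.0, lo, hi)

education_num = [fill_missing(v) for v in ([13.0], [], [9.0])]
```

The minimum and maximum are fixed after the training pass, so a serving request is scaled with exactly the constants the model was trained against.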

Transform the data

Now we're ready to start transforming our data in an Apache Beam pipeline.

  1. Read in the data using the CSV reader
  2. Clean it using our new MapAndFilterErrors transform
  3. Transform it using a preprocessing pipeline that scales numeric data and converts categorical data from strings to int64 indices, by creating a vocabulary for each category
  4. Write out the result as a TFRecord of Example protos, which we will use for training a model later
def transform_data(train_data_file, test_data_file, working_dir):
  """Transform the data and write out as a TFRecord of Example protos.

  Read in the data using the CSV reader, and transform it using a
  preprocessing pipeline that scales numeric data and converts categorical data
  from strings to int64 indices, by creating a vocabulary for each
  category.

  Args:
    train_data_file: File containing training data
    test_data_file: File containing test data
    working_dir: Directory to write transformed data and metadata to
  """

  # The "with" block will create a pipeline, and run that pipeline at the exit
  # of the block.
  with beam.Pipeline() as pipeline:
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
      # Create a coder to read the census data with the schema.  To do this we
      # need to list all columns in order since the schema doesn't specify the
      # order of columns in the csv.
      ordered_columns = [
          'age', 'workclass', 'fnlwgt', 'education', 'education-num',
          'marital-status', 'occupation', 'relationship', 'race', 'sex',
          'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
          'label'
      ]
      converter = tft.coders.CsvCoder(ordered_columns, RAW_DATA_METADATA.schema)

      # Read in raw data and convert using CSV converter.  Note that we apply
      # some Beam transformations here, which will not be encoded in the TF
      # graph since we don't do them from within tf.Transform's methods
      # (AnalyzeDataset, TransformDataset etc.).  These transformations are just
      # to get data into a format that the CSV converter can read, in particular
      # removing spaces after commas.
      #
      # We use MapAndFilterErrors instead of Map to filter out decode errors in
      # converter.decode, which should only occur for the trailing blank line.
      raw_data = (
          pipeline
          | 'ReadTrainData' >> beam.io.ReadFromText(train_data_file)
          | 'FixCommasTrainData' >> beam.Map(
              lambda line: line.replace(', ', ','))
          | 'DecodeTrainData' >> MapAndFilterErrors(converter.decode))

      # Combine data and schema into a dataset tuple.  Note that we already used
      # the schema to read the CSV data, but we also need it to interpret
      # raw_data.
      raw_dataset = (raw_data, RAW_DATA_METADATA)
      transformed_dataset, transform_fn = (
          raw_dataset | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
      transformed_data, transformed_metadata = transformed_dataset
      transformed_data_coder = tft.coders.ExampleProtoCoder(
          transformed_metadata.schema)

      _ = (
          transformed_data
          | 'EncodeTrainData' >> beam.Map(transformed_data_coder.encode)
          | 'WriteTrainData' >> beam.io.WriteToTFRecord(
              os.path.join(working_dir, TRANSFORMED_TRAIN_DATA_FILEBASE)))

      # Now apply transform function to test data.  In this case we remove the
      # trailing period at the end of each line, and also ignore the header line
      # that is present in the test data file.
      raw_test_data = (
          pipeline
          | 'ReadTestData' >> beam.io.ReadFromText(test_data_file,
                                                   skip_header_lines=1)
          | 'FixCommasTestData' >> beam.Map(
              lambda line: line.replace(', ', ','))
          | 'RemoveTrailingPeriodsTestData' >> beam.Map(lambda line: line[:-1])
          | 'DecodeTestData' >> MapAndFilterErrors(converter.decode))

      raw_test_dataset = (raw_test_data, RAW_DATA_METADATA)

      transformed_test_dataset = (
          (raw_test_dataset, transform_fn) | tft_beam.TransformDataset())
      # Don't need transformed data schema, it's the same as before.
      transformed_test_data, _ = transformed_test_dataset

      _ = (
          transformed_test_data
          | 'EncodeTestData' >> beam.Map(transformed_data_coder.encode)
          | 'WriteTestData' >> beam.io.WriteToTFRecord(
              os.path.join(working_dir, TRANSFORMED_TEST_DATA_FILEBASE)))

      # Will write a SavedModel and metadata to working_dir, which can then
      # be read by the tft.TFTransformOutput class.
      _ = (
          transform_fn
          | 'WriteTransformFn' >> tft_beam.WriteTransformFn(working_dir))
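
The text clean-up that happens before decoding is easy to try outside a pipeline. The sketch below reproduces the two lambdas from the function above as named functions and pairs them with a simplified decoder. The three-column `ORDERED` list and the `decode` helper are hypothetical and much simpler than the CSV coder used in the pipeline.

```python
def clean_train_line(line):
  """Removes the space that follows each comma."""
  return line.replace(', ', ',')


def clean_test_line(line):
  """Same as above, then drops the final character (a trailing period)."""
  return clean_train_line(line)[:-1]


ORDERED = ['age', 'workclass', 'sex']  # Shortened, hypothetical column list.


def decode(line):
  """Splits a cleaned line into a dict keyed by column name."""
  fields = line.split(',')
  if len(fields) != len(ORDERED):
    raise ValueError(
        'expected %d fields, got %d' % (len(ORDERED), len(fields)))
  return dict(zip(ORDERED, fields))


train_row = decode(clean_train_line('39, State-gov, Male'))
test_row = decode(clean_test_line('25, Private, Male.'))
```

A blank line fails the field-count check in this sketch, which is the kind of decode error that MapAndFilterErrors is there to absorb.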

Using our preprocessed data to train a model

To show how tf.Transform enables us to use the same code for both training and serving, and thus prevent skew, we're going to train a model. To train our model and prepare our trained model for production we need to create input functions. The main difference between our training input function and our serving input function is that training data contains the labels, and production data does not. The arguments and returns are also somewhat different.

Create an input function for training

def _make_training_input_fn(tf_transform_output, transformed_examples,
                            batch_size):
  """Creates an input function reading from transformed data.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    transformed_examples: Base filename of examples.
    batch_size: Batch size.

  Returns:
    The input function for training or eval.
  """
  def input_fn():
    """Input function for training and eval."""
    dataset = tf.contrib.data.make_batched_features_dataset(
        file_pattern=transformed_examples,
        batch_size=batch_size,
        features=tf_transform_output.transformed_feature_spec(),
        reader=tf.data.TFRecordDataset,
        shuffle=True)

    transformed_features = dataset.make_one_shot_iterator().get_next()

    # Extract features and label from the transformed tensors.
    transformed_labels = transformed_features.pop(LABEL_KEY)

    return transformed_features, transformed_labels

  return input_fn

Create an input function for serving

Let's create an input function that we could use in production, and prepare our trained model for serving.

def _make_serving_input_fn(tf_transform_output):
  """Creates an input function reading from raw data.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.

  Returns:
    The serving input function.
  """
  raw_feature_spec = RAW_DATA_METADATA.schema.as_feature_spec()
  # Remove label since it is not available during serving.
  raw_feature_spec.pop(LABEL_KEY)

  def serving_input_fn():
    """Input function for serving."""
    # Get raw features by generating the basic serving input_fn and calling it.
    # Here we generate an input_fn that expects a parsed Example proto to be fed
    # to the model at serving time.  See also
    # tf.estimator.export.build_raw_serving_input_receiver_fn.
    raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
        raw_feature_spec, default_batch_size=None)
    serving_input_receiver = raw_input_fn()

    # Apply the transform function that was used to generate the materialized
    # data.
    raw_features = serving_input_receiver.features
    transformed_features = tf_transform_output.transform_raw_features(
        raw_features)

    return tf.estimator.export.ServingInputReceiver(
        transformed_features, serving_input_receiver.receiver_tensors)

  return serving_input_fn
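
The contrast between the two input functions can be reduced to a few lines of plain Python. In this sketch `toy_transform` is a made-up stand-in for the exported transform graph (it only divides age by 100), and the function names are hypothetical. The training side receives already-transformed data and splits off the label, while the serving side receives raw features with no label and applies the transform itself.

```python
LABEL = 'label'


def toy_transform(raw):
  """Stand-in for the exported transform graph: scales age by a constant."""
  out = dict(raw)
  out['age'] = raw['age'] / 100.0
  return out


def training_input(transformed_example):
  """Training side: data is already transformed; split off the label."""
  features = dict(transformed_example)
  label = features.pop(LABEL)
  return features, label


def serving_input(raw_request):
  """Serving side: raw features arrive without a label and get transformed."""
  return toy_transform(raw_request)


features, label = training_input(
    {'age': 0.39, 'workclass': 'Private', 'label': 1})
served = serving_input({'age': 39.0, 'workclass': 'Private'})
```

Both paths end with the same feature dictionary, which is the property the model relies on.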

Wrap our input data in FeatureColumns

Our model will expect our data in TensorFlow FeatureColumns.

def get_feature_columns(tf_transform_output):
  """Returns the FeatureColumns for the model.

  Args:
    tf_transform_output: A `TFTransformOutput` object.

  Returns:
    A list of FeatureColumns.
  """
  # Wrap scalars as real valued columns.
  real_valued_columns = [tf.feature_column.numeric_column(key, shape=())
                         for key in NUMERIC_FEATURE_KEYS]

  # Wrap categorical columns.
  one_hot_columns = [
      tf.feature_column.categorical_column_with_vocabulary_file(
          key=key,
          vocabulary_file=tf_transform_output.vocabulary_file_by_name(
              vocab_filename=key))
      for key in CATEGORICAL_FEATURE_KEYS]

  return real_valued_columns + one_hot_columns
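
A vocabulary file is a text file with one term per line, and the line position serves as the integer id. The sketch below shows that lookup and a one-hot encoding in plain Python. The helper names and the two-term vocabulary are made up for illustration, and giving unknown terms an all-zero vector is a choice made for this sketch.

```python
import os
import tempfile


def load_vocabulary(path):
  """Reads one term per line; the line number becomes the integer id."""
  with open(path) as f:
    return dict((line.rstrip('\n'), i) for i, line in enumerate(f))


def one_hot(term, vocab):
  """Returns a one-hot list; unknown terms give all zeros in this sketch."""
  vector = [0] * len(vocab)
  if term in vocab:
    vector[vocab[term]] = 1
  return vector


# Write a tiny, made-up vocabulary file to a temporary location.
fd, vocab_path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as f:
  f.write('Male\nFemale\n')

vocab = load_vocabulary(vocab_path)
os.remove(vocab_path)

known = one_hot('Female', vocab)
unknown = one_hot('?', vocab)
```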

Train, Evaluate, and Export our model

def train_and_evaluate(working_dir, num_train_instances=NUM_TRAIN_INSTANCES,
                       num_test_instances=NUM_TEST_INSTANCES):
  """Train the model on training data and evaluate on test data.

  Args:
    working_dir: Directory to read transformed data and metadata from and to
        write exported model to.
    num_train_instances: Number of instances in train set
    num_test_instances: Number of instances in test set

  Returns:
    The results from the estimator's 'evaluate' method
  """
  tf_transform_output = tft.TFTransformOutput(working_dir)
  run_config = tf.estimator.RunConfig()

  estimator = tf.estimator.LinearClassifier(
      feature_columns=get_feature_columns(tf_transform_output),
      config=run_config)

  # Fit the model using the default optimizer.
  train_input_fn = _make_training_input_fn(
      tf_transform_output,
      os.path.join(working_dir, TRANSFORMED_TRAIN_DATA_FILEBASE + '*'),
      batch_size=TRAIN_BATCH_SIZE)
  estimator.train(
      input_fn=train_input_fn,
      max_steps=TRAIN_NUM_EPOCHS * num_train_instances // TRAIN_BATCH_SIZE)

  # Evaluate model on test dataset.
  eval_input_fn = _make_training_input_fn(
      tf_transform_output,
      os.path.join(working_dir, TRANSFORMED_TEST_DATA_FILEBASE + '*'),
      batch_size=1)

  # Export the model.
  serving_input_fn = _make_serving_input_fn(tf_transform_output)
  exported_model_dir = os.path.join(working_dir, EXPORTED_MODEL_DIR)
  estimator.export_savedmodel(exported_model_dir, serving_input_fn)

  return estimator.evaluate(input_fn=eval_input_fn, steps=num_test_instances)
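
The `max_steps` value passed to `estimator.train` is the number of epochs times the number of training instances, divided by the batch size. With the non-testing constants from earlier in this notebook the arithmetic works out as follows. The variable names on the left are introduced only for this worked example.

```python
TRAIN_NUM_EPOCHS = 16
NUM_TRAIN_INSTANCES = 32561
TRAIN_BATCH_SIZE = 128

# Batches needed to see every training instance once.
steps_per_epoch = NUM_TRAIN_INSTANCES / float(TRAIN_BATCH_SIZE)

# Exact quotient, and the whole number of steps below it.
max_steps_exact = (
    TRAIN_NUM_EPOCHS * NUM_TRAIN_INSTANCES / float(TRAIN_BATCH_SIZE))
max_steps_whole = TRAIN_NUM_EPOCHS * NUM_TRAIN_INSTANCES // TRAIN_BATCH_SIZE
```

The exact quotient is 4070.125, so 4070 whole steps cover just under 16 passes over the 32561 training rows.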

Put it all together

We've created all the stuff we need to preprocess our census data, train a model, and prepare it for serving. So far we've just been getting things ready. It's time to start running!

import time

start = time.time()
try:
  transform_data(train, test, temp)
  print('Transform took {:.2f} seconds'.format(time.time() - start))
  results = train_and_evaluate(temp)
  print('Transform and training took {:.2f} seconds'.format(time.time() - start))
  pprint.pprint(results)
finally:
  # cleanup
  import shutil
  if os.path.isdir(temp) and not testing:
    shutil.rmtree(temp)
WARNING:tensorflow:From <ipython-input-7-8469aa6bda25>:16: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.

INFO:tensorflow:Assets added to graph.

INFO:tensorflow:No assets to write.

INFO:tensorflow:SavedModel written to: /tmp/tmpWoUiZL/tftransform_tmp/2e424e478e6c4ea48d565eb95167b958/saved_model.pb

INFO:tensorflow:Assets added to graph.

INFO:tensorflow:No assets to write.

INFO:tensorflow:SavedModel written to: /tmp/tmpWoUiZL/tftransform_tmp/92e41ba2abfb418da391044c5f72bfc6/saved_model.pb

INFO:tensorflow:Saver not created because there are no variables in the graph to restore

INFO:tensorflow:Saver not created because there are no variables in the graph to restore

INFO:tensorflow:Assets added to graph.

INFO:tensorflow:Assets written to: /tmp/tmpWoUiZL/tftransform_tmp/a2745ae045ee4ba989c7061f62088b0a/assets

INFO:tensorflow:SavedModel written to: /tmp/tmpWoUiZL/tftransform_tmp/a2745ae045ee4ba989c7061f62088b0a/saved_model.pb

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_2:0\022\016native-country"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\003sex"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_4:0\022\014relationship"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\teducation"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_10:0\022\tworkclass"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\noccupation"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_16:0\022\016marital-status"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\004race"

INFO:tensorflow:Saver not created because there are no variables in the graph to restore

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_2:0\022\016native-country"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\003sex"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_4:0\022\014relationship"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\teducation"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_10:0\022\tworkclass"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\noccupation"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_16:0\022\016marital-status"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\004race"

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
WARNING:root:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_2:0\022\016native-country"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\003sex"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_4:0\022\014relationship"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\teducation"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_10:0\022\tworkclass"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\noccupation"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_16:0\022\016marital-status"

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\004race"

INFO:tensorflow:Saver not created because there are no variables in the graph to restore

Transform took 12.67 seconds
INFO:tensorflow:vocabulary_size = 7 in workclass is inferred from the number of elements in the vocabulary_file /tmp/transform_fn/assets/workclass.

INFO:tensorflow:vocabulary_size = 16 in education is inferred from the number of elements in the vocabulary_file /tmp/transform_fn/assets/education.

INFO:tensorflow:vocabulary_size = 7 in marital-status is inferred from the number of elements in the vocabulary_file /tmp/transform_fn/assets/marital-status.

INFO:tensorflow:vocabulary_size = 15 in occupation is inferred from the number of elements in the vocabulary_file /tmp/transform_fn/assets/occupation.

INFO:tensorflow:vocabulary_size = 6 in relationship is inferred from the number of elements in the vocabulary_file /tmp/transform_fn/assets/relationship.

INFO:tensorflow:vocabulary_size = 5 in race is inferred from the number of elements in the vocabulary_file /tmp/transform_fn/assets/race.

INFO:tensorflow:vocabulary_size = 2 in sex is inferred from the number of elements in the vocabulary_file /tmp/transform_fn/assets/sex.

INFO:tensorflow:vocabulary_size = 29 in native-country is inferred from the number of elements in the vocabulary_file /tmp/transform_fn/assets/native-country.

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpwaevYZ

INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fa680561d50>, '_model_dir': '/tmp/tmpwaevYZ', '_protocol': None, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_device_fn': None, '_experimental_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': None, '_master': ''}

WARNING:tensorflow:From <ipython-input-9-d16a32638667>:20: make_batched_features_dataset (from tensorflow.contrib.data.python.ops.readers) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.make_batched_features_dataset(...)`.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpwaevYZ/model.ckpt.

INFO:tensorflow:loss = 88.722855, step = 1

INFO:tensorflow:global_step/sec: 180.507

INFO:tensorflow:loss = 40.86286, step = 101 (0.560 sec)

INFO:tensorflow:global_step/sec: 304.932

INFO:tensorflow:loss = 44.193005, step = 201 (0.327 sec)

INFO:tensorflow:global_step/sec: 299.48

INFO:tensorflow:loss = 50.806885, step = 301 (0.334 sec)

INFO:tensorflow:global_step/sec: 295.146

INFO:tensorflow:loss = 35.7563, step = 401 (0.339 sec)

INFO:tensorflow:global_step/sec: 294.285

INFO:tensorflow:loss = 32.33618, step = 501 (0.339 sec)

INFO:tensorflow:global_step/sec: 303.043

INFO:tensorflow:loss = 51.54535, step = 601 (0.330 sec)

INFO:tensorflow:global_step/sec: 296.347

INFO:tensorflow:loss = 27.778667, step = 701 (0.338 sec)

INFO:tensorflow:global_step/sec: 297.655

INFO:tensorflow:loss = 40.612244, step = 801 (0.336 sec)

INFO:tensorflow:global_step/sec: 295.171

INFO:tensorflow:loss = 29.905676, step = 901 (0.338 sec)

INFO:tensorflow:global_step/sec: 300.309

INFO:tensorflow:loss = 28.969994, step = 1001 (0.333 sec)

INFO:tensorflow:global_step/sec: 294.927

INFO:tensorflow:loss = 36.346336, step = 1101 (0.340 sec)

INFO:tensorflow:global_step/sec: 314.358

INFO:tensorflow:loss = 34.73613, step = 1201 (0.317 sec)

INFO:tensorflow:global_step/sec: 297.794

INFO:tensorflow:loss = 28.349026, step = 1301 (0.336 sec)

INFO:tensorflow:global_step/sec: 306.629

INFO:tensorflow:loss = 34.430466, step = 1401 (0.324 sec)

INFO:tensorflow:global_step/sec: 309.818

INFO:tensorflow:loss = 34.080967, step = 1501 (0.324 sec)

INFO:tensorflow:global_step/sec: 301.278

INFO:tensorflow:loss = 34.65186, step = 1601 (0.332 sec)

INFO:tensorflow:global_step/sec: 306.507

INFO:tensorflow:loss = 33.13823, step = 1701 (0.326 sec)

INFO:tensorflow:global_step/sec: 293.62

INFO:tensorflow:loss = 32.772217, step = 1801 (0.342 sec)

INFO:tensorflow:global_step/sec: 285.988

INFO:tensorflow:loss = 34.49459, step = 1901 (0.348 sec)

INFO:tensorflow:global_step/sec: 291.781

INFO:tensorflow:loss = 39.421787, step = 2001 (0.342 sec)

INFO:tensorflow:global_step/sec: 285.012

INFO:tensorflow:loss = 34.760498, step = 2101 (0.353 sec)

INFO:tensorflow:global_step/sec: 299.697

INFO:tensorflow:loss = 34.292377, step = 2201 (0.333 sec)

INFO:tensorflow:global_step/sec: 299.57

INFO:tensorflow:loss = 38.545174, step = 2301 (0.336 sec)

INFO:tensorflow:global_step/sec: 295.768

INFO:tensorflow:loss = 28.001547, step = 2401 (0.336 sec)

INFO:tensorflow:global_step/sec: 296.47

INFO:tensorflow:loss = 39.022064, step = 2501 (0.337 sec)

INFO:tensorflow:global_step/sec: 303.016

INFO:tensorflow:loss = 35.252045, step = 2601 (0.330 sec)

INFO:tensorflow:global_step/sec: 308.095

INFO:tensorflow:loss = 28.236652, step = 2701 (0.325 sec)

INFO:tensorflow:global_step/sec: 296.904

INFO:tensorflow:loss = 39.21161, step = 2801 (0.337 sec)

INFO:tensorflow:global_step/sec: 292.268

INFO:tensorflow:loss = 38.391006, step = 2901 (0.342 sec)

INFO:tensorflow:global_step/sec: 302.633

INFO:tensorflow:loss = 46.611404, step = 3001 (0.330 sec)

INFO:tensorflow:global_step/sec: 303.805

INFO:tensorflow:loss = 30.610819, step = 3101 (0.331 sec)

INFO:tensorflow:global_step/sec: 307.741

INFO:tensorflow:loss = 36.57055, step = 3201 (0.325 sec)

INFO:tensorflow:global_step/sec: 296.282

INFO:tensorflow:loss = 35.325912, step = 3301 (0.337 sec)

INFO:tensorflow:global_step/sec: 298.953

INFO:tensorflow:loss = 40.206863, step = 3401 (0.334 sec)

INFO:tensorflow:global_step/sec: 303.204

INFO:tensorflow:loss = 40.41941, step = 3501 (0.330 sec)

INFO:tensorflow:global_step/sec: 285.081

INFO:tensorflow:loss = 30.434353, step = 3601 (0.351 sec)

INFO:tensorflow:global_step/sec: 290.223

INFO:tensorflow:loss = 32.601574, step = 3701 (0.344 sec)

INFO:tensorflow:global_step/sec: 290.402

INFO:tensorflow:loss = 31.892595, step = 3801 (0.344 sec)

INFO:tensorflow:global_step/sec: 294.922

INFO:tensorflow:loss = 33.28108, step = 3901 (0.340 sec)

INFO:tensorflow:global_step/sec: 300.134

INFO:tensorflow:loss = 33.404457, step = 4001 (0.333 sec)

INFO:tensorflow:Saving checkpoints for 4071 into /tmp/tmpwaevYZ/model.ckpt.

INFO:tensorflow:Loss for final step: 35.481014.

WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_2:0\022\016native-country"


WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\003sex"


WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_4:0\022\014relationship"


WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\teducation"


WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_10:0\022\tworkclass"


WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\noccupation"


WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_16:0\022\016marital-status"


WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\004race"


INFO:tensorflow:Saver not created because there are no variables in the graph to restore

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Signatures INCLUDED in export for Eval: None

INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']

INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']

INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']

INFO:tensorflow:Signatures INCLUDED in export for Train: None

INFO:tensorflow:Restoring parameters from /tmp/tmpwaevYZ/model.ckpt-4071

WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py:1044: calling add_meta_graph_and_variables (from tensorflow.python.saved_model.builder_impl) with legacy_init_op is deprecated and will be removed in a future version.
Instructions for updating:
Pass your op to the equivalent parameter main_op instead.

INFO:tensorflow:Assets added to graph.

INFO:tensorflow:Assets written to: /tmp/exported_model_dir/temp-1549657074/assets

INFO:tensorflow:SavedModel written to: /tmp/exported_model_dir/temp-1549657074/saved_model.pb

INFO:tensorflow:Calling model_fn.

WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.

WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Starting evaluation at 2019-02-08-20:17:56

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from /tmp/tmpwaevYZ/model.ckpt-4071

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Evaluation [1628/16281]

INFO:tensorflow:Evaluation [3256/16281]

INFO:tensorflow:Evaluation [4884/16281]

INFO:tensorflow:Evaluation [6512/16281]

INFO:tensorflow:Evaluation [8140/16281]

INFO:tensorflow:Evaluation [9768/16281]

INFO:tensorflow:Evaluation [11396/16281]

INFO:tensorflow:Evaluation [13024/16281]

INFO:tensorflow:Evaluation [14652/16281]

INFO:tensorflow:Evaluation [16280/16281]

INFO:tensorflow:Evaluation [16281/16281]

INFO:tensorflow:Finished evaluation at 2019-02-08-20:18:50

INFO:tensorflow:Saving dict for global step 4071: accuracy = 0.8075671, accuracy_baseline = 0.75978136, auc = 0.85481, auc_precision_recall = 0.9476837, average_loss = 0.44109586, global_step = 4071, label/mean = 0.75978136, loss = 0.44109586, precision = 0.86745167, prediction/mean = 0.745779, recall = 0.8814066

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4071: /tmp/tmpwaevYZ/model.ckpt-4071

Transform and training took 87.67 seconds
{'accuracy': 0.8075671,
 'accuracy_baseline': 0.75978136,
 'auc': 0.85481,
 'auc_precision_recall': 0.9476837,
 'average_loss': 0.44109586,
 'global_step': 4071,
 'label/mean': 0.75978136,
 'loss': 0.44109586,
 'precision': 0.86745167,
 'prediction/mean': 0.745779,
 'recall': 0.8814066}
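
As a quick sanity check on the metrics above, the following plain-Python sketch compares accuracy against the majority-class baseline and derives an F1 score from the reported precision and recall. The values are copied from the dictionary printed above; the variable names and the choice to compute F1 are illustrative additions, not part of the original notebook.

```python
# Metrics copied from the evaluation output above.
results = {
    'accuracy': 0.8075671,
    'accuracy_baseline': 0.75978136,
    'precision': 0.86745167,
    'recall': 0.8814066,
}

# accuracy_baseline is the accuracy of always predicting the majority class,
# so the difference shows how much the model improves on that trivial rule.
lift = results['accuracy'] - results['accuracy_baseline']

# F1 is the harmonic mean of precision and recall.
p, r = results['precision'], results['recall']
f1 = 2 * p * r / (p + r)

print('Accuracy lift over baseline: {:.4f}'.format(lift))
print('F1 score: {:.4f}'.format(f1))
```

Note that `accuracy_baseline` equals `label/mean` here, which is what we would expect when the positive class is the majority: the model beats the trivial predictor by a few percentage points.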

What we did

In this example we used tf.Transform to preprocess a dataset of census data, and trained a model on the cleaned and transformed data. We also created an input function that we could use when we deploy our trained model in a production environment to perform inference. Because the same preprocessing code runs for both training and inference, we avoid training/serving skew. Along the way we learned how to create an Apache Beam transform to perform the transformation that we needed for cleaning the data, and wrapped our data in TensorFlow FeatureColumns. This is just a small piece of what TensorFlow Transform can do! We encourage you to dive into tf.Transform and discover what it can do for you.
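
The skew-avoidance idea summarized above can be illustrated without any TensorFlow dependency. The sketch below is a plain-Python stand-in, not the tf.Transform API: a hypothetical `analyze` function makes one full pass over the training data to compute a mean, a standard deviation, and a vocabulary, and a hypothetical `transform` function then applies those fixed statistics to any single example, whether at training or serving time. The sample values are invented for illustration.

```python
import math


def analyze(examples):
    """Full pass over the training data: collect the statistics we need."""
    ages = [ex['age'] for ex in examples]
    mean = sum(ages) / len(ages)
    variance = sum((a - mean) ** 2 for a in ages) / len(ages)
    # Sorted so the string-to-integer mapping is deterministic.
    vocab = sorted(set(ex['education'] for ex in examples))
    return {
        'age_mean': mean,
        'age_std': math.sqrt(variance),
        'education_vocab': dict((term, i) for i, term in enumerate(vocab)),
    }


def transform(example, stats):
    """Per-example step: uses only the statistics computed by analyze()."""
    return {
        'age': (example['age'] - stats['age_mean']) / stats['age_std'],
        # Strings not seen during analysis map to -1 (an assumption of this sketch).
        'education': stats['education_vocab'].get(example['education'], -1),
    }


# Invented training data, for illustration only.
train = [
    {'age': 30.0, 'education': 'Bachelors'},
    {'age': 40.0, 'education': 'HS-grad'},
    {'age': 50.0, 'education': 'Bachelors'},
]

stats = analyze(train)  # computed once, from training data only

# The same transform() and the same stats are used in both settings.
train_features = [transform(ex, stats) for ex in train]
serving_features = transform({'age': 40.0, 'education': 'Masters'}, stats)

print(train_features)
print(serving_features)
```

The design point is that `stats` is computed once and then frozen. tf.Transform goes a step further by baking such constants into an exported TensorFlow graph, so the serving side cannot drift from what training used.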