SIG TFX-Addons 커뮤니티에 가입하여 TFX를 더욱 향상 시키십시오!
이 페이지는 Cloud Translation API를 통해 번역되었습니다.
Switch to English

TensorFlow Transform으로 데이터 전처리

TensorFlow Extended (TFX)의 기능 엔지니어링 구성 요소

이 예제 colab 노트북은 TensorFlow Transform ( tf.Transform )을 사용하여 모델을 학습시키고 프로덕션에서 추론을 제공하기 위해 정확히 동일한 코드를 사용하여 데이터를 사전 처리하는 방법에 대한 다소 고급 예제를 제공합니다.

TensorFlow Transform은 학습 데이터 세트에 대한 전체 패스가 필요한 기능 생성을 포함하여 TensorFlow의 입력 데이터를 사전 처리하기위한 라이브러리입니다. 예를 들어 TensorFlow Transform을 사용하여 다음을 수행 할 수 있습니다.

  • 평균 및 표준 편차를 사용하여 입력 값 정규화
  • 모든 입력 값에 대해 어휘를 생성하여 문자열을 정수로 변환
  • 관찰 된 데이터 분포를 기반으로 부동 소수점을 버킷에 할당하여 정수로 변환

TensorFlow는 단일 예제 또는 예제 배치에 대한 조작을 기본적으로 지원합니다. tf.Transform 은 이러한 기능을 확장하여 전체 학습 데이터 세트에 대한 전체 패스를 지원합니다.

tf.Transform 의 출력은 학습 및 제공 모두에 사용할 수있는 TensorFlow 그래프로 내보내집니다. 학습과 제공에 동일한 그래프를 사용하면 두 단계 모두에 동일한 변환이 적용되므로 편향을 방지 할 수 있습니다.

이 예에서 우리가하는 일

이 예에서는 인구 조사 데이터가 포함널리 사용되는 데이터 세트를 처리하고 분류를 수행하도록 모델을 학습합니다. 그 과정에서 tf.Transform 사용하여 데이터를 변환 할 것입니다.

Pip 업그레이드

로컬에서 실행할 때 시스템에서 Pip을 업그레이드하지 않으려면 Colab에서 실행 중인지 확인하십시오. 물론 로컬 시스템은 별도로 업그레이드 할 수 있습니다.

try:
  import colab
  !pip install -q --upgrade pip
except:
  pass

TensorFlow Transform 설치

pip install -q tensorflow-transform

Python 검사, 가져 오기 및 전역

먼저 Python 3을 사용하고 있는지 확인한 다음 필요한 항목을 설치하고 가져옵니다.

import sys

# Confirm that we're using Python 3
assert sys.version_info.major is 3, 'Oops, not running Python 3. Use Runtime > Change runtime type'
import math
import os
import pprint

import tensorflow as tf
print('TF: {}'.format(tf.__version__))

import apache_beam as beam
print('Beam: {}'.format(beam.__version__))

import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
print('Transform: {}'.format(tft.__version__))

from tfx_bsl.public import tfxio

!wget https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.data
!wget https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.test

train = './adult.data'
test = './adult.test'
TF: 2.4.1
Beam: 2.27.0
Transform: 0.27.0
--2021-02-11 13:17:47--  https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.data
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.136.128, 74.125.129.128, 172.217.212.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.136.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3974305 (3.8M) [application/octet-stream]
Saving to: ‘adult.data’

adult.data          100%[===================>]   3.79M  --.-KB/s    in 0.04s   

2021-02-11 13:17:47 (105 MB/s) - ‘adult.data’ saved [3974305/3974305]

--2021-02-11 13:17:47--  https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.test
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.202.128, 74.125.70.128, 74.125.126.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.202.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2003153 (1.9M) [application/octet-stream]
Saving to: ‘adult.test’

adult.test          100%[===================>]   1.91M  --.-KB/s    in 0.02s   

2021-02-11 13:17:47 (103 MB/s) - ‘adult.test’ saved [2003153/2003153]

열 이름 지정

데이터 세트의 열을 참조하기위한 몇 가지 편리한 목록을 만들 것입니다.

CATEGORICAL_FEATURE_KEYS = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country',
]
NUMERIC_FEATURE_KEYS = [
    'age',
    'capital-gain',
    'capital-loss',
    'hours-per-week',
]
OPTIONAL_NUMERIC_FEATURE_KEYS = [
    'education-num',
]
ORDERED_CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education-num',
    'marital-status', 'occupation', 'relationship', 'race', 'sex',
    'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'label'
]
LABEL_KEY = 'label'

기능 및 스키마 정의

입력에있는 열의 유형에 따라 스키마를 정의 해 보겠습니다. 무엇보다도 이것은 올바르게 가져 오는 데 도움이됩니다.

RAW_DATA_FEATURE_SPEC = dict(
    [(name, tf.io.FixedLenFeature([], tf.string))
     for name in CATEGORICAL_FEATURE_KEYS] +
    [(name, tf.io.FixedLenFeature([], tf.float32))
     for name in NUMERIC_FEATURE_KEYS] +
    [(name, tf.io.VarLenFeature(tf.float32))
     for name in OPTIONAL_NUMERIC_FEATURE_KEYS] +
    [(LABEL_KEY, tf.io.FixedLenFeature([], tf.string))]
)

SCHEMA = tft.tf_metadata.dataset_metadata.DatasetMetadata(
    tft.tf_metadata.schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC)).schema

초 매개 변수 및 기본 관리 설정

훈련에 사용되는 상수 및 초 매개 변수. 버킷 크기에는 데이터 세트 설명에 나열된 모든 범주와 "?"에 대한 추가 범주가 포함됩니다. 알 수 없음을 나타냅니다.

testing = os.getenv("WEB_TEST_BROWSER", False)
NUM_OOV_BUCKETS = 1
if testing:
  TRAIN_NUM_EPOCHS = 1
  NUM_TRAIN_INSTANCES = 1
  TRAIN_BATCH_SIZE = 1
  NUM_TEST_INSTANCES = 1
else:
  TRAIN_NUM_EPOCHS = 16
  NUM_TRAIN_INSTANCES = 32561
  TRAIN_BATCH_SIZE = 128
  NUM_TEST_INSTANCES = 16281

# Names of temp files
TRANSFORMED_TRAIN_DATA_FILEBASE = 'train_transformed'
TRANSFORMED_TEST_DATA_FILEBASE = 'test_transformed'
EXPORTED_MODEL_DIR = 'exported_model_dir'

tf.Transform 전처리

tf.Transform preprocessing_fn 만들기

전처리 기능 은 tf.Transform의 가장 중요한 개념입니다. 전처리 기능은 데이터 세트의 변환이 실제로 발생하는 곳입니다. 텐서 사전을 받아들이고 반환합니다. 여기서 텐서는 Tensor 또는 SparseTensor 의미합니다. 일반적으로 전처리 기능의 핵심을 형성하는 두 가지 주요 API 호출 그룹이 있습니다.

  1. TensorFlow Ops : 일반적으로 TensorFlow 작업을 의미하는 텐서를 수락하고 반환하는 모든 함수. 이렇게하면 원시 데이터를 한 번에 하나의 특성 벡터로 변환 된 데이터로 변환하는 TensorFlow 작업이 그래프에 추가됩니다. 이는 훈련과 서빙 중 모든 예에서 실행됩니다.
  2. TensorFlow Transform Analyzers : tf.Transform에서 제공하는 모든 분석기. 분석기는 또한 텐서를 수락하고 반환하지만 TensorFlow 작업과 달리 학습 중에 한 번만 실행되며 일반적으로 전체 학습 데이터 세트에 대해 전체 패스를 수행합니다. 그들은 그래프에 추가되는 텐서 상수를 만듭니다. 예를 들어 tft.min 은 훈련 데이터 세트에 대한 텐서의 최소값을 계산합니다. tf.Transform은 고정 된 분석기 세트를 제공하지만 향후 버전에서 확장 될 예정입니다.
def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""
  # Since we are modifying some features and leaving others unchanged, we
  # start by setting `outputs` to a copy of `inputs.
  outputs = inputs.copy()

  # Scale numeric columns to have range [0, 1].
  for key in NUMERIC_FEATURE_KEYS:
    outputs[key] = tft.scale_to_0_1(inputs[key])

  for key in OPTIONAL_NUMERIC_FEATURE_KEYS:
    # This is a SparseTensor because it is optional. Here we fill in a default
    # value when it is missing.
    sparse = tf.sparse.SparseTensor(inputs[key].indices, inputs[key].values,
                                    [inputs[key].dense_shape[0], 1])
    dense = tf.sparse.to_dense(sp_input=sparse, default_value=0.)
    # Reshaping from a batch of vectors of size 1 to a batch to scalars.
    dense = tf.squeeze(dense, axis=1)
    outputs[key] = tft.scale_to_0_1(dense)

  # For all categorical columns except the label column, we generate a
  # vocabulary but do not modify the feature.  This vocabulary is instead
  # used in the trainer, by means of a feature column, to convert the feature
  # from a string to an integer id.
  for key in CATEGORICAL_FEATURE_KEYS:
    outputs[key] = tft.compute_and_apply_vocabulary(
        tf.strings.strip(inputs[key]),
        num_oov_buckets=NUM_OOV_BUCKETS,
        vocab_filename=key)

  # For the label column we provide the mapping from string to index.
  table_keys = ['>50K', '<=50K']
  initializer = tf.lookup.KeyValueTensorInitializer(
      keys=table_keys,
      values=tf.cast(tf.range(len(table_keys)), tf.int64),
      key_dtype=tf.string,
      value_dtype=tf.int64)
  table = tf.lookup.StaticHashTable(initializer, default_value=-1)
  # Remove trailing periods for test data when the data is read with tf.data.
  label_str = tf.strings.regex_replace(inputs[LABEL_KEY], r'\.', '')
  label_str = tf.strings.strip(label_str)
  data_labels = table.lookup(label_str)
  transformed_label = tf.one_hot(
      indices=data_labels, depth=len(table_keys), on_value=1.0, off_value=0.0)
  outputs[LABEL_KEY] = tf.reshape(transformed_label, [-1, len(table_keys)])

  return outputs

데이터 변환

이제 Apache Beam 파이프 라인에서 데이터 변환을 시작할 준비가되었습니다.

  1. CSV 리더를 사용하여 데이터 읽기
  2. 각 카테고리에 대한 어휘를 생성하여 숫자 데이터를 확장하고 카테고리 데이터를 문자열에서 int64 값 인덱스로 변환하는 전처리 파이프 라인을 사용하여이를 변환합니다.
  3. 결과를 Example TFRecordTFRecord 로 작성합니다. 나중에 모델 학습에 사용할 것입니다.
def transform_data(train_data_file, test_data_file, working_dir):
  """Transform the data and write out as a TFRecord of Example protos.

  Read in the data using the CSV reader, and transform it using a
  preprocessing pipeline that scales numeric data and converts categorical data
  from strings to int64 values indices, by creating a vocabulary for each
  category.

  Args:
    train_data_file: File containing training data
    test_data_file: File containing test data
    working_dir: Directory to write transformed data and metadata to
  """

  # The "with" block will create a pipeline, and run that pipeline at the exit
  # of the block.
  with beam.Pipeline() as pipeline:
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
      # Create a TFXIO to read the census data with the schema. To do this we
      # need to list all columns in order since the schema doesn't specify the
      # order of columns in the csv.
      # We first read CSV files and use BeamRecordCsvTFXIO whose .BeamSource()
      # accepts a PCollection[bytes] because we need to patch the records first
      # (see "FixCommasTrainData" below). Otherwise, tfxio.CsvTFXIO can be used
      # to both read the CSV files and parse them to TFT inputs:
      # csv_tfxio = tfxio.CsvTFXIO(...)
      # raw_data = (pipeline | 'ToRecordBatches' >> csv_tfxio.BeamSource())
      csv_tfxio = tfxio.BeamRecordCsvTFXIO(
          physical_format='text',
          column_names=ORDERED_CSV_COLUMNS,
          schema=SCHEMA)

      # Read in raw data and convert using CSV TFXIO.  Note that we apply
      # some Beam transformations here, which will not be encoded in the TF
      # graph since we don't do the from within tf.Transform's methods
      # (AnalyzeDataset, TransformDataset etc.).  These transformations are just
      # to get data into a format that the CSV TFXIO can read, in particular
      # removing spaces after commas.
      raw_data = (
          pipeline
          | 'ReadTrainData' >> beam.io.ReadFromText(
              train_data_file, coder=beam.coders.BytesCoder())
          | 'FixCommasTrainData' >> beam.Map(
              lambda line: line.replace(b', ', b','))
          | 'DecodeTrainData' >> csv_tfxio.BeamSource())

      # Combine data and schema into a dataset tuple.  Note that we already used
      # the schema to read the CSV data, but we also need it to interpret
      # raw_data.
      raw_dataset = (raw_data, csv_tfxio.TensorAdapterConfig())
      transformed_dataset, transform_fn = (
          raw_dataset | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
      transformed_data, transformed_metadata = transformed_dataset
      transformed_data_coder = tft.coders.ExampleProtoCoder(
          transformed_metadata.schema)

      _ = (
          transformed_data
          | 'EncodeTrainData' >> beam.Map(transformed_data_coder.encode)
          | 'WriteTrainData' >> beam.io.WriteToTFRecord(
              os.path.join(working_dir, TRANSFORMED_TRAIN_DATA_FILEBASE)))

      # Now apply transform function to test data.  In this case we remove the
      # trailing period at the end of each line, and also ignore the header line
      # that is present in the test data file.
      raw_test_data = (
          pipeline
          | 'ReadTestData' >> beam.io.ReadFromText(
              test_data_file, skip_header_lines=1,
              coder=beam.coders.BytesCoder())
          | 'FixCommasTestData' >> beam.Map(
              lambda line: line.replace(b', ', b','))
          | 'RemoveTrailingPeriodsTestData' >> beam.Map(lambda line: line[:-1])
          | 'DecodeTestData' >> csv_tfxio.BeamSource())

      raw_test_dataset = (raw_test_data, csv_tfxio.TensorAdapterConfig())

      transformed_test_dataset = (
          (raw_test_dataset, transform_fn) | tft_beam.TransformDataset())
      # Don't need transformed data schema, it's the same as before.
      transformed_test_data, _ = transformed_test_dataset

      _ = (
          transformed_test_data
          | 'EncodeTestData' >> beam.Map(transformed_data_coder.encode)
          | 'WriteTestData' >> beam.io.WriteToTFRecord(
              os.path.join(working_dir, TRANSFORMED_TEST_DATA_FILEBASE)))

      # Will write a SavedModel and metadata to working_dir, which can then
      # be read by the tft.TFTransformOutput class.
      _ = (
          transform_fn
          | 'WriteTransformFn' >> tft_beam.WriteTransformFn(working_dir))

전처리 된 데이터를 사용하여 tf.keras를 사용하여 모델 학습

tf.Transform 을 사용하여 학습과 제공 모두에 동일한 코드를 사용하여 왜곡을 방지하는 방법을 보여주기 위해 모델을 학습 할 것입니다. 모델을 훈련시키고 생산을 위해 훈련 된 모델을 준비하려면 입력 함수를 만들어야합니다. 학습 입력 함수와 제공 입력 함수의 주요 차이점은 학습 데이터에는 레이블이 포함되고 프로덕션 데이터에는 포함되지 않는다는 것입니다. 인수와 반환도 약간 다릅니다.

훈련을위한 입력 함수 만들기

def _make_training_input_fn(tf_transform_output, transformed_examples,
                            batch_size):
  """An input function reading from transformed data, converting to model input.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    transformed_examples: Base filename of examples.
    batch_size: Batch size.

  Returns:
    The input data for training or eval, in the form of k.
  """
  def input_fn():
    return tf.data.experimental.make_batched_features_dataset(
        file_pattern=transformed_examples,
        batch_size=batch_size,
        features=tf_transform_output.transformed_feature_spec(),
        reader=tf.data.TFRecordDataset,
        label_key=LABEL_KEY,
        shuffle=True).prefetch(tf.data.experimental.AUTOTUNE)

  return input_fn

제공하기위한 입력 함수 만들기

프로덕션에서 사용할 수있는 입력 함수를 만들고 제공 할 학습 된 모델을 준비해 보겠습니다.

def _make_serving_input_fn(tf_transform_output, raw_examples, batch_size):
  """An input function reading from raw data, converting to model input.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    raw_examples: Base filename of examples.
    batch_size: Batch size.

  Returns:
    The input data for training or eval, in the form of k.
  """

  def get_ordered_raw_data_dtypes():
    result = []
    for col in ORDERED_CSV_COLUMNS:
      if col not in RAW_DATA_FEATURE_SPEC:
        result.append(0.0)
        continue
      spec = RAW_DATA_FEATURE_SPEC[col]
      if isinstance(spec, tf.io.FixedLenFeature):
        result.append(spec.dtype)
      else:
        result.append(0.0)
    return result

  def input_fn():
    dataset = tf.data.experimental.make_csv_dataset(
        file_pattern=raw_examples,
        batch_size=batch_size,
        column_names=ORDERED_CSV_COLUMNS,
        column_defaults=get_ordered_raw_data_dtypes(),
        prefetch_buffer_size=0,
        ignore_errors=True)

    tft_layer = tf_transform_output.transform_features_layer()

    def transform_dataset(data):
      raw_features = {}
      for key, val in data.items():
        if key not in RAW_DATA_FEATURE_SPEC:
          continue
        if isinstance(RAW_DATA_FEATURE_SPEC[key], tf.io.VarLenFeature):
          raw_features[key] = tf.RaggedTensor.from_tensor(
              tf.expand_dims(val, -1)).to_sparse()
          continue
        raw_features[key] = val
      transformed_features = tft_layer(raw_features)
      data_labels = transformed_features.pop(LABEL_KEY)
      return (transformed_features, data_labels)

    return dataset.map(
        transform_dataset,
        num_parallel_calls=tf.data.experimental.AUTOTUNE).prefetch(
            tf.data.experimental.AUTOTUNE)

  return input_fn

모델 훈련, 평가 및 내보내기

def export_serving_model(tf_transform_output, model, output_dir):
  """Exports a keras model for serving.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    model: A keras model to export for serving.
    output_dir: A directory where the model will be exported to.
  """
  # The layer has to be saved to the model for keras tracking purpases.
  model.tft_layer = tf_transform_output.transform_features_layer()

  @tf.function
  def serve_tf_examples_fn(serialized_tf_examples):
    """Serving tf.function model wrapper."""
    feature_spec = RAW_DATA_FEATURE_SPEC.copy()
    feature_spec.pop(LABEL_KEY)
    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
    transformed_features = model.tft_layer(parsed_features)
    outputs = model(transformed_features)
    classes_names = tf.constant([['0', '1']])
    classes = tf.tile(classes_names, [tf.shape(outputs)[0], 1])
    return {'classes': classes, 'scores': outputs}

  concrete_serving_fn = serve_tf_examples_fn.get_concrete_function(
      tf.TensorSpec(shape=[None], dtype=tf.string, name='inputs'))
  signatures = {'serving_default': concrete_serving_fn}

  # This is required in order to make this model servable with model_server.
  versioned_output_dir = os.path.join(output_dir, '1')
  model.save(versioned_output_dir, save_format='tf', signatures=signatures)
def train_and_evaluate(working_dir,
                       num_train_instances=NUM_TRAIN_INSTANCES,
                       num_test_instances=NUM_TEST_INSTANCES):
  """Train the model on training data and evaluate on test data.

  Args:
    working_dir: The location of the Transform output.
    num_train_instances: Number of instances in train set
    num_test_instances: Number of instances in test set

  Returns:
    The results from the estimator's 'evaluate' method
  """
  train_data_path_pattern = os.path.join(working_dir,
                                 TRANSFORMED_TRAIN_DATA_FILEBASE + '*')
  eval_data_path_pattern = os.path.join(working_dir,
                            TRANSFORMED_TEST_DATA_FILEBASE + '*')
  tf_transform_output = tft.TFTransformOutput(working_dir)

  train_input_fn = _make_training_input_fn(
      tf_transform_output, train_data_path_pattern, batch_size=TRAIN_BATCH_SIZE)
  train_dataset = train_input_fn()

  # Evaluate model on test dataset.
  eval_input_fn = _make_training_input_fn(
      tf_transform_output, eval_data_path_pattern, batch_size=TRAIN_BATCH_SIZE)
  validation_dataset = eval_input_fn()

  feature_spec = tf_transform_output.transformed_feature_spec().copy()
  feature_spec.pop(LABEL_KEY)

  inputs = {}
  for key, spec in feature_spec.items():
    if isinstance(spec, tf.io.VarLenFeature):
      inputs[key] = tf.keras.layers.Input(
          shape=[None], name=key, dtype=spec.dtype, sparse=True)
    elif isinstance(spec, tf.io.FixedLenFeature):
      inputs[key] = tf.keras.layers.Input(
          shape=spec.shape, name=key, dtype=spec.dtype)
    else:
      raise ValueError('Spec type is not supported: ', key, spec)

  encoded_inputs = {}
  for key in inputs:
    feature = tf.expand_dims(inputs[key], -1)
    if key in CATEGORICAL_FEATURE_KEYS:
      num_buckets = tf_transform_output.num_buckets_for_transformed_feature(key)
      encoding_layer = (
          tf.keras.layers.experimental.preprocessing.CategoryEncoding(
              max_tokens=num_buckets, output_mode='binary', sparse=False))
      encoded_inputs[key] = encoding_layer(feature)
    else:
      encoded_inputs[key] = feature

  stacked_inputs = tf.concat(tf.nest.flatten(encoded_inputs), axis=1)
  output = tf.keras.layers.Dense(100, activation='relu')(stacked_inputs)
  output = tf.keras.layers.Dense(70, activation='relu')(output)
  output = tf.keras.layers.Dense(50, activation='relu')(output)
  output = tf.keras.layers.Dense(20, activation='relu')(output)
  output = tf.keras.layers.Dense(2, activation='sigmoid')(output)
  model = tf.keras.Model(inputs=inputs, outputs=output)

  model.compile(optimizer='adam',
                loss='binary_crossentropy',
                metrics=['accuracy'])
  pprint.pprint(model.summary())

  model.fit(train_dataset, validation_data=validation_dataset,
            epochs=TRAIN_NUM_EPOCHS,
            steps_per_epoch=math.ceil(num_train_instances / TRAIN_BATCH_SIZE),
            validation_steps=math.ceil(num_test_instances / TRAIN_BATCH_SIZE))

  # Export the model.
  exported_model_dir = os.path.join(working_dir, EXPORTED_MODEL_DIR)
  export_serving_model(tf_transform_output, model, exported_model_dir)

  metrics_values = model.evaluate(validation_dataset, steps=num_test_instances)
  metrics_labels = model.metrics_names
  return {l: v for l, v in zip(metrics_labels, metrics_values)}

모두 모아

인구 조사 데이터를 전처리하고, 모델을 훈련시키고, 제공 할 준비를하는 데 필요한 모든 것을 만들었습니다. 지금까지 우리는 준비를 마쳤습니다. 달리기를 시작할 시간입니다!

import tempfile
temp = os.path.join(tempfile.gettempdir(), 'keras')

transform_data(train, test, temp)
results = train_and_evaluate(temp)
pprint.pprint(results)
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_transform/tf_utils.py:266: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_transform/tf_utils.py:266: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:No assets to write.
WARNING:tensorflow:Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
WARNING:tensorflow:Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
INFO:tensorflow:SavedModel written to: /tmp/tmpl1tkdg71/tftransform_tmp/eaa9d93f1d524541be17fad11c5ada8c/saved_model.pb
INFO:tensorflow:SavedModel written to: /tmp/tmpl1tkdg71/tftransform_tmp/eaa9d93f1d524541be17fad11c5ada8c/saved_model.pb
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:No assets to write.
WARNING:tensorflow:Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
WARNING:tensorflow:Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
INFO:tensorflow:SavedModel written to: /tmp/tmpl1tkdg71/tftransform_tmp/0100b711e0284f628ce54c01d1649a0e/saved_model.pb
INFO:tensorflow:SavedModel written to: /tmp/tmpl1tkdg71/tftransform_tmp/0100b711e0284f628ce54c01d1649a0e/saved_model.pb
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tmpl1tkdg71/tftransform_tmp/83dd43ae67a74f97b14e9cd9fa638782/assets
INFO:tensorflow:Assets written to: /tmp/tmpl1tkdg71/tftransform_tmp/83dd43ae67a74f97b14e9cd9fa638782/assets
INFO:tensorflow:SavedModel written to: /tmp/tmpl1tkdg71/tftransform_tmp/83dd43ae67a74f97b14e9cd9fa638782/saved_model.pb
INFO:tensorflow:SavedModel written to: /tmp/tmpl1tkdg71/tftransform_tmp/83dd43ae67a74f97b14e9cd9fa638782/saved_model.pb
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
education (InputLayer)          [(None,)]            0                                            
__________________________________________________________________________________________________
marital-status (InputLayer)     [(None,)]            0                                            
__________________________________________________________________________________________________
native-country (InputLayer)     [(None,)]            0                                            
__________________________________________________________________________________________________
occupation (InputLayer)         [(None,)]            0                                            
__________________________________________________________________________________________________
race (InputLayer)               [(None,)]            0                                            
__________________________________________________________________________________________________
relationship (InputLayer)       [(None,)]            0                                            
__________________________________________________________________________________________________
sex (InputLayer)                [(None,)]            0                                            
__________________________________________________________________________________________________
workclass (InputLayer)          [(None,)]            0                                            
__________________________________________________________________________________________________
age (InputLayer)                [(None,)]            0                                            
__________________________________________________________________________________________________
capital-gain (InputLayer)       [(None,)]            0                                            
__________________________________________________________________________________________________
capital-loss (InputLayer)       [(None,)]            0                                            
__________________________________________________________________________________________________
tf.expand_dims_3 (TFOpLambda)   (None, 1)            0           education[0][0]                  
__________________________________________________________________________________________________
education-num (InputLayer)      [(None,)]            0                                            
__________________________________________________________________________________________________
hours-per-week (InputLayer)     [(None,)]            0                                            
__________________________________________________________________________________________________
tf.expand_dims_6 (TFOpLambda)   (None, 1)            0           marital-status[0][0]             
__________________________________________________________________________________________________
tf.expand_dims_7 (TFOpLambda)   (None, 1)            0           native-country[0][0]             
__________________________________________________________________________________________________
tf.expand_dims_8 (TFOpLambda)   (None, 1)            0           occupation[0][0]                 
__________________________________________________________________________________________________
tf.expand_dims_9 (TFOpLambda)   (None, 1)            0           race[0][0]                       
__________________________________________________________________________________________________
tf.expand_dims_10 (TFOpLambda)  (None, 1)            0           relationship[0][0]               
__________________________________________________________________________________________________
tf.expand_dims_11 (TFOpLambda)  (None, 1)            0           sex[0][0]                        
__________________________________________________________________________________________________
tf.expand_dims_12 (TFOpLambda)  (None, 1)            0           workclass[0][0]                  
__________________________________________________________________________________________________
tf.expand_dims (TFOpLambda)     (None, 1)            0           age[0][0]                        
__________________________________________________________________________________________________
tf.expand_dims_1 (TFOpLambda)   (None, 1)            0           capital-gain[0][0]               
__________________________________________________________________________________________________
tf.expand_dims_2 (TFOpLambda)   (None, 1)            0           capital-loss[0][0]               
__________________________________________________________________________________________________
category_encoding (CategoryEnco (None, 17)           0           tf.expand_dims_3[0][0]           
__________________________________________________________________________________________________
tf.expand_dims_4 (TFOpLambda)   (None, 1)            0           education-num[0][0]              
__________________________________________________________________________________________________
tf.expand_dims_5 (TFOpLambda)   (None, 1)            0           hours-per-week[0][0]             
__________________________________________________________________________________________________
category_encoding_1 (CategoryEn (None, 8)            0           tf.expand_dims_6[0][0]           
__________________________________________________________________________________________________
category_encoding_2 (CategoryEn (None, 43)           0           tf.expand_dims_7[0][0]           
__________________________________________________________________________________________________
category_encoding_3 (CategoryEn (None, 16)           0           tf.expand_dims_8[0][0]           
__________________________________________________________________________________________________
category_encoding_4 (CategoryEn (None, 6)            0           tf.expand_dims_9[0][0]           
__________________________________________________________________________________________________
category_encoding_5 (CategoryEn (None, 7)            0           tf.expand_dims_10[0][0]          
__________________________________________________________________________________________________
category_encoding_6 (CategoryEn (None, 3)            0           tf.expand_dims_11[0][0]          
__________________________________________________________________________________________________
category_encoding_7 (CategoryEn (None, 10)           0           tf.expand_dims_12[0][0]          
__________________________________________________________________________________________________
tf.concat (TFOpLambda)          (None, 115)          0           tf.expand_dims[0][0]             
                                                                 tf.expand_dims_1[0][0]           
                                                                 tf.expand_dims_2[0][0]           
                                                                 category_encoding[0][0]          
                                                                 tf.expand_dims_4[0][0]           
                                                                 tf.expand_dims_5[0][0]           
                                                                 category_encoding_1[0][0]        
                                                                 category_encoding_2[0][0]        
                                                                 category_encoding_3[0][0]        
                                                                 category_encoding_4[0][0]        
                                                                 category_encoding_5[0][0]        
                                                                 category_encoding_6[0][0]        
                                                                 category_encoding_7[0][0]        
__________________________________________________________________________________________________
dense (Dense)                   (None, 100)          11600       tf.concat[0][0]                  
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 70)           7070        dense[0][0]                      
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 50)           3550        dense_1[0][0]                    
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 20)           1020        dense_2[0][0]                    
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 2)            42          dense_3[0][0]                    
==================================================================================================
Total params: 23,282
Trainable params: 23,282
Non-trainable params: 0
__________________________________________________________________________________________________
None
Epoch 1/16
255/255 [==============================] - 3s 9ms/step - loss: 0.4275 - accuracy: 0.8075 - val_loss: 0.3487 - val_accuracy: 0.8320
Epoch 2/16
255/255 [==============================] - 2s 6ms/step - loss: 0.3378 - accuracy: 0.8403 - val_loss: 0.3346 - val_accuracy: 0.8448
Epoch 3/16
255/255 [==============================] - 2s 6ms/step - loss: 0.3291 - accuracy: 0.8437 - val_loss: 0.3277 - val_accuracy: 0.8471
Epoch 4/16
255/255 [==============================] - 2s 6ms/step - loss: 0.3197 - accuracy: 0.8498 - val_loss: 0.3203 - val_accuracy: 0.8519
Epoch 5/16
255/255 [==============================] - 2s 6ms/step - loss: 0.3126 - accuracy: 0.8543 - val_loss: 0.3213 - val_accuracy: 0.8491
Epoch 6/16
255/255 [==============================] - 2s 7ms/step - loss: 0.3054 - accuracy: 0.8589 - val_loss: 0.3188 - val_accuracy: 0.8513
Epoch 7/16
255/255 [==============================] - 2s 6ms/step - loss: 0.3082 - accuracy: 0.8548 - val_loss: 0.3226 - val_accuracy: 0.8496
Epoch 8/16
255/255 [==============================] - 2s 6ms/step - loss: 0.3031 - accuracy: 0.8555 - val_loss: 0.3205 - val_accuracy: 0.8517
Epoch 9/16
255/255 [==============================] - 2s 6ms/step - loss: 0.3026 - accuracy: 0.8596 - val_loss: 0.3267 - val_accuracy: 0.8510
Epoch 10/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2973 - accuracy: 0.8598 - val_loss: 0.3266 - val_accuracy: 0.8505
Epoch 11/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2968 - accuracy: 0.8617 - val_loss: 0.3299 - val_accuracy: 0.8500
Epoch 12/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2906 - accuracy: 0.8637 - val_loss: 0.3311 - val_accuracy: 0.8499
Epoch 13/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2801 - accuracy: 0.8695 - val_loss: 0.3376 - val_accuracy: 0.8493
Epoch 14/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2788 - accuracy: 0.8686 - val_loss: 0.3366 - val_accuracy: 0.8469
Epoch 15/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2753 - accuracy: 0.8692 - val_loss: 0.3437 - val_accuracy: 0.8473
Epoch 16/16
255/255 [==============================] - 2s 6ms/step - loss: 0.2758 - accuracy: 0.8707 - val_loss: 0.3420 - val_accuracy: 0.8456
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Assets written to: /tmp/keras/exported_model_dir/1/assets
INFO:tensorflow:Assets written to: /tmp/keras/exported_model_dir/1/assets
16281/16281 [==============================] - 60s 4ms/step - loss: 0.3422 - accuracy: 0.8454
{'accuracy': 0.8454025983810425, 'loss': 0.3422209322452545}

(선택 사항) 사전 처리 된 데이터를 사용하여 tf.estimator를 사용하여 모델 학습

Keras 모델 대신 Estimator 모델을 사용하려는 경우이 섹션의 코드가이를 수행하는 방법을 보여줍니다.

훈련을위한 입력 함수 만들기

def _make_training_input_fn(tf_transform_output, transformed_examples,
                            batch_size):
  """Creates an input function reading from transformed data.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    transformed_examples: Base filename of examples.
    batch_size: Batch size.

  Returns:
    The input function for training or eval.
  """
  def input_fn():
    """Input function for training and eval."""
    dataset = tf.data.experimental.make_batched_features_dataset(
        file_pattern=transformed_examples,
        batch_size=batch_size,
        features=tf_transform_output.transformed_feature_spec(),
        reader=tf.data.TFRecordDataset,
        shuffle=True)

    transformed_features = tf.compat.v1.data.make_one_shot_iterator(
        dataset).get_next()

    # Extract features and label from the transformed tensors.
    transformed_labels = tf.where(
        tf.equal(transformed_features.pop(LABEL_KEY), 1))

    return transformed_features, transformed_labels[:,1]

  return input_fn

제공하기위한 입력 함수 만들기

프로덕션에서 사용할 수있는 입력 함수를 만들고 제공 할 학습 된 모델을 준비해 보겠습니다.

def _make_serving_input_fn(tf_transform_output):
  """Creates an input function reading from raw data.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.

  Returns:
    The serving input function.
  """
  raw_feature_spec = RAW_DATA_FEATURE_SPEC.copy()
  # Remove label since it is not available during serving.
  raw_feature_spec.pop(LABEL_KEY)

  def serving_input_fn():
    """Input function for serving."""
    # Get raw features by generating the basic serving input_fn and calling it.
    # Here we generate an input_fn that expects a parsed Example proto to be fed
    # to the model at serving time.  See also
    # tf.estimator.export.build_raw_serving_input_receiver_fn.
    raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
        raw_feature_spec, default_batch_size=None)
    serving_input_receiver = raw_input_fn()

    # Apply the transform function that was used to generate the materialized
    # data.
    raw_features = serving_input_receiver.features
    transformed_features = tf_transform_output.transform_raw_features(
        raw_features)

    return tf.estimator.export.ServingInputReceiver(
        transformed_features, serving_input_receiver.receiver_tensors)

  return serving_input_fn

입력 데이터를 FeatureColumns로 래핑

모델은 TensorFlow FeatureColumns에서 데이터를 예상합니다.

def get_feature_columns(tf_transform_output):
  """Returns the FeatureColumns for the model.

  Args:
    tf_transform_output: A `TFTransformOutput` object.

  Returns:
    A list of FeatureColumns.
  """
  # Wrap scalars as real valued columns.
  real_valued_columns = [tf.feature_column.numeric_column(key, shape=())
                         for key in NUMERIC_FEATURE_KEYS]

  # Wrap categorical columns.
  one_hot_columns = [
      tf.feature_column.indicator_column(
          tf.feature_column.categorical_column_with_identity(
              key=key,
              num_buckets=(NUM_OOV_BUCKETS +
                  tf_transform_output.vocabulary_size_by_name(
                      vocab_filename=key))))
      for key in CATEGORICAL_FEATURE_KEYS]

  return real_valued_columns + one_hot_columns

모델 훈련, 평가 및 내보내기

def train_and_evaluate(working_dir, num_train_instances=NUM_TRAIN_INSTANCES,
                       num_test_instances=NUM_TEST_INSTANCES):
  """Train the model on training data and evaluate on test data.

  Args:
    working_dir: Directory to read transformed data and metadata from and to
        write exported model to.
    num_train_instances: Number of instances in train set
    num_test_instances: Number of instances in test set

  Returns:
    The results from the estimator's 'evaluate' method
  """
  tf_transform_output = tft.TFTransformOutput(working_dir)

  run_config = tf.estimator.RunConfig()

  estimator = tf.estimator.LinearClassifier(
      feature_columns=get_feature_columns(tf_transform_output),
      config=run_config,
      loss_reduction=tf.losses.Reduction.SUM)

  # Fit the model using the default optimizer.
  train_input_fn = _make_training_input_fn(
      tf_transform_output,
      os.path.join(working_dir, TRANSFORMED_TRAIN_DATA_FILEBASE + '*'),
      batch_size=TRAIN_BATCH_SIZE)
  estimator.train(
      input_fn=train_input_fn,
      max_steps=TRAIN_NUM_EPOCHS * num_train_instances / TRAIN_BATCH_SIZE)

  # Evaluate model on test dataset.
  eval_input_fn = _make_training_input_fn(
      tf_transform_output,
      os.path.join(working_dir, TRANSFORMED_TEST_DATA_FILEBASE + '*'),
      batch_size=1)

  # Export the model.
  serving_input_fn = _make_serving_input_fn(tf_transform_output)
  exported_model_dir = os.path.join(working_dir, EXPORTED_MODEL_DIR)
  estimator.export_saved_model(exported_model_dir, serving_input_fn)

  return estimator.evaluate(input_fn=eval_input_fn, steps=num_test_instances)

모두 모아

인구 조사 데이터를 전처리하고, 모델을 훈련시키고, 제공 할 준비를하는 데 필요한 모든 것을 만들었습니다. 지금까지 우리는 준비를 마쳤습니다. 달리기를 시작할 시간입니다!

import tempfile
temp = os.path.join(tempfile.gettempdir(), 'estimator')

transform_data(train, test, temp)
results = train_and_evaluate(temp)
pprint.pprint(results)
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:No assets to write.
WARNING:tensorflow:Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
WARNING:tensorflow:Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
INFO:tensorflow:SavedModel written to: /tmp/tmpyrq734d7/tftransform_tmp/69fe4f379b934c3ba4719ea2e9cb7bd5/saved_model.pb
INFO:tensorflow:SavedModel written to: /tmp/tmpyrq734d7/tftransform_tmp/69fe4f379b934c3ba4719ea2e9cb7bd5/saved_model.pb
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:No assets to write.
WARNING:tensorflow:Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
WARNING:tensorflow:Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
INFO:tensorflow:SavedModel written to: /tmp/tmpyrq734d7/tftransform_tmp/59cea98873e44c6d90dbf7e57170f2ec/saved_model.pb
INFO:tensorflow:SavedModel written to: /tmp/tmpyrq734d7/tftransform_tmp/59cea98873e44c6d90dbf7e57170f2ec/saved_model.pb
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.4.1) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tmpyrq734d7/tftransform_tmp/fe31ba0a14ba4bc1a6c72011652fb877/assets
INFO:tensorflow:Assets written to: /tmp/tmpyrq734d7/tftransform_tmp/fe31ba0a14ba4bc1a6c72011652fb877/assets
INFO:tensorflow:SavedModel written to: /tmp/tmpyrq734d7/tftransform_tmp/fe31ba0a14ba4bc1a6c72011652fb877/saved_model.pb
INFO:tensorflow:SavedModel written to: /tmp/tmpyrq734d7/tftransform_tmp/fe31ba0a14ba4bc1a6c72011652fb877/saved_model.pb
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpc28os49f
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpc28os49f
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpc28os49f', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpc28os49f', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer_v1.py:1727: UserWarning: `layer.add_variable` is deprecated and will be removed in a future version. Please use `layer.add_weight` method instead.
  warnings.warn('`layer.add_variable` is deprecated and '
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/ftrl.py:134: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/ftrl.py:134: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpc28os49f/model.ckpt.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpc28os49f/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 88.72284, step = 0
INFO:tensorflow:loss = 88.72284, step = 0
INFO:tensorflow:global_step/sec: 160.387
INFO:tensorflow:global_step/sec: 160.387
INFO:tensorflow:loss = 32.879272, step = 100 (0.625 sec)
INFO:tensorflow:loss = 32.879272, step = 100 (0.625 sec)
INFO:tensorflow:global_step/sec: 203.588
INFO:tensorflow:global_step/sec: 203.588
INFO:tensorflow:loss = 39.33315, step = 200 (0.491 sec)
INFO:tensorflow:loss = 39.33315, step = 200 (0.491 sec)
INFO:tensorflow:global_step/sec: 208.996
INFO:tensorflow:global_step/sec: 208.996
INFO:tensorflow:loss = 41.41929, step = 300 (0.479 sec)
INFO:tensorflow:loss = 41.41929, step = 300 (0.479 sec)
INFO:tensorflow:global_step/sec: 204.007
INFO:tensorflow:global_step/sec: 204.007
INFO:tensorflow:loss = 49.609642, step = 400 (0.490 sec)
INFO:tensorflow:loss = 49.609642, step = 400 (0.490 sec)
INFO:tensorflow:global_step/sec: 207.842
INFO:tensorflow:global_step/sec: 207.842
INFO:tensorflow:loss = 39.066364, step = 500 (0.481 sec)
INFO:tensorflow:loss = 39.066364, step = 500 (0.481 sec)
INFO:tensorflow:global_step/sec: 232.743
INFO:tensorflow:global_step/sec: 232.743
INFO:tensorflow:loss = 44.590332, step = 600 (0.429 sec)
INFO:tensorflow:loss = 44.590332, step = 600 (0.429 sec)
INFO:tensorflow:global_step/sec: 236.522
INFO:tensorflow:global_step/sec: 236.522
INFO:tensorflow:loss = 43.621086, step = 700 (0.424 sec)
INFO:tensorflow:loss = 43.621086, step = 700 (0.424 sec)
INFO:tensorflow:global_step/sec: 232.083
INFO:tensorflow:global_step/sec: 232.083
INFO:tensorflow:loss = 36.95012, step = 800 (0.430 sec)
INFO:tensorflow:loss = 36.95012, step = 800 (0.430 sec)
INFO:tensorflow:global_step/sec: 235.231
INFO:tensorflow:global_step/sec: 235.231
INFO:tensorflow:loss = 47.9609, step = 900 (0.425 sec)
INFO:tensorflow:loss = 47.9609, step = 900 (0.425 sec)
INFO:tensorflow:global_step/sec: 234.733
INFO:tensorflow:global_step/sec: 234.733
INFO:tensorflow:loss = 38.81161, step = 1000 (0.426 sec)
INFO:tensorflow:loss = 38.81161, step = 1000 (0.426 sec)
INFO:tensorflow:global_step/sec: 234.283
INFO:tensorflow:global_step/sec: 234.283
INFO:tensorflow:loss = 43.415607, step = 1100 (0.427 sec)
INFO:tensorflow:loss = 43.415607, step = 1100 (0.427 sec)
INFO:tensorflow:global_step/sec: 231.427
INFO:tensorflow:global_step/sec: 231.427
INFO:tensorflow:loss = 44.434288, step = 1200 (0.432 sec)
INFO:tensorflow:loss = 44.434288, step = 1200 (0.432 sec)
INFO:tensorflow:global_step/sec: 234.016
INFO:tensorflow:global_step/sec: 234.016
INFO:tensorflow:loss = 41.374805, step = 1300 (0.428 sec)
INFO:tensorflow:loss = 41.374805, step = 1300 (0.428 sec)
INFO:tensorflow:global_step/sec: 235.818
INFO:tensorflow:global_step/sec: 235.818
INFO:tensorflow:loss = 34.242737, step = 1400 (0.424 sec)
INFO:tensorflow:loss = 34.242737, step = 1400 (0.424 sec)
INFO:tensorflow:global_step/sec: 234.335
INFO:tensorflow:global_step/sec: 234.335
INFO:tensorflow:loss = 47.45173, step = 1500 (0.427 sec)
INFO:tensorflow:loss = 47.45173, step = 1500 (0.427 sec)
INFO:tensorflow:global_step/sec: 233.157
INFO:tensorflow:global_step/sec: 233.157
INFO:tensorflow:loss = 47.429375, step = 1600 (0.429 sec)
INFO:tensorflow:loss = 47.429375, step = 1600 (0.429 sec)
INFO:tensorflow:global_step/sec: 232.13
INFO:tensorflow:global_step/sec: 232.13
INFO:tensorflow:loss = 42.183315, step = 1700 (0.431 sec)
INFO:tensorflow:loss = 42.183315, step = 1700 (0.431 sec)
INFO:tensorflow:global_step/sec: 232.812
INFO:tensorflow:global_step/sec: 232.812
INFO:tensorflow:loss = 45.8485, step = 1800 (0.429 sec)
INFO:tensorflow:loss = 45.8485, step = 1800 (0.429 sec)
INFO:tensorflow:global_step/sec: 234.715
INFO:tensorflow:global_step/sec: 234.715
INFO:tensorflow:loss = 47.009045, step = 1900 (0.426 sec)
INFO:tensorflow:loss = 47.009045, step = 1900 (0.426 sec)
INFO:tensorflow:global_step/sec: 235.8
INFO:tensorflow:global_step/sec: 235.8
INFO:tensorflow:loss = 38.599316, step = 2000 (0.424 sec)
INFO:tensorflow:loss = 38.599316, step = 2000 (0.424 sec)
INFO:tensorflow:global_step/sec: 232.126
INFO:tensorflow:global_step/sec: 232.126
INFO:tensorflow:loss = 52.092033, step = 2100 (0.431 sec)
INFO:tensorflow:loss = 52.092033, step = 2100 (0.431 sec)
INFO:tensorflow:global_step/sec: 234.289
INFO:tensorflow:global_step/sec: 234.289
INFO:tensorflow:loss = 43.23425, step = 2200 (0.427 sec)
INFO:tensorflow:loss = 43.23425, step = 2200 (0.427 sec)
INFO:tensorflow:global_step/sec: 233.428
INFO:tensorflow:global_step/sec: 233.428
INFO:tensorflow:loss = 36.335266, step = 2300 (0.429 sec)
INFO:tensorflow:loss = 36.335266, step = 2300 (0.429 sec)
INFO:tensorflow:global_step/sec: 235.532
INFO:tensorflow:global_step/sec: 235.532
INFO:tensorflow:loss = 44.19064, step = 2400 (0.424 sec)
INFO:tensorflow:loss = 44.19064, step = 2400 (0.424 sec)
INFO:tensorflow:global_step/sec: 232.256
INFO:tensorflow:global_step/sec: 232.256
INFO:tensorflow:loss = 46.30754, step = 2500 (0.433 sec)
INFO:tensorflow:loss = 46.30754, step = 2500 (0.433 sec)
INFO:tensorflow:global_step/sec: 234.104
INFO:tensorflow:global_step/sec: 234.104
INFO:tensorflow:loss = 37.5838, step = 2600 (0.426 sec)
INFO:tensorflow:loss = 37.5838, step = 2600 (0.426 sec)
INFO:tensorflow:global_step/sec: 232.023
INFO:tensorflow:global_step/sec: 232.023
INFO:tensorflow:loss = 46.541443, step = 2700 (0.430 sec)
INFO:tensorflow:loss = 46.541443, step = 2700 (0.430 sec)
INFO:tensorflow:global_step/sec: 232.63
INFO:tensorflow:global_step/sec: 232.63
INFO:tensorflow:loss = 45.745735, step = 2800 (0.430 sec)
INFO:tensorflow:loss = 45.745735, step = 2800 (0.430 sec)
INFO:tensorflow:global_step/sec: 233.342
INFO:tensorflow:global_step/sec: 233.342
INFO:tensorflow:loss = 47.941216, step = 2900 (0.429 sec)
INFO:tensorflow:loss = 47.941216, step = 2900 (0.429 sec)
INFO:tensorflow:global_step/sec: 233.304
INFO:tensorflow:global_step/sec: 233.304
INFO:tensorflow:loss = 36.410313, step = 3000 (0.429 sec)
INFO:tensorflow:loss = 36.410313, step = 3000 (0.429 sec)
INFO:tensorflow:global_step/sec: 234.682
INFO:tensorflow:global_step/sec: 234.682
INFO:tensorflow:loss = 55.285294, step = 3100 (0.426 sec)
INFO:tensorflow:loss = 55.285294, step = 3100 (0.426 sec)
INFO:tensorflow:global_step/sec: 236.159
INFO:tensorflow:global_step/sec: 236.159
INFO:tensorflow:loss = 43.026596, step = 3200 (0.423 sec)
INFO:tensorflow:loss = 43.026596, step = 3200 (0.423 sec)
INFO:tensorflow:global_step/sec: 232.722
INFO:tensorflow:global_step/sec: 232.722
INFO:tensorflow:loss = 39.6655, step = 3300 (0.430 sec)
INFO:tensorflow:loss = 39.6655, step = 3300 (0.430 sec)
INFO:tensorflow:global_step/sec: 236
INFO:tensorflow:global_step/sec: 236
INFO:tensorflow:loss = 42.351692, step = 3400 (0.424 sec)
INFO:tensorflow:loss = 42.351692, step = 3400 (0.424 sec)
INFO:tensorflow:global_step/sec: 230.57
INFO:tensorflow:global_step/sec: 230.57
INFO:tensorflow:loss = 47.030056, step = 3500 (0.434 sec)
INFO:tensorflow:loss = 47.030056, step = 3500 (0.434 sec)
INFO:tensorflow:global_step/sec: 233.265
INFO:tensorflow:global_step/sec: 233.265
INFO:tensorflow:loss = 37.418053, step = 3600 (0.428 sec)
INFO:tensorflow:loss = 37.418053, step = 3600 (0.428 sec)
INFO:tensorflow:global_step/sec: 232.662
INFO:tensorflow:global_step/sec: 232.662
INFO:tensorflow:loss = 47.06579, step = 3700 (0.430 sec)
INFO:tensorflow:loss = 47.06579, step = 3700 (0.430 sec)
INFO:tensorflow:global_step/sec: 231.063
INFO:tensorflow:global_step/sec: 231.063
INFO:tensorflow:loss = 54.143867, step = 3800 (0.433 sec)
INFO:tensorflow:loss = 54.143867, step = 3800 (0.433 sec)
INFO:tensorflow:global_step/sec: 234.231
INFO:tensorflow:global_step/sec: 234.231
INFO:tensorflow:loss = 47.50246, step = 3900 (0.427 sec)
INFO:tensorflow:loss = 47.50246, step = 3900 (0.427 sec)
INFO:tensorflow:global_step/sec: 232.346
INFO:tensorflow:global_step/sec: 232.346
INFO:tensorflow:loss = 41.983932, step = 4000 (0.430 sec)
INFO:tensorflow:loss = 41.983932, step = 4000 (0.430 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4071...
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4071...
INFO:tensorflow:Saving checkpoints for 4071 into /tmp/tmpc28os49f/model.ckpt.
INFO:tensorflow:Saving checkpoints for 4071 into /tmp/tmpc28os49f/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4071...
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4071...
INFO:tensorflow:Loss for final step: 32.503983.
INFO:tensorflow:Loss for final step: 32.503983.
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_3:0\022\tworkclass"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_5:0\022\teducation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_7:0\022\016marital-status"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_9:0\022\noccupation"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_11:0\022\014relationship"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_13:0\022\004race"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_15:0\022\003sex"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
WARNING:tensorflow:Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\014\n\nConst_17:0\022\016native-country"
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']
INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Restoring parameters from /tmp/tmpc28os49f/model.ckpt-4071
INFO:tensorflow:Restoring parameters from /tmp/tmpc28os49f/model.ckpt-4071
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/estimator/exported_model_dir/temp-1613049613/assets
INFO:tensorflow:Assets written to: /tmp/estimator/exported_model_dir/temp-1613049613/assets
INFO:tensorflow:SavedModel written to: /tmp/estimator/exported_model_dir/temp-1613049613/saved_model.pb
INFO:tensorflow:SavedModel written to: /tmp/estimator/exported_model_dir/temp-1613049613/saved_model.pb
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-02-11T13:20:15Z
INFO:tensorflow:Starting evaluation at 2021-02-11T13:20:15Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpc28os49f/model.ckpt-4071
INFO:tensorflow:Restoring parameters from /tmp/tmpc28os49f/model.ckpt-4071
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1628/16281]
INFO:tensorflow:Evaluation [1628/16281]
INFO:tensorflow:Evaluation [3256/16281]
INFO:tensorflow:Evaluation [3256/16281]
INFO:tensorflow:Evaluation [4884/16281]
INFO:tensorflow:Evaluation [4884/16281]
INFO:tensorflow:Evaluation [6512/16281]
INFO:tensorflow:Evaluation [6512/16281]
INFO:tensorflow:Evaluation [8140/16281]
INFO:tensorflow:Evaluation [8140/16281]
INFO:tensorflow:Evaluation [9768/16281]
INFO:tensorflow:Evaluation [9768/16281]
INFO:tensorflow:Evaluation [11396/16281]
INFO:tensorflow:Evaluation [11396/16281]
INFO:tensorflow:Evaluation [13024/16281]
INFO:tensorflow:Evaluation [13024/16281]
INFO:tensorflow:Evaluation [14652/16281]
INFO:tensorflow:Evaluation [14652/16281]
INFO:tensorflow:Evaluation [16280/16281]
INFO:tensorflow:Evaluation [16280/16281]
INFO:tensorflow:Evaluation [16281/16281]
INFO:tensorflow:Evaluation [16281/16281]
INFO:tensorflow:Inference Time : 69.77762s
INFO:tensorflow:Inference Time : 69.77762s
INFO:tensorflow:Finished evaluation at 2021-02-11-13:21:25
INFO:tensorflow:Finished evaluation at 2021-02-11-13:21:25
INFO:tensorflow:Saving dict for global step 4071: accuracy = 0.85099196, accuracy_baseline = 0.76377374, auc = 0.9018778, auc_precision_recall = 0.96719545, average_loss = 0.32442597, global_step = 4071, label/mean = 0.76377374, loss = 0.32442597, precision = 0.88510966, prediction/mean = 0.7523365, recall = 0.92496985
INFO:tensorflow:Saving dict for global step 4071: accuracy = 0.85099196, accuracy_baseline = 0.76377374, auc = 0.9018778, auc_precision_recall = 0.96719545, average_loss = 0.32442597, global_step = 4071, label/mean = 0.76377374, loss = 0.32442597, precision = 0.88510966, prediction/mean = 0.7523365, recall = 0.92496985
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4071: /tmp/tmpc28os49f/model.ckpt-4071
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4071: /tmp/tmpc28os49f/model.ckpt-4071
{'accuracy': 0.85099196,
 'accuracy_baseline': 0.76377374,
 'auc': 0.9018778,
 'auc_precision_recall': 0.96719545,
 'average_loss': 0.32442597,
 'global_step': 4071,
 'label/mean': 0.76377374,
 'loss': 0.32442597,
 'precision': 0.88510966,
 'prediction/mean': 0.7523365,
 'recall': 0.92496985}

우리가 한 일

이 예에서는 tf.Transform 을 사용하여 인구 조사 데이터의 데이터 세트를 전처리하고 정리 및 변환 된 데이터로 모델을 훈련 시켰습니다. 또한 학습 된 모델을 프로덕션 환경에 배포하여 추론을 수행 할 때 사용할 수있는 입력 함수를 만들었습니다. 훈련과 추론 모두에 동일한 코드를 사용함으로써 데이터 왜곡 문제를 피할 수 있습니다. 그 과정에서 데이터를 정리하는 데 필요한 변환을 수행하기 위해 Apache Beam 변환을 만드는 방법을 배웠습니다. 또한이 변환 된 데이터를 사용하여 tf.keras 또는 tf.estimator 사용하여 모델을 학습시키는 방법도 보았습니다. 이것은 TensorFlow Transform이 할 수있는 일의 작은 부분 일뿐입니다! tf.Transform 대해 tf.Transform 알아보고 어떤 tf.Transform 있는지 알아 보시기 바랍니다.