tft.scale_by_min_max_per_key

Scale a numerical column into a predefined range on a per-key basis.

Args:

x: A numeric Tensor, SparseTensor, or RaggedTensor.
key: A Tensor, SparseTensor, or RaggedTensor of dtype tf.string. Must meet one of the following conditions:

  1. key is None.
  2. Both x and key are dense.
  3. Both x and key are composite, and key exactly matches x in everything except values.
  4. The axis=1 index of each element of x matches its index in the dense key.
output_min: The minimum of the range of output values.
output_max: The maximum of the range of output values.
elementwise: If true, scale each element of the tensor independently.
key_vocabulary_filename: (Optional) The file name for the per-key file. If None, this combiner assumes the keys fit in memory and does not store the analyzer result in a file. If '', a file name is chosen based on the current TensorFlow scope. If not '', it should be unique within a given preprocessing function.
name: (Optional) A name for this operation.

Example:

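import tempfile

import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam

def preprocessing_fn(inputs):
  # Scale 'x' separately within each group defined by the string key 's'.
  return {
      'scaled': tft.scale_by_min_max_per_key(inputs['x'], inputs['s'])
  }

# Key 'a' has the values 1 and 3; key 'b' has the single value 0,
# so it is scaled with the sigmoid fallback: sigmoid(0) = 0.5.
raw_data = [dict(x=1, s='a'), dict(x=0, s='b'), dict(x=3, s='a')]
feature_spec = dict(
    x=tf.io.FixedLenFeature([], tf.float32),
    s=tf.io.FixedLenFeature([], tf.string))
raw_data_metadata = tft.DatasetMetadata.from_feature_spec(feature_spec)
with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
  # Analyze the dataset (per-key min and max), then apply the transform.
  transformed_dataset, transform_fn = (
      (raw_data, raw_data_metadata)
      | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
transformed_data, transformed_metadata = transformed_dataset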
transformed_data
[{'scaled': 0.0}, {'scaled': 0.5}, {'scaled': 1.0}]

Returns:

A Tensor, SparseTensor, or RaggedTensor containing the input column scaled to [output_min, output_max] on a per-key basis if a key is provided. If the analysis dataset is empty, a key has only a single distinct value, or the computed key vocabulary has no entry for a key, then x is scaled using a sigmoid function instead.
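
The per-key scaling and its sigmoid fallback can be illustrated with a small pure-Python sketch. This is not the library's implementation: the function name scale_by_min_max_per_key_sketch is hypothetical, it handles only dense Python lists, and it assumes that the sigmoid fallback is mapped into [output_min, output_max] in the same way as the min-max result. Because the bounds are computed from the same data that is transformed, it covers the single-distinct-value case but not the case of a key missing from the vocabulary.

```python
import math


def scale_by_min_max_per_key_sketch(xs, keys, output_min=0.0, output_max=1.0):
    """Pure-Python illustration of per-key min-max scaling (not the tft code)."""
    if output_min >= output_max:
        raise ValueError('output_min must be less than output_max')

    # "Analysis" pass: record the smallest and largest x seen for each key.
    bounds = {}
    for x, k in zip(xs, keys):
        lo, hi = bounds.get(k, (x, x))
        bounds[k] = (min(lo, x), max(hi, x))

    # "Transform" pass: rescale each value using the bounds of its own key.
    scaled = []
    for x, k in zip(xs, keys):
        lo, hi = bounds[k]
        if lo < hi:
            unit = (x - lo) / (hi - lo)
        else:
            # Only one distinct value for this key: fall back to a sigmoid.
            unit = 1.0 / (1.0 + math.exp(-x))
        # Assumption: the fallback is mapped into the output range as well.
        scaled.append(unit * (output_max - output_min) + output_min)
    return scaled


result = scale_by_min_max_per_key_sketch([1.0, 0.0, 3.0], ['a', 'b', 'a'])
print(result)  # [0.0, 0.5, 1.0]
```

With the same inputs as the example above, key 'a' spans [1, 3], so 1 maps to 0.0 and 3 maps to 1.0, while key 'b' has the single value 0 and falls back to sigmoid(0) = 0.5.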

Raises:

ValueError: If output_min and output_max are in the wrong order (output_min must be less than output_max).
NotImplementedError: If elementwise is True and key is not None.
InvalidArgumentError: If the indices of sparse x and key do not match.