Scale a numerical column into a predefined range on a per-key basis.
tft.scale_by_min_max_per_key(
    x: common_types.ConsistentTensorType,
    key: common_types.TensorType,
    output_min: float = 0.0,
    output_max: float = 1.0,
    elementwise: bool = False,
    key_vocabulary_filename: Optional[str] = None,
    name: Optional[str] = None
) -> common_types.ConsistentTensorType
Args |
x
|
A numeric Tensor, SparseTensor, or RaggedTensor.
|
key
|
A Tensor, SparseTensor, or RaggedTensor of dtype tf.string.
Must meet one of the following conditions:
- key is None,
- Both x and key are dense,
- Both x and key are composite and key exactly matches x in everything except values,
- The axis=1 index of each x matches its index of dense key.
|
output_min
|
The minimum of the range of output values.
|
output_max
|
The maximum of the range of output values.
|
elementwise
|
If true, scale each element of the tensor independently. Not supported when key is provided (see Raises).
|
key_vocabulary_filename
|
(Optional) The file name for the per-key file.
If None, this combiner will assume the keys fit in memory and will not
store the analyzer result in a file. If '', a file name will be chosen
based on the current TensorFlow scope. If not '', it should be unique
within a given preprocessing function.
|
name
|
(Optional) A name for this operation.
|
Example:
import tempfile
import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam

def preprocessing_fn(inputs):
    return {
        'scaled': tft.scale_by_min_max_per_key(inputs['x'], inputs['s'])
    }

raw_data = [dict(x=1, s='a'), dict(x=0, s='b'), dict(x=3, s='a')]
feature_spec = dict(
    x=tf.io.FixedLenFeature([], tf.float32),
    s=tf.io.FixedLenFeature([], tf.string))
raw_data_metadata = tft.DatasetMetadata.from_feature_spec(feature_spec)
with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
    transformed_dataset, transform_fn = (
        (raw_data, raw_data_metadata)
        | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
transformed_data, transformed_metadata = transformed_dataset
# Key 'a' spans [1, 3], so 1 -> 0.0 and 3 -> 1.0. Key 'b' has a single
# distinct value, so it is scaled with a sigmoid: sigmoid(0) = 0.5.
transformed_data
[{'scaled': 0.0}, {'scaled': 0.5}, {'scaled': 1.0}]
Returns |
A Tensor, SparseTensor, or RaggedTensor containing the input column scaled to
[output_min, output_max] on a per-key basis if a key is provided. If the
analysis dataset is empty, if a key contains only a single distinct value, or
if the computed key vocabulary has no entry for a key, then x is scaled using
a sigmoid function instead.
|
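To make the per-key behaviour and the sigmoid fallback concrete, here is a minimal pure-Python sketch of the arithmetic described above. It is an illustration only, not the tft implementation: the function name `scale_by_min_max_per_key_reference` is hypothetical, it handles only dense Python lists, and it assumes the sigmoid fallback value is also mapped into [output_min, output_max], which this page does not state explicitly.

```python
import math


def scale_by_min_max_per_key_reference(xs, keys, output_min=0.0, output_max=1.0):
    """Illustrative per-key min-max scaling over plain Python lists."""
    if output_min >= output_max:
        raise ValueError("output_min must be less than output_max")

    # Analysis pass: record the minimum and maximum of x for every key.
    bounds = {}
    for x, k in zip(xs, keys):
        lo, hi = bounds.get(k, (x, x))
        bounds[k] = (min(lo, x), max(hi, x))

    # Transform pass: scale each value using the bounds of its own key.
    scaled = []
    for x, k in zip(xs, keys):
        lo, hi = bounds[k]
        if lo < hi:
            unit = (x - lo) / (hi - lo)
        else:
            # The key has a single distinct value: fall back to a sigmoid.
            unit = 1.0 / (1.0 + math.exp(-x))
        # Assumption: the fallback is mapped into the output range as well.
        scaled.append(unit * (output_max - output_min) + output_min)
    return scaled


print(scale_by_min_max_per_key_reference([1.0, 0.0, 3.0], ["a", "b", "a"]))
# [0.0, 0.5, 1.0]
```

With the default range this reproduces the values from the example: key 'a' is scaled between its own minimum and maximum, while key 'b' takes the sigmoid path.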
Raises |
ValueError
|
If output_min and output_max are in the wrong order, that is, output_min is not less than output_max.
|
NotImplementedError
|
If elementwise is True and key is not None.
|
InvalidArgumentError
|
If indices of sparse x and key do not match.
|
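Because the first two errors depend only on the arguments, they can be caught before a pipeline is launched. The sketch below mirrors the documented conditions in plain Python. The helper `check_scale_args` is hypothetical and not part of tft, and it assumes that equal bounds are rejected along with reversed ones. The InvalidArgumentError case depends on the actual tensor indices and is not covered here.

```python
def check_scale_args(output_min, output_max, elementwise=False, key=None):
    """Hypothetical pre-flight check mirroring the documented Raises section."""
    # Assumption: output_min == output_max is rejected along with reversed bounds.
    if output_min >= output_max:
        raise ValueError(
            f"output_min ({output_min}) must be less than output_max ({output_max})")
    # Per-key scaling combined with elementwise scaling is documented as unsupported.
    if elementwise and key is not None:
        raise NotImplementedError(
            "elementwise=True is not supported together with a key")


check_scale_args(0.0, 1.0, elementwise=False, key="s")  # passes silently
```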