A dtype policy for a Keras layer.
```python
tf.keras.mixed_precision.experimental.Policy(
    name, loss_scale=USE_DEFAULT
)
```
A dtype policy determines dtype-related aspects of a layer, such as its computation and variable dtypes. Each layer has a policy. Policies can be passed to the 'dtype' argument of layer constructors, or a global policy can be set with tf.keras.mixed_precision.experimental.set_policy. A layer will default to the global policy if no policy is passed to its constructor.
For most models, each layer will have the same computation dtype and variable dtype, which will typically be float32. However, when mixed precision training is used, most layers will instead have a float16 computation dtype and a float32 variable dtype. See the mixed precision guide for more information on mixed precision training. When the variable dtype does not match the computation dtype, variables will be automatically cast to the computation dtype to avoid type errors.
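For instance, here is a minimal sketch of the automatic casting described above, assuming a TensorFlow 2.x runtime with the experimental mixed precision API:

```python
import tensorflow as tf

policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
layer = tf.keras.layers.Dense(4, dtype=policy)
y = layer(tf.ones((2, 4)))  # float32 inputs are cast to the float16 compute dtype
print(layer.kernel.dtype)   # float32: variables are kept in float32
print(y.dtype)              # float16: computations are done in float16
```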
Policies also have a tf.train.experimental.LossScale instance, which is used by tf.keras.Models to perform loss scaling. Loss scaling is only done by Models in Model.fit and Model.train_on_batch. Layers which are not Models ignore the loss scale.
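As a hedged sketch, the loss scale a policy carries can be inspected directly; 'mixed_float16' defaults to a dynamic loss scale, while plain dtype policies carry none:

```python
import tensorflow as tf

Policy = tf.keras.mixed_precision.experimental.Policy
print(Policy('mixed_float16').loss_scale)  # a DynamicLossScale instance
print(Policy('float32').loss_scale)        # None: no loss scaling by default
```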
Policies are constructed by passing a string to the constructor, e.g. tf.keras.mixed_precision.experimental.Policy('float32'). The string determines the compute and variable dtypes. It can be one of the following (a short sketch follows the list):
- Any dtype name, such as 'float32' or 'float64'. Both the variable and compute dtypes will be that dtype. No loss scaling is done by default.
- 'mixed_float16' or 'mixed_bfloat16': The compute dtype is float16 or bfloat16, while the variable dtype is float32. These policies are used for mixed precision training. With 'mixed_float16', a dynamic loss scale is used by default. 'mixed_bfloat16' does no loss scaling by default, as loss scaling is unnecessary with bfloat16.
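For example, a brief sketch of how these strings map to compute and variable dtypes:

```python
import tensorflow as tf

Policy = tf.keras.mixed_precision.experimental.Policy

p = Policy('float64')
print(p.compute_dtype, p.variable_dtype)  # float64 float64

p = Policy('mixed_bfloat16')
print(p.compute_dtype, p.variable_dtype)  # bfloat16 float32
```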
How to use mixed precision in layers with Policies
To use mixed precision in a model, the 'mixed_float16' policy can be used. tf.keras.mixed_precision.experimental.set_policy can be used to set the default policy for layers if no policy is passed to them. For example:
```python
tf.keras.mixed_precision.experimental.set_policy('mixed_float16')
model = tf.keras.models.Sequential([
    tf.keras.layers.Input((100,)),
    # Dense layers use the global policy of 'mixed_float16', which does
    # computations in float16 while keeping variables in float32.
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(10),
    # Softmax should be done in float32 for numeric stability. We pass
    # dtype='float32' to use float32 instead of the global policy.
    tf.keras.layers.Activation('softmax', dtype='float32')
])
model.fit(...)  # Train `model`
```
Alternatively, the policy can be passed to individual layers instead of setting the global policy with set_policy:
```python
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
model = tf.keras.models.Sequential([
    tf.keras.layers.Input((100,)),
    tf.keras.layers.Dense(10, dtype=policy),
    tf.keras.layers.Dense(10, dtype=policy),
    # Softmax should be done in float32 for numeric stability.
    tf.keras.layers.Activation('softmax', dtype='float32')
])
model.fit(...)  # Train `model`
```
As the above example shows, strings can be passed directly to layer constructors in the dtype argument instead of policies, but only if the string is convertible to a dtype.
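For instance, the following two layers end up with equivalent policies (a sketch, assuming nothing beyond the API shown above):

```python
import tensorflow as tf

# Passing a dtype string is shorthand for passing a Policy of that name.
layer_a = tf.keras.layers.Dense(10, dtype='float64')
layer_b = tf.keras.layers.Dense(
    10, dtype=tf.keras.mixed_precision.experimental.Policy('float64'))
```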
Note the 'mixed_float16' policy will apply loss scaling by default in Model.fit and Model.train_on_batch. If neither method is used (e.g., a custom training loop is used) and 'mixed_float16' is used, the loss scale must be manually applied. See tf.keras.mixed_precision.experimental.LossScaleOptimizer for details. For 'mixed_bfloat16', no loss scaling is done and loss scaling never needs to be manually applied.
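Below is a minimal sketch of manually applying the loss scale in a custom training loop; model, loss_fn, and the training data are hypothetical placeholders, not part of this API:

```python
import tensorflow as tf

optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')

@tf.function
def train_step(x, y):
  with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x))  # `model` and `loss_fn` are placeholders
    # Scale the loss up so small float16 gradients do not underflow.
    scaled_loss = optimizer.get_scaled_loss(loss)
  scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
  # Scale the gradients back down before applying them.
  grads = optimizer.get_unscaled_gradients(scaled_grads)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss
```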
The deprecated "infer" policy
In addition to the policies above, a policy can also be "infer". This is a deprecated policy that causes a layer to infer its dtype from its inputs. Once the layer is called for the first time, the layer's policy will change to the dtype of the first input.
Similarly to "infer", there is a deprecated "infer_with_float32_vars" policy
that infers the compute dtype, but not the variable dtype. Once a layer with
an "infer_with_float32_vars" policy is called for the first time, the layer's
policy will change to "
In TensorFlow 1, only the "infer" and "infer_with_float32_vars" policies are available.
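As an illustrative sketch (hedged, since the "infer" policy is deprecated), a layer constructed with dtype='infer' adopts the dtype of its first input:

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(4, dtype='infer')
layer(tf.ones((2, 4), dtype='float64'))
print(layer.dtype)  # float64: the policy changed to the first input's dtype
```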
Args:
name: A string. Can be one of the following values:
- Any dtype name, such as 'float32' or 'float64'. Both the variable and compute dtypes will be that dtype.
- 'mixed_float16' or 'mixed_bfloat16': The compute dtype is float16 or bfloat16, while the variable dtype is float32. With 'mixed_float16', a dynamic loss scale is used. These policies are used for mixed precision training.
- 'infer' (deprecated): Infer the compute and variable dtype from the input dtype.
loss_scale: A tf.train.experimental.LossScale, or a value convertible to one such as "dynamic". Defaults to using no loss scaling unless name is "mixed_float16", in which case this defaults to "dynamic". Only tf.keras.Models, not layers, use the loss scale, and it is only used during Model.fit and Model.train_on_batch.
Attributes:
compute_dtype: The compute dtype of this policy. This is the dtype layers will do their computations in. If this is None, the policy is "infer" or "infer_with_float32_vars" and variable_dtype is either None or float32 respectively.
Note that even if the compute dtype is float16 or bfloat16, hardware devices may not do individual adds, multiplies, and other fundamental operations in [b]float16, but instead may do some of them in float32 for numeric stability. The compute dtype is the dtype of the inputs and outputs of the TensorFlow ops that the layer executes. Internally, many TensorFlow ops will do certain internal calculations in float32, or some other device-internal intermediate format with higher precision than [b]float16, to increase numeric stability.
For example, a tf.keras.layers.Dense layer, when run on a GPU with a float16 compute dtype, will pass float16 inputs to tf.matmul. But tf.matmul will use float32 intermediate math. The performance benefit of float16 is still apparent, due to increased memory bandwidth and the fact that GPUs have specialized hardware for computing matmuls on float16 inputs while still keeping intermediate computations in float32.
loss_scale: Returns the loss scale of this Policy.
name: Returns the name of this policy.
should_cast_variables: Returns True if variables should be cast. This is True if the variable dtype is not the same as the compute dtype.
variable_dtype: The variable dtype of this policy. This is the dtype layers will create their variables in, unless a layer explicitly chooses a different dtype. If this is different from Policy.compute_dtype and both are non-None, layers will cast variables to the compute dtype to avoid type errors.
If this is None, the policy is "infer" and the compute_dtype is also None. If compute_dtype is None, this is either None or float32.
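A short sketch tying these attributes together for 'mixed_float16':

```python
import tensorflow as tf

p = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
print(p.name)                   # 'mixed_float16'
print(p.compute_dtype)          # 'float16'
print(p.variable_dtype)         # 'float32'
print(p.should_cast_variables)  # True, since the dtypes differ
```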