Module: tfm.optimization.legacy_adamw

Adam optimizer with weight decay that exactly matches the original BERT implementation.

Classes

class AdamWeightDecay: Adam optimizer that enables L2 weight decay and clip_by_global_norm on gradients.
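
To make the two features named in the class summary concrete, here is a minimal pure-Python sketch of a single update step. It is not taken from this module's source. The function names (`clip_by_global_norm`, `adam_weight_decay_step`), the flat list-of-floats parameter layout, the default hyperparameters, and the omission of Adam bias correction are all assumptions made for illustration. The sketch applies weight decay directly to the parameter (decoupled from the moment estimates) rather than adding a penalty to the loss.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale gradients so their joint L2 norm is at most clip_norm.

    Hypothetical helper: `grads` is a flat list of floats.
    Returns (clipped_grads, global_norm_before_clipping).
    """
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm <= clip_norm:
        return list(grads), global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in grads], global_norm


def adam_weight_decay_step(params, grads, m, v, lr=0.01, beta1=0.9,
                           beta2=0.999, eps=1e-6, weight_decay_rate=0.01,
                           clip_norm=1.0):
    """One illustrative Adam step with decoupled weight decay.

    Assumption: no bias correction is applied to the moment estimates.
    Returns (new_params, new_m, new_v) as new lists.
    """
    clipped, _ = clip_by_global_norm(grads, clip_norm)
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, clipped, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        update = m_i / (math.sqrt(v_i) + eps)
        # Decay is added to the update, so it never enters m or v.
        update += weight_decay_rate * p
        new_params.append(p - lr * update)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


if __name__ == "__main__":
    params, m, v = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
    params, m, v = adam_weight_decay_step(params, [3.0, 4.0], m, v)
    print(params)
```

With a zero gradient, the parameter in this sketch still shrinks by `lr * weight_decay_rate * p`, which is the practical difference from folding an L2 penalty into the loss, where the decay term would be rescaled by the second-moment estimate.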