Dropout consists in randomly setting a fraction rate of input units to 0
at each update during training time, which helps prevent overfitting.
The units that are kept are scaled by 1 / (1 - rate), so that their
sum is unchanged at training time and inference time.
The dropout rate, between 0 and 1. E.g. "rate=0.1" would drop out
10% of input units.
1D tensor of type int32 representing the shape of the
binary dropout mask that will be multiplied with the input.
For instance, if your inputs have shape
(batch_size, timesteps, features), and you want the dropout mask
to be the same for all timesteps, you can use
noise_shape=[batch_size, 1, features].