Class to build the VisionTransformer family of models.
tfm.vision.backbones.VisionTransformer(
mlp_dim=3072,
num_heads=12,
num_layers=12,
attention_dropout_rate=0.0,
dropout_rate=0.1,
init_stochastic_depth_rate=0.0,
input_specs=layers.InputSpec(shape=[None, None, None, 3]),
patch_size=16,
hidden_size=768,
representation_size=0,
pooler='token',
kernel_regularizer=None,
original_init: bool = True,
output_encoded_tokens: bool = True,
output_2d_feature_maps: bool = False,
pos_embed_shape: Optional[Tuple[int, int]] = None,
layer_scale_init_value: float = 0.0,
transformer_partition_dims: Optional[Tuple[int, int, int, int]] = None
)
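As a quick sanity check on how patch_size, hidden_size, and pooler relate to the token sequence the backbone produces, here is a small shape/parameter sketch in plain Python (no TensorFlow required). The helper functions are hypothetical illustrations, not part of tfm; the formulas assume standard non-overlapping ViT patchification.

```python
# Sketch of the token/shape arithmetic behind the defaults above
# (standard ViT patch embedding; hypothetical helpers, not tfm API).

def vit_token_count(image_size: int, patch_size: int, pooler: str = "token") -> int:
    """Number of tokens entering the transformer encoder."""
    assert image_size % patch_size == 0, "image must tile evenly into patches"
    num_patches = (image_size // patch_size) ** 2
    # The 'token' pooler prepends a learnable class token.
    return num_patches + (1 if pooler == "token" else 0)

def patch_embedding_params(patch_size: int, in_channels: int, hidden_size: int) -> int:
    """Parameters in the patch-embedding projection (conv kernel + bias)."""
    return patch_size * patch_size * in_channels * hidden_size + hidden_size

# ViT-Base defaults: 16x16 patches, 768-d tokens, e.g. a 224x224 RGB input.
print(vit_token_count(224, 16))            # 197 tokens (196 patches + class token)
print(patch_embedding_params(16, 3, 768))  # 590592 parameters
```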
Attributes

output_specs
    A dict of {level: TensorShape} pairs for the model output.
Methods
call
call(
inputs, training=None, mask=None
)
Calls the model on new inputs and returns the outputs as tensors. In this case, call() simply reapplies all ops in the graph to the new inputs (i.e., it builds a new computation graph from the provided inputs).
Args

inputs
    Input tensor, or dict/list/tuple of input tensors.
training
    Boolean or boolean scalar tensor indicating whether to run the Network in training mode or inference mode.
mask
    A mask or list of masks. A mask can be either a boolean tensor or None (no mask). For more details, see the Keras masking guide.
Returns

A tensor if there is a single output, or a list of tensors if there is more than one output.
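The training flag governs layers such as dropout (configured via dropout_rate and attention_dropout_rate above), which are active only in training mode and act as the identity at inference. A toy dropout in plain Python illustrates the semantics; this is an assumption-labeled sketch, not the tfm implementation:

```python
import random

def toy_dropout(x, rate, training):
    """Illustrates the training-flag semantics of call():
    drops (and rescales) values only when training=True;
    identity when training is False or rate is 0."""
    if not training or rate == 0.0:
        return list(x)
    scale = 1.0 / (1.0 - rate)  # inverted dropout keeps the expectation fixed
    return [0.0 if random.random() < rate else v * scale for v in x]

x = [1.0, 2.0, 3.0]
print(toy_dropout(x, 0.1, training=False))  # [1.0, 2.0, 3.0] — inference is a no-op
```

At inference time the model is deterministic for these layers, which is why the same inputs called with training=False always produce the same outputs.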