Bilinearly resizes the images to fit into the bounding boxes in the output.

images A tensor of shape (batch_size, input_h, input_w, ...) with an arbitrary number of trailing channel dimensions.
bbox A tensor of shape (batch_size, 4), giving the absolute coordinates (ymin, xmin, ymax, xmax) of each bounding box.
output_size The size of the output images, as (output_h, output_w).

Returns a tensor of shape (batch_size, output_h, output_w, ...). The result has the same dtype as the input if the input is float32, float16, or bfloat16; otherwise the result is float32.
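
The shape and dtype contract above can be illustrated with a small NumPy sketch. This is not the library's implementation: the function name `resize_to_bbox_sketch` is hypothetical. Three details are assumptions made only for illustration: half-pixel sampling centers, clamping at the image border, and zero fill outside the box. NumPy has no bfloat16, so only float32 and float16 are preserved here.

```python
import numpy as np


def resize_to_bbox_sketch(images, bbox, output_size):
    """Illustrative only: bilinearly resize each image into its bbox.

    Assumptions (not confirmed by the documentation above): half-pixel
    centers, edge clamping, zeros outside the box, non-degenerate boxes.
    """
    images = np.asarray(images)
    bbox = np.asarray(bbox, dtype=np.float32)
    out_h, out_w = output_size
    batch, in_h, in_w = images.shape[:3]
    extra = (1,) * (images.ndim - 3)  # broadcast over trailing channel dims

    # Dtype rule from the docs; bfloat16 is omitted because NumPy lacks it.
    keep = images.dtype in (np.float32, np.float16)
    out_dtype = images.dtype if keep else np.float32

    work = images.astype(np.float32)
    out = np.zeros((batch, out_h, out_w) + images.shape[3:], dtype=np.float32)

    # Centers of the output pixels in output coordinates.
    ys = np.arange(out_h, dtype=np.float32) + 0.5
    xs = np.arange(out_w, dtype=np.float32) + 0.5

    for b in range(batch):
        ymin, xmin, ymax, xmax = bbox[b]
        inside_y = (ys >= ymin) & (ys <= ymax)
        inside_x = (xs >= xmin) & (xs <= xmax)

        # Map output pixel centers back to (clamped) input coordinates.
        src_y = np.clip((ys - ymin) * in_h / (ymax - ymin) - 0.5, 0, in_h - 1)
        src_x = np.clip((xs - xmin) * in_w / (xmax - xmin) - 0.5, 0, in_w - 1)
        y0 = np.floor(src_y).astype(int)
        x0 = np.floor(src_x).astype(int)
        y1 = np.minimum(y0 + 1, in_h - 1)
        x1 = np.minimum(x0 + 1, in_w - 1)
        wy = (src_y - y0).reshape((out_h, 1) + extra)
        wx = (src_x - x0).reshape((1, out_w) + extra)

        # Interpolate along x on the two neighboring rows, then along y.
        img = work[b]
        top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
        bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
        resized = top * (1 - wy) + bottom * wy

        mask = (inside_y[:, None] & inside_x[None, :]).reshape(
            (out_h, out_w) + extra)
        out[b] = np.where(mask, resized, 0.0)

    return out.astype(out_dtype)


# Usage: a uint8 batch yields float32; a float16 batch stays float16.
demo = resize_to_bbox_sketch(
    np.full((1, 2, 2, 3), 2, dtype=np.uint8), [[1, 1, 3, 3]], (4, 4))
print(demo.shape, demo.dtype)
```

The sketch loops over the batch for readability. A production version would vectorize the gather and would need to define its behavior for empty or inverted boxes.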