Vision Transformer

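A Transformer architecture adapted for computer vision tasks. It splits an image into fixed-size patches, embeds each patch as a vector, and processes the resulting sequence of patch embeddings the same way a language Transformer processes a sequence of tokens.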
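
As a rough illustration of what "treating an image as a sequence of patches" means, the sketch below cuts an image into non-overlapping square patches and flattens each one into a vector. It is a minimal pure-Python sketch, not a reference implementation: the function name image_to_patches, the nested-list image layout (height x width x channels), and the row-major patch order are all assumptions made for this example. The learned embedding of each patch and the position information a real model adds are omitted.

```python
def image_to_patches(image, patch_size):
    """Split an image into flattened, non-overlapping square patches.

    `image` is assumed to be a nested list indexed as image[row][col][channel].
    Returns a list of patches in row-major order (left to right, top to bottom);
    each patch is a flat list of patch_size * patch_size * channels values.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size != 0 or width % patch_size != 0:
        # Assumption for this sketch: the patch size must tile the image exactly.
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch: walk its rows, then pixels, then channels.
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# Hypothetical 4x4 single-channel image whose pixel values are 0..15.
image = [[[r * 4 + c] for c in range(4)] for r in range(4)]
patches = image_to_patches(image, patch_size=2)
print(len(patches))   # number of patches in the sequence
print(patches[0])     # top-left patch, flattened
```

Each element of the returned list plays the role that a token plays in a text sequence; in a full model it would be passed through a learned projection before entering the Transformer layers.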