Mobile-Former: Bridging MobileNet and Transformer
MobileNet and Transformer are bridged, rather than merged
#vision-transformer
#computer-vision
ViTGAN: Training GANs with Vision Transformers
Attention is all you need for GAN discriminators too
#generative-adversarial-network
#vision-transformer
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
A white paper for training your ViT
#vision-transformer
#computer-vision
XCiT: Cross-Covariance Image Transformers
Self-attention across features performs better and runs faster for ViTs
#vision-transformer
#computer-vision
Improved Transformer for High-Resolution GANs
Attention is all you need for GANs too
#generative-adversarial-network
#vision-transformer
Scaling Vision Transformers
Scaling up vision transformers takes it higher
#vision-transformer
#scaling
Anticipative Video Transformer
Action anticipation from video with transformers
#vision-transformer
#self-supervised
#video-understanding
Self-Supervised Learning with Swin Transformers
Swin-T + (MoCo + BYOL) = Encouraging results
#vision-transformer
#computer-vision
#self-supervised
Multiscale Vision Transformers
CNNs have pooling layers. Why not ViTs?
#vision-transformer
#pyramid-structure
#computer-vision
LocalViT: Bringing Locality to Vision Transformers
Seamlessly merging the locality of CNNs into any ViT
#vision-transformer
#computer-vision
Understanding Robustness of Transformers for Image Classification
Keep calm and use vision-transformer
#vision-transformer
#robustness
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Another improvement to vision-transformer-based models, with a theoretical rationale
#vision-transformer
#computer-vision