Mobile-Former: Bridging MobileNet and Transformer
MobileNet and Transformer are bridged, rather than merged
#vision-transformer #computer-vision

ViTGAN: Training GANs with Vision Transformers
Attention is all you need for GAN discriminators too
#generative-adversarial-network #vision-transformer

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
A white paper for training your ViT
#vision-transformer #computer-vision

XCiT: Cross-Covariance Image Transformers
Self-attention across features performs better and runs faster for ViTs
#vision-transformer #computer-vision

Improved Transformer for High-Resolution GANs
Attention is all you need for GANs too
#generative-adversarial-network #vision-transformer

Scaling Vision Transformers
Scaling up vision transformers takes it higher
#vision-transformer #scaling

Anticipative Video Transformer
Action anticipation from video with transformers
#vision-transformer #self-supervised #video-understanding

Self-Supervised Learning with Swin Transformers
Swin-T + (MoCo + BYOL) = Encouraging results
#vision-transformer #computer-vision #self-supervised

Multiscale Vision Transformers
CNNs have pooling layers. Why not ViTs?
#vision-transformer #pyramid-structure #computer-vision

LocalViT: Bringing Locality to Vision Transformers
Merging the locality of CNNs seamlessly with any ViT
#vision-transformer #computer-vision

Understanding Robustness of Transformers for Image Classification
Keep calm and use vision-transformer
#vision-transformer #robustness

ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Another improvement to vision-transformer-based models, with a theoretical rationale
#vision-transformer #computer-vision