Mobile-Former: Bridging MobileNet and Transformer
MobileNet and Transformer are bridged, rather than merged
#vision-transformer
#computer-vision
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
A white paper for training your ViT
#vision-transformer
#computer-vision
XCiT: Cross-Covariance Image Transformers
Self-attention across feature channels performs better and runs faster for ViTs
#vision-transformer
#computer-vision
Hybrid Generative-Contrastive Representation Learning
Image representation learning that combines generative and contrastive objectives
#self-supervised
#computer-vision
An Attention Free Transformer
Replacing the Transformer's attention for computational efficiency
#transformer
#computer-vision
#natural-language-processing
Pay Attention to MLPs
MLPs taking over the game
#multi-layer-perceptron
#computer-vision
#natural-language-processing
Self-Supervised Learning with Swin Transformers
Swin-T + (MoCo + BYOL) = Encouraging result
#vision-transformer
#computer-vision
#self-supervised
ResMLP: Feedforward networks for image classification with data-efficient training
Matrix multiplication is all you need!
#multi-layer-perceptron
#computer-vision
Multiscale Vision Transformers
CNNs have pooling layers. Why not ViTs?
#vision-transformer
#pyramid-structure
#computer-vision
LocalViT: Bringing Locality to Vision Transformers
Seamlessly merging the locality of CNNs into any ViT
#vision-transformer
#computer-vision
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
An improvement to vision-transformer-based models, backed by a theoretical rationale
#vision-transformer
#computer-vision