XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
A large-scale cross-lingual speech representation model is here
#speech-model
#scaling
LiT: Zero-Shot Transfer with Locked-image Text Tuning
Image-text pre-training with a locked pre-trained image model enhances zero-shot performance
#multi-modal
#contrastive-learning
Masked Autoencoders Are Scalable Vision Learners
Fast representation learning with autoencoders that reconstruct masked images
#self-supervised
#image-representation
Palette: Image-to-Image Diffusion Models
Diffusion models beat GANs on image-to-image translation
#diffusion-model
#image-processing
Bootstrap Your Object Detector via Mixed Training
Augmentation and pseudo-labeling enhance object detection
#object-detection
#pseudo-labeling