A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

Recaptioning images with high-quality samples improves text-to-image generation

#image-generation #multi-modal

Knowledge-Augmented Language Model Verification

Better RAG by self-verifying the process

#language-model #retrieval-augmentation

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Better RAG by self-reflecting on the process

#language-model #retrieval-augmentation

Video Language Planning

Vision-language models can make long-horizon task plans

#language-model #multi-modal #robotics

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Contrastive ViT Makes VLM Stronger

#language-model #multi-modal

2021

2021-03-29  Understanding Robustness of Transformers for Image Classification
2021-04-05  Multi-Class Data Description for Out-of-distribution Detection
2021-04-12  Relating Adversarially Robust Generalization to Flat Minima
2021-04-19  Generating Bug-Fixes Using Pretrained Transformers
2021-04-26  Learnable Online Graph Representations for 3D Multi-Object Tracking
2021-05-03  Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
2021-05-10  ResMLP: Feedforward networks for image classification with data-efficient training
2021-05-17  Out-of-manifold Regularization in Contextual Embedding Space for Text Classification
2021-05-31  Learning to Stylize Novel Views
2021-06-07  Counterfactual Graph Learning for Link Prediction
2021-06-14  Hybrid Generative-Contrastive Representation Learning
2021-06-21  How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
2021-06-28  Single Image Texture Translation for Data Augmentation
2021-07-12  ViTGAN: Training GANs with Vision Transformers
2021-07-26  RewriteNet: Realistic Scene Text Image Generation via Editing Text in Real-world Image
2021-08-02  EmailSum: Abstractive Email Thread Summarization
2021-08-16  FedPara: Low-rank Hadamard Product Parameterization for Efficient Federated Learning
2021-09-06  Finetuned Language Models Are Zero-Shot Learners
2021-11-01  InfoGCL: Information-Aware Graph Contrastive Learning
                                                                                                                    
2021-03-23  Self-Supervised Adaptation for Video Super-Resolution
2021-03-30  Invertible Image Signal Processing
2021-04-06  BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification
2021-04-13  LocalViT: Bringing Locality to Vision Transformers
2021-04-20  Surrogate Gradient Field for Latent Space Manipulation
2021-04-27  Clean Images are Hard to Reblur: A New Clue for Deblurring
2021-05-04  UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks
2021-05-11  Self-Supervised Learning with Swin Transformers
2021-05-18  Pay Attention to MLPs
2021-06-01  An Attention Free Transformer
2021-06-08  GAN Cocktail: mixing GANs without dataset access
2021-06-15  Improved Transformer for High-Resolution GANs
2021-06-29  HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps
2021-07-06  On Model Calibration for Long-Tailed Object Detection and Instance Segmentation
2021-07-13  End-to-end Multi-modal Video Temporal Grounding
2021-07-20  Review update held until July 23rd
2021-08-03  Image Synthesis and Editing with Stochastic Differential Equations
2021-11-02  Posts can now be sorted by #tags
2021-11-16  LiT🔥: Zero-Shot Transfer with Locked-image Text Tuning
2021-12-14  GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
2021-12-21  Efficient Large Scale Language Modeling with Mixture-of-Experts
                         
2021-03-24  ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
2021-03-31  Model-Contrastive Federated Learning
2021-04-07  Personalized Entity Resolution with Dynamic Heterogeneous Knowledge Graph Representations
2021-04-14  QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering
2021-04-21  Gradient Matching for Domain Generalization
2021-04-28  Balancing Constraints and Submodularity in Data Subset Selection