A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

Recaptioning images with high-quality samples improves text-to-image generation

#image-generation #multi-modal

Knowledge-Augmented Language Model Verification

Better RAG by self-verifying the process

#language-model #retrieval-augmentation

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Better RAG by self-reflecting on the process

#language-model #retrieval-augmentation

Video Language Planning

Vision language models can make long-horizon task plans

#language-model #multi-modal #robotics

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Contrastive ViT Makes VLM Stronger

#language-model #multi-modal

2021
2021-03-23  Self-Supervised Adaptation for Video Super-Resolution
2021-03-24  ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
2021-03-25  Knowledge-aware Contrastive Molecular Graph Learning
2021-03-26  Orthogonal Projection Loss
2021-03-29  Understanding Robustness of Transformers for Image Classification
2021-03-30  Invertible Image Signal Processing
2021-03-31  Model-Contrastive Federated Learning
2021-04-01  StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
2021-04-02  Explore Image Deblurring via Encoded Blur Kernel Space
2021-04-05  Multi-Class Data Description for Out-of-distribution Detection
2021-04-06  BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification
2021-04-07  Personalized Entity Resolution with Dynamic Heterogeneous Knowledge Graph Representations
2021-04-08  Regularizing Generative Adversarial Networks under Limited Data
2021-04-09  InfinityGAN: Towards Infinite-Resolution Image Synthesis
2021-04-12  Relating Adversarially Robust Generalization to Flat Minima
2021-04-13  LocalViT: Bringing Locality to Vision Transformers
2021-04-14  QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering
2021-04-15  Mutual Information Preserving Back-propagation: Learn to Invert for Faithful Attribution
2021-04-16  Orthogonalizing Convolutional Layers with the Cayley Transform
2021-04-19  Generating Bug-Fixes Using Pretrained Transformers
2021-04-20  Surrogate Gradient Field for Latent Space Manipulation
2021-04-21  Gradient Matching for Domain Generalization
2021-04-22  MetricOpt: Learning to Optimize Black-Box Evaluation Metrics
2021-04-23  Multiscale Vision Transformers
2021-04-26  Learnable Online Graph Representations for 3D Multi-Object Tracking
2021-04-27  Clean Images are Hard to Reblur: A New Clue for Deblurring
2021-04-28  Balancing Constraints and Submodularity in Data Subset Selection
2021-04-29  Gradient-based Adversarial Attacks against Text Transformers
2021-04-30  A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
2021-05-03  Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
2021-05-04  UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks
2021-05-06  VoxelContext-Net: An Octree based Framework for Point Cloud Compression
2021-05-07  Weakly Supervised Action Selection Learning in Video
2021-05-10  ResMLP: Feedforward networks for image classification with data-efficient training
2021-05-11  Self-Supervised Learning with Swin Transformers
2021-05-12  Diffusion Models Beat GANs on Image Synthesis
2021-05-13  Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces
2021-05-14  Compatibility-aware Heterogeneous Visual Search
2021-05-17  Out-of-manifold Regularization in Contextual Embedding Space for Text Classification
2021-05-18  Pay Attention to MLPs
2021-05-20  Sparse Spiking Gradient Descent
2021-05-21  Review update held until May 28th
2021-05-31  Learning to Stylize Novel Views
2021-06-01  An Attention Free Transformer
2021-06-02  On Fast Sampling of Diffusion Probabilistic Models
2021-06-03  Towards Unified Surgical Skill Assessment
2021-06-04  Anticipative Video Transformer
2021-06-07  Counterfactual Graph Learning for Link Prediction
2021-06-08  GAN Cocktail: mixing GANs without dataset access
2021-06-09  Scaling Vision Transformers
2021-06-10  Knowledge distillation: A good teacher is patient and consistent
2021-06-11  MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
2021-06-14  Hybrid Generative-Contrastive Representation Learning
2021-06-15  Improved Transformer for High-Resolution GANs
2021-06-16  SSMix: Saliency-Based Span Mixup for Text Classification
2021-06-17  Multi-Resolution Continuous Normalizing Flows
2021-06-18  XCiT: Cross-Covariance Image Transformers
2021-06-21  How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
2021-06-23  BARTScore: Evaluating Generated Text as Text Generation
2021-06-24  Alias-Free Generative Adversarial Networks
2021-06-25  Sparse Flows: Pruning Continuous-depth Models
2021-06-28  Single Image Texture Translation for Data Augmentation
2021-06-29  HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps
2021-06-30  Cascaded Diffusion Models for High Fidelity Image Generation
2021-07-02  CLIP-It! Language-Guided Video Summarization
2021-07-06  On Model Calibration for Long-Tailed Object Detection and Instance Segmentation
2021-07-08  Evaluating Large Language Models Trained on Code
2021-07-12  ViTGAN: Training GANs with Vision Transformers
2021-07-13  End-to-end Multi-modal Video Temporal Grounding
2021-07-14  Per-Pixel Classification is Not All You Need for Semantic Segmentation
2021-07-16  Recurrent Parameter Generators
2021-07-20  Review update held until July 23rd
2021-07-26  RewriteNet: Realistic Scene Text Image Generation via Editing Text in Real-world Image
2021-07-29  SimROD: A Simple Adaptation Method for Robust Object Detection
2021-08-02  EmailSum: Abstractive Email Thread Summarization
2021-08-03  Image Synthesis and Editing with Stochastic Differential Equations
2021-08-04  Review update held until August 11th
2021-08-12  Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling
2021-08-13  Mobile-Former: Bridging MobileNet and Transformer
2021-08-16  FedPara: Low-rank Hadamard Product Parameterization for Efficient Federated Learning
2021-08-19  Deep reparameterization of Multi-Frame Super-Resolution and Denoising
2021-09-06  Finetuned Language Models Are Zero-Shot Learners
2021-11-01  InfoGCL: Information-Aware Graph Contrastive Learning
2021-11-02  Posts can now be sorted by #tags
2021-11-04  FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
2021-11-05  Bootstrap Your Object Detector via Mixed Training
2021-11-11  Palette: Image-to-Image Diffusion Models
2021-11-12  Masked Autoencoders Are Scalable Vision Learners
2021-11-16  LiT: Zero-Shot Transfer with Locked-image Text Tuning
2021-11-18  XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
2021-12-14  GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
2021-12-16  Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
2021-12-21  Efficient Large Scale Language Modeling with Mixture-of-Experts

2022
2022-01-05  SubMix: Practical Private Prediction for Large-scale Language Models
2022-05-03  OPT: Open Pre-trained Transformer Language Models

2023
2023-07-10  Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs
2023-07-11  Calendar index now available
2023-07-12  Collaborative Score Distillation for Consistent Visual Synthesis
2023-07-13  Instruction Mining: High-Quality Instruction Data Selection for Large Language Models
2023-10-11  Mistral 7B
2023-10-12  Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting
2023-10-13  Large Language Models Are Zero-Shot Time Series Forecasters
2023-10-16  PaLI-3 Vision Language Models: Smaller, Faster, Stronger
2023-10-17  Video Language Planning
2023-10-19  Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
2023-10-20  Knowledge-Augmented Language Model Verification
2023-10-26  A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
                                                                                                                                              
#image-generation #multi-modal #language-model #retrieval-augmentation #robotics #forecasting #psychiatry #instruction-tuning #diffusion-model #notice #graph-neural-network #responsible-ai #privacy-preserving #scaling #mixture-of-experts #generative-adversarial-network #speech-model #contrastive-learning #self-supervised #image-representation #image-processing #object-detection #pseudo-labeling #scene-text-detection #neural-architecture-search #data-sampling #long-tail #graph-representation #zero-shot #metric-learning #federated-learning #weight-matrix #low-rank #vision-transformer #computer-vision #normalizing-flow #invertible-neural-network #super-resolution #image-manipulation #thread-summarization #natural-language-processing #domain-adaptation #knowledge-distillation #scene-text #model-compression #semantic-segmentation #instance-segmentation #video-understanding #code-generation #graph-generation #image-translation #data-augmentation #model-pruning #signal-processing #text-generation #text-classification #music-representation #transfer-learning #link-prediction #counterfactual-learning #medical-imaging #acceleration #transformer #style-transfer #novel-view-synthesis #point-cloud #spiking-neural-network #optimization #multi-layer-perceptron #adversarial-training #visual-search #image-retrieval #negative-sampling #action-localization #weakly-supervised #data-compression #hypergraph #adversarial-attack #submodularity #active-learning #deblurring #object-tracking #pyramid-structure #loss-function #gradient-descent #generalization #bug-fix #orthogonality #explainability #saliency-mapping #information-theory #question-answering #knowledge-graph #robustness #limited-data #recommender-system #anomaly-detection #gaussian-discriminant-analysis #molecular-graph #video-processing