A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

Recaptioning images with high-quality samples improves text-to-image generation

#image-generation #multi-modal

October 20, 2023

Knowledge-Augmented Language Model Verification

Better RAG by self-verifying the process

#language-model #retrieval-augmentation

October 19, 2023

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Better RAG by self-reflecting on the process

#language-model #retrieval-augmentation

October 17, 2023

Video Language Planning

Vision-language models can make long-horizon task plans

#language-model #multi-modal #robotics

October 16, 2023

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Contrastive ViT Makes VLM Stronger

#language-model #multi-modal

2021

2021-03-29
Understanding Robustness of Transformers for Image Classification

2021-04-05
Multi-Class Data Description for Out-of-distribution Detection

2021-04-12
Relating Adversarially Robust Generalization to Flat Minima

2021-04-19
Generating Bug-Fixes Using Pretrained Transformers

2021-04-26
Learnable Online Graph Representations for 3D Multi-Object Tracking

2021-05-03
Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

2021-05-10
ResMLP: Feedforward networks for image classification with data-efficient training

2021-05-17
Out-of-manifold Regularization in Contextual Embedding Space for Text Classification

2021-05-31
Learning to Stylize Novel Views

2021-06-07
Counterfactual Graph Learning for Link Prediction

2021-06-14
Hybrid Generative-Contrastive Representation Learning

2021-06-21
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

2021-06-28
Single Image Texture Translation for Data Augmentation

2021-07-12
ViTGAN: Training GANs with Vision Transformers

2021-07-26
RewriteNet: Realistic Scene Text Image Generation via Editing Text in Real-world Image

2021-08-02
EmailSum: Abstractive Email Thread Summarization

2021-08-16
FedPara: Low-rank Hadamard Product Parameterization for Efficient Federated Learning

2021-09-06
Finetuned Language Models Are Zero-Shot Learners

2021-11-01
InfoGCL: Information-Aware Graph Contrastive Learning

2021-03-23
Self-Supervised Adaptation for Video Super-Resolution

2021-03-30
Invertible Image Signal Processing

2021-04-06
BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification

2021-04-13
LocalViT: Bringing Locality to Vision Transformers

2021-04-20
Surrogate Gradient Field for Latent Space Manipulation

2021-04-27
Clean Images are Hard to Reblur: A New Clue for Deblurring

2021-05-04
UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks

2021-05-11
Self-Supervised Learning with Swin Transformers

2021-05-18
Pay Attention to MLPs

2021-06-01
An Attention Free Transformer

2021-06-08
GAN Cocktail: mixing GANs without dataset access

2021-06-15
Improved Transformer for High-Resolution GANs

2021-06-29
HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps

2021-07-06
On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

2021-07-13
End-to-end Multi-modal Video Temporal Grounding

2021-07-20
Review update held until July 23rd

2021-08-03
Image Synthesis and Editing with Stochastic Differential Equations

2021-11-02
Posts can now be sorted by #tags

2021-11-16
LiT: Zero-Shot Transfer with Locked-image Text Tuning

2021-12-14
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

2021-12-21
Efficient Large Scale Language Modeling with Mixture-of-Experts

2021-03-24
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

2021-03-31
Model-Contrastive Federated Learning

2021-04-07
Personalized Entity Resolution with Dynamic Heterogeneous Knowledge Graph Representations

2021-04-14
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

2021-04-21
Gradient Matching for Domain Generalization

2021-04-28
Balancing Constraints and Submodularity in Data Subset Selection

2021-05-12
Diffusion Models Beat GANs on Image Synthesis

2021-06-02
On Fast Sampling of Diffusion Probabilistic Models

2021-06-09
Scaling Vision Transformers

2021-06-16
SSMix: Saliency-Based Span Mixup for Text Classification

2021-06-23
BARTScore: Evaluating Generated Text as Text Generation

2021-06-30
Cascaded Diffusion Models for High Fidelity Image Generation

2021-07-14
Per-Pixel Classification is Not All You Need for Semantic Segmentation

2021-08-04
Review update held until August 11th

2021-03-25
Knowledge-aware Contrastive Molecular Graph Learning

2021-04-01
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

2021-04-08
Regularizing Generative Adversarial Networks under Limited Data

2021-04-15
Mutual Information Preserving Back-propagation: Learn to Invert for Faithful Attribution

2021-04-22
MetricOpt: Learning to Optimize Black-Box Evaluation Metrics

2021-04-29
Gradient-based Adversarial Attacks against Text Transformers

2021-05-06
VoxelContext-Net: An Octree based Framework for Point Cloud Compression

2021-05-13
Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

2021-05-20
Sparse Spiking Gradient Descent

2021-06-03
Towards Unified Surgical Skill Assessment

2021-06-10
Knowledge distillation: A good teacher is patient and consistent

2021-06-17
Multi-Resolution Continuous Normalizing Flows

2021-06-24
Alias-Free Generative Adversarial Networks

2021-07-08
Evaluating Large Language Models Trained on Code

2021-07-29
SimROD: A Simple Adaptation Method for Robust Object Detection

2021-08-12
Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling

2021-08-19
Deep reparameterization of Multi-Frame Super-Resolution and Denoising

2021-11-04
FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

2021-11-11
Palette: Image-to-Image Diffusion Models

2021-11-18
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

2021-12-16
Tackling the Generative Learning Trilemma with Denoising Diffusion GANs

2021-03-26
Orthogonal Projection Loss

2021-04-02
Explore Image Deblurring via Encoded Blur Kernel Space

2021-04-09
InfinityGAN: Towards Infinite-Resolution Image Synthesis

2021-04-16
Orthogonalizing Convolutional Layers with the Cayley Transform

2021-04-23
Multiscale Vision Transformers

2021-04-30
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

2021-05-07
Weakly Supervised Action Selection Learning in Video

2021-05-14
Compatibility-aware Heterogeneous Visual Search

2021-05-21
Review update held until May 28th

2021-06-04
Anticipative Video Transformer

2021-06-11
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training

2021-06-18
XCiT: Cross-Covariance Image Transformers

2021-06-25
Sparse Flows: Pruning Continuous-depth Models

2021-07-02
CLIP-It! Language-Guided Video Summarization

2021-07-16
Recurrent Parameter Generators

2021-08-13
Mobile-Former: Bridging MobileNet and Transformer

2021-11-05
Bootstrap Your Object Detector via Mixed Training

2021-11-12
Masked Autoencoders Are Scalable Vision Learners

2023

2023-07-10
Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs

2023-10-16
PaLI-3 Vision Language Models: Smaller, Faster, Stronger

2023-07-11
Calendar index now available

2023-10-17
Video Language Planning

2023-07-12
Collaborative Score Distillation for Consistent Visual Synthesis

2023-10-11
Mistral 7B

2023-07-13
Instruction Mining: High-Quality Instruction Data Selection for Large Language Models

2023-10-12
Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting

2023-10-19
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

2023-10-26
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

2023-10-13
Large Language Models Are Zero-Shot Time Series Forecasters

2023-10-20
Knowledge-Augmented Language Model Verification

#image-generation #multi-modal #language-model #retrieval-augmentation #robotics #forecasting #psychiatry #instruction-tuning #diffusion-model #notice #graph-neural-network #responsible-ai #privacy-preserving #scaling #mixture-of-experts #generative-adversarial-network #speech-model #contrastive-learning #self-supervised #image-representation #image-processing #object-detection #pseudo-labeling #scene-text-detection #neural-architecture-search #data-sampling #long-tail #graph-representation #zero-shot #metric-learning #federated-learning #weight-matrix #low-rank #vision-transformer #computer-vision #normalizing-flow #invertible-neural-network #super-resolution #image-manipulation #thread-summarization #natural-language-processing #domain-adaptation #knowledge-distillation #scene-text #model-compression #semantic-segmentation #instance-segmentation #video-understanding #code-generation #graph-generation #image-translation #data-augmentation #model-pruning #signal-processing #text-generation #text-classification #music-representation #transfer-learning #link-prediction #counterfactual-learning #medical-imaging #acceleration #transformer #style-transfer #novel-view-synthesis #point-cloud #spiking-neural-network #optimization #multi-layer-perceptron #adversarial-training #visual-search #image-retrieval #negative-sampling #action-localization #weakly-supervised #data-compression #hypergraph #adversarial-attack #submodularity #active-learning #deblurring #object-tracking #pyramid-structure #loss-function #gradient-descent #generalization #bug-fix #orthogonality #explainability #saliency-mapping #information-theory #question-answering #knowledge-graph #robustness #limited-data #recommender-system #anomaly-detection #gaussian-discriminant-analysis #molecular-graph #video-processing