Collaborative Score Distillation for Consistent Visual Synthesis
Expand to higher dimensions by leveraging consistency, without changing the architecture
#multi-modal
#diffusion-model
LiT: Zero-Shot Transfer with Locked-image Text Tuning
Image-text pre-training with a pre-trained image model enhances zero-shot performance
#multi-modal
#contrastive-learning
End-to-end Multi-modal Video Temporal Grounding
Adding depth and flow to RGB improves video understanding
#video-understanding
#multi-modal
CLIP-It! Language-Guided Video Summarization
Get a highlight clip of your favorite player by typing it
#multi-modal
#video-understanding
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Make your cat cute by typing it into this StyleGAN-CLIP hybrid
#multi-modal
#generative-adversarial-network
#image-manipulation