End-to-end Multi-modal Video Temporal Grounding
Adding depth and optical flow to RGB improves video temporal grounding
#video-understanding
#multi-modal
CLIP-It! Language-Guided Video Summarization
Get a highlight clip of your favorite player by typing a text query
#multi-modal
#video-understanding
Anticipative Video Transformer
Action anticipation from video with transformers
#vision-transformer
#self-supervised
#video-understanding
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
SimCLR, MoCo, BYOL, and SwAV through time!
#video-understanding
#self-supervised