Dense-Captioning Events in Videos
Most natural videos contain numerous events. For example, in a video of a "man playing a piano", the video might also contain "another man dancing" or "a crowd clapping". We introduce the task of dense-captioning events, which involves both detecting and
Fei-Fei, Li +4 more
core +2 more sources
Weakly Supervised Dense Video Captioning
This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences.
Chen, Yurong +6 more
core +2 more sources
Multi-modal Dense Video Captioning
Dense video captioning is a task of localizing interesting events from an untrimmed video and producing textual description (captions) for each localized event. Most of the previous works in dense video captioning are solely based on visual information and completely ignore the audio track.
Rahtu, Esa; Iashin, Vladimir
openaire +3 more sources
Streamlined Dense Video Captioning
Dense video captioning is an extremely challenging task since accurate and coherent description of events in a video requires holistic understanding of video contents as well as contextual reasoning of individual events. Most existing approaches handle this problem by first detecting event proposals from a video and then captioning on a subset of the ...
Mun, Jonghwan +4 more
openaire +2 more sources
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Soccer is more than just a game - it is a passion that transcends borders and unites people worldwide. From the roar of the crowds to the excitement of the commentators, every moment of a soccer match is a thrill. Yet, with so many games happening simultaneously, fans cannot watch them all live.
Mkhallati, Hassan +4 more
openaire +2 more sources
Leveraging auxiliary image descriptions for dense video captioning
Collecting textual descriptions is an especially costly task for dense video captioning, since each event in the video needs to be annotated separately and a long descriptive paragraph needs to be provided. In this paper, we investigate a way to mitigate this heavy burden and propose to leverage captions of visually similar images as ...
Boran, Emre +5 more
openaire +2 more sources
DVC‐Net: A deep neural network model for dense video captioning
Dense video captioning (DVC) detects multiple events in an input video and generates natural language sentences to describe each event. Previous studies predominantly used convolutional neural networks to extract visual features from videos but failed to
Sujin Lee, Incheol Kim
doaj +1 more source
Multimodal Pretraining for Dense Video Captioning
AACL-IJCNLP ...
Huang, Gabriel +4 more
openaire +2 more sources
Post-Attention Modulator for Dense Video Captioning
Peer ...
Wang, Tzu-Jui Julius +3 more
openaire +3 more sources
Semantic-Aware Pretraining for Dense Video Captioning
This report describes the details of our approach for the event dense-captioning task in ActivityNet Challenge 2021. We present a semantic-aware pretraining method for dense video captioning, which empowers the learned features to recognize high-level semantic concepts.
Wang, Teng +5 more
openaire +2 more sources

