Dense video captioning - Open Access .click

Dense-Captioning Events in Videos

2017 IEEE International Conference on Computer Vision (ICCV), 2017
Most natural videos contain numerous events. For example, in a video of a "man playing a piano", the video might also contain "another man dancing" or "a crowd clapping". We introduce the task of dense-captioning events, which involves both detecting and ...
Fei-Fei, Li +4 more
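The output of dense captioning can be pictured as a list of time-stamped captions for one untrimmed video, where events may overlap in time, as in the piano example in the abstract above. The Python sketch below illustrates only that idea. It is not the paper's data format: the `Event` class, the `active_at` helper, and all timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One temporally localized event and its caption (times in seconds)."""
    start: float
    end: float
    caption: str

    def __post_init__(self) -> None:
        # Reject empty or reversed intervals.
        if not 0 <= self.start < self.end:
            raise ValueError("expected 0 <= start < end")


def active_at(events: list[Event], t: float) -> list[str]:
    """Return the captions of all events whose [start, end) interval contains t."""
    return [e.caption for e in events if e.start <= t < e.end]


# Hypothetical output for the "man playing a piano" video; the timestamps
# are invented for illustration only.
piano_video = [
    Event(0.0, 60.0, "a man plays a piano"),
    Event(15.0, 45.0, "another man dances"),
    Event(50.0, 60.0, "a crowd claps"),
]

print(active_at(piano_video, 30.0))  # ['a man plays a piano', 'another man dances']
print(active_at(piano_video, 55.0))  # ['a man plays a piano', 'a crowd claps']
```

Because several events can be active at the same moment, a single caption per video (or per frame) cannot represent this output. That is why the task pairs event detection with per-event captioning.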

Weakly Supervised Dense Video Captioning

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences.
Chen, Yurong +6 more

Multi-modal Dense Video Captioning

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020
Dense video captioning is a task of localizing interesting events from an untrimmed video and producing textual description (captions) for each localized event. Most of the previous works in dense video captioning are solely based on visual information and completely ignore the audio track.
Rahtu, Esa; Iashin, Vladimir

Streamlined Dense Video Captioning

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Dense video captioning is an extremely challenging task since accurate and coherent description of events in a video requires holistic understanding of video contents as well as contextual reasoning of individual events. Most existing approaches handle this problem by first detecting event proposals from a video and then captioning on a subset of the ...
Mun, Jonghwan +4 more

SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023
Soccer is more than just a game - it is a passion that transcends borders and unites people worldwide. From the roar of the crowds to the excitement of the commentators, every moment of a soccer match is a thrill. Yet, with so many games happening simultaneously, fans cannot watch them all live.
Mkhallati, Hassan +4 more

Leveraging auxiliary image descriptions for dense video captioning

Pattern Recognition Letters, 2021
Collecting textual descriptions is an especially costly task for dense video captioning, since each event in the video needs to be annotated separately and a long descriptive paragraph needs to be provided. In this paper, we investigate a way to mitigate this heavy burden and propose to leverage captions of visually similar images as ...
Boran, Emre +5 more

DVC‐Net: A deep neural network model for dense video captioning

IET Computer Vision, 2021
Dense video captioning (DVC) detects multiple events in an input video and generates natural language sentences to describe each event. Previous studies predominantly used convolutional neural networks to extract visual features from videos but failed to ...
Lee, Sujin; Kim, Incheol

Multimodal Pretraining for Dense Video Captioning

Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020
AACL-IJCNLP ...
Huang, Gabriel +4 more

Post-Attention Modulator for Dense Video Captioning

2022 26th International Conference on Pattern Recognition (ICPR), 2022
Wang, Tzu-Jui Julius; Laaksonen, Jorma; Guo, Zixin +3 more

Semantic-Aware Pretraining for Dense Video Captioning

2022
This report describes the details of our approach for the event dense-captioning task in ActivityNet Challenge 2021. We present a semantic-aware pretraining method for dense video captioning, which empowers the learned features to recognize high-level semantic concepts.
Wang, Teng +5 more
