Dense video captioning based on local attention
Dense video captioning aims to locate multiple events in an untrimmed video and generate captions for each event. Previous methods experienced difficulties in establishing the multimodal feature relationship between frames and captions, resulting in low ...
Yong Qian +5 more
Fusion of Multi-Modal Features to Enhance Dense Video Caption
Dense video captioning is a task that aims to help computers analyze the content of a video by generating abstract captions for a sequence of video frames.
Xuefei Huang +4 more
Parallel Pathway Dense Video Captioning With Deformable Transformer
Dense video captioning is a very challenging task because it requires a high-level understanding of the video story, as well as pinpointing details such as objects and motions for a consistent and fluent description of the video.
Wangyu Choi, Jiasi Chen, Jongwon Yoon
Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph
Dense video captioning (DVC) aims at generating a description for each scene in a video. Despite attractive progress on this task, previous works usually concentrate only on exploiting visual features while neglecting audio information in the video ...
Shixing Han +5 more
Cross-Modal Transformer-Based Streaming Dense Video Captioning with Neural ODE Temporal Localization
Dense video captioning is a critical task in video understanding, requiring precise temporal localization of events and the generation of detailed, contextually rich descriptions.
Shakhnoza Muksimova +3 more
Step by Step: A Gradual Approach for Dense Video Captioning
Dense video captioning aims to localize and describe events for storytelling in untrimmed videos. It is a conceptually very challenging task that requires concise, relevant, and coherent captioning based on high-quality event localization.
Wangyu Choi, Jiasi Chen, Jongwon Yoon
PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning With Pretraining Approach
In recent times, there has been a notable increase in efforts to simultaneously comprehend vision and language, driven by the availability of video-related datasets and advancements in language models within the domain of natural language processing ...
Wangyu Choi, Jiasi Chen, Jongwon Yoon
Bridging human and machine intelligence: Reverse-engineering radiologist intentions for clinical trust and adoption
In the rapidly evolving landscape of medical imaging, the integration of artificial intelligence (AI) with clinical expertise offers unprecedented opportunities to enhance diagnostic precision and accuracy.
Akash Awasthi +5 more
Parallel Dense Video Caption Generation with Multi-Modal Features
The task of dense video captioning is to generate detailed natural-language descriptions for an original video, which requires deep analysis and mining of semantic captions to identify events in the video. Existing methods typically follow a localisation-
Xuefei Huang +3 more
A latent topic-aware network for dense video captioning
Multiple events in a long untrimmed video possess the characteristics of similarity and continuity. These characteristics can be considered as a kind of topic semantic information, which probably behaves as same sports, similar scenes, same objects etc ...
Tao Xu +3 more