Dense video captioning based on local attention
Dense video captioning aims to locate multiple events in an untrimmed video and generate captions for each event. Previous methods experienced difficulties in establishing the multimodal feature relationship between frames and captions, resulting in low ...
Yong Qian +5 more
Fusion of Multi-Modal Features to Enhance Dense Video Caption
Dense video captioning is a task that aims to help computers analyze the content of a video by generating abstract captions for a sequence of video frames.
Xuefei Huang +4 more
Parallel Pathway Dense Video Captioning With Deformable Transformer
Dense video captioning is a very challenging task because it requires a high-level understanding of the video story, as well as pinpointing details such as objects and motions for a consistent and fluent description of the video.
Wangyu Choi, Jiasi Chen, Jongwon Yoon
Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph
Dense video captioning (DVC) aims at generating a description for each scene in a video. Despite attractive progress on this task, previous works usually concentrate only on exploiting visual features while neglecting audio information in the video ...
Shixing Han +5 more
Cross-Modal Transformer-Based Streaming Dense Video Captioning with Neural ODE Temporal Localization
Dense video captioning is a critical task in video understanding, requiring precise temporal localization of events and the generation of detailed, contextually rich descriptions.
Shakhnoza Muksimova +3 more
Step by Step: A Gradual Approach for Dense Video Captioning
Dense video captioning aims to localize and describe events for storytelling in untrimmed videos. It is a conceptually very challenging task that requires concise, relevant, and coherent captioning based on high-quality event localization.
Wangyu Choi, Jiasi Chen, Jongwon Yoon
PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning With Pretraining Approach
In recent times, there has been a notable increase in efforts to simultaneously comprehend vision and language, driven by the availability of video-related datasets and advancements in language models within the domain of natural language processing ...
Wangyu Choi, Jiasi Chen, Jongwon Yoon
Bridging human and machine intelligence: Reverse-engineering radiologist intentions for clinical trust and adoption
In the rapidly evolving landscape of medical imaging, the integration of artificial intelligence (AI) with clinical expertise offers unprecedented opportunities to enhance diagnostic precision and accuracy.
Akash Awasthi +5 more
Parallel Dense Video Caption Generation with Multi-Modal Features
The task of dense video captioning is to generate detailed natural-language descriptions for an original video, which requires deep analysis and mining of semantic captions to identify events in the video. Existing methods typically follow a localisation-
Xuefei Huang +3 more
A latent topic-aware network for dense video captioning
Multiple events in a long untrimmed video possess the characteristics of similarity and continuity. These characteristics can be considered as a kind of topic semantic information, which probably behaves as same sports, similar scenes, same objects etc ...
Tao Xu +3 more