Results 131 to 140 of about 10,385 (175)
Some of the following articles may not be open access.

Dense Video Captioning Using Graph-Based Sentence Summarization

IEEE transactions on multimedia
Recently, dense video captioning has made attractive progress in detecting and captioning all events in a long untrimmed video. Although promising results have been achieved, most existing methods do not sufficiently explore the scene evolution within an event ...
Zhiwang Zhang et al.
semanticscholar (+1 more source)

VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation

Annual Meeting of the Association for Computational Linguistics
The training of controllable text-to-video (T2V) models relies heavily on the alignment between videos and captions, yet little existing research connects video caption evaluation with T2V generation assessment. This paper introduces VidCapBench, a video ...
Xinlong Chen et al.
semanticscholar (+1 more source)

VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking

arXiv.org
While recent advances in reinforcement learning have significantly enhanced reasoning capabilities in large language models (LLMs), these techniques remain underexplored in multi-modal LLMs for video captioning.
Desen Meng et al.
semanticscholar (+1 more source)

VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning

AAAI Conference on Artificial Intelligence
Despite the advancements of Video Large Language Models (VideoLLMs) in various tasks, they struggle with fine-grained temporal understanding, such as Dense Video Captioning (DVC).
Ji Soo Lee et al.
semanticscholar (+1 more source)

Event-Equalized Dense Video Captioning

Computer Vision and Pattern Recognition
Dense video captioning aims to localize and caption all events in arbitrary untrimmed videos. Although previous methods have achieved appealing results, they still face the issue of temporal bias, i.e., models tend to focus more on events with certain ...
Kangyi Wu et al.
semanticscholar (+1 more source)

Action-Driven Semantic Representation and Aggregation for Video Captioning

IEEE transactions on circuits and systems for video technology (Print)
Video captioning, a challenging task that entails generating natural language descriptions of visual content, often fails to grasp the essence of action semantics effectively.
Tingting Han et al.
semanticscholar (+1 more source)

Frame-by-Frame Multi-Object Tracking-Guided Video Captioning

IEEE transactions on circuits and systems for video technology (Print)
Video captioning through deep learning presents a multifaceted challenge that encompasses the extraction of complex spatio-temporal visual features and the synthesis of meaningful natural language descriptions.
Huilan Luo, Xia Cai, L. Shark
semanticscholar (+1 more source)

Emotional Video Captioning With Vision-Based Emotion Interpretation Network

IEEE Transactions on Image Processing
Effectively summarizing and re-expressing video content by natural languages in a more human-like fashion is one of the key topics in the field of multimedia content understanding.
Peipei Song et al.
semanticscholar (+1 more source)

Memory-Based Augmentation Network for Video Captioning

IEEE transactions on multimedia
Video captioning focuses on generating natural language descriptions according to the video content. Existing works mainly explore this multimodal learning with the paired source video and corresponding sentence, which have achieved competitive ...
Shuaiqi Jing et al.
semanticscholar (+1 more source)

Learnability Matters: Active Learning for Video Captioning

Neural Information Processing Systems
This work focuses on active learning in video captioning. In particular, we propose to address the learnability problem in active learning, which is brought about by collective outliers in video captioning and has been neglected in the literature. To start ...
Yiqian Zhang et al.
semanticscholar (+1 more source)
