Results 131–140 of about 10,385
Some of the following articles may not be open access.
Dense Video Captioning Using Graph-Based Sentence Summarization
IEEE Transactions on Multimedia
Recently, dense video captioning has made notable progress in detecting and captioning all events in a long untrimmed video. Although promising results have been achieved, most existing methods do not sufficiently explore the scene evolution within an event.
Zhiwang Zhang +3 more
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
Annual Meeting of the Association for Computational Linguistics
The training of controllable text-to-video (T2V) models relies heavily on the alignment between videos and captions, yet little existing research connects video caption evaluation with T2V generation assessment. This paper introduces VidCapBench, a video ...
Xinlong Chen +9 more
VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
arXiv.org
While recent advances in reinforcement learning have significantly enhanced reasoning capabilities in large language models (LLMs), these techniques remain underexplored in multi-modal LLMs for video captioning.
Desen Meng +10 more
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
AAAI Conference on Artificial Intelligence
Despite the advancements of Video Large Language Models (VideoLLMs) in various tasks, they struggle with fine-grained temporal understanding, such as Dense Video Captioning (DVC).
Ji Soo Lee +4 more
Event-Equalized Dense Video Captioning
Computer Vision and Pattern Recognition
Dense video captioning aims to localize and caption all events in arbitrary untrimmed videos. Although previous methods have achieved appealing results, they still face the issue of temporal bias, i.e., models tend to focus more on events with certain ...
Kangyi Wu +7 more
Action-Driven Semantic Representation and Aggregation for Video Captioning
IEEE Transactions on Circuits and Systems for Video Technology (Print)
Video captioning, a challenging task that entails generating natural language descriptions of visual content, often fails to effectively grasp the essence of action semantics.
Tingting Han +4 more
Frame-by-Frame Multi-Object Tracking-Guided Video Captioning
IEEE Transactions on Circuits and Systems for Video Technology (Print)
Video captioning through deep learning presents a multifaceted challenge that encompasses the extraction of complex spatio-temporal visual features and the synthesis of meaningful natural language descriptions.
Huilan Luo, Xia Cai, L. Shark
Emotional Video Captioning With Vision-Based Emotion Interpretation Network
IEEE Transactions on Image Processing
Effectively summarizing and re-expressing video content in natural language in a more human-like fashion is one of the key topics in the field of multimedia content understanding.
Peipei Song +4 more
Memory-Based Augmentation Network for Video Captioning
IEEE Transactions on Multimedia
Video captioning focuses on generating natural language descriptions according to the video content. Existing works mainly explore this multimodal learning with the paired source video and corresponding sentence, which have achieved competitive ...
Shuaiqi Jing +5 more
Learnability Matters: Active Learning for Video Captioning
Neural Information Processing Systems
This work focuses on active learning in video captioning. In particular, we propose to address the learnability problem in active learning, which has been brought up by collective outliers in video captioning and neglected in the literature. To start ...
Yiqian Zhang +5 more