Beyond Caption To Narrative: Video Captioning With Multiple Sentences [PDF]
Recent advances in image captioning have led to increasing interest in video captioning. However, most works on video captioning focus on generating a single input of aggregated features, which hardly deviates from the image captioning process ...
Harada, Tatsuya +2 more
core +2 more sources
Dense-Captioning Events in Videos [PDF]
Most natural videos contain numerous events. For example, in a video of a "man playing a piano", the video might also contain "another man dancing" or "a crowd clapping". We introduce the task of dense-captioning events, which involves both detecting and ...
Fei-Fei, Li +4 more
core +2 more sources
Semantic guidance network for video captioning [PDF]
Video captioning is a challenging task that aims to generate rich natural language descriptions, and it has become a promising direction for artificial intelligence. However, most existing methods tend to ignore the problems of visual information redundancy and scene information omission due to the limitation of the sampling ...
Guo L, Zhao H, Chen Z, Han Z.
europepmc +4 more sources
Video Captioning in Compressed Video [PDF]
Existing approaches to video captioning concentrate on exploring global frame features in uncompressed videos, while the critical saliency information that is already encoded, free of charge, in compressed videos is generally neglected. We propose a video captioning method that operates directly on the stored compressed videos.
Zhu, Mingjian +2 more
openaire +3 more sources
Video Captioning Using Global-Local Representation [PDF]
Video captioning is a challenging task because it must accurately transform visual understanding into a natural language description. To date, state-of-the-art methods inadequately model global-local visual representations for sentence generation, leaving plenty of room for improvement.
Yan L +6 more
europepmc +3 more sources
Multi-modal Dense Video Captioning [PDF]
Dense video captioning is the task of localizing interesting events in an untrimmed video and producing a textual description (caption) for each localized event. Most previous works in dense video captioning rely solely on visual information and completely ignore the audio track.
Iashin, Vladimir, Rahtu, Esa
openaire +3 more sources
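The dense video captioning task described above pairs each localized event (a start and end time in an untrimmed video) with a caption. The sketch below is a minimal illustration, not code from any of the listed papers: it models predictions with a hypothetical `Event` dataclass and computes temporal IoU (tIoU), the overlap measure commonly used to match predicted segments against ground-truth segments when evaluating dense captioning. The 0.5 matching threshold and the `match_events` helper are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical dense-captioning record: a time span plus its caption."""
    start: float  # seconds
    end: float    # seconds
    caption: str


def temporal_iou(a: Event, b: Event) -> float:
    """Intersection over union of two time spans (0.0 when they do not overlap)."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


def match_events(predicted: List[Event], ground_truth: List[Event],
                 threshold: float = 0.5) -> List[Tuple[Event, Event]]:
    """Pair each prediction with the ground-truth event it overlaps most,
    keeping only pairs whose tIoU reaches the (assumed) threshold."""
    matches = []
    for pred in predicted:
        best = max(ground_truth, key=lambda gt: temporal_iou(pred, gt), default=None)
        if best is not None and temporal_iou(pred, best) >= threshold:
            matches.append((pred, best))
    return matches


# Toy data echoing the "man playing a piano" example from the listing.
gt = [Event(0.0, 10.0, "a man playing a piano"), Event(12.0, 20.0, "a crowd clapping")]
pred = [Event(1.0, 9.0, "a man plays the piano"), Event(30.0, 35.0, "people dance")]

print(temporal_iou(pred[0], gt[0]))  # 0.8
print(len(match_events(pred, gt)))   # 1: the second prediction overlaps nothing
```

Evaluation protocols for dense captioning typically report caption quality only over matched segments and often repeat the matching at several tIoU thresholds, which is why the threshold is a parameter here.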
Meaning Guided Video Captioning [PDF]
The 5th Asian Conference on Pattern Recognition (ACPR 2019)
Babariya, Rushi J., Tamaki, Toru
openaire +2 more sources
Step by Step: A Gradual Approach for Dense Video Captioning
Dense video captioning aims to localize and describe events for storytelling in untrimmed videos. It is a conceptually very challenging task that requires concise, relevant, and coherent captioning based on high-quality event localization.
Wangyu Choi, Jiasi Chen, Jongwon Yoon
doaj +1 more source
Empirical autopsy of deep video captioning encoder-decoder architecture
Contemporary deep-learning-based video captioning methods adopt an encoder-decoder framework. In the encoder, visual features are extracted with 2D/3D Convolutional Neural Networks (CNNs), and a transformed version of those features is passed to the decoder. The ...
Nayyer Aafaq +3 more
doaj +1 more source
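The encoder-decoder pipeline summarized in the abstract above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the architecture studied in the paper: per-frame feature vectors are assumed to be precomputed (standing in for 2D/3D CNN outputs), the encoder's "transformed version" of those features is assumed to be temporal mean pooling followed by a linear projection, and the decoder is a greedy loop around a hypothetical `toy_step` function that plays the role of a trained language model such as an LSTM.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def encode(frame_features: Sequence[Vector], weights: Sequence[Vector]) -> Vector:
    """Encoder (assumed): average per-frame features over time, then project linearly.
    `weights` holds one row per output dimension."""
    num_frames = len(frame_features)
    dim = len(frame_features[0])
    pooled = [sum(frame[d] for frame in frame_features) / num_frames for d in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]


def greedy_decode(context: Vector,
                  step: Callable[[Vector, str], Dict[str, float]],
                  max_len: int = 10) -> List[str]:
    """Decoder: repeatedly pick the highest-scoring next word until <eos> or max_len."""
    words: List[str] = []
    prev = "<bos>"
    for _ in range(max_len):
        scores = step(context, prev)
        prev = max(scores, key=scores.get)
        if prev == "<eos>":
            break
        words.append(prev)
    return words


def toy_step(context: Vector, prev: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained decoder: a fixed bigram table,
    except that the encoded video (context) chooses the subject noun."""
    if prev == "a":
        return {"man": context[0], "dog": context[1]}
    table = {
        "<bos>": {"a": 1.0},
        "man": {"plays": 1.0},
        "dog": {"plays": 1.0},
        "plays": {"the": 1.0},
        "the": {"piano": 1.0},
    }
    return table.get(prev, {"<eos>": 1.0})


# Two frames with made-up 3-d features, and a made-up 2x3 projection.
frames = [[1.0, 0.0, 0.2], [0.8, 0.2, 0.0]]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

print(" ".join(greedy_decode(encode(frames, weights), toy_step)))  # a man plays the piano
```

In this sketch the decoder only ever sees one fixed-length context vector; attention-based decoders instead look back at the per-frame features at every step, which is one of the encoder-decoder design choices such architectures vary.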
Multimodal feature fusion based on object relation for video captioning
Video captioning aims to automatically generate a natural language caption describing the content of a video. However, most existing video captioning methods ignore the relationship between objects in the video and the correlation ...
Zhiwen Yan +3 more
doaj +1 more source

