
Beyond Caption To Narrative: Video Captioning With Multiple Sentences [PDF]

open access: yes, 2016 IEEE International Conference on Image Processing (ICIP), 2016
Recent advances in image captioning have led to increasing interest in video captioning. However, most works on video captioning are focused on generating a single input of aggregated features, which hardly deviates from the image captioning process
Harada, Tatsuya   +2 more
core   +2 more sources

Dense-Captioning Events in Videos [PDF]

open access: yes, 2017 IEEE International Conference on Computer Vision (ICCV), 2017
Most natural videos contain numerous events. For example, in a video of a "man playing a piano", the video might also contain "another man dancing" or "a crowd clapping". We introduce the task of dense-captioning events, which involves both detecting and
Fei-Fei, Li   +4 more
core   +2 more sources

Semantic guidance network for video captioning. [PDF]

open access: yes, Sci Rep, 2023
Video captioning is a more challenging task that aims to generate abundant natural language descriptions, and it has become a promising direction for artificial intelligence. However, most existing methods are prone to ignore the problems of visual information redundancy and scene information omission due to the limitation of the sampling ...
Guo L, Zhao H, Chen Z, Han Z.
europepmc   +4 more sources

Video Captioning in Compressed Video [PDF]

open access: yes, 2021 6th International Conference on Image, Vision and Computing (ICIVC), 2021
Existing approaches in video captioning concentrate on exploring global frame features in the uncompressed videos, while the free of charge and critical saliency information already encoded in the compressed videos is generally neglected. We propose a video captioning method which operates directly on the stored compressed videos.
Zhu, Mingjian   +2 more
openaire   +3 more sources

Video Captioning Using Global-Local Representation. [PDF]

open access: yes, IEEE Trans Circuits Syst Video Technol, 2022
Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local vision representation for sentence generation, leaving plenty of room for improvement.
Yan L   +6 more
europepmc   +3 more sources

Multi-modal Dense Video Captioning [PDF]

open access: yes, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020
Dense video captioning is a task of localizing interesting events from an untrimmed video and producing textual description (captions) for each localized event. Most of the previous works in dense video captioning are solely based on visual information and completely ignore the audio track.
Rahtu Esa, Iashin Vladimir
openaire   +3 more sources

Meaning Guided Video Captioning [PDF]

open access: yes, 2020
The 5th Asian Conference on Pattern Recognition (ACPR 2019)
Babariya, Rushi J., Tamaki, Toru
openaire   +2 more sources

Step by Step: A Gradual Approach for Dense Video Captioning

open access: yes, IEEE Access, 2023
Dense video captioning aims to localize and describe events for storytelling in untrimmed videos. It is a conceptually very challenging task that requires concise, relevant, and coherent captioning based on high-quality event localization.
Wangyu Choi, Jiasi Chen, Jongwon Yoon
doaj   +1 more source

Empirical autopsy of deep video captioning encoder-decoder architecture

open access: yes, Array, 2021
Contemporary deep-learning-based video captioning methods adopt an encoder-decoder framework. In the encoder, visual features are extracted with 2D/3D Convolutional Neural Networks (CNNs), and a transformed version of those features is passed to the decoder. The
Nayyer Aafaq   +3 more
doaj   +1 more source

Multimodal feature fusion based on object relation for video captioning

open access: yes, CAAI Transactions on Intelligence Technology, 2023
Video captioning aims at automatically generating a natural language caption to describe the content of a video. However, most of the existing methods in the video captioning task ignore the relationship between objects in the video and the correlation ...
Zhiwen Yan   +3 more
doaj   +1 more source
