Dense video captioning involves identifying, localizing, and describing multiple events within a video. Capturing temporal and contextual dependencies between events is essential for generating coherent and accurate captions.
Dvijesh Bhatt, Priyank Thakkar
Traditional video captioning calls for a holistic description of the video, yet detailed descriptions of specific objects may not be available. Besides, most methods adopt frame-level features entangled among objects and ambiguous descriptions
Fangyi Zhu +4 more
Video Captioning Based on Channel Soft Attention and Semantic Reconstructor
Video captioning is a popular task which automatically generates a natural-language sentence to describe video content. Previous video captioning works mainly use the encoder–decoder framework and exploit special techniques such as attention mechanisms ...
Zhou Lei, Yiyong Huang
Multi-Task Video Captioning with Video and Entailment Generation
Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a ...
Bansal, Mohit; Pasunuru, Ramakanth
Describing a video automatically with natural language is a challenging task in computer vision. In most cases, news reports cover the on-site situation of major events but neglect the situation of off-site spectators at the entrance and exit, which also arouses people's interest.
Yan, Liqi; Zhu, Mingjian; Yu, Changbin
Quality Enhancement Based Video Captioning in Video Communication Systems
Video captioning is the task of automatically generating natural language to describe visual content. Recently, it has made substantial progress thanks to deep learning techniques.
The Van Le, Jin Young Lee
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
It is widely believed that video captioning is a fundamental but challenging task in both computer vision and artificial intelligence. The prevalent approach is to map an input video to a variable-length output sentence in a sequence-to-sequence ...
Chao, Hongyang +5 more
Action knowledge for video captioning with graph neural networks
Many existing video captioning methods capture action information in the video by exploiting features extracted from an action recognition model. However, directly using the action features without object-specific representation may not well capture the ...
Willy Fitra Hendria +4 more
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation
We present our submission to the Microsoft Video to Language Challenge of generating short captions describing videos in the challenge dataset. Our model is based on the encoder–decoder pipeline, popular in image and video captioning systems. We propose
Laaksonen, Jorma; Shetty, Rakshith
MAT: A Multimodal Attentive Translator for Image Captioning [PDF]
In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation.
Liu, Chang +4 more

