Video captioning with stacked attention and semantic hard pull [PDF]
Video captioning, i.e., the task of generating captions from video sequences, creates a bridge between the Natural Language Processing and Computer Vision domains of computer science.
Md. Mushfiqur Rahman +4 more
doaj +3 more sources
Video captioning based on vision transformer and reinforcement learning [PDF]
Global encoding of visual features in video captioning is important for improving the description accuracy. In this paper, we propose a video captioning method that combines Vision Transformer (ViT) and reinforcement learning.
Hong Zhao +3 more
doaj +3 more sources
An attention-based hybrid deep learning approach for Bengali video captioning
Video captioning is an automated process of captioning a video by understanding the content within it. Although numerous studies have been performed on video captioning in English, the field of video captioning in Bengali remains nearly unexplored ...
Md. Shahir Zaoad +5 more
exaly +3 more sources
A Semantics-Assisted Video Captioning Model Trained With Scheduled Sampling [PDF]
Given the features of a video, recurrent neural networks can be used to automatically generate a caption for the video. Existing methods for video captioning have at least three limitations.
Haoran Chen +4 more
doaj +2 more sources
UAT: Universal Attention Transformer for Video Captioning [PDF]
Video captioning via encoder–decoder structures is a successful sentence generation method. In addition, using various feature extraction networks for extracting multiple features to obtain multiple kinds of visual features in the encoding process is a ...
Heeju Im, Yong-Suk Choi
doaj +2 more sources
Cross-Modal Transformer-Based Streaming Dense Video Captioning with Neural ODE Temporal Localization [PDF]
Dense video captioning is a critical task in video understanding, requiring precise temporal localization of events and the generation of detailed, contextually rich descriptions.
Shakhnoza Muksimova +3 more
doaj +2 more sources
Evaluation of automatic video captioning using direct assessment. [PDF]
We present Direct Assessment, a method for manually assessing the quality of automatically-generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for any given video clip there is no definitive ground ...
Graham Y, Awad G, Smeaton A.
europepmc +2 more sources
Sparse Adversarial Examples Attacking on Video Captioning Model [PDF]
Despite the fact that multi-modal deep learning, such as image captioning models, has been proved vulnerable to adversarial examples, the adversarial susceptibility in video caption generation is under-examined. There are two main reasons for this. On ...
QIU Jiangxing, TANG Xueming, WANG Tianmei, WANG Chen, CUI Yongquan, LUO Ting
doaj +1 more source
PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning With Pretraining Approach
In recent times, there has been a notable increase in efforts to simultaneously comprehend vision and language, driven by the availability of video-related datasets and advancements in language models within the domain of natural language processing ...
Wangyu Choi, Jiasi Chen, Jongwon Yoon
doaj +1 more source
Exploring deep learning approaches for video captioning: A comprehensive review
While humans can easily describe visual data at varying levels of detail, the same task presents a significant challenge for machines. This challenge becomes even more complex when dealing with video data.
Adel Jalal Yousif, Mohammed H. Al-Jammas
doaj +1 more source