Video captioning with stacked attention and semantic hard pull [PDF]
Video captioning, i.e., the task of generating captions from video sequences, creates a bridge between the Natural Language Processing and Computer Vision domains of computer science.
Md. Mushfiqur Rahman +4 more
doaj +3 more sources
Video captioning based on vision transformer and reinforcement learning [PDF]
Global encoding of visual features in video captioning is important for improving the description accuracy. In this paper, we propose a video captioning method that combines Vision Transformer (ViT) and reinforcement learning.
Hong Zhao +3 more
doaj +3 more sources
An attention-based hybrid deep learning approach for Bengali video captioning
Video captioning is an automated process of captioning a video by understanding the content within it. Although numerous studies have been performed on video captioning in English, the field of video captioning in Bengali remains nearly unexplored ...
Md. Shahir Zaoad +5 more
exaly +3 more sources
A Semantics-Assisted Video Captioning Model Trained With Scheduled Sampling [PDF]
Given the features of a video, recurrent neural networks can be used to automatically generate a caption for the video. Existing methods for video captioning have at least three limitations.
Haoran Chen +4 more
doaj +2 more sources
UAT: Universal Attention Transformer for Video Captioning [PDF]
Video captioning via encoder–decoder structures is a successful sentence generation method. In addition, using various feature extraction networks for extracting multiple features to obtain multiple kinds of visual features in the encoding process is a ...
Heeju Im, Yong-Suk Choi
doaj +2 more sources
Cross-Modal Transformer-Based Streaming Dense Video Captioning with Neural ODE Temporal Localization [PDF]
Dense video captioning is a critical task in video understanding, requiring precise temporal localization of events and the generation of detailed, contextually rich descriptions.
Shakhnoza Muksimova +3 more
doaj +2 more sources
Evaluation of automatic video captioning using direct assessment. [PDF]
We present Direct Assessment, a method for manually assessing the quality of automatically-generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for any given video clip there is no definitive ground ...
Graham Y, Awad G, Smeaton A.
europepmc +2 more sources
Sparse Adversarial Examples Attacking on Video Captioning Model [PDF]
Despite the fact that multi-modal deep learning, such as image captioning models, has been proved vulnerable to adversarial examples, the adversarial susceptibility in video caption generation is under-examined. There are two main reasons for this. On ...
QIU Jiangxing, TANG Xueming, WANG Tianmei, WANG Chen, CUI Yongquan, LUO Ting
doaj +1 more source
PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning With Pretraining Approach
In recent times, there has been a notable increase in efforts to simultaneously comprehend vision and language, driven by the availability of video-related datasets and advancements in language models within the domain of natural language processing ...
Wangyu Choi, Jiasi Chen, Jongwon Yoon
doaj +1 more source
Exploring deep learning approaches for video captioning: A comprehensive review
While humans can easily describe visual data at varying levels of detail, the same task presents a significant challenge for machines. This challenge becomes even more complex when dealing with video data.
Adel Jalal Yousif, Mohammed H. Al-Jammas
doaj +1 more source