Results 141 to 150 of about 10,385
Some of the following articles may not be open access.
Learning Comprehensive Visual Grounding for Video Captioning
IEEE Transactions on Circuits and Systems for Video Technology (Print)
The grounding accuracy of existing video captioners still falls short of expectations. Most existing methods perform grounded video captioning on sparse entity annotations.
Wenhui Jiang +5 more
semanticscholar +1 more source
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
arXiv.org
Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally large computational and data resources, which hinders the ...
Lin Xu +5 more
semanticscholar +1 more source
Action-aware Linguistic Skeleton Optimization Network for Non-autoregressive Video Captioning
ACM Trans. Multim. Comput. Commun. Appl.
Non-autoregressive video captioning methods generate visual words in parallel but often overlook semantic correlations among them, especially regarding verbs, leading to lower caption quality.
Shuqin Chen +6 more
semanticscholar +1 more source
arXiv.org
We present CAT-V (Caption AnyThing in Video), a training-free framework for fine-grained object-centric video captioning that enables detailed descriptions of user-selected objects through time. CAT-V integrates three key components: a Segmenter based on ...
Yunlong Tang +18 more
semanticscholar +1 more source
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
International Conference on Learning Representations
Existing video captioning benchmarks and models lack causal-temporal narrative, which is a sequence of events linked through cause and effect, unfolding over time and driven by characters or agents.
Asmar Nadeem +5 more
semanticscholar +1 more source
MoS2: Mixture of Scale and Shift Experts for Text-Only Video Captioning
ACM Multimedia
Video captioning is a challenging task and typically requires paired video-text data for training. However, manually annotating coherent textual descriptions for videos is laborious and time-consuming.
Heng Jia +5 more
semanticscholar +1 more source
EvCap: Element-Aware Video Captioning
IEEE Transactions on Circuits and Systems for Video Technology (Print)
Video captioning is a multi-modal task spanning computer vision and natural language processing. Previous methods generally follow two paradigms, i.e., template-based and sequence-based. Template-based methods can generate relatively accurate elements (e.g. ...
Sheng Liu +4 more
semanticscholar +1 more source
RETTA: Retrieval-enhanced test-time adaptation for zero-shot video captioning
Pattern Recognition
Despite the significant progress of fully-supervised video captioning, zero-shot methods remain much less explored. In this paper, we propose a novel zero-shot video captioning framework named Retrieval-Enhanced Test-Time Adaptation (RETTA), which takes ...
Yunchuan Ma +6 more
semanticscholar +1 more source
Traffic Scenario Understanding and Video Captioning via Guidance Attention Captioning Network
IEEE Transactions on Intelligent Transportation Systems (Print)
Describing a traffic scenario from the driver’s perspective is a challenging process for an Advanced Driving Assistance System (ADAS), involving different sub-tasks of detection, tracking, segmentation, etc.
Chunsheng Liu +6 more
semanticscholar +1 more source
AAAI Conference on Artificial Intelligence
Weakly-Supervised Dense Video Captioning (WSDVC) aims to localize and describe all events of interest in a video without requiring annotations of event boundaries. This setting poses a great challenge in accurately locating the temporal boundaries of events ...
Shiping Ge +6 more
semanticscholar +1 more source