Improving Automatic VQA Evaluation Using Large Language Models
Eight years after the visual question answering (VQA) task was proposed, accuracy remains the primary metric for automatic evaluation. VQA Accuracy has been effective so far in the IID evaluation setting.
Agrawal, Aishwarya +2 more
core +1 more source
Question Modifiers in VQA: Evaluating Model Sensitivity
Visual Question Answering (VQA) is a challenge problem that can advance AI by integrating several important sub-disciplines including natural language understanding and computer vision.
Britton, William Johnstone
core
Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning
In this paper, we uncover the issue of knowledge inertia in visual question answering (VQA), which exists in most VQA models and forces them to rely mainly on the question content to “guess” the answer, without regard to the visual information.
Sun, Xiaoshuai +4 more
core +1 more source
Semi-Supervised Implicit Augmentation for Data-Scarce VQA
Vision-language models (VLMs) have demonstrated increasing potency in solving complex vision-language tasks in the recent past. Visual question answering (VQA) is one of the primary downstream tasks for assessing the capability of VLMs, as it helps in ...
Kartik Hegde +2 more
core +1 more source
Overcoming Language Priors in VQA via Decomposed Linguistic Representations
Most existing Visual Question Answering (VQA) models overly rely on language priors between questions and answers. In this paper, we present a novel method of language attention-based VQA that learns decomposed linguistic representations of questions and
Wu, Qi +9 more
core +1 more source
Subjective Scoring Framework for VQA Models in Autonomous Driving
The development of vision and language transformer models has paved the way for Visual Question Answering (VQA) models and related research. There are metrics to assess the general accuracy of VQA models but subjective assessment of the answers generated
Abbirah Ahmed +5 more
core +1 more source
VQA-LOL: Visual Question Answering Under the Lens of Logic
16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI. Logical connectives and their implications on the meaning of a natural language sentence are a fundamental aspect of understanding. In this paper, we investigate whether visual
Yang, Yezhou +7 more
core +1 more source
Estimating semantic structure for the VQA answer space
Since its appearance, Visual Question Answering (VQA, i.e. answering a question posed over an image), has always been treated as a classification problem over a set of predefined answers.
Baccouche, Moez +3 more
core
KnowIT VQA: Answering Knowledge-Based Questions about Videos
We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom.
Nakashima, Yuta +3 more
core +1 more source
What Large Language Models Bring to Text-rich VQA?
Text-rich VQA, namely Visual Question Answering based on text recognition in the images, is a cross-modal task that requires both image comprehension and text recognition.
Lu, Jinghui +6 more
core

