VMAF and variants: towards a unified VQA [PDF]
Some calculational errors have been fixed in this ...
Topiwala, Pankaj +4 more
openaire +2 more sources
How Transferable are Reasoning Patterns in VQA? [PDF]
Since its inception, Visual Question Answering (VQA) has been notorious as a task in which models are prone to exploit biases in datasets to find shortcuts instead of performing high-level reasoning. Classical methods address this by removing biases from training data or by adding branches to models to detect and remove biases.
Kervadec, Corentin +5 more
openaire +2 more sources
Towards VQA Models That Can Read [PDF]
Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today's VQA models cannot read! Our paper takes a first step towards addressing this problem.
Amanpreet Singh +7 more
openaire +2 more sources
DocVQA: A Dataset for VQA on Document Images [PDF]
We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. Detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension is presented.
Minesh Mathew +2 more
openaire +2 more sources
Making the V in Text-VQA Matter
Text-based VQA aims at answering questions by reading the text present in the images. It requires a large amount of scene-text relationship understanding compared to the VQA task. Recent studies have shown that the question-answer pairs in the dataset are more focused on the text present in the image but less importance is given to visual features and ...
Shamanthak Hegde +2 more
openaire +2 more sources
Supervising the Transfer of Reasoning Patterns in VQA
Methods for Visual Question Answering (VQA) are notorious for leveraging dataset biases rather than performing reasoning, hindering generalization. It has been recently shown that better reasoning patterns emerge in attention layers of a state-of-the-art VQA model when they are trained on perfect (oracle) visual inputs.
Kervadec, Corentin +4 more
openaire +4 more sources
Distraction-free Embeddings for Robust VQA
The generation of effective latent representations and their subsequent refinement to incorporate precise information is an essential prerequisite for Vision-Language Understanding (VLU) tasks such as Video Question Answering (VQA). However, most existing methods for VLU focus on sparsely sampling or fine-graining the input information (e.g., sampling ...
Atharvan Dogra +4 more
openaire +2 more sources
GS-VQA: Zero-shot neural-symbolic visual question answering with vision-language models [PDF]
Visual Question Answering (VQA) presents machine learning (ML) systems with the task of answering, in natural language, a question posed about an image.
Hadl, Jan
core +1 more source
VQA-Levels: A Hierarchical Approach for Classifying Questions in VQA
Designing datasets for Visual Question Answering (VQA) is a difficult and complex task that requires NLP for parsing and computer vision for analysing the relevant aspects of the image for answering the question asked. Several benchmark datasets have been developed by researchers but there are many issues with using them for methodical performance ...
Madhuri Latha Madaka +1 more
openaire +2 more sources
Towards Reasoning-Aware Explainable VQA
The domain of joint vision-language understanding, especially in the context of reasoning in Visual Question Answering (VQA) models, has garnered significant attention in the recent past. While most of the existing VQA models focus on improving the accuracy of VQA, the way models arrive at an answer is oftentimes a black box.
Rakesh Vaideeswaran +3 more
openaire +2 more sources

