
VMAF and variants: towards a unified VQA [PDF]

open access: yes | Applications of Digital Image Processing XLIV, 2021
Some calculational errors have been fixed in this ...
Topiwala, Pankaj, and 4 more

How Transferable are Reasoning Patterns in VQA? [PDF]

open access: yes | 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Since its inception, Visual Question Answering (VQA) has been notorious as a task in which models are prone to exploiting dataset biases to find shortcuts instead of performing high-level reasoning. Classical methods address this by removing biases from the training data or by adding branches to models that detect and remove biases.
Kervadec, Corentin, and 5 more

Towards VQA Models That Can Read [PDF]

open access: yes | 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Studies have shown that a dominant class of questions asked by visually impaired users about images of their surroundings involves reading text in the image. But today's VQA models cannot read! Our paper takes a first step towards addressing this problem.
Amanpreet Singh and 7 more

DocVQA: A Dataset for VQA on Document Images [PDF]

open access: yes | 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021
We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. A detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension is presented.
Minesh Mathew and 2 more

Making the V in Text-VQA Matter

open access: yes | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023
Text-based VQA aims to answer questions by reading the text present in images. Compared with the general VQA task, it requires a much deeper understanding of the relationships between scene text and the rest of the image. Recent studies have shown that the question-answer pairs in the dataset focus more on the text present in the image and give less importance to visual features and ...
Shamanthak Hegde and 2 more

Supervising the Transfer of Reasoning Patterns in VQA

open access: yes | CoRR, 2021
Methods for Visual Question Answering (VQA) are notorious for leveraging dataset biases rather than performing reasoning, which hinders generalization. It has recently been shown that better reasoning patterns emerge in the attention layers of a state-of-the-art VQA model when it is trained on perfect (oracle) visual inputs.
Kervadec, Corentin, and 4 more

Distraction-free Embeddings for Robust VQA

open access: yes | CoRR, 2023
The generation of effective latent representations and their subsequent refinement to incorporate precise information is an essential prerequisite for Vision-Language Understanding (VLU) tasks such as Video Question Answering (VQA). However, most existing methods for VLU focus on sparsely sampling or fine-graining the input information (e.g., sampling ...
Atharvan Dogra and 4 more

GS-VQA: Zero-shot neural-symbolic visual question answering with vision-language models [PDF]

open access: yes | 2023
Visual Question Answering (VQA) confronts machine learning (ML) systems with the task of answering a natural-language question posed about an image.
Hadl, Jan

VQA-Levels: A Hierarchical Approach for Classifying Questions in VQA

open access: yes | CoRR
Designing datasets for Visual Question Answering (VQA) is a difficult and complex task that requires NLP for parsing the question and computer vision for analysing the aspects of the image relevant to answering it. Several benchmark datasets have been developed by researchers, but there are many issues with using them for methodical performance ...
Madhuri Latha Madaka and 1 more

Towards Reasoning-Aware Explainable VQA

open access: yes | CoRR, 2022
The domain of joint vision-language understanding, especially reasoning in Visual Question Answering (VQA) models, has attracted significant attention in recent years. While most existing VQA models focus on improving answer accuracy, the way they arrive at an answer is often a black box.
Rakesh Vaideeswaran and 3 more
