
On the Role of Visual Grounding in VQA

open access: yes, CoRR
Visual Grounding (VG) in VQA refers to a model's proclivity to infer answers based on question-relevant image regions. Conceptually, VG identifies as an axiomatic requirement of the VQA task.
Reich, Daniel, Schultz, Tanja
core   +2 more sources

SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions [PDF]

open access: yes, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
Existing VQA datasets contain questions with varying levels of complexity. While the majority of questions in these datasets require perception for recognizing existence, properties, and spatial relationships of entities, a significant portion of questions pose challenges that correspond to reasoning tasks - tasks that can only be answered through a ...
Ramprasaath R. Selvaraju   +6 more
openaire   +4 more sources

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models [PDF]

open access: yes, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021
To appear in ICCV 2021; Website: https://adversarialvqa.github.io/
Linjie Li   +3 more
openaire   +2 more sources

January 2020 VQA Sale

open access: yes, 2020
Sale results of VQA Calf sales. Published ...
Overbay, Andrew
core   +2 more sources

VQA With No Questions-Answers Training [PDF]

open access: yes, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
Methods for teaching machines to answer visual questions have made significant progress in recent years, but current methods still lack important human capabilities, including integrating new visual classes and concepts in a modular manner, providing explanations for the answers and handling new domains without explicit examples.
Ben Zion Vatashsky, Shimon Ullman
openaire   +2 more sources

An Experimental Study of the Vision-Bottleneck in VQA [PDF]

open access: yes, SSRN Electronic Journal, 2022
As in many tasks combining vision and language, both modalities play a crucial role in Visual Question Answering (VQA). To properly solve the task, a given model should both understand the content of the proposed image and the nature of the question. While the fusion between modalities, which is another obviously important part of the problem, has been ...
Pierre Marza   +4 more
openaire   +2 more sources

MUST-VQA: MUltilingual Scene-Text VQA

open access: yes, 2023
In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA) in which the question can be asked in different languages and it is not necessarily aligned to the scene text language. Thus, ...
Emanuele Vivoli   +4 more
openaire   +2 more sources

Object-Based Reasoning in VQA [PDF]

open access: yes, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018
10 pages, 15 figures, published as a conference paper at 2018 IEEE Winter Conf. on Applications of Computer Vision (WACV'2018)
Mikyas T. Desta   +2 more
openaire   +2 more sources

VQA: Visual Question Answering [PDF]

open access: yes, 2015 IEEE International Conference on Computer Vision (ICCV), 2015
The first three authors contributed equally.
Stanislaw Antol   +6 more
openaire   +3 more sources

How (not) to ensemble LVLMs for VQA

open access: yes, CoRR, 2023
This paper studies ensembling in the era of Large Vision-Language Models (LVLMs). Ensembling is a classical method to combine different models to get increased performance. In the recent work on Encyclopedic-VQA the authors examine a wide variety of models to solve their task: from vanilla LVLMs, to models including the caption as extra context, to ...
Lisa Alazraki   +5 more
openaire   +3 more sources
