Measuring Faithful and Plausible Visual Grounding in VQA
Metrics for Visual Grounding (VG) in Visual Question Answering (VQA) systems primarily aim to measure a system's reliance on relevant parts of the image when inferring an answer to the given question.
Reich, Daniel +2 more
core +1 more source
Continual VQA for Disaster Response Systems
Accepted at Tackling Climate Change with Machine Learning workshop at NeurIPS ...
Aditya Kane, V. Manushree, Sahil Khose
openaire +2 more sources
An Efficient Modern Baseline for FloodNet VQA
Designing efficient and reliable VQA systems remains a challenging problem, more so in the case of disaster management and response systems. In this work, we revisit fundamental combination methods like concatenation, addition and element-wise multiplication with modern image and text feature abstraction models.
Aditya Kane, Sahil Khose
openaire +2 more sources
Exploring Question Decomposition for Zero-Shot VQA
Visual question answering (VQA) has traditionally been treated as a single-step task where each question receives the same amount of effort, unlike natural human question-answering strategies.
Khan, Zaid +4 more
core
Story2Board: A Training‐Free Approach for Expressive Visual Storytelling
We present Story2Board, a training‐free framework for expressive storyboard generation from natural language. Existing methods narrowly focus on subject identity, overlooking key aspects of visual storytelling such as spatial composition, background evolution, and narrative pacing.
D. Dinkevich +4 more
wiley +1 more source
C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset
Visual Question Answering (VQA) has received a lot of attention over the past couple of years. A number of deep learning models have been proposed for this task. However, it has been shown that these models are heavily driven by superficial correlations in the training data and lack compositionality -- the ability to answer questions about unseen ...
Aishwarya Agrawal +3 more
openaire +2 more sources
Unsupervised Keyword Extraction for Full-Sentence VQA
EMNLP 2020 workshop: NLP Beyond Text (NLPBT)
Kohei Uehara, Tatsuya Harada
openaire +2 more sources
Crayons × Code: Re‐Exploring Children's Drawings With Multimodal AI and Dynamic Ethics
Children's drawings are used to study their environmental perception, but AI‐powered analysis of these visual materials remains underexplored. This paper re‐explores the use of children's drawings as a tool for understanding their perceptions of the environment, leveraging multimodal AI for analysis.
Chen Qu
wiley +1 more source
Multiple-Question Multiple-Answer Text-VQA
We present Multiple-Question Multiple-Answer (MQMA), a novel approach to text-VQA in encoder-decoder transformer models. The text-VQA task requires a model to answer a question by understanding multi-modal content: text (typically from OCR) and an ...
Tang, Peng +4 more
core
Purpose: This study aimed to quantitatively evaluate the efficacy of plan adaptation in stereotactic body proton therapy (SBPT) for pancreatic cancer using daily CT‐based dose evaluation and to investigate the appropriate adaptation frequency. Methods: This retrospective planning study included 10 patients previously treated with X‐ray stereotactic ...
Yuto Matsuo +8 more
wiley +1 more source

