Measuring Faithful and Plausible Visual Grounding in VQA

open access: yes, 2023
Metrics for Visual Grounding (VG) in Visual Question Answering (VQA) systems primarily aim to measure a system's reliance on relevant parts of the image when inferring an answer to the given question.
Reich, Daniel   +2 more
core   +1 more source

Continual VQA for Disaster Response Systems

open access: yes, CoRR, 2022
Accepted at Tackling Climate Change with Machine Learning workshop at NeurIPS ...
Aditya Kane, V. Manushree, Sahil Khose
openaire   +2 more sources

An Efficient Modern Baseline for FloodNet VQA

open access: yes, CoRR, 2022
Designing efficient and reliable VQA systems remains a challenging problem, more so in the case of disaster management and response systems. In this work, we revisit fundamental combination methods like concatenation, addition and element-wise multiplication with modern image and text feature abstraction models.
Aditya Kane, Sahil Khose
openaire   +2 more sources

Exploring Question Decomposition for Zero-Shot VQA

open access: yes, 2023
Visual question answering (VQA) has traditionally been treated as a single-step task where each question receives the same amount of effort, unlike natural human question-answering strategies.
Khan, Zaid   +4 more
core  

Story2Board: A Training‐Free Approach for Expressive Visual Storytelling

open access: yes, Computer Graphics Forum, EarlyView.
We present Story2Board, a training‐free framework for expressive storyboard generation from natural language. Existing methods narrowly focus on subject identity, overlooking key aspects of visual storytelling such as spatial composition, background evolution, and narrative pacing.
D. Dinkevich   +4 more
wiley   +1 more source

C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset

open access: yes, CoRR, 2017
Visual Question Answering (VQA) has received a lot of attention over the past couple of years. A number of deep learning models have been proposed for this task. However, it has been shown that these models are heavily driven by superficial correlations in the training data and lack compositionality -- the ability to answer questions about unseen ...
Aishwarya Agrawal   +3 more
openaire   +2 more sources

Unsupervised Keyword Extraction for Full-Sentence VQA

open access: yes, Proceedings of the First International Workshop on Natural Language Processing Beyond Text, 2020
EMNLP 2020 workshop: NLP Beyond Text (NLPBT)
Kohei Uehara, Tatsuya Harada
openaire   +2 more sources

Crayons × Code: Re‐Exploring Children's Drawings With Multimodal AI and Dynamic Ethics

open access: yes, Area, Volume 58, Issue 2, June 2026.
Children's drawings are used to study their environmental perception, but AI‐powered analysis of these visual materials remains underexplored. This paper re‐explores the use of children's drawings as a tool for understanding their perceptions of the environment, leveraging multimodal AI for analysis.
Chen Qu
wiley   +1 more source

Multiple-Question Multiple-Answer Text-VQA

open access: yes, 2023
We present Multiple-Question Multiple-Answer (MQMA), a novel approach to do text-VQA in encoder-decoder transformer models. The text-VQA task requires a model to answer a question by understanding multi-modal content: text (typically from OCR) and an ...
Tang, Peng   +4 more
core  

Evaluation of the efficacy of plan adaptation in stereotactic body proton therapy for pancreatic cancer

open access: yes, Journal of Applied Clinical Medical Physics, Volume 27, Issue 5, May 2026.
Purpose: This study aimed to quantitatively evaluate the efficacy of plan adaptation in stereotactic body proton therapy (SBPT) for pancreatic cancer using daily CT‐based dose evaluation and to investigate the appropriate adaptation frequency. Methods: This retrospective planning study included 10 patients previously treated with X‐ray stereotactic ...
Yuto Matsuo   +8 more
wiley   +1 more source
