Visual question answering - Open Access .click


Vision-language models for medical report generation and visual question answering: a review [PDF]

Frontiers in Artificial Intelligence
Ghulam Rasool, Rasool Ghulam
exaly +2 more sources

Multi-Module Co-Attention Model for Visual Question Answering [PDF]

Jisuanji gongcheng, 2022
Visual Question Answering (VQA) is a typical multi-modal problem in computer vision and natural language processing. Most of the existing VQA models ignore the dynamic relationships of semantic information between the two modalities and the rich spatial structure ...
ZOU Pinrong, XIAO Feng, ZHANG Wenjuan, ZHANG Wanyu, WANG Chenyang
doaj +1 more source

A Comprehensive Review and Open Challenges on Visual Question Answering Models

AiBi Revista de Investigación, Administración e Ingeniería, 2023
Thanks to recent developments in artificial intelligence, users are now able to actively interact with images and pose different questions about them. In turn, a natural-language answer is expected.
Fasi Ahamad Shaik +4 more
doaj +1 more source

SBVQA 2.0: Robust End-to-End Speech-Based Visual Question Answering for Open-Ended Questions

IEEE Access, 2023
Speech-based Visual Question Answering (SBVQA) is a challenging task that aims to answer spoken questions about images. The challenges of this task involve the variability of speakers, the different recording environments, as well as the various objects ...
Faris Alasmary, Saad Al-Ahmadi
doaj +1 more source

Question-Agnostic Attention for Visual Question Answering [PDF]

2020 25th International Conference on Pattern Recognition (ICPR), 2021
Visual Question Answering (VQA) models employ attention mechanisms to discover image locations that are most relevant for answering a specific question. For this purpose, several multimodal fusion strategies have been proposed, ranging from relatively simple operations (e.g., linear sum) to more complex ones (e.g., Block).
Moshiur R. Farazi, Salman H. Khan, Nick Barnes +2 more
openaire +2 more sources

VQA: Visual Question Answering [PDF]

2015 IEEE International Conference on Computer Vision (ICCV), 2015
The first three authors contributed equally.
Stanislaw Antol +6 more
openaire +3 more sources

Knowledge-based Visual Question Answering: A Survey [PDF]

Jisuanji kexue, 2023
As an important presentation form of the completeness of artificial intelligence and the visual Turing test, visual question answering (VQA), coupled with its potential application value, has received extensive attention from computer vision and natural ...
WANG Ruiping, WU Shihong, ZHANG Meihang, WANG Xiaoping
doaj +1 more source

TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering

Advanced Intelligent Systems, 2023
Video question answering (VideoQA) is a typical task that integrates language and vision. The key for VideoQA is to extract relevant and effective visual information for answering a specific question. Information selection is believed to be necessary for
Tian Wang +5 more
doaj +1 more source

Generative Visual Question Answering

CoRR, 2023
Multi-modal tasks involving vision and language in deep learning continue to rise in popularity and are leading to the development of newer models that can generalize beyond the extent of their training data. The current models lack temporal generalization, which would enable them to adapt to changes in future data.
Ethan Shen, Scotty Singh, Bhavesh Kumar
openaire +2 more sources

Counterfactual Mix-Up for Visual Question Answering

IEEE Access, 2023
Counterfactuals have been shown to be a powerful method for alleviating the unimodal bias of Visual Question Answering. However, existing counterfactual methods tend to generate samples that are not diverse or require
Jae Won Cho +3 more
doaj +1 more source
