Multi-Module Co-Attention Model for Visual Question Answering [PDF]

open access: yes | Jisuanji gongcheng, 2022
Visual Question Answering (VQA) is a typical multi-modal problem in computer vision and natural language processing. Most existing VQA models ignore the dynamic relationships of semantic information between the two modalities and the rich spatial structure ...
ZOU Pinrong, XIAO Feng, ZHANG Wenjuan, ZHANG Wanyu, WANG Chenyang
doaj   +1 more source

A Comprehensive Review and Open Challenges on Visual Question Answering Models

open access: yes | AiBi Revista de Investigación, Administración e Ingeniería, 2023
Thanks to recent developments in artificial intelligence, users can now actively interact with images and pose questions about them. In turn, a response in natural language is expected.
Fasi Ahamad Shaik   +4 more
doaj   +1 more source

SBVQA 2.0: Robust End-to-End Speech-Based Visual Question Answering for Open-Ended Questions

open access: yes | IEEE Access, 2023
Speech-based Visual Question Answering (SBVQA) is a challenging task that aims to answer spoken questions about images. The challenges of this task involve the variability of speakers, the different recording environments, as well as the various objects ...
Faris Alasmary, Saad Al-Ahmadi
doaj   +1 more source

Question-Agnostic Attention for Visual Question Answering [PDF]

open access: yes | 2020 25th International Conference on Pattern Recognition (ICPR), 2021
Visual Question Answering (VQA) models employ attention mechanisms to discover image locations that are most relevant for answering a specific question. For this purpose, several multimodal fusion strategies have been proposed, ranging from relatively simple operations (e.g., linear sum) to more complex ones (e.g., Block).
Moshiur R. Farazi   +2 more
openaire   +2 more sources

VQA: Visual Question Answering [PDF]

open access: yes | 2015 IEEE International Conference on Computer Vision (ICCV), 2015
The first three authors contributed equally.
Stanislaw Antol   +6 more
openaire   +3 more sources

Knowledge-based Visual Question Answering: A Survey [PDF]

open access: yes | Jisuanji kexue, 2023
As an important presentation form of the completeness of artificial intelligence and the visual Turing test, visual question answering (VQA), coupled with its potential application value, has received extensive attention from computer vision and natural ...
WANG Ruiping, WU Shihong, ZHANG Meihang, WANG Xiaoping
doaj   +1 more source

TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering

open access: yes | Advanced Intelligent Systems, 2023
Video question answering (VideoQA) is a typical task that integrates language and vision. The key to VideoQA is extracting relevant and effective visual information for answering a specific question. Information selection is believed to be necessary for
Tian Wang   +5 more
doaj   +1 more source

Generative Visual Question Answering

open access: yes | CoRR, 2023
Multi-modal tasks involving vision and language in deep learning continue to rise in popularity and are leading to the development of newer models that can generalize beyond the extent of their training data. Current models lack temporal generalization, which would enable them to adapt to changes in future data.
Ethan Shen, Scotty Singh, Bhavesh Kumar
openaire   +2 more sources

Counterfactual Mix-Up for Visual Question Answering

open access: yes | IEEE Access, 2023
Counterfactuals have been shown to be a powerful method for alleviating unimodal bias in Visual Question Answering. However, existing counterfactual methods tend to generate samples that are not diverse or require
Jae Won Cho   +3 more
doaj   +1 more source
