Multi-modal adaptive gated mechanism for visual question answering.
Visual Question Answering (VQA) is a multimodal task that uses natural language to ask and answer questions based on image content. For multimodal tasks, obtaining accurate modality feature information is crucial.
Yangshuyi Xu, Lin Zhang, Xiang Shen
DOAJ
COIN: Counterfactual Image Generation for Visual Question Answering Interpretation
Due to the significant advancement of Natural Language Processing and Computer Vision-based models, Visual Question Answering (VQA) systems are becoming more intelligent and advanced.
Zeyd Boukhers +2 more
DOAJ
Adversarial Learning with Bidirectional Attention for Visual Question Answering
In this paper, we provide external image features and use the internal attention mechanism to solve the VQA problem given a dataset of textual questions and related images. Most previous models for VQA use a pair of images and questions as input.
Qifeng Li, Xinyi Tang, Yi Jian
DOAJ
Deep Modular Bilinear Attention Network for Visual Question Answering
VQA (Visual Question Answering) is a multimodal task: given an image and a question about it, the model must determine the correct answer. The attention mechanism has become a de facto component of almost all VQA models. Most recent VQA approaches ...
Feng Yan, Wushouer Silamu, Yanbing Li
DOAJ
Informed-Learning-Guided Visual Question Answering Model of Crop Disease
In contemporary agriculture, experts develop preventative and remedial strategies for various disease stages in diverse crops. Decision-making regarding the stages of disease occurrence exceeds the capabilities of single-image tasks, such as image ...
Yunpeng Zhao +6 more
DOAJ
A visual question answering method based on task decomposition.
Visual question answering (VQA), an interdisciplinary task spanning computer vision and natural language processing, evaluates a model's visual reasoning ability and requires the integration of image information extraction technology and natural ...
Yao Cong, Hongwei Mo
DOAJ
Multi-View Visual Question Answering with Active Viewpoint Selection
This paper proposes a framework that allows the observation of a scene iteratively to answer a given question about the scene. Conventional visual question answering (VQA) methods are designed to answer given questions based on single-view images ...
Yue Qiu +4 more
DOAJ
BPI-MVQA: a bi-branch model for medical visual question answering
Background: Visual question answering in the medical domain (VQA-Med) shows great potential for increasing confidence in disease diagnosis and helping patients better understand their medical conditions.
Shengyan Liu +3 more
DOAJ
Multi-Modal Explicit Sparse Attention Networks for Visual Question Answering
Visual question answering (VQA) is a multi-modal task involving natural language processing (NLP) and computer vision (CV), which requires models to understand both visual and textual information simultaneously to predict the correct ...
Zihan Guo, Dezhi Han
DOAJ
Review of Visual Question Answering Technology
Visual question answering (VQA) is a popular cross-modal task that combines natural language processing and computer vision techniques. The main objective of this task is to enable computers to intelligently recognize and retrieve visual content and ...
WANG Yu, SUN Haichun
DOAJ

