Enabling Multimodal Understanding: Lidar Data Meets VQA

open access: yes
This chapter explores the integration of Light Detection and Ranging (LiDAR) data with multimodal systems such as Visual Question Answering (VQA) to enable robust contextual understanding.
Dhananjay Thiruvady   +3 more
core  

BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining

open access: yes
The current research direction in generative models, such as the recently developed GPT4, aims to find relevant knowledge information for multimodal and multilingual inputs to provide answers.
Lim, KyungTae   +4 more
core   +1 more source

Overview of the VQA-Med Task at ImageCLEF 2021: Visual Question Answering and Generation in the Medical Domain

open access: yes, 2021
This paper presents an overview of the fourth edition of the Medical Visual Question Answering (VQA-Med) task at ImageCLEF 2021. VQA-Med 2021 includes a task on Visual Question Answering (VQA), where participants are tasked with answering questions from ...
Ben Abacha, Asma   +4 more
core  

C3-VQA: Cryogenic Counter-Based Coprocessor for Variational Quantum Algorithms

open access: yes
Cryogenic quantum computers play a leading role in demonstrating quantum advantage. Given the severe constraints on the cooling capacity in cryogenic environments, thermal design is crucial for the scalability of these computers.
Satoshi Imamura   +7 more
core   +1 more source

Evaluating VQA Models' Consistency in the Scientific Domain

open access: yes
Visual Question Answering (VQA) in the scientific domain is a challenging task that requires a high-level understanding of the given image to answer a given question. Although having impressive results on the ScienceQA dataset, both
Guinaudeau, Camille   +2 more
core   +1 more source

VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving

open access: yes
Generating 3D vehicle assets from in-the-wild observations is crucial to autonomous driving. Existing image-to-3D methods cannot well address this problem because they learn generation merely from image RGB information without a deeper understanding of ...
Shan, Jinjun   +7 more
core  

MISS: A Generative Pretraining and Finetuning Approach for Med-VQA

open access: yes
Medical visual question answering (VQA) is a challenging multimodal task, where Vision-Language Pre-training (VLP) models can effectively improve the generalization performance.
Jiang, Yue   +4 more
core  

RAM-VQA: Restoration Assisted Multi-Modality Video Quality Assessment

open access: yes
Video Quality Assessment (VQA) strives to computationally emulate human perceptual judgments and has garnered significant attention given its widespread applicability. However, existing methodologies face two primary impediments: (1)
Li, Leida   +5 more
core   +1 more source
