MedVH: Toward Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context
MedVH introduces the first comprehensive benchmark for diagnosing hallucinations in medical vision‐language models. Across six multitask evaluations, eight state‐of‐the‐art LVLMs reveal that domain‐tuned models, while strong on routine questions, hallucinate more than general models, raising serious concerns for real‐world clinical use.
Gu Z, Chen J, Liu F, Yin C, Zhang P.
europepmc +2 more sources
Advancing Surgical VQA with Scene Graph Knowledge
Purpose: The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer ...
Srivastav, Vinkle +5 more
core +6 more sources
Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool
In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI is that both the image and the textual question need to be well understood and mutually grounded in order to infer the correct answer.
Feng Liu +2 more
exaly +2 more sources
Empirical evaluation of no-reference VQA methods on a natural video quality database
No-Reference (NR) Video Quality Assessment (VQA) is a challenging task since it predicts the visual quality of a video sequence without comparison to some original reference video. Several NR-VQA methods have been proposed.
Hui Men, Hanhe Lin, Dietmar Saupe
exaly +2 more sources
EaSe: A Diagnostic Tool for VQA Based on Answer Diversity
We propose EASE, a simple diagnostic tool for Visual Question Answering (VQA) which quantifies the difficulty of an (image, question) sample. EASE is based on the pattern of answers provided by multiple annotators to a given question.
Nabi, M. +5 more
core +2 more sources
Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis
Medical Visual Question Answering (Med-VQA) is a challenging research topic with advantages including patient engagement and clinical expert involvement for second opinions.
Lin Fan, , Yafei Ou
exaly +2 more sources
Visual Question Answering (VQA) is a multimodal task that uses natural language to ask and answer questions based on image content. For multimodal tasks, obtaining accurate modality feature information is crucial.
Yangshuyi Xu +2 more
core +1 more source
RankDVQA: Deep VQA based on Ranking-inspired Hybrid Training
In recent years, deep learning techniques have shown significant potential for improving video quality assessment (VQA), achieving higher correlation with subjective opinions compared to conventional approaches.
Bull, David +5 more
core +1 more source
Attacking VQA systems via adversarial background noise
Adversarial examples have been successfully generated for various image classification models. Recently, several methods have been proposed to generate adversarial examples for more sophisticated tasks such as image captioning and visual question ...
Chaturvedi, Akshay, Garain, Utpal
core +1 more source
A Self-supervised Strategy for the Robustness of VQA Models
In visual question answering (VQA), most existing models suffer from language biases that make them not robust.
Jing, Chenchen +3 more
core +1 more source

