Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
Training gaze following models requires a large number of images with gaze target coordinates annotated by human annotators, which is a laborious and inherently ambiguous process.
Graikos, Alexandros +5 more
core
Empowering liver cancer diagnosis and treatment with foundation models: technological innovation and clinical practice.
Wang J +8 more
europepmc
Uncovering the Full Potential of Visual Grounding Methods in VQA
Visual Grounding (VG) methods in Visual Question Answering (VQA) attempt to improve VQA performance by strengthening a model's reliance on question-relevant visual information.
Reich, Daniel, Schultz, Tanja
core
A Benchmark for Breast Cancer Screening and Diagnosis in Mammogram Visual Question Answering.
Zhu J, Huang F, Luo Q, Chen H.
europepmc
AniDriveQA: a VQA dataset for driving scenes with animal presence.
Wang R, Wang R, Hu H, Yu H.
europepmc
Foundation Models Meet Medical Image Interpretation.
Jiao L +11 more
europepmc
Scaling up biomedical vision-language models: Fine-tuning, instruction tuning, and multi-modal learning.
Peng C +5 more
europepmc
Multimodal Large Language Models in Medical Imaging: Current State and Future Directions.
Nam Y +14 more
europepmc
AI-Driven Digital Pathology: Deep Learning and Multimodal Integration for Precision Oncology.
Jang HJ, Lee SH.
europepmc

