A multimodal transformer-based visual question answering method integrating local and global information.
Huang C, Hu Z.
Efficient knowledge distillation and alignment for improved KB-VQA.
Qin X, Pei R, He C, Li F, Zhang X.
Video quality prediction and classification using XGBoost under variable encoding and network conditions.
Frnda J +4 more
Benchmarking large multimodal models for ophthalmic visual question answering with OphthalWeChat.
Xu P +9 more
Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data.
Wu C +5 more
M3AE-Distill: An Efficient Distilled Model for Medical Vision-Language Downstream Tasks.
Liang X, Xie J, Zhang M, Bi Z.
Context-Aware Multi-Agent Architecture for Wildfire Insights.
Sandeep A +5 more
Evaluating the performance of large language & visual-language models in cervical cytology screening.
Hong Q +15 more
Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models.
Khanal B +9 more

