Differential performance of large language models in advanced cardiac life support assessment: A comprehensive multi-dimensional analysis of accuracy, consistency, and visual recognition capabilities.
Genc M +9 more
europepmc +1 more source
Specialized foundation models for intelligent operating rooms.
Özsoy E +5 more
europepmc +1 more source
Collaborative positional attention for image to English question answering.
Li Y, Teng H.
europepmc +1 more source
Medical visual question answering: A survey
Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges. Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer.
Zhihong Lin +2 more
exaly +4 more sources
Some of the following articles may not be open access.
Multiple answers to a question: a new approach for visual question answering
The Visual Computer, 2020
With the advent of deep learning, multimodal data have attracted great interest. One multimodal task within the computer vision domain is visual question answering (VQA). In VQA, a question and an image are input to the model, and the model tries to answer the question based on the image.
Sayedshayan Hashemi Hosseinabad +2 more
openaire +1 more source
Visual Question Answer Diversity
Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 2018
Visual questions (VQs) can lead multiple people to respond with different answers rather than a single, agreed-upon response. Moreover, the answers from a crowd can include different numbers of unique answers that arise with different relative frequencies.
Chun-Ju Yang +2 more
openaire +1 more source
2024 International Conference on Computing, Networking and Communications (ICNC)
Abstract - Vision-Language Pre-Training (VLP) significantly improves performance on a variety of multimodal tasks. However, existing models are often specialized in either understanding or generation, which limits their versatility. Furthermore, relying on large, noisy web text data remains the optimal approach for supervision.
Ahmed Nada, Min Chen
openaire +2 more sources
Answer Distillation for Visual Question Answering
2019
Answering open-ended questions in Visual Question Answering (VQA) is a challenging task. Because the answers are entirely free-form, the answer space for open-ended questions is, in theory, infinite. This makes it harder for algorithms to predict the correct answers. In this paper, we propose a method named answer distillation to decrease the scale of
Zhiwei Fang +4 more
openaire +1 more source
Sequential Visual Reasoning for Visual Question Answering
2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), 2018
Visual question answering (VQA) is a challenging task that addresses learning and reasoning at the intersection of vision and language. This reasoning requires understanding both the sequential and compositional linguistic structure of questions and the sets of visual objects, along with their spatial relations, in images.
Jinlai Liu +3 more
openaire +1 more source

