Comparative evaluation of viral hepatitis question responses: ChatGPT-4.5 outperforms three established models.
Ma J +8 more
europepmc +1 more source
Performance of large language models in fluoride-related dental knowledge: a comparative evaluation study of ChatGPT-4, Claude 3.5 Sonnet, Copilot, and Grok 3.
Biswas R +2 more
europepmc +1 more source
Evaluating the performance of general purpose large language models in identifying human facial emotions.
Nelson BW +5 more
europepmc +1 more source
Evaluating Spanish Translations of Emergency Department Discharge Instructions by a Large Language Model: Tool Validation and Reliability Study.
Carreras Tartak JA +9 more
europepmc +1 more source
The imitation game: large language models versus multidisciplinary tumor boards: benchmarking AI against 21 sarcoma centers from the ring trial.
Li CP +9 more
europepmc +1 more source
Limitations of broadly trained LLMs in interpreting orthopedic Walch glenoid classifications.
ElSayed A, Updegrove GF.
europepmc +1 more source
Human researchers are superior to large language models in writing a medical systematic review in a comparative multitask assessment.
Sollini M +7 more
europepmc +1 more source
Comparative Analysis of Large Language Models in First-Aid Scenario Recognition and Management: An In Silico Evaluation of ChatGPT and Claude.
West NK +4 more
europepmc +1 more source

