MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI [PDF]
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.
Xiang Yue +21 more
semanticscholar +1 more source
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [PDF]
Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image. However, it is difficult for these case studies to fully reflect
Chaoyou Fu +12 more
semanticscholar +1 more source
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning [PDF]
Charts are very popular for analyzing data. When exploring charts, people often ask a variety of complex reasoning questions that involve several logical and arithmetic operations. They also commonly refer to visual features of a chart in their questions.
Ahmed Masry +4 more
semanticscholar +1 more source
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding [PDF]
Although large language models (LLMs) demonstrate impressive performance for many language tasks, most of them can only handle texts a few thousand tokens long, limiting their applications on longer sequence inputs, such as books, reports, and codebases.
Yushi Bai +12 more
semanticscholar +1 more source
VBench: Comprehensive Benchmark Suite for Video Generative Models [PDF]
Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human ...
Ziqi Huang +15 more
semanticscholar +1 more source
Renifolin F is a prenylated chalcone isolated from Shuteria involucrata, a traditional ethnic-minority medicine used to treat respiratory diseases and asthma.
Zhuya Yang +8 more
doaj +1 more source
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding [PDF]
Human ability to understand language is general, flexible, and robust. In contrast, most NLU models above the word level are designed for a specific task and struggle with out-of-domain data.
Alex Wang +5 more
semanticscholar +1 more source
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark [PDF]
With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.
Kunchang Li +11 more
semanticscholar +1 more source
Immunization of Cats against Fel d 1 Results in Reduced Allergic Symptoms of Owners
An innovative approach was tested to treat cat allergy in humans by vaccinating cats with Fel-CuMV (HypoCat™), a vaccine against the major cat allergen Fel d 1 based on virus-like particles derived from cucumber mosaic virus (CuMV-VLPs).
Franziska Thoms +14 more
doaj +1 more source
Streptococcus iniae is a problematic gram-positive bacterium negatively affecting Nile tilapia (Oreochromis niloticus), one of the main aquacultural species produced worldwide. The aim of this study was to identify the genetic architecture of survival to
Sergio Vela-Avitúa +6 more
doaj +1 more source

