MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI [PDF]

open access: yes
Computer Vision and Pattern Recognition, 2023
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.
Xiang Yue   +21 more
semanticscholar   +1 more source

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [PDF]

open access: yes
Neural Information Processing Systems, 2022
When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT).
Pan Lu   +8 more
semanticscholar   +1 more source

Improving Factuality and Reasoning in Language Models through Multiagent Debate [PDF]

open access: yes
International Conference on Machine Learning, 2023
Large language models (LLMs) have demonstrated remarkable capabilities in language generation, understanding, and few-shot learning in recent years. An extensive body of work has explored how their performance may be further improved through the tools of
Yilun Du   +4 more
semanticscholar   +1 more source

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [PDF]

open access: yes
International Joint Conference on Natural Language Processing, 2023
This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application ...
Yejin Bang   +12 more
semanticscholar   +1 more source

PIQA: Reasoning about Physical Commonsense in Natural Language [PDF]

open access: yes
AAAI Conference on Artificial Intelligence, 2019
To apply eyeshadow without a brush, should I use a cotton swab or a toothpick? Questions requiring this kind of physical commonsense pose a challenge to today's natural language understanding systems.
Yonatan Bisk   +4 more
semanticscholar   +1 more source

How the Custom Suppresses the Endowment Effect: Exchange Paradigm in Kanak Country

open access: yes
Frontiers in Psychology, 2022
In this paper, Knetsch's exchange paradigm is analyzed from the perspective of pragmatics and social norms. In this paradigm the participant, at the beginning of the experiment, receives an object from the experimenter and at the end, the same ...
Jean Baratgin   +5 more
doaj   +1 more source

A Study on the Sufficient Conditional and the Necessary Conditional With Chinese and French Participants

open access: yes
Frontiers in Psychology, 2022
According to the weak version of linguistic relativity, also called the Sapir-Whorf hypothesis, the features of an individual’s native language influence his worldview and perception.
Jing Shao   +4 more
doaj   +1 more source

ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning [PDF]

open access: yes
Findings, 2022
Charts are very popular for analyzing data. When exploring charts, people often ask a variety of complex reasoning questions that involve several logical and arithmetic operations. They also commonly refer to visual features of a chart in their questions.
Ahmed Masry   +4 more
semanticscholar   +1 more source

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

open access: yes
Computer Vision and Pattern Recognition, 2019
We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets.
Drew A. Hudson, Christopher D. Manning
semanticscholar   +1 more source

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models [PDF]

open access: yes
International Conference on Learning Representations, 2022
Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts.
Denny Zhou   +9 more
semanticscholar   +1 more source