Results 261 to 270 of about 2,239,189 (314)

GAIA: a benchmark for General AI Assistants

arXiv.org, 2023
We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing ...
G. Mialon   +5 more
semanticscholar   +1 more source

Endogenous Benchmarks

SSRN Electronic Journal, 2009
This paper develops a new approach that controls for commonalities in actively managed investment fund returns when measuring their performance. It is well-known that many investment funds may systematically load on common priced factors omitted from popular models, exhibit similarities in their choices of specific stocks and industries, or vary their ...
Hunter, David   +3 more
openaire   +2 more sources

OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception

IEEE International Conference on Computer Vision, 2023
Semantic occupancy perception is essential for autonomous driving, as automated vehicles require a fine-grained perception of the 3D urban structures. However, existing relevant benchmarks lack diversity in urban scenes, and they only evaluate front-view ...
Xiaofeng Wang   +9 more
semanticscholar   +1 more source

Benchmarking HRM and the benchmarking of benchmarking

Employee Relations, 2000
Organisations with low absenteeism and low turnover can be distinguished from organisations with high absenteeism and turnover through the identification and implementation of sophisticated and strategic best practices such as benchmarking relative cost position, developing a corporate ethic, valuing the negotiation of an enterprise agreement, and not ...
Rodwell, John J.   +2 more
openaire   +2 more sources

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Neural Information Processing Systems
In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains ...
Yubo Wang   +16 more
semanticscholar   +1 more source

BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

arXiv.org
We present BrowseComp, a simple yet challenging benchmark for measuring the ability of agents to browse the web. BrowseComp comprises 1,266 questions that require persistently navigating the internet in search of hard-to-find, entangled information ...
Jason Wei   +9 more
semanticscholar   +1 more source

Benchmarking, benchmarking processes

1996
Many managers strongly believe that successful benchmarking exercises result solely from carefully choosing the right methodology. The process of benchmarking, however, although merely a tool, is very difficult to apply. It challenges the existing culture of work and the scientific practices and methodologies in place.
Mohamed Zairi, Paul Leonard
openaire   +1 more source
