Results 261 to 270 of about 2,239,189 (314)
Some of the following articles may not be open access.
GAIA: a benchmark for General AI Assistants
arXiv.org, 2023
We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, ...
G. Mialon +5 more
semanticscholar +1 more source
This paper develops a new approach that controls for commonalities in actively managed investment fund returns when measuring their performance. It is well-known that many investment funds may systematically load on common priced factors omitted from popular models, exhibit similarities in their choices of specific stocks and industries, or vary their ...
Hunter, David +3 more
openaire +2 more sources
OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
IEEE International Conference on Computer Vision, 2023
Semantic occupancy perception is essential for autonomous driving, as automated vehicles require a fine-grained perception of the 3D urban structures. However, existing relevant benchmarks lack diversity in urban scenes, and they only evaluate front-view ...
Xiaofeng Wang +9 more
semanticscholar +1 more source
Benchmarking HRM and the benchmarking of benchmarking
Employee Relations, 2000
Organisations with low absenteeism and low turnover can be distinguished from organisations with high absenteeism and turnover through the identification and implementation of sophisticated and strategic best practices such as benchmarking relative cost position, developing a corporate ethic, valuing the negotiation of an enterprise agreement, and not ...
Rodwell, John J. +2 more
openaire +2 more sources
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Neural Information Processing Systems
In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains.
Yubo Wang +16 more
semanticscholar +1 more source
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
arXiv.org
We present BrowseComp, a simple yet challenging benchmark for measuring the ability for agents to browse the web. BrowseComp comprises 1,266 questions that require persistently navigating the internet in search of hard-to-find, entangled information ...
Jason Wei +9 more
semanticscholar +1 more source
Benchmarking, benchmarking processes
1996
Many managers strongly believe that successful benchmarking exercises are only the result of choosing carefully the right methodology. The process of benchmarking, however, although being a tool, is very difficult to apply. It challenges the existing culture of work and scientific practices and methodologies in place.
Mohamed Zairi, Paul Leonard
openaire +1 more source

