Results 301 to 310 of about 1,818,152
Some of the following articles may not be open access.

Open Problems in Mechanistic Interpretability

arXiv.org
Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater assurance over ...
Lee Sharkey et al.

Mechanistic Interpretability for AI Safety - A Review

Trans. Mach. Learn. Res.
Understanding AI systems' inner workings is critical for ensuring value alignment and safety. This review explores mechanistic interpretability: reverse engineering the computational mechanisms and representations learned by neural networks into human ...
Leonard Bereska, E. Gavves

Rethinking Interpretability in the Era of Large Language Models

arXiv.org
Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks.
Chandan Singh et al.

Kolmogorov-Arnold Networks for Time Series: Bridging Predictive Power and Interpretability

arXiv.org
Kolmogorov-Arnold Networks (KANs) are a recently proposed class of models from an MIT team, offering a novel approach with the potential to reshape the field.
Kunpeng Xu, Lifei Chen, Shengrui Wang

Interpreting Interpretations

2017
“Interpreting Interpretations,” Chapter 6 of A New Narrative for Psychology, discusses the premises for the analysis of narratives in research in the field of psychology and in everyday life. The chapter focuses on how researchers think about narratives after the data have been collected and on how narratives should be understood and analyzed.

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models

arXiv.org
Mechanistic interpretability (MI) is an emerging sub-field of interpretability that seeks to understand a neural network model by reverse-engineering its internal computations.
Daking Rai et al.

Enhancing interpretability and accuracy of AI models in healthcare: a comprehensive review on challenges and future directions

Frontiers Robotics AI
Artificial Intelligence (AI) has demonstrated exceptional performance in automating critical healthcare tasks, such as diagnostic imaging analysis and predictive modeling, often surpassing human capabilities.
Mohammad Ennab, Hamid Mcheick

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

arXiv.org
Disentangling model activations into meaningful features is a central problem in interpretability. However, the absence of ground-truth for these features in realistic scenarios makes validating recent approaches, such as sparse dictionary learning ...
Aleksandar Makelov et al.

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Annual Meeting of the Association for Computational Linguistics
Individual neurons participate in the representation of multiple high-level concepts. To what extent can different interpretability methods successfully disentangle these roles? To help address this question, we introduce RAVEL (Resolving Attribute-Value ...
Jing Huang et al.

A Multimodal Automated Interpretability Agent

International Conference on Machine Learning
This paper describes MAIA, a Multimodal Automated Interpretability Agent. MAIA is a system that uses neural models to automate neural model understanding tasks like feature interpretation and failure mode discovery.
Tamar Rott Shaham et al.
