Some of the following articles may not be open access.
Open Problems in Mechanistic Interpretability
arXiv.org
Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater assurance over ...
Lee Sharkey +28 more
semanticscholar +1 more source
Mechanistic Interpretability for AI Safety - A Review
Trans. Mach. Learn. Res.
Understanding AI systems' inner workings is critical for ensuring value alignment and safety. This review explores mechanistic interpretability: reverse engineering the computational mechanisms and representations learned by neural networks into human ...
Leonard Bereska, E. Gavves
semanticscholar +1 more source
Rethinking Interpretability in the Era of Large Language Models
arXiv.org
Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks.
Chandan Singh +4 more
semanticscholar +1 more source
Kolmogorov-Arnold Networks for Time Series: Bridging Predictive Power and Interpretability
arXiv.org
Kolmogorov-Arnold Networks (KAN) is a groundbreaking model recently proposed by the MIT team, representing a revolutionary approach with the potential to be a game-changer in the field.
Kunpeng Xu, Lifei Chen, Shengrui Wang
semanticscholar +1 more source
2017
“Interpreting Interpretations,” Chapter 6 of A New Narrative for Psychology, discusses the premises for the analysis of narratives in research in the field of psychology and in everyday life. The chapter focuses on how researchers think about narratives after the data have been collected and on how narratives should be understood and analyzed.
openaire +2 more sources
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
arXiv.org
Mechanistic interpretability (MI) is an emerging sub-field of interpretability that seeks to understand a neural network model by reverse-engineering its internal computations.
Daking Rai +4 more
semanticscholar +1 more source
Frontiers Robotics AI
Artificial Intelligence (AI) has demonstrated exceptional performance in automating critical healthcare tasks, such as diagnostic imaging analysis and predictive modeling, often surpassing human capabilities.
Mohammad Ennab, Hamid Mcheick
semanticscholar +1 more source
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
arXiv.org
Disentangling model activations into meaningful features is a central problem in interpretability. However, the absence of ground-truth for these features in realistic scenarios makes validating recent approaches, such as sparse dictionary learning ...
Aleksandar Makelov +2 more
semanticscholar +1 more source
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Annual Meeting of the Association for Computational Linguistics
Individual neurons participate in the representation of multiple high-level concepts. To what extent can different interpretability methods successfully disentangle these roles? To help address this question, we introduce RAVEL (Resolving Attribute-Value ...
Jing Huang +4 more
semanticscholar +1 more source
A Multimodal Automated Interpretability Agent
International Conference on Machine Learning
This paper describes MAIA, a Multimodal Automated Interpretability Agent. MAIA is a system that uses neural models to automate neural model understanding tasks like feature interpretation and failure mode discovery.
Tamar Rott Shaham +6 more
semanticscholar +1 more source

