- BERTology
- Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
- Ecco: An Open Source Library for the Explainability of Transformer Language Models
- Explainability for Speech Models: On the Challenges of Acoustic Feature Selection
- Generative Models: What Do They Know? Do They Know Things? Let's Find Out!
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
- Inseq: An Interpretability Toolkit for Sequence Generation Models
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
- Listenable Maps for Audio Classifiers
- Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
- Measuring the Mixing of Contextual Information in the Transformer
- Pyramid Feature Attention Network for Saliency Detection
- Quantifying the Plausibility of Context Reliance in Neural Machine Translation
- SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
- Sparse Autoencoders Find Highly Interpretable Features in Language Models
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- Visualizing and Understanding Convolutional Networks
- Visualizing the Loss Landscape of Neural Nets
- Why Should I Trust You? Explaining the Predictions of Any Classifier (LIME)
- xTower: A Multilingual LLM for Explaining and Correcting Translation Errors
- Information-Theoretic Probing for Linguistic Structure
- Designing and Interpreting Probes with Control Tasks
- Circuit Tracing: Revealing Computational Graphs in Language Models
Surveys and Reviews
- Explainability for Large Language Models: A Survey - cited by Dennis's SPES paper
Resources
- Interpretable Machine Learning: A Guide for Making Black Box Models Explainable by Christoph Molnar
- Daily Picks in Interpretability & Analysis of LMs - a Hugging Face Space by Gabriele Sarti