- BERTology
- Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
- Ecco: An Open Source Library for the Explainability of Transformer Language Models
- Explainability for Speech Models: On the Challenges of Acoustic Feature Selection
- Generative Models: What Do They Know? Do They Know Things? Let's Find Out!
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
- Inseq: An Interpretability Toolkit for Sequence Generation Models
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
- Listenable Maps for Audio Classifiers
- Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
- Measuring the Mixing of Contextual Information in the Transformer
- Pyramid Feature Attention Network for Saliency Detection
- Quantifying the Plausibility of Context Reliance in Neural Machine Translation
- SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
- Sparse Autoencoders Find Highly Interpretable Features in Language Models
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- Visualizing and Understanding Convolutional Networks
- Visualizing the Loss Landscape of Neural Nets
- Why Should I Trust You? Explaining the Predictions of Any Classifier (LIME)
- xTower: A Multilingual LLM for Explaining and Correcting Translation Errors
- Information-Theoretic Probing for Linguistic Structure
- Designing and Interpreting Probes with Control Tasks
- Circuit Tracing: Revealing Computational Graphs in Language Models
Surveys and Reviews
- Explainability for Large Language Models: A Survey - cited by Dennis's SPES paper
Resources
- Interpretable Machine Learning: A Guide for Making Black Box Models Explainable by Christoph Molnar
- Daily Picks in Interpretability & Analysis of LMs - a Hugging Face Space by Gabriele Sarti