- Deep Ensemble as a Gaussian Process Approximate Posterior
- Generative Adversarial Networks
- Interpolating Compressed Parameter Subspaces
- Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
- Overcoming catastrophic forgetting in neural networks
- Task Singular Vectors: Reducing Task Interference in Model Merging
- Artificial Kuramoto Oscillatory Neurons
- Attention Is All You Need
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
- Conformal Prediction for Natural Language Processing: A Survey
- Deep Ensembles: A Loss Landscape Perspective
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- How many degrees of freedom do we need to train deep networks: a loss landscape perspective
- How transferable are features in deep neural networks?
- Knowledge distillation: A good teacher is patient and consistent
- Measuring the Intrinsic Dimension of Objective Landscapes
- Neural Machine Translation by Jointly Learning to Align and Translate
- On the Number of Linear Regions of Deep Neural Networks
- On the difficulty of training Recurrent Neural Networks
- Overcoming catastrophic forgetting in neural networks
- Practical recommendations for gradient-based training of deep architectures
- Qualitatively characterizing neural network optimization problems
- Revisiting Model Stitching to Compare Neural Representations
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
- Snapshot Ensembles: Train 1, get M for free
- Sparse Communication via Mixed Distributions
- Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting
- The Forward-Forward Algorithm: Some Preliminary Investigations
- The Goldilocks zone: Towards better understanding of neural network loss landscapes
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
- Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
- Improving neural networks by preventing co-adaptation of feature detectors
Resources
- Alice's Adventures in a Differentiable Wonderland – Volume I, A Tour of the Land
- Probabilistic Artificial Intelligence
- Understanding the Effectivity of Ensembles in Deep Learning - Weights & Biases
- "Yes you should understand backprop" by Andrej Karpathy
Also see the research papers in Statistical Learning Theory, and the background (didactic) material in Statistics and Probability.