- Neural Networks Fail to Learn Periodic Functions and How to Fix It - "Snake activations" (see the sketch after this list)
- Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
- Deep Ensemble as a Gaussian Process Approximate Posterior
- Generative Adversarial Networks
- Interpolating Compressed Parameter Subspaces
- Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
- Task Singular Vectors: Reducing Task Interference in Model Merging
- Artificial Kuramoto Oscillatory Neurons
- Attention Is All You Need
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
- Conformal Prediction for Natural Language Processing: A Survey
- Deep Ensembles: A Loss Landscape Perspective
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- How many degrees of freedom do we need to train deep networks: a loss landscape perspective
- How transferable are features in deep neural networks?
- Knowledge distillation: A good teacher is patient and consistent
- Measuring the Intrinsic Dimension of Objective Landscapes
- Neural Machine Translation by Jointly Learning to Align and Translate
- On the Number of Linear Regions of Deep Neural Networks
- On the difficulty of training Recurrent Neural Networks
- Overcoming catastrophic forgetting in neural networks
- Practical recommendations for gradient-based training of deep architectures
- Qualitatively characterizing neural network optimization problems
- Revisiting Model Stitching to Compare Neural Representations
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
- Snapshot Ensembles: Train 1, get M for free
- Sparse Communication via Mixed Distributions
- Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting
- The Forward-Forward Algorithm: Some Preliminary Investigations
- The Goldilocks zone: Towards better understanding of neural network loss landscapes
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
- Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
- Improving neural networks by preventing co-adaptation of feature detectors
- Flow Matching for Generative Modeling
- A Convergence Theory for Deep Learning via Over-Parameterization - neural tangent kernels
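A minimal numpy sketch of the Snake activation referenced in the first entry above, assuming the form proposed in the paper, snake_a(x) = x + sin²(ax)/a; the function name and example values here are illustrative:

```python
import numpy as np

def snake(x, a=1.0):
    """Snake activation: x + sin^2(a * x) / a (identity trend + periodic ripple)."""
    return x + np.sin(a * x) ** 2 / a

x = np.linspace(-5.0, 5.0, 11)
print(snake(x))         # rides on the identity, so extrapolation keeps the trend
print(snake(x, a=5.0))  # larger a -> higher-frequency, smaller-amplitude ripple
```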
Resources
See also research papers in Machine Learning and didactic material in Statistics and Probability
- Alice's Adventures in a Differentiable Wonderland - Volume I, A Tour of the Land
- Probabilistic Artificial Intelligence
- The Deep Learning Book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- CS231n: Convolutional Neural Networks for Visual Recognition
- Yann LeCun's Deep Learning Course at CDS [Home]
- Deep Learning by Yann LeCun & Alfredo Canziani (DS-GA 1008 · Spring 2020) · NYU Center for Data Science
- editions from other years are hosted by Alfredo Canziani
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges by Michael M. Bronstein, Joan Bruna, Taco Cohen, Petar Veličković
- Dive into Deep Learning
- DeepMind x UCL | Deep Learning Lecture Series 2020
- Neural Networks and Deep Learning by Michael Nielsen
- Christopher Olah's Posts on Neural Networks
- Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
- Algorithms of Reinforcement Learning by Csaba Szepesvári
- Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka, Yuxi (Hayden) Liu and Vahid Mirjalili (includes sections on Transformers, GANs, GCNs and RL)
- labml.ai Annotated PyTorch Paper Implementations - Multi-Headed Attention, Transformer Encoder and Decoder Models, Denoising Diffusion Probabilistic Models, Wasserstein GAN
- Deep Learning & Applied AI @Sapienza - course material (2nd semester, a.y. 2023/2024, Dept. of Computer Science), taught by Emanuele Rodolà
- Understanding the Effectivity of Ensembles in Deep Learning - Weights & Biases
- Yes you should understand backprop by Andrej Karpathy
- fast.ai - Making neural nets uncool again
- Fast.AI Deep Learning For Coders - 36 hours of lessons for free
- Launchpad Reading Group videos
- karpathy/min-char-rnn.py - Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy (a sketch of the recurrence follows this list)
- A Recipe for Training Neural Networks
- Annotated Bibliography of Recommended Materials from the Center for Human-Compatible AI
- Open Learning by Frederik Kratzert
- AI Safety Syllabus Reading List from 80,000 Hours
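A minimal numpy sketch of the tanh recurrence behind karpathy/min-char-rnn.py, as referenced above; the sizes and weight names here are illustrative, not copied from the gist:

```python
import numpy as np

vocab_size, hidden_size = 65, 100          # illustrative sizes
rng = np.random.default_rng(0)
Wxh = rng.normal(scale=0.01, size=(hidden_size, vocab_size))   # input -> hidden
Whh = rng.normal(scale=0.01, size=(hidden_size, hidden_size))  # hidden -> hidden
Why = rng.normal(scale=0.01, size=(vocab_size, hidden_size))   # hidden -> logits
bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))

def step(char_ix, h):
    """One step: h_t = tanh(Wxh x_t + Whh h_{t-1} + bh), then softmax over chars."""
    x = np.zeros((vocab_size, 1))
    x[char_ix] = 1.0                          # one-hot encode the input character
    h = np.tanh(Wxh @ x + Whh @ h + bh)       # recurrent hidden-state update
    y = Why @ h + by                          # unnormalized next-char scores
    p = np.exp(y - y.max()); p /= p.sum()     # numerically stable softmax
    return h, p

h = np.zeros((hidden_size, 1))
h, p = step(0, h)                             # p: distribution over the next character
```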