- Neural Networks Fail to Learn Periodic Functions and How to Fix It - "Snake activations" (see the sketch after this list)
- Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
- Deep Ensemble as a Gaussian Process Approximate Posterior
- Generative Adversarial Networks
- Interpolating Compressed Parameter Subspaces
- Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
- Task Singular Vectors: Reducing Task Interference in Model Merging
- Artificial Kuramoto Oscillatory Neurons
- Attention Is All You Need
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
- Conformal Prediction for Natural Language Processing: A Survey
- Deep Ensembles: A Loss Landscape Perspective
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- How many degrees of freedom do we need to train deep networks: a loss landscape perspective
- How transferable are features in deep neural networks?
- Knowledge distillation: A good teacher is patient and consistent
- Measuring the Intrinsic Dimension of Objective Landscapes
- Neural Machine Translation by Jointly Learning to Align and Translate
- On the Number of Linear Regions of Deep Neural Networks
- On the difficulty of training Recurrent Neural Networks
- Overcoming catastrophic forgetting in neural networks
- Practical recommendations for gradient-based training of deep architectures
- Qualitatively characterizing neural network optimization problems
- Revisiting Model Stitching to Compare Neural Representations
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
- Snapshot Ensembles: Train 1, get M for free
- Sparse Communication via Mixed Distributions
- Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting
- The Forward-Forward Algorithm: Some Preliminary Investigations
- The Goldilocks zone: Towards better understanding of neural network loss landscapes
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
- Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
- Improving neural networks by preventing co-adaptation of feature detectors
- Flow Matching for Generative Modeling
- A Convergence Theory for Deep Learning via Over-Parameterization - neural tangent kernels
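A minimal numpy sketch of the Snake activation referenced in the first entry above, assuming the form proposed in the paper, snake_a(x) = x + sin²(ax)/a; the function name and example values here are illustrative:

```python
import numpy as np

def snake(x, a=1.0):
    """Snake activation: x + sin^2(a * x) / a (identity trend + periodic ripple)."""
    return x + np.sin(a * x) ** 2 / a

x = np.linspace(-5.0, 5.0, 11)
print(snake(x))         # rides on the identity, so extrapolation keeps the trend
print(snake(x, a=5.0))  # larger a -> higher-frequency, smaller-amplitude ripple
```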
Resources
See also research papers in Machine Learning and didactic material in Statistics and Probability
- Alice's Adventures in a Differentiable Wonderland - Volume I, A Tour of the Land
- Probabilistic Artificial Intelligence
- The Deep Learning Book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- CS231n: Convolutional Neural Networks for Visual Recognition
- Yann LeCun's Deep Learning Course at CDS [Home]
- Deep Learning by Yann LeCun & Alfredo Canziani (DS-GA 1008 · Spring 2020) · NYU Center for Data Science
- editions from other years are hosted by Alfredo Canziani
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges by Michael M. Bronstein, Joan Bruna, Taco Cohen, Petar Veličković
- Dive into Deep Learning
- DeepMind x UCL | Deep Learning Lecture Series 2020
- Neural Networks and Deep Learning by Michael Nielsen
- Christopher Olah's Posts on Neural Networks
- Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
- Algorithms of Reinforcement Learning by Csaba Szepesvári
- Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka, Yuxi (Hayden) Liu and Vahid Mirjalili (includes sections on Transformers, GANs, GCNs and RL)
- labml.ai Annotated PyTorch Paper Implementations - Multi-Headed Attention, Transformer Encoder and Decoder Models, Denoising Diffusion Probabilistic Models, Wasserstein GAN
- Deep Learning & Applied AI @Sapienza - course material (2nd semester, a.y. 2023/2024, Dept. of Computer Science), taught by Emanuele Rodolà
- Understanding the Effectivity of Ensembles in Deep Learning - Weights & Biases
- Yes you should understand backprop by Andrej Karpathy
- fast.ai - Making neural nets uncool again
- Fast.AI Deep Learning For Coders - 36 hours of lessons for free
- Launchpad Reading Group videos
- karpathy/min-char-rnn.py - Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy (a sketch of the recurrence follows this list)
- A Recipe for Training Neural Networks
- Annotated Bibliography of Recommended Materials from the Center for Human-Compatible AI
- Open Learning by Frederik Kratzert
- AI Safety Syllabus Reading List from 80,000 Hours
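A minimal numpy sketch of the tanh recurrence behind karpathy/min-char-rnn.py, as referenced above; the sizes and weight names here are illustrative, not copied from the gist:

```python
import numpy as np

vocab_size, hidden_size = 65, 100          # illustrative sizes
rng = np.random.default_rng(0)
Wxh = rng.normal(scale=0.01, size=(hidden_size, vocab_size))   # input -> hidden
Whh = rng.normal(scale=0.01, size=(hidden_size, hidden_size))  # hidden -> hidden
Why = rng.normal(scale=0.01, size=(vocab_size, hidden_size))   # hidden -> logits
bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))

def step(char_ix, h):
    """One step: h_t = tanh(Wxh x_t + Whh h_{t-1} + bh), then softmax over chars."""
    x = np.zeros((vocab_size, 1))
    x[char_ix] = 1.0                          # one-hot encode the input character
    h = np.tanh(Wxh @ x + Whh @ h + bh)       # recurrent hidden-state update
    y = Why @ h + by                          # unnormalized next-char scores
    p = np.exp(y - y.max()); p /= p.sum()     # numerically stable softmax
    return h, p

h = np.zeros((hidden_size, 1))
h, p = step(0, h)                             # p: distribution over the next character
```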