In the paper, the authors investigate the question - why do deep ensembles work better than single deep neural networks?
In their investigation, the authors find:

- Different snapshots of the same model (i.e., the same model after 1, 10, 100 epochs of training) exhibit functional similarity. Hence, their ensemble is less likely to explore the different modes of local minima in the optimization space.
- Different solutions of the same model (i.e., the model trained with a different random initialization each time) exhibit functional dissimilarity. Hence, their ensemble is more likely to explore the different modes of local minima in the optimization space.
Inspired by their findings, in this article, we present several different insights that are useful for understanding the dynamics of deep neural networks in general.
## Revisiting the Optimization Landscape of Neural Networks
Training a neural network is stochastic, i.e., each time you train the same network, it may not reach exactly the same solution as before. Neural networks are optimized using gradient-based learning, and the optimization problem is almost always non-convex. Expressed with Greek letters, the optimization problem looks like so -

$$ \min_{\theta} \frac{1}{m} \sum_{i=1}^{m} \ell(h_\theta(x_i), y_i) $$

where,

- $$ \theta $$ is the set of trainable parameters of the network,
- $$ m $$ is the number of training examples,
- $$ h_\theta $$ is the model (neural network) parameterized over $$ \theta $$,
- $$ \ell(h_\theta(x), y) $$ is the loss incurred by the model on a training example $$ (x, y) $$.
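As a small illustration (our own sketch, not code from the paper), this objective for a classifier amounts to the average cross-entropy over the $$ m $$ training examples:

```
import numpy as np

# probs: (m, num_classes) softmax outputs h_theta(x_i); labels: (m,) integer class ids y_i
def empirical_risk(probs, labels):
    # average cross-entropy loss over the m training examples
    per_example_loss = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(per_example_loss)
```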
Consider the figure below, which shows a sample non-convex loss landscape (typical of neural networks). As we can see, there are multiple local minima. A trained neural network reaches only one of these local minima, and the same network can end up in a different local minimum each time it is trained with a different random initialization, which leads to high variance in its predictions. We can also see that these local minima lie at roughly the same level in the loss landscape, which further suggests that whichever of them a network ends up in, it will yield more or less the same kind of performance.

## Mitigating the High Variance of a Single Model With Ensembling

To cover these local minima better, we often train several versions of the same model, each with a different initialization. During inference, we collect the predictions from each of these different solutions and average them. This works quite well in practice, and the process is referred to as ensembling. Ensembling also helps to reduce the high variance that comes from the predictions of the individual models (the same network trained multiple times with different random initializations).

In order to understand why ensembles work well, we need to figure out the ingredients that make them cover the loss landscape better. Neural networks are parameterized functions, as we saw earlier. Each time we train a network, we end up in a different region of the parameter space, leading to a different optimum. The more diverse this space, the better the coverage of the different optima. So, how do we quantify this diversity? To investigate this systematically, the authors do the following (among other things):

- They measure the cosine similarity of the weights from different runs of the same network. Cosine similarity is a widely used metric for measuring the similarity between two vectors: it captures their orientation rather than their magnitude (refer to the figure below). Formally speaking, it is the dot product of the two vectors divided by the product of their respective norms. The authors use it to examine the similarity of different trajectories (weights of the same model trained with different initializations). See the snippets after this list for how we compute it.
- They measure the extent to which the predictions from different runs disagree with each other. The authors want to see if models trained with different initializations fail on the same subset (or the complete set) of the test dataset. If a model trained with different inits produces different predictions on the test dataset, we can say that its predictions are a function of its initialization.
- Also, the examples that tend to confuse the model across different initializations can be called intrinsically hard examples. To find these, we first compare the [confusion matrix](https://wandb.ai/wandb/plots/reports/Confusion-Matrix--VmlldzozMDg1NTM) epoch-wise, i.e., confusion matrices across individual epochs from the same init. This is followed by a solution-wise comparison, i.e., confusion matrices from the different solutions (inits) of the same model.

Practically, we can measure the weight-space similarity by training the same model with different initializations, grabbing the trainable weights (ignoring the biases), flattening the weights from each layer, and concatenating them into a single vector. We then apply the cosine similarity formula (NumPy implementation) to each pair of models:

```
import numpy as np

# compute cosine similarity of the flattened weight vectors from two runs
cos_sim = np.dot(weights1, weights2) / (np.linalg.norm(weights1) * np.linalg.norm(weights2))
```
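For completeness, here is a minimal sketch (our own, not from the paper) of how the trainable weights of two models trained from different initializations might be flattened with `tf.keras` before applying the formula above; `model_a`, `model_b`, and the `flatten_weights` helper are hypothetical names:

```
import numpy as np
import tensorflow as tf

def flatten_weights(model):
    # collect kernels only (skip biases), flatten each, and concatenate into one vector
    kernels = [tf.reshape(w, [-1]).numpy()
               for w in model.trainable_weights if "bias" not in w.name]
    return np.concatenate(kernels)

# model_a and model_b: same architecture, different random initializations
weights1 = flatten_weights(model_a)
weights2 = flatten_weights(model_b)
cos_sim = np.dot(weights1, weights2) / (np.linalg.norm(weights1) * np.linalg.norm(weights2))
```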
Practically, to compute the dissimilarity in predictions, count the number of test examples on which the predictions from two runs agree, normalize by dividing that count by the total number of test data points, and subtract the result from 1.

```
import numpy as np

# compute prediction dissimilarity between two runs (CIFAR-10 test set has 10,000 examples)
dissimilarity_score = 1 - np.sum(np.equal(preds1, preds2)) / 10000
```

Before we dive deep into the experiments mentioned above, it is essential to review our experimental setup.

## Experimental Setup

- Dataset used (primarily): CIFAR-10
- Architectures: SmallCNN, MediumCNN (channels \[32, 64, 128, 128\]), and ResNet20v1
- Dropout: 0.1 (only applicable when using SmallCNN and MediumCNN)
- Learning rate schedule: start at $$ 1.6 \times 10^{-3} $$ and halve it every 10 epochs
- Data augmentation: only when using ResNet20v1

Note: We did not exactly follow what is specified in section 3 of the paper; there are minor differences between our experimental setup and what the authors followed. For convenience, below we show what the learning rate schedule looks like and the data augmentation pipeline we followed.

```
import tensorflow as tf

def augment(image, label):
    image = tf.image.resize_with_crop_or_pad(image, 40, 40)  # Add 8 pixels of padding
    image = tf.image.random_crop(image, size=[32, 32, 3])  # Random crop back to 32x32
    image = tf.image.random_brightness(image, max_delta=0.5)  # Random brightness
    image = tf.clip_by_value(image, 0., 1.)
    return image, label
```

We used [Google Colab](https://colab.research.google.com/) for running all of our experiments.

## Dissecting the Weight Space of a (Deep) Ensemble

Going back to our experiments, we are going to present each of them in two different flavors. For each of the experiments quantifying diversity (cosine similarity, prediction disagreement), we:

- Take different snapshots of a model from the same training run and perform the experiment.
- Train the model multiple times with different random initializations and perform the experiment.

Note: By snapshots, we refer to models taken from epoch 0, epoch 1, and so on of the same training run (same initialization).

### Cosine Similarity Between the Weights (Snapshots)

#### Observations

- The functions (different checkpoints of the same model) in the same trajectory are similar, and this holds for all variants (small, medium, and large) of the model.
- The weights of the different snapshots of the same model show an increasingly high degree of similarity with each other as training approaches convergence. Thus, there is not much change in the weight space once the trajectory has settled into a loss-landscape valley.
- The checkpoints from the later stages of training differ the most from the initial stage of training, followed by mild similarity (whitish region).

### Cosine Similarity Between the Weights (Different Inits)

#### Observations

- The models trained with different initializations (different trajectories) are entirely dissimilar. This holds for all three variants of the model.
- Thus, the initialization decides the region of the weight space the model will explore.

### Disagreement Between Predictions (Snapshots)

#### Observations

- The functions (different checkpoints of the same model) in the same trajectory tend to disagree less about their predictions, further confirming that functions in the same trajectory are similar.
- From the prediction dissimilarity plot, we can see that the different snapshots of the same model show an increasingly high degree of similarity with each other as training approaches convergence (increasing epoch). Thus, one can say that many examples are functionally mapped ($$ x \rightarrow y $$) once the trajectory has settled into a loss-landscape valley.
- We also observe high dissimilarity in predictions between the checkpoints from the later stages of training and the very initial stage of training.

### Disagreement Between Predictions (Different Inits)

#### Observations

- The predictions of the same model trained on the same dataset with the same hyperparameters but different initializations disagree.
- Obviously, there is a subset of examples that the models trained along different trajectories will agree upon.
- There must also be a subset of intrinsically hard examples that the models trained along different trajectories will misclassify similarly. We shall investigate this in the next section.

## Intrinsic Hardness as a Function of Initialization

Below we see that the set of examples that confuses a model changes epoch-wise as optimization proceeds. We further see that this set varies when we train the model with a different initialization. We could not list the results from all the different initializations due to space constraints, but feel free to check them out [here](https://wandb.ai/authors/loss-landscape). This suggests that the definition of intrinsically hard examples is relative to how a model is initialized for training. It may also further suggest that the images that cause the top losses during training (epoch-wise) are not the same when we change the initialization of a model.

Note: You can click on the little button located at the top-left corner and play with the slider to see how the confusion matrices change with epochs. The idea of creating an epoch-wise callback is borrowed from [this tutorial](https://www.tensorflow.org/tensorboard/r2/image_summaries).

## Different Initializations and Their Paths to Optimization

We talked about different initializations of the same model and observed the functional dissimilarity between them. To spice it up, let's try to visualize the paths of the different trajectories. The authors do so by taking three (for simplicity) different trajectories (inits) of the same model. They then take the softmax outputs from different checkpoints along the individual training trajectories and append them to an array. The shape of the array is (num\_of\_trajectories, num\_of\_epochs, num\_of\_test\_examples, num\_classes). The predictions from all the solutions and their individual epochs are appended to a single array because they belong to the same "space". We then apply a two-component t-SNE to reduce this higher-dimensional space to a two-dimensional one.
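Here is a minimal sketch of this projection (our own illustration; the array name `softmax_outputs` is assumed), using scikit-learn's t-SNE:

```
import numpy as np
from sklearn.manifold import TSNE

# softmax_outputs: (num_of_trajectories, num_of_epochs, num_of_test_examples, num_classes)
num_traj, num_epochs, num_test, num_classes = softmax_outputs.shape

# each (trajectory, epoch) checkpoint becomes one point in prediction space
points = softmax_outputs.reshape(num_traj * num_epochs, num_test * num_classes)

# two-component t-SNE projection of the checkpoints
embedding = TSNE(n_components=2).fit_transform(points)  # shape: (num_traj * num_epochs, 2)
```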
Below is the result of this experiment for the Small- and Medium-sized CNNs. And wow! In our opinion, and also from the plots shown below, it is evident that the models with different initializations follow different trajectories. As a run approaches convergence, its checkpoints tend to cluster together around a valley in this space. Even though the models reach similar accuracy, we can clearly see evidence of multiple minima that lie on the same plane.

## Accuracy as a Function of Ensemble Size

Another interesting question the authors explore is - how does ensemble size affect the overall test accuracy? Below we can see that as we keep increasing the ensemble size, the model performance improves. For SmallCNN, the improvement plateaus after a certain point. We think this might be because a small-capacity model does not produce an optimal solution over the training dataset: ensembling predictions does help improve model performance, but after reaching peak performance, the uncertainty from multiple suboptimal models outweighs the benefit of ensembling. Overall, this suggests that an ensemble performs better because it is able to cover the optimization landscape better than a single model, and that indeed seems to be the case.
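Concretely, the effect of ensemble size can be measured by averaging the softmax predictions of a growing set of independently trained models. Below is a rough sketch (our own; `models`, `x_test`, and `y_test` are assumed to already exist):

```
import numpy as np

# models: trained models of the same architecture, each from a different random init
# x_test: test images; y_test: integer labels of shape (num_test,)
probs = np.stack([model.predict(x_test) for model in models])  # (num_models, num_test, num_classes)

for k in range(1, len(models) + 1):
    ensemble_probs = probs[:k].mean(axis=0)  # average the softmax outputs of the first k models
    accuracy = np.mean(np.argmax(ensemble_probs, axis=1) == y_test)
    print(f"Ensemble size {k}: test accuracy = {accuracy:.4f}")
```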
> Although this behavior is interesting, for deployment-related situations, using a large ensemble of very heavy models might not be practically feasible.

## Perturbing an Already Optimized Solution Space

In addition to the experiments based on the checkpoints along a trajectory, the authors also explore the subspace along an individual trajectory. The subspace along a trajectory is a set of functions (solutions) that live in the function space around the explored solution and that could be reached when retraining with the same initialization. The authors use a representative set of subspace sampling methods, including:

- Diagonal Gaussian approximation
- Low-rank covariance matrix Gaussian approximation
- Random subspace approximation

The authors construct their subspace around an optimized weight-space solution $$ \theta $$ (the weights and biases of a trained neural network). Using the t-SNE plot setup described above, they show that the constructed subspace lies in the same valley as the optimized solution, while a different solution lies in a different valley. The authors validate two hypotheses:

- Ensembling the solutions obtained by sub-sampling around an optimized solution provides benefits in terms of model performance. But...
- The relative benefit of simple ensembling (shown above) is higher, as it averages predictions over more diverse solutions.

The plot below summarizes these findings.

## Conclusion

The paper we discussed in this article uses simple experiments to give us an excellent understanding of why (deep) ensembles are so powerful at covering the optimization landscape. Below we leave you with a couple of amazing papers in case you are interested in knowing more about different aspects of deep neural networks -

## Acknowledgements

Thanks to Balaji Lakshminarayanan for providing feedback on the initial draft of the article and rectifying our mistake on the tSNE projections. We hope you have enjoyed reading this article. For any feedback, reach out to us on Twitter: [@RisingSayak](https://twitter.com/RisingSayak) and [@ayushthakur0](https://twitter.com/ayushthakur0).