Title: How to Train Your Energy-Based Models
Authors: Yang Song, Diederik P. Kingma
Published: 9th January 2021 (Saturday) @ 04:51:31
Link: http://arxiv.org/abs/2101.03288v2
Abstract
Energy-Based Models (EBMs), also known as non-normalized probabilistic models, specify probability density or mass functions up to an unknown normalizing constant. Unlike most other probabilistic models, EBMs do not place a restriction on the tractability of the normalizing constant, thus are more flexible to parameterize and can model a more expressive family of probability distributions. However, the unknown normalizing constant of EBMs makes training particularly difficult. Our goal is to provide a friendly introduction to modern approaches for EBM training. We start by explaining maximum likelihood training with Markov chain Monte Carlo (MCMC), and proceed to elaborate on MCMC-free approaches, including Score Matching (SM) and Noise Contrastive Estimation (NCE). We highlight theoretical connections among these three approaches, and end with a brief survey on alternative training methods, which are still under active research. Our tutorial is targeted at an audience with basic understanding of generative models who want to apply EBMs or start a research project in this direction.
- Methods of optimizing EBMs:
- MCMC
- Score Matching
- Noise Contrastive Estimation
- p_θ(x) is the dependent variable - i.e. the output we're interested in; E_θ(x) is the energy
- "Since Z_θ is a function of θ, evaluation and differentiation of log p_θ(x) w.r.t. its parameters involves a typically intractable integral."
- context for re-reading: Z_θ is the intractable normalization/scaling term
- Why does the fact that Z_θ depends on θ make training intractable?
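A tiny numerical illustration of the point above - a sketch using a hypothetical 1-D energy (my own toy choice, not from the paper), showing that the normalizing constant is itself a function of θ and so reappears in every gradient computation:

```python
import numpy as np

# Hypothetical toy energy E_theta(x) = theta * x^2.
# The normalizing constant
#   Z_theta = ∫ exp(-E_theta(x)) dx
# depends on theta, so differentiating log p_theta(x) w.r.t. theta
# drags in this integral.
def Z(theta):
    grid = np.linspace(-10.0, 10.0, 100001)
    dx = grid[1] - grid[0]
    # Riemann sum: only feasible because x is 1-D here; in high
    # dimensions there is no tractable analogue of this quadrature.
    return np.sum(np.exp(-theta * grid**2)) * dx

# For this toy energy the closed form is sqrt(pi / theta):
print(Z(1.0))  # ~1.7725 = sqrt(pi)
print(Z(2.0))  # ~1.2533 = sqrt(pi / 2)
```

Changing θ changes Z_θ, and for a neural-network energy over high-dimensional x there is no way to recompute the integral at each step - hence the need for MCMC or MCMC-free estimators.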
- MCMC:
    - If you can draw random samples from the model, you can optimize via maximum likelihood, since the gradient of the log-probability of an EBM decomposes - see eqn. 3: (1) the gradient of the energy, -∇_θ E_θ(x) (tractable via AD), and (2) the gradient of the log of the intractable denominator, -∇_θ log Z_θ, which can be estimated as an expectation of the energy gradient over samples from the model
- MCMC is non-trivial - lot of work on making sampling efficient
- Langevin MCMC
- Hamiltonian Monte Carlo (e.g. as implemented in Stan)
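The Langevin MCMC sampler above can be sketched in a few lines - a minimal unadjusted-Langevin sketch on a toy quadratic energy (the energy and the `grad_E` helper are my hypothetical choices for illustration, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy energy E(x) = x^2 / 2, so p(x) ∝ exp(-E(x)) is a standard normal.
# In a real EBM, grad_E would come from autodiff through the network.
def grad_E(x):
    return x

def langevin_sample(x0, step=0.01, n_steps=2000):
    # Unadjusted Langevin dynamics:
    #   x <- x - (step / 2) * ∇E(x) + sqrt(step) * noise
    # For small step sizes and many iterations, x approaches a sample
    # from p(x) ∝ exp(-E(x)).
    x = x0
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step * grad_E(x) + np.sqrt(step) * noise
    return x

# Run many chains in parallel; the empirical moments should match N(0, 1).
samples = langevin_sample(np.zeros(10000))
print(samples.mean(), samples.std())  # roughly 0 and 1
```

Note the sign convention: sampling targets p(x) ∝ exp(-E(x)), so the drift term follows -∇E, i.e. ∇ log p. The discretization introduces a small bias unless a Metropolis correction is added (MALA); HMC samplers such as Stan's add momentum variables to reduce random-walk behavior.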