An introduction to Hidden Markov Models. Many parts of this post are to be fleshed out when I have a moment.

Hidden Markov Models (HMMs) extend regular Markov models by associating a distribution of emission probabilities over observable events with each state of an underlying hidden Markov process. Notionally, we traverse a graph of unobserved states according to the Markov process and emit an observable event at each step. It is usually this unobserved state sequence that we are interested in, and so we infer it from the observed data. Notable applications of HMMs include video label propagation[^Badrinarayanan], part-of-speech (POS) tagging[^hmm-pos-tagging] and nucleotide labelling[^nucleotide-tagging], among many others.

Formally, a Hidden Markov Model is determined by

  1. A set of $N$ states - $Q = \{q_1, q_2, \dots, q_N\}$
  2. A transition probability matrix, $A = (a_{ij})$, representing the probability of moving from hidden state $q_i$ to state $q_j$ in the underlying Markov process - such that $\sum_{j=1}^{N} a_{ij} = 1$ for all $i$
  3. A sequence of observations, $O = o_1 o_2 \dots o_T$, drawn from an event space - $V = \{v_1, v_2, \dots, v_M\}$
  4. A distribution of emission probabilities, $b_i(o_t)$, for each hidden state, $q_i$ - the probability of observing event $o_t$ while in state $q_i$
  5. An initial probability distribution over states - $\pi = (\pi_1, \pi_2, \dots, \pi_N)$ such that $\sum_{i=1}^{N} \pi_i = 1$

where points 1, 2 and 5 merely determine the underlying Markov process that is hidden, point 3 is specifically the set of observations generated conditional on the hidden state $q_t$ at time $t$, and point 4 is the defining component of the HMM, namely the probability distributions associated with the hidden states that generate the observed events.

Note that the setup above implies a one-to-one relation between hidden states and observed events, but there are variants of HMMs, such as segmental HMMs used in speech recognition or semi-HMMs used in text processing, in which this one-to-one mapping is broken.
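To make these components concrete, here is a minimal sketch of a toy two-state HMM in NumPy, together with a function that samples a hidden state path and the observations it emits. All names and probability values ($A$, $B$, $\pi$, the "Rainy"/"Sunny" states and the three observable events) are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-state HMM (all values are made up for illustration).
# States: 0 = "Rainy", 1 = "Sunny"; observations: 0 = "walk", 1 = "shop", 2 = "clean".
A  = np.array([[0.7, 0.3],          # transition matrix, a_ij = P(q_t = j | q_{t-1} = i)
               [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5],     # emission matrix, b_i(o) = P(o | q = i)
               [0.6, 0.3, 0.1]])
pi = np.array([0.6, 0.4])           # initial state distribution

def sample(T: int):
    """Sample a hidden state path q_1..q_T and the observation sequence o_1..o_T it emits."""
    states, obs = [], []
    q = rng.choice(2, p=pi)                   # draw the initial hidden state from pi
    for _ in range(T):
        obs.append(rng.choice(3, p=B[q]))     # emit an observable event from state q
        states.append(q)
        q = rng.choice(2, p=A[q])             # move to the next hidden state
    return states, obs

print(sample(5))   # e.g. ([0, 0, 1, ...], [2, 1, 0, ...])
```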

## Core Problems (CP)

Rabiner (1989) introduced the idea that HMMs are characterised by three core problems:

  1. Likelihood: Given an HMM, $\lambda = (A, B, \pi)$, and an observation sequence, $O$, determine the likelihood $P(O \mid \lambda)$.
  2. Decoding: Given an observation sequence, $O$, and an HMM, $\lambda = (A, B, \pi)$, discover the best hidden state sequence, $Q$.
  3. Learning: Given an observation sequence, $O$, and the set of states in the HMM, learn the HMM parameters, $A$ and $B$.

## In a Simpler World

The likelihood (i.e. the probability of the data given specific parameters) that an HMM with parameters, $\lambda = (A, B, \pi)$, generates a state path, $Q = q_1 q_2 \dots q_T$, and an observed sequence, $O = o_1 o_2 \dots o_T$, is the product of (a) the emission probabilities of events given hidden states and (b) the transition probabilities over states:

$$P(O, Q \mid \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t) \times \prod_{t=1}^{T} P(q_t \mid q_{t-1})$$

where the first factor of the transition product, $P(q_1 \mid q_0)$, is taken to be the initial probability $\pi_{q_1}$.

With this product (CP1: Likelihood), we could in principle compute the likelihood of every possible path of hidden-state transitions and, for each of those, the possible observation sequences it generates. Given some observed data, we could then select the path with the maximum likelihood and infer that it was the most probable path in light of our observations (CP2: Decoding). The catch is that there are $N^T$ possible state paths, so this brute-force enumeration quickly becomes intractable, which is what motivates the dynamic-programming algorithms below.
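As a sketch of the brute-force idea (reusing the same toy HMM values as above, which are purely illustrative), the code below computes the joint likelihood $P(O, Q \mid \lambda)$ of a candidate path and then enumerates every one of the $N^T$ possible paths to find the most likely one for a short observation sequence:

```python
from itertools import product
import numpy as np

# Same toy two-state HMM as in the earlier sketch (values invented for illustration).
A  = np.array([[0.7, 0.3], [0.4, 0.6]])            # transition matrix
B  = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])  # emission matrix
pi = np.array([0.6, 0.4])                          # initial state distribution

def joint_likelihood(O, Q):
    """P(O, Q | lambda): product of emission and transition probabilities along path Q."""
    p = pi[Q[0]] * B[Q[0], O[0]]
    for t in range(1, len(O)):
        p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]
    return p

def brute_force_decode(O):
    """Enumerate all N**T hidden paths (exponential!) and keep the most likely one."""
    N = len(pi)
    return max(product(range(N), repeat=len(O)), key=lambda Q: joint_likelihood(O, Q))

O = [2, 1, 0]                      # a short observation sequence
print(brute_force_decode(O))       # most probable hidden path, e.g. (0, 0, 1)
```

Even for this tiny example the number of paths grows as $2^T$, which is why the Forward algorithm below computes the likelihood with dynamic programming rather than explicit enumeration.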

### CP1. Likelihood

#### CP1 Solution: The Forward Algorithm


$$\begin{aligned}
& \text{function Forward(observations of length T, state-graph of length N)} \rightarrow \text{forward-prob:} \\
& \hspace{3em} \text{create a probability matrix forward[N, T]} \\
& \hspace{3em} \text{for each state s from 1 to N do} && \text{; initialization step} \\
& \hspace{6em} \text{forward[s, 1]} \leftarrow \pi_s \cdot b_s(o_1) \\
& \hspace{3em} \text{for each time step t from 2 to T do} && \text{; recursion step} \\
& \hspace{6em} \text{for each state s from 1 to N do} \\
& \hspace{9em} \text{forward[s, t]} \leftarrow \sum_{s'=1}^N \text{forward[s', t-1]} \cdot a_{s',\ s} \cdot b_s(o_t) \\
& \hspace{3em} \text{forward-prob} \leftarrow \sum_{s=1}^N \text{forward[s, T]} && \text{; termination step} \\
& \text{return forward-prob}
\end{aligned}$$

---

A runnable NumPy version of the pseudocode above (a minimal sketch; the state-graph is represented here by the matrices $A$, $B$ and the vector $\pi$ from the definition):

```python
import numpy as np

def forward(O, A, B, pi) -> float:
    """Forward algorithm: compute P(O | lambda) for an HMM with transition matrix A (N x N),
    emission matrix B (N x M) and initial distribution pi (N,).
    O is a sequence of observation indices."""
    T, N = len(O), len(pi)
    fwd = np.zeros((N, T))                          # probability matrix forward[N, T]
    fwd[:, 0] = pi * B[:, O[0]]                     # initialization step
    for t in range(1, T):                           # recursion step
        fwd[:, t] = (fwd[:, t - 1] @ A) * B[:, O[t]]
    return fwd[:, -1].sum()                         # termination step
```

### CP2. Decoding

### CP3. Learning

---

If you spot any errors or have any constructive comments, please let me know.

---

## References

- Sean R. Eddy (2004) What is a hidden Markov model? Nature Biotechnology. [[PDF](/files/post-2021-07-15-hidden-markov-models/Sean-R-Eddy-2004-What-is-a-hidden-Markov-model-Nature-Biotechnology.pdf)]
- Daniel Jurafsky & James H. Martin (2020) Chapter A: Hidden Markov Models, in Speech and Language Processing. <https://web.stanford.edu/~jurafsky/slp3/A.pdf>.
- Rabiner, L.R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–286. <https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf>.
- [Machine Learning for OR & FE - Hidden Markov Models - HMMs_MasterSlides.pdf](https://martin-haugh.github.io/files/MachineLearningORFE/HMMs_MasterSlides.pdf)

---

[^Badrinarayanan]: Vijay Badrinarayanan, Fabio Galasso and Roberto Cipolla (2014) Label Propagation in Video Sequences. CVPR. <https://ieeexplore.ieee.org/document/5540054>.
[^hmm-pos-tagging]: Daniel Jurafsky & James H. Martin (2020) Chapter A: Hidden Markov Models, in Speech and Language Processing. <https://web.stanford.edu/~jurafsky/slp3/A.pdf>
[^nucleotide-tagging]: Sean R. Eddy (2004) What is a hidden Markov model? Nature Biotechnology. [[PDF](/files/post-2021-07-15-hidden-markov-models/Sean-R-Eddy-2004-What-is-a-hidden-Markov-model-Nature-Biotechnology.pdf)]