- Building Bridges between Regression, Clustering, and Classification
- XGBoost: A Scalable Tree Boosting System
- Learnability and the Vapnik-Chervonenkis dimension
Statistical Learning Theory (Spring 2001; CS 281B / Stat 241B) by Michael Jordan
- Tree Models
- Cross Validation, Regularization, and Information Criteria
- TIC/AIC
- Bayesian Model Selection
- MDL Introduction and Source Coding
- Minimum Description Length
- More on Marginal Likelihood
- Approximation of Marginal Likelihood
- Reversible Jump MCMC and Introduction to Kernel Methods (version 1)
- Reversible Jump MCMC and Introduction to Kernel Methods (version 2)
- Introduction to Support Vector Machines
- Lagrangian Duality
- Optimal Margin Classifiers
- Introduction to Kernels
- Support Vector Machines: Non-Separable Classification and Regression
- Kernel Principal Component Analysis
- Reproducing Kernel Hilbert Spaces
- Reproducing Kernel Hilbert Spaces II
- The Representer Theorem
- Regularization and RKHS
- Fourier Perspective on Regularization
- Gaussian Processes I
- Gaussian Processes II
- Gaussian Processes and Reproducing Kernels
- Background on Uniform Convergence Bounds
- Statistical Learning Theory: Finite Case I
- Statistical Learning Theory: Finite Case II
- Statistical Learning Theory: Symmetrization Lemma
- Annealed Entropy and Growth Function
- Vapnik-Chervonenkis Dimension
- Structural Risk Minimization
- Boosting I
- Boosting II
Statistical Learning Theory (Spring 2004; CS 281B / Stat 241B) by Michael Jordan
- Introduction [ps] [pdf]
- Maximal margin classification [ps] [pdf]
- Introduction to kernels [ps] [pdf]
- Ridge regression and kernels [ps] [pdf]
- Properties of kernels [ps] [pdf]
- Soft-margin SVM, sparseness [ps] [pdf]
- Regression, the SVD and PCA [ps] [pdf]
- Kernel PCA and kernel CCA
- Incomplete Cholesky decomposition [ps] [pdf]
- ANOVA kernels and diffusion kernels [ps] [pdf]
- String kernels and marginalized kernels [ps] [pdf]
- Fisher kernels and semidefinite programming [ps] [pdf]
- Multiple kernels and RKHS introduction [ps] [pdf]
- Reproducing kernel Hilbert spaces I [ps] [pdf]
- Reproducing kernel Hilbert spaces II [ps] [pdf]
- The Representer Theorem [ps] [pdf]
- Gaussian processes I [ps] [pdf]
- Gaussian processes II [ps] [pdf]
- Gaussian processes and reproducing kernels [ps] [pdf]
- Spectral clustering [ps] [pdf]
- Spectral clustering, introduction to Bayesian methods [ps] [pdf]
- Conjugacy and exponential family [ps] [pdf]
- Importance sampling and MCMC
- Properties of Dirichlet distribution [ps] [pdf]
- Dirichlet processes I [ps] [pdf]
- Dirichlet processes II [ps] [pdf]
- Dirichlet process mixtures I [ps] [pdf]
- Dirichlet process mixtures II [ps] [pdf]
- Probabilistic formulation of prediction problems [ps]
- Risk bounds, concentration inequalities [ps]
- Glivenko-Cantelli classes and Rademacher averages [ps]
- Growth function and VC-dimension [ps]
- Applications of Rademacher averages in large margin classification [ps]
- Growth function estimates for parameterized binary classes [ps]
- Covering numbers and metric entropy [ps]
- Chaining, Dudley's entropy integral [ps]
- Covering numbers of VC classes [ps]
- Bernstein's inequality, and generalizations [ps]
Books and Articles - Multivariate Statistics and Machine Learning by Michael Jordan (1998)
T. S. Jaakkola and M. I. Jordan
Variational probabilistic inference and the QMR-DT database
October, 1998
M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul
An introduction to variational methods for graphical models
December, 1997
R. D. Shachter, S. K. Andersen, and P. Szolovits
Global conditioning for probabilistic inference in belief networks
December, 1997
Kevin P. Murphy
Fitting a conditional Gaussian distribution
Michael I. Jordan
Notes on recursive least squares
M. I. Jordan and R. A. Jacobs
Hierarchical mixtures of experts and the EM algorithm
M. I. Jordan and R. A. Jacobs
Learning in modular and hierarchical systems
Michael I. Jordan
Slides from a tutorial on clustering
Michael I. Jordan
Why the logistic function?
Robert Cowell
Introduction to Inference in Bayesian Networks
Michael I. Jordan (Ed.)
Learning in graphical models. MIT Press, Cambridge, MA, 1999.
Christopher M. Bishop
Neural Networks for Pattern Recognition. Oxford University Press, 1995.
David Heckerman
Tutorial on Learning With Bayesian Networks, updated November 1996.