- Machine Learning Systems - Principles and Practices of Engineering Artificially Intelligent Systems by Vijay Janapa Reddi
- extension of the CS249r course at Harvard University, taught by Prof. Vijay Janapa Reddi
- Clipped: Machine Learning Systems Last Updated: January 12, 2025
- The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman
- this is a cheat code
- CS229: Machine Learning by Tengyu Ma, Andrew Ng and Chris Ré
- A Course in Machine Learning by Hal Daumé III
- Unsupervised Feature Learning and Deep Learning Tutorial
- Oxford Machine Learning by Nando de Freitas
- An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
- Statistical Learning with Sparsity by Trevor Hastie, Robert Tibshirani and Martin Wainwright
- Pattern Recognition and Machine Learning by Christopher Bishop
- Foundations of Machine Learning by Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar
- Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David
- Statistical Modeling and Analysis of Neural Data (Spring 2018) by Jonathan Pillow
- Theoretical Machine Learning (Princeton Computer Science 511; Spring 2014) by Rob Schapire
- Shervine Amidi: Teaching - ML Cheat Sheets
- How to Train Your Robot by Brandon Rohrer - a book-in-progress about applied robotics, machine learning, and software engineering
- Machine Learning @Sapienza - Course material, 2nd semester a.y. 2023/2024, Mathematical Sciences for AI taught by Emanuele Rodolà
- rushter/MLAlgorithms: Minimal and clean examples of machine learning algorithms implementations
- https://probml.github.io/pml-book/book1.html
- Probabilistic Machine Learning: An Introduction (March 2022; draft PDF) by Kevin Patrick Murphy
- Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal and Cheng Soon Ong (landing page)
- Artificial Intelligence A Modern Approach
Regression
Production-oriented
Apple’s machine learning-powered APIs
These are more production- and creativity-oriented, letting developers build features and think through use cases.
- Apple Developer Machine Learning - Bring intelligent, on-device, machine learning-powered features, such as object detection in images and video, language analysis, and sound classification, to your app with just a few lines of code.
- Core ML - Core ML delivers blazingly fast performance on Apple devices with easy integration of machine learning and AI models into your apps. Convert models from popular training libraries using Core ML Tools or download ready-to-use Core ML models. Easily preview models and understand their performance right in Xcode.
- Building Bridges between Regression, Clustering, and Classification
Statistical Learning Theory (Spring 2001; CS 281B / Stat 241B) by Michael Jordan
- Tree Models
- Cross Validation, Regularization, and Information Criteria
- TIC/AIC
- Bayesian Model Selection
- MDL Introduction and Source Coding
- Minimum Description Length
- More on Marginal Likelihood
- Approximation of Marginal Likelihood
- Reversible Jump MCMC and Introduction to Kernel Methods (version 1)
- Reversible Jump MCMC and Introduction to Kernel Methods (version 2)
- Introduction to Support Vector Machines
- Lagrangian Duality
- Optimal Margin Classifiers
- Introduction to Kernels
- Support Vector Machines: Non-Separable Classification and Regression
- Kernel Principal Component Analysis
- Reproducing Kernel Hilbert Spaces
- Reproducing Kernel Hilbert Spaces II
- The Representer Theorem
- Regularization and RKHS
- Fourier Perspective on Regularization
- Gaussian Processes I
- Gaussian Processes II
- Gaussian Processes and Reproducing Kernels
- Background on Uniform Convergence Bounds
- Statistical Learning Theory: Finite Case I
- Statistical Learning Theory: Finite Case II
- Statistical Learning Theory: Symmetrization Lemma
- Annealed Entropy and Growth Function
- Vapnik-Chervonenkis Dimension
- Structural Risk Minimization
- Boosting I
- Boosting II
Statistical Learning Theory (Spring 2004; CS 281B / Stat 241B) by Michael Jordan
- Introduction
- Maximal margin classification
- Introduction to kernels
- Ridge regression and kernels
- Properties of kernels
- Soft-margin SVM, sparseness
- Regression, the SVD and PCA
- Kernel PCA and kernel CCA
- Incomplete Cholesky decomposition
- ANOVA kernels and diffusion kernels
- String kernels and marginalized kernels
- Fisher kernels and semidefinite programming
- Multiple kernels and RKHS introduction
- Reproducing kernel Hilbert spaces I
- Reproducing kernel Hilbert spaces II
- The Representer Theorem
- Gaussian processes I
- Gaussian processes II
- Gaussian processes and reproducing kernels
- Spectral clustering
- Spectral clustering, introduction to Bayesian methods
- Conjugacy and exponential family
- Importance sampling and MCMC
- Properties of Dirichlet distribution
- Dirichlet processes I
- Dirichlet processes II
- Dirichlet process mixtures I
- Dirichlet process mixtures II
- Probabilistic formulation of prediction problems
- Risk bounds, concentration inequalities
- Glivenko-Cantelli classes and Rademacher averages
- Growth function and VC-dimension
- Applications of Rademacher averages in large margin classification
- Growth function estimates for parameterized binary classes
- Covering numbers and metric entropy
- Chaining, Dudley’s entropy integral
- Covering numbers of VC classes
- Bernstein’s inequality, and generalizations
Books and Articles - Multivariate Statistics and Machine Learning by Michael Jordan (1998)
- T. S. Jaakkola and M. I. Jordan. Variational probabilistic inference and the QMR-DT database. October 1998.
- M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. December 1997.
- R. D. Shachter, S. K. Andersen, and P. Szolovits. Global conditioning for probabilistic inference in belief networks. December 1997.
- Kevin P. Murphy. Fitting a conditional Gaussian distribution.
- Michael I. Jordan. Notes on recursive least squares.
- M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm.
- M. I. Jordan and R. A. Jacobs. Learning in modular and hierarchical systems.
- Michael I. Jordan. Slides from a tutorial on clustering.
- Michael I. Jordan. Why the logistic function?
- Robert Cowell. Introduction to Inference in Bayesian Networks.
- Michael I. Jordan (Ed.). Learning in graphical models. MIT Press, Cambridge, MA, 1999.
- Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
- David Heckerman. Tutorial on Learning With Bayesian Networks, updated November 1996.