Machine Learning Study Guide

Supervised and unsupervised learning, neural networks, optimization, deep learning architectures, and modern ML systems — the foundations for ML interviews.

Section 1: Supervised Learning Foundations

4 lessons

The Learning Problem
Inputs, outputs, hypothesis classes, and what 'learning' actually means.
Loss Functions
How squared error, cross-entropy, hinge, and Huber implicitly define what the model optimizes for.
Bias-Variance Tradeoff
Why more capacity doesn't fix everything: expected test error decomposes into bias, variance, and irreducible noise.
Generalization and ERM
Why minimizing training loss is not the same as minimizing test loss.
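To make the Loss Functions blurb concrete, here is a minimal pure-Python sketch of the four losses for a single example. The conventions are assumptions chosen for illustration: labels in {0, 1} for cross-entropy, labels in {-1, +1} for hinge, the 1/2 factor in Huber, and a default `delta=1.0`.

```python
import math

def squared_error(y, y_hat):
    # Quadratic penalty: large residuals dominate the objective.
    return (y - y_hat) ** 2

def huber(y, y_hat, delta=1.0):
    # Quadratic for |residual| <= delta, linear beyond it, so outliers pull less.
    r = abs(y - y_hat)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * (r - 0.5 * delta)

def binary_cross_entropy(y, p, eps=1e-12):
    # y in {0, 1}; p is the predicted probability of class 1, clipped away from 0 and 1.
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def hinge(y, score):
    # y in {-1, +1}; score is a raw margin. Loss is zero once y * score >= 1.
    return max(0.0, 1.0 - y * score)
```

For a residual of 10, `squared_error` returns 100 while `huber` returns 9.5. This is the sense in which the choice of loss decides how much a single outlier can move the fit.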

Section 2: Linear Models

4 lessons

Linear Regression
The model everything else is compared against. Closed-form solution, assumptions, when it works.
Regularization
Ridge, lasso, elastic net — and why the geometry of the penalty determines what kind of solution you get.
Logistic Regression
Linear models for binary classification — sigmoid, log-odds, and why it's still the workhorse.
Softmax for Multiclass
Generalizing logistic regression to K classes — softmax, cross-entropy, and the redundancy you need to remove.
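The redundancy mentioned in the Softmax blurb comes from shift invariance. Adding the same constant to every logit (equivalently, the same vector to every class's weights) leaves the probabilities unchanged, so one class's parameters can be pinned to zero. The sketch below assumes plain Python lists for logits. It shows the invariance and checks that pinning one logit to 0 with K = 2 recovers the logistic sigmoid.

```python
import math

def softmax(logits):
    # Subtracting the max logit avoids overflow. This is safe precisely because
    # softmax is invariant to adding the same constant to every logit.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

logits = [2.0, 1.0, 0.1]
shifted = [z + 100.0 for z in logits]  # same shift applied to every class

p = softmax(logits)   # probabilities from the original logits
q = softmax(shifted)  # identical up to floating-point rounding

# With K = 2 and the second logit pinned to 0, softmax reduces to sigmoid.
two_class = softmax([1.5, 0.0])[0]
```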

Section 3: Tree Ensembles

4 lessons

Decision Trees
Recursive splits, impurity measures, and why a single tree is rarely the right answer.
Random Forests
Bagging plus feature subsampling — variance reduction through deliberate decorrelation.
Gradient Boosting
Sequentially fitting trees to the negative gradient of the loss (the ordinary residuals, under squared error): the canonical recipe for tabular data.
XGBoost Internals
Second-order optimization, regularized objective, and the engineering tricks that made GBMs production-grade.
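A minimal sketch of the gradient boosting recipe, under simplifying assumptions: squared-error loss (so the negative gradient is exactly the residual), a single numeric feature, and depth-1 trees (stumps). The names `fit_stump` and `gradient_boost` and the `n_rounds` and `learning_rate` defaults are illustrative, not any library's API.

```python
def fit_stump(xs, targets):
    # Try every split point on one feature. Each leaf predicts the mean target of its points.
    best = None
    for t in sorted(set(xs))[:-1]:  # splitting at the max would leave the right side empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x identical: fall back to a constant
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # the best constant under squared error is the mean
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For L = 0.5 * (y - F)^2, the negative gradient w.r.t. F is y - F.
        # A different loss would change only this line.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        stumps.append(h)
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(h(x) for h in stumps)

    return predict

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
```

The shrinkage factor `learning_rate` is why each tree corrects only part of the remaining error. In this toy example, the residual shrinks by a factor of 0.9 per round.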

Section 4: Neural Networks

4 lessons

Multi-Layer Perceptrons
The basic feedforward architecture — fully connected layers, activations, and the universal approximation theorem.
Backpropagation
Reverse-mode automatic differentiation — the algorithm that makes deep learning possible.
Activations and Initialization
Why ReLU won, what initialization schemes do, and how the two interact.
Regularization for Deep Networks
Dropout, batch norm, weight decay, data augmentation — what they actually do and when each helps.
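To illustrate what reverse-mode differentiation does, here is a minimal scalar sketch. The `Value` class is hypothetical and supports only `+` and `*`. Each operation records its local derivative rule. `backward()` then visits nodes in reverse topological order, so a node's gradient is complete before it is pushed to that node's inputs.

```python
class Value:
    """Hypothetical scalar node that records how to push gradients back to its inputs."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topological sort: each node appears after all of its inputs.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # Seed d(out)/d(out) = 1, then apply the chain rule from output back to inputs.
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

x, y = Value(2.0), Value(3.0)
f = x * y + x  # df/dx = y + 1, df/dy = x
f.backward()
```

Because gradients are accumulated with `+=`, a variable used twice (here `x`) correctly receives the sum of both paths' contributions. This is the multivariate chain rule that backpropagation applies layer by layer.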

Section 5: Deep Learning Architectures

4 lessons

Convolutional Networks
Translation equivariance, parameter sharing, and the receptive field — why CNNs were the right answer for images.
RNNs and LSTMs
Modeling sequences with shared parameters across time — and what made plain RNNs unworkable for long sequences.
Attention
Query, key, value — and why content-based addressing changed sequence modeling.
Transformers
Architecture, positional encodings, layer norm placement, and why this design dominated.
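Content-based addressing, as described in the Attention blurb, can be sketched as single-head scaled dot-product attention over plain Python lists. This sketch assumes queries, keys, and values are given directly. It omits the learned projections, masking, and multiple heads that a Transformer layer would add.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    # queries: n_q x d, keys: n_k x d, values: n_k x d_v (lists of lists).
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score every key by its dot product with the query, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # where to read is decided by content, not position
        # Output is the weight-averaged value vector.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

queries = [[5.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out, w = attention(queries, keys, values)
```

The query aligns with the first key, so nearly all of the weight lands there and the output is close to the first value vector.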

Section 6: Training Modern Models

4 lessons

Optimizers
SGD, Adam, AdamW: what each actually does and how they differ.
Learning Rate Schedules
Warmup, cosine, step decay: scheduling the learning rate, often cited as the single most impactful hyperparameter.
Mixed-Precision Training
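FP16/BF16 forward and backward passes: up to roughly 2x faster and much less activation memory on hardware with fast low-precision units, if you handle loss scaling (for FP16) and FP32 master weights correctly.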
Distributed Training
Data, model, and pipeline parallelism: how to scale training across many GPUs, including models that don't fit on one.
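The warmup-plus-cosine shape named in the Learning Rate Schedules blurb can be written as a pure function of the step. This sketch assumes linear warmup followed by a single cosine half-period that ends at `min_lr`. The function and its parameters are illustrative, not a specific framework's scheduler API.

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        # Ramp up linearly so early, noisy updates stay small.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With `base_lr=1e-3`, `warmup_steps=100`, and `total_steps=1000`, the rate climbs to 1e-3 over the first 100 steps, passes 5e-4 halfway through the decay, and reaches 0 at step 1000. It stays there if training runs longer.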

Section 7: Evaluation and Model Selection

4 lessons

Cross-Validation
K-fold, stratified, group, and time-series CV — and the leakage pitfalls each is designed to avoid.
Hyperparameter Tuning
Grid, random, Bayesian, and Hyperband — when each makes sense.
A/B Testing ML Models
Holding out users vs. holding out predictions — what's actually being measured and what isn't.
Calibration and Decision Thresholds
When predicted probabilities aren't probabilities, and why your 0.5 cutoff is rarely the right one.
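The point that 0.5 is rarely the right cutoff can be made precise under one assumption: the predicted probability p is calibrated. Predicting positive then costs (1 - p) * cost_fp in expectation, and predicting negative costs p * cost_fn. Positive wins when p exceeds cost_fp / (cost_fp + cost_fn). The helper names and the toy scores below are hypothetical and serve only as illustration.

```python
def cost_optimal_threshold(cost_fp, cost_fn):
    # Break-even point between the expected costs of the two decisions.
    # Only meaningful if the scores being thresholded are calibrated probabilities.
    return cost_fp / (cost_fp + cost_fn)

def expected_cost(probs, labels, threshold, cost_fp, cost_fn):
    # Average realized misclassification cost of a threshold on labeled data.
    total = 0.0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 0:
            total += cost_fp
        elif pred == 0 and y == 1:
            total += cost_fn
    return total / len(labels)

# Toy scores and labels, for illustration only.
probs = [0.3, 0.3, 0.3, 0.9, 0.1]
labels = [1, 0, 1, 1, 0]
t = cost_optimal_threshold(cost_fp=1.0, cost_fn=4.0)  # false negatives 4x as costly
```

When a miss costs four times as much as a false alarm, the break-even threshold drops to 0.2. In this toy data, 0.2 incurs a lower average cost than the default 0.5.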

Section 8: Production ML

4 lessons

Feature Stores
Why training-serving skew happens and the systems built to prevent it.
Model Serving
Batch vs online, latency budgets, batching, and the operational realities of running models in production.
Monitoring and Drift Detection
Input drift, prediction drift, performance decay — what to alert on and what to ignore.
MLOps Pipelines
From notebook to production — the engineering glue that ties training, deployment, and monitoring together.
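One common heuristic for the input and prediction drift mentioned under Monitoring and Drift Detection is the population stability index (PSI). It compares a binned reference distribution with a binned live distribution. This sketch assumes both histograms share the same bin edges and floors empty bins at a small `eps`. Alert thresholds such as 0.1 and 0.25 are often quoted for PSI, but they are rules of thumb rather than calibrated statistical tests.

```python
import math

def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between two histograms computed over the same bins."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor empty bins at eps so the log term stays finite.
        pe = max(e / e_total, eps)
        pa = max(a / a_total, eps)
        psi += (pa - pe) * math.log(pa / pe)
    return psi
```

Identical histograms give a PSI of 0. A shift from a 50/50 split to a 90/10 split gives a value well above 0.25.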