Section 19 · Lesson 19.1

Supervised vs Unsupervised

Learning from labels versus learning structure from data alone.

Supervised learning fits a function $f: X \to Y$ from labeled training data $(x_i, y_i)$. The labels $y_i$ may be discrete (classification: spam vs. not spam, default vs. no default) or continuous (regression: predicted return, predicted volatility).
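As a minimal sketch of the regression case, the snippet below fits a straight line to a handful of labeled pairs using the closed-form least-squares solution. The data values and the helper name `fit_line` are made up for illustration; real work would normally use a library and far more data.

```python
# Toy labeled data (x_i, y_i); the values are invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]  # continuous labels, so this is regression


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x          # slope
    b = mean_y - a * mean_x     # intercept
    return a, b


a, b = fit_line(xs, ys)


def f(x):
    """The learned function f: X -> Y."""
    return a * x + b


prediction = f(6.0)  # prediction for an input not in the training set
```

The labels drive the fit: changing `ys` changes `a` and `b`, which is what separates this from the unsupervised setting.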

Unsupervised learning has no labels, only inputs $x_i$. The goal is to find structure in the data: clusters of similar points, low-dimensional manifolds, anomalous outliers, or generative distributions.
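To illustrate the clustering case, here is a small sketch of k-means (Lloyd's algorithm) on unlabeled 2D points. The points, the choice of two clusters, and the initial centroids are all assumptions made for the example; k-means is only one of many clustering methods.

```python
import math

# Unlabeled toy inputs x_i (invented values): two visibly separate groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]


def kmeans(points, centroids, n_iter=10):
    """Plain Lloyd's algorithm; the caller supplies the initial centroids."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(coord) / len(members) for coord in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        centroids = new_centroids
    return labels, centroids


# Initial centroids are picked by hand here to keep the run deterministic.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

No `y_i` appears anywhere: the cluster indices in `labels` are produced by the algorithm, not supplied with the data.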

Semi-supervised learning mixes the two: a few labeled points and many unlabeled ones. Self-supervised learning, dominant in modern NLP, creates labels from the data itself (for example, predicting the next word given the previous ones).
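The next-word idea can be sketched in a few lines: raw text is turned into (context, target) training pairs with no human labeling. The sentence and the whitespace tokenization are simplifying assumptions; real language models use learned subword tokenizers and much larger corpora.

```python
# Raw, unlabeled text (invented for illustration).
text = "the market opened higher and the market closed lower"
tokens = text.split()  # naive whitespace tokenization, an assumption for this sketch

# Each training pair is (previous words, next word).
# The "label" is simply the token that follows the context in the data itself.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in pairs[:3]:
    print(context, "->", target)
```

Once the pairs exist, training proceeds like any supervised problem, which is why the approach is called self-supervised.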

In quant work, supervised methods predict returns, default risk, and execution slippage. Unsupervised methods find regime clusters, factor structures, and trade-pattern anomalies.

You have a dataset of $10{,}000$ daily stock movements with no labels and want to find groups of similarly behaving stocks. Which type of learning applies?
