Feature Stores

Training-serving skew occurs when the feature values a model sees at serving time differ from the ones it was trained on. It is one of the most common ML production bugs and one of the most expensive, because the model trains fine and only fails after deployment.

How skew happens

Different code paths: training is a Python pipeline; serving is Go or a SQL view. Subtle differences in tokenization, missing-value handling, or rounding produce different feature values.
Different data sources: training reads from a warehouse snapshot; serving reads from a live OLTP database. The schemas have drifted.
Different time semantics: training computes "user's purchases in last 30 days" over a calendar window (whole days); serving computes it over a rolling window, "purchases in the last 30 days from this exact moment."
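The time-semantics mismatch in the last item can be made concrete with a small sketch. The purchase timestamps, the function names, and the exact window boundaries are assumptions chosen for illustration; real pipelines will differ in the details, but the shape of the bug is the same.

```python
from datetime import datetime, timedelta

# Hypothetical purchase timestamps for one user.
purchases = [
    datetime(2024, 2, 20, 9, 0),
    datetime(2024, 3, 1, 12, 0),
    datetime(2024, 3, 20, 8, 0),
    datetime(2024, 3, 20, 18, 0),
]

def purchases_30d_calendar(events, now):
    """Training-style window: the 30 whole calendar days before today."""
    end = datetime(now.year, now.month, now.day)  # midnight today, exclusive
    start = end - timedelta(days=30)
    return sum(start <= e < end for e in events)

def purchases_30d_rolling(events, now):
    """Serving-style window: the 30 days ending at this exact moment."""
    start = now - timedelta(days=30)
    return sum(start < e <= now for e in events)

now = datetime(2024, 3, 20, 15, 0)
calendar_value = purchases_30d_calendar(purchases, now)  # 2
rolling_value = purchases_30d_rolling(purchases, now)    # 3
```

Both functions are a defensible reading of "purchases in last 30 days", yet they disagree for the same user at the same moment because only the rolling window counts this morning's purchase.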

A feature store is a centralized service that:

Stores feature definitions (the code that computes a feature from raw data) once.
Materializes features into both a training-time store (warehouse, Parquet) and a serving-time store (low-latency KV like Redis or DynamoDB).
Guarantees the *same* feature value at training and serving time, given the same entity and timestamp.

Examples: Feast (open source), Tecton, Vertex AI Feature Store, Databricks Feature Store.
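As a toy illustration of the three responsibilities listed above, here is a minimal in-memory sketch. `ToyFeatureStore` and all of its methods are hypothetical names invented for this example; it is not the API of Feast or of any other product named here, and it ignores persistence, batching, and concurrency.

```python
from bisect import bisect_right

class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a feature store."""

    def __init__(self):
        self._definitions = {}  # feature name -> function(raw) -> value
        self._offline = {}      # (name, entity) -> [(ts, value)], sorted by ts
        self._online = {}       # (name, entity) -> (ts, value), latest only

    def register(self, name, fn):
        """Store the feature definition once."""
        self._definitions[name] = fn

    def materialize(self, name, entity, ts, raw):
        """Compute the value once and write it to both stores."""
        value = self._definitions[name](raw)
        history = self._offline.setdefault((name, entity), [])
        history.append((ts, value))
        history.sort(key=lambda row: row[0])
        latest = self._online.get((name, entity))
        if latest is None or ts >= latest[0]:
            self._online[(name, entity)] = (ts, value)

    def get_historical(self, name, entity, ts):
        """Training read: latest value with timestamp <= ts, else None."""
        history = self._offline.get((name, entity), [])
        idx = bisect_right([t for t, _ in history], ts)
        return history[idx - 1][1] if idx else None

    def get_online(self, name, entity):
        """Serving read: the current value, else None."""
        row = self._online.get((name, entity))
        return row[1] if row else None

store = ToyFeatureStore()
store.register("avg_order_value", lambda orders: sum(orders) / len(orders))
store.materialize("avg_order_value", "user_1", ts=1, raw=[10.0, 30.0])
store.materialize("avg_order_value", "user_1", ts=5, raw=[10.0, 30.0, 50.0])

training_value = store.get_historical("avg_order_value", "user_1", ts=5)
serving_value = store.get_online("avg_order_value", "user_1")
```

Because both stores are filled from a single call to the registered function, the training read and the serving read agree for the same entity and timestamp. There is no second code path that could round or tokenize differently.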

Point-in-time correctness

The hard part. Training requires features as they would have looked at the time of the prediction event. A naive join that matches on entity and takes the latest available value leaks future information into training: the model learns from values that did not exist yet when the prediction was made.

A feature store handles this with as-of joins: for each training row at time $t$, fetch the most recent feature value with timestamp $\leq t$. Doing this correctly is fiddly; centralizing it in one system is worth a lot.
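The as-of rule can be sketched in a few lines of plain Python and compared against the leaky "latest value" join. The feature log, the training rows, and the helper names below are made up for illustration, and timestamps are plain integers to keep the example short.

```python
from bisect import bisect_right

# Hypothetical feature log: (entity, feature_timestamp, value), in any order.
feature_log = [
    ("user_1", 10, 2),
    ("user_1", 20, 5),
    ("user_1", 30, 9),
    ("user_2", 15, 1),
]

# Hypothetical training rows: (entity, prediction_timestamp t).
training_rows = [("user_1", 25), ("user_1", 5), ("user_2", 15)]

def build_index(log):
    """Group feature rows by entity, sorted by timestamp."""
    index = {}
    for entity, ts, value in sorted(log, key=lambda r: (r[0], r[1])):
        times, values = index.setdefault(entity, ([], []))
        times.append(ts)
        values.append(value)
    return index

def as_of_join(rows, index):
    """For each row at time t, take the latest value with timestamp <= t."""
    out = []
    for entity, t in rows:
        times, values = index.get(entity, ([], []))
        pos = bisect_right(times, t)  # count of feature rows with ts <= t
        out.append(values[pos - 1] if pos else None)
    return out

def naive_latest_join(rows, index):
    """Leaky baseline: always take the newest value, ignoring t."""
    return [index[e][1][-1] if e in index else None for e, _ in rows]

index = build_index(feature_log)
correct = as_of_join(training_rows, index)       # [5, None, 1]
leaky = naive_latest_join(training_rows, index)  # [9, 9, 1]
```

The naive join hands the first two training rows the value 9, which was only written at timestamp 30, after both predictions. The as-of join returns 5 for the row at $t = 25$ and nothing for the row at $t = 5$, where no feature value existed yet. On DataFrames, pandas `merge_asof` with its default backward direction applies the same rule.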

When to actually use a feature store

Multiple models share features
You have a serving-time latency budget you can't blow on recomputation
Skew bugs have bitten you before

When you don't need one:

One model, one team, batch scoring only
Features are computed inline from request payload (no historical aggregation)
Prototyping