Training-serving skew is when the features a model sees at serving time differ from the features it was trained on. It is one of the most common ML production bugs and among the most expensive, because training and offline evaluation look fine and the model only fails after deployment.
How skew happens
- Different code paths: training is a Python pipeline; serving is Go or a SQL view. Subtle differences in tokenization, missing-value handling, or rounding produce different feature values.
- Different data sources: training reads from a warehouse snapshot; serving reads from a live OLTP database. The schemas have drifted.
- Different time semantics: training computes "user's purchases in last 30 days" over a calendar window of whole days; serving computes it as "purchases in the 30 days before this exact moment." The two windows cover different events, so the same user gets different values.
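The time-semantics case is easy to reproduce. The following is a minimal sketch, not code from any real pipeline: the purchase timestamps, the function names, and the choice of "whole days ending at the previous midnight" as the calendar window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical purchase timestamps for one user.
PURCHASES = [
    datetime(2024, 3, 2, 9, 0),
    datetime(2024, 3, 15, 18, 30),
    datetime(2024, 3, 31, 8, 0),
]

def purchases_30d_calendar(purchases, as_of):
    """Training-style window: 30 whole days ending at midnight before as_of."""
    end = datetime.combine(as_of.date(), datetime.min.time())
    start = end - timedelta(days=30)
    return sum(start <= p < end for p in purchases)

def purchases_30d_rolling(purchases, as_of):
    """Serving-style window: exactly 30 * 24 hours back from the request instant."""
    start = as_of - timedelta(days=30)
    return sum(start < p <= as_of for p in purchases)

request_time = datetime(2024, 3, 31, 12, 0)
calendar_count = purchases_30d_calendar(PURCHASES, request_time)
rolling_count = purchases_30d_rolling(PURCHASES, request_time)

# Same user, same raw data, same nominal feature, two different values.
print(calendar_count, rolling_count)
```

The purchase made on the morning of the request falls inside the rolling window but outside the calendar window, so the model is served a value it never saw under that definition during training.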
Feature stores
A feature store is a centralized service that:
- Stores feature definitions (the code that computes a feature from raw data) once.
- Materializes features into both a training-time store (warehouse, Parquet) and a serving-time store (low-latency KV like Redis or DynamoDB).
- Guarantees the *same* feature value at training and serving time, given the same entity and timestamp.
Examples: Feast (open source), Tecton, Vertex AI Feature Store, Databricks Feature Store.
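To make the "define once, materialize twice" idea concrete, here is a toy in-memory sketch. It does not use the API of Feast, Tecton, or any other product; `ToyFeatureStore`, its methods, and the `basket_total` feature are hypothetical names invented for this example, and plain Python containers stand in for the warehouse and the KV store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class ToyFeatureStore:
    # Feature name -> the single function that computes it from a raw record.
    definitions: Dict[str, Callable[[dict], float]] = field(default_factory=dict)
    # Offline store: append-only history of (entity, timestamp, name, value).
    offline: List[Tuple[str, int, str, float]] = field(default_factory=list)
    # Online store: latest value per (entity, name), for low-latency lookup.
    online: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        self.definitions[name] = fn

    def materialize(self, entity: str, ts: int, raw: dict) -> None:
        """Compute every feature once and write the same value to both stores.

        Assumes calls arrive in timestamp order, so the last write is the latest.
        """
        for name, fn in self.definitions.items():
            value = fn(raw)
            self.offline.append((entity, ts, name, value))
            self.online[(entity, name)] = value

    def get_online(self, entity: str, name: str) -> float:
        return self.online[(entity, name)]

store = ToyFeatureStore()
store.register("basket_total", lambda raw: round(sum(raw.get("prices", [])), 2))
store.materialize("user_1", 100, {"prices": [9.99, 5.01]})
store.materialize("user_1", 200, {"prices": [20.0]})

latest_offline = store.offline[-1]
latest_online = store.get_online("user_1", "basket_total")
```

Because both stores are written from the same function call, missing-value handling and rounding cannot diverge between training and serving. A real system also has to handle late-arriving data, backfills, and expiry, which this sketch ignores.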
Point-in-time correctness
The hard part. Training requires features as they looked at the time of each prediction event. A naive join on entity alone, which picks up the latest available value regardless of the event's timestamp, leaks future information into training.
A feature store handles this with as-of joins: for each training row at time $t$, fetch the most recent feature value with timestamp $\le t$. Doing this correctly is fiddly; centralizing it in one system is worth a lot.
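An as-of join can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not how any particular feature store implements it: `as_of_join`, the integer timestamps, and the sample data are hypothetical, and each entity's history is assumed to be sorted by timestamp.

```python
from bisect import bisect_right

def as_of_join(events, feature_history):
    """For each (entity, t) event, attach the latest feature value with ts <= t.

    feature_history maps entity -> list of (ts, value), sorted by ts.
    The value is None when no feature value existed yet at time t.
    """
    joined = []
    for entity, t in events:
        history = feature_history.get(entity, [])
        timestamps = [ts for ts, _ in history]
        idx = bisect_right(timestamps, t)  # count of entries with ts <= t
        value = history[idx - 1][1] if idx > 0 else None
        joined.append((entity, t, value))
    return joined

feature_history = {"user_1": [(100, 1.0), (200, 2.0), (300, 3.0)]}
events = [("user_1", 50), ("user_1", 200), ("user_1", 250)]

rows = as_of_join(events, feature_history)

# The naive alternative: always take the latest value, whatever the event time.
naive_latest = feature_history["user_1"][-1][1]
```

The event at time 250 gets the value written at 200, while the naive lookup would hand it the value from 300, which did not exist yet when the prediction was made. For tabular data, pandas provides `merge_asof` for this kind of join.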
When to actually use a feature store
- Multiple models share features
- You have a serving-time latency budget you can't blow on recomputation
- Skew bugs have bitten you before
When you don't need one:
- One model, one team, batch scoring only
- Features are computed inline from request payload (no historical aggregation)
- Prototyping