Section 19 · Lesson 19.86

Cross-Validation

Estimating out-of-sample performance honestly.

Cross-validation estimates how a model will perform on unseen data by splitting the available data into training and validation parts. The simplest version is $k$-fold: divide the data into $k$ groups, train on $k-1$ of them and validate on the held-out group, rotate so each group is held out once, and average the results.
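The rotation can be sketched in a few lines of plain Python. This is a minimal illustration, not a library API: the names `k_fold_indices` and `k_fold_score` and the `fit`/`score` callables are hypothetical, and the folds are contiguous blocks with no shuffling (for i.i.d. data you would usually shuffle the indices first).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs, holding out each fold exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


def k_fold_score(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average over folds.

    fit(train_x, train_y) returns a model; score(model, val_x, val_y) returns
    a number. Both are supplied by the caller (assumed interface).
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(
            score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx])
        )
    return sum(scores) / len(scores)


# Show which indices land in each split for a tiny data set.
for train_idx, val_idx in k_fold_indices(5, 2):
    print("train:", train_idx, "validate:", val_idx)
```

Every observation is used for validation exactly once and for training $k-1$ times, which is what makes the averaged score use all of the data.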

Why bother? In-sample error typically underestimates out-of-sample error because the model has been tuned to the very data it is evaluated on. Cross-validation counters this optimism by scoring each observation with a model that never saw it during fitting.
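A small worked comparison makes the gap concrete. The sketch below fits a straight line by least squares to made-up numbers (the data are purely illustrative) and compares the in-sample mean squared error with the leave-one-out estimate, which is $k$-fold with $k$ equal to the number of observations. The helper names `fit_line` and `mse` are assumptions for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: roughly linear with some scatter.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.3, 0.8, 2.6, 2.9, 4.4, 4.6, 6.5, 6.8]

# In-sample: fit and evaluate on the same points.
in_sample_mse = mse(fit_line(xs, ys), xs, ys)

# Leave-one-out: each point is predicted by a line fitted without it.
held_out_errors = []
for i in range(len(xs)):
    model = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    held_out_errors.append(mse(model, [xs[i]], [ys[i]]))
cv_mse = sum(held_out_errors) / len(held_out_errors)

print("in-sample MSE:", in_sample_mse)
print("leave-one-out MSE:", cv_mse)
```

For least-squares fits like this one, the leave-one-out error cannot fall below the in-sample error; for other model classes the gap is typical rather than guaranteed.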

Beware of look-ahead bias and shuffling in time series. For temporal data, use forward-chaining cross-validation (train on past, validate on future) rather than random splits, because random splits can leak future information into training.
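One way to generate forward-chaining splits is an expanding training window, sketched below. The function name `forward_chaining_splits` and its `min_train` parameter are hypothetical choices for this illustration, and the observations are assumed to be already sorted by time.

```python
def forward_chaining_splits(n_samples, n_splits, min_train=1):
    """Yield (train_idx, val_idx) where all training indices precede validation.

    The first min_train observations are always training data; the remainder
    is cut into n_splits consecutive validation blocks, and the training
    window grows to absorb each block after it has been validated on.
    """
    n_val_total = n_samples - min_train
    if n_splits < 1 or min_train < 1 or n_val_total < n_splits:
        raise ValueError("not enough observations for the requested splits")
    base, extra = divmod(n_val_total, n_splits)
    train_end = min_train
    for split in range(n_splits):
        val_end = train_end + base + (1 if split < extra else 0)
        yield list(range(train_end)), list(range(train_end, val_end))
        train_end = val_end


# Ten time-ordered observations, three validation blocks.
for train_idx, val_idx in forward_chaining_splits(10, 3, min_train=4):
    print("train up to index", train_idx[-1], "-> validate on", val_idx)
```

No index is ever shuffled, so a model is never validated on data older than what it was trained on. Libraries offer the same pattern ready-made, for example scikit-learn's `TimeSeriesSplit`.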

In quant trading, cross-validation results are necessary but not sufficient: even rigorous CV can produce overconfident estimates due to dataset re-use across many model trials. Walk-forward backtests and out-of-sample lockboxes provide harder-to-fool reality checks.
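The effect of re-using one dataset across many trials can be simulated. The sketch below is a deliberately simplified stand-in, not a real backtest: every "strategy" is pure Gaussian noise with zero true edge, the mean return on a selection sample stands in for a cross-validated score, and a separate lockbox sample is looked at only for the winner. All constants and names are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

N_TRIALS = 200   # hypothetical number of model variants tried
N_SELECT = 250   # observations re-used to compare every variant
N_LOCKBOX = 250  # observations held back and touched exactly once


def noise_returns(n):
    """Returns of a strategy with no real edge: standard normal noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


# Score every variant on the selection sample and, separately, on the lockbox.
trials = []
for _ in range(N_TRIALS):
    select_score = mean(noise_returns(N_SELECT))
    lockbox_score = mean(noise_returns(N_LOCKBOX))
    trials.append((select_score, lockbox_score))

# Pick the winner by selection score only, then open the lockbox for it.
best_select, best_lockbox = max(trials, key=lambda t: t[0])

print("best selection-sample mean return:", best_select)
print("same variant on the lockbox:", best_lockbox)
```

The winning selection score is the maximum of 200 noisy estimates of zero, so it is biased upward by construction, while the lockbox score for that variant is an ordinary single draw around zero. That gap is the overconfidence the lockbox is meant to expose.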

Why is random k-fold cross-validation a poor choice for time-series data?
