VC Dimension and the Fundamental Theorem of Statistical Learning — from Scratch May 2026 This post answers a single question: when does learning from data actually work? You train a model on samples, it performs well on those samples, and you hope it performs well on new data. When is that hope justified? The answer turns out to be a clean equivalence: a hypothesis class is learnable if and only if it has finite VC dimension. This is the Fundamental Theorem of Statistical Learning. We'll build every piece from scratch: Markov's inequality, Hoeffding's lemma, concentration inequalities, the union bound, VC dimension, the Sauer–Shelah lemma, symmetrization, and the generalization bounds that tie it all together. Nothing is assumed beyond basic probability (expectations, independence, indicator random variables). We prove both directions of the equivalence: the upper bound (finite VC dimension implies learnability) and the lower bound (via information theory: total variation, KL divergence, Pinsker's inequality, and Le Cam's method). This is Part 1 of two. Part 2 continues the story with Rademacher complexity — a probabilistic refinement of VC dimension that gives tighter, data-dependent bounds — and shows how all the pieces connect in one beautiful chain. 1. The Setup: What Does “Learning” Mean? We work in the simplest interesting setting: binary classification. Definition (The Learning Problem) Instance space \(\mathcal{X}\): the set of all possible inputs (images, documents, feature vectors, etc.). Label space \(\mathcal{Y} = \{0, 1\}\): binary labels. Distribution \(\mathcal{D}\): an unknown probability distribution over \(\mathcal{X} \times \mathcal{Y}\). Nature draws \((x, y) \sim \mathcal{D}\). Hypothesis class \(\mathcal{H} \subseteq \{0,1\}^{\mathcal{X}}\): a fixed set of classifiers \(h : \mathcal{X} \to \{0,1\}\) that the learner is allowed to pick from. Training sample \(S = ((x_1, y_1), \ldots, (x_m, y_m))\): drawn i.i.d. from \(\mathcal{D}\). Definition (...
First seen: 2026-05-24 02:46
Last seen: 2026-05-24 02:46