Economics makes causal claims — minimum wages affect employment, education raises earnings, institutions determine growth. Testing these claims requires data and a method for distinguishing causation from correlation. Econometrics is that method.
This chapter is not a statistics course. We assume familiarity with basic probability and regression. Instead, we focus on the central problem of empirical economics: identification — finding credible sources of exogenous variation that allow us to estimate causal effects. Every tool in this chapter — OLS, instrumental variables, difference-in-differences, regression discontinuity — is a strategy for solving the identification problem.
Prerequisites: Chapters 2 and 5 (economic context for examples). Mathematical prerequisites: linear algebra, probability and statistics.
Consider the question: does an additional year of education increase earnings? We observe that more-educated people earn more. But is this because education itself raises productivity and pay, or because people with higher ability, who would earn more regardless, also tend to acquire more education?
Both are consistent with the observed correlation. The identification problem is that we cannot directly compare the same person with and without education — the counterfactual is unobserved.
The fundamental equation:

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$
where $Y_i$ is the outcome (earnings), $X_i$ is the treatment (years of education), $\beta$ is the causal parameter of interest, and $\varepsilon_i$ captures everything else affecting $Y_i$ — ability, family background, motivation, luck, health, and thousands of other factors.
The identification problem arises when $X_i$ is correlated with $\varepsilon_i$ — when the "treatment" is not randomly assigned. In statistics, this is called endogeneity. In economics, it is the norm, not the exception: people choose their education (and the choice is correlated with ability), countries choose their policies (and the choice is correlated with their economic conditions), firms choose their prices (and the choice is correlated with demand conditions).
In a randomized experiment, the treatment $X_i$ is assigned by a coin flip — it is independent of $\varepsilon_i$ by construction. But economists rarely have the luxury of randomization for the big questions. The methods in this chapter — OLS, IV, DiD, RD — are strategies for finding "natural experiments" that approximate randomization in observational data.
For the multivariate model $Y = X\beta + \varepsilon$ (matrix notation), the OLS estimator is:

$$\hat{\beta}_{OLS} = (X'X)^{-1}X'Y$$
Under the Gauss-Markov assumptions, OLS has desirable properties:

1. Linearity: the model is linear in parameters.
2. Random sampling: observations are drawn independently from the population.
3. No perfect multicollinearity: no regressor is an exact linear combination of the others.
4. Zero conditional mean: $E[\varepsilon|X] = 0$.
5. Homoskedasticity: $Var(\varepsilon_i|X) = \sigma^2$ for all $i$.
Under these assumptions, OLS is BLUE — the Best Linear Unbiased Estimator. "Best" means lowest variance among all linear unbiased estimators. "Unbiased" means $E[\hat{\beta}] = \beta$.
The critical assumption is #4: $E[\varepsilon|X] = 0$. When this fails — due to omitted variables, simultaneity, or measurement error in $X$ — OLS is biased and inconsistent: $\hat{\beta}$ does not converge to the true $\beta$ even with infinite data. This is not a small-sample problem — it is a fundamental design flaw that more data cannot fix.
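The matrix formula above can be applied directly. A minimal sketch on simulated data (the data-generating process and variable names are illustrative, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulate data from a true model Y = 2 + 3*X + eps, with exogenous eps
x = rng.normal(size=n)
eps = rng.normal(size=n)
y = 2.0 + 3.0 * x + eps

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x])

# OLS estimator: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [2, 3]
```

Because $\varepsilon$ is independent of $X$ here, assumption #4 holds by construction and the estimates land near the true parameters.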
A scatter plot with a fitted OLS regression line. Drag the slider to add an outlier at different vertical positions and watch the regression line tilt. Observe how a single high-leverage point can dramatically change the slope, $R^2$, and coefficients.
Figure 10.1. OLS regression with an adjustable outlier. The outlier is placed at $X=14$ (high leverage). Drag the slider above "No outlier" to introduce it and watch the line tilt. Hover for values.
Suppose the true model is $Y = \beta_0 + \beta_1 X + \beta_2 Z + u$, but we omit $Z$ and run $Y = \alpha_0 + \alpha_1 X + e$. Then:

$$E[\hat{\alpha}_1] = \beta_1 + \beta_2 \cdot \frac{Cov(X,Z)}{Var(X)}$$
The bias equals the effect of the omitted variable ($\beta_2$) times the association between the omitted variable and the included regressor.
Sign of bias:
| | $Cov(X, Z) > 0$ | $Cov(X, Z) < 0$ |
|---|---|---|
| $\beta_2 > 0$ | Upward bias (overestimate $\beta_1$) | Downward bias |
| $\beta_2 < 0$ | Downward bias | Upward bias |
Suppose ability ($Z$) is positively correlated with both education ($X$) and earnings ($Y$). Then $\beta_2 > 0$ (ability raises earnings) and $Cov(X,Z) > 0$ (more able people get more education). The OLS estimate of the return to education is biased upward — it attributes some of the ability effect to education.
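The bias formula can be verified by simulation. A sketch with made-up parameters (a true return to education of 0.08 and an ability effect of 0.5 are illustrative values, not estimates):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True model: earnings depend on education (beta1 = 0.08) and ability (beta2 = 0.5)
ability = rng.normal(size=n)
education = 12 + 2 * ability + rng.normal(size=n)   # Cov(X, Z) > 0
earnings = 1.0 + 0.08 * education + 0.5 * ability + rng.normal(size=n)

# Naive OLS slope of earnings on education, omitting ability
naive = np.cov(education, earnings)[0, 1] / np.var(education, ddof=1)

# OVB formula: bias = beta2 * Cov(X, Z) / Var(X)
predicted_bias = 0.5 * np.cov(education, ability)[0, 1] / np.var(education, ddof=1)
print(naive, 0.08 + predicted_bias)  # the two agree: naive OLS overstates the return
```

Here $Cov(X,Z) = 2$ and $Var(X) = 5$, so the naive slope converges to $0.08 + 0.5 \cdot 2/5 = 0.28$: more than half the estimated "return to education" is actually the ability effect.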
Two panels show the same data. Left: the true relationship with the confounder (ability) shown as point color. Right: the naive OLS regression that omits ability. Drag the slider to change confounding strength and watch the bias grow.
Left: True model with confounder (ability) shown as color. Darker = higher ability.
Right: Naive OLS ignoring ability. The biased line (red dashed) is steeper than the true causal effect (blue).
When OLS is biased because $X$ is endogenous ($Cov(X, \varepsilon) \neq 0$), an instrumental variable can rescue the estimation.
Two-Stage Least Squares (2SLS):
First stage: Regress $X$ on $Z$ (and any control variables):

$$X_i = \pi_0 + \pi_1 Z_i + v_i$$
This isolates the part of $X$ driven by the instrument — the exogenous part. The fitted values $\hat{X}_i$ represent the "clean" variation in $X$.
Second stage: Regress $Y$ on $\hat{X}$. In matrix form:

$$\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'Y$$
In the simple case with one instrument and one endogenous regressor:

$$\hat{\beta}_{IV} = \frac{Cov(Z,Y)}{Cov(Z,X)}$$
The IV estimate is the ratio of the reduced form (effect of $Z$ on $Y$) to the first stage (effect of $Z$ on $X$). The intuition: $Z$ affects $Y$ only through $X$ (exclusion restriction), so dividing out the first stage isolates the causal effect of $X$ on $Y$.
What IV estimates. With heterogeneous treatment effects, IV identifies the Local Average Treatment Effect (LATE) — the causal effect for the subpopulation whose behavior is changed by the instrument (the "compliers").
If $Z$ is weakly correlated with $X$, the first stage is weak, and the IV estimate is unreliable (biased toward OLS, wide confidence intervals). Rule of thumb: first-stage F-statistic > 10.
In a famous application, Angrist and Krueger (1991) used quarter of birth as an instrument for years of schooling. Compulsory schooling laws mean students born earlier in the year can drop out with slightly less education. Quarter of birth is plausibly: (a) correlated with schooling (relevance), and (b) not directly related to earnings (exclusion). Their IV estimate of the return to schooling was approximately 7–8% per year.
This directed acyclic graph shows the causal structure of an IV design. Toggle between views to see how an instrument Z breaks the confounding path.
Figure 10.2. DAG for the instrumental variables design. Z is the instrument, X is the endogenous regressor, Y is the outcome, and U is the unobserved confounder. The IV strategy uses only the variation in X that is driven by Z, bypassing the confounding path through U.
The first difference removes time-invariant group characteristics. The second difference removes common time trends.
Key assumption: Parallel trends. In the absence of treatment, the treatment and control groups would have followed the same trend. This is untestable for the post-treatment period but assessable for the pre-treatment period.
New Jersey raised its minimum wage from \$4.25 to \$5.05 in April 1992; Pennsylvania did not. The DiD estimate of the employment effect was positive (+2.7 FTE workers), contradicting the simple competitive model prediction. This study spurred a revolution in empirical labor economics.
Regression formulation:

$$Y_{it} = \alpha + \beta_1 Treat_i + \beta_2 Post_t + \tau(Treat_i \times Post_t) + \varepsilon_{it}$$

The coefficient $\tau$ on the interaction term is the DiD estimate.
Two time series show a treatment group and a control group. The treatment occurs at $t = 5$. Drag the slider to change the treatment effect size and see how the DiD estimate updates. Pre-treatment parallel trends are visible.
Figure 10.3. Difference-in-differences design. The dashed line shows the counterfactual — what would have happened to the treatment group without treatment (parallel to control). The gap between the actual and counterfactual outcomes at the end is the treatment effect.
You now have difference-in-differences, instrumental variables, and the tools of causal identification. This is where the minimum wage debate gets resolved — not by theory, but by evidence.
Card and Krueger (1994) applied the method you just learned — difference-in-differences — to a natural experiment. When New Jersey raised its minimum wage from \$4.25 to \$5.05 in 1992, neighboring Pennsylvania didn't. By surveying fast-food restaurants on both sides of the border before and after the increase, they constructed a clean DiD estimate: the treatment group (NJ) versus the control group (PA), differencing out common trends. The result stunned the profession: employment in New Jersey fast-food restaurants didn't fall. If anything, it rose slightly. The competitive model's prediction — that a binding price floor reduces quantity demanded — failed its most direct empirical test. Subsequent studies using county-border designs (Dube, Lester & Reich, 2010) confirmed the pattern: comparing adjacent counties across state lines where one side raised its minimum wage and the other didn't, employment effects were small to negligible for moderate increases.
Neumark and Wascher mounted the most sustained challenge. Using payroll data from the Bureau of Labor Statistics instead of Card and Krueger's telephone surveys, they found employment did decline in New Jersey — the original result, they argued, was an artifact of noisy survey data. Beyond data quality, the critique has structural force: DiD captures short-run effects, but firms adjust on multiple margins over time. Hours get cut even when headcount doesn't (Jardim et al., 2022, on Seattle's \$15 minimum). Benefits erode. Automation accelerates — self-order kiosks and scheduling software aren't coincidental. And the border-design studies may systematically understate effects by comparing areas that are economically similar precisely because they trade workers across the border, contaminating the control group. The meta-analysis is genuinely mixed: which studies you weight, and how, determines whether you find small negative effects or no effects.
The field's response illustrates what economists call the "credibility revolution" — the shift from estimating structural models to designing identification strategies. Card and Krueger didn't just challenge a prediction; they changed how empirical economics is done. The question moved from "what does the model predict?" to "can we find a credible research design that isolates the causal effect?" Cengiz, Dube, Lindner, and Zipperer (2019) produced the most comprehensive answer to date, analyzing 138 state-level minimum wage changes using a bunching estimator. They looked at the entire wage distribution: jobs paying just below the new minimum disappeared, jobs paying at or just above it appeared, and — crucially — total employment in the affected range barely changed. The jobs didn't vanish; they moved up the wage ladder. This is exactly what the monopsony model from Chapter 6 predicts and exactly what the competitive model says shouldn't happen.
The textbook prediction — that minimum wages cause unemployment — is wrong as a general empirical claim. Moderate minimum wage increases, up to roughly 50–60% of the local median wage, produce minimal detectable employment effects in most credible studies. This is consistent with monopsony power in low-wage labor markets: when employers have wage-setting power, a moderate minimum wage pushes them toward the competitive outcome rather than away from it. But "moderate" is the operative word. The competitive model isn't wrong — it's incomplete. Push the minimum wage high enough relative to local conditions (above 60% of the median, as a federal \$15 would in low-wage regions), and the standard prediction reasserts itself. The deeper lesson is methodological: a theoretical prediction that seemed airtight for decades was overturned not by better theory but by better identification. The model was logically correct; its empirical relevance was the question all along.
This Big Question is essentially resolved at this level: moderate minimum wages don't cause significant unemployment, consistent with monopsony. The remaining frontier is calibration, not direction. How high can you go before disemployment appears? The answer varies by region, sector, and time horizon — and the automation margin (kiosks, AI scheduling, self-checkout) may make long-run effects larger than short-run DiD estimates capture. The debate has shifted from "does it cause unemployment?" to "what's the right number for this labor market?" — which is a policy design question, not an economic theory question. The tools you learned in this chapter — DiD, IV, identification strategy — are exactly how that calibration question gets answered.
The Fight for \$15 made a number into a movement. But \$15 in San Francisco is very different from \$15 in rural Mississippi. The evidence says moderate increases work — is \$15 moderate?
If the minimum wage isn't about employment anymore, it's about adequacy. How do economists measure what "enough" means — and who decides?
Key assumption: Continuity. All factors affecting $Y$ (other than treatment) vary continuously at the cutoff — no sorting or manipulation around the threshold.
A scholarship is awarded to students scoring above 80 on an exam. Students scoring 79 and 81 are similar in ability but one gets the scholarship and the other does not. The discontinuity in outcomes (e.g., college completion rates) at the 80-point threshold estimates the causal effect of the scholarship.
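The scholarship example can be sketched as a local linear RD estimate. The simulated scores, the bandwidth of 10 points, and the true jump of 5 are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
cutoff, true_effect = 80.0, 5.0

# Running variable: exam score; treatment: scholarship for scores >= 80
score = rng.uniform(50, 110, size=n)
treated = score >= cutoff
# Outcome trends smoothly in score, with a jump of 5 at the cutoff
outcome = 20 + 0.5 * score + true_effect * treated + rng.normal(0, 3, size=n)

# Local linear fit on each side, within a bandwidth h of the cutoff
h = 10.0
left = (score >= cutoff - h) & (score < cutoff)
right = (score >= cutoff) & (score <= cutoff + h)

def fit_at_cutoff(mask):
    # Regress outcome on (score - cutoff); the intercept is the fitted value at the cutoff
    X = np.column_stack([np.ones(mask.sum()), score[mask] - cutoff])
    b = np.linalg.solve(X.T @ X, X.T @ outcome[mask])
    return b[0]

rd_estimate = fit_at_cutoff(right) - fit_at_cutoff(left)
print(rd_estimate)  # close to the true effect of 5
```

Shrinking the bandwidth trades bias (from the linear approximation) against variance (fewer observations near the cutoff), which is exactly the trade-off the figure's bandwidth slider illustrates.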
A scatter plot with a running variable (test score). Students above the cutoff receive treatment (scholarship). Polynomial fits on each side reveal the jump at the cutoff. Adjust the cutoff position and the bandwidth to see how the estimated treatment effect changes.
Figure 10.4. Regression discontinuity. The vertical dashed line marks the cutoff. Points left of the cutoff are untreated (gray); right are treated (green). The jump at the cutoff is the treatment effect estimate. Adjust the bandwidth to focus on observations near the cutoff.
RCTs are the "gold standard" for internal validity because randomization guarantees $E[\varepsilon|X] = 0$ by construction. Banerjee, Duflo, and Kremer received the 2019 Nobel Prize for their experimental approach to alleviating global poverty.
A job training program randomly assigns 500 individuals to treatment and 500 to control. Only 60% of those assigned to treatment actually attend the program (compliance rate = 0.6).
Results: Average earnings: treatment group = \$25,000, control group = \$23,000.
ITT: $\hat{\tau}_{ITT} = 25{,}000 - 23{,}000 = 2{,}000$ dollars. This is the effect of being offered the program.

TOT: $\hat{\tau}_{TOT} = 2{,}000 / 0.6 \approx 3{,}333$ dollars. This estimates the effect of actually attending the program (for compliers). The TOT is larger because the ITT is diluted by non-compliers.

Power check: With $n = 500$ per group and $\sigma = 2{,}000$, the standard error of the difference in means is $\sigma\sqrt{2/n} \approx 126$, so a true effect of \$2,000 is roughly 16 standard errors wide. The study is more than adequately powered to detect the ITT.
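The power arithmetic can be checked directly. A minimal calculator under a normal approximation (the function names are illustrative; the multiplier $2.8 \approx 1.96 + 0.84$ is the standard shortcut for the MDE at 80% power with a two-sided 5% test):

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(effect: float, sigma: float, n_per_group: int,
                     crit_z: float = 1.96) -> float:
    """Approximate power of a two-sided test for a difference in means."""
    se = sigma * sqrt(2.0 / n_per_group)
    return normal_cdf(effect / se - crit_z)

# The worked example's numbers: n = 500 per group, sigma = 2,000, effect = 2,000
print(power_two_sample(2_000, 2_000, 500))   # essentially 1.0

# Minimum detectable effect at 80% power: (1.96 + 0.84) * SE
mde = 2.8 * 2_000 * sqrt(2.0 / 500)
print(mde)  # ≈ 354
```

With these inputs the design is very comfortably powered: any effect above roughly \$354 would be detectable at 80% power.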
Statistical power is the probability of detecting a true treatment effect. Use the sliders to explore how effect size, sample size, and variance affect power. The power curve updates in real time, and the minimum detectable effect (MDE) at 80% power is highlighted.
Figure 10.5. Power curve: probability of detecting the effect as a function of effect size. The red dashed line marks 80% power. The green diamond marks the current parameter combination. The MDE is the smallest effect detectable at 80% power given sample size and variance.
A point estimate without a measure of uncertainty is nearly useless.
Standard errors (SE) are the square roots of the diagonal elements of the estimated variance matrix $\widehat{Var}(\hat{\beta}) = \hat{\sigma}^2(X'X)^{-1}$. A 95% confidence interval is approximately $\hat{\beta} \pm 1.96 \cdot SE(\hat{\beta})$.
Statistical significance: We reject $H_0: \beta = 0$ at the 5% level if $|t| = |\hat{\beta}/SE(\hat{\beta})| > 1.96$.
Economic significance vs statistical significance: A coefficient can be statistically significant but economically trivial. Conversely, an imprecise estimate can be economically large but statistically insignificant. Good empirical work discusses both.
A practical rule: In modern applied economics, always use robust or clustered standard errors.
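The difference robust standard errors make can be seen in a small simulation. A sketch of the HC1 "sandwich" estimator on deliberately heteroskedastic data (the data-generating process is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

# Heteroskedastic errors: the error standard deviation grows with x
x = rng.uniform(0, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.5 + x)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)
k = X.shape[1]

# Classical SEs assume a single sigma^2: sqrt(diag(sigma2 * (X'X)^{-1}))
sigma2 = resid @ resid / (n - k)
classical_se = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC1 robust sandwich: (X'X)^{-1} [X' diag(e_i^2) X] (X'X)^{-1} * n/(n-k)
meat = (X * resid[:, None] ** 2).T @ X
robust_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv) * n / (n - k))
print(classical_se, robust_se)  # the two sets of SEs diverge under heteroskedasticity
```

The coefficient estimates are identical either way; only the uncertainty measure changes, which is why switching to robust SEs costs nothing when errors happen to be homoskedastic.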
Every empirical strategy has assumptions that can fail:
| Strategy | Key Assumption | Threat | Diagnostic |
|---|---|---|---|
| OLS | No omitted variables ($E[\varepsilon|X]=0$) | Confounding | Theory + sensitivity analysis |
| IV | Exclusion restriction | Direct effect of $Z$ on $Y$ | Cannot test directly; argue theoretically |
| IV | Relevance | Weak instruments | First-stage F > 10 |
| DiD | Parallel trends | Differential pre-trends | Plot pre-treatment trends |
| RD | No manipulation at cutoff | Sorting around threshold | McCrary density test |
| RCT | No attrition, no spillovers | Differential dropout; contamination | Balance checks, attrition analysis |
An economist wants to estimate the effect of Kaelani's new education policy (free textbooks for grades 1–6) on test scores. The policy was implemented in the eastern provinces in 2024 but not the western provinces.
Design: Difference-in-differences.
| | Pre-policy (2023) | Post-policy (2025) | Change |
|---|---|---|---|
| Eastern (treatment) | 55 | 63 | +8 |
| Western (control) | 52 | 56 | +4 |
| DiD estimate | | | +4 |
The DiD estimate is 4 points. Free textbooks raised test scores by 4 points, after controlling for the common upward trend.
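The table arithmetic and the regression formulation give the same answer. A minimal check using the four cell means from the table (running a saturated regression on the cell means is illustrative; a real analysis would use the underlying microdata):

```python
import numpy as np

# One row per cell of the 2x2 table: (treat, post, mean score)
cells = np.array([
    [1, 0, 55],  # eastern (treatment), pre
    [1, 1, 63],  # eastern (treatment), post
    [0, 0, 52],  # western (control), pre
    [0, 1, 56],  # western (control), post
], dtype=float)

treat, post, score = cells[:, 0], cells[:, 1], cells[:, 2]
X = np.column_stack([np.ones(4), treat, post, treat * post])

# Saturated model: 4 cells, 4 parameters, exact fit
alpha, b_treat, b_post, tau = np.linalg.solve(X.T @ X, X.T @ score)
print(tau)  # 4.0: the interaction coefficient equals (63-55) - (56-52)
```

Reading off the other coefficients: $\alpha = 52$ (control, pre), $\beta_1 = 3$ (baseline gap), $\beta_2 = 4$ (common trend), $\tau = 4$ (the DiD estimate).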
Threats: (1) Parallel trends: Were eastern provinces already improving faster? (2) Spillovers: Did families near the border send children to eastern schools? (3) Composition changes: Did free textbooks change enrollment?
A complementary approach: regression discontinuity at the provincial border, comparing villages just on either side.
| Label | Equation | Description |
|---|---|---|
| Eq. 10.1 | $Y_i = \alpha + \beta X_i + \varepsilon_i$ | Structural equation |
| Eq. 10.2 | $\hat{\beta}_{OLS} = (X'X)^{-1}X'Y$ | OLS estimator |
| Eq. 10.3 | $E[\hat{\alpha}_1] = \beta_1 + \beta_2 \cdot Cov(X,Z)/Var(X)$ | Omitted variable bias formula |
| Eq. 10.5 | $\hat{\beta}_{IV} = Cov(Z,Y)/Cov(Z,X)$ | IV estimator (simple) |
| Eq. 10.6 | $\hat{\tau}_{DiD}$ = (treat change) − (control change) | DiD estimator |
| Eq. 10.7 | $Y_{it} = \alpha + \beta_1 Treat + \beta_2 Post + \tau(Treat \times Post) + \varepsilon$ | DiD regression |
| Eq. 10.8 | $\hat{\tau}_{RD} = \lim_{x \downarrow c} E[Y|X=x] - \lim_{x \uparrow c} E[Y|X=x]$ | RD estimator |
| Eq. 10.9 | $\hat{\tau}_{RCT} = \bar{Y}_{treat} - \bar{Y}_{control}$ | RCT estimator |
| Eq. 10.10 | $Var(\hat{\beta}) = \sigma^2(X'X)^{-1}$ | OLS variance |