Economics makes causal claims — minimum wages affect employment, education raises earnings, institutions determine growth. Testing these claims requires data and a method for distinguishing causation from correlation. Econometrics is that method.
This chapter is not a statistics course. We assume familiarity with basic probability and regression. Instead, we focus on the central problem of empirical economics: identification — finding credible sources of exogenous variation that allow us to estimate causal effects. Every tool in this chapter — OLS, instrumental variables, difference-in-differences, regression discontinuity — is a strategy for solving the identification problem.
Prerequisites: Chapters 2 and 5 (economic context for examples). Mathematical prerequisites: linear algebra, probability and statistics.
Consider the question: does an additional year of education increase earnings? We observe that more-educated people earn more. But is this because education itself raises productivity and pay, or because people with higher ability, who would earn more regardless, also tend to acquire more education?
Both are consistent with the observed correlation. The identification problem is that we cannot directly compare the same person with and without education — the counterfactual is unobserved.
The fundamental equation:

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$
where $Y_i$ is the outcome (earnings), $X_i$ is the treatment (years of education), $\beta$ is the causal parameter of interest, and $\varepsilon_i$ captures everything else affecting $Y_i$ — ability, family background, motivation, luck, health, and thousands of other factors.
The identification problem arises when $X_i$ is correlated with $\varepsilon_i$ — when the "treatment" is not randomly assigned. In statistics, this is called endogeneity. In economics, it is the norm, not the exception: people choose their education (and the choice is correlated with ability), countries choose their policies (and the choice is correlated with their economic conditions), firms choose their prices (and the choice is correlated with demand conditions).
In a randomized experiment, the treatment $X_i$ is assigned by a coin flip — it is independent of $\varepsilon_i$ by construction. But economists rarely have the luxury of randomization for the big questions. The methods in this chapter — OLS, IV, DiD, RD — are strategies for finding "natural experiments" that approximate randomization in observational data.
For the multivariate model $Y = X\beta + \varepsilon$ (matrix notation), the OLS estimator is:

$$\hat{\beta}_{OLS} = (X'X)^{-1}X'Y$$
Under the Gauss-Markov assumptions, OLS has desirable properties:

1. Linearity: the model is linear in parameters.
2. Random sampling: observations are drawn independently from the population.
3. No perfect multicollinearity: no regressor is an exact linear combination of the others.
4. Zero conditional mean: $E[\varepsilon|X] = 0$.
5. Homoskedasticity: $Var(\varepsilon_i|X) = \sigma^2$ for all $i$.
Under these assumptions, OLS is BLUE — the Best Linear Unbiased Estimator. "Best" means lowest variance among all linear unbiased estimators. "Unbiased" means $E[\hat{\beta}] = \beta$.
The critical assumption is #4: $E[\varepsilon|X] = 0$. When this fails — due to omitted variables, simultaneity, or measurement error in $X$ — OLS is biased and inconsistent: $\hat{\beta}$ does not converge to the true $\beta$ even with infinite data. This is not a small-sample problem — it is a fundamental design flaw that more data cannot fix.
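The matrix formula above can be applied directly. A minimal sketch on simulated data (the data-generating process and variable names are illustrative, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulate data from a true model Y = 2 + 3*X + eps, with exogenous eps
x = rng.normal(size=n)
eps = rng.normal(size=n)
y = 2.0 + 3.0 * x + eps

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x])

# OLS estimator: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [2, 3]
```

Because $\varepsilon$ is independent of $X$ here, assumption #4 holds by construction and the estimates land near the true parameters.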
A scatter plot with a fitted OLS regression line. Drag the slider to add an outlier at different vertical positions and watch the regression line tilt. Observe how a single high-leverage point can dramatically change the slope, $R^2$, and coefficients.
Figure 10.1. OLS regression with an adjustable outlier. The outlier is placed at $X=14$ (high leverage). Drag the slider above "No outlier" to introduce it and watch the line tilt. Hover for values.
Suppose the true model is $Y = \beta_0 + \beta_1 X + \beta_2 Z + u$, but we omit $Z$ and run $Y = \alpha_0 + \alpha_1 X + e$. Then:

$$E[\hat{\alpha}_1] = \beta_1 + \beta_2 \cdot \frac{Cov(X,Z)}{Var(X)}$$
The bias equals the effect of the omitted variable ($\beta_2$) times the association between the omitted variable and the included regressor.
Sign of bias:
| | $Cov(X, Z) > 0$ | $Cov(X, Z) < 0$ |
|---|---|---|
| $\beta_2 > 0$ | Upward bias (overestimate $\beta_1$) | Downward bias |
| $\beta_2 < 0$ | Downward bias | Upward bias |
Suppose ability ($Z$) is positively correlated with both education ($X$) and earnings ($Y$). Then $\beta_2 > 0$ (ability raises earnings) and $Cov(X,Z) > 0$ (more able people get more education). The OLS estimate of the return to education is biased upward — it attributes some of the ability effect to education.
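The bias formula can be verified by simulation. A sketch with made-up parameters (a true return to education of 0.08 and an ability effect of 0.5 are illustrative values, not estimates):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True model: earnings depend on education (beta1 = 0.08) and ability (beta2 = 0.5)
ability = rng.normal(size=n)
education = 12 + 2 * ability + rng.normal(size=n)   # Cov(X, Z) > 0
earnings = 1.0 + 0.08 * education + 0.5 * ability + rng.normal(size=n)

# Naive OLS slope of earnings on education, omitting ability
naive = np.cov(education, earnings)[0, 1] / np.var(education, ddof=1)

# OVB formula: bias = beta2 * Cov(X, Z) / Var(X)
predicted_bias = 0.5 * np.cov(education, ability)[0, 1] / np.var(education, ddof=1)
print(naive, 0.08 + predicted_bias)  # the two agree: naive OLS overstates the return
```

Here $Cov(X,Z) = 2$ and $Var(X) = 5$, so the naive slope converges to $0.08 + 0.5 \cdot 2/5 = 0.28$: more than half the estimated "return to education" is actually the ability effect.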
Two panels show the same data. Left: the true relationship with the confounder (ability) shown as point color. Right: the naive OLS regression that omits ability. Drag the slider to change confounding strength and watch the bias grow.
Left: True model with confounder (ability) shown as color. Darker = higher ability.
Right: Naive OLS ignoring ability. The biased line (red dashed) is steeper than the true causal effect (blue).
When OLS is biased because $X$ is endogenous ($Cov(X, \varepsilon) \neq 0$), an instrumental variable can rescue the estimation.
Two-Stage Least Squares (2SLS):
First stage: Regress $X$ on $Z$ (and any control variables):

$$X_i = \pi_0 + \pi_1 Z_i + v_i$$
This isolates the part of $X$ driven by the instrument — the exogenous part. The fitted values $\hat{X}_i$ represent the "clean" variation in $X$.
Second stage: Regress $Y$ on $\hat{X}$. In matrix form:

$$\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'Y$$
In the simple case with one instrument and one endogenous regressor:

$$\hat{\beta}_{IV} = \frac{Cov(Z,Y)}{Cov(Z,X)}$$
The IV estimate is the ratio of the reduced form (effect of $Z$ on $Y$) to the first stage (effect of $Z$ on $X$). The intuition: $Z$ affects $Y$ only through $X$ (exclusion restriction), so dividing out the first stage isolates the causal effect of $X$ on $Y$.
What IV estimates. With heterogeneous treatment effects, IV identifies the Local Average Treatment Effect (LATE) — the causal effect for the subpopulation whose behavior is changed by the instrument (the "compliers").
If $Z$ is weakly correlated with $X$, the first stage is weak, and the IV estimate is unreliable (biased toward OLS, wide confidence intervals). Rule of thumb: first-stage F-statistic > 10.
In a famous application, Angrist and Krueger (1991) used quarter of birth as an instrument for years of schooling. Compulsory schooling laws mean students born earlier in the year can drop out with slightly less education. Quarter of birth is plausibly: (a) correlated with schooling (relevance), and (b) not directly related to earnings (exclusion). Their IV estimate of the return to schooling was approximately 7–8% per year.
This directed acyclic graph shows the causal structure of an IV design. Toggle between views to see how an instrument Z breaks the confounding path.
Figure 10.2. DAG for the instrumental variables design. Z is the instrument, X is the endogenous regressor, Y is the outcome, and U is the unobserved confounder. The IV strategy uses only the variation in X that is driven by Z, bypassing the confounding path through U.
The first difference removes time-invariant group characteristics. The second difference removes common time trends.
Key assumption: Parallel trends. In the absence of treatment, the treatment and control groups would have followed the same trend. This is untestable for the post-treatment period but assessable for the pre-treatment period.
New Jersey raised its minimum wage from \$4.25 to \$5.05 in April 1992; Pennsylvania did not. The DiD estimate of the employment effect was positive (+2.7 FTE workers), contradicting the simple competitive model prediction. This study spurred a revolution in empirical labor economics.
Regression formulation:

$$Y_{it} = \alpha + \beta_1 Treat_i + \beta_2 Post_t + \tau(Treat_i \times Post_t) + \varepsilon_{it}$$

The coefficient $\tau$ on the interaction term is the DiD estimate.
Two time series show a treatment group and a control group. The treatment occurs at $t = 5$. Drag the slider to change the treatment effect size and see how the DiD estimate updates. Pre-treatment parallel trends are visible.
Figure 10.3. Difference-in-differences design. The dashed line shows the counterfactual — what would have happened to the treatment group without treatment (parallel to control). The gap between the actual and counterfactual outcomes at the end is the treatment effect.
You now have difference-in-differences, instrumental variables, and the tools of causal identification. This is where the minimum wage debate gets resolved — not by theory, but by evidence.
Card and Krueger (1994) applied the method you just learned — difference-in-differences — to a natural experiment. When New Jersey raised its minimum wage from \$4.25 to \$5.05 in 1992, neighboring Pennsylvania didn't. By surveying fast-food restaurants on both sides of the border before and after the increase, they constructed a clean DiD estimate: the treatment group (NJ) versus the control group (PA), differencing out common trends. The result stunned the profession: employment in New Jersey fast-food restaurants didn't fall. If anything, it rose slightly. The competitive model's prediction — that a binding price floor reduces quantity demanded — failed its most direct empirical test. Subsequent studies using county-border designs (Dube, Lester & Reich, 2010) confirmed the pattern: comparing adjacent counties across state lines where one side raised its minimum wage and the other didn't, employment effects were small to negligible for moderate increases.
Neumark and Wascher mounted the most sustained challenge. Using payroll data from the Bureau of Labor Statistics instead of Card and Krueger's telephone surveys, they found employment did decline in New Jersey — the original result, they argued, was an artifact of noisy survey data. Beyond data quality, the critique has structural force: DiD captures short-run effects, but firms adjust on multiple margins over time. Hours get cut even when headcount doesn't (Jardim et al., 2022, on Seattle's \$15 minimum). Benefits erode. Automation accelerates — self-order kiosks and scheduling software aren't coincidental. And the border-design studies may systematically understate effects by comparing areas that are economically similar precisely because they trade workers across the border, contaminating the control group. The meta-analysis is genuinely mixed: which studies you weight, and how, determines whether you find small negative effects or no effects.
The field's response illustrates what economists call the "credibility revolution" — the shift from estimating structural models to designing identification strategies. Card and Krueger didn't just challenge a prediction; they changed how empirical economics is done. The question moved from "what does the model predict?" to "can we find a credible research design that isolates the causal effect?" Cengiz, Dube, Lindner, and Zipperer (2019) produced the most comprehensive answer to date, analyzing 138 state-level minimum wage changes using a bunching estimator. They looked at the entire wage distribution: jobs paying just below the new minimum disappeared, jobs paying at or just above it appeared, and — crucially — total employment in the affected range barely changed. The jobs didn't vanish; they moved up the wage ladder. This is exactly what the monopsony model from Chapter 6 predicts and exactly what the competitive model says shouldn't happen.
The textbook prediction — that minimum wages cause unemployment — is wrong as a general empirical claim. Moderate minimum wage increases, up to roughly 50–60% of the local median wage, produce minimal detectable employment effects in most credible studies. This is consistent with monopsony power in low-wage labor markets: when employers have wage-setting power, a moderate minimum wage pushes them toward the competitive outcome rather than away from it. But "moderate" is the operative word. The competitive model isn't wrong — it's incomplete. Push the minimum wage high enough relative to local conditions (above 60% of the median, as a federal \$15 would in low-wage regions), and the standard prediction reasserts itself. The deeper lesson is methodological: a theoretical prediction that seemed airtight for decades was overturned not by better theory but by better identification. The model was logically correct; its empirical relevance was the question all along.
This Big Question is essentially resolved at this level: moderate minimum wages don't cause significant unemployment, consistent with monopsony. The remaining frontier is calibration, not direction. How high can you go before disemployment appears? The answer varies by region, sector, and time horizon — and the automation margin (kiosks, AI scheduling, self-checkout) may make long-run effects larger than short-run DiD estimates capture. The debate has shifted from "does it cause unemployment?" to "what's the right number for this labor market?" — which is a policy design question, not an economic theory question. The tools you learned in this chapter — DiD, IV, identification strategy — are exactly how that calibration question gets answered.
The Fight for \$15 made a number into a movement. But \$15 in San Francisco is very different from \$15 in rural Mississippi. The evidence says moderate increases work — is \$15 moderate?
If the minimum wage isn't about employment anymore, it's about adequacy. How do economists measure what "enough" means — and who decides?
Key assumption: Continuity. All factors affecting $Y$ (other than treatment) vary continuously at the cutoff — no sorting or manipulation around the threshold.
A scholarship is awarded to students scoring above 80 on an exam. Students scoring 79 and 81 are similar in ability but one gets the scholarship and the other does not. The discontinuity in outcomes (e.g., college completion rates) at the 80-point threshold estimates the causal effect of the scholarship.
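The scholarship example can be sketched as a local linear RD estimate. The simulated scores, the bandwidth of 10 points, and the true jump of 5 are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
cutoff, true_effect = 80.0, 5.0

# Running variable: exam score; treatment: scholarship for scores >= 80
score = rng.uniform(50, 110, size=n)
treated = score >= cutoff
# Outcome trends smoothly in score, with a jump of 5 at the cutoff
outcome = 20 + 0.5 * score + true_effect * treated + rng.normal(0, 3, size=n)

# Local linear fit on each side, within a bandwidth h of the cutoff
h = 10.0
left = (score >= cutoff - h) & (score < cutoff)
right = (score >= cutoff) & (score <= cutoff + h)

def fit_at_cutoff(mask):
    # Regress outcome on (score - cutoff); the intercept is the fitted value at the cutoff
    X = np.column_stack([np.ones(mask.sum()), score[mask] - cutoff])
    b = np.linalg.solve(X.T @ X, X.T @ outcome[mask])
    return b[0]

rd_estimate = fit_at_cutoff(right) - fit_at_cutoff(left)
print(rd_estimate)  # close to the true effect of 5
```

Shrinking the bandwidth trades bias (from the linear approximation) against variance (fewer observations near the cutoff), which is exactly the trade-off the figure's bandwidth slider illustrates.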
A scatter plot with a running variable (test score). Students above the cutoff receive treatment (scholarship). Polynomial fits on each side reveal the jump at the cutoff. Adjust the cutoff position and the bandwidth to see how the estimated treatment effect changes.
Figure 10.4. Regression discontinuity. The vertical dashed line marks the cutoff. Points left of the cutoff are untreated (gray); right are treated (green). The jump at the cutoff is the treatment effect estimate. Adjust the bandwidth to focus on observations near the cutoff.
RCTs are the "gold standard" for internal validity because randomization guarantees $E[\varepsilon|X] = 0$ by construction. Banerjee, Duflo, and Kremer received the 2019 Nobel Prize for their experimental approach to alleviating global poverty.
A job training program randomly assigns 500 individuals to treatment and 500 to control. Only 60% of those assigned to treatment actually attend the program (compliance rate = 0.6).
Results: Average earnings: treatment group = \$25,000, control group = \$23,000.
ITT: $\hat{\tau}_{ITT} = 25{,}000 - 23{,}000 = 2{,}000$ dollars. This is the effect of being offered the program.

TOT: $\hat{\tau}_{TOT} = 2{,}000 / 0.6 \approx 3{,}333$ dollars. This estimates the effect of actually attending the program (for compliers). The TOT is larger because the ITT is diluted by non-compliers.

Power check: With $n = 500$ per group and $\sigma = 2{,}000$, the standard error of the difference in means is $\sigma\sqrt{2/n} \approx 126$, so a true effect of \$2,000 is roughly 16 standard errors wide. The study is more than adequately powered to detect the ITT.
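The power arithmetic can be checked directly. A minimal calculator under a normal approximation (the function names are illustrative; the multiplier $2.8 \approx 1.96 + 0.84$ is the standard shortcut for the MDE at 80% power with a two-sided 5% test):

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(effect: float, sigma: float, n_per_group: int,
                     crit_z: float = 1.96) -> float:
    """Approximate power of a two-sided test for a difference in means."""
    se = sigma * sqrt(2.0 / n_per_group)
    return normal_cdf(effect / se - crit_z)

# The worked example's numbers: n = 500 per group, sigma = 2,000, effect = 2,000
print(power_two_sample(2_000, 2_000, 500))   # essentially 1.0

# Minimum detectable effect at 80% power: (1.96 + 0.84) * SE
mde = 2.8 * 2_000 * sqrt(2.0 / 500)
print(mde)  # ≈ 354
```

With these inputs the design is very comfortably powered: any effect above roughly \$354 would be detectable at 80% power.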
Statistical power is the probability of detecting a true treatment effect. Use the sliders to explore how effect size, sample size, and variance affect power. The power curve updates in real time, and the minimum detectable effect (MDE) at 80% power is highlighted.
Figure 10.5. Power curve: probability of detecting the effect as a function of effect size. The red dashed line marks 80% power. The green diamond marks the current parameter combination. The MDE is the smallest effect detectable at 80% power given sample size and variance.
A point estimate without a measure of uncertainty is nearly useless.
Standard errors (SE) are the square roots of the diagonal elements of the estimated variance matrix $\widehat{Var}(\hat{\beta}) = \hat{\sigma}^2(X'X)^{-1}$. A 95% confidence interval is approximately $\hat{\beta} \pm 1.96 \cdot SE(\hat{\beta})$.
Statistical significance: We reject $H_0: \beta = 0$ at the 5% level if $|t| = |\hat{\beta}/SE(\hat{\beta})| > 1.96$.
Economic significance vs statistical significance: A coefficient can be statistically significant but economically trivial. Conversely, an imprecise estimate can be economically large but statistically insignificant. Good empirical work discusses both.
A practical rule: In modern applied economics, always use robust or clustered standard errors.
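The difference robust standard errors make can be seen in a small simulation. A sketch of the HC1 "sandwich" estimator on deliberately heteroskedastic data (the data-generating process is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

# Heteroskedastic errors: the error standard deviation grows with x
x = rng.uniform(0, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.5 + x)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)
k = X.shape[1]

# Classical SEs assume a single sigma^2: sqrt(diag(sigma2 * (X'X)^{-1}))
sigma2 = resid @ resid / (n - k)
classical_se = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC1 robust sandwich: (X'X)^{-1} [X' diag(e_i^2) X] (X'X)^{-1} * n/(n-k)
meat = (X * resid[:, None] ** 2).T @ X
robust_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv) * n / (n - k))
print(classical_se, robust_se)  # the two sets of SEs diverge under heteroskedasticity
```

The coefficient estimates are identical either way; only the uncertainty measure changes, which is why switching to robust SEs costs nothing when errors happen to be homoskedastic.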
Every empirical strategy has assumptions that can fail:
| Strategy | Key Assumption | Threat | Diagnostic |
|---|---|---|---|
| OLS | No omitted variables ($E[\varepsilon|X]=0$) | Confounding | Theory + sensitivity analysis |
| IV | Exclusion restriction | Direct effect of $Z$ on $Y$ | Cannot test directly; argue theoretically |
| IV | Relevance | Weak instruments | First-stage F > 10 |
| DiD | Parallel trends | Differential pre-trends | Plot pre-treatment trends |
| RD | No manipulation at cutoff | Sorting around threshold | McCrary density test |
| RCT | No attrition, no spillovers | Differential dropout; contamination | Balance checks, attrition analysis |
An economist wants to estimate the effect of Kaelani's new education policy (free textbooks for grades 1–6) on test scores. The policy was implemented in the eastern provinces in 2024 but not the western provinces.
Design: Difference-in-differences.
| | Pre-policy (2023) | Post-policy (2025) | Change |
|---|---|---|---|
| Eastern (treatment) | 55 | 63 | +8 |
| Western (control) | 52 | 56 | +4 |
| DiD estimate | | | +4 |
The DiD estimate is 4 points. Free textbooks raised test scores by 4 points, after controlling for the common upward trend.
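The table arithmetic and the regression formulation give the same answer. A minimal check using the four cell means from the table (running a saturated regression on the cell means is illustrative; a real analysis would use the underlying microdata):

```python
import numpy as np

# One row per cell of the 2x2 table: (treat, post, mean score)
cells = np.array([
    [1, 0, 55],  # eastern (treatment), pre
    [1, 1, 63],  # eastern (treatment), post
    [0, 0, 52],  # western (control), pre
    [0, 1, 56],  # western (control), post
], dtype=float)

treat, post, score = cells[:, 0], cells[:, 1], cells[:, 2]
X = np.column_stack([np.ones(4), treat, post, treat * post])

# Saturated model: 4 cells, 4 parameters, exact fit
alpha, b_treat, b_post, tau = np.linalg.solve(X.T @ X, X.T @ score)
print(tau)  # 4.0: the interaction coefficient equals (63-55) - (56-52)
```

Reading off the other coefficients: $\alpha = 52$ (control, pre), $\beta_1 = 3$ (baseline gap), $\beta_2 = 4$ (common trend), $\tau = 4$ (the DiD estimate).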
Threats: (1) Parallel trends: Were eastern provinces already improving faster? (2) Spillovers: Did families near the border send children to eastern schools? (3) Composition changes: Did free textbooks change enrollment?
A complementary approach: regression discontinuity at the provincial border, comparing villages just on either side.
| Label | Equation | Description |
|---|---|---|
| Eq. 10.1 | $Y_i = \alpha + \beta X_i + \varepsilon_i$ | Structural equation |
| Eq. 10.2 | $\hat{\beta}_{OLS} = (X'X)^{-1}X'Y$ | OLS estimator |
| Eq. 10.3 | $E[\hat{\alpha}_1] = \beta_1 + \beta_2 \cdot Cov(X,Z)/Var(X)$ | Omitted variable bias formula |
| Eq. 10.5 | $\hat{\beta}_{IV} = Cov(Z,Y)/Cov(Z,X)$ | IV estimator (simple) |
| Eq. 10.6 | $\hat{\tau}_{DiD}$ = (treat change) − (control change) | DiD estimator |
| Eq. 10.7 | $Y_{it} = \alpha + \beta_1 Treat + \beta_2 Post + \tau(Treat \times Post) + \varepsilon$ | DiD regression |
| Eq. 10.8 | $\hat{\tau}_{RD} = \lim_{x \downarrow c} E[Y|X=x] - \lim_{x \uparrow c} E[Y|X=x]$ | RD estimator |
| Eq. 10.9 | $\hat{\tau}_{RCT} = \bar{Y}_{treat} - \bar{Y}_{control}$ | RCT estimator |
| Eq. 10.10 | $Var(\hat{\beta}) = \sigma^2(X'X)^{-1}$ | OLS variance |