What is Regression Analysis?

July 3, 2017

Regression analysis is a statistical technique used to examine and quantify the relationship between variables. In economics and econometrics, it allows us to move beyond describing a relationship in words and instead estimate it precisely: by how much does the dependent variable change when an independent variable changes by one unit?

Put differently, regression explains the variation in a dependent variable in terms of movements in one or more independent variables. It can also be used for prediction: estimating the unknown value of one variable from the known values of others.

Dependent and Independent Variables

Every regression model has:

Dependent variable (Y) — the variable we are trying to explain or predict. Also called the regressand, explained variable, or response variable.
Independent variable(s) (X) — the variables we use to explain Y. Also called regressors, explanatory variables, or predictor variables.

Example: Suppose we want to explain the quantity demanded of a good:

Q = f(P, P_s, Y_d)

Where Q (quantity demanded) is the dependent variable, and P (price), P_s (price of substitutes), and Y_d (income) are the independent variables. The model posits that changes in P, P_s, or Y_d lead to changes in Q.

Important distinction: Regression analysis measures the strength and direction of a relationship, and tests whether it is statistically significant. It does not automatically prove causation — a strong statistical relationship could still be spurious if the theory underlying it is weak.

The Simple Linear Regression Model

The simple regression model involves just one independent variable and takes the form:

Y = β₀ + β₁X + ε

Where:

Y = the dependent variable
X = the independent variable
β₀ = the intercept: the expected value of Y when X = 0
β₁ = the slope coefficient: the expected change in Y for a one-unit increase in X (ceteris paribus)
ε (epsilon) = the error term — captures all other factors affecting Y that are not included in the model
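
To make the role of each term concrete, here is a minimal Python sketch that simulates data from the model. The parameter values (β₀ = 2, β₁ = 0.5) and the normally distributed error are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
BETA_0 = 2.0   # intercept: expected Y when X = 0
BETA_1 = 0.5   # slope: change in expected Y per one-unit increase in X


def expected_y(x):
    """Systematic part of the model: E[Y | X = x] = beta_0 + beta_1 * x."""
    return BETA_0 + BETA_1 * x


def simulate_y(x, rng):
    """Draw Y = beta_0 + beta_1 * x + eps, with eps ~ Normal(0, 1) (assumed)."""
    x = np.asarray(x, dtype=float)
    eps = rng.normal(loc=0.0, scale=1.0, size=x.shape)
    return expected_y(x) + eps


rng = np.random.default_rng(42)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = simulate_y(x, rng)

print("expected Y at X = 0:", expected_y(0.0))                     # the intercept
print("effect of one more unit of X:", expected_y(5.0) - expected_y(4.0))  # the slope
print("one simulated sample of Y:", np.round(y, 2))
```

The simulated Y values scatter around the line β₀ + β₁X. The gap between each observation and the line is that observation's draw of ε.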

Why is it called “linear”?

The model is linear in two senses:

If plotted, the relationship between Y and X is a straight line.
It is linear in the parameters (β₀ and β₁) — this is the more technically precise meaning in econometrics.
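
The second sense matters because a model can be curved in X and still be linear in the parameters. As a hedged illustration, the sketch below uses a quadratic specification, Y = β₀ + β₁X + β₂X², which is not part of the model above. The data are hypothetical and noise-free, so least squares recovers the parameters exactly.

```python
import numpy as np

# Hypothetical noise-free data from Y = 1 + 2X + 3X^2 (values chosen for illustration).
x = np.arange(6, dtype=float)
y = 1.0 + 2.0 * x + 3.0 * x ** 2

# Design matrix with columns [1, X, X^2]. Y is a linear combination of these
# columns, so the model is linear in the parameters even though it is curved in X.
design = np.column_stack([np.ones_like(x), x, x ** 2])

# Least-squares solution for the three parameters.
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(coefs, 6))
```

A model such as Y = β₀ + X^β₁ could not be written this way, because β₁ enters as an exponent rather than as a multiplier.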

The Multiple Regression Model

Most economic phenomena are influenced by more than one variable. The multiple regression model extends the simple model to include several independent variables:

Y = β₀ + β₁X₁ + β₂X₂ + … + β_kX_k + ε

Each coefficient β_i measures the effect of X_i on Y, holding all other variables constant (ceteris paribus). This is the power of multiple regression — it allows us to isolate the individual effect of each variable.

Example: Wage Equation

Wage = β₀ + β₁(Education) + β₂(Experience) + β₃(Female) + ε

Here β₁ tells us how much wages increase with an extra year of education, holding experience and gender constant. β₃ on the Female dummy variable measures the wage gap between female and male workers with equal education and experience.
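
The wage equation can be sketched in Python as follows. Everything in this example is simulated: the sample, the "true" coefficients, and the error distribution are assumptions made for illustration, not estimates from real wage data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical worker data (all values simulated, not real).
education = rng.integers(8, 21, size=n).astype(float)    # years of schooling
experience = rng.integers(0, 31, size=n).astype(float)   # years of work experience
female = rng.integers(0, 2, size=n).astype(float)        # dummy: 1 = female, 0 = male

# Assumed "true" coefficients used to generate wages: b0, b1, b2, b3.
true_beta = np.array([5.0, 1.5, 0.3, -3.0])
X = np.column_stack([np.ones(n), education, experience, female])
wage = X @ true_beta + rng.normal(0.0, 1.0, size=n)

# Least-squares estimates of the four coefficients.
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)
for name, b in zip(["const", "Education", "Experience", "Female"], beta_hat):
    print(f"{name:>10}: {b: .3f}")
```

The estimate on Female should come out close to the assumed gap of −3, even though education and experience also vary across workers. This is the ceteris paribus interpretation at work.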

Ordinary Least Squares (OLS) Estimation

The most common method for estimating regression coefficients is Ordinary Least Squares (OLS). In the simple model, OLS chooses the estimates β̂₀ and β̂₁ that minimise the sum of squared residuals, that is, the sum of squared differences between the observed Y values and the values predicted by the regression line.

Minimise: Σ(Y_i − Ŷ_i)²

Where Ŷ_i is the predicted value of Y for observation i. OLS is the Best Linear Unbiased Estimator (BLUE) when the classical regression assumptions are satisfied (Gauss-Markov theorem).
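
For the simple model, the minimisation has a well-known closed-form solution: the slope estimate is Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)², and the intercept estimate is Ȳ − β̂₁X̄. The sketch below applies these formulas to a small made-up data set and checks that nudging either coefficient raises the sum of squared residuals. The function names and data are hypothetical.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Return (intercept, slope) from the closed-form simple OLS formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    slope = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope


def ssr(x, y, intercept, slope):
    """Sum of squared residuals for the line intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (intercept + slope * x)
    return float(np.sum(residuals ** 2))


# Made-up observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 4.2, 4.8, 6.1]

b0_hat, b1_hat = fit_simple_ols(x, y)
best = ssr(x, y, b0_hat, b1_hat)

print(f"intercept = {b0_hat:.2f}, slope = {b1_hat:.2f}")
print("SSR at OLS estimates:", round(best, 4))
print("SSR with intercept shifted by 0.1:", round(ssr(x, y, b0_hat + 0.1, b1_hat), 4))
print("SSR with slope shifted by 0.05:", round(ssr(x, y, b0_hat, b1_hat - 0.05), 4))
```

Both perturbed lines produce a larger sum of squared residuals than the fitted line, which is what "least squares" means.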

Key Regression Output Explained

Statistic	What it tells you
Coefficient (β̂)	The estimated effect of X on Y, ceteris paribus
t-statistic	Tests whether the coefficient is statistically significantly different from zero
p-value	The probability of observing a result at least as extreme as this one if the true coefficient were zero; p < 0.05 is conventionally treated as statistically significant
R² (R-squared)	The proportion of variation in Y explained by the model; ranges from 0 to 1
Adjusted R²	R² adjusted for the number of predictors; more reliable for comparing models
F-statistic	Tests whether the overall model is statistically significant
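
Most of the statistics in the table can be computed directly from the fitted model. The sketch below does so with NumPy on simulated data, using the standard textbook formulas. The function name `ols_summary` and the data are hypothetical. P-values are left out because they need the CDF of the t distribution, which NumPy alone does not provide.

```python
import numpy as np


def ols_summary(X, y):
    """OLS output for regressors X (n x k, no constant column) and outcome y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])   # add the intercept column
    p = k + 1                                   # number of estimated coefficients

    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    residuals = y - design @ beta_hat

    ssr = residuals @ residuals                 # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)           # total variation in y
    sigma2 = ssr / (n - p)                      # estimated error variance
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    r2 = 1.0 - ssr / sst

    return {
        "coef": beta_hat,
        "std_err": std_err,
        "t_stat": beta_hat / std_err,
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - p),
        "f_stat": (r2 / k) / ((1.0 - r2) / (n - p)),
    }


# Simulated data with assumed coefficients (1, 2, -1), for illustration only.
rng = np.random.default_rng(7)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

summary = ols_summary(X, y)
for key, value in summary.items():
    print(key, np.round(value, 3))
```

In applied work, a statistics package would normally report these figures, along with p-values. The point here is only to show where each number comes from.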

Classical Assumptions of OLS

For OLS estimates to be valid (unbiased and efficient), several assumptions must hold:

The relationship between Y and X is linear in parameters
No perfect multicollinearity among independent variables
The error term has zero mean given the regressors: E(ε | X) = 0
Homoscedasticity: constant variance of the error term across observations
No autocorrelation: error terms are not correlated with each other
The error term is normally distributed (needed for exact hypothesis tests in small samples, though not for OLS to be unbiased or BLUE)

Violations of these assumptions (e.g., heteroscedasticity, multicollinearity, autocorrelation) require corrective techniques covered in advanced econometrics.
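
The "no perfect multicollinearity" assumption is the easiest one to check mechanically. As a rough sketch, the example below builds a design matrix that includes a constant together with both a Female and a Male dummy. The Male dummy is a hypothetical addition for illustration. The constant then equals Female + Male, so the columns are perfectly collinear and the coefficients cannot be separately estimated.

```python
import numpy as np

# Made-up sample of six workers, for illustration only.
female = np.array([1, 0, 1, 1, 0, 0], dtype=float)
male = 1.0 - female                  # hypothetical second dummy
const = np.ones_like(female)

# Constant + both dummies: const = female + male, so one column is redundant.
trap = np.column_stack([const, female, male])
# Dropping one dummy removes the exact linear dependence.
ok = np.column_stack([const, female])

rank_trap = np.linalg.matrix_rank(trap)
rank_ok = np.linalg.matrix_rank(ok)

print(f"with both dummies: {trap.shape[1]} columns, rank {rank_trap}")
print(f"with one dummy:    {ok.shape[1]} columns, rank {rank_ok}")
```

A design matrix whose rank is below its number of columns signals perfect multicollinearity. The usual remedy is to drop one of the redundant variables.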

Summary

Regression analysis is the cornerstone of econometrics. It allows economists to estimate the quantitative relationship between variables, test economic theories with data, and make predictions. The simple model Y = β₀ + β₁X + ε provides the foundation, while the multiple regression model extends it to control for many factors simultaneously. OLS is the standard estimation method, and understanding its assumptions and output is essential for any applied economist.
