Comparison of Linear, Lasso, and Ridge Regression

Predicting student performance from study hours is a classic regression task. This experiment compares three widely used regression techniques — Linear Regression, Lasso Regression, and Ridge Regression — to assess their predictive accuracy, stability, and generalization performance on noisy or high-dimensional data. Regularization methods such as Lasso and Ridge are specifically designed to prevent overfitting and enhance model robustness.

1. Linear Regression (Ordinary Least Squares)

Linear Regression models the relationship between study hours (X) and exam score (Y) using a linear equation:

Y = β₀ + β₁ X + ε

Where:

  • Y → Exam score (dependent variable)
  • X → Number of study hours (independent variable)
  • β₀ → Intercept (expected score when X = 0)
  • β₁ → Slope (expected change in score per additional study hour)
  • ε → Random error, assumed ε ~ N(0, σ²)

Interpretation Example:
A fitted model Ŷ = 40 + 5X means each additional study hour increases the predicted score by 5 points.

Cost Function (minimized by OLS):

J(β₀, β₁) = (1/n) Σ(i=1 to n) (Y_i − Ŷ_i)²
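As a quick check of this formula, here is a minimal sketch (the study-hours data is made up for illustration) that evaluates J for the fitted line Ŷ = 40 + 5X from the interpretation example:

```python
import numpy as np

# Made-up (hours, score) data for illustration only.
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([46, 49, 56, 61, 64], dtype=float)

# Parameters from the interpretation example: Ŷ = 40 + 5X.
beta0, beta1 = 40.0, 5.0

Y_hat = beta0 + beta1 * X        # predictions Ŷ_i
J = np.mean((Y - Y_hat) ** 2)    # (1/n) Σ (Y_i − Ŷ_i)²
print(f"J(40, 5) = {J:.2f}")     # mean squared error of the candidate line
```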

Key Assumptions:

  • Linearity
  • Independence
  • Homoscedasticity
  • Normality of residuals
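A minimal fitting sketch with scikit-learn, assuming synthetic data generated from the model above (the variable names and noise level are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulate Y = 40 + 5X + ε with ε ~ N(0, 3²).
hours = rng.uniform(0, 10, size=100).reshape(-1, 1)
scores = 40 + 5 * hours.ravel() + rng.normal(0, 3, size=100)

ols = LinearRegression().fit(hours, scores)
print(f"β₀ (intercept) ≈ {ols.intercept_:.2f}")  # expected score at X = 0
print(f"β₁ (slope)     ≈ {ols.coef_[0]:.2f}")    # points per extra study hour
```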

2. Lasso Regression (L1 Regularization)

Lasso adds an L1 penalty (the absolute values of the coefficients, scaled by α) to the cost function:

J(β₀, β) = MSE + α Σ(j=1 to p) |βⱼ|

Key Characteristics:

  • Drives unimportant coefficients exactly to zero
  • Performs automatic feature selection
  • Produces sparse and interpretable models

Best for: Student datasets with many candidate predictors; Lasso keeps only the truly informative ones (e.g., study hours is retained while a weak predictor such as sleep is dropped), as in the sketch below.
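A minimal sketch of that selection behavior, assuming synthetic data in which only study hours drives the score (the predictor names and α are illustrative; which coefficients reach exactly zero depends on α):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 200

study_hours = rng.uniform(0, 10, n)
sleep_hours = rng.uniform(4, 9, n)       # irrelevant by construction
commute_min = rng.uniform(5, 60, n)      # irrelevant by construction
X = np.column_stack([study_hours, sleep_hours, commute_min])
y = 40 + 5 * study_hours + rng.normal(0, 3, n)

# Standardize so the L1 penalty treats all coefficients on the same scale.
X_std = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=1.0).fit(X_std, y)
for name, coef in zip(["study_hours", "sleep_hours", "commute_min"], lasso.coef_):
    print(f"{name}: {coef:.3f}")  # irrelevant predictors are driven to exactly 0
```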

3. Ridge Regression (L2 Regularization)

Ridge adds an L2 penalty (squared coefficients):

J(β₀, β) = MSE + α Σ(j=1 to p) βⱼ²

Key Characteristics:

  • Shrinks coefficients toward zero but rarely to exactly zero
  • Excellent at handling multicollinearity
  • More stable when predictors are highly correlated

Best for: Cases where study hours, attendance, and revision time are correlated; Ridge keeps all of them while preventing extreme coefficient values, as in the sketch below.
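A minimal sketch of that stabilizing effect, assuming nearly collinear synthetic predictors (the names, noise levels, and α are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
n = 200

study_hours = rng.uniform(0, 10, n)
attendance = study_hours + rng.normal(0, 0.1, n)  # nearly collinear with hours
revision = study_hours + rng.normal(0, 0.1, n)    # nearly collinear with hours
X = np.column_stack([study_hours, attendance, revision])
y = 40 + 5 * study_hours + rng.normal(0, 3, n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("OLS coefficients:  ", np.round(ols.coef_, 2))    # often large and unstable
print("Ridge coefficients:", np.round(ridge.coef_, 2))  # shrunk, more stable
```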

Comparison of the Three Models

| Aspect | Linear Regression | Lasso Regression | Ridge Regression |
| --- | --- | --- | --- |
| Regularization | None | L1: α Σ\|βⱼ\| | L2: α Σ βⱼ² |
| Feature Selection | No | Yes (sets some β = 0) | No |
| Handles Multicollinearity | Poor | Moderate | Excellent |
| Coefficient Shrinkage | None | Strong (can be exactly zero) | Moderate (near zero) |
| Resulting Model | Dense | Sparse | Dense |
| Best Use Case | Clean, low-dimensional data | Need sparsity & feature selection | Correlated predictors |
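To put the comparison into practice, here is a side-by-side sketch scoring all three models with cross-validated R² on synthetic data (the α values are illustrative; in practice they would be tuned, e.g. with LassoCV/RidgeCV):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 200

study_hours = rng.uniform(0, 10, n)
attendance = study_hours + rng.normal(0, 0.5, n)  # correlated predictor
sleep_hours = rng.uniform(4, 9, n)                # irrelevant predictor
X = np.column_stack([study_hours, attendance, sleep_hours])
y = 40 + 5 * study_hours + rng.normal(0, 3, n)

models = {
    "Linear": LinearRegression(),
    "Lasso (α=0.5)": Lasso(alpha=0.5),
    "Ridge (α=1.0)": Ridge(alpha=1.0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R² = {scores.mean():.3f}")
```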