Comparison of Linear, Lasso, and Ridge Regression

Predicting student performance from study hours is a classic regression task. This experiment compares three widely used regression techniques — Linear Regression, Lasso Regression, and Ridge Regression — to assess their predictive accuracy, stability, and generalization performance on noisy or high-dimensional data. Regularization methods such as Lasso and Ridge are specifically designed to prevent overfitting and enhance model robustness.

1. Linear Regression (Ordinary Least Squares)

Linear Regression models the relationship between study hours (X) and exam score (Y) using a linear equation:

Y = β₀ + β₁ X + ε

Where:

  • Y → Exam score (dependent variable)
  • X → Number of study hours (independent variable)
  • β₀ → Intercept (expected score when X = 0)
  • β₁ → Slope (expected change in score per additional study hour)
  • ε → Random error, assumed ε ~ N(0, σ²)

Interpretation Example:
A fitted model Ŷ = 40 + 5X means each additional study hour increases the predicted score by 5 points.

Cost Function (minimized by OLS):

J(β₀, β₁) = (1/n) Σ(i=1 to n) (Y_i − Ŷ_i)²
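As a quick check of this formula, here is a minimal sketch (the study-hours data is made up for illustration) that evaluates J for the fitted line Ŷ = 40 + 5X from the interpretation example:

```python
import numpy as np

# Made-up (hours, score) data for illustration only.
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([46, 49, 56, 61, 64], dtype=float)

# Parameters from the interpretation example: Ŷ = 40 + 5X.
beta0, beta1 = 40.0, 5.0

Y_hat = beta0 + beta1 * X        # predictions Ŷ_i
J = np.mean((Y - Y_hat) ** 2)    # (1/n) Σ (Y_i − Ŷ_i)²
print(f"J(40, 5) = {J:.2f}")     # mean squared error of the candidate line
```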

Key Assumptions:

  • Linearity
  • Independence
  • Homoscedasticity
  • Normality of residuals
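A minimal fitting sketch with scikit-learn, assuming synthetic data generated from the model above (the variable names and noise level are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulate Y = 40 + 5X + ε with ε ~ N(0, 3²).
hours = rng.uniform(0, 10, size=100).reshape(-1, 1)
scores = 40 + 5 * hours.ravel() + rng.normal(0, 3, size=100)

ols = LinearRegression().fit(hours, scores)
print(f"β₀ (intercept) ≈ {ols.intercept_:.2f}")  # expected score at X = 0
print(f"β₁ (slope)     ≈ {ols.coef_[0]:.2f}")    # points per extra study hour
```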

2. Lasso Regression (L1 Regularization)

Lasso adds an L1 penalty (the absolute values of the coefficients, scaled by α) to the cost function:

J(β₀, β) = MSE + α Σ(j=1 to p) |βⱼ|

Key Characteristics:

  • Drives unimportant coefficients exactly to zero
  • Performs automatic feature selection
  • Produces sparse and interpretable models

Best for: Student datasets with many candidate predictors; Lasso keeps only the truly informative ones (e.g., study hours is retained while a weak predictor such as sleep is dropped), as in the sketch below.
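A minimal sketch of that selection behavior, assuming synthetic data in which only study hours drives the score (the predictor names and α are illustrative; which coefficients reach exactly zero depends on α):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 200

study_hours = rng.uniform(0, 10, n)
sleep_hours = rng.uniform(4, 9, n)       # irrelevant by construction
commute_min = rng.uniform(5, 60, n)      # irrelevant by construction
X = np.column_stack([study_hours, sleep_hours, commute_min])
y = 40 + 5 * study_hours + rng.normal(0, 3, n)

# Standardize so the L1 penalty treats all coefficients on the same scale.
X_std = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=1.0).fit(X_std, y)
for name, coef in zip(["study_hours", "sleep_hours", "commute_min"], lasso.coef_):
    print(f"{name}: {coef:.3f}")  # irrelevant predictors are driven to exactly 0
```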

3. Ridge Regression (L2 Regularization)

Ridge adds an L2 penalty (squared coefficients):

J(β₀, β) = MSE + α Σ(j=1 to p) βⱼ²

Key Characteristics:

  • Shrinks coefficients toward zero but rarely to exactly zero
  • Excellent at handling multicollinearity
  • More stable when predictors are highly correlated

Best for: Cases where study hours, attendance, and revision time are correlated; Ridge keeps all of them while preventing extreme coefficient values, as in the sketch below.
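A minimal sketch of that stabilizing effect, assuming nearly collinear synthetic predictors (the names, noise levels, and α are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
n = 200

study_hours = rng.uniform(0, 10, n)
attendance = study_hours + rng.normal(0, 0.1, n)  # nearly collinear with hours
revision = study_hours + rng.normal(0, 0.1, n)    # nearly collinear with hours
X = np.column_stack([study_hours, attendance, revision])
y = 40 + 5 * study_hours + rng.normal(0, 3, n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("OLS coefficients:  ", np.round(ols.coef_, 2))    # often large and unstable
print("Ridge coefficients:", np.round(ridge.coef_, 2))  # shrunk, more stable
```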

Comparison of the Three Models

| Aspect | Linear Regression | Lasso Regression | Ridge Regression |
| --- | --- | --- | --- |
| Regularization | None | L1: α Σ\|βⱼ\| | L2: α Σ βⱼ² |
| Feature Selection | No | Yes (sets some β = 0) | No |
| Handles Multicollinearity | Poor | Moderate | Excellent |
| Coefficient Shrinkage | None | Strong (can be exactly zero) | Moderate (near zero) |
| Resulting Model | Dense | Sparse | Dense |
| Best Use Case | Clean, low-dimensional data | Need sparsity & feature selection | Correlated predictors |
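To put the comparison into practice, here is a side-by-side sketch scoring all three models with cross-validated R² on synthetic data (the α values are illustrative; in practice they would be tuned, e.g. with LassoCV/RidgeCV):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 200

study_hours = rng.uniform(0, 10, n)
attendance = study_hours + rng.normal(0, 0.5, n)  # correlated predictor
sleep_hours = rng.uniform(4, 9, n)                # irrelevant predictor
X = np.column_stack([study_hours, attendance, sleep_hours])
y = 40 + 5 * study_hours + rng.normal(0, 3, n)

models = {
    "Linear": LinearRegression(),
    "Lasso (α=0.5)": Lasso(alpha=0.5),
    "Ridge (α=1.0)": Ridge(alpha=1.0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R² = {scores.mean():.3f}")
```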