29 junio 2026

Regression (Part 6): Ridge & Lasso — Taming Overfitting

When your model has too much freedom, the solution isn’t to take it away — it’s to charge for using it.

In Part 5 we saw something powerful and something dangerous. The powerful part: by adding extra columns — x², x³, x⁴… — our model can learn curves with ease. The dangerous part: the more columns we add, the more the model can twist itself to fit every training point, including the noise.

The result is familiar: a degree‑10 polynomial that hits every training point perfectly… and then produces absurd predictions in between. Huge coefficients, wild oscillations, perfect training error, terrible validation error.

Our only defense so far was caution: keep the degree low, use cross‑validation, start simple. That’s like saying “drive slowly.” It works, but it limits how far you can go.

Regularization is the seat belt that lets the model run fast without losing control.

1. The real problem: coefficients that explode

When a model tries to chase noise, it needs to make sharp bends. Sharp bends require huge coefficients.

Typical symptoms:

coefficients with massive magnitudes
alternating signs
violent oscillations between data points
training error drops
validation error skyrockets

If we control the coefficients, we control overfitting.

2. The key idea: penalize complexity

Until now, the model had one goal:

Minimize the error (MSE)

Regularization adds a second goal:

Keep coefficients small

The new cost function is:

J(β)=error+λ⋅penalty

The first term wants to fit the data.
The second term wants to keep the model humble.

λ (lambda) is the dial that controls how strict we are.

Metaphor:

“I won’t forbid you from using x² or x³… but I’ll charge you for exaggerating.”

3. Ridge Regression (L2): smooth without removing

Ridge penalizes the sum of squared coefficients:

λ∑βj2

Effect:

big coefficients pay a heavy price
small coefficients pay a small price
all coefficients shrink, but none go to zero

Result:

smoother curves
no wild oscillations
all features remain in the model

Metaphor:

“Ridge is adding shock absorbers: the car stays powerful, but stops bouncing on every bump.”

4. Lasso Regression (L1): keep what matters, drop the rest

Lasso penalizes the sum of absolute values:

λ∑∣βj∣

Effect:

constant pressure toward zero
if a coefficient contributes little, it becomes exactly zero
the model performs automatic feature selection

Result:

smooth curves
simpler equations
fewer terms, more interpretability

Metaphor:

“Lasso is cleaning a closet: what you don’t use, goes out.”

5. Ridge vs Lasso (quick summary)

Method	What it does	Best use case
Ridge (L2)	Shrinks coefficients	When most features matter
Lasso (L1)	Eliminates coefficients	When only a few features matter
Elastic Net	Mix of L1 + L2	When features are correlated

6. The role of λ: the discipline dial

λ = 0 → no regularization → chaos
small λ → slight smoothing
optimal λ → perfect balance
large λ → overly rigid model → underfitting

We choose λ using cross‑validation, just like we choose polynomial degree.

7. How training works (light version)

Gradient descent stays the same. Only the derivatives change:

Ridge:

∂J∂βj=usual gradient+2λβj

→ gentle pull toward zero.

Lasso:

∂J∂βj=usual gradient+λ⋅sign(βj)

→ constant push until the coefficient hits zero.

That’s all you need for intuition.

8. A visual example (conceptual)

Imagine the same degree‑10 polynomial:

No regularization: chaotic curve, huge coefficients.
Ridge: smooth curve, small coefficients.
Lasso: smooth curve, simple equation (only x and x² survive).

Three treatments, three outcomes. Only Ridge and Lasso tame the chaos.

9. When to use regularization

Use regularization when:

validation error ≫ training error
coefficients are huge
you have many features
you use high‑degree polynomials
features are correlated
the model is unstable

Skip regularization when:

you have lots of data and few features
the model already generalizes well
the problem is lack of expressiveness, not excess

10. In summary

Overfitting happens when coefficients explode.
Regularization penalizes that explosion.
Ridge shrinks coefficients.
Lasso eliminates coefficients.
λ controls how strict we are.
Training still uses gradient descent.
Regularization is the seat belt for flexible models.

Machine Learning, AI, Artificial Intelligence, General, IA, Inteligencia Artificial

| Tags: AI, artificial, inteligencia, Inteligencia artificial, learning, machine, Machine Learning

De On-Premise a la Nube in English

Ridge and Lasso Explained: How Regularization Tames Overfitting in Machine Learning (Part 6)

Regression (Part 6): Ridge & Lasso — Taming Overfitting

When your model has too much freedom, the solution isn’t to take it away — it’s to charge for using it.

1. The real problem: coefficients that explode

2. The key idea: penalize complexity

3. Ridge Regression (L2): smooth without removing

4. Lasso Regression (L1): keep what matters, drop the rest

5. Ridge vs Lasso (quick summary)

6. The role of λ: the discipline dial

7. How training works (light version)

Ridge:

Lasso:

8. A visual example (conceptual)

9. When to use regularization

10. In summary

Deja una respuesta Cancelar la respuesta