18 mayo 2026

Nested Learning: Google’s New Paradigm That Could Redefine the Future of LLMs

Making room in the ML series: what Google just unveiled could be the future of LLMs

Google introduces Nested Learning, a new paradigm that could be the conceptual successor to Transformers.

Today we’re pausing our series on Machine Learning, Deep Learning, and LLMs.

Not to go off track, but because something has happened that fits perfectly with where we are: a breakthrough that could define the next decade of AI.

The context: from «Attention Is All You Need» to «Nested Learning»

In 2017, Google Research published Attention Is All You Need.

That paper changed history: it gave birth to Transformers, the architecture that underpins absolutely all modern generative AI. ChatGPT, Claude, Gemini, Llama… they’re all Transformers.

Eight years later, in November 2025, the same team makes its move again.

And what they’re proposing isn’t an incremental improvement, but a new paradigm: Nested Learning.

An approach designed to tackle one of the biggest problems with current LLMs: catastrophic forgetting.

The problem Nested Learning aims to solve

Current LLMs are impressive, but they have a critical structural limitation:

They can’t learn new things without forgetting part of what came before.

They’re static models, frozen after training. If you want GPT-4 to know something that happened yesterday, you can’t simply «teach it.» You have to retrain it from scratch or use workarounds like RAG (Retrieval-Augmented Generation).

This isn’t a bug. It’s a direct consequence of how Transformers are designed.

And it’s exactly what Nested Learning proposes to change.

The mental «click»: different learning speeds within the same model

In a traditional Transformer:

All layers learn at the same speed
All updates happen at every training step
There’s no notion of memory with different rhythms

Nested Learning breaks that paradigm.

Imagine a model where:

Some parts learn very fast → short-term memory, adapted to current context
Others learn very slowly → stable knowledge, long-term memory
And others operate at intermediate frequencies → semantic knowledge, recurring patterns

This looks much more like how human neuroplasticity works: waves, rhythms, layers of memory that update at different speeds.

The key lies in the conceptual proposal:

Architecture and optimization aren’t separate things. They’re the same process operating at different scales.

Instead of viewing a model as a single block learning at a fixed speed, Nested Learning conceives it as a system of nested optimizations, each with its own update frequency.

«Hope»: the experimental architecture that proves the concept

To test the idea, Google didn’t stop at theory. They built an experimental architecture called Hope.

Hope implements a Continuous Memory System (CMS):

There’s not just «short memory» vs «long memory»
There’s a full spectrum of memory modules
Each updates at its own frequency

The results are promising

Hope outperforms standard Transformers in:

General language modeling → better perplexity
Long-context tasks → handles extended conversations and documents with more coherence
Needle-In-A-Haystack (NIAH) tests → finds specific information in massive contexts
Memory efficiency → retains relevant information without exploding in size

In other words: Hope remembers better, for longer, and at lower cost.

My take: Google hits the accelerator in the AI race

I’ll be honest: in recent months it seemed like China was leading the race, especially with advances like:

DeepSeek and its ultra-efficient MoE (Mixture of Experts)
DSA (Dynamic Sparse Attention) architectures
Open-source models rivaling proprietary ones

But this Google paper changes the tone.

Nested Learning is not:

A patch
An optimization
An engineering trick
A bigger model

It’s a foundational proposal.

And that matters for three reasons:

1. It’s a path beyond brute force

Until now, the recipe was simple:

More data → more parameters → more GPUs

Nested Learning proposes something different: smarter models, not just bigger ones.

2. It opens the door to models that learn in real-time

Imagine an AI that:

Reads today’s news
Integrates that knowledge into its memory
Adapts without massive retraining

This is a game changer.

Today, if you want an LLM to know something new, you have to:

Retrain it (expensive, slow)
Use RAG (limited, fragile)
Wait for the next version

With Nested Learning, the model could learn continuously, like we do.

3. It’s a step toward closing the gap with the human brain

Not in size, but in learning dynamics.

The brain doesn’t learn everything at the same speed. Some memories are fixed quickly (where you left your keys), others take years to consolidate (your native language).

Nested Learning is the first serious attempt to replicate that structure in AI.

What does this mean for the future of LLMs?

It’s too early to say that Nested Learning will be the successor to Transformers.

But it’s undoubtedly the most serious and ambitious proposal since 2017.

If it scales, we could see:

LLMs that update in real-time without retraining
Smaller but more capable models thanks to efficient memory
AI that learns from interaction with users, not just from the initial corpus

And that’s why we’re making room in the series today: because understanding the future of LLMs is also part of understanding the present.

«If 2017 was the year of Transformers, 2025 could be remembered as the year their successor began.»

Reference links

For those who want to dive deeper, here are the direct links to the original material:

🔗 Google Research Blog — research.google

🔗 Paper NeurIPS 2025 — arxiv.org

🔗 Analysis article — thedavestack.com

TL;DR

Google proposes Nested Learning, a new training paradigm
Tackles catastrophic forgetting with continuous multi-frequency memory
The experimental architecture Hope outperforms Transformers on long-context tasks
It’s the most serious proposal toward adaptive LLMs since «Attention Is All You Need»
Could change how we train and deploy AI models

Machine Learning, AI, Artificial Intelligence, Deep Learning, General, IA, Inteligencia Artificial

| Tags: AI, artificial, ia, inteligencia, Inteligencia artificial, learning, machine, Machine Learning

De On-Premise a la Nube in English

Nested Learning: Google’s New Paradigm That Could Redefine the Future of LLMs

Nested Learning: Google’s New Paradigm That Could Redefine the Future of LLMs

Making room in the ML series: what Google just unveiled could be the future of LLMs

The context: from «Attention Is All You Need» to «Nested Learning»

The problem Nested Learning aims to solve

The mental «click»: different learning speeds within the same model

«Hope»: the experimental architecture that proves the concept

The results are promising

My take: Google hits the accelerator in the AI race

1. It’s a path beyond brute force

2. It opens the door to models that learn in real-time

3. It’s a step toward closing the gap with the human brain

What does this mean for the future of LLMs?

Reference links

TL;DR

Deja una respuesta Cancelar la respuesta