How DeepSeek’s new way of training advanced AI models could disrupt everything – again


Flavio Coelho/Moment via Getty



ZDNET Highlights

  • DeepSeek introduces manifold-constrained hyper-connections, or mHC
  • They provide a way to enhance LLMs without incurring huge costs.
  • The company postponed the release of its R2 model, originally expected in mid-2025.

Just before the start of the new year, the AI world was introduced to a potentially game-changing new method for training advanced models.

A team of researchers from Chinese AI firm DeepSeek released a paper on Wednesday describing what it calls manifold-constrained hyper-connections, or mHC for short, which could provide a path for engineers to build and scale large language models without the enormous computational costs typically required.

Also: Is DeepSeek’s new model the latest setback for proprietary AI?

DeepSeek came into the cultural spotlight a year ago with the release of R1, a model that rivaled the capabilities of OpenAI's o1 and was reportedly trained at a fraction of the cost. The release came as a shock to US-based tech developers, showing that access to vast reserves of capital and computing resources was not necessary to train cutting-edge AI models.

The new mHC paper may provide the technical framework for DeepSeek's upcoming model, R2, which was expected to be released in mid-2025 but was postponed – allegedly due to China's limited access to advanced AI chips and concerns from company CEO Liang Wenfeng about the model's performance.

The challenge

Posted on arXiv, a popular preprint server where researchers can share study results that have not yet been peer-reviewed, DeepSeek's new paper is an attempt to bridge a complex and significant technological gap that hinders the scalability of AI models.

Also: Mistral’s latest open-source release bets on smaller models more than larger ones – here’s why

LLMs are built on neural networks, which are designed to preserve signals across multiple layers. The problem is that as more layers are added, the signal can become attenuated or distorted, and the risk of it turning into noise grows. It's a bit like playing a game of telephone: the more people join in, the more likely it is that the original message will get garbled.

The main challenge, then, is to create models that can preserve their signals across as many layers as possible – or “better optimize the trade-off between plasticity and stability,” as the DeepSeek researchers describe it in their new paper.
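The signal-preservation problem above can be sketched with a toy example. The snippet below is purely illustrative – the layer sizes, weight scales, and depth are invented, and this is not code from the paper. It compares a plain stack of layers, where the signal attenuates with depth, against a residual stack, where each layer only adds a correction so the input keeps a direct path to the output:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w):
    """One toy transformation layer; tanh keeps activations bounded."""
    return np.tanh(x @ w)

x = rng.normal(size=(1, 8))
weights = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(32)]

# Plain stacking: the signal must survive every transformation in sequence,
# so each layer shrinks it a little more.
plain = x
for w in weights:
    plain = layer(plain, w)

# Residual stacking: each layer only adds a correction, so the original
# input keeps a direct path to the output.
residual = x
for w in weights:
    residual = residual + layer(residual, w)

print(np.linalg.norm(plain), np.linalg.norm(residual))
```

In this toy setup, the plainly stacked signal collapses toward zero after 32 layers while the residual signal keeps a usable magnitude – roughly the stability property that hyper-connection-style designs build on by running several such streams in parallel.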

The solution

The authors of the new paper – who include DeepSeek CEO Liang Wenfeng – build on hyper-connections, or HCs, a framework introduced in 2024 by researchers at ByteDance that diversifies the channels through which the layers of a neural network share information with each other. However, HCs introduce the risk that the original signal gets lost in translation. (Again, think of adding as many people as possible to a game of telephone.) They also come with high memory costs, making them difficult to implement at scale.

Also: DeepSeek AI could be about to shake up the world as we know it – again

mHC architectures aim to solve this by limiting hyperconnectivity within a model, thereby preserving the informational complexity enabled by HC while bypassing the memory problem. This, in turn, could allow the training of highly complex models in a way that could be practical and scalable even for smaller, more cash-strapped developers.
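As a loose analogy only – the paper's actual construction is not reproduced here, and every size, constraint, and operation below is invented for illustration – one can picture several parallel residual streams whose cross-stream mixing weights are constrained (here, to convex combinations) so that the mixing step can never amplify or blow up the signal:

```python
import numpy as np

rng = np.random.default_rng(1)
n_streams, dim, depth = 4, 8, 16   # toy sizes, not taken from the paper

def row_stochastic(m):
    """Constrain a mixing matrix: softmax each row so it forms a convex
    combination (positive weights summing to 1), which cannot amplify norms."""
    e = np.exp(m - m.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x = rng.normal(size=(dim,))
streams = np.tile(x, (n_streams, 1))   # n parallel copies of the residual stream

for _ in range(depth):
    mix = row_stochastic(rng.normal(size=(n_streams, n_streams)))
    streams = mix @ streams            # constrained cross-stream mixing
    w = rng.normal(scale=0.1, size=(dim, dim))
    streams[0] = streams[0] + np.tanh(streams[0] @ w)  # a layer adds a correction

merged = streams.mean(axis=0)
print(np.linalg.norm(merged))
```

The point of the constraint is that no matter how many layers are stacked, the mixing step only averages the streams rather than amplifying them, so the combined signal stays bounded – the kind of stability-versus-plasticity trade-off the article describes, achieved without storing anything beyond the small mixing matrices.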

Why it matters

Just as with the release of R1 in January 2025, the arrival of the mHC framework may signal a new direction for AI development.

So far in the AI race, the prevailing wisdom has been that only the biggest, deepest-pocketed companies can afford to build frontier models. But DeepSeek has consistently shown that other paths are possible, and that success can be achieved through clever engineering rather than sheer spending.

The fact that the company has published its new research on the mHC method openly means it can be widely adopted by smaller developers, especially if it is used in the highly anticipated R2 model (whose release date has not been officially announced).
