
The document, co-authored by founder Liang Wenfeng, introduces a framework called Manifold-Constrained Hyper-Connections.
It’s designed to improve scalability while reducing the computational and energy demands of training advanced AI systems, according to the authors.
Such publications from DeepSeek have foreshadowed the release of major models in the past.
The Hangzhou-based startup stunned the industry with the R1 reasoning model a year ago, developed at a fraction of the cost of its Silicon Valley rivals.
DeepSeek has since released several smaller models, but anticipation is mounting for its next flagship system, widely dubbed R2, expected around the Spring Festival in February.
Chinese startups continue to operate under significant constraints, with the US preventing access to the most advanced semiconductors essential to developing and running AI.
Those restrictions have forced researchers to pursue unconventional methods and architectures.
What Bloomberg Intelligence says
DeepSeek’s forthcoming R2 model – which could launch in the next few months – has the potential to upend the global AI sector again, despite Google’s recent gains.
“Google’s Gemini 3 model overtook OpenAI in November to claim a top-3 slot in LiveBench’s ranking of global large language model (LLM) performance.
“China’s low-cost models, which are developed at a fraction of the cost of competitors, claimed two slots in the top-15,” said analysts Robert Lea and Jasmine Lyu.
DeepSeek, known for its unorthodox innovations, published its latest paper this week through the open repository arXiv and open-source platform Hugging Face.
The paper lists 19 authors, with Liang’s name appearing last.
The founder, who’s consistently steered DeepSeek’s research agenda, has pushed his team to rethink how large-scale AI systems are conceived and built.
The latest research addresses challenges such as training instability and limited scalability, noting that the new method incorporates “rigorous infrastructure optimisation to ensure efficiency”.
The tests, which build on ByteDance Ltd.’s 2024 research into hyper-connection architectures, were conducted on models ranging from 3 billion to 27 billion parameters.
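For readers unfamiliar with the term, the rough idea behind hyper-connections can be shown in code. The sketch below is a simplified, hypothetical rendering of the general approach from ByteDance’s 2024 work – keeping several parallel residual streams mixed by learnable weights instead of a single residual connection – and is not DeepSeek’s manifold-constrained variant; names such as HyperConnectionBlock and expansion_rate are illustrative only.

```python
# Illustrative sketch only: a hyper-connection-style block in PyTorch, assuming the
# general idea from ByteDance's 2024 hyper-connections research. Instead of one
# residual stream, the block keeps `n` parallel hidden streams that are mixed by
# learnable weights before and after the sub-layer. This is NOT DeepSeek's
# manifold-constrained method; all names here are hypothetical.
import torch
import torch.nn as nn


class HyperConnectionBlock(nn.Module):
    def __init__(self, dim: int, expansion_rate: int = 4):
        super().__init__()
        n = expansion_rate
        # Learnable weights that collapse the n streams into a single layer input.
        self.input_mix = nn.Parameter(torch.full((n,), 1.0 / n))
        # Learnable weights that distribute the layer output back across the streams.
        self.output_mix = nn.Parameter(torch.ones(n))
        # Learnable stream-to-stream connections, initialised to the identity so the
        # block starts out behaving like n independent residual paths.
        self.stream_mix = nn.Parameter(torch.eye(n))
        # Stand-in sub-layer (in a transformer this would be attention or an FFN).
        self.layer = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (batch, n, seq, dim) -- n parallel residual streams.
        layer_in = torch.einsum("n,bnsd->bsd", self.input_mix, streams)
        layer_out = self.layer(layer_in)
        # Mix the streams among themselves, then add the weighted layer output back.
        mixed = torch.einsum("nm,bmsd->bnsd", self.stream_mix, streams)
        return mixed + self.output_mix.view(1, -1, 1, 1) * layer_out.unsqueeze(1)


# Toy usage: a full model would stack many such blocks; here we only check shapes.
x = torch.randn(2, 4, 16, 64)                     # (batch, streams, seq, dim)
print(HyperConnectionBlock(64).forward(x).shape)  # torch.Size([2, 4, 16, 64])
```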
The technique holds promise “for the evolution of foundational models,” the authors said.