Tom’s Hardware
Luke James

Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model — 1,000 Ascend 910C chips used in training

A research group that includes Huawei Technologies says it completed full-parameter post-training of DeepSeek's V4-Pro, a 1.6-trillion-parameter model. The group used a cluster of at least 1,000 Huawei Ascend 910C chips, according to the Shenzhen municipal government, as reported by the South China Morning Post.

The revelation is evidence that Chinese accelerators can now handle a training-class workload on domestic silicon, the part of the AI pipeline Chinese firms have had the most trouble moving off Nvidia hardware under U.S. export controls. Huawei carried out the work with the Shenzhen Loop Area Institute, the Shenzhen campus of Harbin Institute of Technology, and the Shenzhen Research Institute of Big Data.

The Ascend 910C is Huawei's current flagship AI accelerator, a dual-die part that returned roughly 60% of an Nvidia H100's inference performance in earlier DeepSeek testing. Chinese chips have been competitive at inference, where a finished model answers prompts, but weak at training, where a model's weights are recalculated across large datasets. The team says it ran full-parameter post-training, meaning every weight was updated rather than a thin adapter layer added on top.
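
To give a rough sense of the gap between updating every weight and training a thin adapter, here is a minimal Python sketch. It assumes a LoRA-style low-rank adapter, which the report does not name, and the layer size and rank are hypothetical values picked for illustration, not figures from V4-Pro.

```python
# Illustrative arithmetic only: the layer dimensions and adapter rank are
# hypothetical, and the low-rank (LoRA-style) adapter is an assumption; the
# report says only that a "thin adapter layer" was not used.

def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def low_rank_adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for two small factors: (d_out x rank) and (rank x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 4096, 4096, 16  # hypothetical layer shape and adapter rank

full = full_finetune_params(d_out, d_in)
adapter = low_rank_adapter_params(d_out, d_in, rank)

print(f"full-parameter update: {full:,} trainable weights")
print(f"low-rank adapter:      {adapter:,} trainable weights")
print(f"ratio:                 {full // adapter}x")
```

For this one hypothetical layer, the full update touches 128 times as many weights as the adapter. Gradients and optimizer state scale with the trainable weights too, which is why full-parameter runs lean much harder on accelerator memory and chip-to-chip bandwidth.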

Post-training is essentially the “tuning” stage that follows the much larger pre-training phase. Pre-training builds a model's core capabilities by working through enormous text corpora, and DeepSeek's documentation puts V4-Pro's pre-training corpus at more than 32 trillion tokens.

Post-training then shapes behavior through instruction-following, safety alignment, and task-specific data. Completing it on Ascend silicon is a genuine result for the platform, but it doesn’t demonstrate that the chips can pre-train a frontier model from scratch, which is the heavier and costlier job.

Back in August, it was reported that DeepSeek couldn’t complete a single successful training run for its R2 model on Ascend chips, even with Huawei engineers on site. The reported causes were unstable performance, slow chip-to-chip interconnects, and gaps in Huawei's CANN software stack, its substitute for Nvidia's CUDA. The company fell back on Nvidia GPUs for training and left Ascend on inference. DeepSeek-V4-Pro, released in April, was the first DeepSeek model built around Ascend from the outset.

As for the claim coming out of Shenzhen, it carries no benchmarks and gives no figures for how long the run took, how it compared to the same job on Nvidia hardware, or how efficiently the 1,000-chip cluster was used. It’s ultimately just another addition to a series of dubious claims that have come from the Chinese state without anything to back them up. DeepSeek itself hasn’t commented.
