The AI company PrimeIntellect recently began training a new 10-billion-parameter model, a task it hopes to complete with the help of users around the world.
On its blog, PrimeIntellect said its new model, INTELLECT-1, will be the product of the first decentralized training run of a model of this scale. Even so, at 10 billion parameters it remains far smaller than four-year-old models such as OpenAI’s GPT-3, which has 175 billion parameters.
The project began as research into an open-source implementation of globally distributed AI model training and how to scale it. The method worked for a 1-billion-parameter model, and the next step is to scale it up tenfold.
Size isn't everything, though. Newer models such as Microsoft's Phi and Meta's Llama show that efficiency improvements can deliver GPT-3-level, and in some cases GPT-4-level, performance with a fraction of the parameters.
The company’s goal is to make decentralized training practical so that the next generation of AI, artificial general intelligence (AGI), is open-source, transparent, and accessible. That would reduce the risk of only a few large companies having access to such advanced technology.
For now, users can contribute to the project only through the company’s own platform. You can do this by renting GPUs selected by PrimeIntellect, specifically NVIDIA H100 Tensor Core GPUs, which cost around $20 per hour to run. In the future, though, you should be able to contribute to the model’s training with your own hardware.
Training is spread across separate clusters of devices, each processing data to train the model. New features let these clusters synchronize their progress with one another less frequently, which reduces bandwidth requirements. The training framework can also handle nodes joining or leaving mid-run without crashing.
A node that joins a run already in progress must first be brought up to date with the latest state of the model before it can contribute. PrimeIntellect has solved delays in this catch-up process by having new nodes request checkpoints from their peers.
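The trade-off between communication and local work can be illustrated with a generic local-SGD loop in the style of DiLoCo, a published low-communication training method. Treating it as a stand-in for PrimeIntellect's framework is an assumption; this is not the company's code. Each hypothetical worker runs several local SGD steps on a toy quadratic loss, and then the workers that are still online average their parameter changes once per round. A random drop probability simulates nodes leaving. The function names, toy loss, and hyperparameters are all illustrative. The original DiLoCo recipe uses AdamW for the inner steps and Nesterov momentum for the outer update, but this sketch uses plain SGD for both.

```python
import random
from typing import List, Tuple

Vector = List[float]


def local_train(params: Vector, target: Vector, steps: int, lr: float) -> Vector:
    """Run `steps` of plain SGD on the toy loss 0.5 * ||w - target||^2 (gradient: w - target)."""
    w = list(params)
    for _ in range(steps):
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return w


def outer_step(global_params: Vector, worker_params: List[Vector], outer_lr: float) -> Vector:
    """Average the pseudo-gradients (global - local) of workers that reported back, then apply them."""
    if not worker_params:
        return list(global_params)  # every worker dropped out this round: keep the model unchanged
    n = len(worker_params)
    avg_delta = [
        sum(g - w[i] for w in worker_params) / n for i, g in enumerate(global_params)
    ]
    return [g - outer_lr * d for g, d in zip(global_params, avg_delta)]


def train(
    targets: List[Vector],
    rounds: int,
    inner_steps: int,
    inner_lr: float = 0.1,
    outer_lr: float = 1.0,
    drop_prob: float = 0.2,
    seed: int = 0,
) -> Tuple[Vector, int]:
    """Hypothetical toy training loop; returns the final parameters and the number of sync rounds."""
    rng = random.Random(seed)
    global_params = [0.0] * len(targets[0])
    syncs = 0
    for _ in range(rounds):
        finished = []
        for target in targets:  # one hypothetical worker per target (stands in for its local data)
            if rng.random() < drop_prob:
                continue  # simulate this node leaving (or not reporting back) this round
            finished.append(local_train(global_params, target, inner_steps, inner_lr))
        global_params = outer_step(global_params, finished, outer_lr)
        syncs += 1  # one communication round per `inner_steps` local steps
    return global_params, syncs


if __name__ == "__main__":
    params, syncs = train([[1.0, 2.0], [3.0, 4.0]], rounds=20, inner_steps=10)
    print(params, syncs)
```

With `rounds=20` and `inner_steps=10`, the workers synchronize 20 times, whereas fully synchronous data-parallel training over the same 200 local steps would exchange gradients 200 times. Because the outer step averages over whichever workers reported back, a dropped node simply contributes nothing that round instead of stalling the run.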
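Here is a minimal sketch of peer-based catch-up. It assumes each peer can return its latest checkpoint on request and that the joining node simply adopts the most recent one it receives. `Peer`, `Checkpoint`, `request_checkpoint`, and `catch_up` are hypothetical names, not PrimeIntellect's API, and the network transfer is reduced to an in-memory copy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checkpoint:
    step: int            # training step at which the checkpoint was taken
    params: List[float]  # toy stand-in for the full model state


@dataclass
class Peer:
    name: str
    checkpoint: Optional[Checkpoint] = None
    online: bool = True

    def request_checkpoint(self) -> Optional[Checkpoint]:
        """Hypothetical RPC: return a copy of this peer's latest checkpoint, or None if unavailable."""
        if not self.online or self.checkpoint is None:
            return None
        return Checkpoint(self.checkpoint.step, list(self.checkpoint.params))


def catch_up(peers: List[Peer]) -> Checkpoint:
    """Ask every peer for its checkpoint and adopt the most recent one that arrives."""
    best: Optional[Checkpoint] = None
    for peer in peers:
        ckpt = peer.request_checkpoint()
        if ckpt is not None and (best is None or ckpt.step > best.step):
            best = ckpt
    if best is None:
        raise RuntimeError("no peer could provide a checkpoint")
    return best


if __name__ == "__main__":
    peers = [
        Peer("a", Checkpoint(120, [0.50, 1.00])),
        Peer("b", Checkpoint(150, [0.60, 1.10]), online=False),  # unreachable peer is skipped
        Peer("c", Checkpoint(140, [0.55, 1.05])),
    ]
    state = catch_up(peers)
    print(state.step)  # the joining node resumes from step 140
```

Returning a copy rather than a reference means the new node can start training on its state without mutating the peer's checkpoint.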
What happens next for INTELLECT-1
INTELLECT-1 is based on the Llama-3 architecture and is being trained on four different datasets, mainly FineWeb-Edu, a Hugging Face dataset of content from educational web pages.
In the future, PrimeIntellect wants to train even larger models and to let anyone set up a similar training project of their own, to which other users can also contribute their processing power.
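Training on several datasets at once usually means drawing each example or batch from a weighted mixture. The sketch below shows one simple way to do that. Only FineWeb-Edu is named in the source, so the other three dataset names and all four weights are hypothetical placeholders, chosen only so that FineWeb-Edu dominates as described.

```python
import random
from collections import Counter
from typing import Dict, Iterator

# Hypothetical mixture: only FineWeb-Edu is named in the article; the other
# names and every weight are placeholders chosen so FineWeb-Edu dominates.
HYPOTHETICAL_WEIGHTS: Dict[str, float] = {
    "fineweb-edu": 0.6,
    "dataset_b": 0.2,
    "dataset_c": 0.1,
    "dataset_d": 0.1,
}


def mixture_sampler(weights: Dict[str, float], seed: int = 0) -> Iterator[str]:
    """Endlessly yield dataset names, each with probability proportional to its weight.

    Being a generator, invalid weights are reported on the first draw.
    """
    names = list(weights)
    probs = [weights[name] for name in names]
    if any(p < 0 for p in probs) or sum(probs) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    rng = random.Random(seed)
    while True:
        # random.choices normalizes the weights, so they need not sum to 1
        yield rng.choices(names, weights=probs, k=1)[0]


if __name__ == "__main__":
    sampler = mixture_sampler(HYPOTHETICAL_WEIGHTS, seed=1)
    counts = Counter(next(sampler) for _ in range(10_000))
    print(counts.most_common())  # fineweb-edu should account for roughly 60% of draws
```

Because the weights are applied at draw time, changing the mixture only means editing the dictionary. The datasets themselves never need to be merged or reshuffled on disk.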