Big AI developers like OpenAI, Google’s Gemini team, and the folks behind Microsoft Copilot have massive data centers at their disposal for AI workloads. Thanks to the work of a team of developers, a new software could allow you to run your own AI cluster at home using your existing smartphones, tablets, and computers.
The experimental exo software splits up your Large Language Model (LLM) to use some or all of your computing devices at home to run your personal chatbot or other AI project. This can include your Android phones and tablets, as well as computers running macOS or Linux.
The result of this is allowing you to use your various devices together to appear like one powerful GPU to the AI model. The developer showed off a demo of the software running Llama-3-70B at home using an iPhone 15 Pro Max, an iPad Pro M4, a Galaxy S24 Ultra, an M2 MacBook Pro, an M3 MacBook Pro, and two MSI Nvidia GTX 4090 graphics cards.
The exo software is compatible with Llama and other popular AI models. It also includes, through a one-line change in the application, a ChatGPT-compatible API for running models. You only need your compatible devices running Python 3.12.0 or higher to compile and run the software.
Running Llama-3-70B at home with @exolabs_Combines the compute of all these devices to make one big GPU:- iPhone 15 Pro Max- iPad Pro M4- Galaxy S24 Ultra- MacBook Pro M2 and M3 Pro- 2 x MSI NVIDIA GeForce RTX 4090 SUPRIMCode is open source 👇 pic.twitter.com/bFfwYIRCJIJuly 15, 2024
Once compiled and running, exo automatically discovers devices on your network to include in the cluster. It provides device equality using peer-to-peer connections. While exo supports various partitioning strategies to distribute the work across devices, it defaults to a ring memory-weighted scheme that allocates the workload based on how much memory each device has.
The exo software also supports iOS, but the developers say the code needs some work to be ready for mainstream use. It’s pulled the iOS version but will allow access to it for those who email the lead developer.
The developers also plan further refinement and features. They also have a bounty program for helping add new features and compatibility. As of this writing, these included support for LLaVa, batched requests, a radio networking module, and pipeline parallel inference support. In its current state, it already looks like a cool project to experiment with.