The open source AI boom is built on Big Tech's handouts. How long will it last?

Stability AI's first release, the text-to-image model Stable Diffusion, performed as well as, if not better than, closed equivalents like Google's Imagen and OpenAI's DALL-E. Not only was it free, but it also ran on a decent home computer. Stable Diffusion did more than any other model to spark the explosion of open source development around image-making AI last year.
This time, however, Mostaque wants to manage expectations: StableLM doesn't come close to GPT-4. “There's still a lot of work to do,” he says. “It's not like Stable Diffusion, where you immediately have something that's super usable. Language models are harder to train.”
Another problem is that models are harder to train the bigger they get. This isn't simply down to the cost of computing power. The training process breaks down more often with larger models and has to be restarted, making these models even more expensive to build.
There's basically an upper limit to the number of parameters most groups can afford to train, says Biderman. That's because large models need to be trained across multiple different GPUs, and wiring all the hardware together is complicated. “Successfully training models at that scale is a very new field of high-performance computing research,” she says.
The exact number changes as the technology advances, but right now Biderman puts that ceiling roughly in the range of 6 to 10 billion parameters. (By comparison, GPT-3 has 175 billion parameters; LLaMA has 65 billion.) The correlation isn't exact, but in general, larger models tend to perform much better.
Biderman expects the flurry of activity around open source large language models to continue. But she expects it to be focused on extending or adapting a few existing pre-trained models rather than advancing the core technology. “There are only a handful of organizations that have pre-trained these models, and I predict it will remain that way for the foreseeable future,” she says.
That's why so many open source models are built on top of LLaMA, which was trained from scratch by Meta AI, or on releases from EleutherAI, a nonprofit organization that is unique in its contribution to open source technology. Biderman says she knows of only one other group like it, and that one is in China.
EleutherAI got its start thanks to OpenAI. Rewind to 2020, and the San Francisco-based company had just released a hot new model. “GPT-3 was a big shift for a lot of people in how they thought about AI at scale,” Biderman says. “It's often credited as an intellectual paradigm shift in terms of what people expect from these models.”