The open source ecosystem has long been the backbone of the global technology industry, and in the era of generative AI, the situation is no different. Some of the most impressive models out there are open source, such as Mistral AI's models and Meta's Llama.
With the AI industry growing at an astonishing rate, the open source community is well positioned to contribute and guide whatever this next generation of technology brings.
In a press conference at KubeCon 2024, Jim Zemlin, CEO of the Linux Foundation, touted the wide range of areas where open source can help in AI development.
“It might be easier to think about the goal of open source more broadly in generative AI if you look at it from a holistic perspective,” Zemlin said.
Zemlin worked his way from the CPU level to the data level of the relationship between open source and generative AI, pointing out the notable strides open source is making along the way.
At the compute level, Zemlin noted that the Linux Foundation has created the Unified Acceleration Foundation to look more closely at the role of open source and GPUs. He also mentioned the push for open source at the foundation model level.
Perhaps most notably, Zemlin said he believes open source could be the answer to some of AI's most pertinent problems, such as hallucinations, security risks, and the distinction between real and AI-generated content.
“Sometimes the answer to technology problems is more technology, and a lot of people are skeptical about that,” Zemlin said. “But in this case, I think it's true.”
“If you look at some of the things around large language models, around AI security, I think this is an area where we're seeing some good starts,” Zemlin said.
Speaking about specific areas where the open source community could help develop tools to track issues, Zemlin said there are already projects underway to help developers “unlearn” in an attempt to fine-tune AI models.
“We are already seeing some of these tools in our Linux Foundation big data AI project,” he added.
Zemlin also drew attention to open source's commitment to the Coalition for Content Provenance and Authenticity (C2PA), a project that builds on the efforts of the Content Authenticity Initiative (CAI) to establish a framework for identifying content generated by AI.
The open source ecosystem can do more to support AI development
Zemlin cautioned, however, that the open source community could be more proactive about AI development. The ecosystem should be more vocal about the role it can play in underpinning safe and responsible development, he suggested.
“[There’s] a real opportunity for open source to do more,” he said.
Speaking to ITPro at the conference, Oleksandr Matvitskyy, senior director analyst at Gartner, echoed Zemlin's comments about the role of open source in the future of generative AI development.
Closer collaboration with the ecosystem and ensuring that open source development is prioritized should be a key focus for companies, regulators and governments moving forward, Matvitskyy said.
“I think you can do anything with open source,” he told ITPro.
“[I] believe it has to be the priority of all governments and all regulators to ensure that AI remains open source,” Matvitskyy added.
Prevalence of private data could hinder progress
However, there are obstacles standing in the way of open source AI development approaches, specifically when it comes to AI training. The past 18 months have been plagued by cases of hallucinations and safety concerns.
Matvitskyy noted that these problems are particularly visible in the operation of AI models.
“They are still amazed at their results,” Matvitskyy said. “They have no data to learn from: everything is private, everything is protected.”
Companies often hoard their data, limiting the amount of open data available for AI training. Fundamentally, training on more data is the only way AI models will develop beyond their current level of complexity.
Matvitskyy said that about 60% of the data kept by companies is probably “not really important” and could be made public for training AI models.
“They should be open and companies should get money… for innovation, for what they actually do, not for what they created thirty years ago,” Matvitskyy said.