Since OpenAI kicked off the generative AI race in November 2022 with the release of ChatGPT, large language models (LLMs) have skyrocketed in popularity.
These systems can offer capabilities like text generation, code completion, translation, article summaries, and more, and companies across the board are finding new use cases for generative AI.
But what if you want to harness the power of LLMs without the high licensing fees or restrictions of using a closed model?
Well, in that case, a large open source language model might be worth considering, and there's certainly no shortage of options now.
What is a large open source language model?
An open source LLM differs from a proprietary (or closed source) model in that its weights are publicly available for organizations to use, although some come with usage restrictions, so it pays to read the license carefully.
Open source LLMs offer a number of benefits beyond the fact that they are publicly available right now. In particular, cost savings are frequently cited as a key factor in some companies' decisions to adopt these models.
Flexibility is another advantage: instead of being locked into a single vendor, businesses can mix and match models from multiple providers.
Maintenance, updates, and further development can also be handled in-house, provided you have the engineering capacity to do so.
This article will delve into three of the most notable open source LLMs on the market right now, spanning models released by major vendors like Meta and Mistral AI, as well as a newcomer to the scene.
Llama 2 (Meta)
Meta's first foray into a publicly available AI model came with LLaMA (Large Language Model Meta AI) in February 2023, an LLM family topping out at 65 billion parameters.
In July 2023, in partnership with Microsoft, Meta announced the second version of its open source flagship model, Llama 2, with three model sizes featuring 7, 13, and 70 billion parameters.
This included both the underlying foundation models and fine-tuned versions of each size optimized for dialogue, billed as Llama 2-Chat.
Each version of Llama 2 was trained using an offline dataset between January and July 2023.
Llama 2 was announced as the first free competitor to ChatGPT and is available through an application programming interface (API), allowing organizations to quickly integrate the model into their IT infrastructure.
Llama 2 is also available through the AWS and Hugging Face platforms, catering to a broader range of use cases in the open source community.
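For teams taking the Hugging Face route, a minimal sketch of running the 7B chat variant locally with the transformers library might look like the following. It assumes access to the gated meta-llama/Llama-2-7b-chat-hf repository has been granted after accepting Meta's license, and that the accelerate package and a suitable GPU are available; the prompt and generation settings are purely illustrative.

```python
# Minimal sketch: running Llama 2 Chat locally via Hugging Face transformers.
# Assumes access to the gated "meta-llama/Llama-2-7b-chat-hf" repo has been granted
# and that the accelerate package and a GPU with enough memory are available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
    device_map="auto",          # spread layers across available devices
)

prompt = "Summarize the benefits of open source language models in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```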
However, some experts questioned the open source credentials of Meta's model, as applicants with products or services that have more than 700 million monthly active users will need to approach Meta directly for permission to use the model.
Open source data science and machine learning platform Hugging Face claimed that the Llama 2 7B Chat model outperformed open source chat models in most benchmarks they tested.
Mixtral 8x7B (Mistral AI)
French artificial intelligence startup Mistral AI has been a hot property in the AI space since it was launched in April 2023 by former Google DeepMind researcher Arthur Mensch.
Mistral AI closed a major Series A funding round in December 2023, securing $415 million and raising the company's value to nearly $2 billion just eight months after its inception.
The company launched Mistral 7B, a 7 billion parameter large language model, in September 2023 via Hugging Face.
The model was released under the Apache 2.0 license, a permissive software license that means the software has only minimal restrictions on how it can be used by third parties.
The launch of Mistral 7B was accompanied by some bold claims from Mistral AI, which said the model could outperform the 13 billion parameter version of Meta's Llama 2 on all benchmarks.
Mistral 7B uses grouped-query attention (GQA) to achieve faster inference, as well as sliding window attention (SWA) to handle longer sequences at lower cost.
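To make the GQA idea concrete, here is a simplified, self-contained sketch (illustrative only, not Mistral's implementation): several query heads share each key/value head, which shrinks the key/value cache and speeds up decoding. Sliding window attention, not shown here, additionally limits how far back each token attends.

```python
# Simplified sketch of grouped-query attention (GQA): many query heads share
# fewer key/value heads, reducing key/value-cache size during inference.
# Dimensions and head counts are illustrative, not Mistral's actual configuration.
import torch
import torch.nn.functional as F

batch, seq_len, d_model = 2, 16, 512
n_q_heads, n_kv_heads = 8, 2          # 4 query heads share each KV head
head_dim = d_model // n_q_heads
group_size = n_q_heads // n_kv_heads

x = torch.randn(batch, seq_len, d_model)

# Separate projections for queries and the (smaller) shared keys/values.
w_q = torch.nn.Linear(d_model, n_q_heads * head_dim, bias=False)
w_k = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
w_v = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)

q = w_q(x).view(batch, seq_len, n_q_heads, head_dim).transpose(1, 2)
k = w_k(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)
v = w_v(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)

# Broadcast each KV head across its group of query heads.
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
attn = F.softmax(scores, dim=-1)
out = (attn @ v).transpose(1, 2).reshape(batch, seq_len, d_model)
print(out.shape)  # torch.Size([2, 16, 512])
```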
In a head-to-head comparison with Meta's 13 billion parameter Llama 2 model, platform-as-a-service (PaaS) company E2E Cloud said Mistral 7B's comparable performance against the larger Meta model points to better memory efficiency and throughput on the part of the French model.
Shortly after the launch of Mistral 7B, the French AI specialist announced its second model, Mixtral 8x7B.
Mixtral 8x7B uses a sparse Mixture of Experts (MoE) architecture, in which each layer contains eight expert neural networks and a router selects which of them process each token.
The advantage of this approach is that MoE models can be pretrained much faster than counterparts built on dense feedforward network (FFN) layers.
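As a toy illustration of how sparse routing works (not Mixtral's actual code; dimensions and expert counts here are arbitrary), a small router scores each token and only the top-scoring experts' feedforward networks are evaluated for it:

```python
# Toy sketch of a sparse Mixture-of-Experts layer with top-2 routing.
# Each token is sent to only 2 of 8 expert feedforward networks, so only a
# fraction of the parameters are active per token. Illustrative dimensions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=128, d_ff=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (n_tokens, d_model)
        gate_logits = self.router(x)           # (n_tokens, n_experts)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 128)                  # 10 tokens with 128-dim hidden states
print(SparseMoELayer()(tokens).shape)          # torch.Size([10, 128])
```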
Inference performance is also a step up from its predecessor: because only a fraction of the parameters are active per token, Mixtral 8x7B delivers faster inference than a dense model with the same total parameter count.
However, the architecture has some drawbacks: all eight expert networks must be loaded into memory, which means Mixtral 8x7B requires a large amount of VRAM to run.
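A quick back-of-envelope calculation illustrates the point. Using the roughly 47 billion total parameters Mistral quotes for Mixtral 8x7B (attention weights are shared across experts), the weights alone occupy tens of gigabytes, before accounting for activations or the key/value cache:

```python
# Rough estimate of weight memory for Mixtral 8x7B at different precisions.
# Uses the published figure of ~46.7B total parameters; ignores activations,
# the KV cache, and framework overhead, so real requirements are higher.
TOTAL_PARAMS = 46.7e9

for name, bytes_per_param in [("fp16/bf16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gib = TOTAL_PARAMS * bytes_per_param / 1024**3
    print(f"{name:>9}: ~{gib:.0f} GiB of VRAM just for the weights")
```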
Fine-tuning MoE models can also be difficult, as the architecture is prone to overfitting, meaning the model may struggle to generalize beyond its training data when presented with new inputs.
Despite this, there is reason to be optimistic, as a new tuning method known as instruction tuning could address some of these concerns.
Smaug-72B (Abacus.AI)
American startup Abacus.AI launched its 72 billion parameter model in February 2024, and its impressive benchmark performance generated a lot of excitement in the machine learning community.
Smaug-72B, an improved version of the Qwen-72B model, was the first, and at the time the only, open source model to post an average score above 80 across major LLM evaluations.
Smaug-72B outperformed a number of the most powerful proprietary models in Hugging Face tests of massive multitask language understanding (MMLU), mathematical reasoning, and common sense reasoning.
This included OpenAI's GPT-3.5 and Mistral's closed-source Medium model.
As things stand, Smaug-72B holds the top spot on Hugging Face's Open LLM Leaderboard.
Interestingly, Smaug-72B outperforms the model it was based on, Qwen-72B. According to Abacus.AI, this is down to a new fine-tuning technique, DPO-Positive (DPOP), which addresses several weaknesses of existing preference-tuning methods.
In a research paper published in February 2024, Abacus.AI engineers described how they designed a new loss function and training procedure that avoids a common failure mode of the Direct Preference Optimization (DPO) training method, in which the probability the model assigns to the preferred completion can actually fall during training.
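The paper's loss is not reproduced here, but the gist, as Abacus.AI describes it, is to penalize the model whenever the log-probability of the preferred completion drops below that of the frozen reference model. The sketch below is a rough paraphrase of that idea layered on top of a standard DPO-style loss; the exact form of the penalty term and the hyperparameter values are assumptions for illustration, not the authors' implementation.

```python
# Rough sketch of a DPO-style loss with an added "positive" penalty, paraphrasing
# the idea described by Abacus.AI: penalize the model when the log-probability of
# the preferred ("chosen") completion falls below the reference model's.
# Illustration of the concept only, not the authors' code.
import torch
import torch.nn.functional as F

def dpo_positive_loss(policy_chosen_logp, policy_rejected_logp,
                      ref_chosen_logp, ref_rejected_logp,
                      beta=0.1, lam=5.0):
    # Standard DPO margin: how much more the policy prefers the chosen answer
    # over the rejected one, relative to the frozen reference model.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp

    # "Positive" penalty (assumed form): nonzero only when the policy assigns the
    # chosen completion a lower log-probability than the reference model does.
    penalty = torch.clamp(ref_chosen_logp - policy_chosen_logp, min=0.0)

    logits = beta * (chosen_ratio - rejected_ratio - lam * penalty)
    return -F.logsigmoid(logits).mean()

# Toy example with made-up summed log-probabilities for a batch of 3 preference pairs.
loss = dpo_positive_loss(
    policy_chosen_logp=torch.tensor([-12.0, -30.0, -8.0]),
    policy_rejected_logp=torch.tensor([-15.0, -25.0, -9.0]),
    ref_chosen_logp=torch.tensor([-11.0, -28.0, -8.5]),
    ref_rejected_logp=torch.tensor([-14.0, -26.0, -9.0]),
)
print(loss)
```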