In the field of artificial intelligence, one of the most transformative advancements has been the emergence of Large Language Models (LLMs).
Large language models are machine learning models for language-related tasks such as translation, question answering, summarization, content generation, code generation, and more.
Most are based on the transformer architecture introduced in Google's seminal 2017 paper. These massive neural networks are trained on vast amounts of text data (e.g., large portions of the public internet), and the release of LLM-enabled consumer apps like ChatGPT has brought an undeniable paradigm shift in the everyday use of AI.
Although still a relatively young field, the past few months and years have seen a Cambrian explosion of activity related to LLM development.
While many AI startups are focused on building on top of LLM APIs (otherwise known as layer 2), in this guide we'll focus on the top LLM developer companies that are building foundational models.
Of course, the top of the LLM developer list goes to OpenAI as they've likely had the largest overall impact on the space to date, bringing LLMs to the masses with the release of ChatGPT.
OpenAI describes itself as an AI research and deployment company whose mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. They are one of the best-funded private companies in the LLM developer space, having raised over $12 billion in equity, most recently through a massive $10 billion partnership round with Microsoft.
OpenAI has developed several notable language models, including the GPT-3.5 and GPT-4 series.
- GPT-3.5 models are optimized for chat but also work well for traditional completion tasks.
- GPT-4 improves on GPT-3.5 and can solve difficult problems with greater accuracy. It is more reliable and creative, and it can handle much more nuanced instructions than its predecessor.
Anthropic is an AI safety and research company based in San Francisco. In addition to their research, Anthropic has developed notable LLMs trained with reinforcement learning from human feedback (RLHF). Their recent research shows that simple prompting approaches can help these models produce less harmful outputs.
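Anthropic's exact training recipe isn't described here, but as a general illustration, the reward-modeling step of RLHF commonly trains a model to score the response a human preferred above the one they rejected. The sketch below shows that pairwise (Bradley-Terry style) loss, assuming the reward model has already produced a scalar score for each response; the function names are hypothetical.

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) loss for one human preference pair.

    Computes -log(sigmoid(reward_chosen - reward_rejected)). The loss is small
    when the reward model scores the human-preferred response higher and
    grows as the rejected response's score overtakes it.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branching on the sign of m
    # keeps exp() from receiving a large positive argument (no overflow).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def batch_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (chosen_score, rejected_score) pairs."""
    return sum(pairwise_reward_loss(c, r) for c, r in pairs) / len(pairs)
```

Minimizing this loss pushes preferred responses toward higher scores. The trained reward model is then used to score the LLM's outputs during a reinforcement learning step (commonly PPO), which this sketch does not cover.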
Anthropic’s main product is Claude, a next-generation AI assistant based on their research into training helpful, honest, and harmless AI systems.
- Claude is capable of a wide variety of conversational and text processing tasks while maintaining a high degree of reliability and predictability.
- It can be accessed through a chat interface and API in their developer console.
- Claude has recently been updated to Claude 2, which offers improved performance and longer responses and is available via API as well as a new public-facing beta website.
Amazon has announced a strategic collaboration with Anthropic in which they will invest up to $4 billion in Anthropic and have a minority ownership position in the company. Amazon developers and engineers will be able to build with Anthropic models via Amazon Bedrock so they can incorporate generative AI capabilities into their work, enhance existing applications, and create net-new customer experiences across Amazon’s businesses.
Meta is another company committed to advancing the state of the art in AI through fundamental and applied research. Most notably, they released Llama 2, their open-source LLM, for research and commercial use. Code Llama, a version of Llama 2 specialized for code, is state-of-the-art among publicly available LLMs on coding tasks. Meta’s No Language Left Behind (NLLB) translation research is also being applied to translation systems used by Wikipedia editors.
Meta’s commitment to open source, cross-collaboration, and innovation is reflected in the release of Llama 2, which developers can download, fine-tune, and build on. They are also working on a new AI model that will make content available in hundreds of languages. This model will be used to translate content and serve better ads, as well as to spot harmful content and misinformation.
Amazon has been making strides in the field of LLMs with its Amazon Titan foundation models (FMs). These models are pre-trained on large datasets and built to support a variety of use cases, such as text generation, summarization, semantic search, retrieval-augmented generation, code generation, table creation, data formatting, paraphrasing, chain-of-thought reasoning, rewriting, extraction, Q&A, and chat.
- Amazon Titan offers a range of FMs for different needs. For instance, Titan Text Express is an LLM that balances price and performance. It supports over 100 languages and works with up to 8K tokens.
- Titan Text Lite is a more affordable, compact LLM that is ideal for basic tasks and fine-tuning.
Amazon has also released Bedrock, a fully managed service that provides API access to foundation models from Amazon and from third-party providers such as Anthropic, AI21 Labs, Cohere, and Stability AI. Bedrock includes the Titan models, which support the development and scaling of generative AI applications.
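To make the Bedrock workflow concrete, here is a minimal sketch of building a request body for a Titan text model. It assumes the Titan text request schema (`inputText` plus a `textGenerationConfig` object) and the model ID `amazon.titan-text-express-v1`. The helper name and default values are hypothetical, and the code only builds the JSON without calling AWS.

```python
import json

# Assumed Bedrock model ID for Titan Text Express; verify it in the Bedrock console.
TITAN_TEXT_EXPRESS_ID = "amazon.titan-text-express-v1"


def build_titan_request(prompt: str, max_tokens: int = 512, temperature: float = 0.5) -> str:
    """Return a JSON request body for a Titan text model (hypothetical helper)."""
    body = {
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,  # upper bound on generated tokens
            "temperature": temperature,   # lower values give more deterministic output
            "topP": 0.9,                  # nucleus-sampling cutoff
            "stopSequences": [],          # no custom stop strings
        },
    }
    return json.dumps(body)
```

With AWS credentials configured, the body could be sent through boto3's `bedrock-runtime` client, for example `client.invoke_model(modelId=TITAN_TEXT_EXPRESS_ID, body=build_titan_request("..."), contentType="application/json", accept="application/json")`. Check the current Bedrock documentation for each model's exact request and response schema.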
As you can imagine, Google has been a pioneer in the field of LLMs and has made significant strides in this area. Although OpenAI beat them to market with ChatGPT, they quickly released a competitor called Bard.
Bard is powered by their most recent LLM: PaLM 2, which excels at advanced reasoning tasks, including code and math, classification and question answering, translation and multilingual proficiency, and natural language generation.
Google has also opened some of its AI capabilities to developers, introducing the Pathways Language Model (PaLM) API and MakerSuite, a prototyping tool for working with it. Additionally, Google Cloud offers a range of AI tools for building language applications, including Google Cloud AutoML Natural Language, which enables developers to train custom machine learning models for natural language processing tasks.
Microsoft is another major tech company that has been at the forefront of developing and deploying LLMs into their applications.
The company has been working on several LLM-based projects, including the recent release of AutoGen, a framework that simplifies the orchestration, optimization, and automation of LLM workflows. AutoGen offers customizable, conversable agents that leverage the strongest capabilities of advanced LLMs like GPT-4, while addressing their limitations by integrating humans and tools and by letting multiple agents converse via automated chat.
Microsoft has also been working on LLMOps, a research initiative focused on fundamental research and technology for building AI products with foundation models, especially the general technology for enabling AI capabilities with LLMs and generative AI models. The company has also introduced LLM-Augmenter, which improves large language models with external knowledge and automated feedback.
In addition to these projects, Microsoft has built LLM features into its products, such as GPT-3-powered Power Apps, which can generate code from natural language input. Azure Machine Learning also provides tooling for operationalizing and managing large language models.
Stability.ai is a company that specializes in developing open-source generative AI models, including language models. One of their flagship products is Stable LM, a Large Language Model that the company reports has strong reasoning ability across varied benchmarks. It can be fine-tuned for specific use cases and excels at sentence auto-completion. The company’s researchers innovate rapidly and release open models that rank among the best in the industry. They have also developed Stable Code, a pair of LLMs trained for code generation from prompt descriptions and for code completion, which can improve developer efficiency and help solve programming puzzles.
Stability.ai aims to provide transparency, accessibility, and support to users. They have released their base LLMs under the CC BY-SA license, which lets developers inspect, use, and adapt the models for both commercial and research purposes, subject to the license terms. The company has also launched RLHF-tuned models for research use.
Contextual AI is another company that specializes in creating LLMs that are purpose-built for enterprises. The company was founded in 2023 by Douwe Kiela and Amanpreet Singh, who have been training sophisticated LLMs for much of their professional careers. They have advanced the state of the art through their well-cited research at places like Meta (Facebook AI Research), Hugging Face, and Stanford University.
Contextual AI’s goal is to develop AI solutions that are better suited to companies than consumer-focused LLM offerings. To overcome the challenges of processing and analyzing unstructured data, they are creating a new generation of LLMs tailored to specific enterprise needs. By customizing models to each company’s individual data sources, they aim to offer a secure, accurate, and efficient way for knowledge workers to get their work done.
EleutherAI is a non-profit organization that focuses on training and releasing LLMs for open-source research. The organization has trained and released several LLMs, some of which were the largest or most capable openly available models at the time. EleutherAI has also released the codebases used to train these models, which have been widely used in open-source research.
EleutherAI’s key initiatives include training LLMs, evaluating advanced AI models in robust and reliable ways, and building LLMs and NLP tools for non-English languages. EleutherAI has also released Pythia, a suite of LLMs designed specifically for research on interpretability and training dynamics.
In October 2022, EleutherAI announced CarperAI, a lab that will release an open-source LLM explicitly trained to follow human instructions using reinforcement learning from human feedback.
Databricks is a company that provides a unified analytics platform for data science teams. They offer a wide range of products and services that help organizations accelerate innovation using LLMs.
Specifically, Databricks has developed a language model called Dolly 2.0 that was trained on a high-quality, human-generated instruction dataset called databricks-dolly-15k. Dolly 2.0 is meant to show how you can train your own LLM quickly and inexpensively.
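Each record in databricks-dolly-15k has `instruction`, `context`, `response`, and `category` fields. Instruction tuning usually renders each record into a single training string. The sketch below uses an Alpaca-style template similar to the one in Dolly's training code. Treat the exact template wording and the helper name as assumptions.

```python
from typing import TypedDict


class DollyRecord(TypedDict):
    """One row of databricks-dolly-15k."""
    instruction: str
    context: str   # reference text; empty when the task needs none
    response: str
    category: str  # e.g. "open_qa", "closed_qa", "summarization"


# Assumed preamble, modeled on common instruction-tuning templates.
INTRO = ("Below is an instruction that describes a task. "
         "Write a response that appropriately completes the request.")


def format_training_example(record: DollyRecord) -> str:
    """Render one record as an instruction-tuning string (hypothetical helper)."""
    parts = [INTRO, f"### Instruction:\n{record['instruction']}"]
    if record["context"]:
        # Only include the input section when reference text is present.
        parts.append(f"Input:\n{record['context']}")
    parts.append(f"### Response:\n{record['response']}")
    parts.append("### End")
    return "\n\n".join(parts)
```

The explicit end marker lets the fine-tuned model learn where a response stops, so generation can be cut off at that marker during inference.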
MosaicML is a developer of software infrastructure and AI training algorithms that aims to improve the efficiency of neural networks. The company’s platform is designed to recompose machine learning models using algorithmic techniques such as sparsity and network pruning, enabling users to efficiently train large-scale AI models on their proprietary data in their own secure environments.
MosaicML is best known for its family of MPT (MosaicML Pretrained Transformer) models, which are generative language models that can be fine-tuned for a variety of natural language processing tasks. MosaicML reported that MPT-7B matches the quality of Meta’s LLaMA-7B on standard benchmarks.
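MosaicML’s specific methods aren’t detailed here, but the simplest form of network pruning, unstructured magnitude pruning, zeroes out the weights closest to zero on the assumption that they contribute least. Here is a minimal sketch on a flat list of weights. The function name is hypothetical, and real frameworks prune tensors layer by layer and usually fine-tune afterward.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitudes.

    Weights nearest to zero are assumed to matter least, so they are set
    to 0.0 while all other weights are returned unchanged.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute weight value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


print(magnitude_prune([0.9, -0.05, 0.3, -0.7, 0.01, 0.2], sparsity=0.5))
# [0.9, 0.0, 0.3, -0.7, 0.0, 0.0]
```

The resulting zeros only save compute if the runtime exploits sparsity, which is why pruning is usually paired with sparse kernels or structured pruning schemes.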
In 2023, MosaicML was acquired by Databricks for $1.3 billion. The company had previously raised a total of $33.7 million in funding across two rounds.
AI21 Labs is an Israel-based AI lab and product company that has been developing large language models to rival OpenAI’s GPT-3.
- The largest version of its model, Jurassic-1 Jumbo, contains 178 billion parameters, making it larger than GPT-3.
- Jurassic-1 has a vocabulary of 250,000 lexical items, including multi-word expressions, words, and phrases.
- The company has also developed the Modular Reasoning, Knowledge and Language (MRKL) system to augment the power of LLMs.
- AI21 Labs’ Studio platform lets developers experiment with the model in open beta to prototype applications like virtual agents and chatbots.
The company recently announced a $155 million Series C financing round at a $1.4 billion valuation.
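The MRKL idea is a router that sends each input to the most suitable module, either a neural language model or a discrete "expert" such as a calculator. The toy sketch below illustrates that routing with just two modules: a safe arithmetic evaluator and a placeholder standing in for the LLM. All names are hypothetical. This is not AI21's API, and a real MRKL router is far more capable than a parse check.

```python
import ast
import operator

# Arithmetic operators the calculator "expert" is allowed to evaluate.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculator(expression: str) -> float:
    """Discrete expert module: safely evaluate + - * / arithmetic."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval"))


def language_model(query: str) -> str:
    """Placeholder for a call to a neural LLM (hypothetical)."""
    return f"[LLM answer to: {query}]"


def route(query: str) -> str:
    """Send arithmetic to the calculator expert; everything else to the LLM."""
    try:
        return str(calculator(query))
    except (SyntaxError, ValueError):
        return language_model(query)
```

For example, `route("2 * (3 + 4)")` returns `"14"` from the calculator, while `route("Who founded AI21 Labs?")` falls through to the language model placeholder. Delegating arithmetic to an exact module avoids relying on the LLM for calculations it may get wrong.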
Cohere provides access to LLM and NLP tools through their API. Their LLMs are among the highest performing models as measured by Stanford University’s HELM benchmarks.
- Cohere’s Command model is a high-performing LLM that quickly and accurately generates text such as product descriptions, blog posts, and articles. It can also create concise, relevant, customizable summaries of text and documents.
- Cohere’s embeddings models can make applications understand the meaning of text data at scale, unlocking powerful semantic search, classification, and rerank capabilities.
- Cohere’s generation models can also be used for interactive autocomplete, augmenting human writing processes, summarization, text rephrasing, and other text-to-text tasks in non-sensitive domains.
- Cohere’s classification models can sort text into user-defined categories.
They also offer customizable models that can be fine-tuned to work for any use case, domain or industry.
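Embedding-based semantic search, as described for Cohere's embeddings models above, ranks documents by how close their vectors are to the query's vector, most often using cosine similarity. The sketch below assumes the embeddings have already been obtained, for example from an embeddings endpoint. The three-dimensional vectors and document names are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec: list[float],
                    doc_vecs: dict[str, list[float]],
                    top_k: int = 2) -> list[tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical pre-computed embeddings for three documents and one query.
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "careers": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]
print([doc_id for doc_id, _ in semantic_search(query, docs)])
# ['refund-policy', 'shipping-times']
```

Because the comparison is on meaning-bearing vectors rather than keywords, a query like "how do I get my money back" could match the refund document even without sharing any words with it. Rerank models can then reorder the top candidates more precisely.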
Inflection AI is a company founded by Mustafa Suleyman, co-founder of DeepMind, and LinkedIn co-founder Reid Hoffman. Inflection’s LLM, which powers its conversational agent Pi, aims to create “personal AI for everyone”.
Inflection raised $1.3 billion in new funding less than two months after launching its first chatbot, Pi, and is now valued at $4 billion. Investors include Microsoft, NVIDIA, Bill Gates, and former Google CEO Eric Schmidt.
Inflection’s LLM has been developed to rival Google’s and OpenAI’s LLMs. It aims to improve human-computer interaction so that people no longer have to simplify their requests or adapt their language to what machines can understand.
Together AI’s research focuses on creating leading open-source models that are transparent and controllable for human-AI interaction.
Together AI has developed a set of open-source foundation models and datasets called RedPajama. The models include Apache 2.0 licensed base, chat, and instruction-tuned models, and the largest-ever open pre-training dataset, which has been used to train over 100 models. The RedPajama models are available on Hugging Face.
Last but not least, Mistral AI is an early-stage company that specializes in developing LLMs.
Their first LLM, Mistral 7B v0.1, was trained on massive amounts of data and can generate coherent text and perform various natural language processing tasks.
- Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation.
- The raw model weights are downloadable from the documentation and on Hugging Face.
- Mistral AI provides a Docker image bundling vLLM, a fast Python inference server, with everything required to run their model. This allows users to quickly spin up a completion API on any major cloud provider with NVIDIA GPUs.
Mistral AI also raised what was reported as the largest seed round in European history: $113 million at a $260 million valuation.
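Once the vLLM container is running, requesting a completion is an ordinary HTTP call. vLLM can serve an OpenAI-compatible API, and the sketch below assumes that mode with the server on its default port 8000 and the Hugging Face model ID `mistralai/Mistral-7B-v0.1`. The host, port, helper name, and sampling values are assumptions to adjust for your deployment. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Assumptions: a local vLLM server in OpenAI-compatible mode on its default port.
BASE_URL = "http://localhost:8000"
MODEL_ID = "mistralai/Mistral-7B-v0.1"


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-style /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,  # cap on generated tokens
        "temperature": 0.7,        # moderate sampling randomness
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a live server, `urllib.request.urlopen(build_completion_request("The capital of France is"))` would return JSON whose generated text sits under `choices[0]["text"]`, following the OpenAI completions format. Because the endpoint mirrors that format, existing OpenAI client code can often be pointed at the self-hosted server by changing only the base URL.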