Command-R: RAG at Production Scale
Mar 11, 2024
Command-R is a scalable generative model targeting RAG and Tool Use to enable production-scale AI for enterprise.
Today, we are introducing Command-R, a new LLM aimed at large-scale production workloads. Command-R targets the emerging “scalable” category of models that balance high efficiency with strong accuracy, enabling companies to move beyond proof of concept, and into production.
Command-R is a generative model optimized for long context tasks such as retrieval augmented generation (RAG) and using external APIs and tools. It is designed to work in concert with our industry-leading Embed and Rerank models to provide best-in-class integration for RAG applications and excel at enterprise use cases. As a model built for companies to implement at scale, Command-R boasts:
- Strong accuracy on RAG and Tool Use
- Low latency and high throughput
- Longer 128k context and lower pricing
- Strong capabilities across 10 key languages
- Model weights available on HuggingFace for research and evaluation
Command-R is available immediately on Cohere’s hosted API, and will come to major cloud providers in the near future. In keeping with Cohere’s core principles, it maintains a focus on privacy and data security.
Command-R is the first in a series of model releases advancing capabilities crucial to enterprise adoption at scale. We’re excited to share more soon.
“Scale is working with a Fortune 500 enterprise customer to build a custom knowledge management application for their customer support team. Since Scale GenAI Platform’s test and evaluation tools are integrated with Cohere’s models, the customer was able to identify that Cohere’s solution would enable them to optimize TCO while still maintaining high-level performance, which are critical considerations in enterprise buying decisions.” - Arun C Murthy, Chief Product & Technology Officer, Scale
High Performance Retrieval Augmented Generation (RAG)
Retrieval augmented generation (RAG) has become a crucial pattern in the deployment of LLMs. RAG enables enterprises to give the model access to private knowledge that it otherwise would not have. By letting the model search over private databases and use that information to form responses, the model’s accuracy and usefulness improve dramatically. The key components of RAG are:
- Retrieval: Searching over corpora of information relevant to responding to a user.
- Augmented Generation: Using the information retrieved to form a more informed response.
Retrieval: Cohere’s Embed model significantly improves the usefulness and accuracy of the retrieval step by improving contextual and semantic understanding when searching across millions, or even billions, of documents. Meanwhile, Cohere’s Rerank model further improves the value of the information retrieved, optimizing the results across custom metrics such as relevance and personalization.
Augmented Generation: With the most relevant information identified, Command-R can summarize, analyze, package, and generally put that information to work in ways that help employees be more productive, or to create a magical new product experience. Unique to Command-R, the model’s outputs come with clear citations that mitigate the risk of hallucinations, and enable surfacing additional context from the source materials.
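The retrieve-then-generate pattern above can be sketched in a few lines. The toy term-overlap scoring and prompt assembly below are illustrative stand-ins, not Cohere's actual Embed, Rerank, or Chat APIs; the document-index citation style is likewise an assumption for illustration.

```python
# Minimal sketch of the retrieve -> augmented-generation pattern.
# Scoring and prompt layout are illustrative, not Cohere's APIs.

def retrieve(query: str, corpus: list[dict], k: int = 3) -> list[dict]:
    """Toy retrieval: rank documents by query-term overlap."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(doc["text"].lower().split())), doc)
        for doc in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_grounded_prompt(query: str, docs: list[dict]) -> str:
    """Pack retrieved snippets into the prompt so the model can cite them."""
    context = "\n".join(f"[{i}] {d['text']}" for i, d in enumerate(docs))
    return (
        "Answer using only the documents below and cite them as [i].\n"
        f"Documents:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

corpus = [
    {"title": "policy", "text": "Refunds are processed within 5 business days."},
    {"title": "faq", "text": "Support is available in 10 languages."},
]
docs = retrieve("How long do refunds take?", corpus)
prompt = build_grounded_prompt("How long do refunds take?", docs)
```

In a production stack, the toy scorer would be replaced by dense retrieval over an embedding index plus a reranking pass, with the generation step returning span-level citations into the retrieved documents.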
Even without leveraging Cohere’s Embed and Rerank models, Command-R outperforms others in the scalable category of generative models. When used together, the lead expands significantly, enabling higher performance in more complicated domains.
(left) Head-to-head overall human preference evaluation between Command-R and Mixtral on a range of enterprise-relevant RAG applications, taking fluency, answer utility, and citations into consideration. (right) Average accuracy of an end-to-end evaluation on the Natural Questions (Kwiatkowski et al. 2019), TriviaQA (Joshi et al. 2017), and HotpotQA (Yang et al. 2018) (single-retrieval) benchmarks, using a KILT Wikipedia index (Petroni et al. 2020) for all models. We evaluated both a leading open-source embedding model (gte-large; Li et al. 2023) and Cohere’s search stack with Command-R. Accuracy is calculated using the presence of keyphrases in the model’s answer.
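The keyphrase-based accuracy metric used in these benchmarks can be sketched as below. The exact normalization and matching rules (case folding, any-match vs. all-match) are assumptions for illustration.

```python
# Sketch of keyphrase-presence accuracy, as used in the RAG benchmarks above.
# Matching rules (lowercasing, any-match) are illustrative assumptions.

def keyphrase_match(answer: str, keyphrases: list[str]) -> bool:
    """An answer counts as correct if any gold keyphrase appears in it."""
    answer = answer.lower()
    return any(phrase.lower() in answer for phrase in keyphrases)

def accuracy(answers: list[str], gold: list[list[str]]) -> float:
    """Fraction of answers containing at least one of their gold keyphrases."""
    hits = sum(keyphrase_match(a, g) for a, g in zip(answers, gold))
    return hits / len(answers)
```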
Tool Use
LLMs should be core reasoning engines that can automate tasks and take real-world action, not just machines that ingest and generate text. Command-R achieves this with the ability to use tools (APIs), such as code interpreters and other user-defined tools that enable the model to automate highly sophisticated tasks.
Tool Use enables developers at enterprises to turn Command-R into an engine for powering the automation of tasks and workflows that require using internal infrastructure like databases and software tools, as well as external tools like CRMs, search engines, and more. This unlocks the automation of time-consuming and manual tasks that span multiple systems and require complex reasoning and decision making.
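The tool-use pattern described above can be sketched as a small dispatch loop: the model proposes a call, the host application executes the matching function, and the result is fed back for the final response. The tool name, registry, and JSON call format below are hypothetical illustrations, not Cohere's API.

```python
import json

# Sketch of a tool-use dispatch step. The model (stubbed here) emits a tool
# call as JSON; the host runs the matching function from a registry.
# Tool names and the JSON shape are illustrative assumptions.

TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def run_tool_call(model_output: str) -> dict:
    """Parse a model-emitted tool call and dispatch it to the registry."""
    call = json.loads(model_output)
    tool = TOOLS[call["name"]]
    return tool(**call["arguments"])

# e.g. the model asks to look up an order in an internal system
result = run_tool_call('{"name": "lookup_order", "arguments": {"order_id": "A-17"}}')
```

In practice the loop repeats: tool results are appended to the conversation and the model either issues further calls or produces a grounded final answer.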
Tool Use is now available via our API.
Accuracy on 3-shot multi-hop ReAct (Yao et al. 2022) agents powered by various models, retrieving from Wikipedia (HotPotQA) and the internet (Bamboogle). Accuracy is calculated using the presence of keyphrases in the model’s final answer. We use the test sets from Shin et al. (2023) and Press et al. (2023).
Speaking More Languages of Global Business
Command-R is designed to serve as many people, organizations, and markets as possible. The model excels at 10 major languages of global business: English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese. In addition, our Embed and Rerank models serve over 100 languages natively.
This enables users to draw answers from a vast set of data sources, regardless of language, and have clear and accurate dialogues provided in their native tongue.
Comparison of Command, Command-R, Llama-70B-Chat, Mixtral, and GPT-3.5-Turbo on multilingual MMLU (our translation; Hendrycks et al. 2020) and FLORES (Goyal et al. 2021) in French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese.
Longer Context Window & Improved Pricing
Command-R features a longer context length, supporting up to 128k tokens in this initial release. The upgrade also comes with a price reduction on Cohere’s hosted API, and significant efficiency improvements for Cohere’s private cloud deployments. By combining a longer context window with less expensive pricing, Command-R unlocks RAG use cases where additional context can drive dramatic performance improvements.
Long-context “Needles in a Haystack” Evaluation. A fact is inserted into a prompt of varying length at different depths, and Command-R is asked to recall it. To improve robustness we modify the original evaluation by shuffling 8 times and report the average score. Bright green indicates perfect recovery of the inserted fact. (Kamradt, 2023)
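The evaluation setup in the caption can be sketched as follows: a single fact is inserted at a chosen depth into filler text, the filler is shuffled for robustness, and the model is asked to recall the fact. The filler content, depth convention, and seed handling are illustrative assumptions, not the exact harness used.

```python
import random

# Sketch of "needle in a haystack" prompt construction: insert one fact at
# a given depth into shuffled filler text, then ask the model to recall it.
# Details (filler, depth convention, seeding) are illustrative assumptions.

def build_haystack(filler: list[str], needle: str, depth: float, seed: int) -> str:
    rng = random.Random(seed)
    lines = filler[:]
    rng.shuffle(lines)                 # shuffle filler for robustness
    pos = int(len(lines) * depth)      # depth 0.0 = start, 1.0 = end
    lines.insert(pos, needle)
    return "\n".join(lines)

prompt = build_haystack(
    filler=[f"Background sentence {i}." for i in range(100)],
    needle="The magic number is 42.",
    depth=0.5,
    seed=0,
)
```

Sweeping `depth` and the filler length, and averaging over several seeds as described above, yields the recall heatmap.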
| Cohere API Pricing | $ / M input tokens | $ / M output tokens |
|---|---|---|
| Command | $1.00 | $2.00 |
| Command-R | $0.50 | $1.50 |
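Using the per-million-token prices in the table above, a small helper makes the cost difference concrete; the token counts in the example request are hypothetical.

```python
# Per-million-token prices from the table above.
PRICING = {
    "command":   {"input": 1.00, "output": 2.00},
    "command-r": {"input": 0.50, "output": 1.50},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g. a hypothetical RAG request with 100k context tokens and a 1k-token answer
old = request_cost("command", 100_000, 1_000)    # $0.1020
new = request_cost("command-r", 100_000, 1_000)  # $0.0515
```

Long-context RAG requests are dominated by input tokens, so the halved input price matters most for the use cases this release targets.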
Availability
Cohere works with all major cloud providers, as well as on-premises deployments for regulated industries and privacy-sensitive use cases, to make our models universally available.
To understand how your company can deploy these advanced RAG applications at production scale, reach out to our sales team.
Command-R with RAG is now also in Cohere’s demo environment, offering a hands-on experience for anyone to interact with the model through a simple chat interface.
We strongly believe in both supporting academic AI research and allowing our models to be independently evaluated. As part of this, our non-profit research lab Cohere For AI is releasing the weights for this version of Command-R publicly so that it can be used for research purposes. This is part of our wider support for the ML ecosystem alongside research compute grants and open source research releases like Aya.
For all enterprise and commercial use, Command-R will continue to require a commercial license, and will be continually updated alongside our Rerank and Embed models.
You can access the weights on HuggingFace.
Scalable Models for Businesses
At Cohere, we are focused on developing AI technology that is designed for use at production scale. As enterprises begin to transition from proof-of-concept projects to real-world production deployments, it’s becoming crucial to leverage scalable AI solutions.
Enterprises need an AI partner they can trust, and that’s why Cohere maintains a core focus on cloud choice and strict data privacy.
We are excited to hear user feedback on Command-R and to see what developers build. We will continue to deliver scalable models that help companies succeed.