Cohere: Command R+
Inference Model ID: cohere/command-r-plus
Overview
| Property | Value |
|----------|-------|
| Provider | Cohere |
| Model ID | cohere/command-r-plus |
| Short Name | Command R+ |
| Created | April 4, 2024 |
| Parameters | 104 billion |
| Context Length | 128,000 tokens |
| Max Completion Tokens | 4,096 |
| Input Modalities | Text |
| Output Modalities | Text |
| Architecture | Auto-regressive transformer with an optimized design |
| Training | SFT and preference training aligned to human preferences |
Description
Command R+ is a 104B-parameter large language model from Cohere, purpose-built for enterprise applications. It excels at roleplay, general consumer use cases, and Retrieval Augmented Generation (RAG). The model features multilingual support for ten key languages to facilitate global business operations.
Key characteristics:
- Open Weights: Publicly available via HuggingFace (CohereForAI/c4ai-command-r-plus)
- RAG Optimized: State-of-the-art retrieval-augmented generation with grounded citations
- Multilingual Excellence: Strong performance across 10 primary + 13 secondary languages
- Tool Use: Single-step and multi-step (agentic) tool calling capabilities
- Enterprise Focus: Designed for enterprise-grade workloads with safety alignment
Pricing
| Type | Price per Million Tokens |
|------|--------------------------|
| Input Tokens | $2.50 |
| Output Tokens | $10.00 |
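At these rates, per-request cost is a linear function of token counts. A quick back-of-the-envelope helper (list prices only; illustrative, not an official billing formula):

```python
INPUT_PER_M = 2.50    # USD per million input tokens
OUTPUT_PER_M = 10.00  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at list price."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a full 128K-token context plus a maximum 4,096-token completion:
print(f"${request_cost(128_000, 4_096):.4f}")  # -> $0.3610
```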
Capabilities
1. Grounded Generation & RAG
- Generates responses with citation spans from provided documents
- Supports "accurate" and "fast" citation modes
- Processes document chunks (100-400 words typical)
- Document format: key-value pairs with title/text structure
2. Single-Step Tool Use
- JSON-formatted action generation
- Multi-tool support with parameter specification
- Special directly_answer tool for abstention
- Two-inference model: Tool Selection -> Response Generation
3. Multi-Step Tool Use (Agentic)
- Iterative Action -> Observation -> Reflection cycles (see the sketch after this list)
- Multi-hop reasoning capabilities
- Sequential tool orchestration
4. Code Capabilities
- Code snippet interaction
- Code explanations and rewrites
- Optimized with low temperature for code generation
- Not optimized for pure code completion
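To make the multi-step pattern concrete, here is a minimal sketch of an Action -> Observation -> Reflection loop, assuming the gateway follows the OpenAI-style tool-calling wire format shown in the API examples below. The run_tool dispatcher, the stub weather result, and the step cap are illustrative assumptions, not part of any documented API.

```python
import json
import os

import requests

API_URL = "https://api.langmart.ai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['LANGMART_API_KEY']}"}

def run_tool(name: str, arguments: dict) -> str:
    # Hypothetical dispatcher: route the model's chosen action to real code.
    if name == "get_weather":
        return json.dumps({"location": arguments["location"], "temp_c": 18})
    raise ValueError(f"unknown tool: {name}")

def agent_loop(messages: list, tools: list, max_steps: int = 5) -> str:
    for _ in range(max_steps):
        # Action: ask the model for its next step.
        resp = requests.post(API_URL, headers=HEADERS, json={
            "model": "cohere/command-r-plus",
            "messages": messages,
            "tools": tools,
            "tool_choice": "auto",
        }).json()
        msg = resp["choices"][0]["message"]
        messages.append(msg)
        if not msg.get("tool_calls"):
            return msg["content"]  # No more actions: final answer.
        # Observation: execute each requested tool, feed results back.
        for call in msg["tool_calls"]:
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": run_tool(call["function"]["name"],
                                    json.loads(call["function"]["arguments"])),
            })
        # Reflection: on the next iteration the model reads the
        # observations and either acts again or answers.
    raise RuntimeError("agent did not finish within max_steps")
```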
Supported Parameters
| Parameter | Description |
|-----------|-------------|
| max_tokens | Maximum number of tokens to generate |
| temperature | Controls randomness in output generation (recommended: 0.3 for code) |
| top_p | Nucleus sampling probability threshold |
| top_k | Top-K sampling parameter |
| stop | Stop sequences to end generation |
| frequency_penalty | Penalty for token frequency |
| presence_penalty | Penalty for token presence |
| seed | Seed for reproducible outputs |
| response_format | Format specification for the response |
| structured_outputs | Enable structured output generation |
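As a quick illustration, a minimal Python request setting several of these parameters, assuming the OpenAI-compatible request shape used in the API examples below:

```python
import os

import requests

resp = requests.post(
    "https://api.langmart.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['LANGMART_API_KEY']}"},
    json={
        "model": "cohere/command-r-plus",
        "messages": [{"role": "user", "content": "Write a haiku about retrieval."}],
        "max_tokens": 128,    # cap completion length (model max: 4,096)
        "temperature": 0.7,   # higher for creative text; ~0.3 recommended for code
        "top_p": 0.9,         # nucleus sampling threshold
        "stop": ["\n\n"],     # end generation at a blank line
        "seed": 42,           # best-effort reproducibility
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```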
Features
| Feature | Supported |
|---------|-----------|
| Tool Choice | Yes (none, auto, required, function) |
| Reasoning | No |
| Chat Completions | Yes |
| Completions Endpoint | No |
| Multipart Support | Yes |
| Grounded Generation | Yes |
| RAG with Citations | Yes |
Tool choice options:
- none - Disable tool use
- auto - Let the model decide tool usage
- required - Force tool usage
- function - Select a specific function tool
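In practice the request body differs only in the tool_choice field. The object form for pinning a function below follows the OpenAI convention and is an assumption about this gateway:

```python
tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "Get the current weather for a location",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"}},
                   "required": ["location"]}}}]

base = {"model": "cohere/command-r-plus",
        "messages": [{"role": "user", "content": "Weather in Paris?"}],
        "tools": tools}

body_none = {**base, "tool_choice": "none"}          # never call tools
body_auto = {**base, "tool_choice": "auto"}          # model decides
body_required = {**base, "tool_choice": "required"}  # must call a tool
# Pin one function by name (OpenAI-style object form; assumed here):
body_pinned = {**base, "tool_choice": {"type": "function",
                                       "function": {"name": "get_weather"}}}
```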
Use Cases
- Retrieval Augmented Generation: Enterprise search, document Q&A with citations
- Agentic Workflows: Complex multi-step tasks with tool usage
- Multilingual Applications: Global customer service, translation, content generation
- Roleplay & Creative: Conversational AI, character simulation
- Long Document Processing: Analysis of lengthy documents, contracts, research papers
- Enterprise Applications: Business-critical tasks requiring reliable performance
API Usage Example
```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 1024,
    "temperature": 0.3
  }'
```
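Because the gateway exposes an OpenAI-style chat completions route, the OpenAI Python SDK can typically be pointed at it by overriding the base URL. A sketch under that compatibility assumption, not an officially documented client:

```python
import os

from openai import OpenAI

# Assumes the LangMart endpoint is OpenAI-compatible,
# as its /v1/chat/completions path suggests.
client = OpenAI(base_url="https://api.langmart.ai/v1",
                api_key=os.environ["LANGMART_API_KEY"])

resp = client.chat.completions.create(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=1024,
    temperature=0.3,
)
print(resp.choices[0].message.content)
```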
Using with RAG/Grounded Generation
```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that answers questions based on provided documents."
      },
      {
        "role": "user",
        "content": "Based on the following document, answer my question.\n\nDocument: {\"title\": \"Company Policy\", \"text\": \"All employees are entitled to 20 days of paid vacation per year.\"}\n\nQuestion: How many vacation days do employees get?"
      }
    ],
    "max_tokens": 512
  }'
```
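The capabilities section notes that documents are passed as title/text key-value pairs in chunks of roughly 100-400 words. A small helper along those lines; the exact prompt layout is an illustrative assumption, not a required format:

```python
import json

def format_grounded_prompt(question: str, documents: list[dict]) -> str:
    """Embed title/text document chunks in the user message, as in the
    curl example above. Chunks of ~100-400 words work best per the docs."""
    doc_lines = "\n".join(json.dumps(d) for d in documents)
    return (f"Based on the following documents, answer my question.\n\n"
            f"Documents:\n{doc_lines}\n\nQuestion: {question}")

docs = [
    {"title": "Company Policy",
     "text": "All employees are entitled to 20 days of paid vacation per year."},
    {"title": "Handbook Addendum",
     "text": "Unused vacation days do not roll over between calendar years."},
]
print(format_grounded_prompt("How many vacation days do employees get?", docs))
```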
Using with Tool Calling
```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```
Using the Open Weights with HuggingFace Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format the conversation with the model's built-in chat template.
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)
gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
Command R+ Variants
| Model ID | Description |
|----------|-------------|
| cohere/command-r-plus-08-2024 | Updated version with 50% higher throughput and 25% lower latency |
Cohere Model Family
| Model ID | Description |
|----------|-------------|
| cohere/command-r | Command R - smaller 35B-parameter model |
| cohere/command-r-08-2024 | Command R (August 2024 release) |
| cohere/command-a | Command A - latest 111B model with 256K context |
Similar Enterprise Models
| Model ID | Description |
|----------|-------------|
| meta-llama/llama-3.3-70b-instruct | Llama 3.3 70B - open weights |
| mistralai/mixtral-8x22b-instruct | Mixtral 8x22B - open-weights MoE |
| qwen/qwen-2.5-72b-instruct | Qwen 2.5 72B - open weights |
Providers
Primary Provider: Cohere
| Property | Value |
|----------|-------|
| Provider | Cohere |
| Provider Base URL | https://api.langmart.ai/v1 |
| Data Policy | Training disabled |
| Prompt Retention | 30 days |
| Publication Allowed | No |
Supported Languages
Primary Languages (10 Optimized)
- English
- French
- Spanish
- Italian
- German
- Brazilian Portuguese
- Japanese
- Korean
- Arabic
- Simplified Chinese
Secondary Languages (13 in Pre-training)
- Russian
- Polish
- Turkish
- Vietnamese
- Dutch
- Czech
- Indonesian
- Ukrainian
- Romanian
- Greek
- Hindi
- Hebrew
- Persian
Open LLM Leaderboard Scores
| Benchmark | Score |
|-----------|-------|
| Average | 74.6 |
| ARC (Challenge) | 70.99 |
| HellaSwag | 88.6 |
| MMLU | 75.7 |
| TruthfulQA | 56.3 |
| Winogrande | 85.4 |
| GSM8K | 70.7 |
Outperforms: DBRX Instruct (74.5), Mixtral 8x7B (72.7)
Model Weights
The model weights are publicly available on HuggingFace at CohereForAI/c4ai-command-r-plus.
Quantization Options
- 8-bit precision (via BitsAndBytes)
- 4-bit precision (separate quantized version available)
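A minimal sketch of the 8-bit path via BitsAndBytes; it requires the bitsandbytes and accelerate packages and a CUDA GPU, and roughly halves memory use versus fp16:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load weights in 8-bit precision and spread them across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```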
Notes
- This model is part of Cohere's Command family released in April 2024
- With 128K context length, it supports extremely long documents and conversations
- Open weights enable self-hosting and fine-tuning (non-commercial use)
- Optimized for RAG with built-in citation capabilities
- Excellent multilingual support for global applications
- Strong tool calling for agentic workflows
- An updated version (08-2024) offers 50% higher throughput and 25% lower latency
Usage Policy
This model is subject to Cohere's Acceptable Use Policy; the open weights are released under a CC-BY-NC 4.0 license (non-commercial use, as noted above).
Source: LangMart Model Registry
HuggingFace: CohereForAI/c4ai-command-r-plus
Last Updated: December 23, 2025