Cohere: Command R+

Inference Model ID: cohere/command-r-plus

Overview

Property Value
Provider Cohere
Model ID cohere/command-r-plus
Short Name Command R+
Created April 4, 2024
Parameters 104 billion
Context Length 128,000 tokens
Max Completion Tokens 4,096
Input Modalities Text
Output Modalities Text
Architecture Auto-regressive transformer with optimized design
Training SFT + Preference training aligned to human preferences

Description

Command R+ is a 104B-parameter large language model from Cohere, purpose-built for enterprise applications. It excels at roleplay, general consumer use cases, and Retrieval Augmented Generation (RAG). The model features multilingual support for ten key languages to facilitate global business operations.

Key characteristics:

  • Open Weights: Publicly available via HuggingFace (CohereForAI/c4ai-command-r-plus)
  • RAG Optimized: State-of-the-art retrieval-augmented generation with grounded citations
  • Multilingual Excellence: Strong performance across 10 primary + 13 secondary languages
  • Tool Use: Single-step and multi-step (agentic) tool calling capabilities
  • Enterprise Focus: Designed for enterprise-grade workloads with safety alignment

Pricing

Type Price per Million
Input Tokens $2.50
Output Tokens $10.00
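
A quick worked example of what these rates mean per request (the token counts below are illustrative):

```python
# Worked example at the rates above: $2.50 per million input tokens,
# $10.00 per million output tokens.

def request_cost(input_tokens, output_tokens):
    return input_tokens / 1_000_000 * 2.50 + output_tokens / 1_000_000 * 10.00

# A 100K-token prompt with a full 4,096-token completion:
cost = request_cost(100_000, 4_096)  # $0.25 input + ~$0.041 output
```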

Capabilities

1. Grounded Generation & RAG

  • Generates responses with citation spans from provided documents
  • Supports "accurate" and "fast" citation modes
  • Processes document chunks (100-400 words typical)
  • Document format: key-value pairs with title/text structure
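
An illustrative helper for the document shape described above: grounded generation takes key-value documents with a title/text structure, in chunks of roughly 100-400 words. The chunking logic below is a sketch, not part of the Cohere API.

```python
# Split a long document into title/text chunks of at most max_words,
# matching the key-value document format described above.

def chunk_document(title, text, max_words=300):
    words = text.split()
    chunks = []
    for i in range(0, len(words), max_words):
        part = i // max_words + 1
        chunks.append({
            "title": f"{title} (part {part})",
            "text": " ".join(words[i:i + max_words]),
        })
    return chunks

docs = chunk_document(
    "Company Policy",
    "All employees are entitled to 20 days of paid vacation per year. " * 40,
)
```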

2. Single-Step Tool Use (Function Calling)

  • JSON-formatted action generation
  • Multi-tool support with parameter specification
  • Special directly_answer tool for abstention
  • Two-step inference: a tool-selection call followed by a response-generation call
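
A minimal sketch of the two-inference pattern above; `model_call` is a stand-in for a real API request, and the function names are hypothetical:

```python
# Inference 1 selects tools (possibly the special directly_answer tool
# to abstain); inference 2 generates the response from the observations.

def two_step_tool_use(query, model_call, tools):
    # Inference 1: the model emits a JSON-formatted list of actions.
    actions = model_call({"step": "tool_selection", "query": query})
    if actions and actions[0]["tool"] == "directly_answer":
        observations = []  # model abstained from tool use
    else:
        observations = [tools[a["tool"]](**a["parameters"]) for a in actions]
    # Inference 2: the model writes the answer from the observations.
    return model_call({"step": "response", "query": query,
                       "observations": observations})
```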

3. Multi-Step Tool Use (Agents)

  • Iterative Action -> Observation -> Reflection cycles
  • Multi-hop reasoning capabilities
  • Sequential tool orchestration
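
The multi-step pattern can be sketched as a loop: propose an Action, record the Observation, and re-plan (Reflection) until the model answers. `plan()` below stands in for a model call and is purely illustrative.

```python
# Iterative Action -> Observation -> Reflection loop with a step budget.

def run_agent(query, plan, tools, max_steps=5):
    history = []  # (action, observation) pairs fed back to the planner
    for _ in range(max_steps):
        action = plan(query, history)                           # Action
        if action["tool"] == "answer":
            return action["parameters"]["text"]
        result = tools[action["tool"]](**action["parameters"])  # Observation
        history.append((action, result))                        # Reflection input
    return None  # step budget exhausted
```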

4. Code Capabilities

  • Code snippet interaction
  • Code explanations and rewrites
  • Low sampling temperature (around 0.3) recommended for code generation
  • Not optimized for pure code completion

Supported Parameters

Parameter Description
max_tokens Maximum number of tokens to generate
temperature Controls randomness in output generation (recommended: 0.3 for code)
top_p Nucleus sampling probability threshold
top_k Top-K sampling parameter
stop Stop sequences to end generation
frequency_penalty Penalty for token frequency
presence_penalty Penalty for token presence
seed Seed for reproducible outputs
response_format Format specification for the response
structured_outputs Enable structured output generation
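
A request body exercising the sampling parameters listed above. Values are illustrative rather than recommendations, except `temperature=0.3`, which the table suggests for code tasks:

```python
# Example /chat/completions request body using the supported parameters.

payload = {
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Write a sorting function."}],
    "max_tokens": 512,
    "temperature": 0.3,        # recommended for code generation
    "top_p": 0.9,              # nucleus sampling threshold
    "top_k": 40,
    "stop": ["\nUser:"],
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "seed": 42,                # reproducible outputs
}
```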

Features

Feature Supported
Tool Choice Yes (none, auto, required, function)
Reasoning No
Chat Completions Yes
Completions Endpoint No
Multipart Support Yes
Grounded Generation Yes
RAG with Citations Yes

Tool Choice Options

  • none - Disable tool use
  • auto - Let the model decide whether to call tools
  • required - Force the model to call a tool
  • function - Force a call to one specific named function
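
These options correspond to request-body values like the following. The object form for selecting a specific function follows the OpenAI-compatible convention of the /chat/completions examples on this page, so treat the exact shape as an assumption:

```python
# tool_choice values: three string literals plus an object form for
# forcing one named function (OpenAI-compatible shape assumed).

tool_choice_none = "none"          # disable tool use
tool_choice_auto = "auto"          # let the model decide
tool_choice_required = "required"  # force some tool call
tool_choice_function = {           # force one specific function
    "type": "function",
    "function": {"name": "get_weather"},
}
```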

Use Cases

  • Retrieval Augmented Generation: Enterprise search, document Q&A with citations
  • Agentic Workflows: Complex multi-step tasks with tool usage
  • Multilingual Applications: Global customer service, translation, content generation
  • Roleplay & Creative: Conversational AI, character simulation
  • Long Document Processing: Analysis of lengthy documents, contracts, research papers
  • Enterprise Applications: Business-critical tasks requiring reliable performance

API Usage Example

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 1024,
    "temperature": 0.3
  }'
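
The same request can be issued from Python with only the standard library; the endpoint and bearer-token auth mirror the curl example above.

```python
# Build the POST request for /chat/completions using urllib.

import json
import os
import urllib.request

def build_request(api_key):
    body = {
        "model": "cohere/command-r-plus",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "max_tokens": 1024,
        "temperature": 0.3,
    }
    return urllib.request.Request(
        "https://api.langmart.ai/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request(os.environ.get("LANGMART_API_KEY", ""))
# urllib.request.urlopen(req) would send it; omitted here to avoid a live call.
```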

Using with RAG/Grounded Generation

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that answers questions based on provided documents."
      },
      {
        "role": "user",
        "content": "Based on the following document, answer my question.\n\nDocument: {\"title\": \"Company Policy\", \"text\": \"All employees are entitled to 20 days of paid vacation per year.\"}\n\nQuestion: How many vacation days do employees get?"
      }
    ],
    "max_tokens": 512
  }'

Using with Tools

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
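
A sketch of the follow-up turn after the request above: read the model's tool calls from the assistant message, run each function, and build role="tool" result messages to send back. The response field names assume the OpenAI-compatible shape implied by /chat/completions, so verify them against actual responses.

```python
import json

def handle_tool_calls(assistant_message, tools):
    """Run each requested tool and build the result messages to send back."""
    results = []
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": str(tools[fn["name"]](**args)),
        })
    return results
```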

Python Usage with Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)

Command R+ Variants

Model ID Description
cohere/command-r-plus-08-2024 Updated version with 50% higher throughput, 25% lower latency

Cohere Model Family

Model ID Description
cohere/command-r Command R - Smaller 35B parameter model
cohere/command-r-08-2024 Command R (August 2024 release)
cohere/command-a Command A - Latest 111B model with 256K context

Similar Enterprise Models

Model ID Description
meta-llama/llama-3.3-70b-instruct Llama 3.3 70B - Open weights
mistralai/mixtral-8x22b-instruct Mixtral 8x22B - Open weights MoE
qwen/qwen-2.5-72b-instruct Qwen 2.5 72B - Open weights

Providers

Primary Provider: Cohere

Property Value
Provider Cohere
Provider Base URL https://api.langmart.ai/v1
Data Policy Training disabled
Prompt Retention 30 days
Publication Allowed No

Supported Languages

Primary Languages (10 Optimized)

  • English
  • French
  • Spanish
  • Italian
  • German
  • Brazilian Portuguese
  • Japanese
  • Korean
  • Arabic
  • Simplified Chinese

Secondary Languages (13 in Pre-training)

  • Russian
  • Polish
  • Turkish
  • Vietnamese
  • Dutch
  • Czech
  • Indonesian
  • Ukrainian
  • Romanian
  • Greek
  • Hindi
  • Hebrew
  • Persian

Performance Benchmarks

Open LLM Leaderboard Scores

Benchmark Score
Average 74.6
ARC (Challenge) 70.99
HellaSwag 88.6
MMLU 75.7
TruthfulQA 56.3
Winogrande 85.4
GSM8K 70.7

Outperforms: DBRX Instruct (74.5), Mixtral 8x7B (72.7)
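
As a sanity check, the reported average is simply the mean of the six individual benchmark scores:

```python
# Mean of the six leaderboard scores listed above.
scores = [70.99, 88.6, 75.7, 56.3, 85.4, 70.7]
average = sum(scores) / len(scores)  # ~74.6
```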

Model Weights

The model weights are publicly available on HuggingFace as CohereForAI/c4ai-command-r-plus.

Quantization Options

  • 8-bit precision (via BitsAndBytes)
  • 4-bit precision (separate quantized version available)

Notes

  • This model is part of Cohere's Command family released in April 2024
  • With 128K context length, it supports extremely long documents and conversations
  • Open weights enable self-hosting and fine-tuning (non-commercial use)
  • Optimized for RAG with built-in citation capabilities
  • Excellent multilingual support for global applications
  • Strong tool calling for agentic workflows
  • An updated version (08-2024) offers 50% higher throughput and 25% lower latency

Usage Policy

This model is subject to Cohere's usage policy; the publicly released weights are licensed for non-commercial use only.


Source: LangMart Model Registry
HuggingFace: CohereForAI/c4ai-command-r-plus
Last Updated: December 23, 2025