NVIDIA Llama 3.1 Nemotron 70B Instruct

Description

This model applies Reinforcement Learning from Human Feedback (RLHF) to the Llama 3.1 70B architecture and performs strongly on automatic alignment benchmarks. It is designed to generate precise, useful responses across diverse domains and user queries, with an emphasis on helpfulness and accuracy.

Technical Specifications

Specification	Value
Context Length	131,072 tokens (128K)
Max Completion Tokens	16,384 tokens
Input Modalities	Text
Output Modalities	Text
Quantization	FP8
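
The limits above can be turned into a simple budget check before sending a request. The following is a minimal sketch, assuming that prompt and completion tokens share the 131,072-token context length (typical for decoder-only models, but not stated in this document). The function name `completion_budget` is hypothetical.

```python
# Limits taken from the Technical Specifications table.
CONTEXT_LENGTH = 131_072
MAX_COMPLETION_TOKENS = 16_384


def completion_budget(prompt_tokens: int, requested: int = MAX_COMPLETION_TOKENS) -> int:
    """Return the largest usable max_tokens value for a prompt of the given size.

    Assumption: prompt and completion tokens count against the same context length.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        return 0  # the prompt alone fills (or exceeds) the context
    return min(requested, MAX_COMPLETION_TOKENS, remaining)
```

A short prompt is capped by the completion limit, while a prompt close to the context length is capped by the space that remains.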

Pricing

Type	Price
Input Tokens	$1.20 per 1M tokens
Output Tokens	$1.20 per 1M tokens

Provider: DeepInfra (FP8 quantization)
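
As a worked example of the rates above, the sketch below estimates the cost of a single request. The function name `estimate_cost_usd` is hypothetical, and the prices are hard-coded from the table, so they would need updating if the provider changes them.

```python
from decimal import Decimal

# Prices from the Pricing table, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("1.20")
OUTPUT_PRICE_PER_M = Decimal("1.20")
ONE_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / ONE_MILLION
```

For instance, a request with 2,000 input tokens and 500 output tokens comes to $0.003. `Decimal` is used instead of `float` to avoid rounding drift when many small charges are summed.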

Supported Parameters

Parameter	Supported
max_tokens	Yes
temperature	Yes
top_p	Yes
top_k	Yes
stop	Yes
frequency_penalty	Yes
presence_penalty	Yes
repetition_penalty	Yes
seed	Yes
min_p	Yes
response_format	Yes
tools	Yes
tool_choice	Yes
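
The parameter list above can be used to validate a request before it is sent. This is a minimal sketch, assuming an OpenAI-style chat completion body with `model` and `messages` fields. That body shape and the helper name `build_request` are assumptions, not something this document specifies. The model ID is the one listed under Model Information.

```python
# Parameter names copied from the Supported Parameters table.
SUPPORTED_PARAMS = frozenset({
    "max_tokens", "temperature", "top_p", "top_k", "stop",
    "frequency_penalty", "presence_penalty", "repetition_penalty",
    "seed", "min_p", "response_format", "tools", "tool_choice",
})
MODEL_ID = "nvidia/llama-3.1-nemotron-70b-instruct"


def build_request(messages, **params):
    """Build a request body, rejecting parameters the table does not list.

    Assumption: the endpoint accepts an OpenAI-style chat completion body.
    """
    unknown = sorted(set(params) - SUPPORTED_PARAMS)
    if unknown:
        raise ValueError(f"unsupported parameters: {', '.join(unknown)}")
    return {"model": MODEL_ID, "messages": list(messages), **params}


body = build_request(
    [{"role": "user", "content": "Summarize RLHF in one sentence."}],
    temperature=0.2,
    max_tokens=256,
    seed=7,
)
```

Rejecting unknown names locally surfaces typos such as `max_token` before a request is made.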

Model Information

Field	Value
Model ID	`nvidia/llama-3.1-nemotron-70b-instruct`
Full Name	NVIDIA: Llama 3.1 Nemotron 70B Instruct
Created	October 15, 2024
HuggingFace Slug	nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Architecture	Llama 3.1 70B
Instruct Type	llama3
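
The `llama3` instruct type refers to the Llama 3 chat prompt layout. The sketch below renders a list of messages into that layout, assuming the standard Llama 3 header and end-of-turn tokens. Hosted chat endpoints typically apply this template server-side, so it is likely only needed for raw text-completion use. The function name `format_llama3_prompt` is hypothetical.

```python
def format_llama3_prompt(messages):
    """Render chat messages into a Llama 3 style prompt string.

    Each message is a dict with "role" and "content" keys. The prompt ends
    with an open assistant header so the model generates the next turn.
    """
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content'].strip()}<|eot_id|>"
        )
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_prompt([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is FP8 quantization?"},
])
```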

Default Stop Sequences

<|eot_id|>
<|end_of_text|>
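
When working with raw model output, a client may need to cut the text at the first stop sequence itself. The following is a minimal sketch of that step. The helper name `truncate_at_stop` is hypothetical, and whether the provider already strips these tokens from responses is not stated here.

```python
# Default stop sequences listed above.
DEFAULT_STOP = ("<|eot_id|>", "<|end_of_text|>")


def truncate_at_stop(text: str, stops=DEFAULT_STOP) -> str:
    """Return text up to (not including) the earliest stop sequence, if any."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest match across all stops
    return text[:cut]
```

Text containing no stop sequence is returned unchanged.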

Provider Information

DeepInfra

Feature	Value
Endpoint	DeepInfra
Max Completion Tokens	16,384
Abortable Requests	Yes
Multipart Support	Yes
Terms	https://deepinfra.com/terms

Data Policy

Policy	Status
Training Use	No
Retains Prompts	No
Can Publish	No

Usage Terms

Subject to Meta's Acceptable Use Policy: https://www.llama.com/llama3/use-policy/

Usage Analytics

Over recent dates, daily API requests ranged from roughly 295 to 24,429 calls.

Source

LangMart: https://langmart.ai/model-docs
HuggingFace: https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF