NVIDIA Llama 3.1 Nemotron 70B Instruct
Description
This model builds on the Llama 3.1 70B architecture and is fine-tuned with Reinforcement Learning from Human Feedback (RLHF) to perform strongly on automatic alignment benchmarks. It is designed to generate precise and useful responses across diverse domains and user queries, with an emphasis on helpfulness and accuracy.
Technical Specifications
| Specification | Value |
|---|---|
| Context Window | 128,000 tokens |
| Context Length | 131,072 tokens |
| Max Completion Tokens | 16,384 tokens |
| Input Modalities | Text |
| Output Modalities | Text |
| Quantization | FP8 |
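As a quick sanity check against these limits, a rough token-budget estimate can help decide whether a prompt and its requested completion will fit. The sketch below is a minimal illustration using a crude characters-per-token heuristic (the model's actual Llama 3.1 tokenizer will count differently), and the function name is hypothetical.

```python
# Rough token-budget check against the limits listed above.
# The 4-characters-per-token heuristic is only a coarse approximation
# of the model's actual Llama 3.1 tokenizer.

CONTEXT_LENGTH = 131_072        # total tokens accepted per request
MAX_COMPLETION_TOKENS = 16_384  # cap on generated tokens

def fits_in_context(prompt_text: str, requested_completion: int) -> bool:
    """Estimate whether prompt + completion stays within the context length."""
    estimated_prompt_tokens = len(prompt_text) // 4   # crude heuristic
    completion = min(requested_completion, MAX_COMPLETION_TOKENS)
    return estimated_prompt_tokens + completion <= CONTEXT_LENGTH

print(fits_in_context("Summarize the attached report. " * 200, 4_096))  # True
```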
Pricing
| Type | Price |
|---|---|
| Input Tokens | $1.20 per 1M tokens |
| Output Tokens | $1.20 per 1M tokens |

Provider: DeepInfra (FP8 quantization)
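To make the rates above concrete, the snippet below estimates the cost of a single request from its input and output token counts. It is a minimal sketch; the helper name and the example counts are hypothetical.

```python
# Cost estimate for one request at the listed rates.
INPUT_PRICE_PER_M = 1.20   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.20  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M +
            output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with an 800-token completion.
print(f"${estimate_cost(2_000, 800):.6f}")  # ≈ $0.003360
```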
Supported Parameters
| Parameter | Supported |
|---|---|
| max_tokens | Yes |
| temperature | Yes |
| top_p | Yes |
| top_k | Yes |
| stop | Yes |
| frequency_penalty | Yes |
| presence_penalty | Yes |
| repetition_penalty | Yes |
| seed | Yes |
| min_p | Yes |
| response_format | Yes |
| tools | Yes |
| tool_choice | Yes |
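These parameters map onto a standard chat-completions request. The sketch below shows how several of them might be combined in a single call, assuming an OpenRouter-style chat completions endpoint and an API key in the OPENROUTER_API_KEY environment variable; adjust the endpoint, auth, and values for your own setup.

```python
import os
import requests

# Minimal chat-completions request exercising several supported parameters.
# Endpoint and auth follow the common OpenRouter convention; adapt as needed.
response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "nvidia/llama-3.1-nemotron-70b-instruct",
        "messages": [
            {"role": "user", "content": "Explain RLHF in two sentences."}
        ],
        "max_tokens": 512,           # bounded by the 16,384-token completion cap
        "temperature": 0.7,
        "top_p": 0.9,
        "frequency_penalty": 0.2,
        "seed": 42,                  # best-effort reproducibility
        "stop": ["<|eot_id|>", "<|end_of_text|>"],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```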
Model Details

| Field | Value |
|---|---|
| Model ID | nvidia/llama-3.1-nemotron-70b-instruct |
| Full Name | NVIDIA: Llama 3.1 Nemotron 70B Instruct |
| Created | October 15, 2024 |
| HuggingFace Slug | nvidia/Llama-3.1-Nemotron-70B-Instruct-HF |
| Architecture | Llama 3.1 70B |
| Instruct Type | llama3 |
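For local or self-hosted use, the HuggingFace slug above can be loaded with the transformers library. The sketch below is a minimal, illustrative example: a 70B model requires substantial GPU memory (or a quantized configuration), and the dtype and device_map settings shown are assumptions that will depend on your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the published checkpoint by its HuggingFace slug.
# Settings here are illustrative; a 70B model needs multiple large GPUs.
model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # or a quantized config on smaller hardware
    device_map="auto",           # shard across available GPUs
)

messages = [{"role": "user", "content": "Give one sentence on alignment."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```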
Default Stop Sequences
`<|eot_id|>`
`<|end_of_text|>`
DeepInfra
| Feature | Value |
|---|---|
| Endpoint | DeepInfra |
| Max Completion Tokens | 16,384 |
| Abortable Requests | Yes |
| Multipart Support | Yes |
| Terms | https://deepinfra.com/terms |
Data Policy
| Policy | Status |
|---|---|
| Training Use | No |
| Retains Prompts | No |
| Can Publish | No |
Usage Terms
Subject to Meta's Acceptable Use Policy: https://www.llama.com/llama3/use-policy/
Usage Analytics
Recent usage shows steady demand, with daily API requests ranging from approximately 295 to 24,429 calls.