NVIDIA Llama 3.1 Nemotron 70B Instruct

NVIDIA · 128K context · $1.20 input per 1M tokens · $1.20 output per 1M tokens · 16K max output

Description

This model applies Reinforcement Learning from Human Feedback (RLHF) to the Llama 3.1 70B architecture and scores well on automatic alignment benchmarks. It is designed to generate precise, useful responses across diverse domains and user queries, with an emphasis on helpfulness and accuracy.

Technical Specifications

Specification          Value
Context Window         128,000 tokens
Context Length         131,072 tokens
Max Completion Tokens  16,384 tokens
Input Modalities       Text
Output Modalities      Text
Quantization           FP8
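
The limits above interact: a request's max_tokens cannot exceed the completion cap, and prompt plus completion presumably have to fit in the context together. The following is a minimal sketch of that check, not the provider's actual validation logic. It assumes the prompt and the completion share one context budget, and it uses the 131,072 figure (128 × 1,024) rather than the rounded 128,000; swap in the lower number for a more conservative bound. The function name clamp_max_tokens is hypothetical.

```python
# Limits copied from the specification table.
# Assumption: prompt and completion tokens share the same context budget.
CONTEXT_LENGTH = 131_072
MAX_COMPLETION_TOKENS = 16_384


def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Return the largest usable max_tokens value for a prompt of this size."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context length")
    # The result is bounded by the request, the completion cap,
    # and whatever room the prompt leaves in the context.
    return min(requested, MAX_COMPLETION_TOKENS, remaining)
```

For example, a 1,000-token prompt asking for 20,000 output tokens is clamped to the 16,384 cap, while a 130,000-token prompt leaves room for only 1,072.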

Pricing

Type           Price
Input Tokens   $1.20 per 1M tokens
Output Tokens  $1.20 per 1M tokens

Provider: DeepInfra (FP8 quantization)
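
As a worked example of the pricing arithmetic, the sketch below estimates the cost of a single request from its token counts. The rates are the ones listed above; the function name estimate_cost_usd is hypothetical, and the result is only an estimate, since actual billing may round or count tokens differently.

```python
from decimal import Decimal

# Rates from the pricing table: USD per 1,000,000 tokens.
INPUT_PRICE_PER_MILLION = Decimal("1.20")
OUTPUT_PRICE_PER_MILLION = Decimal("1.20")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate request cost in USD from input and output token counts."""
    million = Decimal(1_000_000)
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION)
    return total / million


# A 2,000-token prompt with a 500-token reply.
print(estimate_cost_usd(2_000, 500))
```

Decimal is used instead of float so that small per-request amounts do not pick up binary rounding noise when summed over many calls.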

Supported Parameters

Parameter           Supported
max_tokens          Yes
temperature         Yes
top_p               Yes
top_k               Yes
stop                Yes
frequency_penalty   Yes
presence_penalty    Yes
repetition_penalty  Yes
seed                Yes
min_p               Yes
response_format     Yes
tools               Yes
tool_choice         Yes
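
To show how these parameters might appear in a request, here is a minimal sketch that builds a JSON body and rejects any parameter not in the list above. It assumes an OpenAI-style chat completions schema ("model" plus "messages"), which this page does not confirm; the helper build_request_body is hypothetical, and nothing is sent over the network.

```python
import json

MODEL_ID = "nvidia/llama-3.1-nemotron-70b-instruct"

# Parameter names copied from the supported-parameters table.
SUPPORTED_PARAMETERS = {
    "max_tokens", "temperature", "top_p", "top_k", "stop",
    "frequency_penalty", "presence_penalty", "repetition_penalty",
    "seed", "min_p", "response_format", "tools", "tool_choice",
}


def build_request_body(messages, **params):
    """Build a request body, refusing parameters the table does not list."""
    unsupported = sorted(set(params) - SUPPORTED_PARAMETERS)
    if unsupported:
        raise ValueError(f"unsupported parameters: {unsupported}")
    # Assumed schema: OpenAI-style chat completions ("model" + "messages").
    return {"model": MODEL_ID, "messages": messages, **params}


body = build_request_body(
    [{"role": "user", "content": "Summarize RLHF in one sentence."}],
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
    seed=42,
)
print(json.dumps(body, indent=2))
```

Validating on the client side is a design choice: it surfaces a typo such as "max_token" before the request leaves the machine, rather than relying on the server to reject or silently ignore it.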

Model Information

Field             Value
Model ID          nvidia/llama-3.1-nemotron-70b-instruct
Full Name         NVIDIA: Llama 3.1 Nemotron 70B Instruct
Created           October 15, 2024
HuggingFace Slug  nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Architecture      Llama 3.1 70B
Instruct Type     llama3
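
The "llama3" instruct type refers to the Llama 3 chat layout, in which each turn is wrapped in header tokens and closed with <|eot_id|>. The sketch below renders a message list in that basic layout for illustration only. Hosted chat endpoints normally apply the template server-side, and the official Llama 3.1 template can add extra system text that this sketch omits, so treat it as an approximation; format_llama3_prompt is a hypothetical helper.

```python
def format_llama3_prompt(messages):
    """Render chat messages in the basic Llama 3 layout (illustrative sketch)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    # Open an assistant header so the model generates the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_prompt([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is FP8 quantization?"},
])
print(prompt)
```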

Default Stop Sequences

  • <|eot_id|>
  • <|end_of_text|>
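
Generation normally halts at these sequences on the server, but raw completions from a self-hosted or streaming setup can still contain them. As a minimal sketch under that assumption, the hypothetical helper below cuts a completion at the earliest stop sequence it finds.

```python
# Stop sequences copied from the list above.
DEFAULT_STOP_SEQUENCES = ("<|eot_id|>", "<|end_of_text|>")


def truncate_at_stop(text, stops=DEFAULT_STOP_SEQUENCES):
    """Return text up to, but not including, the earliest stop sequence."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]
```

Text with no stop sequence is returned unchanged, and when both sequences appear the one that occurs first wins.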

Provider Information

DeepInfra

Feature                Value
Endpoint               DeepInfra
Max Completion Tokens  16,384
Abortable Requests     Yes
Multipart Support      Yes
Terms                  https://deepinfra.com/terms

Data Policy

Policy           Status
Training Use     No
Retains Prompts  No
Can Publish      No

Usage Terms

Subject to Meta's Acceptable Use Policy: https://www.llama.com/llama3/use-policy/

Usage Analytics

Over recent dates, daily API requests ranged from approximately 295 to 24,429 calls.
