Llama 3.1 Nemotron 70B Instruct HF
Model Overview
Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses to user queries. Llama-3.1-Nemotron-70B-Instruct-HF is the same model converted from Llama-3.1-Nemotron-70B-Instruct so that it can be used with the Hugging Face Transformers codebase.
This model was trained with RLHF (specifically, REINFORCE), using Llama-3.1-Nemotron-70B-Reward as the reward model and HelpSteer2-Preference prompts, with Llama-3.1-70B-Instruct as the initial policy.
- Model Architecture: Transformer Llama 3.1
- Model Source: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
- License: Llama 3.1 Community License Agreement
QPC Configurations
Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Full Batch Size | Chunking Prompt Length | Context Length (CL) | Generated URL | Download |
---|---|---|---|---|---|---|---|
MXFP6 | 4 | 16 | 1 | 128 | 8192 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.19.6/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz | Download |
MXFP6 | 8 | 16 | 8 | 128 | 8192 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.19.6/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/qpc_16cores_128pl_8192cl_8fbs_8devices_mxfp6_mxint8.tar.gz | Download |
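The archive names in the URLs above appear to encode the same settings as the table columns (cores, prompt length, context length, full batch size, device count, precision). The following is a minimal Python sketch that recovers those fields from a URL. The pattern is inferred only from the two file names listed here and is not an official naming specification; the function name `parse_qpc_url` and the field names are hypothetical. The trailing `mxint8` token is not described by any table column, so it is returned as-is under the placeholder key `extra_precision`.

```python
import re
from urllib.parse import urlparse

# Pattern inferred from the two archive names in the table; this is an
# assumption, not a documented naming scheme.
_QPC_NAME = re.compile(
    r"qpc_(?P<cores>\d+)cores_(?P<prompt_len>\d+)pl_(?P<ctx_len>\d+)cl_"
    r"(?P<full_batch_size>\d+)fbs_(?P<devices>\d+)devices_"
    r"(?P<precision>[a-z0-9]+)_(?P<extra_precision>[a-z0-9]+)\.tar\.gz$"
)

_INT_FIELDS = ("cores", "prompt_len", "ctx_len", "full_batch_size", "devices")


def parse_qpc_url(url: str) -> dict:
    """Split a QPC archive URL into the configuration fields in its file name."""
    # Keep only the last path component, e.g. "qpc_16cores_..._mxint8.tar.gz".
    name = urlparse(url).path.rsplit("/", 1)[-1]
    match = _QPC_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized QPC archive name: {name!r}")
    fields = match.groupdict()
    # Numeric fields are converted to int; precision tokens stay as strings.
    for key in _INT_FIELDS:
        fields[key] = int(fields[key])
    return fields
```

A helper like this can be used to check that a downloaded archive matches the intended row of the table, for example by comparing `devices` and `full_batch_size` against the number of SoCs and the batch size available on the target system.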