Llama 3.1 Nemotron 70B Instruct HF

Model Overview¶

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. Llama-3.1-Nemotron-70B-Instruct-HF has been converted from Llama-3.1-Nemotron-70B-Instruct to support it in the HuggingFace Transformers codebase.

This model was trained using RLHF (specifically, REINFORCE), Llama-3.1-Nemotron-70B-Reward and HelpSteer2-Preference prompts on a Llama-3.1-70B-Instruct model as the initial policy.

Model Architecture: Transformer Llama 3.1
Model Source: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
License: Llama 3.1 Community License Agreement

QPC Configurations¶

Precision	SoCs / Tensor slicing	NSP-Cores (per SoC)	Full Batch Size	Chunking Prompt Length	Context Length (CL)	Generated URL	Download
MXFP6	4	16	1	128	8192	https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.19.6/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz	Download
MXFP6	8	16	8	128	8192	https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.19.6/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/qpc_16cores_128pl_8192cl_8fbs_8devices_mxfp6_mxint8.tar.gz	Download