Llama 3.1 Nemotron 70B Instruct HF

Model Overview

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses to user queries. Llama-3.1-Nemotron-70B-Instruct-HF is a conversion of Llama-3.1-Nemotron-70B-Instruct that makes the model usable from the HuggingFace Transformers codebase.

This model was trained with RLHF (specifically, REINFORCE), using Llama-3.1-Nemotron-70B-Reward as the reward model and HelpSteer2-Preference prompts, with Llama-3.1-70B-Instruct as the initial policy.

QPC Configurations

| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Full Batch Size | Chunking Prompt Length | Context Length (CL) | Generated URL | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MXFP6 | 4 | 16 | 1 | 128 | 8192 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.19.6/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz | Download |
| MXFP6 | 8 | 16 | 8 | 128 | 8192 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.19.6/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/qpc_16cores_128pl_8192cl_8fbs_8devices_mxfp6_mxint8.tar.gz | Download |
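
The archive names in the table above appear to encode the same settings as the table columns (cores, prompt length, context length, full batch size, device count, precision). The following is a minimal sketch of how a script might recover those settings from a QPC URL, for example to check that a downloaded archive matches the intended configuration. The naming pattern is inferred only from the two URLs listed here, and `QpcConfig`, `parse_qpc_url`, and the `extra_tag` field are hypothetical names introduced for illustration; the meaning of the trailing `mxint8` tag is not stated in the table, so it is kept as an opaque string.

```python
import posixpath
import re
from typing import NamedTuple
from urllib.parse import urlparse


class QpcConfig(NamedTuple):
    """Settings read from a QPC archive name (hypothetical helper type)."""
    cores: int            # NSP-Cores (per SoC)
    prompt_len: int       # Chunking Prompt Length
    ctx_len: int          # Context Length (CL)
    full_batch_size: int  # Full Batch Size
    devices: int          # SoCs / Tensor slicing
    precision: str        # e.g. "MXFP6"
    extra_tag: str        # trailing tag such as "mxint8"; meaning not given in the table


# Pattern inferred from the two archive names in the table; other SDK
# releases may use a different naming scheme.
_QPC_NAME = re.compile(
    r"qpc_(?P<cores>\d+)cores_(?P<pl>\d+)pl_(?P<cl>\d+)cl_"
    r"(?P<fbs>\d+)fbs_(?P<devices>\d+)devices_"
    r"(?P<precision>[a-z0-9]+)_(?P<extra>[a-z0-9]+)\.tar\.gz"
)


def parse_qpc_url(url: str) -> QpcConfig:
    """Extract the configuration encoded in the file name of a QPC URL."""
    name = posixpath.basename(urlparse(url).path)
    match = _QPC_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized QPC archive name: {name!r}")
    return QpcConfig(
        cores=int(match["cores"]),
        prompt_len=int(match["pl"]),
        ctx_len=int(match["cl"]),
        full_batch_size=int(match["fbs"]),
        devices=int(match["devices"]),
        precision=match["precision"].upper(),
        extra_tag=match["extra"],
    )


URL_4_DEVICES = (
    "https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.19.6/nvidia/"
    "Llama-3.1-Nemotron-70B-Instruct-HF/"
    "qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz"
)

cfg = parse_qpc_url(URL_4_DEVICES)
print(cfg.devices, cfg.cores, cfg.full_batch_size, cfg.prompt_len, cfg.ctx_len)
# prints: 4 16 1 128 8192
```

Parsing the file name rather than hard-coding values keeps a deployment script in step with whichever row of the table was downloaded; a name that does not fit the inferred pattern raises `ValueError` instead of yielding a silently wrong configuration.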