Zentree-Qualcomm Pre-compiled Model Catalog for Cloud AI Accelerators

Nvidia Llama 3.1 Nemotron 70B Instruct HF AWQ INT4

Model Overview

This model is an AWQ 4-bit quantized version of nvidia/Llama-3.1-Nemotron-70B-Instruct-HF, NVIDIA's customized version of meta-llama/Meta-Llama-3.1-70B-Instruct, which was originally released by Meta AI.

It was quantized from FP16 to INT4 with AutoAWQ, using the GEMM kernel variant, zero-point (asymmetric) quantization, and a group size of 128.
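The quantization settings above can be illustrated with a minimal pure-Python sketch of group-wise zero-point INT4 quantization at a group size of 128. This is not AutoAWQ's implementation: it shows only the per-group min/max scale and zero-point step, and it omits AWQ's activation-aware scaling search and the packed weight layout used by the GEMM kernels. All function and constant names in the code are hypothetical.

```python
import random
from typing import List, Tuple

W_BIT = 4                 # INT4 weights
GROUP_SIZE = 128          # one scale and one zero point per 128 consecutive weights
QMAX = (1 << W_BIT) - 1   # unsigned INT4 codes span [0, 15]


def quantize_group(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map one group of floats to INT4 codes sharing a scale and zero point."""
    w_min, w_max = min(weights), max(weights)
    # A constant group would give a zero range; clamp it to a tiny positive value.
    scale = max(w_max - w_min, 1e-5) / QMAX
    # The zero point is the integer code that represents 0.0 in this group.
    zero = min(QMAX, max(0, round(-w_min / scale)))
    codes = [min(QMAX, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Recover approximate weights: w ~= (code - zero) * scale."""
    return [(c - zero) * scale for c in codes]


def quantize_row(
    row: List[float], group_size: int = GROUP_SIZE
) -> List[Tuple[List[int], float, int]]:
    """Split a weight row into contiguous groups and quantize each one independently."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]


if __name__ == "__main__":
    # Hypothetical weight row: two groups of small Gaussian values.
    random.seed(0)
    row = [random.gauss(0.0, 0.02) for _ in range(2 * GROUP_SIZE)]
    groups = quantize_row(row)
    recon = [w for codes, s, z in groups for w in dequantize_group(codes, s, z)]
    print(len(groups), max(abs(a - b) for a, b in zip(row, recon)))
```

Storing a zero point per group lets the 16 available codes cover an asymmetric weight range, and small groups keep each scale tight, which is the trade-off the group size of 128 controls.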

Model Architecture: Llama 3.1 (decoder-only Transformer)
Model Source: ibnzterrell/Nvidia-Llama-3.1-Nemotron-70B-Instruct-HF-AWQ-INT4
License: Llama 3.1 Community License Agreement

QPC Configurations

Precision	SoCs (Tensor Slicing)	NSP Cores (per SoC)	Full Batch Size	Chunking Prompt Length	Context Length (CL)	QPC Download URL
MXFP6	4	16	1	128	8192	https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.19.6/ibnzterrell/Nvidia-Llama-3.1-Nemotron-70B-Instruct-HF-AWQ-INT4/qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz
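The archive name appears to encode the same settings as the table row (16 cores, 128-token prompt chunks, 8192-token context, full batch size 1, 4 devices) plus precision tags. The Python sketch below parses those fields and, as a usage example, estimates how many prefill chunks a request needs. The naming scheme is inferred from this single URL, not from a documented specification. `QpcConfig`, `parse_qpc_url`, and `prefill_chunks` are hypothetical names, and `prefill_chunks` assumes the prompt is processed in fixed 128-token chunks and that prompt plus generated tokens must fit in the 8192-token context. The trailing `mxint8` tag is not described in the table, so the parser only records it.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# URL copied from the QPC Configurations table.
QPC_URL = (
    "https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.19.6/"
    "ibnzterrell/Nvidia-Llama-3.1-Nemotron-70B-Instruct-HF-AWQ-INT4/"
    "qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz"
)

# Assumed naming scheme, inferred from the one URL above.
_QPC_NAME = re.compile(
    r"/SDK(?P<sdk>[0-9.]+)/(?P<repo>.+)/"
    r"qpc_(?P<cores>\d+)cores_(?P<pl>\d+)pl_(?P<cl>\d+)cl_"
    r"(?P<fbs>\d+)fbs_(?P<socs>\d+)devices_(?P<tags>[a-z0-9_]+)\.tar\.gz$"
)


@dataclass(frozen=True)
class QpcConfig:
    sdk_version: str
    model_repo: str
    nsp_cores_per_soc: int
    prompt_len: int          # chunking prompt length
    context_len: int
    full_batch_size: int
    num_socs: int            # SoCs used for tensor slicing
    precision_tags: Tuple[str, ...]


def parse_qpc_url(url: str) -> QpcConfig:
    """Extract the compile settings encoded in a QPC archive URL."""
    m = _QPC_NAME.search(url)
    if m is None:
        raise ValueError(f"unrecognized QPC URL: {url}")
    return QpcConfig(
        sdk_version=m["sdk"],
        model_repo=m["repo"],
        nsp_cores_per_soc=int(m["cores"]),
        prompt_len=int(m["pl"]),
        context_len=int(m["cl"]),
        full_batch_size=int(m["fbs"]),
        num_socs=int(m["socs"]),
        precision_tags=tuple(m["tags"].split("_")),
    )


def prefill_chunks(cfg: QpcConfig, prompt_tokens: int, max_new_tokens: int) -> int:
    """Count fixed-size prefill chunks for one request (assumed chunked prefill)."""
    if prompt_tokens < 1 or max_new_tokens < 0:
        raise ValueError("need at least one prompt token and non-negative max_new_tokens")
    if prompt_tokens + max_new_tokens > cfg.context_len:
        raise ValueError("request exceeds the compiled context length")
    return -(-prompt_tokens // cfg.prompt_len)  # ceiling division


if __name__ == "__main__":
    cfg = parse_qpc_url(QPC_URL)
    print(cfg)
    print(prefill_chunks(cfg, prompt_tokens=1000, max_new_tokens=256))  # 8
```

Checking a request against the parsed context length before sending it avoids discovering the limit at inference time.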