Llama 4 Scout 17B 16E Instruct
Model Overview
The Llama 4 collection consists of natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
- Model Architecture: The Llama 4 models are auto-regressive language models that use a mixture-of-experts (MoE) architecture and incorporate early fusion for native multimodality. Llama 4 Scout has 17 billion active parameters (109 billion total) and 16 experts.
- Model Release Date: April 5, 2025
- Repository: llama-models/models/llama4
- Model Source: meta-llama/Llama-4-Scout-17B-16E-Instruct
- License: llama4
- Supported languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese.
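To make the mixture-of-experts idea in the architecture bullet concrete, here is a toy sketch of token routing in plain Python. It is not Meta's implementation: the hidden size, the random weights, the softmax router, and `top_k=1` are all assumptions chosen for illustration, and only the expert count (16) comes from the overview above.

```python
import math
import random

NUM_EXPERTS = 16  # expert count from the model overview
HIDDEN = 8        # toy hidden size, an assumption for illustration only


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def moe_layer(token, router, experts, top_k=1):
    """Route one token vector to its top_k experts and mix their outputs."""
    probs = softmax(matvec(router, token))
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(token)
    for i in chosen:
        expert_out = matvec(experts[i], token)
        weight = probs[i] / norm  # renormalize over the selected experts
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen


rng = random.Random(0)
router = make_matrix(NUM_EXPERTS, HIDDEN, rng)
experts = [make_matrix(HIDDEN, HIDDEN, rng) for _ in range(NUM_EXPERTS)]
token = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]

output, chosen = moe_layer(token, router, experts)
print(f"token routed to expert(s) {chosen}; output has {len(output)} values")
```

Only the selected experts run for a given token, which is why the number of active parameters per token can be much smaller than the total parameter count.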
Multi Model QPC Configuration # 1
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | CCL_Enabled | QPC URL | QPC Size | QPC Download | Onnx URL | Onnx Download | Generation Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MXFP6 | 4 | 16 | 1 | 128 | 8192 | False | https://dc00tk1pxen80.cloudfront.net/SDK1.21.6/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_qpc_Encoder_16cores_128pl_8192cl_4devices_mxfp6_mxint8.tar.gz | 9.9 GB | Download | In progress | Download | 23-June-2026 |
| MXFP6 | 4 | 16 | 1 | 128 | 8192 | False | https://dc00tk1pxen80.cloudfront.net/SDK1.21.6/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_qpc_Encoder_16cores_128pl_65536cl_4devices_mxfp6_mxint8.tar.gz | 94 GB | Download | In progress | Download | 23-June-2026 |
Multi Model QPC Configuration # 2
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | CCL_Enabled | QPC URL | QPC Size | QPC Download | Onnx URL | Onnx Download | Generation Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MXFP6 | 4 | 16 | 1 | 128 | 65536 | False | https://dc00tk1pxen80.cloudfront.net/SDK1.21.6/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_qpc_Encoder_16cores_128pl_65536cl_4devices_mxfp6_mxint8.tar.gz | 9.9 GB | Download | In progress | Download | 23-June-2026 |
| MXFP6 | 4 | 16 | 1 | 128 | 65536 | False | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.6/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_qpc_Decoder_16cores_128pl_65536cl_4devices_mxfp6_mxint8.tar.gz | 95 GB | Download | In progress | Download | 23-June-2026 |
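The QPC archive names in the tables appear to encode the same settings as the table columns. The sketch below parses a QPC URL so the values can be checked against the row it was listed under. The meaning of each token (`pl` as chunking prompt length, `cl` as context length) is inferred from the column headers and is an assumption, as is the helper name `parse_qpc_url`.

```python
import re
from urllib.parse import urlparse

# Pattern inferred from the archive names listed in the tables (an assumption).
QPC_NAME = re.compile(
    r"_qpc_(?P<role>Encoder|Decoder)"
    r"_(?P<cores>\d+)cores"
    r"_(?P<prompt_len>\d+)pl"
    r"_(?P<ctx_len>\d+)cl"
    r"_(?P<devices>\d+)devices"
    r"_(?P<precision>\w+)\.tar\.gz$"
)


def parse_qpc_url(url):
    """Return the settings encoded in a QPC archive URL (hypothetical helper)."""
    name = urlparse(url).path.rsplit("/", 1)[-1]
    match = QPC_NAME.search(name)
    if match is None:
        raise ValueError(f"unrecognized QPC archive name: {name}")
    info = match.groupdict()
    for key in ("cores", "prompt_len", "ctx_len", "devices"):
        info[key] = int(info[key])
    return info


url = (
    "https://dc00tk1pxen80.cloudfront.net/SDK1.21.6/meta-llama/"
    "Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct"
    "_qpc_Encoder_16cores_128pl_8192cl_4devices_mxfp6_mxint8.tar.gz"
)
print(parse_qpc_url(url))
# {'role': 'Encoder', 'cores': 16, 'prompt_len': 128, 'ctx_len': 8192,
#  'devices': 4, 'precision': 'mxfp6_mxint8'}
```

Comparing `role` and `ctx_len` with the intended configuration before starting a download can save fetching a large archive that does not match.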
Run This Model
Download QPCs
mkdir -p meta-llama/Llama-4-Scout-17B-16E-Instruct
cd meta-llama/Llama-4-Scout-17B-16E-Instruct
# Download Encoder QPC
wget <Encoder_QPC_Download_URL>
tar xzvf <encoder_qpc_filename.tar.gz>
# Download Decoder QPC
wget <Decoder_QPC_Download_URL>
tar xzvf <decoder_qpc_filename.tar.gz>
# Download Inference Script
wget http://qualcom-qpc-models.s3-website-us-east-1.amazonaws.com/QPC/multimodel_inference_1_21_6.py
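The same download-and-extract steps can be scripted. The following is a minimal sketch using only the Python standard library; `fetch_and_extract` is a hypothetical helper, not part of any Qualcomm tooling, and the URL placeholders are the same ones used in the commands above.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_and_extract(url, dest_dir):
    """Download a .tar.gz archive into dest_dir, unpack it there, return the archive path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / Path(urlparse(url).path).name
    if not archive.exists():  # skip the download when the file is already present
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return archive


# Example usage (placeholders must be replaced with URLs from the tables):
# target = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
# fetch_and_extract("<Encoder_QPC_Download_URL>", target)
# fetch_and_extract("<Decoder_QPC_Download_URL>", target)
```

Because an existing file is reused, an interrupted download should be deleted before retrying. The decoder archives are listed at roughly 94 to 95 GB, so it is worth checking free disk space first.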
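Run QPC
Replace <encoder_qpc_path> and <decoder_qpc_path> with the directories extracted from the encoder and decoder QPC archives. Set <ctx_len> to the context length of the configuration you downloaded (8192 or 65536), and fill in <device_ids>, <image_url>, and <prompt> for your setup.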
python3 multimodel_inference_1_21_6.py \
--model-id meta-llama/Llama-4-Scout-17B-16E-Instruct \
--vision-qpc <encoder_qpc_path> \
--lang-qpc <decoder_qpc_path> \
--ctx-len <ctx_len> \
--prefill-seq-len 128 \
--device-ids <device_ids> \
--generation-len 200 \
--image-url "<image_url>" \
--prompt "<prompt>"
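When the script is launched from other Python code, building the argument list programmatically avoids shell quoting problems with prompts and URLs. The sketch below assembles the command shown above; `build_command` is a hypothetical helper, and the format expected for `--device-ids` is not documented here, so that value is passed through unchanged.

```python
import shlex
import subprocess


def build_command(vision_qpc, lang_qpc, ctx_len, device_ids, image_url, prompt,
                  prefill_seq_len=128, generation_len=200,
                  script="multimodel_inference_1_21_6.py",
                  model_id="meta-llama/Llama-4-Scout-17B-16E-Instruct"):
    """Return the inference command as an argument list (hypothetical helper)."""
    return [
        "python3", script,
        "--model-id", model_id,
        "--vision-qpc", str(vision_qpc),
        "--lang-qpc", str(lang_qpc),
        "--ctx-len", str(ctx_len),
        "--prefill-seq-len", str(prefill_seq_len),
        "--device-ids", str(device_ids),  # format assumed; passed as given
        "--generation-len", str(generation_len),
        "--image-url", image_url,
        "--prompt", prompt,
    ]


cmd = build_command(
    vision_qpc="<encoder_qpc_path>",
    lang_qpc="<decoder_qpc_path>",
    ctx_len=8192,
    device_ids="<device_ids>",
    image_url="<image_url>",
    prompt="Describe this image.",
)
print(shlex.join(cmd))  # shell-safe rendering of the command for review

# To actually run it once the placeholders are filled in:
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` bypasses the shell, so the prompt text needs no escaping.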