
Llama 4 Scout 17B 16E Instruct

Model Overview

The Llama 4 collection consists of natively multimodal AI models that support text and multimodal experiences. These models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

  • Model Architecture: The Llama 4 models are auto-regressive language models that use a mixture-of-experts (MoE) architecture and incorporate early fusion for native multimodality. Llama 4 Scout has 17 billion active parameters and 16 experts.
  • Model Release Date: April 5, 2025
  • Repository: llama-models/models/llama4
  • Model Source: meta-llama/Llama-4-Scout-17B-16E-Instruct
  • License: llama4
  • Supported languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese.
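
The mixture-of-experts idea named under Model Architecture can be illustrated with a toy routing layer. This is a minimal sketch of generic top-k gating and is not Meta's implementation: the vector size, the random weights, the single-matrix "experts", and the names `moe_layer`, `router`, and `experts` are all hypothetical. Only the expert count (16) comes from the description above.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def moe_layer(token, router, experts, top_k=1):
    """Send one token vector to its top_k experts and mix their outputs."""
    gate = softmax(matvec(router, token))  # one probability per expert
    chosen = sorted(range(len(experts)), key=lambda i: gate[i], reverse=True)[:top_k]
    norm = sum(gate[i] for i in chosen)  # renormalize over the chosen experts
    out = [0.0] * len(token)
    for i in chosen:
        expert_out = matvec(experts[i], token)  # only chosen experts are evaluated
        weight = gate[i] / norm
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, chosen


# Hypothetical toy sizes; real models use far larger dimensions.
rng = random.Random(0)
DIM, NUM_EXPERTS = 4, 16
router = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [
    [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
    for _ in range(NUM_EXPERTS)
]
token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_layer(token, router, experts)
print(f"token routed to expert {chosen[0]} of {NUM_EXPERTS}")
```

Because only the selected experts run for each token, the compute per token tracks the active parameter count rather than the total size of all experts.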

Multi Model QPC Configuration # 1

Language (CPL)

  • Precision: MXFP6
  • SoCs / Tensor slicing: 4
  • NSP-Cores (per SoC): 16
  • Batch Size: 1
  • Chunking Prompt Length: 128
  • Context Length (CL): 8192
  • QPC URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_text_128pl__vs2488_cl8192_bs1_c16_ts4_sdk1_21_4.tar.gz
  • QPC Size: 9.9 GB
  • ONNX URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Encoder_ONNX.tar.gz
  • Generation Date: 23-Apr-2026

Vision (PL/SL)

  • Precision: MXFP6
  • SoCs / Tensor slicing: 4
  • NSP-Cores (per SoC): 16
  • Batch Size: 1
  • PL (Vision) / SL (Vision): 2448
  • Context Length (CL): 8192
  • QPC URL: https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_vision_128pl__vs2488_cl8192_bs1_c16_ts4_sdk1_21_4.tar.gz
  • QPC Size: 94 GB
  • ONNX URL: https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Decoder_ONNX.tar.gz
  • Generation Date: 23-Apr-2026
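
The QPC archive names appear to encode the build settings, which can be handy when sorting downloaded files. The sketch below parses those names under an inferred convention: `pl` as prompt length, `cl` as context length, `bs` as batch size, `c` as NSP-cores, `ts` as tensor slicing, and `sdk` as the SDK version. This mapping is an assumption based on how the values line up with the listing, not a documented format, and the meaning of `vs` is unknown, so it is kept as a raw number. The helper name `parse_qpc_name` is hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed naming pattern, inferred from the archive names in the QPC URLs.
QPC_NAME = re.compile(
    r"_(?P<prompt_len>\d+)pl_+vs(?P<vs>\d+)"
    r"_cl(?P<context_len>\d+)_bs(?P<batch_size>\d+)"
    r"_c(?P<cores>\d+)_ts(?P<tensor_slices>\d+)"
    r"_sdk(?P<sdk>[\d_]+)\.tar\.gz$"
)


def parse_qpc_name(url):
    """Extract the settings encoded in a QPC archive URL or file name."""
    name = PurePosixPath(urlparse(url).path).name
    match = QPC_NAME.search(name)
    if match is None:
        raise ValueError(f"unrecognized QPC archive name: {name}")
    raw = match.groupdict()
    parsed = {key: int(value) for key, value in raw.items() if key != "sdk"}
    parsed["sdk"] = raw["sdk"].replace("_", ".")  # "1_21_4" becomes "1.21.4"
    return parsed


LANGUAGE_QPC_URL = (
    "https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/"
    "Llama-4-Scout-17B-16E-Instruct/"
    "llama4-scout_text_128pl__vs2488_cl8192_bs1_c16_ts4_sdk1_21_4.tar.gz"
)
settings = parse_qpc_name(LANGUAGE_QPC_URL)
print(settings["context_len"], settings["cores"], settings["tensor_slices"])
```

The vision archive names also carry `128pl` even though the vision listing gives 2448 for PL/SL, so the listed values should be treated as authoritative when the two disagree.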

Multi Model QPC Configuration # 2

Language (CPL)

  • Precision: MXFP6
  • SoCs / Tensor slicing: 8
  • NSP-Cores (per SoC): 16
  • Batch Size: 1
  • Chunking Prompt Length: 128
  • Context Length (CL): 8192
  • QPC URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_text_128pl__vs2488_cl8192_bs1_c16_ts8_sdk1_21_4.tar.gz
  • QPC Size: 37 GB
  • ONNX URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Encoder_ONNX.tar.gz
  • Generation Date: 23-Apr-2026

Vision (PL/SL)

  • Precision: MXFP6
  • SoCs / Tensor slicing: 8
  • NSP-Cores (per SoC): 16
  • Batch Size: 1
  • PL (Vision) / SL (Vision): 2448
  • Context Length (CL): 8192
  • QPC URL: https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_vision_128pl__vs2488_cl8192_bs1_c16_ts8_sdk1_21_4.tar.gz
  • QPC Size: 110 GB
  • ONNX URL: https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Decoder_ONNX.tar.gz
  • Generation Date: 23-Apr-2026
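
Since the archives listed here range from about 10 GB to 110 GB, it helps to stream them to disk rather than hold them in memory. The following is a minimal sketch using only the Python standard library. The helper names `fetch_archive` and `extract_archive` and the `.part` temporary suffix are hypothetical choices, and the sketch does not resume interrupted downloads or verify checksums.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def fetch_archive(url, dest_dir):
    """Stream url into dest_dir and return the local path; reuse existing files."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / PurePosixPath(urlparse(url).path).name
    if target.exists():
        return target
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        # Copy in 1 MiB chunks so memory use stays flat for very large files.
        shutil.copyfileobj(response, out, length=1024 * 1024)
    partial.rename(target)  # only complete downloads get the final name
    return target


def extract_archive(archive, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return that directory."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir, **kwargs)
    return dest_dir


# Example call (commented out because it would download a multi-gigabyte file):
# qpc = fetch_archive("<QPC URL from the listing>", "downloads")
# extract_archive(qpc, "qpc")
```

Extraction needs free disk space for both the archive and its unpacked contents, so the target volume should comfortably exceed the listed QPC size.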

Multi Model QPC Configuration # 3

Language (CPL)

  • Precision: MXFP6
  • SoCs / Tensor slicing: 4
  • NSP-Cores (per SoC): 8
  • Batch Size: 1
  • Chunking Prompt Length: 128
  • Context Length (CL): 8192
  • QPC URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_text_128pl__vs2488_cl8192_bs1_c8_ts4_sdk1_21_4.tar.gz
  • QPC Size: 11 GB
  • ONNX URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Encoder_ONNX.tar.gz
  • Generation Date: 24-Apr-2026

Vision (PL/SL)

  • Precision: MXFP6
  • SoCs / Tensor slicing: 4
  • NSP-Cores (per SoC): 8
  • Batch Size: 1
  • PL (Vision) / SL (Vision): 2448
  • Context Length (CL): 8192
  • QPC URL: https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_vision_128pl__vs2488_cl8192_bs1_c8_ts4_sdk1_21_4.tar.gz
  • QPC Size: 94 GB
  • ONNX URL: https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Decoder_ONNX.tar.gz
  • Generation Date: 24-Apr-2026

Multi Model QPC Configuration # 4

Language (CPL)

  • Precision: MXFP6
  • SoCs / Tensor slicing: 4
  • NSP-Cores (per SoC): 16
  • Batch Size: 1
  • Chunking Prompt Length: 128
  • Context Length (CL): 65536
  • QPC URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_encoder_128pl__vs2488_cl65536_bs1_c16_ts4_sdk1_21_4.tar.gz
  • QPC Size: 9.9 GB
  • ONNX URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Encoder_ONNX.tar.gz
  • Generation Date: 4-May-2026

Vision (PL/SL)

  • Precision: MXFP6
  • SoCs / Tensor slicing: 4
  • NSP-Cores (per SoC): 16
  • Batch Size: 1
  • PL (Vision) / SL (Vision): 2448
  • Context Length (CL): 65536
  • QPC URL: http://qualcom-qpc-models.s3-website-us-east-1.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_Decoder_128pl__vs2488_cl65536_bs1_c16_ts4_sdk1_21_4.tar.gz
  • QPC Size: 95 GB
  • ONNX URL: https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Decoder_ONNX.tar.gz
  • Generation Date: 4-May-2026
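
The four configurations differ only in SoC count, NSP-cores per SoC, and context length, so choosing one can be reduced to a small lookup. The sketch below encodes the values listed above and filters them against available hardware. It assumes that a configuration is usable whenever the machine has at least the listed number of SoCs and cores, which is an illustrative rule rather than a documented requirement. The names `QpcConfig` and `usable_configs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QpcConfig:
    number: int          # configuration number as listed above
    socs: int            # SoCs / tensor slicing
    cores_per_soc: int   # NSP-cores per SoC
    context_len: int     # context length (CL)


# Values copied from the four configurations above.
CONFIGS = [
    QpcConfig(1, socs=4, cores_per_soc=16, context_len=8192),
    QpcConfig(2, socs=8, cores_per_soc=16, context_len=8192),
    QpcConfig(3, socs=4, cores_per_soc=8, context_len=8192),
    QpcConfig(4, socs=4, cores_per_soc=16, context_len=65536),
]


def usable_configs(socs, cores_per_soc, min_context=0):
    """Return configurations that fit the hardware and the context requirement."""
    return [
        cfg
        for cfg in CONFIGS
        if cfg.socs <= socs  # assumed rule: hardware must meet or exceed the listing
        and cfg.cores_per_soc <= cores_per_soc
        and cfg.context_len >= min_context
    ]


# Example: 4 SoCs with 16 cores each, needing at least a 32K-token context.
for cfg in usable_configs(socs=4, cores_per_soc=16, min_context=32768):
    print(f"Configuration #{cfg.number}: CL={cfg.context_len}")
```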