Llama 4 Scout 17B 16E Instruct
Model Overview
The Llama 4 collection consists of natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to deliver industry-leading performance in text and image understanding.
- Model Architecture: The Llama 4 models are auto-regressive language models that use a mixture-of-experts (MoE) architecture and incorporate early fusion for native multimodality. Llama 4 Scout has 17 billion active parameters and 16 experts.
- Model Release Date: April 5, 2025
- Repository: llama-models/models/llama4
- Model Source: meta-llama/Llama-4-Scout-17B-16E-Instruct (a minimal loading sketch follows this list)
- License: llama4
- Supported languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese.
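For reference, the following is a minimal sketch of loading the model from the Hugging Face source above and running one image-plus-text prompt. It assumes a transformers release with Llama 4 support (>= 4.51, which provides `Llama4ForConditionalGeneration`), the `accelerate` package for `device_map="auto"`, and sufficient accelerator memory; the image URL and question are placeholders.

```python
# Minimal sketch: load Llama 4 Scout and run one image + text prompt.
# Assumes transformers >= 4.51 (Llama4ForConditionalGeneration) and
# enough accelerator memory; adjust device_map / dtype for your setup.
import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Placeholder image URL and question; replace with your own inputs.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/sample.jpg"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)[0])
```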
QPC Configurations - Prompt Length 2688
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Full Batch Size | Chunking Prompt Length | Context Length (CL) | Generated URL | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MXFP6 | 4 | 8 | 1 | 128 | 8192 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.20.1.2/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4_text_pl128_cl8192_bs1_c8_ts4_sdkMaster.tar.gz | Download |
| MXFP6 | 4 | 8 | 1 | 2688 | 8192 | https://dc00tk1pxen80.cloudfront.net/SDK1.20.1.2/meta-llama/Llama-4-Scout-17B-16E-Instruct/lama4_vision_pl2688_cl8192_bs1_c8_ts4_sdk20_110.tar.gz | Download |
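The archives above are ordinary gzipped tarballs and can be fetched with any HTTP client. As a sketch, the snippet below downloads the text QPC from the first row and unpacks it; the local file and directory names are arbitrary choices, not part of the SDK.

```python
# Sketch: download one of the QPC archives listed above and unpack it.
# The URL is the text QPC from the first row; local paths are arbitrary.
import tarfile
import urllib.request
from pathlib import Path

url = ("https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.20.1.2/"
       "meta-llama/Llama-4-Scout-17B-16E-Instruct/"
       "llama4_text_pl128_cl8192_bs1_c8_ts4_sdkMaster.tar.gz")
archive = Path("llama4_text_pl128_cl8192_bs1_c8_ts4.tar.gz")
target = Path("qpc/llama4_text_pl128")

# Stream the archive to disk, then extract it into the target directory.
urllib.request.urlretrieve(url, archive)
target.mkdir(parents=True, exist_ok=True)
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(target)

print(f"Extracted QPC to {target.resolve()}")
```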
QPC Configurations - Prompt Length 3968
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Full Batch Size | Chunking Prompt Length | Context Length (CL) | Generated URL | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MXFP6 | 4 | 8 | 1 | 128 | 8192 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.20.1.2/meta-llama/Llama-4-Scout-17B-16E-Instruct/vision_pl3968/llama4_text_pl128_cl8192_bs1_c8_ts4_sdkMaster.tar.gz | Download |
| MXFP6 | 4 | 8 | 1 | 3968 | 8192 | https://dc00tk1pxen80.cloudfront.net/SDK1.20.1.2/meta-llama/Llama-4-Scout-17B-16E-Instruct/vision_pl3968/llama4_vision_pl3968_cl8192_bs1_c8_ts4_sdk20_110.tar.gz | Download |
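When the configurations above need to be handled programmatically, it can help to keep the table columns together in one record per row. The sketch below does this for the two vision QPCs and picks a download URL by chunking prompt length; `QPCConfig` and `select_qpc` are hypothetical helpers for illustration, not part of any SDK, and the values simply mirror the tables above.

```python
# Sketch: represent the QPC table rows as records and pick one by
# chunking prompt length. QPCConfig and select_qpc are hypothetical
# helpers; the values mirror the vision rows of the tables above.
from dataclasses import dataclass

@dataclass(frozen=True)
class QPCConfig:
    precision: str
    socs: int                  # SoCs / tensor slicing
    nsp_cores_per_soc: int
    full_batch_size: int
    chunking_prompt_length: int
    context_length: int
    url: str

CONFIGS = [
    QPCConfig("MXFP6", 4, 8, 1, 2688, 8192,
              "https://dc00tk1pxen80.cloudfront.net/SDK1.20.1.2/meta-llama/"
              "Llama-4-Scout-17B-16E-Instruct/"
              "lama4_vision_pl2688_cl8192_bs1_c8_ts4_sdk20_110.tar.gz"),
    QPCConfig("MXFP6", 4, 8, 1, 3968, 8192,
              "https://dc00tk1pxen80.cloudfront.net/SDK1.20.1.2/meta-llama/"
              "Llama-4-Scout-17B-16E-Instruct/vision_pl3968/"
              "llama4_vision_pl3968_cl8192_bs1_c8_ts4_sdk20_110.tar.gz"),
]

def select_qpc(prompt_length: int) -> QPCConfig:
    """Return the smallest configuration whose prompt length fits the input."""
    fitting = [c for c in CONFIGS if c.chunking_prompt_length >= prompt_length]
    if not fitting:
        raise ValueError(f"No QPC supports prompt length {prompt_length}")
    return min(fitting, key=lambda c: c.chunking_prompt_length)

print(select_qpc(3000).url)  # -> the prompt-length-3968 vision QPC
```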