GPT-OSS-120B
Model Overview
OpenAI’s GPT-OSS models (gpt-oss-120b and gpt-oss-20b) are open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. GPT-OSS-120B targets production, general-purpose, high-reasoning use cases and fits on a single 80 GB GPU.
- Model Architecture: 117B total parameters with 5.1B active parameters. The model was trained on the harmony response format and should only be used with the harmony format; it will not work correctly otherwise.
- Model Source: openai/gpt-oss-120b
- License: Apache 2.0 license. Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. The chain-of-thought is not intended to be shown to end users.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
- Native MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU.
QPC Configuration - 1
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | Generated URL | Download | Generation Date |
|---|---|---|---|---|---|---|---|---|
| MXFP6 | 4 | 16 | 1 | 256 | 8192 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_256pl_8192cl_prefill.tar.gz | Download | 13-Feb-2026 |
| MXFP6 | 4 | 16 | 1 | 256 | 8192 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_256pl_8192cl_decode.tar.gz | Download | 13-Feb-2026 |
QPC Configuration - 2
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | Generated URL | Download | Generation Date |
|---|---|---|---|---|---|---|---|---|
| MXFP6 | 4 | 16 | 1 | 128 | 16384 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_128pl_16384cl_prefill.tar.gz | Download | 13-Feb-2026 |
| MXFP6 | 4 | 16 | 1 | 128 | 16384 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_128pl_16384cl_decode.tar.gz | Download | 13-Feb-2026 |
QPC Configuration - 3
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | Generated URL | Download | Generation Date |
|---|---|---|---|---|---|---|---|---|
| MXFP6 | 4 | 16 | 1 | 256 | 32768 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_256pl_32kcl_prefill.tar.gz | Download | 16-Feb-2026 |
| MXFP6 | 4 | 16 | 1 | 256 | 32768 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_256pl_32kcl_decode.tar.gz | Download | 16-Feb-2026 |
QPC Configuration - 4
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | Generated URL | Download | Generation Date |
|---|---|---|---|---|---|---|---|---|
| MXFP6 | 4 | 16 | 1 | 256 | 16384 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_256pl_16384cl_prefill.tar.gz | Download | 18-Feb-2026 |
| MXFP6 | 4 | 16 | 1 | 256 | 16384 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_256pl_16384cl_decode.tar.gz | Download | 18-Feb-2026 |
QPC Configuration - 5
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | Generated URL | Download | Generation Date |
|---|---|---|---|---|---|---|---|---|
| MXFP6 | 4 | 16 | 1 | 256 | 13312 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_256pl_13312cl_prefill.tar.gz | Download | 19-Feb-2026 |
| MXFP6 | 4 | 16 | 1 | 256 | 13312 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_256pl_13312cl_decode.tar.gz | Download | 19-Feb-2026 |
QPC Configuration - 6
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | Generated URL | Download | Generation Date |
|---|---|---|---|---|---|---|---|---|
| MXFP6 | 4 | 16 | 1 | 256 | 4096 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_256pl_4096cl_prefill.tar.gz | Download | 19-Feb-2026 |
| MXFP6 | 4 | 16 | 1 | 256 | 4096 | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21A1.1/openai/gpt-oss-120b/gpt_oss120b_qpc_256pl_4096cl_decode.tar.gz | Download | 19-Feb-2026 |
Run This Model
Download QPCs
```shell
mkdir -p openai/gpt-oss-120b
cd openai/gpt-oss-120b

# Download Prefill QPC
wget <Prefill_QPC_Download_URL>
tar xzvf <prefill_qpc_filename.tar.gz>

# Download Decode QPC
wget <Decode_QPC_Download_URL>
tar xzvf <decode_qpc_filename.tar.gz>
```
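Each configuration in the tables above ships as a prefill/decode pair whose archive names differ only in the `_prefill`/`_decode` suffix. Assuming that naming convention holds (it is an observed pattern in the tables, not a guaranteed API), a small helper like the following sketch can derive the decode URL from the prefill URL; the function name is mine.

```python
def decode_url_from_prefill(prefill_url: str) -> str:
    """Derive the decode QPC URL from its prefill counterpart.

    Relies on the observed _prefill/_decode naming convention of the
    archives listed in the configuration tables above.
    """
    suffix = "_prefill.tar.gz"
    if not prefill_url.endswith(suffix):
        raise ValueError("expected a *_prefill.tar.gz URL")
    return prefill_url[: -len(suffix)] + "_decode.tar.gz"

prefill = ("https://qualcom-qpc-models.s3-accelerate.amazonaws.com/"
           "SDK1.21A1.1/openai/gpt-oss-120b/"
           "gpt_oss120b_qpc_256pl_8192cl_prefill.tar.gz")
print(decode_url_from_prefill(prefill))
# Matches the decode URL listed in QPC Configuration - 1
```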
Run QPC
Replace `<prefill_qpc_path>` and `<decode_qpc_path>` below with the paths to the extracted QPC directories.
```shell
python3 -m qaic_disagg \
  --prefill-port 9800 \
  --decode-port 9900 \
  --port 8082 \
  --decode-device-group 4,5,6,7 \
  --prefill-device-group 0,1,2,3 \
  --model openai/gpt-oss-120b \
  --prefill-max-num-seqs 1 \
  --decode-max-num-seqs 1 \
  --prefill-max-seq-len-to-capture <128/256> \
  --max-model-len <context_length(cl)> \
  --prefill-override-qaic-config "split_retained_state_io:True mxfp6_matmul:True enable_chunking:True node_precision_info=/workspace/examples/disagg_serving/non_subfunction_120b_npi.yaml qpc_path=<prefill_qpc_path>" \
  --decode-override-qaic-config "mxfp6_matmul:True retain_full_kv:True node_precision_info=/workspace/examples/disagg_serving/non_subfunction_120b_npi.yaml qpc_path=<decode_qpc_path>" \
  -vvv \
  --dtype bfloat16 \
  --kv-cache-dtype mxint8 \
  --kv-handOff-port 5066 \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --enable-log-outputs \
  --reasoning-parser openai_gptoss \
  --generation_config vllm \
  --enable-log-requests \
  --chat-template-content-format openai
```
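Given the vLLM-style flags above (`--tool-call-parser`, `--chat-template-content-format`), the server presumably exposes an OpenAI-compatible API on the `--port` value (8082). The sketch below shows how a chat request to such a server might be built; the `/v1/chat/completions` route and the request fields are assumptions based on that convention, not confirmed by this page.

```python
import json
import urllib.request

# Build an OpenAI-style chat completions request against the port
# passed via --port in the launch command above.
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Summarize MXFP4 quantization in one sentence."}
    ],
    "max_tokens": 256,
}

req = urllib.request.Request(
    "http://localhost:8082/v1/chat/completions",  # assumed OpenAI-compatible route
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment once the server is running
```

The chat template applies the harmony format server-side, so the client sends plain role/content messages rather than raw harmony tokens.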
Useful Links