gpt-oss-20b

Model Overview

OpenAI’s GPT-OSS models (gpt-oss-120b and gpt-oss-20b) are open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-20b is the smaller of the two, suited to lower-latency, local, or specialized use cases.

  • Model Architecture: 21B total parameters with 3.6B active parameters (mixture-of-experts). Trained on the harmony response format; it must be used with the harmony format, as it will not work correctly otherwise.
  • Model Source: openai/gpt-oss-20b
  • License: Apache 2.0 license. Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
  • Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
  • Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. The chain of thought is not intended to be shown to end users.
  • Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
  • Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
  • Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making the gpt-oss-20b model run within 16GB of memory.
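As a sketch of how the configurable reasoning effort can be exercised against a served instance of this model, the request body below sets it through a harmony-style system message. The "Reasoning: high" convention and the OpenAI-compatible field names are assumptions based on the gpt-oss harmony format and vLLM's chat completions API, not details taken from this page.

```shell
# Build a sample chat completions request body; the reasoning effort
# (low / medium / high) is assumed to be selected via the harmony-style
# system message.
cat > request.json <<'EOF'
{
  "model": "openai/gpt-oss-20b",
  "messages": [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Summarize the trade-offs of MoE models."}
  ],
  "max_tokens": 512
}
EOF
```

Lowering the reasoning level to "low" or "medium" trades answer depth for latency, which matches the lower-latency positioning of gpt-oss-20b.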

QPC Configuration # 1

Prefill QPC:
  • Precision: MXFP6
  • SoCs / Tensor slicing: 2
  • NSP-Cores (per SoC): 8
  • Full Batch Size: 1
  • Chunking Prompt Length: 256
  • Context Length (CL): 32768
  • QPC Size: 16GB
  • QPC URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/openai/gpt-oss-20b/gpt_oss20b_qpc_8cores_256pl_32768cl_[4k,8k,12k,16k]ccl_prefill.tar.gz
  • Onnx URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/openai/gpt-oss-20b/GPT-OSS-20B_ONNX_prefill.tar.gz
  • Generation Date: 29-Apr-2026

Decode QPC:
  • Precision: MXFP6
  • SoCs / Tensor slicing: 2
  • NSP-Cores (per SoC): 8
  • Full Batch Size: 1
  • Chunking Prompt Length: 256
  • Context Length (CL): 32768
  • QPC Size: 18GB
  • QPC URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/openai/gpt-oss-20b/gpt_oss20b_qpc_8cores_256pl_32768cl[4k,8k,12k,16k]ccl_decode.tar.gz
  • Onnx URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/openai/gpt-oss-20b/GPT-OSS-20B_ONNX_decode.tar.gz
  • Generation Date: 29-Apr-2026

QPC Configuration # 2

Prefill QPC:
  • Precision: MXFP6
  • SoCs / Tensor slicing: 3
  • NSP-Cores (per SoC): 8
  • Full Batch Size: 1
  • Chunking Prompt Length: 256
  • Context Length (CL): 32768
  • QPC Size: 16GB
  • QPC URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/openai/gpt-oss-20b/gpt_oss20b_qpc_8cores_256pl_32768cl_[4k,8k,12k,16k,32k]ccl_3devices_prefill.tar.gz
  • Onnx URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/openai/gpt-oss-20b/GPT-OSS-20B_ONNX_prefill.tar.gz
  • Generation Date: 30-Apr-2026

Decode QPC:
  • Precision: MXFP6
  • SoCs / Tensor slicing: 3
  • NSP-Cores (per SoC): 8
  • Full Batch Size: 1
  • Chunking Prompt Length: 256
  • Context Length (CL): 32768
  • QPC Size: 19GB
  • QPC URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/openai/gpt-oss-20b/gpt_oss20b_qpc_8cores_256pl_32768cl_[4k,8k,12k,16k,32k]ccl_3devices_decode.tar.gz
  • Onnx URL: https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/openai/gpt-oss-20b/GPT-OSS-20B_ONNX_decode.tar.gz
  • Generation Date: 30-Apr-2026

Run This Model

Download QPCs

mkdir -p openai/gpt-oss-20b
cd openai/gpt-oss-20b

# Download Prefill QPC
wget <Prefill_QPC_Download_URL>
tar xzvf <prefill_qpc_filename.tar.gz>

# Download Decode QPC
wget <Decode_QPC_Download_URL>
tar xzvf <decode_qpc_filename.tar.gz>

Run QPC

Replace <prefill_qpc_path> and <decode_qpc_path> below with the paths to the extracted prefill and decode QPC directories.

python3 -m qaic_disagg \
  --prefill-port 9800 \
  --decode-port 9900 \
  --port 8082 \
  --decode-device-group 2,3 \
  --prefill-device-group 0,1 \
  --model openai/gpt-oss-20b \
  --prefill-max-num-seqs 1 \
  --decode-max-num-seqs 1 \
  --prefill-max-seq-len-to-capture 256 \
  --max-model-len 32768 \
  --prefill-override-qaic-config "split_retained_state_io:True mxfp6_matmul:True enable_chunking:True qpc_path=<prefill_qpc_path>" \
  --decode-override-qaic-config "mxfp6_matmul:True retain_full_kv:True ccl_enabled=True comp_ctx_lengths_decode=4096,8192,12288,16384 qpc_path=<decode_qpc_path>" \
  -vvv \
  --dtype bfloat16 \
  --kv-cache-dtype mxint8 \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --enable-log-outputs \
  --reasoning-parser openai_gptoss \
  --generation_config vllm \
  --enable-log-requests \
  --chat-template-content-format openai
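Once the server is up, it can be smoke-tested from another terminal. The sketch below assumes the launcher exposes vLLM's OpenAI-compatible chat completions endpoint on the `--port` value from the command above (8082); the endpoint path and payload fields are assumptions based on that API, not details taken from this page.

```shell
# Send a minimal chat completions request to the running server.
curl -s http://localhost:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
      }'
```

If the endpoint is correct, the response is a JSON chat completion object; with the `openai_gptoss` reasoning parser enabled above, the model's chain of thought is returned separately from the final answer.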