QwQ 32B AWQ

Model Overview

This is the AWQ-quantized 4-bit version of the QwQ 32B model.

QwQ is the reasoning model of the Qwen series. QwQ is capable of thinking and reasoning, and can achieve significantly enhanced performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, capable of achieving competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

QwQ is based on Qwen2.5, whose code has been integrated into the latest Hugging Face transformers.

  • Model Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Parameters: 32.5B
  • Number of Parameters (Non-Embedding): 31.0B
  • Number of Layers: 64
  • Number of Attention Heads (GQA): 40 for Q and 8 for KV
  • Context Length: Full 131,072 tokens
  • Quantization: AWQ 4-bit
  • Model Repository: QwenLM/Qwen2.5
  • Model Source: Qwen/QwQ-32B-AWQ
  • License: apache-2.0
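
Because the Qwen2.5 code is available in recent transformers releases, the checkpoint can be loaded with the standard AutoModelForCausalLM / AutoTokenizer APIs. The snippet below is a minimal sketch, assuming a GPU host with AWQ kernel support (e.g. the autoawq package) installed; the prompt and max_new_tokens value are illustrative, not recommended settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B-AWQ"

# Load the AWQ-quantized weights; device_map="auto" spreads layers across
# available accelerators, torch_dtype="auto" keeps the dtype from the config.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt; add_generation_prompt appends the assistant
# turn so the model starts its reasoning immediately.
messages = [{"role": "user", "content": "How many prime numbers are there below 30?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Reasoning models emit long chains of thought, so leave headroom in
# max_new_tokens (the value here is only an example).
output_ids = model.generate(**inputs, max_new_tokens=2048)
response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)
print(response)
```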

QPC Configurations

  • Precision: MXFP6
  • SoCs / Tensor slicing: 4
  • NSP-Cores (per SoC): 16
  • Full Batch Size: 1
  • Chunking Prompt Length: 128
  • Context Length (CL): 8192
  • Generated URL: https://dc00tk1pxen80.cloudfront.net/SDK1.20.4/Qwen/QwQ-32B-AWQ/qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz
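
The pre-compiled QPC package can be fetched from the URL above and unpacked before being loaded onto the target cards. Below is a minimal sketch using only the Python standard library; the local directory name is a hypothetical example.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the QPC configuration listed above.
QPC_URL = (
    "https://dc00tk1pxen80.cloudfront.net/SDK1.20.4/Qwen/QwQ-32B-AWQ/"
    "qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz"
)
DEST = Path("qwq-32b-awq-qpc")  # hypothetical local directory

DEST.mkdir(parents=True, exist_ok=True)
archive = DEST / "qpc.tar.gz"

# Download the pre-compiled program container (a large archive;
# resuming interrupted downloads is not handled here).
urllib.request.urlretrieve(QPC_URL, archive)

# Unpack the archive next to the download.
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(DEST)

print(f"QPC extracted to {DEST.resolve()}")
```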