QwQ 32B AWQ

Model Overview

This is the AWQ-quantized 4-bit version of the QwQ 32B model.

QwQ is the reasoning model of the Qwen series. QwQ is capable of thinking and reasoning, and can achieve significantly enhanced performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, capable of achieving competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

QwQ is based on Qwen2.5, whose code has been integrated into the latest Hugging Face transformers.

  • Model Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Parameters: 32.5B
  • Number of Parameters (Non-Embedding): 31.0B
  • Number of Layers: 64
  • Number of Attention Heads (GQA): 40 for Q and 8 for KV
  • Context Length: Full 131,072 tokens
  • Quantization: AWQ 4-bit
  • Model Repository: QwenLM/Qwen2.5
  • Model Source: Qwen/QwQ-32B-AWQ
  • License: apache-2.0
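
Because the Qwen2.5 code is available in recent transformers releases, the checkpoint can be loaded with the standard AutoModelForCausalLM / AutoTokenizer APIs. The snippet below is a minimal sketch, assuming a GPU host with AWQ kernel support (e.g. the autoawq package) installed; the prompt and max_new_tokens value are illustrative, not recommended settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B-AWQ"

# Load the AWQ-quantized weights; device_map="auto" spreads layers across
# available accelerators, torch_dtype="auto" keeps the dtype from the config.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt; add_generation_prompt appends the assistant
# turn so the model starts its reasoning immediately.
messages = [{"role": "user", "content": "How many prime numbers are there below 30?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Reasoning models emit long chains of thought, so leave headroom in
# max_new_tokens (the value here is only an example).
output_ids = model.generate(**inputs, max_new_tokens=2048)
response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)
print(response)
```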

QPC Configurations

  • Precision: MXFP6
  • SoCs / Tensor slicing: 4
  • NSP-Cores (per SoC): 16
  • Full Batch Size: 1
  • Chunking Prompt Length: 128
  • Context Length (CL): 8192
  • Generated URL: https://dc00tk1pxen80.cloudfront.net/SDK1.20.4/Qwen/QwQ-32B-AWQ/qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz
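
The pre-compiled QPC package can be fetched from the URL above and unpacked before being loaded onto the target cards. Below is a minimal sketch using only the Python standard library; the local directory name is a hypothetical example.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the QPC configuration listed above.
QPC_URL = (
    "https://dc00tk1pxen80.cloudfront.net/SDK1.20.4/Qwen/QwQ-32B-AWQ/"
    "qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz"
)
DEST = Path("qwq-32b-awq-qpc")  # hypothetical local directory

DEST.mkdir(parents=True, exist_ok=True)
archive = DEST / "qpc.tar.gz"

# Download the pre-compiled program container (a large archive;
# resuming interrupted downloads is not handled here).
urllib.request.urlretrieve(QPC_URL, archive)

# Unpack the archive next to the download.
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(DEST)

print(f"QPC extracted to {DEST.resolve()}")
```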