An open-source language model powered by a Mixture-of-Experts architecture. Designed for complex reasoning, long-context understanding, and state-of-the-art performance across coding, mathematics, and general intelligence tasks.
Built from the ground up for reasoning, Qwen 3.7 combines a sparse MoE architecture with extended context windows and reinforcement-learning-driven training.
A sparse MoE architecture activates only the most relevant parameters per token, delivering dense-model quality at a fraction of the compute cost. This means faster inference and higher throughput without sacrificing capability.
Supports long-context sequences, making it well suited to document analysis, codebase reasoning, multi-turn conversations, and research paper comprehension. The extended window lets the model maintain coherence over thousands of tokens.
Trained with reinforcement learning to optimize for chain-of-thought quality. The model learns not just what to answer, but how to reason step by step, producing more reliable and interpretable outputs.
Weights, inference code, and training recipes are released under an open license. Researchers and developers can inspect, fine-tune, and deploy Qwen 3.7 freely without proprietary restrictions.
Trained on a diverse corpus spanning natural languages and programming languages. Qwen 3.7 excels at code generation, debugging, translation, and cross-lingual reasoning tasks out of the box.
Achieves state-of-the-art results on major reasoning benchmarks including MATH, GSM8K, HumanEval, MMLU-Pro, and more. Competitive with frontier models while remaining fully open and accessible.
A structured, inspectable process that turns a prompt into a well-reasoned response.
The input prompt is parsed and routed through the expert gating network, activating only the relevant MoE sub-networks for the task.
The model generates an internal chain-of-thought reasoning trace, exploring multiple reasoning paths before committing to an answer.
A lightweight verification step checks the reasoning for consistency and correctness, pruning invalid branches before the final output.
The verified reasoning trace is distilled into a final response, balancing thoroughness with conciseness for the intended use case.
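The four steps above can be illustrated with a toy sketch. The page does not say how verification or selection is implemented inside Qwen 3.7, so everything here is an assumption made for illustration: reasoning paths are modelled as lists of arithmetic steps, the verifier simply recomputes each step, and the final answer is chosen by majority vote among the surviving paths (a self-consistency style heuristic, swapped in because the real mechanism is not described). The names `Step`, `Path`, `verify`, and `select_answer` are hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple, Optional

# Toy model of "explore several reasoning paths, prune invalid ones, answer".
# This is NOT Qwen 3.7's internal mechanism; it only illustrates the idea.

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


class Step(NamedTuple):
    left: int
    op: str       # one of the keys in OPS
    right: int
    claimed: int  # the result this reasoning step asserts


class Path(NamedTuple):
    steps: List[Step]
    answer: int


def step_is_valid(step: Step) -> bool:
    """Recompute one step and compare with the claimed result."""
    if step.op not in OPS:
        return False
    return OPS[step.op](step.left, step.right) == step.claimed


def verify(path: Path) -> bool:
    """A path survives if it is non-empty, every step checks out,
    and the stated answer equals the result of the last step."""
    if not path.steps:
        return False
    if not all(step_is_valid(s) for s in path.steps):
        return False
    return path.steps[-1].claimed == path.answer


def select_answer(paths: List[Path]) -> Optional[int]:
    """Prune invalid paths, then take a majority vote over what is left."""
    survivors = [p.answer for p in paths if verify(p)]
    if not survivors:
        return None
    return Counter(survivors).most_common(1)[0][0]


if __name__ == "__main__":
    # Three candidate traces for "3 * 4 + 5"; the third has an arithmetic slip.
    candidates = [
        Path([Step(3, "*", 4, 12), Step(12, "+", 5, 17)], answer=17),
        Path([Step(4, "*", 3, 12), Step(5, "+", 12, 17)], answer=17),
        Path([Step(3, "*", 4, 14), Step(14, "+", 5, 19)], answer=19),
    ]
    print(select_answer(candidates))
```

In this sketch the faulty third trace is dropped by `verify` before voting, so it cannot influence the result. A real verifier would have to judge natural-language reasoning rather than recompute arithmetic.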
Qwen 3.7 achieves competitive or leading scores across major reasoning, coding, and general knowledge benchmarks.
A clear explanation of what Qwen 3.7 is, the architecture behind it, how the reasoning loop works, and how to get started.
Qwen 3.7 is the latest iteration in the Qwen family of open-source large language models, released by the Qwen team at Alibaba Cloud. It represents a significant leap forward in reasoning capability, built on a Mixture-of-Experts (MoE) architecture that dynamically activates only the subset of parameters most relevant to each input token. This design choice means Qwen 3.7 delivers the reasoning depth of a much larger dense model while keeping inference costs manageable.
The model was trained using a multi-stage pipeline: continued pre-training on high-quality reasoning data, supervised fine-tuning on instruction and chain-of-thought traces, and a reinforcement learning stage that directly optimizes for reasoning quality. The result is a model that not only produces correct answers but does so through transparent, step-by-step reasoning.
Key architectural highlights include an extended context window that supports long-form document and code reasoning, a gating mechanism for efficient expert routing, and post-training alignment that reduces hallucination while preserving creative potential. Qwen 3.7 is released under an open license, giving the community full access to inspect, fine-tune, and deploy the model in both research and production settings.
The fastest answers to the questions people ask first about Qwen 3.7.
Qwen 3.7 was created by the Qwen team at Alibaba Cloud. The team has been responsible for the open-source Qwen model series, which has consistently pushed the frontier of open-weight language model performance. The model weights, inference code, and training recipes are released openly for both research and commercial use under a permissive license.
Mixture-of-Experts (MoE) is a neural network design in which multiple specialized sub-networks ("experts") each handle different types of input patterns. A gating mechanism determines which experts to activate for each token. The model can therefore have a very large total parameter count while using only a fraction of those parameters per token, approaching dense-model quality at a much lower compute cost. For Qwen 3.7, this translates to faster inference and the ability to serve more concurrent requests without sacrificing output quality. Sparse activation reduces compute per token rather than weight storage: all experts still have to be held in memory.
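As a rough illustration of the gating idea, here is a minimal top-k routing sketch in plain Python. It is a generic textbook-style formulation, not Qwen 3.7's actual router: the number of experts, the value of `k`, the gate scores, and the scalar "experts" are all made up for the example, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns (expert_index, weight) pairs, best first; weights sum to 1.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def moe_forward(
    x: float,
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))


# Hypothetical toy experts: each one is just a scalar function.
EXPERTS = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: x * x, lambda x: -x]

if __name__ == "__main__":
    logits = [0.1, 2.0, 1.5, -1.0]  # made-up gate scores for a single token
    print(route_top_k(logits, k=2))
    print(moe_forward(3.0, logits, EXPERTS, k=2))
```

With four experts and `k=2`, only two expert functions are evaluated for the token, which is where the compute saving comes from. All four still have to exist in memory, because a different token may be routed to any of them.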
Qwen 3.7 achieves top-tier results across a wide range of benchmarks including MATH-500 (96.8%), GSM8K (97.1%), HumanEval (94.2%), MMLU-Pro (88.5%), LiveCodeBench (82.3%), and GPQA-Diamond (79.6%). It is particularly strong in mathematical reasoning, code generation, and complex multi-step reasoning tasks, often matching or exceeding the performance of much larger proprietary models.
Qwen 3.7 is released as open source under a permissive license. The model weights are publicly available, along with inference code, example notebooks, and documentation. This allows developers and researchers to use the model for a wide range of applications, including commercial deployment, fine-tuning on domain-specific data, and academic research, without proprietary restrictions.
Qwen 3.7 can be run locally with several popular inference frameworks, including Hugging Face Transformers, vLLM, llama.cpp, and Ollama. The model weights are available on the Hugging Face Hub. Hardware requirements depend on the model size variant: the smaller versions can run on consumer GPUs with 8 to 16 GB of VRAM, while the full model benefits from multi-GPU setups or high-memory accelerators such as the A100 or H100.
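To get a feel for which variant might fit on a given GPU, a back-of-the-envelope estimate of weight memory is parameter count times bytes per parameter. The sketch below does only that arithmetic. The page does not list parameter counts for the Qwen 3.7 variants, so the 7-billion figure is a placeholder, and the 80% headroom factor (leaving room for activations and the KV cache) is an assumed rule of thumb rather than a measured requirement. For an MoE model, use the total parameter count, not the active count, since all experts are normally loaded.

```python
# Approximate storage cost per parameter for common weight formats.
# Quantized formats carry some extra overhead (scales, zero points) that is
# ignored here, so treat the results as lower bounds.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(total_params_billion: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB."""
    total_bytes = total_params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 2**30


def fits(
    total_params_billion: float,
    dtype: str,
    vram_gib: float,
    headroom: float = 0.8,  # assumed: keep ~20% free for activations / KV cache
) -> bool:
    """Rough check: do the weights fit in the usable share of VRAM?"""
    return weight_memory_gib(total_params_billion, dtype) <= vram_gib * headroom


if __name__ == "__main__":
    # Placeholder size; not an actual Qwen 3.7 variant.
    for dtype in ("fp16", "int8", "int4"):
        need = weight_memory_gib(7, dtype)
        print(f"7B @ {dtype}: {need:.2f} GiB, fits in 16 GiB: {fits(7, dtype, 16)}")
```

The estimate covers weights only. Long contexts add KV-cache memory that grows with sequence length, so actual requirements at long context lengths will be higher.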
Qwen 3.7 supports a wide range of languages, including English, Chinese, and many other major world languages. It is strongest in English and Chinese because of the training data distribution. It also handles programming languages (Python, JavaScript, C++, Java, and many more), making it a versatile tool for multilingual and cross-lingual applications.
Every claim on this page is grounded in official documentation, the model card, or community discussion so you can verify the details yourself.
Blog: the source for the Qwen 3.7 announcement, architecture overview, benchmark results, and release details.
GitHub: the official Qwen repository, with model weights, inference code, and usage examples.
Hugging Face: the model card and weight downloads for Qwen 3.7, including configuration and usage notes.
Community: discussion threads and community analysis of Qwen 3.7's architecture, benchmarks, and practical usage.