
OpenAI releases two open-weight language models, gpt-oss-120b and gpt-oss-20b

August 5, 2025, by LLM Hard Drive Store
Tags: OpenAI, gpt-oss-120b, gpt-oss-20b, AI, open-source, language model, machine learning, reasoning, coding
  • OpenAI released two open-weight language models, gpt-oss-120b and gpt-oss-20b, on August 5, 2025; the larger gpt-oss-120b has 117 billion parameters and is designed for high reasoning and production use.
  • It seems likely that this model performs comparably to some of OpenAI's proprietary models on tasks like coding and problem-solving, based on available benchmarks.
  • Research suggests it supports advanced features like tool use and adjustable reasoning, making it versatile for developers and researchers.
  • There is some controversy around its performance compared to closed models, with community feedback mixed between excitement and skepticism.

Introduction and Context

On August 5, 2025, OpenAI announced the release of two open-weight language models, gpt-oss-120b and gpt-oss-20b, marking a notable shift back to open models and its first open-weight release since GPT-2 in 2019. This article focuses on gpt-oss-120b, a 117-billion-parameter model designed for production use, high reasoning, and general-purpose tasks. The release is significant because it aims to democratize access to advanced AI, in line with OpenAI's mission to make AI broadly accessible, as stated by CEO Sam Altman: "We're excited to make this model, the result of billions of dollars of research, available to the world to get AI into the hands of the most people possible."

Model Overview and Specifications

gpt-oss-120b is a text-only, open-weight reasoning model with 117 billion parameters, built on a Mixture-of-Experts (MoE) architecture. The architecture activates only 5.1 billion parameters per forward pass, enabling the model to run efficiently on a single H100 GPU, which typically has 80GB of memory. The model employs MXFP4 quantization, a 4-bit quantization scheme applied to the MoE weights, reducing the memory footprint and enabling fast inference. It was trained with OpenAI's harmony response format, which is critical for correct operation; users are advised to follow this format, as detailed on GitHub.
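
As a concrete illustration, here is a minimal sketch of loading the model with the Hugging Face Transformers pipeline. It assumes a recent Transformers release with gpt-oss support and an 80GB-class GPU; in that setup the model's chat template applies the harmony response format automatically.

```python
# Minimal sketch: running gpt-oss-120b with the Hugging Face Transformers
# pipeline. Assumes a recent transformers release with gpt-oss/MXFP4 support
# and an 80GB-class GPU; the model's chat template handles the harmony format.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-120b",
    torch_dtype="auto",   # keep the checkpoint's native (quantized) dtypes
    device_map="auto",    # place the weights on the available GPU(s)
)

messages = [
    {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."},
]

result = generator(messages, max_new_tokens=256)
# With chat-style input, the pipeline returns the whole conversation;
# the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```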

Key specifications include:

  • Parameter Count: 117B total, 5.1B active per forward pass.
  • Architecture: Mixture-of-Experts with MXFP4 quantization.
  • Hardware Requirement: Fits on a single 80GB GPU (e.g., H100); a back-of-envelope memory estimate follows this list.
  • Training Cost: Approximately 2.1 million H100-hours, estimated at $2 million, highlighting the resource intensity.
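
To make the single-GPU claim concrete, here is a rough back-of-envelope estimate of the weight memory under MXFP4; the 4.25 bits-per-parameter figure is an assumption that accounts for per-block scale factors, not an official number.

```python
# Back-of-envelope estimate of gpt-oss-120b weight memory (illustrative only).
# MXFP4 stores 4-bit values with a shared scale per small block, so the
# effective cost is roughly 4.25 bits per parameter for the MoE weights.
total_params = 117e9        # total parameter count from the spec list above
bits_per_param = 4.25       # assumed MXFP4 cost including block scales

weight_gb = total_params * bits_per_param / 8 / 1e9
print(f"~{weight_gb:.0f} GB of weights")  # roughly 62 GB

# Embedding and attention weights kept at higher precision, plus activations
# and the KV cache, account for the rest of the 80GB budget, which is why a
# single H100-class GPU is the stated requirement.
```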

The model supports configurable reasoning effort, allowing users to select low, medium, or high settings to trade off between latency and performance, and offers full chain-of-thought access for debugging and trust in outputs.
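
The following sketch shows one way the reasoning-effort setting is typically exercised, assuming the documented convention of stating the level in the system message, which the harmony chat template turns into the model's reasoning directive; it reuses the `generator` pipeline from the earlier sketch.

```python
# Sketch: selecting a reasoning-effort level via the system message.
# The exact control surface depends on the serving stack; the assumption here
# is the "Reasoning: low|medium|high" convention in the system prompt.
messages_low = [
    {"role": "system", "content": "Reasoning: low"},
    {"role": "user", "content": "What is 17 * 24?"},
]

messages_high = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Prove that the sum of two odd integers is even."},
]

# Low effort trades accuracy for latency; high effort spends more tokens on
# chain-of-thought before answering.
print(generator(messages_low, max_new_tokens=128)[0]["generated_text"][-1]["content"])
print(generator(messages_high, max_new_tokens=2048)[0]["generated_text"][-1]["content"])
```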

Performance and Benchmark Analysis

Research suggests gpt-oss-120b performs comparably to OpenAI's proprietary models, specifically o3-mini and o4-mini, on various benchmarks. Detailed performance metrics include:

  • Competition Coding (Codeforces): Outperforms o3-mini, matches/exceeds o4-mini.
  • General Problem Solving (MMLU, HLE): Similar superiority over o3-mini, near-parity with o4-mini.
  • Tool Calling (TauBench): Outperforms o3-mini, matches/exceeds o4-mini.
  • Health-Related Queries (HealthBench): Outperforms o4-mini, surpassing OpenAI o1 and GPT-4o.
  • Competition Mathematics (AIME 2024 & 2025): Outperforms o4-mini.

These benchmarks, detailed in OpenAI's blog, indicate strong real-world performance at a lower cost, with the model achieving near-parity with o4-mini on core reasoning tasks while running on a single 80GB GPU.

Use Cases and Capabilities

The evidence leans toward gpt-oss-120b being ideal for agentic tasks, where it can act autonomously, interacting with tools and environments. It supports advanced features such as:

  • Function Calling: Enabling integration with external APIs.
  • Web Browsing: Facilitating research and information retrieval.
  • Python Code Execution: Useful for coding and automation tasks.
  • Structured Outputs: Providing formatted responses for easier parsing.

These capabilities make it suitable for developing sophisticated AI applications, particularly in areas requiring reasoning, coding, and tool interaction. It is deeply customizable, supporting fine-tuning and offering full chain-of-thought transparency, which enhances trust and debugging.
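
As a hypothetical illustration of function calling and structured outputs, the sketch below talks to a locally served gpt-oss-120b through an OpenAI-compatible endpoint, such as one exposed by vLLM or Ollama; the endpoint URL, port, and the get_weather tool schema are illustrative assumptions, not part of the official release.

```python
# Hypothetical sketch: function calling against a locally served gpt-oss-120b
# via an OpenAI-compatible endpoint. URL, port, and tool schema are examples.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools,
)

# If the model decides to call the tool, the arguments arrive as structured
# JSON that the application can parse, execute, and feed back to the model.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```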

Licensing, Availability, and Accessibility

gpt-oss-120b is released under the Apache 2.0 license, subject to OpenAI's gpt-oss usage policy, which permits experimentation, customization, and commercial deployment without copyleft restrictions or patent risk. It is available for download on Hugging Face and can be run with frameworks such as Transformers, vLLM, llama.cpp, and Ollama, as noted in community discussions. For instance, Ollama's integration supports the native MXFP4 format, further improving accessibility.
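
For fully local, offline inference, a minimal vLLM sketch might look like the following; it assumes the installed vLLM build supports gpt-oss and its native MXFP4 weights, and that the checkpoint has been downloaded from Hugging Face.

```python
# Sketch: offline inference with vLLM's Python API (assumes a vLLM build with
# gpt-oss/MXFP4 support and enough GPU memory for the 120b checkpoint).
from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-120b")
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.chat(
    [{"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```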

Safety and Ethical Considerations

OpenAI has prioritized safety, conducting extensive evaluations under its Preparedness Framework, as outlined in the accompanying safety paper. Adversarial fine-tuning tests, reviewed by the Safety Advisory Group, showed that the model does not reach high capability levels in biological, chemical, cyber, or AI self-improvement risk categories. The model card provides further details, noting a different risk profile from proprietary models due to the open availability of the weights: determined attackers could fine-tune the model for harm, and mitigations after release are limited.

Conclusion

gpt-oss-120b is a landmark release, offering a powerful, open-weight model that balances performance and accessibility. Its capabilities in reasoning, coding, and tool use, combined with a permissive license, position it as a valuable resource for developers and researchers. While community reception is largely positive, ongoing discussions highlight areas for improvement, such as multilingual support and ease of fine-tuning. As the AI community continues to explore this model, it seems likely to drive significant advancements in open-source AI development.

Supporting Tables

| Model | Parameters (Total / Active) | Use Cases | Download URL |
| --- | --- | --- | --- |
| gpt-oss-120b | 117B / 5.1B | Production, general purpose, high reasoning; single H100 GPU | https://huggingface.co/openai/gpt-oss-120b |
| gpt-oss-20b | 21B / 3.6B | Lower latency, local or specialized use cases; 16GB memory | https://huggingface.co/openai/gpt-oss-20b |
| Benchmark / Task | gpt-oss-120b Performance | gpt-oss-20b Performance |
| --- | --- | --- |
| Competition Coding (Codeforces) | Outperforms OpenAI o3-mini; matches/exceeds o4-mini | Matches/exceeds OpenAI o3-mini |
| General Problem Solving (MMLU, HLE) | Outperforms OpenAI o3-mini; matches/exceeds o4-mini | Matches/exceeds OpenAI o3-mini |
| Tool Calling (TauBench) | Outperforms OpenAI o3-mini; matches/exceeds o4-mini | Matches/exceeds OpenAI o3-mini |
| Health-Related Queries (HealthBench) | Outperforms o4-mini; surpasses OpenAI o1 and GPT-4o | Outperforms OpenAI o3-mini |
| Competition Mathematics (AIME 2024 & 2025) | Outperforms o4-mini | Outperforms OpenAI o3-mini |