MiniMax M3: The Open-Weight Model That Beats GPT-5.5

MiniMax M3 launched on 1 June 2026 with a 59% SWE-Bench Pro score, a 1M-token context window, and native multimodal understanding, priced at $0.60 per million input tokens. Open weights followed on Hugging Face on 11 June.

The First Open-Weight Model to Match the Frontier

On 1 June 2026, Shanghai-based MiniMax released MiniMax M3, an open-weight large language model the company describes as the first to combine three capabilities in a single architecture: frontier-level coding performance, a one-million-token context window, and native multimodal understanding of images and video. The API went live on the same day. Open weights and the accompanying technical report appeared on Hugging Face and arXiv on 11 June 2026.

The headline benchmark is SWE-Bench Pro, the software engineering evaluation that has become an industry standard for measuring how effectively a model can complete real-world code changes from issue descriptions. M3 scores 59.0 per cent on that test, placing it above both OpenAI GPT-5.5 and Google Gemini 3.1 Pro. On BrowseComp, which measures a model's ability to find and synthesise information through web browsing, M3 scores 83.5, surpassing Claude Opus 4.7's 79.3. The model is available to any developer with an API key at $0.60 per million input tokens and $2.40 per million output tokens. On input pricing, that is roughly 4 to 9 per cent of the $7 to $15 per million charged by the closed frontier models it outperforms on coding benchmarks.

The MSA Architecture: Why It Matters

The core technical innovation in M3 is MiniMax Sparse Attention, abbreviated MSA. Standard transformer attention computes relationships between every token pair in a sequence, which scales quadratically with context length. At one million tokens, full attention is computationally prohibitive. MSA replaces it with a two-stage process: a lightweight index branch first selects the relevant blocks of the key-value cache, and the main attention layer then processes only those selected blocks.
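The two-stage idea described above can be illustrated with a toy, single-query sketch in plain Python. This is not MiniMax's implementation: summarising each block by the mean of its keys, the `block_size` and `top_k` parameters, and all function names are assumptions made purely for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_blocks(query, keys, block_size, top_k):
    """Stage 1 (index branch): score each block cheaply, keep the top_k block ids."""
    n_blocks = math.ceil(len(keys) / block_size)
    dim = len(query)
    block_scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        # Assumption: a block is summarised by the mean of its keys.
        mean_key = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        block_scores.append((dot(query, mean_key), b))
    best = sorted(block_scores, reverse=True)[:top_k]
    return sorted(b for _, b in best)

def sparse_attention(query, keys, values, block_size, top_k):
    """Stage 2: ordinary softmax attention restricted to the selected blocks."""
    chosen = select_blocks(query, keys, block_size, top_k)
    idx = [i for b in chosen
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx))
           for d in range(len(values[0]))]
    return out, chosen

# Toy data: 8 cached tokens in 4 blocks of 2; only block 1 aligns with the query.
keys = [[0.0, 1.0], [0.0, 1.0], [5.0, 0.0], [5.0, 0.0],
        [-1.0, 0.0], [-1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
values = [[float(i)] for i in range(8)]
out, chosen = sparse_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
print(chosen, out)  # [1] [2.5]
```

With `top_k` equal to the number of blocks, the function reduces to full attention over the whole cache, which is a useful sanity check when experimenting with the selection step.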

The efficiency gains are substantial. Compared with MiniMax M2, M3 with MSA is more than 9 times faster at prefill (processing the input prompt) and more than 15 times faster at decoding at one-million-token context. Per-token compute at maximum context falls to one-twentieth of the M2 cost. These numbers matter in practice because they determine whether a one-million-token context window is usable in production or is merely a headline specification whose latency rules it out for real workloads.
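A rough way to see why block selection cuts per-token work is to count query-key score computations for one decoded token. The sketch below is a simplified counting model, not a reproduction of MiniMax's measurements: `block_size` and `top_k` are hypothetical values, and the model ignores everything except attention scores, so its ratios should not be expected to match the reported figures.

```python
import math

def full_scores_per_token(context_len):
    """Full attention: one query-key score per cached token."""
    return context_len

def sparse_scores_per_token(context_len, block_size, top_k):
    """Block-selected attention: one index score per block, then one
    query-key score per token inside the selected blocks."""
    n_blocks = math.ceil(context_len / block_size)
    attended = min(top_k * block_size, context_len)
    return n_blocks + attended

# block_size and top_k are hypothetical values chosen for illustration only.
for n in (1_000, 100_000, 1_000_000):
    full = full_scores_per_token(n)
    sparse = sparse_scores_per_token(n, block_size=64, top_k=64)
    print(f"{n:>9} tokens: full={full:>9}  sparse={sparse:>7}  ratio={full / sparse:.1f}x")
```

In this toy model the sparse path does slightly more work than full attention at short contexts, because the index pass is pure overhead when every block is selected anyway. The saving only appears, and then grows, as the context gets long.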

What the Benchmarks Tell Builders

The four benchmark scores published for M3 each test a different dimension of capability. SWE-Bench Pro at 59.0 per cent measures autonomous software engineering — the ability to identify, plan, and execute code changes from an issue description without human guidance. Terminal-Bench 2.1 at 66.0 per cent tests how effectively the model operates as an autonomous command-line coding agent. MCP Atlas at 74.2 per cent measures tool-calling accuracy in multi-step agentic workflows — the most direct proxy for real-world agent reliability. BrowseComp at 83.5 evaluates web browsing and information retrieval.

To demonstrate agentic endurance, MiniMax tasked M3 with independently reproducing the experiments from an ICLR 2025 outstanding paper on the learning dynamics of LLM fine-tuning. The model ran for nearly 12 hours without human intervention, produced 18 code commits, and generated 23 experimental charts. The run is a single, vendor-reported data point rather than a controlled benchmark, but it is arguably a more realistic test of practical agentic capability than most curated evaluations.

Open Weights and What They Unlock

The open-weight release on 11 June 2026 means organisations can download M3's parameters and run the model on their own infrastructure. For enterprise teams handling data they cannot route through third-party APIs — regulated financial data, personal health information, proprietary code repositories — a frontier-class open-weight model changes the build decision. Instead of choosing between capability and data sovereignty, teams can now have both. The MSA technical report on arXiv provides the engineering detail needed to evaluate whether M3's attention architecture suits a specific production workload.

The Pricing Argument for Indian Development Teams

Indian software teams and AI product builders have been watching frontier model costs closely since late 2025, when agentic coding tool budgets became a real operational concern for engineering organisations of every size. At $0.60 per million input tokens, M3's pricing relieves much of that cost pressure.

By comparison, models at the closed frontier (GPT-5.5, Claude Opus 4.8, Gemini 3.5 Pro) price input tokens at between $7 and $15 per million. That is roughly 12 to 25 times M3's input price, not a marginal optimisation. For teams running high-volume document analysis, long-context code review, or multi-step agentic workflows where token consumption accumulates quickly, M3's economics are materially different from those of any closed frontier alternative.
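To make the comparison concrete, here is a small cost sketch using only the prices quoted in this article. The monthly token volumes are hypothetical, and because the article gives no output prices for the closed models, the closed-model side of the comparison covers input tokens only.

```python
def cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in US dollars given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Prices quoted in the article (USD per million tokens).
M3_INPUT, M3_OUTPUT = 0.60, 2.40
CLOSED_INPUT_LOW, CLOSED_INPUT_HIGH = 7.00, 15.00

# Hypothetical monthly workload, for illustration only.
input_tokens = 500_000_000
output_tokens = 50_000_000

m3_total = cost_usd(input_tokens, output_tokens, M3_INPUT, M3_OUTPUT)
m3_input_only = cost_usd(input_tokens, 0, M3_INPUT, 0.0)
closed_low = cost_usd(input_tokens, 0, CLOSED_INPUT_LOW, 0.0)
closed_high = cost_usd(input_tokens, 0, CLOSED_INPUT_HIGH, 0.0)

print(f"M3, input + output:      ${m3_total:,.2f}")
print(f"M3, input only:          ${m3_input_only:,.2f}")
print(f"Closed models, input:    ${closed_low:,.2f} to ${closed_high:,.2f}")
print(f"Input price ratio:       {CLOSED_INPUT_LOW / M3_INPUT:.1f}x to "
      f"{CLOSED_INPUT_HIGH / M3_INPUT:.1f}x")
```

Swapping in a team's own token volumes gives a quick first estimate before any deeper evaluation of quality or latency.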

For Indian teams with data localisation requirements, the open-weight release creates an additional path: self-hosted deployment at M3's capability level, with no outbound data transfer to an overseas API endpoint. The context window has a guaranteed minimum of 512,000 tokens, and MSA's efficiency at the full one million tokens means the model is practically usable for large-codebase workloads without the latency penalty that typically makes long-context calls impractical in production.

The Bottom Line

MiniMax M3, released on 1 June 2026 with open weights available from 11 June, is by MiniMax's account the first model to combine a one-million-token context window, native multimodal understanding, and a 59.0 per cent SWE-Bench Pro score in an open-weight package priced at $0.60 per million input tokens. It outperforms GPT-5.5 and Gemini 3.1 Pro on the coding benchmark most directly relevant to agentic development work, at a fraction of their cost. For Indian AI teams building agentic coding pipelines, long-context applications, or products where data cannot leave their own infrastructure, M3 is the first open-weight model that genuinely competes with the closed frontier. The weights are available on Hugging Face.

Frequently Asked Questions

What is MiniMax M3 and when was it released?

MiniMax M3 is an open-weight large language model released by Shanghai-based MiniMax on 1 June 2026. Its API went live on the same day, and open weights were published on Hugging Face on 11 June 2026 alongside a technical report on arXiv. MiniMax describes M3 as the first open-weight model to combine frontier-level coding performance, a one-million-token context window, and native multimodal capabilities (image and video understanding) in a single architecture.

How does MiniMax M3 compare to GPT-5.5 and Gemini 3.1 Pro on benchmarks?

MiniMax M3 scores 59.0 per cent on SWE-Bench Pro, surpassing both OpenAI GPT-5.5 and Google Gemini 3.1 Pro on that software engineering benchmark. On Terminal-Bench 2.1 it scores 66.0 per cent, on MCP Atlas 74.2 per cent, and on BrowseComp 83.5, ahead of Claude Opus 4.7's 79.3. It achieves these results at $0.60 per million input tokens and $2.40 per million output tokens. On input pricing, that is roughly 4 to 9 per cent of the $7 to $15 per million charged by the closed frontier models it outperforms.

What is MiniMax Sparse Attention and how does it enable a 1M-token context window?

MiniMax Sparse Attention (MSA) is the architectural innovation behind M3's one-million-token context capability. Instead of computing full attention across every token pair — which scales quadratically — MSA uses a two-stage approach: a lightweight index branch selects the relevant key-value cache blocks, and the main attention layer processes only those blocks. The result is more than 9 times faster prefill and more than 15 times faster decoding at one-million-token context compared to M2, at one-twentieth the per-token compute.

Where can developers access MiniMax M3 and what does it cost?

MiniMax M3 is available via the MiniMax API at $0.60 per million input tokens and $2.40 per million output tokens. The context window supports up to one million tokens, with a guaranteed minimum of 512,000 tokens. Open weights have been available on Hugging Face since 11 June 2026, allowing teams to self-host the model. The accompanying MiniMax Sparse Attention technical report is published on arXiv and provides full architectural detail for teams evaluating deployment on their own infrastructure.

Written by

TechPillow Team

Sharing insights on technology, product development, and the Indian tech ecosystem.
