China’s MiniMax-M1: Open‑Source LLM with 1 Million‑Token Window Surpasses DeepSeek in Efficiency


 1. Introduction

Shanghai AI startup MiniMax, backed by giants like Alibaba and Tencent, has unveiled its latest milestone: MiniMax-M1, an open-weight reasoning model with a record-breaking 1 million token context window and striking compute efficiency. The team claims it surpasses DeepSeek's R1 model on key reasoning and long-context tasks while operating at a fraction of the hardware cost.


2. What Sets M1 Apart?

1 Million Input Context / 80K Output Tokens

M1 supports an input window of up to one million tokens, roughly eight times DeepSeek-R1's 128K capacity, and can generate up to 80K tokens of output.
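
As a quick sanity check on that budget, the sketch below counts a document's tokens against the 1M window. The Hugging Face repo id "MiniMaxAI/MiniMax-M1-80k" is an assumption based on the release named in this article; swap in the actual id if it differs.

```python
# Back-of-envelope context-budget check for MiniMax-M1.
# Repo id "MiniMaxAI/MiniMax-M1-80k" is an assumption; adjust if needed.
from transformers import AutoTokenizer

MAX_INPUT_TOKENS = 1_000_000  # M1's advertised input window
MAX_OUTPUT_TOKENS = 80_000    # M1's advertised output budget

tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M1-80k")

def fits_in_window(text: str) -> bool:
    """Return True if `text` fits inside M1's input window."""
    n_tokens = len(tokenizer.encode(text))
    print(f"{n_tokens:,} of {MAX_INPUT_TOKENS:,} input tokens used")
    return n_tokens <= MAX_INPUT_TOKENS

# For comparison against DeepSeek-R1's 128K window:
print(f"M1/R1 window ratio: {MAX_INPUT_TOKENS / 128_000:.1f}x")  # ~7.8x
```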

Open‑Source with Apache‑2.0 License

M1 is released on GitHub and Hugging Face under the permissive Apache-2.0 license, allowing full transparency and commercial use, unlike proprietary models.

Hybrid MoE + Lightning Attention

Architecturally, M1 pairs a mixture-of-experts (MoE) backbone with a hybrid attention design built around a novel "Lightning Attention" mechanism, cutting compute during both training and inference.
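
MiniMax's exact kernels are not reproduced here, but Lightning Attention belongs to the linear-attention family: a kernel feature map replaces the softmax score matrix, so cost grows linearly with sequence length instead of quadratically. A minimal PyTorch sketch of that underlying idea follows; the elu-based feature map is an illustrative choice, not M1's actual design.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """Kernelized attention: O(n * d^2) instead of softmax's O(n^2 * d).

    q, k, v: (batch, seq_len, dim). phi(x) = elu(x) + 1 keeps scores
    positive; it is illustrative, not MiniMax's Lightning Attention kernel.
    """
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)                 # sum over seq of phi(k) v^T
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(1)) + 1e-6)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

q = k = v = torch.randn(1, 4096, 64)
out = linear_attention(q, k, v)  # cost stays linear as seq_len grows
```

The "hybrid" in MiniMax's description refers to interleaving such linear layers with standard softmax attention layers, trading a little modeling power for large savings at long sequence lengths.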

Ultra‑Efficient Training

Using 512 Nvidia H800 GPUs and MiniMax's custom CISPO reinforcement-learning algorithm, the M1 training run reportedly cost about $535,000, a small fraction of the estimated $5–6 million spent on R1.
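
MiniMax describes CISPO as clipping the importance-sampling weights themselves rather than the token updates, so every token keeps contributing a gradient (unlike PPO's clipped objective, which drops some). A minimal sketch of that idea under those assumptions; the tensor layout and epsilon values here are illustrative, not MiniMax's settings.

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Clipped importance-sampling policy loss (CISPO-style sketch).

    logp_new:   per-token log-probs under the current policy (grads flow here)
    logp_old:   per-token log-probs under the sampling policy (detached)
    advantages: per-token advantage estimates
    The clip is applied to the *detached* IS weight, so unlike PPO no
    token is dropped from the gradient; only its weight is bounded.
    """
    is_weight = torch.exp(logp_new - logp_old).detach()
    clipped = torch.clamp(is_weight, 1.0 - eps_low, 1.0 + eps_high)
    return -(clipped * advantages * logp_new).mean()
```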

3. Performance Benchmarks

MiniMax-M1 has been evaluated using rigorous benchmarks including AIME 2024, LiveCodeBench, SWE‑Bench, TAU‑Bench, and MRCR. Results show M1 surpasses open‑weight competitors (DeepSeek-R1, Qwen3-235B) and even rivals closed models (Gemini, Claude, OpenAI o3) in areas like math, code reasoning, and long-context understanding.  


4. Efficiency vs DeepSeek

The South China Morning Post reports that MiniMax-M1 consumes less than half the compute of DeepSeek-R1 on tasks involving up to 64K tokens; the company's own framing is that M1 "halves compute" on reasoning workloads.

Moreover, VentureBeat notes that M1 needs just 25% of the FLOPs DeepSeek-R1 uses at a generation length of 100K tokens.
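
Both figures are consistent with how attention cost scales: softmax attention grows quadratically with sequence length, while linear-attention layers grow only linearly, so the savings widen as generations get longer. A back-of-envelope comparison (leading terms only, illustrative dimensions; M1's hybrid design keeps some softmax layers, so its measured savings are milder than this pure-linear bound):

```python
# Toy FLOP scaling, leading terms only -- not MiniMax's measured numbers.
def softmax_attn_flops(n: int, d: int) -> int:
    return n * n * d  # O(n^2 d): score matrix plus weighted mix

def linear_attn_flops(n: int, d: int) -> int:
    return n * d * d  # O(n d^2): running key-value summary

d = 128  # illustrative head dimension
for n in (64_000, 100_000):
    ratio = linear_attn_flops(n, d) / softmax_attn_flops(n, d)
    print(f"n={n:,}: linear/softmax ≈ {ratio:.5f}")  # gap widens with n
```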


5. Strategic Significance

The launch intensifies China's domestic AI arms race, with MiniMax joining other "little dragons" such as DeepSeek, Moonshot AI, and 01.AI. With chip sanctions limiting access to cutting-edge hardware, efficiency and architectural innovation are essential.

MiniMax-M1 emerges not only as a technological breakthrough but also as a strategic asset: it supports on-prem deployment, lower licensing costs, and full customization for enterprise use.


6. Transparency & Trust

MiniMax's release of model code, attention-mechanism design, benchmarks, and efficiency claims provides a transparent foundation for independent verification, which is critical for user adoption and trust.


7. Implications & Outlook

MiniMax-M1’s combination of ultra-long context, reasoning strength, open‑source licensing, and compute efficiency makes it a potent foundation for real-world applications:


  • Enterprise: Long-document processing, complex multi-step workflows, and tool‑augmented agents.
  • Developers: Full access for fine‑tuning and deployment via vLLM or Transformers (see the sketch after this list).
  • Research: Enables safety audits, benchmark testing, and further innovation around mechanisms like Lightning Attention.
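
For the developer path above, vLLM's standard offline API is enough to get started. A minimal sketch, assuming the Hugging Face repo id "MiniMaxAI/MiniMax-M1-80k" and enough GPU memory for the context you request:

```python
# Minimal vLLM serving sketch for MiniMax-M1.
# The repo id is an assumption based on the Hugging Face release named
# in this article; the full 1M-token window needs substantial GPU memory.
from vllm import LLM, SamplingParams

llm = LLM(model="MiniMaxAI/MiniMax-M1-80k", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=1024)

outputs = llm.generate(
    ["Summarize the key obligations in this contract: ..."], params
)
print(outputs[0].outputs[0].text)
```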


Moving forward, MiniMax is expected to ship more models, tools, and ecosystem integrations (a "MiniMaxWeek" campaign is underway). Its statements suggest ambitions to rival not only domestic players but also US closed-weight leaders like GPT-4 and Gemini.


Final Thoughts

MiniMax-M1 represents a significant leap in China’s AI innovation: offering a long-context, high-performing, and open-source foundation model that efficiently challenges regional heavyweight DeepSeek. Its transparent release and robust benchmarks position it as a key player in the global AI landscape.
