Chinese food delivery giant Meituan has officially open-sourced LongCat-2.0 on GitHub, Hugging Face, and its native platform — revealing it as the model behind "Owl Alpha," an anonymous stealth system that spent two months at the top of global developer charts on OpenRouter.
The 1.6-trillion-parameter Mixture-of-Experts (MoE) model ships under the permissive MIT license with a native 1-million-token context window, targeting enterprise-scale autonomous software engineering.
The Stealth Run That Made It Famous
Before its official reveal, Owl Alpha had already racked up:
- ~10.1 trillion monthly tokens on OpenRouter
- 559 billion tokens/day on average
- 242% month-over-month volume growth
- #1 on Hermes Agent workspace
- #2 on Claude Code deployments
- #3 across international OpenClaw environments
The model had effectively earned its reputation before Meituan claimed it.
Pricing: Aggressive and Cache-Friendly
LongCat-2.0's commercial API introduces a tiered pricing strategy designed to undercut closed-source alternatives:
- Cache hits: completely free
- Standard pay-as-you-go: $0.75 / $2.95 per million tokens (input/output)
- Limited-time promo: $0.30 / $1.20 per million tokens (input/output)
At promotional rates, LongCat-2.0 sits competitively alongside MiniMax-M3 and well below GPT-5.5 ($5.00/$30.00) or Claude Opus 4.8 ($5.00/$25.00).
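To make the tiers concrete, here is a minimal cost-estimator sketch using the per-million-token prices listed above. The function name, the `cache_hit_ratio` parameter, and the assumption that cache hits apply only to input tokens and are billed at zero are illustrative; none of them come from Meituan's API documentation.

```python
# USD per million tokens, taken from the pricing list above.
PRICES = {
    "standard": {"input": 0.75, "output": 2.95},
    "promo": {"input": 0.30, "output": 1.20},
}


def estimate_cost(input_tokens, output_tokens, cache_hit_ratio=0.0, tier="standard"):
    """Estimate the USD cost of a workload.

    Assumption (not confirmed by the article): cache hits apply to input
    tokens only, and cached input tokens are billed at zero.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    price = PRICES[tier]
    # Only the uncached share of the input is billable.
    billable_input = input_tokens * (1.0 - cache_hit_ratio)
    total = billable_input * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

Under these assumptions, one million input tokens plus one million output tokens cost $3.70 at standard rates, and a long-context agent run with 10 million input tokens (80% served from cache) and one million output tokens costs about $1.80 at promotional rates.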
Trained Entirely on Chinese Silicon
Perhaps the most consequential aspect of this release: LongCat-2.0 was trained on a cluster of over 50,000 domestic Chinese ASICs — with no Nvidia GPUs involved.
This is a direct proof point that near-frontier, trillion-parameter models can be trained on homegrown silicon at scale. If Chinese labs can consistently iterate at this level without U.S. hardware, it introduces a meaningful long-term threat to Nvidia's stranglehold on frontier AI training infrastructure.
The timing amplifies the significance. The U.S. government has recently pressured OpenAI to restrict access to its GPT-5.6 models, and Anthropic previously took its Claude Fable 5 / Mythos 5 models entirely offline following similar directives. Critics argue these moves have inadvertently created a vacuum — one that open-source Chinese models are filling rapidly.
Architecture: Sparse Attention at Million-Token Scale
At the core of LongCat-2.0 is an aggressive MoE sparsity design:
- 1.6T total parameters, with only ~48B active per token
- Dynamic activation range of 33B–56B parameters depending on query complexity
- Zero-Compute Experts framework to eliminate idle overhead
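The sparsity figures above can be checked with simple arithmetic, and the variable activation range can be illustrated with a toy router. The sketch below assumes (the article does not say) that a zero-compute expert is an identity slot the router can select in place of a real feed-forward expert, so that tokens routed to more zero-compute slots exercise fewer parameters. The function names and the example logits are hypothetical.

```python
TOTAL_PARAMS_B = 1600  # 1.6T total parameters, in billions (from the article)


def active_fraction(active_params_b, total_params_b=TOTAL_PARAMS_B):
    """Share of the model's parameters exercised for a single token."""
    return active_params_b / total_params_b


def route_token(router_logits, num_real_experts, top_k):
    """Pick the top_k experts for one token and split them by type.

    Indices below num_real_experts are ordinary experts. Higher indices are
    hypothetical zero-compute slots assumed to return their input unchanged,
    so selecting one adds no expert computation for this token.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    real = [i for i in chosen if i < num_real_experts]
    zero = [i for i in chosen if i >= num_real_experts]
    return real, zero


# Toy example: 3 real experts (indices 0-2) and 2 zero-compute slots (3-4).
real, zero = route_token([0.1, 2.0, -1.0, 1.5, 0.3], num_real_experts=3, top_k=2)
print(real, zero)                  # one real expert and one zero-compute slot
print(active_fraction(48))         # average case: 48B of 1.6T
print(active_fraction(33), active_fraction(56))  # reported dynamic range
```

With the article's numbers, the average token touches 3% of the model (48B of 1.6T), and the reported dynamic range corresponds to roughly 2.1% to 3.5%.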
To sustain the 1M-token context without hardware bottlenecks, Meituan developed LongCat Sparse Attention (LSA), an evolution of DeepSeek Sparse Attention, built on three orthogonal mechanisms:
- Streaming-aware Indexing (SI) — converts fragmented memory access into sequential, HBM-coalesced reads
- Cross-Layer Indexing (CLI) — amortizes attention indexing costs across adjacent layers using a single pass
- Hierarchical Indexing (HI) — applies a coarse-to-fine two-stage scoring pass to filter and refine token candidates efficiently
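The coarse-to-fine idea behind Hierarchical Indexing can be sketched in a few lines. This is a toy illustration of two-stage candidate selection in general, not Meituan's implementation: the use of a mean key as the block summary, the dot-product scoring, and all names and parameters are assumptions made for the example.

```python
def hierarchical_select(query, keys, block_size, top_blocks, top_tokens):
    """Two-stage coarse-to-fine selection of key positions for one query.

    Stage 1 scores each block of keys by the dot product between the query
    and the block's mean key. Stage 2 computes exact scores only for keys
    inside the surviving blocks and keeps the best top_tokens positions.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def block_score(block):
        # Summarize the block by its mean key (an assumption of this sketch).
        mean = [sum(keys[i][d] for i in block) / len(block)
                for d in range(len(query))]
        return dot(query, mean)

    # Stage 1 (coarse): one cheap score per block.
    blocks = [range(start, min(start + block_size, len(keys)))
              for start in range(0, len(keys), block_size)]
    kept = sorted(blocks, key=block_score, reverse=True)[:top_blocks]

    # Stage 2 (fine): exact scores only for tokens in the kept blocks.
    candidates = [i for block in kept for i in block]
    candidates.sort(key=lambda i: dot(query, keys[i]), reverse=True)
    return sorted(candidates[:top_tokens])


keys = [[0, 1], [0, 1], [5, 0], [1, 0], [0, 0], [0, 0]]
print(hierarchical_select([1, 0], keys, block_size=2, top_blocks=1, top_tokens=1))
```

The design point is that stage 1 touches one summary per block rather than every key, so the expensive per-token scoring in stage 2 runs over a small fraction of the context.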
The model also incorporates an N-gram Embedding module — adding 135 billion parameters across a 5-gram token framework — which expands the embedding space roughly 100-fold and reduces memory I/O bottlenecks during large-batch inference.
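The article does not explain how n-grams are mapped to embedding rows. One common approach, assumed here purely for illustration, is to hash the n-gram ending at each position into a fixed-size bucket table and look up an extra embedding at that bucket. The function name, hash constant, and bucket count below are hypothetical.

```python
def ngram_bucket_ids(token_ids, n, num_buckets):
    """Map each position to a bucket id derived from the n-gram ending there.

    Positions with fewer than n - 1 predecessors use the shorter prefix
    that is available. Hash collisions are possible and are not resolved.
    """
    ids = []
    for pos in range(len(token_ids)):
        gram = token_ids[max(0, pos - n + 1): pos + 1]
        # Simple polynomial rolling hash over the token ids in the n-gram.
        h = 0
        for token in gram:
            h = (h * 1_000_003 + token + 1) % num_buckets
        ids.append(h)
    return ids


# The 3-gram (1, 2, 3) ends at positions 2 and 5, so both share a bucket.
ids = ngram_bucket_ids([1, 2, 3, 1, 2, 3], n=3, num_buckets=1000)
print(ids[2] == ids[5])
```

In a setup like this, each bucket id would index a row of an additional embedding table whose vector is combined with the ordinary token embedding. The table can be made much larger than the base vocabulary, which is one way an embedding space could grow by the kind of factor the article describes.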
Benchmark Performance
LongCat-2.0 is explicitly optimized for agentic, multi-step software engineering tasks rather than conversational fluency.
| Benchmark | LongCat-2.0 | Competitor |
|---|---|---|
| SWE-bench Pro | 59.5 | GPT-5.5: 58.6 |
| Terminal-Bench 2.1 | 70.8 | n/a |
| SWE-bench Multilingual | 77.3 | n/a |
| FORTE (enterprise workflows) | 73.2 | n/a |
These results position LongCat-2.0 as a credible alternative to closed-source leaders for autonomous repository manipulation, tool integration, and long-context engineering pipelines.
Post-Training: The MOPD Framework
Rather than blending human feedback into a single reward function, Meituan's post-training pipeline relies on Multi-Teacher Optimization via Mixture of Specialized Experts (MOPD). The framework separates optimization into distinct specialized expert streams — enabling more precise behavioral shaping for agentic, tool-use, and code-generation contexts without the performance dilution common in generalist RLHF approaches.
What This Means
LongCat-2.0 represents a convergence of several industry-defining trends: the maturation of Chinese domestic AI silicon, the strategic value of open-source distribution, and the unintended consequences of Western export controls.
For global developers seeking affordable, high-performance agentic coding tools, the calculus is shifting — and Meituan just made that shift very concrete.
With MIT licensing, free cache hits, and benchmark results that challenge GPT-5.5, LongCat-2.0 is not a regional story. It's a signal of where frontier AI infrastructure power is heading.


