Kimi K2 Thinking: The Chinese AI Model Outperforming GPT-5 And Claude Sonnet 4.5

Last updated: November 11, 2025

By Tricia Wei

China’s Moonshot AI launches new model lauded as No. 1 among open-source systems

Chinese artificial intelligence company Moonshot AI has unveiled a new version of its open-source model that is making waves across the tech world.

The model, called Kimi K2 Thinking, has reportedly outperformed OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5 in multiple benchmarks and is earning plenty of praise from AI experts online.

A Big Leap for Chinese Open-Source AI

The Beijing-based start-up announced on Thursday that Kimi K2 Thinking is now available on Kimi.com and through its API, so developers and businesses can integrate the model into their own tools and applications.

In a post on GitHub, Kimi’s researchers said that the new model has set “new records across benchmarks that assess reasoning, coding and agent capabilities.” This marks another major step in closing the performance gap between Chinese open-source AI systems and the high-end closed-source models from the US.

For a long time, Chinese open-source models were popular but usually a step behind their American counterparts. Now, it seems that is starting to change.

“A Turning Point in AI”

“Today is a turning point in AI. A Chinese open-source model is #1,” wrote Deedy Das, a partner at Menlo Ventures, on X. “Seminal moment in AI.”

According to the GitHub post, Kimi K2 Thinking scored 44.9 per cent on the Humanity’s Last Exam benchmark, a rigorous test of large language models with 2,500 questions covering a wide range of topics. That result puts it ahead of GPT-5 and Claude Sonnet 4.5.

The model also outperformed the two US systems on the BrowseComp benchmark, which measures how effectively AI models browse the web and gather information. Kimi K2 Thinking scored 60.2 per cent there.

It also led the Seal-0 benchmark, which tests how well search-augmented models handle real-world research queries, with a score of 56.3 per cent.

Record-Breaking Performance Across Tests

Consultancy firm Artificial Analysis shared on X that Kimi K2 Thinking achieved 93 per cent accuracy on its Tau-2 Bench Telecom benchmark, the highest score the firm has ever recorded. This benchmark tests conversational AI models in technical support situations where the AI and user must work together to solve telecom problems.

“Kimi K2 Thinking achieved #1 in … the benchmark and is potentially the new leading [open-source] model,” the consultancy wrote.

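The model has 1 trillion parameters and uses a Mixture of Experts design: rather than running every parameter on every input, a routing layer sends each piece of input to a small set of specialized sub-networks, or “experts,” much as a complex problem might be divided among human specialists. In general, more parameters tend to mean better performance, and this design keeps the cost of running so large a model manageable because only a fraction of it is active at any time.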
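For readers who want to see the routing idea in concrete terms, here is a minimal sketch of top-k Mixture of Experts routing in plain Python. It is an illustration of the general technique only, not Moonshot AI’s implementation: the toy “experts” (simple scaling functions), the identity gate matrix, and the names `moe_forward`, `make_expert`, `GATE` and `EXPERTS` are all assumptions made up for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical toy expert: multiplies every input value by `scale`."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    # One gate score per expert: dot product of x with that expert's gate row.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Keep only the top_k experts; the rest are never run.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize the chosen experts' scores into mixing weights.
    weights = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for weight, idx in zip(weights, chosen):
        expert_out = experts[idx](x)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out, chosen


# Assumed toy setup: four experts and an identity gate (illustrative values only).
EXPERTS = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
GATE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    output, used = moe_forward([1.0, 0.0, 3.0, 2.0], GATE, EXPERTS, top_k=2)
    print("experts used:", used)
    print("output:", output)
```

In this toy run only two of the four experts do any work for the given input, which is the point of the design: total capacity grows with the number of experts, while the compute spent on each input depends only on `top_k`.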

Powerful Yet Affordable

Despite its massive scale, Kimi K2 Thinking’s API is affordable: it is reportedly six to ten times cheaper than the APIs of OpenAI and Anthropic models, according to a calculation by the South China Morning Post.

Moonshot AI is backed by major Chinese tech giants Alibaba Group Holding and Tencent Holdings. The company credits its success to a “model-as-an-agent” training approach, which strengthens the model’s ability to use different tools intelligently while reasoning through complex tasks.

Narrowing the Global AI Gap

“This is the closest open models have been to the closed frontier of performance AI ever,” wrote Nathan Lambert, a researcher at the Allen Institute for AI, in a Substack post.

Lambert added that the rise of Chinese AI innovators like Moonshot AI and DeepSeek has “made the closed labs sweat.” He also noted that “There’s serious pricing pressure and expectations that [the US developers] need to manage.”

Echoing that sentiment, Mohammad Emad Mostaque, founder of Stability AI, said on X that the new model proves “the gap between closed and open [AI models] continues to narrow.”

With Kimi K2 Thinking now leading in key benchmarks, it is clear that Chinese AI developers are not just catching up; they are setting new global standards.