What GLM-5.2 Is (Z.ai's Open-Weight Model)
GLM-5.2 is a large language model built for long-horizon tasks, released on June 17, 2026 by Z.ai (formerly Zhipu AI, based in Beijing). Its defining trait is that the model itself is distributed as "open weights", and under the permissive MIT license at that. Anyone can download the model, build it into a commercial service, or adapt it to their own needs. That is the decisive difference from closed models like ChatGPT or Claude, whose internals are not public.
GLM-5.2 key specs (official)
A 753B MoE and How IndexShare Works
GLM-5.2 has 753B total parameters, but it uses a Mixture-of-Experts (MoE) design: only a subset of the network's experts is activated for any given token, so not every parameter is computed each time. Reports put the active size at roughly 40B parameters per token, which keeps such a large model manageable to run. On top of that, GLM-5.2's new "IndexShare" mechanism reuses one indexer across every four sparse attention layers, which the model card says reduces per-token compute (FLOPs) by 2.9× at a 1M-token context.
We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. — From the model card's description of the IndexShare architecture (total parameters listed separately as "753B params")
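To make the two ratios above concrete, here is a minimal arithmetic sketch. The 753B and 40B figures and the group size of four come from the text; the layer count of 80 is a made-up value for illustration only, and the sketch does not attempt to reproduce the 2.9× FLOPs figure, which depends on architectural details not given here.

```python
import math

TOTAL_PARAMS_B = 753   # total parameters in billions, from the model card
ACTIVE_PARAMS_B = 40   # reported active parameters per token (approximate)
SHARE_GROUP = 4        # sparse attention layers sharing one indexer


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the parameters that take part in computing one token."""
    return active_b / total_b


def indexer_runs(num_sparse_layers: int, group: int = SHARE_GROUP) -> int:
    """Indexer evaluations needed when each group of layers shares one indexer."""
    return math.ceil(num_sparse_layers / group)


HYPOTHETICAL_LAYERS = 80  # assumption: the real layer count is not stated here

print(f"Active share per token: {active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B):.1%}")
print(f"Indexer runs without sharing: {HYPOTHETICAL_LAYERS}")
print(f"Indexer runs with sharing:    {indexer_runs(HYPOTHETICAL_LAYERS)}")
```

With these numbers, roughly 5.3% of the parameters are active per token, and a hypothetical 80-layer stack would evaluate the indexer 20 times instead of 80.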
The 1M-Token Context and the MIT License
GLM-5.2 can handle up to 1 million tokens of context at once. That is enough to load an entire large codebase or a stack of documents and keep working without losing track of earlier material. At its core, GLM-5.2 is designed to run long stretches of coding and agent-style autonomous execution stably. Training uses Z.ai's own RL framework, "slime," which covers everything from training through large-scale inference rollout.
slime serves as an integrated infrastructure layer from training to large-scale inference rollout. — From the description of the "slime" training framework
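To get a feel for what a 1M-token budget means for a real codebase, the following sketch estimates token counts from character counts. The four-characters-per-token ratio is a rough rule of thumb, not GLM-5.2's actual tokenizer, and the function names, file suffixes, and 50,000-token reserve are all hypothetical choices for illustration.

```python
import math
from pathlib import Path

CONTEXT_LIMIT = 1_000_000   # tokens, per the article
CHARS_PER_TOKEN = 4.0       # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate the token count of a string from its length."""
    return math.ceil(len(text) / chars_per_token)


def estimate_repo_tokens(root: str, suffixes: tuple = (".py", ".md")) -> int:
    """Sum the estimated tokens of all matching files under a directory."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def fits_in_context(token_count: int, limit: int = CONTEXT_LIMIT,
                    reserve: int = 50_000) -> bool:
    """Check the budget while leaving room for instructions and the reply."""
    return token_count + reserve <= limit
```

Calling `fits_in_context(estimate_repo_tokens("path/to/repo"))` gives a quick first answer; for a real decision, count tokens with the model's own tokenizer.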
GLM-5.2 Performance vs GPT-5.5
The main reason GLM-5.2 drew attention is that its coding performance pulled level with the top closed models. Summarizing that as "beats GPT-5.5" is not accurate, though: the official benchmark table shows the wins and losses clearly split.
Key coding benchmarks compared (figures from the official model card). Series: GLM-5.2, GPT-5.5, Claude Opus 4.8. Panels: SWE-bench Pro (GLM-5.2 ahead of GPT-5.5), Terminal-Bench 2.1 (GPT-5.5 ahead), FrontierSWE (GLM-5.2 ahead of GPT-5.5). Scores normalized to 100. Terminal-Bench uses the Terminus-2 harness. Source: Z.ai official model card.
Where GLM-5.2 Beats GPT-5.5
GLM-5.2 beats GPT-5.5 on SWE-bench Pro (62.1 vs 58.6) and FrontierSWE (74.4 vs 72.6), both of which test long-horizon coding. It is rare, close to a first, for an open-weight model to top one of the leading closed models on real-world coding benchmarks. VentureBeat likewise reported that GLM-5.2 beat GPT-5.5 on multiple long-horizon coding benchmarks.
SWE-bench Pro: GLM-5.2 62.1 vs GPT-5.5 58.6. FrontierSWE: GLM-5.2 74.4 vs GPT-5.5 72.6. — From the official benchmark table (Coding category)
GPT-5.5 Leads on Terminal-Bench, Opus 4.8 on SWE-bench Pro
That said, it does not win everything. On Terminal-Bench 2.1, GPT-5.5 (84.0) edges out GLM-5.2 (81.0), and the top SWE-bench Pro score belongs to Claude Opus 4.8 (69.2). Terminal-Bench rankings also shift with the harness (the execution environment), and some runs report GLM-5.2 ahead of Opus 4.8. The accurate read on GLM-5.2 is not "it dethroned the champion" but "an open model that pulled level with the very top on several key coding metrics." Keeping that distinction in mind sets realistic expectations.
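The split described above can be tabulated in a few lines. This sketch uses only the scores quoted in this article; scores the text does not give (for example, Claude Opus 4.8 on FrontierSWE) are left out rather than guessed, so "leader" here means the leader among the scores listed.

```python
# Scores quoted in the article (official model card figures).
SCORES = {
    "SWE-bench Pro": {"GLM-5.2": 62.1, "GPT-5.5": 58.6, "Claude Opus 4.8": 69.2},
    "Terminal-Bench 2.1": {"GLM-5.2": 81.0, "GPT-5.5": 84.0},
    "FrontierSWE": {"GLM-5.2": 74.4, "GPT-5.5": 72.6},
}


def leader(scores: dict) -> str:
    """Model with the highest score among those listed for a benchmark."""
    return max(scores, key=scores.get)


def margin(scores: dict, model_a: str, model_b: str) -> float:
    """Point difference model_a minus model_b, rounded to one decimal."""
    return round(scores[model_a] - scores[model_b], 1)


for name, scores in SCORES.items():
    gap = margin(scores, "GLM-5.2", "GPT-5.5")
    print(f"{name}: leader = {leader(scores)}, GLM-5.2 vs GPT-5.5 = {gap:+.1f}")
```

The output shows GLM-5.2 at +3.5 and +1.8 points over GPT-5.5 on SWE-bench Pro and FrontierSWE, and 3.0 points behind on Terminal-Bench 2.1, with Claude Opus 4.8 leading SWE-bench Pro overall.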
It Beats Claude Fable on Design Taste
It is also well regarded beyond coding. On Design Arena, which pits models against each other on design quality, GLM-5.2 was reported to beat Claude Fable (Fable 5). Scoring well on something a user feels immediately — the look of what comes back on the first prompt — is another reason GLM-5.2 became a talking point. AI researcher Nathan Lambert called it the first open model that feels right as a general agent inside coding harnesses.
GLM-5.2 is the open weight model that feels right in coding harnesses as a general agent. It's the first one. — From the analysis of GLM-5.2's significance
GLM-5.2 Pricing and How to Use It in Claude Code
Alongside performance, the low cost drew attention. VentureBeat reported that GLM-5.2 delivers comparable coding performance at about one-sixth the cost of GPT-5.5. The model is not only open-weight but also cheaper to use via API or a flat-rate plan, and that combination is what motivates teams to consider switching.
Rough cost compared with GPT-5.5 (an estimate based on reporting). Relative cost with GPT-5.5 set to 100; the GLM-5.2 figure is an estimate based on the "about one-sixth" reporting, and actual cost varies by usage. Source: VentureBeat and other reporting.
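As a rough illustration of the "about one-sixth" figure, the sketch below converts it into a relative index and a projected spend. The ratio is the reported approximation, not a published price list, and the function names and the example spend are hypothetical.

```python
COST_RATIO = 1 / 6  # "about one-sixth", per the reporting cited above


def relative_cost(baseline_index: float = 100.0, ratio: float = COST_RATIO) -> float:
    """GLM-5.2 cost on an index where GPT-5.5 equals baseline_index."""
    return baseline_index * ratio


def estimated_spend(gpt_spend: float, ratio: float = COST_RATIO) -> float:
    """Projected spend for the same workload, given a GPT-5.5 spend."""
    return gpt_spend * ratio


print(f"Relative cost index: {relative_cost():.1f}")
print(f"Projected spend for a 600-unit GPT-5.5 bill: {estimated_spend(600.0):.2f}")
```

On that index GLM-5.2 lands at about 16.7, so a workload costing 600 units on GPT-5.5 would come to roughly 100 units, before accounting for differences in token usage between the models.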
About One-Sixth the Cost of GPT-5.5
Because GLM-5.2's weights are distributed for free, running it on your own servers incurs no per-use model fee; you pay only for the hardware and operations. On Z.ai's coding-focused flat-rate plan, GLM-5.2 usage consumes quota at 3× the base rate during peak hours and 2× during off-peak hours. The official blog also announced a limited-time promotion that counts off-peak usage at 1× through the end of September 2026. The absence of usage-based fees when self-hosting is what creates the large cost gap with closed ChatGPT-family models.
Consumes 3× during peak hours and 2× during off-peak hours. — From the description of the coding plan's consumption rate
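The consumption multipliers can be turned into a simple quota estimate. This is a sketch under stated assumptions: the multipliers come from the text, while the notion of a generic "base unit" and the function name are hypothetical, since the plan's actual accounting unit is not described here.

```python
PEAK_MULTIPLIER = 3             # from the coding plan description
OFF_PEAK_MULTIPLIER = 2         # from the coding plan description
PROMO_OFF_PEAK_MULTIPLIER = 1   # limited-time promotion, per the official blog


def quota_consumed(peak_units: float, off_peak_units: float,
                   promo_active: bool = False) -> float:
    """Plan quota used for a mix of peak and off-peak base units."""
    off_peak = PROMO_OFF_PEAK_MULTIPLIER if promo_active else OFF_PEAK_MULTIPLIER
    return peak_units * PEAK_MULTIPLIER + off_peak_units * off_peak


print(quota_consumed(10, 20))                     # 10*3 + 20*2 = 70
print(quota_consumed(10, 20, promo_active=True))  # 10*3 + 20*1 = 50
```

Shifting work to off-peak hours therefore stretches the same plan further, and more so while the promotion is active.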
Using It in Claude Code, ZCode, and OpenCode
GLM-5.2 is available from the major coding agents. Claude Code, Z.ai's own ZCode, and OpenCode all support it, so you can swap in just the model while staying in the tool you already know.
Steps to use GLM-5.2 in Claude Code
As someone who uses Claude Code at the center of day-to-day work, I find that being able to try a different model without changing tools is no small thing. Swapping in models with a different cost-performance balance, without touching your existing workflow, is the biggest practical upside of having more open models around.
Pros and Cautions for GLM-5.2
Finally, here are the points to weigh when deciding whether GLM-5.2 is for you. Its strengths are clear, but a model this large also brings real-world hurdles.
What to weigh before choosing GLM-5.2
The Strength of Open Weights You Can Self-Host
GLM-5.2's biggest strength is that, under the MIT license, you get the model itself. You can run it on your own server even for work where data cannot leave the building, and you can modify or fine-tune it to fit your use case. Not depending on a single AI provider — and not being at the mercy of pricing changes or a service shutting down — pays off the longer you use it in production. This kind of open option also dovetails with rising interest in sovereign AI and open-weight models.
The Catch of Running Such a Large Model Locally
On the other hand, 753B total parameters is not something you can casually run on a personal computer. Running it yourself takes a substantial GPU setup, so for most people an API or chat front-end is the realistic entry point. "Open weights" does not mean "anyone can run it on their own machine right away," and it is worth understanding that correctly before you adopt it. A sensible path is to first check the performance through Z.ai's chat or Claude Code, then consider self-hosting if needed.
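A back-of-the-envelope calculation shows why. The sketch below estimates the memory needed just to hold 753B parameters at a few common precisions. The precision in which the weights are actually published is not stated here, the 80 GB per-GPU figure is an assumed example, and the estimate ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
import math

TOTAL_PARAMS = 753e9  # total parameters, from the model card

# Bytes per parameter at common precisions (general facts, not GLM-specific).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def min_gpus(memory_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Lower bound on GPU count; assumes 80 GB cards as an example."""
    return math.ceil(memory_gb / gpu_memory_gb)


for precision, size in BYTES_PER_PARAM.items():
    needed = weight_memory_gb(TOTAL_PARAMS, size)
    print(f"{precision}: {needed:,.1f} GB of weights, at least {min_gpus(needed)} GPUs")
```

Even at 4-bit precision the weights alone come to about 376.5 GB, far beyond a single consumer GPU. MoE reduces the compute per token, but all experts still have to be resident in memory.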
When you research models from overseas, you will often need to read English official docs and model cards. If you want an AI to summarize a long English document, converting it to Markdown first preserves the heading and table structure and improves accuracy. Pasting a web page as-is tends to mix in styling noise, so tidying the source before handing it over is the shortcut to stable results.
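As a minimal sketch of that tidying step, the following converter uses only Python's standard `html.parser` module. It keeps headings, paragraphs, and list items, and drops script and style content. It does not handle tables, links, or nested lists, so a dedicated conversion tool is the better choice for real documents. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SimpleMarkdownConverter(HTMLParser):
    """Collect headings, paragraphs, and list items as Markdown lines."""

    HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._buffer = []
        self._prefix = ""
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            self.flush()
            self._prefix = self.HEADINGS[tag] + " "
        elif tag == "li":
            self.flush()
            self._prefix = "- "
        elif tag == "p":
            self.flush()

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)

    def flush(self):
        """Emit the buffered text as one line, collapsing whitespace."""
        text = " ".join("".join(self._buffer).split())
        if text:
            self.lines.append(self._prefix + text)
        self._buffer = []
        self._prefix = ""


def html_to_markdown(html: str) -> str:
    """Convert a small HTML fragment into Markdown-style text."""
    parser = SimpleMarkdownConverter()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.lines)
```

Passing a page's HTML through `html_to_markdown` before handing it to a model removes styling noise while keeping the heading hierarchy that helps a summary stay organized.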
GLM-5.2 is a milestone model: open-weight, yet beating GPT-5.5 on some coding benchmarks. It is not top of the table on every metric, but having an option that handles long-horizon coding cheaply — standing alongside the closed models — matters a great deal. To judge whether it fits your needs, the steady approach is to try its actual responses via chat or API first. For developers who want to keep costs down on long-horizon coding and agent work, it has become a strong candidate.
