
Have you tried Minimax M2.5? How did it compare?



Much worse - in my experience Minimax is not suitable for high autonomy on hard projects. The real (distant) second in my experience is Mimo Flash v2 (though I haven't tried the latest version, which might be closer to parity). I would not use Minimax for serious work.

StepFun 3.5 Flash compares favorably to Google's Gemini 3 Flash (which is surprisingly good and pretty costly) and to GLM-5.

I find this outcome ironic given Minimax's more aggressive marketing, and given that Anthropic's large-scale distillation accusations specifically named Minimax but not StepFun.

I can only wonder about the true underlying reasons, but deducing from public information I suspect that Minimax simply has weaker, benchmaxx-targeting post-training R&D and leans more on distilling Western frontier models, while StepFun has extensive post-training built on hard-won custom R&D and internal large-scale distillation teachers.


Interesting. I'm surprised you feel that it's better than GLM 5 - these models are in different weight classes after all.

I tried it out a bunch and it seems good. I can't really tell if it's better or worse than most of these other models in such a short time though.


I don't think it's strictly better than GLM 5 - more like they are peers (though in math competitions StepFun is stronger than most), and in my experience they have a similar coding/bugfix ceiling where world knowledge is not the deciding factor. But I didn't test GLM 5 for more than 30 hours, and my agentic harness (opencode) might be suboptimal - I'm open to the idea that GLM 5 with the right agentic harness is ready for ultra-long autonomy, but I have yet to see it myself.

Where GLM 5 is strictly worse for me, though, compared to StepFun, is long-form content generation (planning, research documents) - but the same can be said about the Geminis, and those are obviously very smart models.

Given the free option I'd explore GLM 5 more, but if I had to pay for it myself I'd of course choose StepFun every time. Basically, I think the optimal configuration right now for maximizing output of correct software features per dollar involves using StepFun (or a future competitor in its class) for bulk coding and first-stage code review.

Maybe I need to write a blogpost about it after all.


I tried them both on the task of creating a todo-like web app (you can use GLM 5's chat interface for free if there's capacity). GLM 5 ended up with a working version. Sadly, StepFun's result didn't quite work right: the main issue was that it put everything that should have been in different columns into a single one. I didn't prompt it further to fix that, but it still seems relatively capable - I think it beat what the big Qwen model came up with.

What's really surprising to me is the cost of the model. It's definitely very good for its price. DeepSeek is the only one that offers any competition at that price point (GLM 5 is literally 10x more expensive).



