Hacker News

I got fed up with GLM-4.7 after using it for a few weeks; it was slow through z.ai and not as good as the benchmarks led me to believe (esp. with regard to instruction following), but I'm willing to give it another try.


I forgot to mention that GLM 4.7 loves to perform destructive operations; it'll happily git reset and push to main. Put it on a very tight leash.


Try Cerebras


I spent $10 in 2 minutes with that and gave up.


Their 50 USD per month plan gives you 24M tokens per day: https://www.cerebras.ai/pricing


I had that for a few months and cancelled. They have per-minute rate limits as well, so you get 3-4 hyperspeed responses and then a 45-second pause waiting for the throttling to let your next request through.

And then, depending on what you're working on, the 24M daily allotment is gone in under an hour. I regularly burned it in about 25 minutes of agent use.

I imagine if I had infinite budget to pay regular API rates on a high usage tier, it would be really quite good though.
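The throttling pattern described above (a burst of fast responses, then a forced pause) is the usual shape of per-minute rate limiting, and the standard client-side answer is retry with backoff. Here's a minimal generic sketch; it's not Cerebras-specific, and the use of RuntimeError as a stand-in for an HTTP 429 response is an assumption for illustration:

```python
import time

def call_with_backoff(request_fn, max_retries=5, base_wait=1.0):
    """Call request_fn, retrying when it signals throttling.

    A throttled request is represented here by request_fn raising
    RuntimeError (a stand-in for a real HTTP 429). The wait doubles
    after each retry; against a provider with 45-second throttle
    windows you'd start base_wait much higher than 1 second.
    """
    wait = base_wait
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            time.sleep(wait)
            wait *= 2
```

In an agent loop, wrapping each completion call like this turns the hard 45-second stalls into transparent waits, though it obviously doesn't raise the underlying throughput cap.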


> They have minutely rate limits as well so you get 3-4 hyperspeed responses and then a 45 second pause waiting for the throttling to let your next request through.

I haven’t really encountered that, though I have noticed a couple of things on occasion:

A) High server load notifications, the most common case; these can delay an answer by about 3-10 seconds.

B) Hangs. These happen quite rarely; I'm not sure whether it's a network issue or something on their side, but sometimes the submitted message just freezes (e.g. nothing happening in OpenCode). It doesn't seem deliberate, because resubmitting works immediately more often than not.

> And then, depending on what you're working on, the 24M daily allotment is gone in under an hour. I regularly burned it in about 25 minutes of agent use.

That’s a lot of tokens, almost a million a minute! Since the context is about 128k, you’d be doing about 8 full context requests every minute for 25 minutes straight.
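The back-of-the-envelope math above can be checked directly, using the figures from the thread (24M tokens/day burned in ~25 minutes, ~128k context window):

```python
# Figures reported in the thread.
daily_tokens = 24_000_000   # daily allotment on the $50/mo plan
minutes_used = 25           # time to burn it in agent use
context_window = 128_000    # approximate model context size

tokens_per_minute = daily_tokens / minutes_used
full_context_requests_per_minute = tokens_per_minute / context_window

print(f"{tokens_per_minute:,.0f} tokens/min")            # 960,000 tokens/min
print(f"{full_context_requests_per_minute:.1f} req/min") # 7.5 req/min
```

So "about 8 full context requests every minute" is right on the money at 7.5.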

I can see something like that, but at that point it feels like the only thing that’d actually be helpful would be caching support on their end.

You must be on some pretty high tier subscriptions with the other providers to get the same performance!


Synthetic is a blessing when it comes to providing OSS models (including GLM); their team is responsive, and I've had no downtime or any other issues for the last 6 months.

Full list of models provided: https://dev.synthetic.new/docs/api/models

Referral link if you're interested in trying it for free, with a discount for the first month: https://synthetic.new/?referral=kwjqga9QYoUgpZV



