One huge problem with these "cheap" models is that they happen to be more expensive in the typical agent workflow if the provider does not support caching.
Input and output costs are peanuts compared to the order-of-magnitude (or more) larger volume of tokens that could hit the cache.
At that point you might as well use GPT-5. It will be the same price or cheaper, and more capable.
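To see why caching dominates the bill in an agent loop, here is a back-of-the-envelope sketch. The prices, discount, and hit rate below are hypothetical placeholders, not any provider's published rates; the point is only the shape of the arithmetic: an agent resends the growing conversation every turn, so most input tokens are repeats, and without a cache discount every repeat is billed at full price.

```python
# Hypothetical numbers for illustration -- not real published pricing.
def effective_input_cost(price_per_mtok, cached_discount, cache_hit_rate, tokens):
    """Total input cost when a fraction of tokens are served from cache."""
    cached = tokens * cache_hit_rate
    uncached = tokens - cached
    return (uncached * price_per_mtok
            + cached * price_per_mtok * cached_discount) / 1e6

# 10M input tokens over an agent session, $0.50 per 1M tokens (made up).
no_cache = effective_input_cost(0.50, 1.0, 0.0, 10_000_000)
# Same session with a 90% cache hit rate and cache hits at 1/10 price.
with_cache = effective_input_cost(0.50, 0.1, 0.9, 10_000_000)

print(no_cache)    # $5.00
print(with_cache)  # $0.95
```

Under these made-up numbers the cached run costs roughly a fifth of the uncached one, which is why a nominally "cheap" model without caching can end up pricier than a more expensive model that discounts cache hits.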
> One huge problem with these "cheap" models is that they happen to be more expensive in the typical agent workflow if the provider does not support caching.
DeepSeek supports caching and cache hits are a tenth of the cost.
First you complained about lack of caching. When you were informed that the model supports caching, instead of admitting your error you switched to an unrelated complaint. I hope that you do not use similar strategies for discussion in your personal and work life.
Caching is not a function of the model but of the provider: any model's prompts can be cached, and the provider serving the model decides whether to cache them. OpenRouter is not a provider but a middleman between providers, so some of its providers for DeepSeek might support caching and some might not; if you route to just any of them, you might run into the issue. Likewise, some of its providers might use your data for training and some might not. You have to look at the provider list, and you can cherry-pick ones that won't train on your data and that also provide caching.
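Picking providers like that can be expressed in the request itself. The sketch below builds an OpenRouter-style chat payload with provider-routing preferences; the field names (`provider`, `data_collection`, `order`, `allow_fallbacks`) follow my understanding of OpenRouter's routing options and should be treated as assumptions to verify against the current docs.

```python
import json

# Sketch of an OpenRouter chat request that constrains which upstream
# providers may serve the call. Field names are assumptions based on
# OpenRouter's provider-routing options; check the docs before relying
# on them.
payload = {
    "model": "deepseek/deepseek-chat",
    "messages": [{"role": "user", "content": "hello"}],
    "provider": {
        "data_collection": "deny",   # skip providers that train on your prompts
        "order": ["DeepSeek"],       # prefer a provider known to cache
        "allow_fallbacks": False,    # fail rather than route to anyone else
    },
}

# This JSON body would be POSTed to the chat completions endpoint.
body = json.dumps(payload)
```

The point is that with a middleman you don't get the first-party provider's caching or privacy behavior by default; you have to opt into providers that offer it.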