GPT models follow the tool format correctly but tend to stop early instead of continuing.
Grok-4+ models are decent but run into issues in longer chats.
Kimi 2.5 tends to revert to its RL tool format.