What's interesting to me isn't the self-generated-skills finding (everyone here has correctly identified the methodology issue). It's Table 4, buried on page 6.
Healthcare +51.9pp. Manufacturing +41.9pp. Software Engineering +4.5pp.
The domains where models have the weakest priors from pretraining benefit the most from external procedural knowledge. That's not surprising on its own, but there's an implication I haven't seen anyone raise: these are exactly the enterprise domains where that procedural knowledge is most proprietary and most dangerous to lose between sessions.
The paper's entire architecture is single-player. A SKILL.md sits in a directory, one agent reads it, session ends. When Agent A at a bank figures out the right approach to parsing 13F filings (0% to 75% with the right skill in this paper), that knowledge dies with the context window. Agent B starts from scratch.
We're building shared memory infrastructure for agents at Memco (https://memco.ai) and this paper maps directly to what our enterprise design partners keep telling us — the problem isn't writing skills, it's that procedural knowledge doesn't compound across agents, sessions, or teams. The paper even shows 2-3 focused skills outperform comprehensive docs, which is a retrieval problem masquerading as an authoring problem.
The question this paper should be asking isn't "can agents write their own skills" — it's "what infrastructure makes skills accumulate and transfer?" Static files in a folder are the wrong primitive for that.
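To make the primitive concrete, here's a minimal sketch of the alternative: a shared store where one agent publishes a skill and any other agent retrieves the top few relevant ones in a later session. All names (`SkillStore`, `publish`, `retrieve`, the tag-overlap ranking) are hypothetical illustrations, not Memco's actual API or the paper's method.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    body: str
    tags: set[str]

@dataclass
class SkillStore:
    """Shared, persistent store: a skill written by one agent is
    retrievable by any other agent, across sessions."""
    skills: list[Skill] = field(default_factory=list)

    def publish(self, skill: Skill) -> None:
        self.skills.append(skill)

    def retrieve(self, task_tags: set[str], k: int = 3) -> list[Skill]:
        # Rank by tag overlap and return only the top-k relevant skills,
        # mirroring the finding that 2-3 focused skills beat one
        # comprehensive document.
        ranked = sorted(self.skills,
                        key=lambda s: len(s.tags & task_tags),
                        reverse=True)
        return [s for s in ranked[:k] if s.tags & task_tags]

store = SkillStore()
# Agent A figures out 13F parsing and publishes what it learned.
store.publish(Skill("parse-13f", "Start from the EDGAR index...", {"sec", "13f", "parsing"}))
store.publish(Skill("normalize-cusip", "Normalize CUSIPs before joining...", {"cusip", "13f"}))
store.publish(Skill("k8s-rollback", "Roll back the deployment...", {"kubernetes"}))

# Agent B, in a fresh session, retrieves it instead of starting from scratch.
hits = store.retrieve({"13f", "parsing"})
print([s.name for s in hits])  # → ['parse-13f', 'normalize-cusip']
```

The point of the sketch is the contrast: a SKILL.md on disk is write-once, read-by-whoever-happens-to-look; a store with publish/retrieve makes accumulation and scoped retrieval first-class operations.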