What's interesting to me isn't the self-generated-skills finding (everyone here has correctly identified the methodology issue). It's Table 4, buried on page 6.
Healthcare +51.9pp. Manufacturing +41.9pp. Software Engineering +4.5pp.
The domains where models have the weakest priors from pretraining benefit the most from external procedural knowledge. That's not surprising on its own, but there's an implication I haven't seen anyone raise: these are exactly the enterprise domains where that procedural knowledge is most proprietary and most dangerous to lose between sessions.
The paper's entire architecture is single-player. A SKILL.md sits in a directory, one agent reads it, session ends. When Agent A at a bank figures out the right approach to parsing 13F filings (0% to 75% with the right skill in this paper), that knowledge dies with the context window. Agent B starts from scratch.
We're building shared memory infrastructure for agents at Memco (https://memco.ai) and this paper maps directly to what our enterprise design partners keep telling us — the problem isn't writing skills, it's that procedural knowledge doesn't compound across agents, sessions, or teams. The paper even shows 2-3 focused skills outperform comprehensive docs, which is a retrieval problem masquerading as an authoring problem.
The question this paper should be asking isn't "can agents write their own skills" — it's "what infrastructure makes skills accumulate and transfer?" Static files in a folder are the wrong primitive for that.
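To make the primitive concrete, here's a minimal sketch of the alternative: a shared store where one agent publishes a skill and any other agent retrieves the top few relevant ones in a later session. All names (`SkillStore`, `publish`, `retrieve`, the tag-overlap ranking) are hypothetical illustrations, not Memco's actual API or the paper's method.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    body: str
    tags: set[str]

@dataclass
class SkillStore:
    """Shared, persistent store: a skill written by one agent is
    retrievable by any other agent, across sessions."""
    skills: list[Skill] = field(default_factory=list)

    def publish(self, skill: Skill) -> None:
        self.skills.append(skill)

    def retrieve(self, task_tags: set[str], k: int = 3) -> list[Skill]:
        # Rank by tag overlap and return only the top-k relevant skills,
        # mirroring the finding that 2-3 focused skills beat one
        # comprehensive document.
        ranked = sorted(self.skills,
                        key=lambda s: len(s.tags & task_tags),
                        reverse=True)
        return [s for s in ranked[:k] if s.tags & task_tags]

store = SkillStore()
# Agent A figures out 13F parsing and publishes what it learned.
store.publish(Skill("parse-13f", "Start from the EDGAR index...", {"sec", "13f", "parsing"}))
store.publish(Skill("normalize-cusip", "Normalize CUSIPs before joining...", {"cusip", "13f"}))
store.publish(Skill("k8s-rollback", "Roll back the deployment...", {"kubernetes"}))

# Agent B, in a fresh session, retrieves it instead of starting from scratch.
hits = store.retrieve({"13f", "parsing"})
print([s.name for s in hits])  # → ['parse-13f', 'normalize-cusip']
```

The point of the sketch is the contrast: a SKILL.md on disk is write-once, read-by-whoever-happens-to-look; a store with publish/retrieve makes accumulation and scoped retrieval first-class operations.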