I only generate skills _after_ I've worked through a problem with the model - usually by asking it "what have you learned in this session?". I have no idea why people would think it can zero-shot a problem space without any guidance or actual experience...
> I only generate skills _after_ I've worked through a problem with the model.
This is the correct way the vast majority of the time. There are exceptions: when I know for certain that the models do not have enough training material on a library, because it is new, rarely used, or an internal tool. In those cases I know I will have a struggle on my hands if I don't start out with a skill that teaches the model the basics of what it does not know. I then polish the skill as we discover additional ways it can be improved. Any errors the model makes are used to improve existing skills or create new ones.
I think it could still be an interesting benchmark. Like, assuming AI companies are genuinely trying to solve this pelican problem, how well do they solve it? That seems valid, and the assumption here is that the approach they take could generalize, which seems plausible.
The point of this benchmark is that making decent SVG art is actually useful. Simon has private image prompts he uses; since he didn't say Gemini failed at those, it is reasonable to assume those were also successful.
I am currently porting pyte to Go through a similar approach (feeding the LLM with a core SPEC and two VT100/VT220 test suites). It's chugging along quite nicely.
I’m not crazy about the Liquid Glass look. I decided to stick with Reeder Classic until it dies, even if it’s nice to see a well-maintained alternative…
Me too. Unfortunately, it seems Silvio has abandoned further development. There are some truly annoying bugs (YouTube embeds do not work, swipe gestures stopped working, iOS images are slightly cut off,…) :( But I tried all alternatives and can't make the transition after idk, 8 years…
I see this as another OPEX item that has to be factored into Anthropic’s (hypothetical) profitability, and am intrigued as to what this means in an industry that is becoming rife with CAPEX sinks…