
I only generate skills _after_ I've worked through a problem with the model - usually by asking it "what have you learned in this session?". I have no idea why people would think it can zero-shot a problem space without any guidance or actual experience...

> I only generate skills _after_ I've worked through a problem with the model.

This is the correct way the vast majority of the time. There are exceptions: when I know for certain that the models do not have enough training material, say a new library, one that isn't often used, or an internal tool. In those cases I know I will have a struggle on my hands if I don't start out with a skill that teaches the model the basics of what it does not know. I then polish the skill further as we discover additional ways it can be improved, and any errors the model makes are used to improve existing skills or create new ones.
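
As a rough sketch, such a starter skill is often little more than a short Markdown file with the usual name/description frontmatter (the library and its rules here are made up purely for illustration):

    ---
    name: acme-billing
    description: Basics of our internal acme-billing client library (not in the model's training data).
    ---
    - Create a client with billing.NewClient(apiKey); there is no global default client.
    - All monetary amounts are integer cents, never floats.
    - ListInvoices paginates with an opaque cursor, not page numbers.
    - On failure, check for a rate-limit error before retrying; never retry validation errors.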


Why would you expect it to generate more effective skills when you aren't even making a salt circle or lighting incense?

Not surprising if you've been paying attention on Twitter, but interesting to see nonetheless.

I used to play this game incessantly. Audio on Firefox on Linux is, sadly, very very garbled.

I don't think this is a good "benchmark" anymore. It's probably in everyone's training set by now.

I think it could still be an interesting benchmark. Like, assuming AI companies are genuinely trying to solve this pelican problem, how well do they solve it? That seems valid, and the assumption here is that the approach they take could generalize, which seems plausible.

The point of this benchmark is that making decent SVG art is actually useful. Simon also has private image prompts he uses; since he didn't say Gemini failed at those, it's reasonable to assume they were also successful.

Doesn’t work on an iPad. It’s not that hard to fix, but shows that the fundamentals of input handling and accessibility aren’t there.

Well, you can start with https://github.com/rcarmo/go-textile, https://github.com/rcarmo/go-rdp, https://github.com/rcarmo/go-ooxml, https://github.com/rcarmo/go-busybox (still WIP). All of these are essentially SPEC- and test-driven, and they are all working for me (save a couple of bugs in go-rdp I need to fix myself, and some gaps in the ECMA specs for go-ooxml that require me to provide actual, manually created documents for further testing).

I am currently porting pyte to Go through a similar approach (feeding the LLM with a core SPEC and two VT100/VT220 test suites). It's chugging along quite nicely.
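
To give a feel for the granularity those test suites enforce, here is a toy illustration of one such behaviour, cursor-up handling with clamping at the top margin (this is not the port's actual API, just a self-contained sketch):

    package main

    import "fmt"

    // cursor tracks the toy terminal's cursor position (0-based).
    type cursor struct {
        row, col int
    }

    // screen is a deliberately tiny model: a fixed-size grid with a clamped cursor.
    type screen struct {
        rows, cols int
        cur        cursor
    }

    // cursorUp implements the VT100 CUU sequence (ESC [ n A): move the cursor up
    // n rows, stopping at the top margin; a parameter of 0 is treated as 1.
    func (s *screen) cursorUp(n int) {
        if n == 0 {
            n = 1
        }
        s.cur.row -= n
        if s.cur.row < 0 {
            s.cur.row = 0
        }
    }

    func main() {
        s := &screen{rows: 24, cols: 80, cur: cursor{row: 5, col: 10}}
        s.cursorUp(2)
        fmt.Println(s.cur) // {3 10}
        s.cursorUp(0) // a zero parameter still moves one row
        fmt.Println(s.cur) // {2 10}
        s.cursorUp(99) // clamped at the top margin
        fmt.Println(s.cur) // {0 10}
    }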


I’m not crazy about the Liquid Glass look. I decided to stick with Reeder Classic until it dies, even if it’s nice to see a well maintained alternative…

Me too. Unfortunately, it seems Silvio has abandoned further development. There are some truly annoying bugs (YouTube embeds do not work, swipe gestures stopped working, images are slightly cut off on iOS, …) :( But I've tried all the alternatives and can't make the transition after, idk, 8 years…

I see this as another OPEX expenditure that has to be factored into Anthropic’s (hypothetical) profitability, and am intrigued as to what this means in an industry that is becoming rife with CAPEX sinks…

"I could never get the hang of Tuesdays"

- Arthur Dent, H2G2


Thursdays, unfortunately

Good. Maybe then we'll stop having Open Source projects using it as their only store of knowledge :)
