"64K tokens context window" I do wish they had managed to extend it to at least 128K to match the capabilities of GPT-4 Turbo
Maybe this limit will become a joke when looking back? Can you imagine reaching a trillion tokens context window in the future, as Sam speculated on Lex's podcast?
How useful is such a large input window when most of the middle isn't really used? I'm thinking mostly about coding, but when putting even, say, 20k tokens into the input, a good chunk doesn't seem to be "remembered" or used for the output.
While you're 100% correct, they are working on ways to make the middle useful, measured with things like "Needle in a Haystack" testing. When we say we wish for a context length that large, I think it's implied we mean functionally. But you do make a really great point.
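For what it's worth, a needle-in-a-haystack style check is pretty simple to run yourself. Here's a minimal sketch assuming an OpenAI-compatible client; the model name, filler text, and "needle" are placeholders I made up, not anything specific from this thread:

```python
# Rough needle-in-a-haystack check: bury a fact mid-context and see if the
# model can retrieve it. Model name, filler, and needle are placeholders.
from openai import OpenAI

client = OpenAI()

needle = "The secret passphrase is 'violet-octopus-42'."
filler = "The quick brown fox jumps over the lazy dog. " * 2000  # ~20k+ tokens of padding

# Place the needle roughly in the middle, where recall tends to be weakest.
haystack = filler[: len(filler) // 2] + needle + filler[len(filler) // 2 :]

resp = client.chat.completions.create(
    model="gpt-4-turbo",  # assumption: any long-context chat model
    messages=[
        {"role": "user", "content": haystack + "\n\nWhat is the secret passphrase?"},
    ],
)
print(resp.choices[0].message.content)  # "pass" if the passphrase comes back
```

Sweep the needle's position and the total context length and you get the usual retrieval-accuracy heatmap those tests report.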
64GB is not GPU RAM, but system RAM. Consumer GPUs have 24GB at most, and the ones with good value for the price have far less. Current-generation workstation GPUs are unaffordable; used ones can be found on eBay for a reasonable price, but they are quite slow. DDR5 RAM might be a better investment.
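Rough back-of-envelope math for why the weights alone blow past consumer VRAM (my own approximate numbers, just parameters times bytes per weight, ignoring KV cache and activations):

```python
# Approximate memory needed just to hold model weights.
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for params in (7, 13, 70):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")

# A 70B model at 4-bit is ~35 GB of weights alone, already over any single
# 24 GB consumer GPU, which is why people fall back to system RAM offloading.
```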
While you need a lot more HBM (or UMA if you're on a Mac) to run these LLM models, my overarching point is that at this point most systems don't have RAM constraints for most of the software you need to run, and as a result RAM becomes less of a selling point except in very specialized cases like graphic design or 3D rendering work.
If we have cheap billion-token context windows, 99% of your use cases aren't going to hit anywhere close to that limit, and as a result your models will "just run".
Wasn't there a paper yesterday that made context evaluation scale linearly (instead of quadratically) and made effectively unlimited context windows possible? Between that and 1.58-bit quantization, I feel like we're overdue for an LLM revolution.
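Toy illustration of why the quadratic term is the problem at long context (my own numbers, not from that paper; this only counts pairwise attention scores, nothing else):

```python
# Compare the growth of quadratic attention cost vs a linear-attention-style
# cost as the context length n grows. Pure arithmetic, no model involved.
for n in (4_096, 65_536, 1_048_576):
    quadratic = n * n  # pairwise score computations per layer per head
    linear = n         # linear-time formulation
    print(f"n={n:>9,}: quadratic ~{quadratic:.2e}, linear ~{linear:.2e}")

# At 1M tokens the quadratic term is ~10^12 score computations per layer per
# head, which is why a linear formulation is what makes "effectively
# unlimited" context plausible at all.
```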