> I'm not sure that means anything. In an OO language, you are always de-serializing into objects, and objects are always 'executable code'. Hashes and Arrays are executable code too, right?
No. You're conflating code and state (which was the problem to begin with!)
Let's disassemble parsing a list of strings:
When you instantiate the individual string objects, you do not 'eval' the data to allow it to direct which string class should be instantiated. You also do not 'eval' the data to determine which fields to set on the string class.
You instantiate a known String type, and you feed it the string representation as an array of non-executable bytes using a method you specified when writing your code -- NOT a method the data specifies.
The data is not executable. It's an array of untrusted bytes. The string code is executable, and it operates on state: the data.
You repeat this process, feeding the string objects into the list object. At no point do you ask the data what class or code you should run to represent it. Your parsing code dictates what classes to instantiate, and the data is interpreted according to those fixed rules, and your data is never executed.
It should never be possible for data to direct the instantiation of types. The relationship must always occur in the opposite direction, whereby known types dictate how to interpret data.
> I think it was not obvious to a bunch of people who are in retrospect _claiming_ it was obvious to.
Given the preponderance of prior art, this seems unlikely.
The YAML vulnerability was not from any 'eval' in the YAML library itself, you realize, right?
It was from allowing de-serialization to arbitrary classes, when it turned out that some classes had dangerous side-effects merely from instantiation -- including in some cases, 'eval' behavior, yes, but the eval behavior wasn't in YAML, it was in other classes, where it could be triggered by instantiation.
To use your language, I don't think it's 'intellectual honest' to call allowing de-serialization to data-specified classes "a YAML parser that executed code"--that's being misleading -- or to say that a 'trained monkey should have known it was a bad idea' (allowing de-serialization to arbitrary data-specified classes).
There have been multiple vulnerabilities _just like this_ in other environments, including several in Java (and in major popular Java packages). You could say with all that prior art it ought to have been obvious, but of course you could say that for each of the multiple prior vulnerabilities too. Of course, each time there's even more prior art, and for whatever reason this one finally got enough publicity that maybe this kind of vulnerablity will be common knowledge now.
> The YAML vulnerability was not from any 'eval' in the YAML library itself, you realize, right?
> It was from allowing de-serialization to arbitrary classes, when it turned out that some classes had dangerous side-effects merely from instantiation -- including in some cases, 'eval' behavior, yes, but the eval behavior wasn't in YAML, it was in other classes, where it could be triggered by instantiation.
No. You're conflating code and state (which was the problem to begin with!)
Let's disassemble parsing a list of strings:
When you instantiate the individual string objects, you do not 'eval' the data to allow it to direct which string class should be instantiated. You also do not 'eval' the data to determine which fields to set on the string class.
You instantiate a known String type, and you feed it the string representation as an array of non-executable bytes using a method you specified when writing your code -- NOT a method the data specifies.
The data is not executable. It's an array of untrusted bytes. The string code is executable, and it operates on state: the data.
You repeat this process, feeding the string objects into the list object. At no point do you ask the data what class or code you should run to represent it. Your parsing code dictates what classes to instantiate, and the data is interpreted according to those fixed rules, and your data is never executed.
It should never be possible for data to direct the instantiation of types. The relationship must always occur in the opposite direction, whereby known types dictate how to interpret data.
> I think it was not obvious to a bunch of people who are in retrospect _claiming_ it was obvious to.
Given the preponderance of prior art, this seems unlikely.