> "One way we explored approaching this was using puppeteer to automate opening websites in a web browser, taking a screenshot of the site, and traversing the HTML to find the img tags.
> We then used the location of the images as the output data and the screenshot of the webpage as the input data. And now we have exactly what we need — a source image and coordinates of where all the sub-images are to train this AI model."
I don't quite understand this part. How does this lead to a model that can generate code from a UI?
If I'm understanding correctly, they are talking about how they are solving very specific problems with their models.
In this case, if you look two images up you will see an e-commerce image with many images composited into one image/layer. How will their system automatically decide whether all those should be separate images/layers or one composited image? To do so they trained a model that examines web pages, finds the <img> tags, and records their locations. Basically, they are assuming that their data reflects good decisions, so the model can learn in which cases people use multiple images vs. one.
They have a known system, puppeteer (Chromium), that can go from specified coordinates to rendered images, so they can run it on lots of websites to generate [coordinates, output image] pairs to use as training data. In general, if you have a transform and input data, you can use them to train a model that learns the reverse transform.
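The collection step they describe can be sketched roughly like this. This is a hedged illustration, not their actual pipeline: it assumes puppeteer is installed (`npm install puppeteer`), and the viewport size, the `collectPair` function name, and the normalized-coordinate output format are all my own choices for the example.

```javascript
// Convert absolute pixel boxes to viewport-relative coordinates, so the
// training target does not depend on the screenshot resolution.
function normalizeBoxes(boxes, viewportWidth, viewportHeight) {
  return boxes.map(({ x, y, width, height }) => ({
    x: x / viewportWidth,
    y: y / viewportHeight,
    width: width / viewportWidth,
    height: height / viewportHeight,
  }));
}

// Hypothetical collector: render one URL, return a (screenshot, boxes)
// training pair. Requires puppeteer (assumption: `npm install puppeteer`).
async function collectPair(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 800 });
  await page.goto(url, { waitUntil: 'networkidle2' });

  // Input side of the pair: the rendered screenshot.
  const screenshot = await page.screenshot();

  // Output side: bounding boxes of every <img> element on the page.
  const boxes = await page.$$eval('img', (imgs) =>
    imgs.map((img) => {
      const r = img.getBoundingClientRect();
      return { x: r.x, y: r.y, width: r.width, height: r.height };
    })
  );

  await browser.close();
  return { input: screenshot, target: normalizeBoxes(boxes, 1280, 800) };
}
```

Run `collectPair` over a large list of URLs and you get exactly the dataset described: screenshots as inputs and image coordinates as outputs.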