
> "One way we explored approaching this was using puppeteer to automate opening websites in a web browser, taking a screenshot of the site, and traversing the HTML to find the img tags.

> We then used the location of the images as the output data and the screenshot of the webpage as the input data. And now we have exactly what we need — a source image and coordinates of where all the sub-images are to train this AI model."

I don't quite understand this part. How does this lead to a model that can generate code from a UI?



If I'm understanding correctly, they are talking about how they are solving very specific problems with their models.

In this case, if you look two images up you will see an e-commerce page with many images composited into one image/layer. How will their system automatically decide whether all those should be separate images/layers or one composited image? To do so, they trained a model that examines web pages, finds the <img> tags, and records their locations. Basically, they are working under the assumption that their data reflects good decisions, so the model can learn in which cases people use multiple images versus one.
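If that reading is right, the in-page extraction step might look roughly like this. This is my own hedged sketch, not their code: in a real pipeline the function body would run inside puppeteer's page.evaluate() against live DOM nodes, so here the <img> elements are mocked to make the logic standalone.

```javascript
// Hypothetical sketch of the label-extraction step: for each <img> element,
// record its source and bounding box. In puppeteer this would execute in the
// browser via page.evaluate(); the mock objects below stand in for DOM nodes.
function extractImageBoxes(imgElements) {
  return imgElements.map((img) => {
    const { x, y, width, height } = img.getBoundingClientRect();
    return { src: img.src, x, y, width, height };
  });
}

// Mock stand-ins for DOM <img> nodes so the sketch runs without a browser.
const mockImgs = [
  { src: "hero.png", getBoundingClientRect: () => ({ x: 0, y: 0, width: 800, height: 300 }) },
  { src: "thumb.png", getBoundingClientRect: () => ({ x: 20, y: 340, width: 120, height: 120 }) },
];

const boxes = extractImageBoxes(mockImgs);
console.log(boxes.length); // 2
```

Paired with a full-page screenshot of the same render, each record becomes one labeled training example.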

I could be misunderstanding :)


They have a known system, puppeteer (Chromium), that can go from specified coordinates to rendered images, so they can run it on lots of websites to generate [coordinates, output image] pairs to use as training data. In general, if you have a forward transform and input data, you can use the resulting pairs to train a model that learns the reverse transform.
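The forward-transform idea can be shown with a toy example (my own sketch, nothing to do with their actual model): "render" known box coordinates into a small grid, keep the (grid, coordinates) pair, and you have one supervised example for training the reverse mapping from pixels back to coordinates.

```javascript
// Toy forward transform: rasterize known boxes into a 2D grid of 0s and 1s.
// The (grid, boxes) pair is then a training example for the reverse direction.
function renderBoxes(boxes, width, height) {
  const grid = Array.from({ length: height }, () => Array(width).fill(0));
  for (const { x, y, w, h } of boxes) {
    for (let row = y; row < y + h; row++) {
      for (let col = x; col < x + w; col++) {
        grid[row][col] = 1; // mark cells covered by the box
      }
    }
  }
  return grid;
}

// One synthetic training pair: the grid is the model input, the boxes the label.
const label = [{ x: 1, y: 1, w: 2, h: 2 }];
const input = renderBoxes(label, 5, 4);
const pair = { input, label };
console.log(pair.input[1][1]); // 1 (inside the box)
console.log(pair.input[0][0]); // 0 (background)
```

Puppeteer plays the same role at much higher fidelity: the browser is the forward transform from markup to pixels, and the DOM supplies the coordinates for free.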



