It supports both txt2img and img2img. (Not affiliated.)
Edit: Incidentally, I tried running it on a CPU. It is possible, but it took 3 minutes instead of 10 seconds to produce an image. It also required me to hack up the script in a really gross way. Perhaps there is a script somewhere that properly supports this.
I do runs at 384px by 384px with a batch size of 1. The sampling method has almost no impact on memory. Using k_euler with 30 steps renders an image in 10 to 20 seconds. The things that most affect rendering speed are the step count and the resolution, so 512x512 with C 50 using ddim is much slower than 256x256 with C 25 using k_euler.
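As a rough back-of-envelope check (assuming render time scales roughly linearly with pixel count times step count, which ignores model-loading and other fixed overheads):

```python
# Rough cost model: time ~ pixels * steps. This is an approximation
# that ignores per-step and startup overheads.
def relative_cost(width, height, steps):
    return width * height * steps

slow = relative_cost(512, 512, 50)  # 512x512 at 50 steps (ddim)
fast = relative_cost(256, 256, 25)  # 256x256 at 25 steps (k_euler)
print(slow / fast)  # -> 8.0, i.e. roughly 8x the work
```

So halving the resolution and the step count together cuts the work by about 8x, which matches the "much slower" observation.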
The sampling methods all run in roughly the same time per step, but k_euler can produce viable output at lower C values, which makes it faster than the rest in practice.
Don't add GFPGAN to the same pipeline, as it takes more VRAM.
I'm running it on Windows 10 with the latest drivers. I set the python process to Realtime priority in Task Manager (it makes a slight difference!). I haven't tried it on Linux.
A good way to find out (if you can't find it online) is to start at 512x512 if you have a card with 12GB of VRAM, increase the size (the UI slider goes up in 64px increments), and backtrack when you start getting "CUDA out of memory" errors. I've seen some renders on Discord that are well above 1000px, so those must have come from a 16GB/24GB card or something similar.
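That probing loop can be sketched generically. Here `try_render` is a hypothetical callable that attempts a render at the given square size and raises on an out-of-memory error (with PyTorch on CUDA that would typically surface as a `RuntimeError`/`torch.cuda.OutOfMemoryError`; the sketch uses `MemoryError` as a stand-in):

```python
def find_max_size(try_render, start=512, step=64, limit=2048):
    """Increase the square render size in 64px increments until an
    out-of-memory error, then back off to the last size that worked."""
    best = None
    size = start
    while size <= limit:
        try:
            try_render(size)    # raises on OOM
        except MemoryError:     # stand-in for the CUDA OOM exception
            break
        best = size
        size += step
    return best

# Example with a fake renderer whose memory "fits" up to 768px:
def fake_render(size):
    if size > 768:
        raise MemoryError("CUDA out of memory")

print(find_max_size(fake_render))  # -> 768
```

In practice you'd wire `try_render` to a real txt2img call and maybe cache the result, since each probe costs a full render.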
In a research context, they are used to 40GB/80GB hardware (and perhaps multiples of it) for training and rendering. So it's quite remarkable that it works on consumer hardware at this point.
edit: on second thought, they most likely rendered at 512px and then ran it through an upscaler model. I've been meaning to hook mine up but keep forgetting to try.
Nah, that's normal. It's why GPUs are the usual choice for AI. Any crappy, old, weak GPU with 4GB of memory will run circles around a CPU.
It's often easier to actually get models running on a CPU, thanks to simpler install configs and more available memory; it's just painful to get a result out of it. That may even be what keeps the install simple, since the CPU path isn't worth optimizing.
I'm not even sure it works well, if at all, with 4GB.
In any case, it's impressive even if it takes minutes. And it's not like you need to sit there while it works: you can queue up a list of prompts, let it do its thing, and check the results later.
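The unattended-batch idea can be sketched like this. The script path and flags here are hypothetical stand-ins; adapt them to whatever txt2img frontend you actually use:

```python
import subprocess

prompts = [
    "a lighthouse at dusk, oil painting",
    "a fox in a snowy forest, watercolor",
]

def make_command(prompt, index, outdir="out"):
    # Hypothetical CLI invocation; substitute your repo's real
    # script name and flag spelling.
    return [
        "python", "scripts/txt2img.py",
        "--prompt", prompt,
        "--outdir", f"{outdir}/{index:03d}",
    ]

for i, p in enumerate(prompts):
    cmd = make_command(p, i)
    # subprocess.run(cmd, check=True)  # uncomment to actually render
    print(" ".join(cmd))
```

Kick it off before bed and look through the output directories in the morning.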
Good to know it works. That's not a crap, old, weak GPU, I guess.
I haven't tried that size, but I did try 256x256 and it was too small to get interesting results. Maybe there are some parameters that can be adjusted to improve it, though.
How do you set up the model? The instructions only say "Download the model checkpoint. (e.g. from huggingface)", but I can't find instructions there on how to find a .ckpt file, nor exactly which file I should look for.
It takes 3 minutes (for a prompt producing a set of 4 images) on my 1080 as well. I'm really astonished that it takes GP about the same time using just a CPU. It seems the older generations of GPUs aren't much better than CPUs at ML workloads.
To add another data point: my GTX 1080 takes ~60 seconds to generate a pair of 500x500 images using txt2img. I haven't tried img2img yet, as the UI package I went with is a bit buggy with it.
I'm using lstein's repository, which loads the models once and keeps them in memory. With some samplers, 16 steps is enough to come up with a usable image (more steps will mostly just add detail and usually won't change much).
This is for the initial "exploration" step of the process. Once I like an image, I typically play with the settings, then in the final step run with a large number of steps (and maybe even use upscaling).
So: the default 512x512 size, 16 steps, and defaults for the rest of the settings (I believe 7.5 scale and 0.75 strength).
Having said that, I also tried the official Docker image for Stable Diffusion, and with the default values it generated an image in about 40 seconds.
I'm not sure about exact pricing, but I'd look for a used Maxwell (GeForce 900 series) NVIDIA GPU. A Quadro M2000 with 4GB of RAM was about $100 on eBay a short while ago.
https://github.com/hlky/stable-diffusion