It supports both txt2img and img2img. (Not affiliated.)
Edit: Incidentally, I tried running it on a CPU. It is possible, but it took 3 minutes instead of 10 seconds to produce an image. It also required me to hack up the script in a really gross way. Perhaps there is a script somewhere that properly supports this.
I do runs at 384px by 384px with a batch size of 1. The sampling method has almost no impact on memory. Using k_euler with 30 steps renders an image in 10 to 20 seconds. The biggest things that affect rendering speed are the step count and the resolution, so 512x512 with C 50 using ddim is much slower than 256x256 with C 25 using k_euler.
The sampling methods run on mostly the same timescales per step, but k_euler can produce viable output at lower C values, meaning it is faster than the rest in practice.
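As a rough back-of-the-envelope (assuming render time scales linearly with pixel count and step count, and that "C" here refers to the step count):

```python
def relative_cost(width, height, steps):
    """Rough work estimate: total pixels times sampling steps."""
    return width * height * steps

slow = relative_cost(512, 512, 50)  # 512x512 at 50 steps
fast = relative_cost(256, 256, 25)  # 256x256 at 25 steps
print(slow / fast)  # 8.0 -- the big render is roughly 8x the work
```

So halving the resolution and halving the steps together buys you roughly an 8x speedup, which matches what I see in practice.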
Don't add gfpgan to the same pipeline, as it takes more VRAM.
I'm running it on Windows 10 with latest drivers. I set the python process to Realtime priority in task manager (makes a slight difference!). Have not tried it on Linux.
A good way to find out (if it's not documented online) is to start at 512x512 if you have a card with 12GB VRAM, increment it (the UI slider increases it by 64px per step), and backtrack when you start getting "CUDA out of memory" errors. I've seen some renders on Discord where the sizes are well above 1000px, so they must have had 16/24GB cards or something similar.
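That increment-and-backtrack probe can be sketched like this (a hedged sketch only: `render` here is a stand-in for whatever actually runs your txt2img call, and I'm assuming the OOM surfaces as a RuntimeError, as it does with PyTorch):

```python
def find_max_size(render, start=512, step=64, limit=2048):
    """Probe resolutions upward until render() fails, then report
    the last size that worked. `render` is a placeholder callable."""
    size = start
    best = None
    while size <= limit:
        try:
            render(size)       # attempt a render at this resolution
            best = size        # it worked: remember it and go bigger
            size += step
        except RuntimeError:   # e.g. "CUDA out of memory"
            break              # backtrack: `best` is the safe maximum
    return best
```

In practice one successful render at a size doesn't guarantee every prompt fits, so leaving one step of headroom below the reported maximum is safer.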
In the research context, they are used to using 40GB/80GB hardware (and perhaps multiples) to train and render. So quite remarkable that it works on consumer hardware at this point.
edit: on second thought, they most likely rendered at 512px but then ran it through an upscaler model. I've been meaning to hook mine up but kinda forgot to try.
Nah that's normal. It's why GPUs are the usual thing for AI. Any crap, old, weak gpu with 4gb memory would run circles around a cpu
It's often easier to actually get models to run on a CPU, due to simpler install configs and more available memory. It's just painful to get a result out of it. That slowness might even help keep installs simple, because the CPU path isn't worth optimizing.
I’m not even sure it works well - if at all - with 4GB.
In any case, it’s impressive even if it takes minutes. And it’s not like you need to be there to make it work. You can create a list of prompts, let it do its thing and check the results later.
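A minimal sketch of that batching idea in Python (the script path and flag names here are hypothetical placeholders - every fork spells them differently, so check your repo's README):

```python
def build_commands(prompts, outdir="out"):
    """Build one txt2img command line per prompt.

    "scripts/txt2img.py", "--prompt" and "--outdir" are placeholders;
    substitute whatever your fork actually uses.
    """
    return [
        ["python", "scripts/txt2img.py",
         "--prompt", prompt,
         "--outdir", f"{outdir}/batch_{i:03d}"]
        for i, prompt in enumerate(prompts)
    ]

cmds = build_commands([
    "a watercolor painting of a lighthouse at dawn",
    "a photo of a corgi wearing a spacesuit",
])
print(len(cmds))  # 2
# Then kick them off before bed and check the outdirs in the morning:
# import subprocess
# for cmd in cmds:
#     subprocess.run(cmd, check=True)
```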
Good to know it works. That’s not a crap, old, weak gpu, I guess.
I’ve not tried that size but I tried 256x256 and it was too small to get interesting results - maybe there are some parameters that can be adjusted to improve it though.
How do you set up the model? Instructions only say "Download the model checkpoint. (e.g. from huggingface)", but I can't find instructions there on how to find a ckpt file, nor exactly what file should I look for.
Takes 3 minutes (for a prompt resulting in a set of 4 images) on my 1080 as well. Really astonished that it takes GP about the same time using just a CPU. Seems like the older generation of GPUs isn't much better than CPUs in regards to ML stuff.
To add another data point, my GTX1080 takes ~60 sec to generate a pair of 500x500 images using txt2img. Haven't tried img2img yet as the UI package I went with is a bit buggy with it
I'm using lstein's repository which loads the models and keeps them in memory. Then, for some diffusers 16 steps is enough to come up with a usable image (and using more steps will only add details, most of the time won't change much).
This is for the initial "exploration" step of the process. Once I like an image I typically play with the settings, then in the final step run with a large number of steps (and maybe even use upscaling).
So, default 512x512 size, 16 steps and default for the rest of the settings (I believe 7.5 scale, 0.75 strength).
Having said that, I also tried the official Docker image for stable diffusion and with the default values it generated an image in about 40 seconds.
Not sure of exact pricing, but I'd bet on a used Maxwell (GeForce 900 series) NVIDIA GPU. A Quadro M2000 with 4GB of RAM was about $100 on eBay a short while ago.
Not necessarily, GPU compute can be non-deterministic due to scheduling: the result of A+B+C is subtly different when it's (A+B)+C or when it's A+(B+C), which can get amplified in a long processing pipeline
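You can see the non-associativity even in plain Python floats; a GPU's parallel reductions just reorder sums like this at massive scale:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one summation order
right = a + (b + c)  # another order, as a parallel reduction might choose

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False -- same math, different rounding
```

Over thousands of such operations per pixel, those last-bit differences can accumulate into visibly different outputs.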
On the forked repo with the webui (https://github.com/hlky/stable-diffusion) it seems that the same inputs result in the same output, so maybe it has been resolved there?
old.reddit is truly horrible on mobile. Once you click on an image you can't go back. Off topic, but what is the other alternative UI called that people sometimes use?
I've been playing with this for a few hours. It's slow going -- you really need a fast GPU with a lot of RAM to make this very usable.
I ended up paying the $10 for Google Colab Pro and that's how I've been using this. Maybe I'll figure out how to get this working on my old 1080 TI to see if it's faster.
What I really wish was that the img2img tool could be used to take a text2img output and then "refine" it further. As it is, the img2img tool doesn't seem particularly great.
People on Reddit are talking about "I just generate 100 images and pick the best one"... but this is incredibly slow on the P100 GPU that Google has me on. Does this just require a monster GPU like a 3080/3090 in order to get any decent results?
Also how slow is your p100? I'm usually getting around 3 it/s. Maybe it's just because I'm used to disco diffusion where a single image took over an hour, but this is ungodly fast to me
FWIW I'm using an old gtx 1080 Ti to play around, it takes about 21 seconds per image. You can make it go even faster by lowering the timesteps taken from the default 50 (--ddim_steps), though both lowering and raising the value can result in quite different first-iteration images (though they tend to be similar) and seems to guarantee totally different further iteration images (as counted by --n_iter)... I'm with you on the feeling that it's hard to control, whether in refinement or in other ways, but I suspect that'll get a lot better in the next couple years (if not weeks or dare I say days).
You're probably using the default PLMS sampler with 50 steps. There are better samplers; the best seem to be Euler (more predictable with regard to the number of steps) and Euler ancestral (gives more variation). Both typically need far fewer steps to converge, speeding up generation.
HuggingFace is a company that mainly builds open-source libraries and platforms to support open-source ML projects. They started out with their famous Transformers library and maintain many others, including the diffusers library that this application is actually using. They also have a model/dataset hub and an interactive application platform known as "Spaces". Their goal is to be the "GitHub of machine learning".
Their business model is basically supporting enterprise and private use-cases. For example, getting expert support for using these libraries, or hosting models and datasets privately. You can see more information about the pricing here:
https://huggingface.co/pricing
They reached a $2 billion valuation after a recent round of funding so overall they're probably pretty flush with cash lol
complaining about the price of getting their images done. So part of it may actually be exchanging money for a service. I bet a good chunk of it is investor cash though.
I've been trying to get some sensible images out of my descriptions, but I fail miserably.
In this case I had the prompt "cow chewing bone" with 4 squares representing the two pairs of feet, the body, and the head. None of them cared about chewing on a bone.
With DALL·E 2 I tried to get an image of a little girl building sandcastles and a monster threatening her:
"little scared girl building a sandcastle and a big angry monster is looking at her."
"little scared girl building a sandcastle six damaged sandcastles are to her side. a big angry monster is threatening her. it is dark." https://imgur.com/a/f5FFKOi
"little scared girl building a sandcastle with six damaged sandcastles to her side and a big angry monster threatening her"
Is there some kind of structure the sentences should follow?
Yes, check out examples on lexica or use a prompt builder to help, like promptmania.
Also, most of the good ones you see online are cherry-picked from hundreds of runs, so set your batch size to 1000 and go to bed! After that, people tend to run some of the good results through img2img, also with a lot of variations produced from a single image. Finally, some people run them at higher resolutions if they have enough VRAM, as smaller resolutions can distort or generate rubbish. For the messed-up faces, they run the image through gfpgan a few times to get prettier faces. Other than that, it is pure luck (using random seeds) to figure out what works and what doesn't. Use the two sites above to help you improve your prompts.
Just know that if you often let it run overnight, you will see it on your electricity bill. My GTX 1660 runs at max while rendering, which is 125W. Leaving it running overnight can easily eat 2 to 6 kWh, depending on your system.
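Rough math, using the 125W figure above and treating the total system draw as an assumption you'd measure for your own box:

```python
def energy_kwh(watts, hours):
    """Energy drawn by a constant load, in kilowatt-hours."""
    return watts * hours / 1000

print(energy_kwh(125, 8))  # 1.0 kWh for the GPU alone over an 8-hour night
print(energy_kwh(400, 8))  # 3.2 kWh if the whole system pulls ~400W
```

Multiply by your local price per kWh to see what an overnight batch actually costs.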
I managed to get one that was correct with "A little scared girl is building a sandcastle, while a monster is looking at her. Award-winning photograph.", but I couldn't figure out a phrasing where it wouldn't most of the time get confused thinking that the sandcastle is the monster, or that the girl is the monster.
DALL·E is bad at being instructed to have an exact count of items in the picture. Ask for 6 kittens and you get 7, and each kitten will be much more "wrong" than a picture of a single kitten.
DALL·E is also bad at positional prompts. Ask for something to be in the top right-hand corner and it will appear bottom centre.
Thank you, but this notebook appears to use images generated from text prompts. I was interested in interpolating between two given images, without generating from text.
You would need to run CLIP and generate CLIP embeddings from the images first, then feed them into the samplers. If you give it the starting images and caption them yourself, then it would also work.
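For the interpolation step itself, spherical interpolation (slerp) between the two embeddings is the usual choice over plain lerp, since these embeddings live roughly on a hypersphere. A minimal pure-Python sketch (the vectors here are tiny stand-ins for real CLIP embeddings):

```python
import math

def slerp(t, v0, v1):
    """Spherical interpolation between two vectors at fraction t in [0, 1]."""
    dot = sum(a * b for a, b in zip(v0, v1))
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    # Clamp to avoid math.acos domain errors from rounding.
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)
    if omega < 1e-8:  # vectors nearly parallel: fall back to plain lerp
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# Endpoints reproduce the inputs; t=0.5 lands between them.
print(slerp(0.0, [1.0, 0.0], [0.0, 1.0]))  # [1.0, 0.0]
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))  # [0.707..., 0.707...]
```

Sweeping t from 0 to 1 over the two image embeddings and decoding each intermediate point gives the interpolation frames.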
Seems like if you draw something and hit start, but get the queue error, you lose the image prompt you drew as the frame is replaced with the first steps of diffusion noise? None of the results have a composition anywhere close to my image prompt, presumably because I can't get a run on first attempt.
I'm not quite grasping how to use this. I tried uploading a photograph and erasing part of it. But instead of painting in the erased portion, it left the erased area blank and replaced my photograph with an entirely new image.
There you go. We were promised quite far-fetched things like 'flying cars', 'time travel', 'life extension' and 'universal basic income'.
Instead we get deep learning AIs being used for generating and faking sentences, images, videos, voices, code and digital art, all trained on mountains of data in data centers, all significantly contributing to the burning up of the planet, to no benefit and with no efficient alternatives.
A dystopian future with deepfakes and easy fake news creation thanks to the technologists who helped create those things for others to create a new griftopia on top of that.
So you will own nothing, believe everything that you see on the internet, and be very happy.
If we had time travel, they’d say now someone can kill our granddads and rewrite history. If we had flying cars, they’d say now our flats are worthless because everyone can fly by and gaze. If we had universal basic income, they’d say something about it too.
It’s really nice that we only got AIs that draw pictures and bitcoin that helps with stolen money.
Tune out the news and look for a futurism themed news source[0], and you can get your techno-optimistic pipe dreams back, right now. What you describe as "we were promised" and "we got" is just you seeing a bit more of the world. It was filled with empty promises and horrors in the beforetimes too, today's flavor is nothing special - just in its specifics for today's culture. And there's also plenty of people doing great things in the past too, perhaps not promising those things, just living their life doing something they believe in, and getting results.
My point being, there's enough good things and bad things to fit any mood and worldview. Everyone can basically pick as they'd like.
You must be kidding. It's hard labor creating images, and you are complaining that it's going to be easier in the future? You are complaining that the skill gap is eradicated and it now only depends on your ideas to produce compelling work? That's absolutely insane. Everything that takes labor-intensive tasks and makes them basically free is the future. This is not dystopian, it's our future: a future in which you and your capabilities don't matter. It's exactly the future we need, and doing it open source is exactly the way to keep it out of harm's way of turbo capitalism. You are missing the point so hard it's not even funny. Flying cars for the 1% is dystopian.
The skill gap was what made it valuable. There is not that much value in flooding everyone with generated images that have no story, no person behind it, no effort, no reason to exist, out of place. “Art” is as much about the final image as it is about the person who made it, why she made it and how she got there.
Also there is a pretty dystopian angle, because these tools blatantly stole all the work of artists by "learning" on their work (without permission). Calling it learning is too nice; we all know it's just a huge visual pattern copy-mashup-paste machine. People are not going to invent with this - they just add the name of an artist they like to the prompt and get to call that their work.
It's not liberation - it's exploitation. It's going to destroy people's lives by making their hard-earned skills obsolete.
Then again, it probably won't be so bad. Artists won't disappear, and people won't suddenly become artists just because they can type into a prompt.
You in 1894: Why does no one think about all the horses and stable boys.
The value of art is also not in the hours you put into creating it; it's about the idea behind it and how it's conveyed. Artists are also always stealing, especially concept artists - the only thing to say is "photobashing". You talk like someone who has absolutely no idea how the sausage is made.
Yes, yes, and the invention of the camera will kill painters. Sure, everybody knows this.
But there is an important difference between someone stealing and an algorithm copy-replicating anything from the past in an instant. One is a remix that brings something new (even if the author doesn't want it); the other is static, it's conservation. It will create side effects that will impact our (visual) culture. But who knows what those will be.
Not sure why I deserve so much toxicity. We obviously work in different fields and have different views of what art actually is. No, I am not happy to jump on the AI bandwagon that will surely mean an even faster pace and more precarious working conditions, because individuals surely won't reap the benefits of automation. It's easy not to care when you are not affected.
None of that means that I deserve the hate. It's just a different opinion.
Is there a way to run Stable Diffusion in C++ only? I would like to build the program statically. I have been searching on GitHub but so far have been unable to find anyone who has ported it.
I typed in "space nerds in space fighting the zarkons", and it gave me a picture that looked similar to NASA's astronaut group photos, with 3 astronauts, 2 human, and one 3-eyed ninja-turtle-looking alien in a spacesuit. Behind them was a space scene with a huge, earthlike planet. The faces of the two human astronauts looked eerily similar to the face of a friend of mine, a friend who happens to be the 2nd most prolific contributor of code to my open source game called "Space Nerds in Space". One of the human astronauts was bald, the other had a mullet, and the long portion of their hair was not entirely unlike my own hair.
Why were you disappointed? I haven't tried that with img2img, but as just a regular text prompt without any fancy prompt engineering I get results in line with what I'd expect (kind of a Hugo Boss/Cowboy crossover, e.g. this: https://imgur.com/a/gcKi3WG)
I'm one of the maintainers (in charge of the UI) for hlky webui repo!
And we just updated our own colab; you can find it here:
https://github.com/altryne/sd-webui-colab
I'm not gonna lie, I'm quite disappointed by the willful obstinance on HN whenever there is a discussion of the anti-NSFW or anti-racism filters present in today's artificial intelligence research.
First, the filters are only on the interface. The research is public. That's why there has been an explosion of new implementations. You are welcome to run the code yourself and make the horniest model you like.
But more importantly, this comes up every time, and everyone acts SO confused. But why? Artificial Intelligence is MASSIVE mainstream news. New advancements are arriving daily.
Do you want the news stories to be dominated by whatever heinous shit some bad-faith giggly teenager with a call of the void to offend as much as possible (I know, I was one) is able to generate?
Do you want the discourse to be dominated with "Won't someone think of the children?" blocking legitimate research and progress?
Europeans always like to dunk on Americans for being "prudes", but only a tiny section of Western Europe is progressive enough to not mind random nipples on their bus ads and television. All of Eastern Europe, Africa, India, and China are culturally still fairly conservative about sex - at least out in the open.
People think that preventing racist or sexual use of these models is indulging the prudish mores of America. But that itself is a very narrow perspective that ignores the perspectives of billions of people in the world.
I can generate many violent prompts right now, and the model knows the face of hundreds of celebrities (you would think they would garble those to prevent PR disasters). It knows what the 9/11 attack was, it knows about nuclear bombs, horror, gore. Yet somehow it doesn't know what a blowjob is. This is 100% about narrow-minded, prudish conservative values. And yes, most of them come from the US right now, because these models are not being trained in Africa, India or China.
Your argument is entirely descriptive. We know why they do this. What should be argued here is how stupid and sad it is to block progress due to nothing more than religious or moral values.
Tell me, earnestly, what progress is being blocked by your inability to use AI to generate an image of a blowjob.
I don't mean this in a moralistic sense. I have no qualms about images of blowjobs. I have no doubt that the porn industry has already deployed these models and is experimenting with this without filters. As a monetizable way to reduce the human costs, it's an entirely logical step, and adult performers should be as nervous as artists, and talking to lawyers.
It's funny because you spell out a use case that can represent undeniable progress: artificial porn. Porn with no component of human suffering, yet realistic and personalized. A dream product, no doubt. By what metric is this not progress, except a puritanical/religious one?
Moreover, sexuality is perhaps the most important theme in art, historically. It is a very legitimate thing to want to include in a tool such as this one. Its censorship is akin to "moral codes" of the past, completely regressive.
I'm not as sure that artificial porn is undeniable progress.
While I am entirely pro-liberty in terms of consumption of pornography, it is not a replacement for human interaction, and it can be used in conjunction with other tools to dehumanize and isolate individuals - some of whom then make negative contributions to the rest of society at large. These are not isolated incidents, and they've been rising: https://en.wikipedia.org/wiki/Misogynist_terrorism
There are many kinds of porn - some that exploit women, some that empower them in their production. And there are also many kinds of products - ones that focus on a human connection (whether vanilla or kinky) and ones that don't.
I would absolutely worry that artificial porn would be good enough to meet some of these demands but not most, and would ultimately be a net negative for society.
> Moreover, sexuality is perhaps the most important theme in art, historically.
I mean, it's demonstrably not, religious art imagery dominates by volume. But that's also not necessarily important - that's just who happened to be patrons of arts and had the means to commission them.
I'm not gonna deny that humans are horny and want to make lots of sexy art.
But you're also not going to get porn on basic cable. You have to seek that out extra for yourself, and that's going to be the case with AI-generated porn art also.
I think people can both understand the motives of the companies and even choose to do the same themselves, but still be disappointed in the state of society where this is required.
I agree with you that this probably is required while the tech is new and then eventually won't be when everyone and their dog can run these things locally on their phone.
>Do you want the news stories to be dominated by whatever heinous shit some bad-faith giggly teenager with a call of the void to offend as much as possible (I know, I was one) is able to generate?
It’s not the only alternative. It’s the alternative that will absolutely happen eventually - but later, and to a lesser degree, because of all the crude, annoying, imperfect filters that have been holding it back so far.
Stable Diffusion was released for free and open source with a totally unenforceable license (don't do anything illegal or unethical). People right now are likely generating the most heinous of things. Some piece-of-shit bully is almost certainly, as we speak, taking pictures of their target and making the worst compromising, cruel thing imaginable.
That discussion is over. You shouldn't care about a company trying to filter out bad shit on their own platform.
If you don't have anti-NSFW stuff your service will generate 95% porn as users who want it flock to you. No one wants a reputation as the porn AI (although I'm sure it will exist eventually).
I think of this like a "first post" filter, to cut down on the noise. Drawing dicks on things is something a lot of kids do. You want the trolls to have to work a little.
It's interesting that the people who write things like "look at this stream of words, they have to be coming from a sentient being" do not seem to care much about the "intelligence" generating these images.
https://github.com/hlky/stable-diffusion