Try Stable Diffusion's Img2Img Mode (huggingface.co)
415 points by fragmede on Aug 29, 2022 | hide | past | favorite | 156 comments


If you have a GPU with >4GB of VRAM and you want to run this locally, here's a fork of the Stable Diffusion repo with a convenient web UI:

https://github.com/hlky/stable-diffusion

It supports both txt2img and img2img. (Not affiliated.)

Edit: Incidentally, I tried running it on a CPU. It is possible, but it took 3 minutes instead of 10 seconds to produce an image. It also required me to hack up the script in a really gross way. Perhaps there is a script somewhere that properly supports this.


There are also forks for the GPU in Apple’s M1 chips:

https://github.com/magnusviri/stable-diffusion


Does anyone know how fast this runs on an M1 MacBook Air?


Takes anywhere from 30s-1.5m on my M1 Max.


Nvidia GTX 1660 Super with 6GB of VRAM.

I do runs at 384px by 384px, with a batch size of 1. The sampling method has almost no impact on memory. Using k_euler with 30 steps renders an image in 10 to 20 seconds. The biggest things that affect rendering speed are the step count and the resolution, so 512x512 at C=50 using ddim is much slower than 256x256 at C=25 using k_euler.

The sampling methods mostly take the same time per step, but k_euler can produce viable output at lower C values, meaning it is faster than the rest in practice.
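As a rough mental model (my own back-of-the-envelope, not anything from the repo), render time scales roughly linearly with both the step count and the number of pixels:

```python
def relative_cost(width, height, steps, base=(512, 512, 50)):
    """Rough relative render cost vs. a 512x512/50-step baseline:
    time scales ~linearly with step count and with pixel count."""
    bw, bh, bs = base
    return (width * height * steps) / (bw * bh * bs)

# 256x256 at 25 steps vs. 512x512 at 50 steps:
print(relative_cost(256, 256, 25))  # 0.125 -> roughly 8x faster
```

This is only a heuristic; real timings also depend on the sampler and memory bandwidth, but it matches the observation above.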

Don't add GFPGAN to the same pipeline, as it takes more VRAM.

I'm running it on Windows 10 with latest drivers. I set the python process to Realtime priority in task manager (makes a slight difference!). Have not tried it on Linux.


I'm running 1660 ti on Windows 11.

I'm thinking about getting a 3090 so that I can make higher resolution images.

GFPGAN runs much faster for me: 5 seconds per picture.


What resolution could you get on a 3090?


A good way to find out (if it's not posted online) is to start at 512x512 if you have a card with 12GB of VRAM, increment it (the UI slider increases it by 64px per step), and backtrack when you start getting "CUDA out of memory" errors. I've seen some renders on Discord well above 1000px, so those must have come from 16/24GB cards or something similar. In a research context, people are used to 40GB/80GB hardware (and perhaps multiples) for training and rendering, so it's quite remarkable that this works on consumer hardware at all.

edit: on second thought, they most likely rendered at 512px but then ran it through an upscaler model. I've been meaning to hook mine up but kinda forgot to try.
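The trial-and-error search described above could be sketched like this; `render` here is a hypothetical stand-in for an actual txt2img call, not a real API:

```python
def find_max_size(render, start=512, step=64, limit=2048):
    """Increase the square render size in 64px increments until the
    stand-in render() raises an out-of-memory error, then back off.

    `render` should raise RuntimeError on failure, mirroring PyTorch's
    "CUDA out of memory" RuntimeError."""
    best = None
    size = start
    while size <= limit:
        try:
            render(size, size)
            best = size       # this size fit in VRAM
            size += step      # try the next slider increment
        except RuntimeError:  # e.g. "CUDA out of memory"
            break             # backtrack: keep the last size that worked
    return best
```

For example, against a card that runs out of memory above 768px, `find_max_size` would return 768.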


No idea, things are changing very quickly now


I'm impressed that running it on CPU only made it ~20x slower. How did you do it?


Nah, that's normal; it's why GPUs are the usual choice for AI. Any crap, old, weak GPU with 4GB of memory would run circles around a CPU.

It's often easier to actually get models to run on CPU, due to simpler install configs and more available memory; it's just painful to wait for a result. Which might help keep the install simple, because it's not even worth optimizing.


> Any crap, old, weak GPU with 4GB of memory would run circles around a CPU

Not really.

https://news.ycombinator.com/item?id=32635086

I’m not even sure it works well - if at all - with 4GB.

In any case, it’s impressive even if it takes minutes. And it’s not like you need to be there to make it work. You can create a list of prompts, let it do its thing and check the results later.


I have 4GB and it takes about 9 seconds for me :) (Tho at 448x448, but there’s no real difference in quality.)


Good to know it works. That’s not a crap, old, weak gpu, I guess.

I’ve not tried that size but I tried 256x256 and it was too small to get interesting results - maybe there are some parameters that can be adjusted to improve it though.


The GP is asking why it's only 20x slower, rather than slower still.


Ha yep I see that now, on a reread. My bad


It's OK, I missed the "only" the first time around and read it the same way as you.


Just tried it on Ubuntu 22.04. And it's working! Had to install python-is-python3 and conda, but it's up now. Fun, thanks.


How do you set up the model? Instructions only say "Download the model checkpoint. (e.g. from huggingface)", but I can't find instructions there on how to find a ckpt file, nor exactly what file should I look for.



Thank you, this saved me a lot of frustration.


Can you share your script for running on CPU?


python optimizedSD/optimized_txt2img.py --device cpu --precision full --prompt "…" --H 512 --W 512 --n_iter 1 --n_samples 1 --ddim_steps 50


Thank you. There was also a related submission for running on CPU: https://news.ycombinator.com/item?id=32642255

I'm planning to try it out this weekend. But not really hopeful. I only have 8GB ram, and mine is a cheap Intel CPU that doesn't even have AVX.


What kind of GPU do you have? It takes several minutes to produce an image on my 1070.


Takes 3 minutes (for a prompt resulting in a set of 4 images) on my 1080 as well. Really astonished that it takes GP about the same time using just a CPU. Seems like the older generation of GPUs isn't much better than CPUs in regards to ML stuff.


> Really astonished that it takes GP about the same time using just a CPU.

GP talks about generating a single image while you talk about generating 4.


A 1070 isn’t very powerful for ML compared to more recent GPUs, so several minutes sounds about right.


To add another data point, my GTX1080 takes ~60 sec to generate a pair of 500x500 images using txt2img. Haven't tried img2img yet as the UI package I went with is a bit buggy with it


Also on a 1070, I can generate an image in ~15 seconds; surely you're doing something wrong.


What settings are you using, if the people with 1080s above you are taking 3 minutes?


I'm using lstein's repository which loads the models and keeps them in memory. Then, for some diffusers 16 steps is enough to come up with a usable image (and using more steps will only add details, most of the time won't change much).

This is for the initial "exploration" step of the process. Once I like an image I typically play with the settings, then in the final step run with a large number of steps (and maybe even use upscaling).

So, default 512x512 size, 16 steps and default for the rest of the settings (I believe 7.5 scale, 0.75 strength).

Having said that, I also tried the official Docker image for stable diffusion and with the default values it generated an image in about 40 seconds.


What are good and reasonably priced GPUs for this (<$250, possibly less)?


Not sure about exact pricing, but I'd look for a used Maxwell (GeForce 900 series) NVIDIA GPU. A Quadro M2000 with 4GB of RAM was about $100 on eBay a short while ago.


Thanks! A Quadro M2000 can still be found for around $100 today, but I found an M4000 for $150 and ordered that... We'll see...


Amazing. Anyone know of a fork where this is hosted on a cloud GPU? Or any existing hosting of this?


There’s a Colab implementation (Google hosted GPU) linked to from https://www.youtube.com/watch?v=Xur1JeRjjOI


Here is a serverless GPU template for Stable Diffusion hosted on Banana's cloud platform. Template: https://github.com/bananaml/serverless-template-stable-diffu... Setup demo: https://www.banana.dev/blog/how-to-deploy-stable-diffusion-t...


Got this working on my 8gb 3070. 7-8s per image with default settings. Thanks for posting this!



If I ran the same input image through several times, would it produce the same output?


if you use the same parameters (e.g. image, seed, noise strength, guidance scale, sample count - which are not exposed on this UI), yes.


Not necessarily: GPU compute can be non-deterministic due to scheduling. The result of A+B+C is subtly different when computed as (A+B)+C versus A+(B+C), and those differences can get amplified in a long processing pipeline.
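This is easy to demonstrate in plain Python, since IEEE-754 floating-point addition itself is not associative:

```python
# Floating-point addition is not associative, which is one reason
# parallel GPU reductions can differ run to run when the summation
# order isn't fixed by the scheduler.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)      # False
print(abs(left - right))  # tiny, but nonzero
```

Each individual discrepancy is on the order of one ulp, but a diffusion run performs billions of such operations, so the outputs can drift visibly.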


Depends. The default script uses a non-deterministic stochastic encode for img2img. That would need to be modified as far as I know.


On the forked repo with the webui (https://github.com/hlky/stable-diffusion) it seems that the same inputs result in the same output, so maybe it has been resolved there?


old.reddit is truly horrible on mobile. Once you click on an image you can't go back. Off topic, but what is the other alternative UI called that people sometimes use?



I use i.reddit.com on mobile.


I use the Slide for Reddit app (one of many choices) and links open automatically because of the app's URL handler.


Hello everyone, I'm Mishig, one of the engineers who worked on setting up the demo. Happy to answer any questions if you have some :)

You can find the announcement tweet here: https://twitter.com/mishig25/status/1563226161924407298?s=20...


Just wanted to say thanks. It’s really cool!


NOTE: for some reason this is NOT using huggingface for inference, resulting in huge queues, slow performance, etc. It is sending requests to https://sdb.pcuenca.net/i2i (https://huggingface.co/spaces/huggingface/diffuse-the-rest/b...)


I've been playing with this for a few hours. It's slow going -- you really need a fast GPU with a lot of RAM to make this very usable.

I ended up paying the $10 for Google Colab Pro and that's how I've been using this. Maybe I'll figure out how to get this working on my old 1080 TI to see if it's faster.

Anyway, for the one that I'm using which has a web UI, you can use this Colab link. It's pretty great! https://colab.research.google.com/drive/1KeNq05lji7p-WDS2BL-...

What I really wish was that the img2img tool could be used to take a text2img output and then "refine" it further. As it is, the img2img tool doesn't seem particularly great.

People on Reddit are talking about "I just generate 100 images and pick the best one"... but this is incredibly slow on the P100 GPU that Google has me on. Does this just require a monster GPU like a 3080/3090 in order to get any decent results?


You can feed a txt2img output into the img2img pipeline as an init, that's something that I do quite often, eg https://twitter.com/SteWaterman/status/1563872748161613826

Also how slow is your p100? I'm usually getting around 3 it/s. Maybe it's just because I'm used to disco diffusion where a single image took over an hour, but this is ungodly fast to me


Colab can feel slow for other reasons, such as throttled download speeds making it very slow to download weights on a cold boot.


FWIW I'm using an old GTX 1080 Ti to play around; it takes about 21 seconds per image. You can make it go even faster by lowering the number of timesteps from the default 50 (--ddim_steps), though both lowering and raising the value can result in quite different first-iteration images (they tend to be similar), and it seems to guarantee totally different further iterations (as counted by --n_iter)... I'm with you on the feeling that it's hard to control, whether in refinement or otherwise, but I suspect that'll get a lot better in the next couple of years (if not weeks, or dare I say days).


With a 3090 it's about 10s to generate a 512x512 image from another, maybe less.


You're probably using the default PLMS sampler with 50 steps. There are better samplers; the best seem to be Euler (more predictable with regard to the number of steps) and Euler ancestral (gives more variation). Both typically need far fewer steps to converge, speeding up generation.


My understanding was that PLMS was the current state-of-the-art. Would be interested if you have a citation for this "Euler" sampling method.


Yes I'm using PLMS. Thanks for the tip, will try.


What's huggingface's business model? How do they pay for all this compute and (apparently) 140+ employees?


HuggingFace is a company that mainly builds open-source libraries and platforms to support open-source ML projects. They started out with their famous Transformers library and have many other libraries, including the diffusers library that this application is actually using. They also have a model/dataset hub and an interactive application platform known as "Spaces". Their goal is to be the "GitHub of machine learning".

Their business model is basically supporting enterprise and private use-cases. For example, getting expert support for using these libraries, or hosting models and datasets privately. You can see more information about the pricing here: https://huggingface.co/pricing

They reached a $2 billion valuation after a recent round of funding so overall they're probably pretty flush with cash lol


They are building a model of what everyone else is typing into these open models.


Aren't we all?


I did see a couple of people in the Reddit thread posted here

https://news.ycombinator.com/item?id=32634139

complaining about the price of getting their images done. So part of it may actually be exchanging money for a service. I bet a good chunk of it is investor cash, though.


You can also try it out on Replicate: https://replicate.com/stability-ai/stable-diffusion


Note that it eventually asks for your credit card, it's not indefinitely free like the linked site.



I've been trying to get some sensible images out of my descriptions, but I fail miserably.

In this case I had the prompt "cow chewing bone" with four squares representing the two pairs of feet, the body, and the head. None of the results cared about chewing on a bone.

With DALL·E 2 I tried to get an image of a little girl building sandcastles and a monster threatening her:

"little scared girl building a sandcastle and a big angry monster is looking at her."

"little scared girl building a sandcastle six damaged sandcastles are to her side. a big angry monster is threatening her. it is dark." https://imgur.com/a/f5FFKOi

"little scared girl building a sandcastle with six damaged sandcastles to her side and a big angry monster threatening her"

Is there some kind of structure the sentences should follow?


Yes, check out examples on lexica or use a prompt builder to help, like promptmania.

Also, most of the good ones you see online are cherry-picked from hundreds of runs, so set your batch size to 1000 and go to bed! After that, people tend to run some of the good results through img2img, also with a lot of variations produced from a single image. Finally, some people run them at higher resolutions if they have enough VRAM, as smaller resolutions can distort or generate rubbish. For the messed-up faces, they run it through GFPGAN a few times to get prettier faces. Other than that, it is pure luck (using random seeds) to figure out what works and what doesn't. Use the two sites above to help you improve your prompts.

(meant in the context of stable diffusion)


Just know that if you often let it run overnight, you will see it on your electricity bill. My GTX 1660 runs at max while rendering, which is 125W. Leaving the whole system running overnight can easily eat 2 to 6 kWh, depending on your setup.
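Quick sanity check on those numbers, assuming a whole-system draw somewhere in the 250-750 W range over an 8-hour night (the wattage range is my assumption, not measured):

```python
def overnight_kwh(watts, hours=8):
    """Energy used by a component drawing `watts` for `hours` hours."""
    return watts * hours / 1000

# The GPU alone at 125 W over an 8-hour night:
print(overnight_kwh(125))  # 1.0 kWh
# A whole system drawing 250-750 W lands in the quoted 2-6 kWh range:
print(overnight_kwh(250), overnight_kwh(750))  # 2.0 6.0
```

At typical residential rates that is on the order of tens of cents to a couple of dollars per night, so noticeable if done routinely.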


I managed to get one that was correct with "A little scared girl is building a sandcastle, while a monster is looking at her. Award-winning photograph.", but I couldn't figure out a phrasing where it wouldn't most of the time get confused thinking that the sandcastle is the monster, or that the girl is the monster.


DALL·E is bad at being instructed to produce an exact count of items in the picture. Ask for 6 kittens and you get 7, and each kitten will be much more "wrong" than a picture of a single kitten.

DALL·E is also bad at positional prompts. Ask for something to be in the top right-hand corner and it will appear bottom centre.


It only produces colored noise for me. What am I doing wrong?


Wait longer.

That’s not the “incremental diffusion preview” it’s the “waiting in a queue” preview.


Is your resolution set below 512x512?


Is anyone working on a variation of this where the output is a blender scene?


I am very interested in the possibilities of generating 3D models as well.

There are a fair number of 3D models out there, so it should be possible.

I suspect we will end up using multiple 2D images at different angles to generate a 3D model. I have seen this done before.



Why does the button say "diffuse the f rest"? What is the f for?


It's a play on the "draw the rest of the fucking owl" meme.


Diffuse the fucking rest, I would assume.


Looks pretty hard to get it to run. Queue full every time.


Skip the queue and run the colab version instead: https://www.youtube.com/watch?v=Xur1JeRjjOI


You can also run it on Replicate without a queue: https://replicate.com/stability-ai/stable-diffusion (set “init_image”)


It is super easy to install and use locally: clone repo, create conda env, run.

There are a number of forks and auxiliary repos available that add a UI, reduce memory requirements, etc.


Is there an interpolate mode for Stable Diffusion? (Interpolate between two images without using a text prompt.)

Here are DALL-E 2 interpolations: https://twitter.com/model_mechanic/status/151297688118364569...


One of the first things that came out when the model was released.

https://github.com/schmidtdominik/stablediffusion-interpolat...


Thank you, but this notebook appears to use images generated from text prompts. I was interested in interpolating between two given images, without generating from text.


You would need to run CLIP and generate CLIP embeddings from the images first, then feed them into the samplers. If you give it the starting images and caption them yourself, then it would also work.
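For the blending step itself, interpolation between embeddings or latents is commonly done with spherical linear interpolation (slerp) rather than a straight lerp, so intermediate vectors keep a sensible magnitude. A minimal plain-list sketch (illustrative, not the linked repo's implementation; real code would use tensors):

```python
import math

def slerp(t, v0, v1):
    """Spherical linear interpolation between two embedding vectors,
    t in [0, 1]. Falls back to plain lerp when the vectors are nearly
    parallel (where the slerp formula degenerates)."""
    dot = sum(a * b for a, b in zip(v0, v1))
    norm = math.sqrt(sum(a * a for a in v0)) * math.sqrt(sum(b * b for b in v1))
    omega = math.acos(max(-1.0, min(1.0, dot / norm)))  # angle between vectors
    if omega < 1e-6:  # nearly parallel: lerp is fine
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    so = math.sin(omega)
    return [
        (math.sin((1 - t) * omega) / so) * a + (math.sin(t * omega) / so) * b
        for a, b in zip(v0, v1)
    ]
```

Stepping t from 0 to 1 over a series of frames gives the smooth interpolation videos people post.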


Seems like if you draw something and hit start but get the queue error, you lose the image prompt you drew, as the frame is replaced with the first steps of diffusion noise? None of the results have a composition anywhere close to my image prompt, presumably because I can't get a run on the first attempt.


I'm not quite grasping how to use this. I tried uploading a photograph and erasing part of it. But instead of painting in the erased portion, it left the erased area blank and replaced my photograph with an entirely new image.


What you tried is called inpainting. Img2Img needs a rough sketch.


I guess documentation and clarity of UX design are not big strengths of Hugging Face.


It's like GitHub Pages - the repo owner is responsible for the UI: https://huggingface.co/spaces/huggingface/diffuse-the-rest/t...


They haven't built the ML model for that due to paucity of training data.


I thought it was a joke when I saw the random pixels.

Tried again after this post made the front page, and it's actually working (and relatively fast) - the loading screen is just misleading...


Can't help but see this as a friendly herald for a dystopian future.


There you go. We were promised quite far fetched things like 'flying cars', 'time travel', 'life extension' and 'universal basic income'.

Instead we get deep learning AIs being used for generating and faking sentences, images, videos, voices, code and digital art, all trained on mountains of data in data centers, all significantly contributing to the burning-up of the planet, to no benefit and with no efficient alternatives.

A dystopian future with deepfakes and easy fake news creation thanks to the technologists who helped create those things for others to create a new griftopia on top of that.

So you will own nothing, believe everything that you see on the internet, and be very happy.


People are too pessimistic.

If we had time travel, they’d say now someone can kill our granddads and rewrite history. If we had flying cars, they’d say now our flats are worthless because everyone can fly by and gaze. If we had universal basic income, they’d say something about it too.

It’s really nice that we only got AIs that draws pictures and bitcoin that helps with stolen money.


Tune out the news and look for a futurism themed news source[0], and you can get your techno-optimistic pipe dreams back, right now. What you describe as "we were promised" and "we got" is just you seeing a bit more of the world. It was filled with empty promises and horrors in the beforetimes too, today's flavor is nothing special - just in its specifics for today's culture. And there's also plenty of people doing great things in the past too, perhaps not promising those things, just living their life doing something they believe in, and getting results.

My point being, there's enough good things and bad things to fit any mood and worldview. Everyone can basically pick as they'd like.

[0] something like this https://www.reddit.com/r/Futurology/


You must be kidding. It’s hard labor creating images, and you are complaining that it’s going to be easier in the future? You are complaining that the skill gap is eradicated and it now only depends on your ideas to produce compelling work? That’s absolutely insane. Everything that takes labor-intensive tasks and makes them basically free is the future. This is not dystopian, it’s our future: a future in which you and your capabilities don’t matter. It’s exactly the future we need, and doing it open source is exactly the way to keep it out of harm’s way of turbo-capitalism. You are missing the point so hard it’s not even funny. Flying cars for the 1% are dystopian.


The skill gap was what made it valuable. There is not that much value in flooding everyone with generated images that have no story, no person behind them, no effort, and no reason to exist. “Art” is as much about the final image as it is about the person who made it, why she made it, and how she got there.

There is also a pretty dystopian angle, because these tools blatantly stole all the work of artists by “learning” on their work (without permission). Calling it learning is too nice; we all know it's just a huge visual pattern copy-mashup-paste machine. People are not going to invent with this - they'll just add the name of an artist they like to the prompt and get to call it their work.

It's not liberation - it's exploitation. It's going to destroy people's lives by making their hard-earned skills obsolete.

Then again, it probably won't be so bad. Artists won't disappear, and people won't suddenly become artists just because they can type into a prompt.


You in 1894: "Why does no one think about all the horses and stable boys?" The value of art is also not in the hours you put into creating it; it's about the idea behind it and how it's conveyed. Artists are also always stealing, especially concept artists - the only thing to say is "photobashing". You talk like someone who has absolutely no idea how the sausage is made.


Yes, yes, and the invention of the camera will kill painters. Sure, everybody knows this.

But there is an important difference between someone stealing and some algorithm copy-replicating anything from the past in an instant. One is a remix that brings something new (even if the author doesn't want it); the other is static - it's conservation. It will create side effects that will impact our (visual) culture. But who knows what those will be.


You must have missed the last 50 years with that much fear of change. Yikes.


Not sure why I deserve so much toxicity. We obviously work in different fields and have different views of what art actually is. No, I am not happy to jump on the AI bandwagon that will surely mean an even faster pace and more precarious working conditions, because individuals surely won't reap the benefits of automation. It's easy not to care when you are not affected.

None of that means I deserve the hate. It's just a different opinion.


I keep providing drawings, and it mostly just produces other drawings. I thought it was supposed to fill in content accordingly. What am I missing?


If you upload an image with 2:1 aspect ratio it squishes it to 1:1 instead of letting you crop it. Seems like a basic thing they could add.
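A workaround until then is to center-crop to a square yourself before uploading. A minimal sketch of the crop-box arithmetic (the function name is mine; the box format matches what e.g. PIL's `Image.crop` expects):

```python
def center_crop_box(width, height):
    """Largest centered square crop box as (left, top, right, bottom),
    so a 2:1 image gets cropped rather than squished to 1:1."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

# A 1024x512 (2:1) image crops to the middle 512x512:
print(center_crop_box(1024, 512))  # (256, 0, 768, 512)
```

With PIL that would be roughly `img.crop(center_crop_box(*img.size))`, though you lose the edges of the image instead of distorting it - pick your trade-off.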


I bet you could add it yourself, or at least comment on it on the GitHub.


Yeah, I just didn't know if that control was part of this project or a standard part of the website.


Is there a tutorial on how to run this on Paperspace?


Has anyone ported Stable Diffusion to C++ only? I would like to build the program statically. I have been searching on GitHub but so far have been unable to find anyone who has ported it.


What are your best prompts so far?


I typed "queue overflowing its capacity" and got pretty realistic results


Okay that was good.


I typed in "space nerds in space fighting the zarkons", and it gave me a picture that looked similar to NASA's astronaut group photos, with 3 astronauts, 2 human, and one 3-eyed ninja-turtle looking alien in a spacesuit. Behind them was a space scene with a huge, earthlike planet. The faces of the two human astronauts looked eerily similar to the face of a friend of mine, a friend who happens to be the 2nd most prolific contributor of code to my open source game called "Space Nerds in Space". One of the human astronauts was bald, the other had a mullet, and the long portion of their hair was not entirely unlike my own hair.


I found this tool pretty interesting (it helped me frame my mindset while creating prompts). https://promptomania.com/stable-diffusion-prompt-builder/


"john wayne as captain america" is spot on. It has so many images of both to work with.


I tried "john wayne was a nazi" but was left disappointed


Why were you disappointed? I haven't tried that with img2img, but as just a regular text prompt without any fancy prompt engineering I get results in line with what I'd expect (kind of a Hugo Boss/Cowboy crossover, e.g. this: https://imgur.com/a/gcKi3WG)


Have you tried "as a Nazi"?


What is it?



To understand the more technical aspects of how it works: https://www.paepper.com/blog/posts/how-and-why-stable-diffus...


portal in the distance on a wheat field



I'm one of the maintainers (in charge of the UI) of the hlky webui repo! We just updated our own Colab; you can find it here: https://github.com/altryne/sd-webui-colab


Huge respect! Your Colab file runs perfectly; I’ve been using it for a few hours now. Very addictive.


Well. Apparently "penis on park bench" is an "inappropriate" query.

However, "lot of child dying in a massive fire" is an a-okay query with some "interesting" results.

Prudes are such a weird bunch.


I’m not gonna lie, I’m quite disappointed at all the anti-NSFW shenanigans.


I'm not gonna lie, I'm quite disappointed by the wilful obstinance on HN whenever there is a discussion of anti-NSFW or anti-racism filters that are present on today's artificial intelligence research.

First, the filters are only on the interface. The research is public. That's why there has been an explosion of new implementations. You are welcome to run the code yourself and make the horniest model you like.

But more importantly, this comes up every time, and everyone acts SO confused. But why? Artificial Intelligence is MASSIVE mainstream news. New advancements are arriving daily.

Do you want the news stories to be dominated by whatever heinous shit some bad-faith giggly teenager with a call of the void to offend as much as possible (I know, I was one) is able to generate?

Do you want the discourse to be dominated with "Won't someone think of the children?" blocking legitimate research and progress?

Europeans always like to dunk on Americans for being "prudes", but only a tiny section of Western Europe is progressive enough to not mind random nipples on their bus ads and television. All of Eastern Europe, Africa, India, and China are culturally still fairly conservative about sex - at least out in the open.

People think that preventing racist or sexual use of these models is indulging the prudish mores of America. But that itself is a very narrow perspective that ignores the perspectives of billions of people in the world.


I can generate many violent prompts right now, and the model knows the face of hundreds of celebrities (you would think they would garble those to prevent PR disasters). It knows what the 9/11 attack was, it knows about nuclear bombs, horror, gore. Yet somehow it doesn't know what a blowjob is. This is 100% about narrow-minded, prudish conservative values. And yes, most of them come from the US right now, because these models are not being trained in Africa, India or China.

Your argument is entirely descriptive. We know why they do this. What should be argued here is how stupid and sad it is to block progress due to nothing more than religious or moral values.


Tell me, earnestly, what progress is being blocked by your inability to use AI to generate an image of a blowjob.

I don't mean this in a moralistic sense. I have no qualms about images of blowjobs. I have no doubt that the porn industry has already deployed these models and is experimenting with this without filters. As a monetizable way to reduce the human costs, it's an entirely logical step, and adult performers should be as nervous as artists, and talking to lawyers.

But PROGRESS???


It's funny because you spell out a use case that can represent undeniable progress: artificial porn. Porn with no component of human suffering, yet realistic and personalized. A dream product, no doubt. By what metric is this not progress, except a puritanical/religious one?

Moreover, sexuality is perhaps the most important theme in art, historically. It is a very legitimate thing to want to include in a tool such as this one. Its censorship is akin to "moral codes" of the past, completely regressive.


I'm not as sure that artificial porn is undeniable progress.

While I am entirely pro liberty in terms of consumption of pornography, it is not a replacement for human interaction, and can be used in conjunction with other tools to dehumanize and isolate individuals. Some individuals that then make negative contributions to the rest of society at large. These are not isolated incidents, and they've been rising: https://en.wikipedia.org/wiki/Misogynist_terrorism

There are many kinds of porn - some that exploit women, some that empower them in their production. And there are also many kinds of products - ones that focus on a human connection (whether vanilla or kinky) and ones that don't.

I would absolutely worry that artificial porn would be good enough to meet some of these demands but not most, and would ultimately be a net negative for society.

> Moreover, sexuality is perhaps the most important theme in art, historically.

I mean, it's demonstrably not, religious art imagery dominates by volume. But that's also not necessarily important - that's just who happened to be patrons of arts and had the means to commission them.

I'm not gonna deny that humans are horny and want to make lots of sexy art.

But you're also not going to get porn on basic cable. You have to seek that out extra for yourself, and that's going to be the case with AI-generated porn art also.


I think people can both understand the motives of the companies and even choose to do the same themselves, but still be disappointed in the state of society where this is required.

I agree with you that this probably is required while the tech is new and then eventually won't be when everyone and their dog can run these things locally on their phone.


>Do you want the news stories to be dominated by whatever heinous shit some bad-faith giggly teenager with a call of the void to offend as much as possible (i know, i was one) is able to generate?

How is that the only alternative?


It’s not the only alternative. It’s the alternative that absolutely will happen eventually - but it will happen later, and to a lesser degree, because of all the crude, annoying, imperfect filters that have been holding it back so far.


…and why should a company not based in one of those nations care about the views of all the billions of others not there?


Stable Diffusion was released for free and open source with a totally unenforceable license (don't do anything illegal or unethical). People right now are likely generating the most heinous of things. Some piece-of-shit bully is almost certainly, as we speak, taking pictures of their target and making the worst compromising, cruel thing imaginable.

That discussion is over. You shouldn't care about a company trying to filter out bad shit on their own platform.

Pandoras box has already been open.


If you don't have anti-NSFW stuff your service will generate 95% porn as users who want it flock to you. No one wants a reputation as the porn AI (although I'm sure it will exist eventually).


hentAI - Someone feel free to use this.


Pretty sure that one already exists. A de-censoring and inpainting one, if I remember right.


I think Ai has a different meaning in Japanese already :)


Japanese people don’t think 愛 and AI are the same word, that’s an artifact of writing it in English.

Of course that’s not what hentai means either.


Give it 12 months


The NSFW thing is just a fig leaf though, it's trivial to disable if you run it locally and it annoys you.


However, it looks like the model wasn't trained on NSFW keywords anyway, which means it is permanently impaired in that regard.


It was trained on LAION 5b, which is chock-full of NSFW material. Search for yourself: https://rom1504.github.io/clip-retrieval/


They filtered a lot of it out, it is a subset of LAION 5b according to undisclosed parameters: https://stability.ai/blog/stable-diffusion-announcement


Run it locally and remove the nsfw filters in the scripts...


I think of this like a "first post" filter, to cut down on the noise. Drawing dicks on things is something a lot of kids do. You want the trolls to have to work a little.


It's interesting that the people who write things like "look at this stream of words, they have to be coming from a sentient being" do not seem to care much about the "intelligence" generating these images.


Try img to img using decentralized cloud: https://mirror.xyz/bitkevin.eth/F3cZoh630VvRKgmPzVQ3QXM5gQu_...


The article describes how to store a picture in “decentralized” storage. You’re not running img2img in a “decentralized cloud”.



