So you want to get started generating images, but where do you even start? Everyone does something different and it feels impossible to find a good starting point. Well, at least that's how I felt starting out, and hopefully this guide will help with that. I'm gonna have to assume you know how to install software and have a working computer with a decent graphics card. Going over the setup of all of that would make this guide way too long.
To start with the software you need: I like SwarmUI personally. There is also sd-webui-forge, and there is ComfyUI as well, but if you're just starting out I'd not get into ComfyUI just yet. It just adds another layer of complexity on top of an already complex thing. Whichever you choose is fine though, the settings and starting prompt stuff are basically the same regardless.
For this guide we'll be using novafurry; you can find models at civitai or similar sites. So you have your software of choice loaded up and ready, and you have this blank prompt box and no idea what to do next, which is daunting, I know. I'm gonna start with a basic prompt for an example oc: an anthro fox girl with green eyes and purple hair. Sounds easy, right? Well, we use this prompt and generate it, and it's decent but it's not what we wanted.
fox girl with purple hair and green eyes
So we gotta touch on the fact that furry and anime models do work with natural language prompts, i.e. just saying what you want in a sentence like the above example. But they're also trained with tags in mind, like the very tags from certain sites you probably already know well.
anthro, fox, female, purple hair, green eyes
So if we try that, we now have the furry at least. You can find examples of people using tags on sites like civitai, which can help with inspiration.
But what is the negative prompt? It's basically the opposite of the prompt: it tells the model what you want it to avoid, pushing the model away from those concepts. For example, I like to use a simple negative prompt like
negativeprompt: lowres, bad anatomy, signature, watermark, artist name
I usually leave it at that unless it's doing something you don't want, then you can add the term for it in the negative prompt.
Okay, so that did something. In this example it pushed the image away from concepts the model was trained on, stuff that included low res, bad anatomy, etc.
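If it helps to see the tag style written out, here's a minimal sketch in Python. The build_prompt helper is something made up for illustration, it isn't part of SwarmUI or any other tool. It just cleans up a list of tags and joins them with commas, which gives you the same kind of text you'd paste into the prompt and negative prompt boxes.

```python
def build_prompt(tags):
    """Join tags into one comma-separated prompt string (hypothetical helper)."""
    seen = set()
    cleaned = []
    for tag in tags:
        tag = tag.strip()
        # Skip empty entries and repeats so the prompt stays tidy.
        if tag and tag not in seen:
            seen.add(tag)
            cleaned.append(tag)
    return ", ".join(cleaned)


# The example tags and negative prompt used in this guide.
prompt = build_prompt(["anthro", "fox", "female", "purple hair", "green eyes"])
negative = build_prompt(["lowres", "bad anatomy", "signature", "watermark", "artist name"])

print("prompt:", prompt)
print("negativeprompt:", negative)
```

If the model keeps doing something you don't want, add the term for it to the second list and build the negative prompt again.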
But other than adding more to the prompt, what are the important settings to get a feel for first? One is the steps. Steps are defined as: "Diffusion works by running a model repeatedly to slowly build and then refine an image. This parameter is how many times to run the model. More steps = better quality, but more time." Currently with the default settings we were running at 20 steps; I find a baseline of 35 to be good. You can go up to 40, but anything higher and you quickly get diminishing returns. So running our basic prompt at 35 steps instead of 20, you can see how it spent a little more time on the image. There is some randomness to each generation, but the important part is the basic understanding of what it does.
The next setting we can tweak is CFG. CFG stands for classifier-free guidance, but that part isn't really important here. What it does is basically instruct the model how closely to follow the prompt. Our baseline in these examples is CFG 7. Most of the time you'll want to stick around there, maybe from 5 to 8 at the highest, at least for illustrious models/nova furry, because that's around where it was trained. If you go too low it will follow your prompt less. If you go too high it will end up baking the image, basically.
Resolution is pretty self-explanatory, but it does affect the composition of the images you generate to a degree, so it's something to keep in mind.
The next thing to mess with is the sampler, but what is that? The sampler is sorta the way the model works through the random noise to get to the image the prompt is asking for. Some samplers to try messing with: Euler Ancestral, sometimes called Euler A, adds random noise in between each step, giving a bit more variation, which can be nice for more interesting images. A sorta stable baseline is DPM++ 2M.
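To keep the numbers straight, here's a rough sketch in Python that collects the settings from this part of the guide and flags values that drift outside the ranges suggested above. The GenSettings class and review function are made-up names for illustration only; they don't match any particular UI's API, and the thresholds are just this guide's rules of thumb for illustrious models/nova furry.

```python
from dataclasses import dataclass


@dataclass
class GenSettings:
    # Baselines from this guide; the names here are made up for illustration.
    steps: int = 35
    cfg: float = 7.0
    sampler: str = "DPM++ 2M"


def review(settings):
    """Return notes where a value leaves the ranges suggested in this guide."""
    notes = []
    if settings.steps > 40:
        notes.append("steps above 40: diminishing returns")
    if settings.cfg < 5:
        notes.append("cfg below 5: may follow the prompt less")
    elif settings.cfg > 8:
        notes.append("cfg above 8: image may come out baked")
    return notes


baseline = GenSettings()
experiment = GenSettings(steps=60, cfg=12.0, sampler="Euler Ancestral")

print(review(baseline))
print(review(experiment))
```

The baseline comes back with no notes, while the experiment gets one note for steps and one for CFG. Nothing stops you from going outside these ranges, it's just a reminder of what to expect when you do.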