Create a VM with a GPU, at least 15 GB of RAM, and 30 GB of disk. Connect over SSH, install git, and clone the [Automatic1111 (A1111) repository](https://github.com/AUTOMATIC1111/stable-diffusion-webui).
You should see a link like `https://xxxxxxxxxxxxxxxx.gradio.live` after the webui finishes launching. Warning: do NOT share the public link. Others can abuse your instance and increase your bill.
As different models are trained on different image resolutions, it is best to use the training image resolution for generations. For SD1.5 use 512x512 and for SDXL1.0 use 1024x1024. You can slightly vary one of the dimensions without significant issues.
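The resolution guidance above can be sketched as a small helper. This is only an illustration: the function name `pick_size`, the `aspect` parameter, and the rounding to a multiple of 8 are assumptions made for the example, not part of A1111.

```python
# Native training resolutions, taken from the text above.
NATIVE_RESOLUTION = {
    "SD1.5": (512, 512),
    "SDXL1.0": (1024, 1024),
}


def pick_size(model: str, aspect: float = 1.0, multiple: int = 8) -> tuple[int, int]:
    """Return (width, height) close to the model's training resolution.

    Height stays at the native value; only the width is varied, and it is
    rounded to a multiple of `multiple` (assumed here to be 8).
    """
    base_width, base_height = NATIVE_RESOLUTION[model]
    width = round(base_width * aspect / multiple) * multiple
    return width, base_height


print(pick_size("SD1.5"))          # square image at native resolution
print(pick_size("SDXL1.0", 1.25))  # slightly wider than native
```

Keeping one dimension at the native value and changing only the other follows the advice to vary a single dimension slightly.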
`text2img` can be thought of as generating visual content based on textual descriptions. Popular models include [DALL-E](https://openai.com/dall-e-2), [Midjourney](https://www.midjourney.com/home), and [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release). Stable Diffusion models (such as SD1.5 and SDXL1.0) are open and give us more control over the image generation process. A1111 starts on the `text2img` tab upon launch.
You can enter both a (positive) prompt and a negative prompt. For example:
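As one illustration, the sketch below pairs a positive prompt with a negative prompt in a request body. It assumes the webui was launched with the `--api` flag and that the body would be POSTed to the `/sdapi/v1/txt2img` endpoint. The prompt text, the helper name `build_txt2img_payload`, and the default values are made up for the example, and the field names should be checked against the `/docs` page of your running instance.

```python
import json


def build_txt2img_payload(prompt, negative_prompt="", width=512, height=512, steps=20):
    """Build a request body for text-to-image generation (assumed field names)."""
    return {
        "prompt": prompt,                    # what you want to see
        "negative_prompt": negative_prompt,  # what you want to avoid
        "width": width,
        "height": height,
        "steps": steps,
    }


# Hypothetical prompts, chosen only for illustration.
payload = build_txt2img_payload(
    prompt="a watercolor painting of a lighthouse at sunset",
    negative_prompt="blurry, low quality, watermark",
)
print(json.dumps(payload, indent=2))
```

The same two strings can be typed directly into the prompt and negative prompt boxes in the web interface.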
`img2img` refers to the transformation of one image into another, typically maintaining the same content but changing the style or other visual attributes. A1111 has an `img2img` tab where you can try this. You can also supplement the generation with a text prompt.
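An image-to-image request could be assembled along these lines. This is a minimal sketch under assumptions: it presumes the webui API is enabled with `--api` and that the `/sdapi/v1/img2img` endpoint accepts base64-encoded input images in `init_images`. The helper name `build_img2img_payload` and the default `denoising_strength` of 0.5 are illustrative choices, so verify the field names on your instance's `/docs` page.

```python
import base64


def build_img2img_payload(image_bytes: bytes, prompt: str = "", denoising_strength: float = 0.5) -> dict:
    """Build a request body for image-to-image generation (assumed field names).

    A lower denoising strength keeps the result closer to the input image;
    a higher one lets the model change more of it.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "init_images": [encoded],  # source image(s), base64-encoded
        "prompt": prompt,          # optional text prompt to guide the result
        "denoising_strength": denoising_strength,
    }


# Placeholder bytes stand in for the contents of a real image file.
example = build_img2img_payload(b"fake image bytes", prompt="oil painting style")
print(sorted(example.keys()))
```

In practice the bytes would come from reading an image file in binary mode before encoding.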
### Inpaint
Inpainting is a technique for making small modifications to an image or fixing small defects in it. A1111 has an `inpaint` tab under the `img2img` tab.
ControlNet is a neural network structure that controls diffusion models by adding extra conditions. Install the A1111 extension: [`sd-webui-controlnet`](https://github.com/Mikubill/sd-webui-controlnet)