Create a VM with a GPU, at least 15GB of RAM, and 30GB of disk. Connect to it over SSH, install git, and clone the [Automatic1111 (A1111) repository](https://github.com/AUTOMATIC1111/stable-diffusion-webui).
You should see a link like `https://xxxxxxxxxxxxxxxx.gradio.live` after the webui finishes launching. Warning: do NOT share the public link. Anyone who has it can use your instance and increase your bill.
Because each model is trained at a particular image resolution, it is best to generate at that same resolution. For SD1.5 use 512x512 and for SDXL1.0 use 1024x1024. You can slightly vary one of the dimensions without significant issues.
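The resolution rule above can be sketched as a small helper. This is only an illustration: the name `pick_size`, the `aspect` parameter, and the dictionary keys are made up for this example, and snapping to a multiple of 8 is an assumption based on how Stable Diffusion downsamples images into latents.

```python
# Native training resolutions stated in the text above.
# The keys are labels chosen for this sketch, not official identifiers.
NATIVE_RESOLUTION = {"SD1.5": 512, "SDXL1.0": 1024}


def pick_size(model: str, aspect: float = 1.0, multiple: int = 8) -> tuple[int, int]:
    """Return (width, height) for a model.

    The shorter side stays at the native resolution; only the other side is
    stretched to match `aspect` (width / height), then snapped to `multiple`.
    """
    if aspect <= 0:
        raise ValueError("aspect must be positive")
    base = NATIVE_RESOLUTION[model]
    if aspect >= 1.0:
        # Landscape or square: keep the height, vary the width.
        width = round(base * aspect / multiple) * multiple
        return width, base
    # Portrait: keep the width, vary the height.
    height = round(base / aspect / multiple) * multiple
    return base, height
```

For example, `pick_size("SD1.5")` gives the square native size, while `pick_size("SD1.5", aspect=1.5)` keeps the height at 512 and widens the image.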
`text2img` can be thought of as generating visual content based on textual descriptions. Popular models include [DALL-E](https://openai.com/dall-e-2), [Midjourney](https://www.midjourney.com/home), and [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release). Stable Diffusion models (such as SD1.5 and SDXL1.0) are open and give us more control over the image generation process.
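As a rough illustration of that control, the sketch below builds a `text2img` request for the A1111 HTTP API. It assumes the webui was started with the `--api` flag and is reachable at a URL you supply; the helper names and default values are choices made for this example, not part of A1111 itself.

```python
import base64
import json
import urllib.request


def build_txt2img_payload(prompt, negative_prompt="", width=512, height=512,
                          steps=20, cfg_scale=7.0, seed=-1):
    """Build the JSON body for A1111's /sdapi/v1/txt2img endpoint.

    Defaults here are illustrative; 512x512 matches SD1.5's training size.
    A seed of -1 asks the webui to pick a random seed.
    """
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": width,
        "height": height,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "seed": seed,
    }


def txt2img(base_url, payload, timeout=300):
    """POST the payload and return the generated images as raw PNG bytes.

    Assumes the server at `base_url` was launched with the --api flag.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # The API returns each image as a base64-encoded string.
    return [base64.b64decode(image) for image in body["images"]]
```

A call such as `txt2img("http://127.0.0.1:7860", build_txt2img_payload("a watercolor fox"))` would then return a list of image bytes that can be written to `.png` files. The address is an assumption; use whatever host and port your instance listens on.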
`img2img` refers to the transformation of one image into another, typically maintaining the same content but changing the style or other visual attributes.
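A minimal sketch of an `img2img` request against the A1111 HTTP API follows, assuming the webui was started with the `--api` flag. The helper name and defaults are hypothetical. The `denoising_strength` field controls how far the result may drift from the input: lower values preserve more of the original content, which fits the "same content, different style" use case.

```python
import base64


def build_img2img_payload(image_bytes, prompt, denoising_strength=0.5,
                          width=512, height=512, steps=20):
    """Build the JSON body for A1111's /sdapi/v1/img2img endpoint.

    `image_bytes` is the raw content of the input image file (e.g. a PNG).
    Defaults are illustrative values chosen for this sketch.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return {
        # The API expects input images as base64-encoded strings.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "denoising_strength": denoising_strength,
        "width": width,
        "height": height,
        "steps": steps,
    }
```

To use it, read an image with `open(path, "rb").read()`, pass the bytes and a style prompt to `build_img2img_payload`, and POST the result as JSON to `/sdapi/v1/img2img` on your instance.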