18. Diffusion Image Generation#
18.1. Pre-reading#
Objectives#
Apply lessons learned from containerization.
Get a diffusion text-to-image model working on Jetson.
Use ChatGPT to serve the working model as an API.
18.2. Image generation#
My goal is to serve a text-to-image model on the Jetson Orin Nano.
The tutorial#
I first read "Does Stable Diffusion run on NVIDIA Jetson AGX Xavier Developer Kit with CUDA?" and thought this might be possible!
Following the tutorial under Run Diffusers on Docker, I quickly came to this command:
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
Even with git-lfs installed, this was taking forever…
I pulled the plug after 25 GB came down with dozens more promised.
I'm going to take a different approach, but I'm noting the Dockerfile for later reference, even though I can tell this person is not a Docker expert.
The documentation#
The Hugging Face Diffusers Documentation has the suggestion: Choose an AutoPipeline for your task.
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
prompt = "peasant and dragon combat, wood cutting style, viking era, bevel with rune"
image = pipeline(prompt, num_inference_steps=25).images[0]
Under the hood, AutoPipelineForText2Image:
- automatically detects a "stable-diffusion" class from the model_index.json file
- loads the corresponding text-to-image StableDiffusionPipeline based on the "stable-diffusion" class name.
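If you are curious, you can peek at that file yourself. Here is a small sketch (huggingface_hub comes along with diffusers; the field it reads is _class_name):

import json

from huggingface_hub import hf_hub_download

# Download just the small model_index.json, not the model weights
index_path = hf_hub_download("runwayml/stable-diffusion-v1-5", "model_index.json")
with open(index_path) as f:
    print(json.load(f)["_class_name"])  # StableDiffusionPipeline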
Let's run that… on my Intel NUC!
Try it on the NUC#
I do not have an NVIDIA GPU on this device.
So, a quick DuckDuckGo search (ChatGPT actually got this one wrong) tells me about torch.cuda.is_available(). So I write:
from torch import cuda
# Use GPU if available; otherwise, cpu
if cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
Then I just replace .to("cuda") with .to(device).
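Concretely, the pipeline construction ends up as below (I also drop torch_dtype=torch.float16 here, since half precision is a poor fit for CPU):

# Same load as the docs example, but targeting whichever device is available
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", use_safetensors=True
).to(device)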
Install dependencies:
pip install torch diffusers[torch] transformers
Then run it: great success! (After about 3:20, yikes!)
Here's how to display your image:
# Explore what this image even is
type(image)
# Oh, it's a PIL! We can just
image.show()
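If you would rather write it to disk (handy once we start returning JPEGs), a PIL image can save itself; the filename is just an example:

# Format is inferred from the extension, so this writes a JPEG
image.save("falcon.jpg")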
Add the API#
I want this thing to be served. Let's try ChatGPT.
Time for some prompt engineering.
First, I know I need to be specific about what I want as well as the steps I want the LLM to take.
I've heard FastAPI is a good choice, but have never used it before.
Prompt 1#
Here is what I try first:
I want to adapt the following code to serve a REST API where the user
passes a prompt with `prompt` and a JPEG is returned.
~~~python
from diffusers import AutoPipelineForText2Image
from torch import cuda
# https://huggingface.co/runwayml/stable-diffusion-v1-5
diffusion_model: str = "runwayml/stable-diffusion-v1-5"
# Use GPU if available; otherwise, cpu
if cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
# The auto pipeline automatically detects and loads the model
# https://huggingface.co/docs/diffusers/tutorials/autopipeline#choose-an-autopipeline-for-your-task
# use_safetensors loads the checkpoint from the safer safetensors format.
pipeline = AutoPipelineForText2Image.from_pretrained(
    diffusion_model, use_safetensors=True
).to(device)
# Generate the image based on the prompt
prompt = "A falcon flying over mountains"
image = pipeline(prompt, num_inference_steps=25).images[0]
~~~
1. Create the FastAPI function `generate_img`
2. Use the `prompt` and `pipeline` to make the image
3. Convert the image to JPEG in memory
4. Return the image
I also want you to talk about how to serve this API and how to call it
using `curl -o`.
Ok, tbh this is totally not how I did it. I walked the dog with individual prompts, starting from a basic example. I would run those, search some things or ask for clarification, and iterate.
This prompt will probably give you working code, but it will also almost certainly be way too complex.
For example, doing it bit by bit I ended up with a GET method, but the prompt above gave me a POST method. Maybe that's better, in truth.
At minimum, you should (a minimal sketch follows this list):
- Have an @app.post or @app.get decorator
- Run the app with uvicorn
- Be able to curl the port and save the image. At this point, single-word prompts with no whitespace are fine.
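For orientation, here is a minimal sketch of what that can look like. The file name generate_img.py, the /generate route, and port 5858 are my choices for illustration (5858 matches the Dockerfile below); your ChatGPT-assisted version will look different.

import io

from diffusers import AutoPipelineForText2Image
from fastapi import FastAPI, Response
from torch import cuda

app = FastAPI()

# Load the pipeline once at startup, not on every request
device = "cuda" if cuda.is_available() else "cpu"
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", use_safetensors=True
).to(device)

@app.get("/generate")
def generate_img(prompt: str):
    # Run the diffusion pipeline and take the first image
    image = pipeline(prompt, num_inference_steps=25).images[0]
    # Encode the PIL image as a JPEG in memory and return it
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG")
    return Response(content=buffer.getvalue(), media_type="image/jpeg")

Serve it with uvicorn generate_img:app --host 0.0.0.0 --port 5858, then curl -o falcon.jpg "http://localhost:5858/generate?prompt=falcon" should save an image.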
18.3. Containerize the app#
Here is a Dockerfile to get you started:
FROM nvcr.io/nvidia/pytorch:23.09-py3
EXPOSE 5858
WORKDIR /app
COPY ./generate_img.py /app
# Ideally, pin to versions you know work.
RUN pip install --no-cache-dir \
# TODO
1. Switch from the generic nvidia/pytorch base to the one designed for the Jetson.
2. Finish writing generate_img.py and make sure it is in the same directory as your Dockerfile.
3. Finish installing dependencies with pip.
4. Write the CMD layer.
5. Figure out the docker run command (one possible shape is sketched after this list).
   - You should use a -v volume to reuse the model cache.
   - You need to bind the correct port.
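One possible shape for that run command is below; the image tag, the cache path, and --runtime nvidia (the usual way to expose the GPU on a Jetson with the NVIDIA container runtime installed) are assumptions to adapt to your setup:

# Hypothetical image tag; the cache mount avoids re-downloading the model each run
docker run --rm --runtime nvidia \
  -p 5858:5858 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  my-diffusion-api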
After all that, you should be able to curl your container and get a generated image!