7. Lab: Serving Digits with FastAPI#

Previously we made a model. Now it’s time to serve it so others can use it!

7.1. Pre-lab#

For this lab, you will serve your previously trained MNIST digits model.

  1. Make your own copy of the template repository: USAFA-ECE/ece386-lab01

  2. Commit your Keras neural network model for predicting digits into server/digits.keras

  3. Make four images of different handwritten digits and place them into client/img/

7.2. Colab Demo#

This exercise simply demonstrates how to serve a FastAPI app in Google Colab.

Get a terminal#

First, we install colab-xterm, which allows us to have an interactive terminal. We will need this to run the app process, since normally Colab runs a cell's process and then kills it.

%pip install -q "fastapi[standard]"
!pip install -q colab-xterm
%load_ext colabxterm
%xterm

You should now have an interactive terminal! Run a command in it, such as

pwd

Create Python script#

Next, we will create the Python file. Run the command

vim hello.py

Vim is a ubiquitous text editor on Linux systems. Enter insert mode by pressing the i key. Then paste in this code:

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    return {"Hello": "World"}

Figure out how to exit Vim.

Once you exit the file, you can make sure you saved it properly with

cat hello.py

Run the script#

In production, you would query a domain and DNS would resolve that to an IP address. But since we are just testing, simply look at your IP address with this command:

ip a

It shows a loopback at 127.0.0.1 (this is a reserved address and always means “myself”). You will also see an eth0 address that probably starts with 172. That subnet is reserved for private networks. In production this would be a public IP address, but it will work for now since we are just going to talk to ourselves.

Run the script! We will use dev mode.

# First, view help menu
fastapi dev --help

# Then, you can run the app
# 8000 is the default port
fastapi dev hello.py

You should see something starting like:

Serving at: http://127.0.0.1:8000
Application startup complete.

Notice that in dev mode you get automatic reloading and more debug info, but the app only listens on localhost (127.0.0.1). In run mode FastAPI listens on all interfaces (0.0.0.0), so other machines on the network can reach it.
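For comparison, here is a sketch of the two invocations side by side (these are standard FastAPI CLI commands; the `--port` override is optional):

```shell
# Dev mode: auto-reload on file changes, listens on 127.0.0.1 only
fastapi dev hello.py

# Run mode: no auto-reload, listens on 0.0.0.0 (all interfaces)
fastapi run hello.py

# Either mode accepts a port override (8000 is the default)
fastapi run hello.py --port 8080
```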

Hit the API#

Now it’s time to connect to the API endpoint and see what we get!

Down the road we’ll do this with Python, but for our quick test, just run the below cell.

!curl 127.0.0.1:8000/

You should see a glorious {"Hello": "World"} returned! Yes, that’s JSON being returned. JSON is a standard, text-based way of representing structured data, and you’ll need to learn to work with it a bit.
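For a taste of what "working with it" looks like, Python's built-in json module converts between JSON strings and dictionaries (a small sketch; the field names match the hello app above):

```python
import json

# The raw text the server returned
raw = '{"Hello": "World"}'

# Parse the JSON string into a Python dictionary
data = json.loads(raw)
print(data["Hello"])  # prints: World

# Going the other way: serialize a dict back into a JSON string
print(json.dumps({"prediction": 7}))
```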

Back in your terminal, you should also see a “GET” request from yourself that was answered with a “200 OK” status code. You don’t need to go in-depth on HTTP response status codes, but

  • 100s for info

  • 200s for success

  • 300s for redirect

  • 400s for client error

  • 500s for server error

So a “404 Not Found” error is your fault, as the client, because you requested something that doesn’t exist!

7.3. Lab#

This lab will require significant self-learning! You’ll absolutely need to spend time reading documentation.

Large Language Models (such as ChatGPT or Claude) are authorized for clarifying questions and how to do very specific things, but you cannot ask for a large chunk of the program. Furthermore, you must document in the README by:

  1. Stating your primary query AND

  2. Providing a link to the entire conversation.

For example, this is ok: “Write a demo FastAPI POST that accepts an image and explain each line.”

But this is not ok: “Write a FastAPI app with a POST method that feeds an image into a Keras model.”

Disclaimer: I entered both of those prompts into ChatGPT and neither one was very good! You really, really need to be careful taking code from LLMs. A better approach might be: “I have this simple FastAPI method, but am new to using the package. Can you help explain and critique it for me?”

Note

You should do your development and testing on a single local machine, and then deploy to the server.

Server#

You will use a Raspberry Pi as your server. All the code for the server will be located in the server/ directory of your repository.

  1. Clone your repository.

  2. Create a virtual environment on a Raspberry Pi.

  3. Use server/requirements.txt to install packages.

  4. Serve the model with fastapi run digits.py

The app should simply:

  • Open the previously saved Keras model.

  • Accept a POST request to /predict.

  • Reshape and grayscale the image to work with the model’s input expectations.

  • Conduct inference on the image and return an integer of the predicted class.

Client#

For your client, you can use any machine that has Python and is on the same network as the Pi. All the code for the client will be in the client/ directory of your repository.

  1. Clone your repository.

  2. Create a virtual environment on your client.

  3. Use client/requirements.txt to install packages.

  4. Run client.py "xxx.xxx.xxx.xxx", where the X’s are the IP address of the server.

  5. client.py prompts the user for a path to an image. After hitting “enter” the client makes an HTTP POST request to the server, waits for the response, then displays the integer to the user.

  6. The client continues to prompt the user until the user exits with CTRL+C.

Tip

The requests package is easier to use than the built-in urllib.
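Using requests, the client loop described above might be sketched like this (the function names, the port, and the multipart field name file are assumptions; the field name must match whatever your server's endpoint expects):

```python
import sys

import requests


def predict_url(server_ip: str) -> str:
    """Build the prediction endpoint URL from the server's IP address."""
    return f"http://{server_ip}:8000/predict"


def run_client(server_ip: str) -> None:
    """Prompt for image paths until the user exits with CTRL+C."""
    while True:
        path = input("Path to image: ")
        with open(path, "rb") as f:
            # POST the image as multipart form data; the field name
            # ("file") must match the server's parameter name
            response = requests.post(predict_url(server_ip), files={"file": f})
        response.raise_for_status()
        print("Predicted digit:", response.json())


# Invoked as: python client.py "xxx.xxx.xxx.xxx"
if __name__ == "__main__" and len(sys.argv) > 1:
    run_client(sys.argv[1])
```

Note that response.raise_for_status() turns a 4xx/5xx status code into an exception, so a bad request fails loudly instead of printing garbage.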

Deliverables#

  • Demonstrate your client-server exchange.

  • Complete the three READMEs in the repository (root, client, server).

  • Push your code and submit to Gradescope.