12. TensorFlow Lite#

Previously, we trained a model based on MobileNetV2 to differentiate between cats and dogs.

Before running inference with this model, we will convert it to TensorFlow Lite, which offers performance advantages on constrained hardware.

12.1. Pre-reading#

12.2. Overview#

Recall our machine learning workflow for embedded systems:

  1. Decide on a goal

  2. Collect and understand a dataset

  3. Design a model architecture

    • Design the data input pipeline

    • Design the model itself

    • Design outputs that meet the goal

  4. Train the model

  5. Evaluate the model

  6. Convert the model

  7. Run inference

  8. Iterate

    • Troubleshoot

    • Evaluate on-hardware performance

    • Collect data for feedback

In the previous lesson, "Transfer Learning", we completed steps 1-5. Now it is time to convert the model to TensorFlow Lite so that it can be executed on a Raspberry Pi.

12.3. Convert to TF Lite#

12.3.1. Upload the Model#

First, upload the saved model from the previous lesson (cat-dog-tuned.zip) into this Colab instance.

Then unzip the model.

# Run after uploading the file
!unzip cat-dog-tuned.zip

12.3.2. Convert the saved model#

We'll use TFLiteConverter to export the model to a single binary.

import tensorflow as tf

print(tf.__version__)
help(tf.lite.TFLiteConverter)

To convert the model, we will open it and then follow the docs.

Notice that we are using Dynamic Range Quantization. This is the default optimization.

This type of quantization statically quantizes only the weights from floating point to integer at conversion time, which provides 8 bits of precision.

Outputs are still stored in floating point, so the speedup from dynamic-range ops is less than that of a full fixed-point computation.

However, this does not require us to calibrate the input range like full integer quantization does.

saved_model_dir = "cat-dog-tuned"  # path to the SavedModel directory
tflite_model_path = "cat-dog.tflite"  # where to save the converted model

# Open the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# Use Dynamic Range Quantization
# https://www.tensorflow.org/lite/performance/post_training_quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Convert the model.
tflite_model = converter.convert()

# Save the model.
with open(tflite_model_path, "wb") as f:
    f.write(tflite_model)
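
For contrast, full integer quantization would additionally require a representative dataset so the converter can calibrate input and activation ranges. Here is a minimal sketch, not needed for this lesson; representative_data_gen is a hypothetical generator, and the (1, 160, 160, 3) input shape is assumed from our training pipeline.

# Hypothetical sketch of full integer quantization (not used in this lesson)
def representative_data_gen():
    # Yield a few sample inputs shaped like the model's input
    for _ in range(100):
        yield [tf.random.uniform((1, 160, 160, 3), 0, 255, dtype=tf.float32)]

int8_converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
int8_converter.optimizations = [tf.lite.Optimize.DEFAULT]
int8_converter.representative_dataset = representative_data_gen
# Force integer-only ops and integer input/output types
int8_converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
int8_converter.inference_input_type = tf.uint8
int8_converter.inference_output_type = tf.uint8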

12.3.2.1. Save the converted model#

You'll ultimately need to get your model onto a Raspberry Pi.

Download your .tflite model now.
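
You can grab it from the Colab file browser, or, assuming you are running in Colab, download it programmatically with the google.colab helper:

# Download the converted model (Colab-only helper)
from google.colab import files

files.download("cat-dog.tflite")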

12.3.3. Compare Model Sizes#

Some metadata gets thrown out and zip compression skews the numbers, but for an order-of-magnitude estimate, compare the size of the full cat-dog-tuned.zip to cat-dog.tflite.

# Sort by file size and show in Human Readable format
!ls -lhS
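
You can also compute the sizes and a rough compression ratio directly in Python; a small sketch using os.path.getsize, assuming both files sit in the current working directory:

# Rough size comparison in Python
import os

zip_size = os.path.getsize("cat-dog-tuned.zip")
tflite_size = os.path.getsize("cat-dog.tflite")
print(f"SavedModel zip: {zip_size / 1e6:.1f} MB")
print(f"TFLite model:   {tflite_size / 1e6:.1f} MB")
print(f"The TFLite model is roughly {zip_size / tflite_size:.1f}x smaller")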

12.4. Conduct inference on novel images#

Now that we've converted the model, let's test it with novel images!

  • First, use the full Keras API

  • Second, rely on the TensorFlow Lite runtime

12.4.1. Upload images#

The book has some sample images. Either download them or find your own and upload them.

!mkdir img
!wget -qP img/ https://raw.githubusercontent.com/USAFA-ECE/ai-hardware/main/book/dnn/cat-dog/img/cat1.jpg
!wget -qP img/ https://raw.githubusercontent.com/USAFA-ECE/ai-hardware/main/book/dnn/cat-dog/img/cat2.jpg
!wget -qP img/ https://raw.githubusercontent.com/USAFA-ECE/ai-hardware/main/book/dnn/cat-dog/img/cat3.jpg
!wget -qP img/ https://raw.githubusercontent.com/USAFA-ECE/ai-hardware/main/book/dnn/cat-dog/img/dog1.jpg
!wget -qP img/ https://raw.githubusercontent.com/USAFA-ECE/ai-hardware/main/book/dnn/cat-dog/img/dog2.jpg
!wget -qP img/ https://raw.githubusercontent.com/USAFA-ECE/ai-hardware/main/book/dnn/cat-dog/img/dog3.jpg
!ls img/

12.4.2. Use TF Keras API#

This requires the full TensorFlow install. It uses the original model saved in the SavedModel directory.

# use Keras API
import tensorflow as tf
import numpy as np
import os
from time import process_time

# Labels: 0 = Cat, 1 = Dog
model = tf.keras.models.load_model("cat-dog-tuned")

# Where test images should be uploaded to
dir = "img/"
# Recursively iterate over all images in directory
for root, dirs, files in os.walk(dir):
    for file in files:
        if file.endswith(".jpg") or file.endswith(".jpeg"):
            # Load and resize the image
            file_path = os.path.join(root, file)
            img = tf.keras.utils.load_img(file_path, target_size=(160, 160))
            img_array = tf.keras.utils.img_to_array(img)
            img_array = tf.expand_dims(img_array, 0)  # Create a batch of size 1

            # Conduct inference and extract the result from the np array
            start_time = process_time()
            prediction = model.predict(img_array)
            result = np.squeeze(prediction)
            elapsed_time = process_time() - start_time

            # Activation function
            sig_result = tf.nn.sigmoid(result)
            sig_predict = tf.where(sig_result < 0.5, 0, 1)
            sig_predict = sig_predict.numpy()

            print("Img:", file)
            print("Inference time", elapsed_time)
            print("Raw prediction:", result)
            print("Inferred label:", sig_predict)

12.4.3. Use TF Lite Interpreter#

This more closely mirrors what we'll do on our embedded system. The only difference is that here we use the tf.lite module included with TensorFlow instead of the standalone tflite-runtime package.

# Use tf.lite interpreter
import os

import tensorflow as tf  # on embedded device use: import tflite_runtime.interpreter as tflite
import numpy as np
from PIL import Image
from time import process_time

# Labels: 0 = Cat, 1 = Dog
model_path = "cat-dog.tflite"

# For running on tflite-runtime replace this with tflite.Interpreter
interpreter = tf.lite.Interpreter(model_path=model_path)

# Embedded devices are memory constrained, so this handles that
interpreter.allocate_tensors()
# Details about model inputs and outputs
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]["shape"]

# Where test images should be uploaded to
dir = "img/"
# Recursively iterate over all images in directory
for root, dirs, files in os.walk(dir):
    for file in files:
        if file.endswith(".jpg") or file.endswith(".jpeg"):
            file_path = os.path.join(root, file)

            # Load the image using PIL
            image = Image.open(file_path)
            # Resize the image to match what the model was trained on
            resized_image = image.resize((input_shape[1], input_shape[2]))
            input_data = np.array(resized_image, dtype=np.float32)
            input_data = np.expand_dims(input_data, axis=0)  # Create a batch of size 1

            # Conduct inference
            start_time = process_time()
            interpreter.set_tensor(input_details[0]["index"], input_data)
            interpreter.invoke()
            output_data = interpreter.get_tensor(output_details[0]["index"])
            # Pull out the raw value from the np array
            prediction = np.squeeze(output_data)
            elapsed_time = process_time() - start_time

            # Computing exponents for sigmoid function is expensive, so use a simple heuristic instead.
            # If you need an "unknown" option or confidence threshold, use something like this.
            # label = 0 if prediction < -3 else (1 if prediction > 3 else -1)
            label = 0 if prediction < 0 else 1

            print("Img:", file)
            print("Inference time", elapsed_time)
            print("Raw prediction:", result)
            print("Inferred label:", sig_predict)

12.4.4. Next step: Raspberry Pi#

Now that we know our TF Lite model works, let's put it on an embedded system!
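
On the Pi, the only change to the inference code above is the import and interpreter construction; a minimal sketch, assuming you have installed the standalone package with pip3 install tflite-runtime:

# On the Raspberry Pi: use the standalone runtime instead of full TensorFlow
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="cat-dog.tflite")
interpreter.allocate_tensors()
# The rest of the inference loop is the same as the tf.lite version above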

Make sure you downloaded your .tflite file!