6. ICE 2: Handwritten Digits - DNN#
This notebook demonstrates the building and training of a deep neural network (DNN) for handwritten digit classification using the MNIST dataset, loaded via Keras. The DNN is built using TensorFlow’s Keras API.
6.1. Pre-Reading#
Objectives#
Understand the types of layers in a Neural Network and how they can be modified.
Describe TensorFlow and Keras at a conceptual level.
Get a model for classifying handwritten digits.
This notebook is a modification of Chollet’s Deep Learning with Python GitHub notebook, “2.1 A first look at a neural network,” with a few additions from the TensorFlow Tutorials.
# Google Colab includes these packages by default, so you probably won't need to run this
%pip install -q matplotlib scikit-learn
import tensorflow as tf
print("Running Tensorflow version", tf.__version__)
device_name = tf.test.gpu_device_name()
if device_name == "/device:GPU:0":
    print(f"Using GPU: {device_name}")
else:
    print("No GPU detected; running on CPU.")
6.2. The Dataset#
We will use the same handwritten digits dataset we used with K-Means.
The MNIST dataset comes preloaded in Keras as a set of four NumPy arrays: the train and test images and their labels.
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
Visualize the dataset
import numpy as np
import matplotlib.pyplot as plt
def show_images():
    """Show 100 random images from the training set"""
    indices = np.random.randint(low=0, high=len(train_images), size=100)
    fig = plt.figure(figsize=(30, 35))
    for i in range(100):
        fig.add_subplot(10, 10, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.title(train_labels[indices[i]], color="red", fontsize=20)
        plt.imshow(train_images[indices[i]], cmap="gray")
show_images()
Set Splitting#
The dataset already comes split into two sets:
train is the set we will fit the model to
test is the set we will evaluate the model against
In the upcoming Lab we will want to use this model to predict on digits we create!
To do this, we’ll need to pass the network a tensor of exactly the same shape and data type.
# TODO: replace None with `type` function call on train_images
# Upload the output of this cell to Gradescope
print("Type model expects:", type(train_images))
# TODO: replace None with `ndim` attribute of train_images
# Upload the output of this cell to Gradescope
print("Dimensions model expects:", train_images.ndim)
It’s also extremely helpful to know the shape of the training dataset. The first axis is the number of samples; the remaining axes describe each sample’s features… in this case, pixels!
# TODO: print the shape of train_images
print("Shape model expects:", train_images.shape)
What about the labels for the train set? How many and what’s their type?
len(train_labels)
train_labels
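As an optional sanity check (a minimal sketch), we can confirm the labels are small integers covering the ten digit classes and see how many samples each class has:

import numpy as np

# Labels should be the digits 0-9, stored as small integers
print("dtype:", train_labels.dtype)
print("classes:", np.unique(train_labels))
print("samples per class:", np.bincount(train_labels))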
How many samples are in the test set?
len(test_labels)
Validation Set#
To monitor overfitting, we will break out a validation set from our training set using scikit-learn’s train_test_split.
We will use stratify to make sure that the validation set contains a balanced representation of the labels present in the training set.
from sklearn.model_selection import train_test_split
train_images, val_images, train_labels, val_labels = train_test_split(
train_images, train_labels, test_size=0.15, random_state=37, stratify=train_labels
)
print(
f"After split: train_images: {train_images.shape}, val_images: {val_images.shape}"
)
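To see what stratify buys us, an optional sketch compares the per-class label fractions in the two splits; they should match closely:

# Fraction of each digit class in the train and validation splits
train_frac = np.bincount(train_labels) / len(train_labels)
val_frac = np.bincount(val_labels) / len(val_labels)
print("train:", np.round(train_frac, 3))
print("val:  ", np.round(val_frac, 3))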
6.3. Build the Deep Neural Network#
We are going to build a deep(ish) neural network. Remember that you aren’t expected to understand everything about this example yet. Layers get added into the model one at a time (sequential).
The core building block of neural networks is the layer. You can think of a layer as a filter for data: some data goes in, and it comes out in a more useful form.
We will assemble our model as a series of Keras sequential layers.
A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.
Here is the layer breakdown. The number and size of the hidden layers are arbitrarily chosen here… choosing them well is one of the greatest challenges in DNN design.
Input tells our model what shape tensor to expect as an input.
Rescaling preprocesses the pixel values from [0, 255] to [0, 1], which helps prevent large values from skewing training.
Flatten converts the 2D matrix input to a 1D vector, which is needed because the upcoming dense layer expects the shape (batch_size, input_dim).
Dense implements the operation output = activation(dot(input, kernel) + bias). We’ll use ReLU (Rectified Linear Unit), which has the output max(x, 0).
Dropout randomly sets some input units to 0. This tends to help prevent overfitting.
Softmax converts a vector of K real numbers into a probability distribution of K possible outcomes. The sum of these probabilities equals 1. We will assign our sample to the class with the highest probability. (A small numeric sketch of ReLU and softmax follows this list.)
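Here is that sketch: what ReLU and softmax do to a made-up vector of scores (the numbers are arbitrary):

import tensorflow as tf

scores = tf.constant([2.0, -1.0, 0.5])
print("relu:   ", tf.nn.relu(scores).numpy())   # negatives become 0
probs = tf.nn.softmax(scores)
print("softmax:", probs.numpy())                # each value in (0, 1)
print("sum:    ", float(tf.reduce_sum(probs)))  # probabilities sum to 1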
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential(
[
layers.Input(shape=(28, 28)),
layers.Rescaling(1.0 / 255), # Normalize pixel values to [0, 1]
layers.Flatten(), # Flatten input
layers.Dense(512, activation="relu"),
layers.Dropout(0.2), # Regularization to prevent overfitting
layers.Dense(256, activation="relu"),
layers.Dropout(0.2), # Another regularization layer
layers.Dense(10, activation="softmax"),
]
)
Compile the Model#
To make the model ready for training, we need to pick three more things as part of the compilation step:
Optimizer: The mechanism through which the model will update itself based on the training data it sees, so as to improve its performance.
Loss function: How the model will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
Metrics: Quantities to monitor during training and testing. Here, we’ll only care about accuracy (the fraction of the images that were correctly classified).
Keras provides the compile API, which does A LOT of stuff under the hood. We use sparse_categorical_crossentropy because our labels are plain integers (0 through 9) rather than one-hot vectors.
model.compile(
optimizer="rmsprop", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)
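To make the loss choice concrete, here is a small sketch with made-up numbers: sparse categorical crossentropy for a single sample is just the negative log of the probability the model assigned to the true class.

import numpy as np
from tensorflow import keras

# Made-up prediction: 90% probability assigned to the true class (label 7)
probs = np.array([[0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.90, 0.02, 0.01]])
true_label = np.array([7])

loss_fn = keras.losses.SparseCategoricalCrossentropy()
print("loss: ", float(loss_fn(true_label, probs)))  # about 0.105
print("check:", -np.log(0.90))                      # same value: -log(p_true)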
Visualize the Model#
We can print a summary of the model as well as a graphical representation.
We should always do this for a few reasons:
Confirm layer order. It is easy to add layers in the wrong order with the sequential API or to connect them together incorrectly with the functional API. The graph plot can help you confirm that the model is connected the way you intended.
Confirm the output shape of each layer. It is common to have problems when defining the shape of input data for complex networks like convolutional and recurrent neural networks. The summary and plot can help you confirm the input shape to the network is as you intended.
Confirm parameters. Some network configurations can use far fewer parameters, such as the use of a TimeDistributed wrapped Dense layer in an Encoder-Decoder recurrent neural network. Reviewing the summary can help spot cases of using far more parameters than expected.
Because we specified an Input layer, Keras can infer the “Output Shape” of every layer and display it in the summary.
# You should ALWAYS run this after compile
model.summary()
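For the graphical representation mentioned above, a sketch using keras.utils.plot_model is below; it assumes the optional pydot and graphviz packages are installed, so it may not run on every machine.

from tensorflow import keras

# Writes a diagram of the layer graph to model.png (requires pydot + graphviz)
keras.utils.plot_model(model, to_file="model.png", show_shapes=True, show_layer_names=True)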
6.4. Train the model#
Keras offers a fit API that will automatically train the model on our data for a set number of epochs.
Several quantities are displayed during training: the loss and accuracy of the model over the training data, plus (because we pass validation_data) the loss and accuracy over the validation set.
Notice the accuracy increasing to over 98%.
history = model.fit(
train_images,
train_labels,
epochs=10,
batch_size=128,
validation_data=(val_images, val_labels),
)
Loss History#
Let’s plot the loss history.
Based on validation loss vs. training loss, do you think overfitting is occurring?
# A simple plot of training and validation loss over the epochs
plt.plot(history.history["loss"], label="loss")
plt.plot(history.history["val_loss"], label="val_loss")
plt.title("Training and Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
# plt.ylim([0, 1])
plt.legend(loc="lower right")
plt.show()
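Accuracy can be plotted the same way from the same history object; a quick sketch:

# Training vs. validation accuracy over the epochs
plt.plot(history.history["accuracy"], label="accuracy")
plt.plot(history.history["val_accuracy"], label="val_accuracy")
plt.title("Training and Validation Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend(loc="lower right")
plt.show()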
6.5. Make a Prediction#
Finally, let’s see how the trained model handles data it has never seen.
We can use it to predict class probabilities for new digits: images that weren’t part of the training data, like those from the test set.
# Grab the first samples from the test dataset
test_digits = test_images[0:5]
# Make prediction on that sample
prediction = model.predict(test_digits)
print(prediction)
This is the output of the softmax layer.
The sum of probabilities for these 10 elements is 1.
Whichever class corresponds to the element with the highest probability is the class we predict.
highest_prediction_index = prediction[0].argmax()
print(
"Index of highest probability:",
highest_prediction_index,
"with probability:",
prediction[0][highest_prediction_index],
"\nTrue label:",
test_labels[0],
)
Because index 7 has the highest probability (over 99%), we predict class 7. The true label agrees!
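In the upcoming Lab you will want to classify a single digit you create yourself. predict expects a batch, so a minimal sketch (assuming you already have one 28×28 pixel array) adds a batch dimension before calling the model:

import numpy as np

single_image = test_images[0]                 # shape (28, 28)
batch = np.expand_dims(single_image, axis=0)  # shape (1, 28, 28)
single_prediction = model.predict(batch)
print("Predicted class:", single_prediction[0].argmax())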
And here is the image itself!
# Plot the first sample
plt.imshow(test_digits[0], cmap="gray")
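Before scoring the whole test set, a quick sketch compares the predicted class for all five of our samples against their true labels:

# Predicted class for each of the five test digits vs. the true labels
predicted_classes = prediction.argmax(axis=1)
print("Predicted:", predicted_classes)
print("True:     ", test_labels[:5])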
Overall Accuracy#
On average, how good is our model at classifying such never-before-seen digits? Let’s check by computing average accuracy over the entire test set.
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc}")
print(f"test_loss: {test_loss}")
The test set will have a lower accuracy than the training set, partly because of overfitting. We will address that later.
6.6. Save the Model#
We need to re-use this model later.
Make sure to download the digits.keras file after you run this command!
# Download and keep this file after saving!
model.save("digits.keras")
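To verify the saved file works, an optional sketch reloads it and re-checks accuracy on the test set; the numbers should match the evaluation above.

from tensorflow import keras

# Reload the saved model and confirm it still performs the same
reloaded = keras.models.load_model("digits.keras")
reloaded_loss, reloaded_acc = reloaded.evaluate(test_images, test_labels)
print(f"reloaded test_acc: {reloaded_acc}")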
6.7. Deliverables#
Submit the following to the Gradescope ICE 2 assignment:
The type and shape outputs
Your Keras model
This completed notebook