6. MNIST Digits with Neural Network#
This notebook demonstrates building and training a deep neural network (DNN) for digit classification using the MNIST dataset that ships with Keras. The DNN is built using TensorFlow's Keras API.
6.1. Pre-Reading#
Objectives#
Understand the types of layers in a Neural Network and how they can be modified.
Describe TensorFlow and Keras at a conceptual level.
This notebook is a modification of fchollet/deep-learning-with-python-notebooks
6.2. Make the model#
This exercise is taken from Chollet, 2.1 A first look at a neural network
Load and Preprocess the Data#
We will again work with a handwritten digits dataset, as we did with K-Means.
The MNIST dataset comes preloaded in Keras, in the form of a set of four NumPy arrays.
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
Train and Test Sets#
For now, we'll split our images into two sets:
train is the set we will fit the model to
test is the set we will evaluate the model against
The included MNIST dataset is already broken into train and test for us.
In later exercises we will also include a validation set, which will help determine whether the model generalizes well or is just overfitting to the data we are using.
train_images.shape
len(train_labels)
train_labels
len(test_labels)
test_labels
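Before modeling, it can help to look at one of the raw images. Here is a small sketch that assumes matplotlib is installed (it is not required for the rest of the notebook):
# Display the first training image and its label (assumes matplotlib is available)
import matplotlib.pyplot as plt

plt.imshow(train_images[0], cmap="binary")
plt.title(f"Label: {train_labels[0]}")
plt.show()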
Build the Deep Neural Network#
Let's build the network. Again, remember that you aren't expected to understand everything about this example yet. Layers get added to the model one at a time (hence, Sequential).
The core building block of neural networks is the layer. You can think of a layer as a filter for data: some data goes in, and it comes out in a more useful form.
The number and size of the hidden layers are chosen somewhat arbitrarily here; choosing them well is one of the greatest challenges in designing a DNN.
We'll use ReLU (Rectified Linear Unit) as the activation function in a densely connected layer. Its output is max(x, 0).
Finally, Softmax converts a vector of K real numbers into a probability distribution over K possible outcomes. The probabilities sum to 1, and we assign each sample to the class with the highest probability.
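As a quick illustration of these two activations, here is a minimal NumPy sketch (illustrative only; it is not part of the Keras model below):
import numpy as np

x = np.array([-2.0, 0.5, 3.0])

# ReLU: element-wise max(x, 0)
print(np.maximum(x, 0))  # [0.  0.5 3. ]

# Softmax: exponentiate, then normalize so the entries sum to 1
probs = np.exp(x) / np.exp(x).sum()
print(probs, probs.sum())  # three probabilities that sum to 1.0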
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential(
    [layers.Dense(512, activation="relu"), layers.Dense(10, activation="softmax")]
)
Compile the Model#
To make the model ready for training, we need to pick three more things as part of the compilation step:
Optimizer: The mechanism through which the model will update itself based on the training data it sees, so as to improve its performance.
Loss function: How the model will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
Metrics: Quantities to monitor during training and testing. Here, we'll only care about accuracy (the fraction of the images that were correctly classified).
Keras provides the compile API, which does A LOT of stuff under the hood.
model.compile(
    optimizer="rmsprop", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)
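We use sparse_categorical_crossentropy because our labels are plain integers (0 through 9) rather than one-hot vectors. For reference, the strings above are shortcuts for Keras objects; a roughly equivalent, more explicit form is sketched below (optional, not required here):
model.compile(
    optimizer=keras.optimizers.RMSprop(),
    loss=keras.losses.SparseCategoricalCrossentropy(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)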
Visualize the Model#
We can print a summary of the model as well as a graphical representation.
We should always do this for a few reasons:
Confirm layer order. It is easy to add layers in the wrong order with the sequential API or to connect them together incorrectly with the functional API. The graph plot can help you confirm that the model is connected the way you intended.
Confirm the output shape of each layer. It is common to have problems when defining the shape of input data for complex networks like convolutional and recurrent neural networks. The summary and plot can help you confirm the input shape to the network is as you intended.
Confirm parameters. Some network configurations can use far fewer parameters, such as the use of a TimeDistributed wrapped Dense layer in an Encoder-Decoder recurrent neural network. Reviewing the summary can help spot cases of using far more parameters than expected.
Our "Output Shape" is unknown because we didn't specify an Input layer. Instead, we need to transform our data.
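For reference, here is a sketch of how an explicit Input layer would make those shapes known up front (illustrative only; we do not use this model anywhere below):
# Hypothetical alternative: declare the flattened 28x28 input shape explicitly
model_with_input = keras.Sequential(
    [
        keras.Input(shape=(28 * 28,)),
        layers.Dense(512, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ]
)
model_with_input.summary()  # output shapes and parameter counts are now fully defined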
# You should ALWAYS run this after compile
model.summary()
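For the graphical representation, Keras can render the model graph to an image if the optional pydot and graphviz packages are installed (a sketch):
from tensorflow.keras.utils import plot_model

# Writes model.png showing each layer and its input/output shapes
plot_model(model, to_file="model.png", show_shapes=True)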
6.3. Train the model#
Transform Data#
Before training, we'll preprocess the data by reshaping it into the shape the model expects and scaling it so that all values are in the [0, 1] interval. Previously, our training images were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We'll transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.
Fit model#
This method trains the model for a fixed number of epochs (iterations over the dataset).
# Recall that train_images is a (60000, 28, 28) uint8 array with values in [0, 255]
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255
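As a quick sanity check (a small sketch, not in the original exercise), confirm the new shape, dtype, and value range:
print(train_images.shape)  # (60000, 784)
print(train_images.dtype)  # float32
print(train_images.min(), train_images.max())  # 0.0 1.0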
Keras offers a fit API that will automatically train the model on our data for a set number of epochs.
Two quantities are displayed during training: the loss of the model over the training data, and the accuracy of the model over the training data. We quickly reach an accuracy of 0.989 (98.9%) on the training data.
model.fit(train_images, train_labels, epochs=5, batch_size=128)
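fit also returns a History object that records the per-epoch loss and metrics. A sketch of capturing it is shown below; note that calling fit again continues training the same model for additional epochs, so the numbers will differ slightly from those above:
history = model.fit(train_images, train_labels, epochs=5, batch_size=128)
print(history.history["loss"])  # training loss for each epoch
print(history.history["accuracy"])  # training accuracy for each epoch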
6.4. Make a Prediction#
Now that we have a trained model, we can use it to predict class probabilities for new digits: images that weren't part of the training data, like those from the test set. Afterward, we will evaluate the accuracy of the trained model over the entire test set.
# Grab the first 10 samples from the test dataset
test_digits = test_images[0:10]
# Make predictions
predictions = model.predict(test_digits)
print(predictions[0])
This first test digit has the highest probability score (0.99999106, almost 1) at index 7, so according to our model, it must be a 7:
highest_prediction_index = predictions[0].argmax()
print(
    "Index of highest probability:",
    highest_prediction_index,
    "with probability:",
    predictions[0][highest_prediction_index],
)
We can check that the test label agrees:
test_labels[0]
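We can also check all ten predictions at once (a small sketch using argmax along the class axis):
predicted_classes = predictions.argmax(axis=1)
print("Predicted:", predicted_classes)
print("Actual:   ", test_labels[:10])
print("Correct:  ", (predicted_classes == test_labels[:10]).sum(), "of 10")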
On average, how good is our model at classifying such never-before-seen digits? Let's check by computing average accuracy over the entire test set.
The test-set accuracy turns out to be 97.8%, which is quite a bit lower than the training-set accuracy (98.9%). This gap between training accuracy and test accuracy is an example of overfitting: the fact that machine learning models tend to perform worse on new data than on their training data. We will address overfitting later.
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc}")
6.5. Save the Model#
We need to re-use this model later.
Make sure to download the digits.keras file after you run this command!
# Download and keep this file after saving!
model.save("digits.keras")
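Later, the saved model can be restored with load_model. A minimal sketch, assuming digits.keras is in the working directory:
from tensorflow import keras

# Reload the saved model and confirm the architecture matches
restored_model = keras.models.load_model("digits.keras")
restored_model.summary()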