7. Neural Networks#

This notebook demonstrates how to build and train a deep neural network (DNN) for digit classification, using the load_digits dataset from scikit-learn and TensorFlow’s Keras API.

7.1. Pre-Reading#

7.1.1. Objectives#

  • Understand the types of layers in a Neural Network and how they can be modified.

  • Describe TensorFlow and Keras at a conceptual level.

7.2. Load and Preprocess the Data#

We will use the same handwritten digits dataset we used with K-Means. As such, we can skip some of the exploration and visualization of the data.

Let’s jump right in. Load and preprocess the dataset exactly like we did in the previous lab.

import numpy as np
from sklearn.datasets import load_digits

# Load the dataset
data, labels = load_digits(return_X_y=True)
(n_samples, n_features), n_digits = data.shape, np.unique(labels).size

print(f"# digits: {n_digits}; # samples: {n_samples}; # features {n_features}")

7.2.1. Train, Test, and Validation Sets#

Deep learning comes with a host of challenges. One of them is overfitting.

To combat this (and for other reasons), we’ll split our images into three sets:

  • train is the set we will fit the model to

  • test is the set we will evaluate the model against

  • validate will help determine whether the model generalizes well or is just memorizing the training data

A reasonable starting breakdown is:

  • 20% test

  • 60% train

  • 20% validate

from sklearn.model_selection import train_test_split

# Split the data into train, test, and validation sets
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42  # 25% of the remaining 80% = 20% overall
)

# Normalize pixel values to [0, 1] (load_digits pixels range from 0 to 16)
X_train = X_train / 16.0
X_test = X_test / 16.0
X_val = X_val / 16.0

7.3. Build the Deep Neural Network#

Next, let’s build the DNN model with a single dropout layer.

Layers get added to the model one at a time (hence the name Sequential).

Notice that the input shape of the first layer matches the number of features in our dataset.

The number and size of the hidden layers are arbitrarily chosen here… choosing a good architecture is one of the great challenges in deep learning.

We’ll use ReLU (Rectified Linear Unit) as the activation function. The output is max(x, 0).

The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training, which helps prevent overfitting (the units that are kept are scaled up by 1/(1 - rate) so that the expected sum of the inputs is unchanged). Yes, randomly throwing data away somehow helps.

Finally, Softmax converts a vector of K real numbers into a probability distribution of K possible outcomes. The sum of these probabilities equals 1. We will assign our sample to the class with the highest probability.
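Before building the model, here is a small NumPy sketch of these three operations (illustrative only; Keras implements them for you internally):

import numpy as np

# ReLU: negative inputs become 0, positive inputs pass through unchanged
x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(np.maximum(x, 0))  # [0. 0. 0. 1. 3.]

# Dropout (training time): randomly zero ~20% of inputs,
# scaling the survivors by 1/(1 - rate) so the expected sum is unchanged
rng = np.random.default_rng(0)
rate = 0.2
inputs = np.ones(10)
mask = rng.random(10) >= rate
print(inputs * mask / (1 - rate))

# Softmax: exponentiate, then normalize so the outputs sum to 1
z = np.array([2.0, 1.0, 0.1])
probs = np.exp(z) / np.exp(z).sum()
print(probs, probs.sum())  # ~[0.659 0.242 0.099] 1.0
print(np.argmax(probs))    # class 0 has the highest probability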

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Build the DNN model with a single dropout layer
model = Sequential()
model.add(Dense(64, activation="relu", input_shape=(n_features,)))
model.add(Dense(64, activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dropout(0.2))  # Dropout regularization with 20% dropout rate
model.add(Dense(10, activation="softmax"))

7.4. Compile and Train the Model#

After building the model, we need to compile it with an optimizer, loss function, and metrics. Then, we can train the model on the training set.

Keras provides the compile API which does A LOT of stuff under the hood.

For example, you can pick an optimizer. Adam is a popular choice; it is a stochastic gradient descent method that adapts the learning rate using estimates of the first and second moments of the gradients.

# Compile the model
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)
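The string "adam" uses Adam’s default hyperparameters. If you want control over settings such as the learning rate, you can pass an optimizer instance instead. A minimal, equivalent sketch:

from tensorflow.keras.optimizers import Adam

# Same compile call with an explicit optimizer object;
# learning_rate=0.001 is the Keras default for Adam
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)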

7.4.1. Visualize the Model#

We can print a summary of the model as well as a graphical representation.

We should always do this for a few reasons:

  • Confirm layer order. It is easy to add layers in the wrong order with the sequential API or to connect them together incorrectly with the functional API. The graph plot can help you confirm that the model is connected the way you intended.

  • Confirm the output shape of each layer. It is common to have problems when defining the shape of input data for complex networks like convolutional and recurrent neural networks. The summary and plot can help you confirm the input shape to the network is as you intended.

  • Confirm parameters. Some network configurations can use far fewer parameters, such as the use of a TimeDistributed wrapped Dense layer in an Encoder-Decoder recurrent neural network. Reviewing the summary can help spot cases of using far more parameters than expected.

# You should ALWAYS run this after compile
model.summary()
# Required for plot_model()
# plot_model() returns an image, instead of text
%pip install pydot
!apt install graphviz -y
from tensorflow.keras.utils import plot_model

# This is sometimes worth running, if you have the dependencies installed
plot_model(
    model,
    "plot_model.png",
    show_shapes=True,
    show_layer_names=True,
    show_layer_activations=True,
    show_dtype=True,
)

7.4.2. Train the model#

Keras also offers a fit API.

This method trains the model for a fixed number of epochs (dataset iterations).

# Fit the model, capturing the History object it returns
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val))
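The History object’s history attribute records the loss and metrics for each epoch. Plotting the training and validation curves is a quick way to spot overfitting; a sketch (assuming matplotlib is installed):

import matplotlib.pyplot as plt

# Diverging curves (training loss falling while validation loss rises)
# are a classic sign of overfitting
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()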

7.5. Evaluate the Model#

Finally, we can evaluate the accuracy of the trained model on the validation set.

You guessed it: the Keras evaluate API. It returns the loss value and metric values for the model in test mode.

There is also a predict method that conducts inference on new samples; a short sketch follows the next cell.

# Evaluate accuracy on the validation set
_, accuracy = model.evaluate(X_val, y_val)
print("Validation Accuracy:", accuracy)

7.6. Go further (optional)#

Can you improve Validation Accuracy?

Try some of the following:

  • Adjust the number or size of hidden layers

  • Use data augmentation to increase your training set (a simple sketch follows below)
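As a starting point for augmentation, here is one simple, hypothetical approach for these 8x8 images: shift each training image by one pixel in each direction. Note that np.roll wraps pixels around the edges, which is only a rough approximation for images with nearly empty borders:

# Shift each 8x8 training image one pixel up, down, left, and right
images = X_train.reshape(-1, 8, 8)
shifted = [
    np.roll(images, shift, axis=axis)
    for shift, axis in [(1, 1), (-1, 1), (1, 2), (-1, 2)]
]
X_aug = np.concatenate([X_train] + [s.reshape(-1, 64) for s in shifted])
y_aug = np.concatenate([y_train] * 5)
print(X_aug.shape, y_aug.shape)  # five times the original training set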