Welcome to the project on classification using Convolutional Neural Networks. We will continue to work with the Street View House Numbers (SVHN) image dataset for this project.
One of the most interesting tasks in deep learning is recognizing objects in natural scenes, and the ability to process visual information with machine learning algorithms has proven useful in many applications.
The SVHN dataset contains over 600,000 labeled digits cropped from street-level photos and is one of the most popular image recognition datasets. Google has used it to train neural networks that improve map quality by automatically transcribing address numbers from patches of pixels; a transcribed number combined with a known street address helps pinpoint the building it belongs to.
The objective of this project is to build a CNN model that can recognize the digits in these images.
Here, we will use a subset of the original data to save computation time. The dataset is provided as a .h5 file, and the basic preprocessing steps have already been applied to it.
Let us start by mounting Google Drive. You can run the cell below to mount the drive.
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, BatchNormalization, Dropout, Flatten, LeakyReLU
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
Let us check the version of TensorFlow.
print(tf.__version__)
2.8.2
import h5py
# Open the file as read only
# Update the path below as required
h5f = h5py.File('/content/drive/MyDrive/Elective Project/SVHN_single_grey1.h5', 'r')
# Load the train and the test datasets
X_train = h5f['X_train'][:]
y_train = h5f['y_train'][:]
X_test = h5f['X_test'][:]
y_test = h5f['y_test'][:]
# Close this file
h5f.close()
Let's check the number of images in the training and test datasets.
len(X_train), len(X_test)
(42000, 18000)
Observation: The training set contains 42,000 images and the test set contains 18,000 images.
# Visualizing the first 10 images in the dataset and printing their labels
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize = (10, 1))
for i in range(10):
    plt.subplot(1, 10, i+1)
    plt.imshow(X_train[i], cmap = "gray")  # Display the i-th training image in grayscale
    plt.axis('off')
plt.show()
print('Labels for the above images: %s' % (y_train[0:10]))
Labels for the above images: [2 6 7 4 4 0 3 0 7 3]
# Shape and the array of pixels for the first image
print("Shape:", X_train[0].shape)
print()
print("First image:\n", X_train[0])
Shape: (32, 32)

First image:
 [[ 33.0704  30.2601  26.852  ...  71.4471  58.2204  42.9939]
 [ 25.2283  25.5533  29.9765 ... 113.0209 103.3639  84.2949]
 [ 26.2775  22.6137  40.4763 ... 113.3028 121.775  115.4228]
 ...
 [ 28.5502  36.212   45.0801 ...  24.1359  25.0927  26.0603]
 [ 38.4352  26.4733  23.2717 ...  28.1094  29.4683  30.0661]
 [ 50.2984  26.0773  24.0389 ...  49.6682  50.853   53.0377]]
# Reshaping the datasets so they can be passed to the CNN. Remember that a CNN expects a 4D input array: (samples, height, width, channels)
X_train = X_train.reshape(X_train.shape[0], 32, 32, 1)
X_test = X_test.reshape(X_test.shape[0], 32, 32, 1)
# Normalize inputs from 0-255 to 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0
# New shape
print('Training set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)
Training set: (42000, 32, 32, 1) (42000,)
Test set: (18000, 32, 32, 1) (18000,)
# One-hot encode the target labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# test labels
y_test
array([[0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 1., ..., 0., 0., 0.]], dtype=float32)
Observation: Each label is now a one-hot encoded vector of length 10, with a 1 at the index of the digit class.
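For reference, here is a tiny illustration (not part of the original preprocessing) of what to_categorical does to a single label; the digit 3 is an arbitrary example.
# A label of 3 becomes a length-10 vector with a 1 at index 3
print(to_categorical([3], num_classes = 10))
# Expected output: [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]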
Now that we have done data preprocessing, let's build a CNN model.
# Fixing the seed for random number generators
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
# Define the model
def cnn_model_1():
    model = Sequential()
    # Add the layers in sequence
    model.add(Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", input_shape = (32, 32, 1)))  # first convolutional layer
    model.add(LeakyReLU(0.1))
    model.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = "same"))  # second convolutional layer
    model.add(LeakyReLU(0.1))
    model.add(MaxPool2D(pool_size = (2, 2)))  # max-pooling layer
    model.add(Flatten())  # flatten the feature maps
    model.add(Dense(32))  # fully connected layer
    model.add(LeakyReLU(0.1))
    model.add(Dense(10, activation = 'softmax'))  # output layer

    # Compile the model
    model.compile(loss = 'categorical_crossentropy',
                  metrics = ['accuracy'],
                  optimizer = 'adam')

    return model
# Build the model
model_1 = cnn_model_1()
# Print the model summary
model_1.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 32, 32, 16) 160 leaky_re_lu (LeakyReLU) (None, 32, 32, 16) 0 conv2d_1 (Conv2D) (None, 32, 32, 32) 4640 leaky_re_lu_1 (LeakyReLU) (None, 32, 32, 32) 0 max_pooling2d (MaxPooling2D (None, 16, 16, 32) 0 ) flatten (Flatten) (None, 8192) 0 dense (Dense) (None, 32) 262176 leaky_re_lu_2 (LeakyReLU) (None, 32) 0 dense_1 (Dense) (None, 10) 330 ================================================================= Total params: 267,306 Trainable params: 267,306 Non-trainable params: 0 _________________________________________________________________
# Fit the model
history_model_1 = model_1.fit(X_train,
                              y_train,
                              epochs = 20,
                              validation_split = 0.2,
                              batch_size = 32,
                              verbose = 1)
Epoch 1/20
1050/1050 [==============================] - 17s 5ms/step - loss: 1.1727 - accuracy: 0.6101 - val_loss: 0.6389 - val_accuracy: 0.8170
Epoch 2/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.5376 - accuracy: 0.8441 - val_loss: 0.5146 - val_accuracy: 0.8545
Epoch 3/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.4434 - accuracy: 0.8706 - val_loss: 0.4992 - val_accuracy: 0.8617
Epoch 4/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.3841 - accuracy: 0.8872 - val_loss: 0.4704 - val_accuracy: 0.8690
Epoch 5/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.3403 - accuracy: 0.8987 - val_loss: 0.4673 - val_accuracy: 0.8707
Epoch 6/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.3020 - accuracy: 0.9093 - val_loss: 0.4812 - val_accuracy: 0.8668
Epoch 7/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.2723 - accuracy: 0.9168 - val_loss: 0.4558 - val_accuracy: 0.8771
Epoch 8/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.2416 - accuracy: 0.9254 - val_loss: 0.4820 - val_accuracy: 0.8717
Epoch 9/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.2214 - accuracy: 0.9322 - val_loss: 0.4911 - val_accuracy: 0.8762
Epoch 10/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1973 - accuracy: 0.9397 - val_loss: 0.5099 - val_accuracy: 0.8764
Epoch 11/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1795 - accuracy: 0.9435 - val_loss: 0.5833 - val_accuracy: 0.8667
Epoch 12/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1584 - accuracy: 0.9504 - val_loss: 0.5740 - val_accuracy: 0.8702
Epoch 13/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1441 - accuracy: 0.9540 - val_loss: 0.6148 - val_accuracy: 0.8635
Epoch 14/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1325 - accuracy: 0.9586 - val_loss: 0.6166 - val_accuracy: 0.8643
Epoch 15/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1223 - accuracy: 0.9613 - val_loss: 0.6700 - val_accuracy: 0.8635
Epoch 16/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1031 - accuracy: 0.9674 - val_loss: 0.6931 - val_accuracy: 0.8699
Epoch 17/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.0957 - accuracy: 0.9686 - val_loss: 0.7262 - val_accuracy: 0.8690
Epoch 18/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.0976 - accuracy: 0.9677 - val_loss: 0.7269 - val_accuracy: 0.8706
Epoch 19/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.0786 - accuracy: 0.9746 - val_loss: 0.7828 - val_accuracy: 0.8695
Epoch 20/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.0790 - accuracy: 0.9731 - val_loss: 0.8345 - val_accuracy: 0.8631
# Plotting the accuracies
dict_hist = history_model_1.history
list_ep = [i for i in range(1, 21)]
plt.figure(figsize = (8, 8))
plt.plot(list_ep, dict_hist['accuracy'], ls = '--', label = 'accuracy')
plt.plot(list_ep, dict_hist['val_accuracy'], ls = '--', label = 'val_accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()
Observations: The model performs well on the training data (reaching about 97% accuracy) but is much less accurate on the validation data, which indicates that it is overfitting the training set. After about 3 epochs, the validation accuracy plateaus and fluctuates around 86-87%, while the validation loss keeps rising.
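One common way to curb this kind of overfitting is to stop training once the validation loss stops improving. Below is a minimal sketch using Keras's EarlyStopping callback; it is not part of the original training run, and the patience value of 3 is just an illustrative choice.
from tensorflow.keras.callbacks import EarlyStopping

# Stop training when the validation loss has not improved for 3 consecutive epochs
# and restore the weights from the best epoch seen so far
early_stop = EarlyStopping(monitor = 'val_loss', patience = 3, restore_best_weights = True)

# Hypothetical usage, shown for illustration only:
# history_model_1 = model_1.fit(X_train, y_train,
#                               epochs = 20,
#                               validation_split = 0.2,
#                               batch_size = 32,
#                               callbacks = [early_stop],
#                               verbose = 1)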
Let's build another model and see if we can achieve better, more generalized performance.
First, we need to clear the previous model's history from the Keras backend. Also, let's fix the seed again after clearing the backend.
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
# Fixing the seed for random number generators
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
# Define the model
def cnn_model_2():
    model = Sequential()
    # Add the layers in sequence
    model.add(Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", input_shape = (32, 32, 1)))  # first convolutional layer
    model.add(LeakyReLU(0.1))
    model.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = "same"))  # second convolutional layer
    model.add(LeakyReLU(0.1))
    model.add(MaxPool2D(pool_size = (2, 2)))  # max-pooling layer
    model.add(BatchNormalization())
    model.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = "same"))  # third convolutional layer
    model.add(LeakyReLU(0.1))
    model.add(Conv2D(filters = 64, kernel_size = (3, 3), padding = "same"))  # fourth convolutional layer
    model.add(LeakyReLU(0.1))
    model.add(MaxPool2D(pool_size = (2, 2)))  # max-pooling layer
    model.add(BatchNormalization())
    model.add(Flatten())  # flatten the feature maps
    model.add(Dense(32))  # fully connected layer
    model.add(LeakyReLU(0.1))
    model.add(Dropout(0.5))  # dropout for regularization
    model.add(Dense(10, activation = 'softmax'))  # output layer

    # Compile the model
    model.compile(loss = 'categorical_crossentropy',
                  metrics = ['accuracy'],
                  optimizer = 'adam')

    return model
# Build the model
model_2 = cnn_model_2()
# Print the summary
model_2.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 32, 32, 16) 160 leaky_re_lu (LeakyReLU) (None, 32, 32, 16) 0 conv2d_1 (Conv2D) (None, 32, 32, 32) 4640 leaky_re_lu_1 (LeakyReLU) (None, 32, 32, 32) 0 max_pooling2d (MaxPooling2D (None, 16, 16, 32) 0 ) batch_normalization (BatchN (None, 16, 16, 32) 128 ormalization) conv2d_2 (Conv2D) (None, 16, 16, 32) 9248 leaky_re_lu_2 (LeakyReLU) (None, 16, 16, 32) 0 conv2d_3 (Conv2D) (None, 16, 16, 64) 18496 leaky_re_lu_3 (LeakyReLU) (None, 16, 16, 64) 0 max_pooling2d_1 (MaxPooling (None, 8, 8, 64) 0 2D) batch_normalization_1 (Batc (None, 8, 8, 64) 256 hNormalization) flatten (Flatten) (None, 4096) 0 dense (Dense) (None, 32) 131104 leaky_re_lu_4 (LeakyReLU) (None, 32) 0 dropout (Dropout) (None, 32) 0 dense_1 (Dense) (None, 10) 330 ================================================================= Total params: 164,362 Trainable params: 164,170 Non-trainable params: 192 _________________________________________________________________
# Fit the model
history_model_2 = model_2.fit(X_train,
                              y_train,
                              epochs = 30,
                              validation_split = 0.2,
                              batch_size = 128,
                              verbose = 1)
Epoch 1/30
263/263 [==============================] - 4s 12ms/step - loss: 1.4729 - accuracy: 0.4994 - val_loss: 3.0288 - val_accuracy: 0.1901
Epoch 2/30
263/263 [==============================] - 2s 9ms/step - loss: 0.6925 - accuracy: 0.7841 - val_loss: 0.6941 - val_accuracy: 0.7889
Epoch 3/30
263/263 [==============================] - 2s 9ms/step - loss: 0.5608 - accuracy: 0.8289 - val_loss: 0.4835 - val_accuracy: 0.8585
Epoch 4/30
263/263 [==============================] - 2s 9ms/step - loss: 0.4954 - accuracy: 0.8486 - val_loss: 0.4414 - val_accuracy: 0.8737
Epoch 5/30
263/263 [==============================] - 3s 10ms/step - loss: 0.4495 - accuracy: 0.8635 - val_loss: 0.4221 - val_accuracy: 0.8821
Epoch 6/30
263/263 [==============================] - 2s 9ms/step - loss: 0.4099 - accuracy: 0.8738 - val_loss: 0.4149 - val_accuracy: 0.8860
Epoch 7/30
263/263 [==============================] - 2s 9ms/step - loss: 0.3828 - accuracy: 0.8820 - val_loss: 0.4117 - val_accuracy: 0.8870
Epoch 8/30
263/263 [==============================] - 3s 12ms/step - loss: 0.3566 - accuracy: 0.8906 - val_loss: 0.3905 - val_accuracy: 0.8883
Epoch 9/30
263/263 [==============================] - 3s 12ms/step - loss: 0.3307 - accuracy: 0.8978 - val_loss: 0.3488 - val_accuracy: 0.9051
Epoch 10/30
263/263 [==============================] - 2s 9ms/step - loss: 0.3145 - accuracy: 0.9020 - val_loss: 0.3925 - val_accuracy: 0.8915
Epoch 11/30
263/263 [==============================] - 3s 11ms/step - loss: 0.2955 - accuracy: 0.9079 - val_loss: 0.4291 - val_accuracy: 0.8939
Epoch 12/30
263/263 [==============================] - 3s 12ms/step - loss: 0.2851 - accuracy: 0.9112 - val_loss: 0.3754 - val_accuracy: 0.9045
Epoch 13/30
263/263 [==============================] - 4s 14ms/step - loss: 0.2626 - accuracy: 0.9190 - val_loss: 0.3729 - val_accuracy: 0.9080
Epoch 14/30
263/263 [==============================] - 3s 12ms/step - loss: 0.2597 - accuracy: 0.9173 - val_loss: 0.3785 - val_accuracy: 0.9079
Epoch 15/30
263/263 [==============================] - 3s 13ms/step - loss: 0.2524 - accuracy: 0.9193 - val_loss: 0.3833 - val_accuracy: 0.9049
Epoch 16/30
263/263 [==============================] - 3s 11ms/step - loss: 0.2424 - accuracy: 0.9240 - val_loss: 0.3992 - val_accuracy: 0.9048
Epoch 17/30
263/263 [==============================] - 3s 11ms/step - loss: 0.2327 - accuracy: 0.9251 - val_loss: 0.3937 - val_accuracy: 0.9044
Epoch 18/30
263/263 [==============================] - 2s 9ms/step - loss: 0.2227 - accuracy: 0.9278 - val_loss: 0.4894 - val_accuracy: 0.8780
Epoch 19/30
263/263 [==============================] - 3s 10ms/step - loss: 0.2112 - accuracy: 0.9295 - val_loss: 0.4271 - val_accuracy: 0.9036
Epoch 20/30
263/263 [==============================] - 3s 10ms/step - loss: 0.2017 - accuracy: 0.9351 - val_loss: 0.4124 - val_accuracy: 0.9089
Epoch 21/30
263/263 [==============================] - 3s 10ms/step - loss: 0.1934 - accuracy: 0.9369 - val_loss: 0.3949 - val_accuracy: 0.9112
Epoch 22/30
263/263 [==============================] - 3s 10ms/step - loss: 0.1855 - accuracy: 0.9372 - val_loss: 0.4154 - val_accuracy: 0.9094
Epoch 23/30
263/263 [==============================] - 3s 10ms/step - loss: 0.1902 - accuracy: 0.9382 - val_loss: 0.4650 - val_accuracy: 0.9048
Epoch 24/30
263/263 [==============================] - 3s 10ms/step - loss: 0.1843 - accuracy: 0.9391 - val_loss: 0.4220 - val_accuracy: 0.9101
Epoch 25/30
263/263 [==============================] - 3s 12ms/step - loss: 0.1708 - accuracy: 0.9437 - val_loss: 0.3992 - val_accuracy: 0.9130
Epoch 26/30
263/263 [==============================] - 3s 10ms/step - loss: 0.1717 - accuracy: 0.9444 - val_loss: 0.4331 - val_accuracy: 0.9083
Epoch 27/30
263/263 [==============================] - 3s 11ms/step - loss: 0.1653 - accuracy: 0.9443 - val_loss: 0.4414 - val_accuracy: 0.9133
Epoch 28/30
263/263 [==============================] - 3s 11ms/step - loss: 0.1571 - accuracy: 0.9479 - val_loss: 0.4175 - val_accuracy: 0.9046
Epoch 29/30
263/263 [==============================] - 3s 11ms/step - loss: 0.1552 - accuracy: 0.9472 - val_loss: 0.4719 - val_accuracy: 0.9063
Epoch 30/30
263/263 [==============================] - 2s 9ms/step - loss: 0.1548 - accuracy: 0.9487 - val_loss: 0.5228 - val_accuracy: 0.9029
# Plotting the accuracies
dict_hist = history_model_2.history
list_ep = [i for i in range(1, 31)]
plt.figure(figsize = (8, 8))
plt.plot(list_ep, dict_hist['accuracy'], ls = '--', label = 'accuracy')
plt.plot(list_ep, dict_hist['val_accuracy'], ls = '--', label = 'val_accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()
Observations: After adding more convolutional layers, batch normalization, and a dropout layer, the validation accuracy improved to about 90-91% within a few epochs. The training and validation accuracies now track each other much more closely, indicating better generalization.
# Make predictions on the test data using model_2
test_pred = model_2.predict(X_test)
test_pred = np.argmax(test_pred, axis = -1)
Note: Earlier, we noticed that each entry of the target variable is a one-hot encoded vector, but to print the classification report and confusion matrix, we must convert each entry of y_test back to a single class label.
# Converting each entry to single label from one-hot encoded vector
y_test = np.argmax(y_test, axis = -1)
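As a quick sanity check (an extra step, not in the original notebook), we can compare the first few predicted labels against the true labels.
# Compare the first 10 predicted digits with the corresponding true labels
print("Predicted:", test_pred[:10])
print("Actual:   ", y_test[:10])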
# Importing required functions
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
# Printing the classification report
print(classification_report(y_test, test_pred))
# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_test, test_pred)
plt.figure(figsize = (8, 5))
sns.heatmap(cm, annot = True, fmt = '.0f')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
              precision    recall  f1-score   support

           0       0.92      0.93      0.92      1814
           1       0.91      0.89      0.90      1828
           2       0.93      0.91      0.92      1803
           3       0.92      0.86      0.89      1719
           4       0.89      0.93      0.91      1812
           5       0.90      0.90      0.90      1768
           6       0.85      0.92      0.88      1832
           7       0.91      0.93      0.92      1808
           8       0.92      0.85      0.89      1812
           9       0.88      0.90      0.89      1804

    accuracy                           0.90     18000
   macro avg       0.90      0.90      0.90     18000
weighted avg       0.90      0.90      0.90     18000
Final Observations: The classification report and confusion matrix show that the overall test accuracy has increased to about 90% with the convolutional neural network that uses batch normalization and dropout layers. The heatmap shows far less confusion between classes, with the largest single off-diagonal count being 56.
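To double-check the overall figure, a minimal sketch using scikit-learn's accuracy_score (not used in the original notebook) can compute the test accuracy directly from the predictions.
from sklearn.metrics import accuracy_score

# Fraction of test images whose predicted digit matches the true label
print("Test accuracy:", accuracy_score(y_test, test_pred))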