CIFAR-10 (Canadian Institute For Advanced Research) is a collection of images with 10 different classes representing airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. It is a widely used benchmark for teaching a computer how to recognize objects in images.
Since the images in CIFAR-10 are low-resolution (32x32x3), the dataset allows researchers to quickly try different algorithms and see what works. Various kinds of convolutional neural networks tend to be the best at recognizing the images in CIFAR-10.
In this case study, we will build a multi-class classification algorithm to predict 10 different classes of the CIFAR-10 dataset using Convolutional Neural Networks and Transfer Learning.
The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. You can learn more about this dataset here - https://www.cs.toronto.edu/~kriz/cifar.html
import keras
import numpy as np
# A library for data visualization
import matplotlib.pyplot as plt
# An advanced library for data visualization
import seaborn as sns
import tensorflow as tf
# Keras Sequential Model
from tensorflow.keras.models import Sequential
# Importing all the different layers and optimizers
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization, Activation, LeakyReLU
from tensorflow.keras.optimizers import Adam
The CIFAR-10 dataset is available directly in the Keras library, which loads it as n-dimensional NumPy arrays. We will download the dataset through the Keras datasets module here.
# Importing the dataset
from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
In case the earlier cell throws an error, please uncomment the cell below and run it once. It will show an error; restart the runtime and then run it again.
Please note: It will show an error the first time you run it, and restarting the runtime is compulsory. The conversion of the tensors to NumPy arrays in this cell may take a while to execute.
#!pip install hub
#import hub
#train = hub.load('hub://activeloop/cifar10-train')
#test = hub.load('hub://activeloop/cifar10-test')
#x_train = train.images
#y_train = train.labels
#x_test = test.images
#y_test = test.labels
#x_train = np.array(x_train)
#x_test = np.array(x_test)
#y_train = np.array(y_train)
#y_test = np.array(y_test)
# Checking the shape of the dataset
x_train.shape
(50000, 32, 32, 3)
Here the data is stored in a 4-dimensional NumPy array. The first dimension, 50000, is the number of images in the training data, with each image stacked on top of the other as a 3-dimensional NumPy array. The second dimension, 32, is the number of pixels along the x-axis, the third dimension, 32, is the number of pixels along the y-axis, and the fourth dimension, 3, is the number of channels in each image, i.e., these are color images consisting of RGB (Red, Green, and Blue) channels.
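To make the indexing concrete, here is a small illustration (the values shown in the comments come from the first training image, which is printed in full below):
# Indexing the 4-d array as (image, row, column, channel)
print(x_train[0].shape)   # (32, 32, 3) - the first training image is a 32x32 RGB array
print(x_train[0, 0, 0])   # [59 62 63]  - the top-left pixel of that image (R, G, and B intensities)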
Below is the 3-dimensional NumPy representation of the first image in the training data. Each pixel in the image has 3 values - the intensity of R, G, and B channels, and the size of each image is 32x32. So, each image is represented by 32 arrays of shape 32x3.
x_train[0]
array([[[ 59, 62, 63],
[ 43, 46, 45],
[ 50, 48, 43],
...,
[158, 132, 108],
[152, 125, 102],
[148, 124, 103]],
[[ 16, 20, 20],
[ 0, 0, 0],
[ 18, 8, 0],
...,
[123, 88, 55],
[119, 83, 50],
[122, 87, 57]],
[[ 25, 24, 21],
[ 16, 7, 0],
[ 49, 27, 8],
...,
[118, 84, 50],
[120, 84, 50],
[109, 73, 42]],
...,
[[208, 170, 96],
[201, 153, 34],
[198, 161, 26],
...,
[160, 133, 70],
[ 56, 31, 7],
[ 53, 34, 20]],
[[180, 139, 96],
[173, 123, 42],
[186, 144, 30],
...,
[184, 148, 94],
[ 97, 62, 34],
[ 83, 53, 34]],
[[177, 144, 116],
[168, 129, 94],
[179, 142, 87],
...,
[216, 184, 140],
[151, 118, 84],
[123, 92, 72]]], dtype=uint8)
y_train[0]
array([6], dtype=uint8)
The target labels are encoded in numerical format. Here, encoding 6 denotes the category frog. We will create a list of category names to convert the number to its original category name.
# Checking the shape of the test data
x_test.shape
(10000, 32, 32, 3)
As we saw above, all the images are stored as NumPy arrays, and values in the array denote the pixel intensities of the image. We can use matplotlib's imshow function to visualize the image from NumPy arrays. Below we are plotting a few random images from the dataset, to see what the images look like.
# Declaring the number of classes
num_classes = 10
# Creating a list of category names in alphabetical order
cifar10_classes = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
# Declaring the number of rows
rows = 3
# Declaring the number of columns
cols = 4
fig = plt.figure(figsize = (10, 8))
for i in range(cols):
    for j in range(rows):
        random_index = np.random.randint(0, len(y_train))
        ax = fig.add_subplot(rows, cols, i * rows + j + 1)
        ax.imshow(x_train[random_index, :])
        ax.set_title(cifar10_classes[y_train[random_index, 0]])
# Display the plot
plt.show()
In neural networks, it is always suggested to normalize the feature inputs. Normalization keeps all the inputs on a similar scale, which helps the network converge faster and keeps training numerically stable.
As we know, image pixel values range from 0 to 255, so we simply divide all the pixel values by 255 to scale every image to values between 0 and 1.
# Normalizing the image pixels
x_train_normalized = x_train/255
x_test_normalized = x_test/255
Since this is a 10 class classification problem, the output layer should have 10 neurons, which will provide us with the probabilities of the input image belonging to each of those 10 classes. Therefore, we also need to create a one-hot encoded representation for the target classes.
# Creating one-hot encoded representation of target labels
# We can do this by using this utility function - https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical
y_train_encoded = tf.keras.utils.to_categorical(y_train)
y_test_encoded = tf.keras.utils.to_categorical(y_test)
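As a quick sanity check of the encoding (not required for training), each label becomes a vector of length 10 with a single 1 at the index of its class:
print(y_train_encoded.shape)   # (50000, 10) - one row of length 10 per training image
print(y_train_encoded[0])      # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] - a single 1 at index 6 (frog)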
Let's now create a CNN model sequentially where we will be adding the layers one after another.
First, let's set the seed for random number generators in NumPy, Python, and TensorFlow to be able to reproduce the same results every time we run the code.
# Fixing the seed for random number generators
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
Let's build a CNN model with the Leaky Rectified Linear Unit (LeakyReLU) as the activation function. LeakyReLU is an activation function based on ReLU, but it has a small slope for negative values instead of a flat slope. The slope coefficient is set before training, i.e., it is not learned during training.
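As a small illustration of this behavior (separate from the model we are about to build), LeakyReLU with a slope coefficient of 0.1 computes f(x) = x for x >= 0 and f(x) = 0.1 * x for x < 0:
# Applying LeakyReLU with a slope coefficient of 0.1 to a few sample values
leaky = tf.keras.layers.LeakyReLU(0.1)
print(leaky(np.array([-3.0, -1.0, 0.0, 2.0], dtype = np.float32)).numpy())   # [-0.3, -0.1, 0.0, 2.0] (up to float rounding)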
# Initialized a sequential model
model_1 = Sequential()
# Adding the first convolutional layer with 16 filters and the kernel size of 3x3, and 'same' padding
# The input_shape denotes input dimension of CIFAR images
model_1.add(Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", input_shape = (32, 32, 3)))
# Adding LeakyRelu activation function with a negative slope of 0.1
model_1.add(LeakyReLU(0.1))
# Adding the second convolutional layer with 32 filters and the kernel size of 3x3
model_1.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = 'same'))
# Adding LeakyRelu activation function with a negative slope of 0.1
model_1.add(LeakyReLU(0.1))
# Adding max pooling to reduce the size of the output of second convolutional layer
model_1.add(MaxPooling2D(pool_size = (2, 2)))
# Flattening the 3-d output of the convolutional layer after max pooling to make it ready for creating dense connections
model_1.add(Flatten())
# Adding a fully connected dense layer with 256 neurons
model_1.add(Dense(256))
# Adding LeakyRelu activation function with a negative slope of 0.1
model_1.add(LeakyReLU(0.1))
# Adding the output layer with 10 neurons and 'softmax' activation function (for a multi-class classification problem)
model_1.add(Dense(10, activation = 'softmax'))
# Printing the model summary
model_1.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 16) 448
leaky_re_lu (LeakyReLU) (None, 32, 32, 16) 0
conv2d_1 (Conv2D) (None, 32, 32, 32) 4640
leaky_re_lu_1 (LeakyReLU) (None, 32, 32, 32) 0
max_pooling2d (MaxPooling2D) (None, 16, 16, 32) 0
flatten (Flatten) (None, 8192) 0
dense (Dense) (None, 256) 2097408
leaky_re_lu_2 (LeakyReLU) (None, 256) 0
dense_1 (Dense) (None, 10) 2570
=================================================================
Total params: 2,105,066
Trainable params: 2,105,066
Non-trainable params: 0
_________________________________________________________________
As we can see from the above summary, this CNN model will train and learn 2,105,066 parameters (weights and biases).
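As a rough sanity check (assuming the 3x3 kernels and 'same' padding used above), we can reproduce this count by hand:
# Reproducing the parameter count from the summary above
conv_1 = (3 * 3 * 3 + 1) * 16         # 448 - 3x3 kernels over 3 input channels, 16 filters (+1 bias per filter)
conv_2 = (3 * 3 * 16 + 1) * 32        # 4,640 - 3x3 kernels over 16 channels, 32 filters
dense_1 = (16 * 16 * 32 + 1) * 256    # 2,097,408 - flattened 16x16x32 feature map connected to 256 neurons
output = (256 + 1) * 10               # 2,570 - 256 neurons connected to 10 output classes
print(conv_1 + conv_2 + dense_1 + output)   # 2105066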
Let's now compile and train the model using the training data. Here, we are using categorical_crossentropy as the loss function since this is a multi-class classification problem. We will try to minimize this loss at every iteration using the optimizer of our choice. Also, we are choosing accuracy as the metric to measure the performance of the model.
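To see what categorical cross-entropy computes, here is a hand-worked example for a single sample; the predicted probabilities below are made up purely for illustration:
# Categorical cross-entropy for one sample: loss = -sum(y_true * log(y_pred))
y_true_sample = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0])   # one-hot label for class 6 (frog)
y_pred_sample = np.array([0.02, 0.01, 0.05, 0.10, 0.02, 0.05, 0.60, 0.05, 0.05, 0.05])   # hypothetical softmax output
print(-np.sum(y_true_sample * np.log(y_pred_sample)))   # ~0.51, i.e., -log(0.60)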
model_1.compile(
loss = 'categorical_crossentropy',
# Using the Adamax optimizer with a learning rate of 0.005 (the default is 0.001)
optimizer = tf.keras.optimizers.Adamax(learning_rate = 0.005),
metrics=['accuracy']
)
history_1 = model_1.fit(
x_train_normalized, y_train_encoded,
epochs = 10,
validation_split = 0.1,
shuffle = True,
verbose = 2
)
Epoch 1/10
1407/1407 - 19s - loss: 1.3551 - accuracy: 0.5157 - val_loss: 1.0576 - val_accuracy: 0.6206 - 19s/epoch - 13ms/step
Epoch 2/10
1407/1407 - 9s - loss: 0.9225 - accuracy: 0.6753 - val_loss: 0.9056 - val_accuracy: 0.6880 - 9s/epoch - 6ms/step
Epoch 3/10
1407/1407 - 9s - loss: 0.6909 - accuracy: 0.7562 - val_loss: 0.9105 - val_accuracy: 0.6990 - 9s/epoch - 6ms/step
Epoch 4/10
1407/1407 - 9s - loss: 0.4753 - accuracy: 0.8357 - val_loss: 0.9843 - val_accuracy: 0.6982 - 9s/epoch - 6ms/step
Epoch 5/10
1407/1407 - 9s - loss: 0.2814 - accuracy: 0.9044 - val_loss: 1.0863 - val_accuracy: 0.7106 - 9s/epoch - 6ms/step
Epoch 6/10
1407/1407 - 9s - loss: 0.1533 - accuracy: 0.9496 - val_loss: 1.4088 - val_accuracy: 0.6936 - 9s/epoch - 7ms/step
Epoch 7/10
1407/1407 - 9s - loss: 0.0839 - accuracy: 0.9727 - val_loss: 1.6588 - val_accuracy: 0.6998 - 9s/epoch - 6ms/step
Epoch 8/10
1407/1407 - 9s - loss: 0.0525 - accuracy: 0.9836 - val_loss: 1.9140 - val_accuracy: 0.6970 - 9s/epoch - 7ms/step
Epoch 9/10
1407/1407 - 9s - loss: 0.0337 - accuracy: 0.9892 - val_loss: 1.9631 - val_accuracy: 0.6938 - 9s/epoch - 6ms/step
Epoch 10/10
1407/1407 - 9s - loss: 0.0290 - accuracy: 0.9909 - val_loss: 2.1622 - val_accuracy: 0.6946 - 9s/epoch - 6ms/step
plt.plot(history_1.history['accuracy'])
plt.plot(history_1.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc = 'upper left')
# Display the plot
plt.show()
Observations: The training accuracy climbs to about 99%, while the validation accuracy plateaus around 69-71% and the validation loss keeps increasing after the first few epochs. The model is clearly overfitting the training data.
Let's try adding a few dropout layers to the model structure to reduce overfitting and see if this improves the model or not.
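Before building the new model, here is a minimal sketch (separate from the case study pipeline) of what a dropout layer does to its inputs:
# Dropout(0.2) randomly zeroes ~20% of its inputs during training and scales the rest by 1/0.8
drop = tf.keras.layers.Dropout(0.2)
x = tf.ones((1, 10))
print(drop(x, training = True).numpy())    # some entries become 0, the remaining ones become 1.25
print(drop(x, training = False).numpy())   # all entries stay 1 - the layer is an identity at inference time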
First, we need to clear the previous model's history from the session. In Keras, we need a special command to clear the model's history; otherwise, the previous model's history remains in the backend.
Also, let's fix the seed again after clearing the backend.
# Clearing the backend
from tensorflow.keras import backend
backend.clear_session()
# Fixing the seed for random number generators
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
# Initialized a sequential model
model_2 = Sequential()
# Adding the first convolutional layer with 16 filters and the kernel size of 3x3, and 'same' padding
# The input_shape denotes the input dimension of CIFAR images
model_2.add(Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", input_shape = (32, 32, 3)))
# Adding LeakyRelu activation function with a negative slope of 0.1
model_2.add(LeakyReLU(0.1))
# Adding dropout to randomly switch off 20% neurons to reduce overfitting
model_2.add(Dropout(0.2))
# Adding the second convolutional layer with 32 filters and the kernel size of 3x3
model_2.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = 'same'))
# Adding LeakyRelu activation function with a negative slope of 0.1
model_2.add(LeakyReLU(0.1))
# Adding dropout to randomly switch off 20% neurons to reduce overfitting
model_2.add(Dropout(0.2))
# Adding max pooling to reduce the size of output of second convolutional layer
model_2.add(MaxPooling2D(pool_size = (2, 2)))
# Flattening the 3-d output of the convolutional layer after max pooling to make it ready for creating dense connections
model_2.add(Flatten())
# Adding a fully connected dense layer with 256 neurons
model_2.add(Dense(256))
# Adding LeakyRelu activation function with a negative slope of 0.1
model_2.add(LeakyReLU(0.1))
# Adding dropout to randomly switch off 50% neurons to reduce overfitting
model_2.add(Dropout(0.5))
# Adding the output layer with 10 neurons and 'softmax' activation function since this is a multi-class classification problem
model_2.add(Dense(10, activation = 'softmax'))
# Printing the model summary
model_2.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 16) 448
leaky_re_lu (LeakyReLU) (None, 32, 32, 16) 0
dropout (Dropout) (None, 32, 32, 16) 0
conv2d_1 (Conv2D) (None, 32, 32, 32) 4640
leaky_re_lu_1 (LeakyReLU) (None, 32, 32, 32) 0
dropout_1 (Dropout) (None, 32, 32, 32) 0
max_pooling2d (MaxPooling2D) (None, 16, 16, 32) 0
flatten (Flatten) (None, 8192) 0
dense (Dense) (None, 256) 2097408
leaky_re_lu_2 (LeakyReLU) (None, 256) 0
dropout_2 (Dropout) (None, 256) 0
dense_1 (Dense) (None, 10) 2570
=================================================================
Total params: 2,105,066
Trainable params: 2,105,066
Non-trainable params: 0
_________________________________________________________________
# Compiling the model
model_2.compile(
loss = 'categorical_crossentropy',
optimizer = tf.keras.optimizers.Adamax(learning_rate = 0.005),
# Tracking accuracy so that it is available in the training history for plotting
metrics = ['accuracy']
)
# Fitting the model
history_2 = model_2.fit(
x_train_normalized, y_train_encoded,
epochs = 10,
validation_split = 0.1,
shuffle = True,
verbose = 2
)
Epoch 1/10
1407/1407 - 11s - loss: 1.5545 - accuracy: 0.4451 - val_loss: 1.7006 - val_accuracy: 0.4308 - 11s/epoch - 8ms/step
Epoch 2/10
1407/1407 - 10s - loss: 1.1775 - accuracy: 0.5825 - val_loss: 1.2222 - val_accuracy: 0.5828 - 10s/epoch - 7ms/step
Epoch 3/10
1407/1407 - 10s - loss: 1.0168 - accuracy: 0.6428 - val_loss: 1.1596 - val_accuracy: 0.6174 - 10s/epoch - 7ms/step
Epoch 4/10
1407/1407 - 10s - loss: 0.9153 - accuracy: 0.6762 - val_loss: 1.1593 - val_accuracy: 0.6192 - 10s/epoch - 7ms/step
Epoch 5/10
1407/1407 - 10s - loss: 0.8393 - accuracy: 0.7048 - val_loss: 1.1708 - val_accuracy: 0.6276 - 10s/epoch - 7ms/step
Epoch 6/10
1407/1407 - 10s - loss: 0.7721 - accuracy: 0.7272 - val_loss: 1.2177 - val_accuracy: 0.6312 - 10s/epoch - 7ms/step
Epoch 7/10
1407/1407 - 10s - loss: 0.7122 - accuracy: 0.7492 - val_loss: 1.0574 - val_accuracy: 0.6698 - 10s/epoch - 7ms/step
Epoch 8/10
1407/1407 - 10s - loss: 0.6680 - accuracy: 0.7627 - val_loss: 0.9645 - val_accuracy: 0.6820 - 10s/epoch - 7ms/step
Epoch 9/10
1407/1407 - 10s - loss: 0.6168 - accuracy: 0.7808 - val_loss: 1.1259 - val_accuracy: 0.6530 - 10s/epoch - 7ms/step
Epoch 10/10
1407/1407 - 10s - loss: 0.5861 - accuracy: 0.7942 - val_loss: 1.0098 - val_accuracy: 0.6892 - 10s/epoch - 7ms/step
plt.plot(history_2.history['accuracy'])
plt.plot(history_2.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc = 'upper left')
# Display the plot
plt.show()
Observations: The gap between training and validation accuracy has narrowed considerably (about 79% vs. 69%), so the dropout layers have reduced overfitting. However, the validation accuracy itself has not improved much compared to the previous model.
Let's now build another model with a few more convolution layers, max-pooling layers, and dropout layers to reduce overfitting. Also, let's change the learning rate and the number of epochs and see if the model's performance improves.
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
# Fixing the seed for random number generators
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
# Initialized a sequential model
model_3 = Sequential()
# Adding the first convolutional layer with 16 filters and the kernel size of 3x3, and 'same' padding
# The input_shape denotes input dimension of CIFAR images
model_3.add(Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", input_shape = (32, 32, 3)))
# Adding LeakyRelu activation function with a negative slope of 0.1
model_3.add(LeakyReLU(0.1))
# Adding the second convolutional layer with 32 filters and the kernel size of 3x3
model_3.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = 'same'))
# Adding LeakyRelu activation function with a negative slope of 0.1
model_3.add(LeakyReLU(0.1))
# Adding max pooling to reduce the size of output of the second convolutional layer
model_3.add(MaxPooling2D(pool_size = (2, 2)))
# Adding dropout to randomly switch off 25% of the network to reduce overfitting
model_3.add(Dropout(0.25))
# Adding the third convolutional layer with 32 filters and the kernel size of 3x3
model_3.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = 'same'))
# Adding LeakyRelu activation function with a negative slope of 0.1
model_3.add(LeakyReLU(0.1))
# Adding the fourth convolutional layer with 64 filters and the kernel size of 3x3
model_3.add(Conv2D(filters = 64, kernel_size = (3, 3), padding = 'same'))
# Adding LeakyRelu activation function with a negative slope of 0.1
model_3.add(LeakyReLU(0.1))
# Adding max pooling to reduce the size of output of the fourth convolutional layer
model_3.add(MaxPooling2D(pool_size = (2, 2)))
# Adding dropout to randomly switch off 25% of the network to reduce overfitting
model_3.add(Dropout(0.25))
# Flattening the 3-d output of the convolutional layer after max pooling to make it ready for creating dense connections
model_3.add(Flatten())
# Adding a fully connected dense layer with 256 neurons
model_3.add(Dense(256))
# Adding LeakyRelu activation function with negative slope of 0.1
model_3.add(LeakyReLU(0.1))
# Adding dropout to randomly switch off 50% of dense layer neurons to reduce overfitting
model_3.add(Dropout(0.5))
# Adding the output layer with 10 neurons and 'softmax' activation function since this is a multi-class classification problem
model_3.add(Dense(10, activation = 'softmax'))
# Summary of the model
model_3.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 16) 448
leaky_re_lu (LeakyReLU) (None, 32, 32, 16) 0
conv2d_1 (Conv2D) (None, 32, 32, 32) 4640
leaky_re_lu_1 (LeakyReLU) (None, 32, 32, 32) 0
max_pooling2d (MaxPooling2D) (None, 16, 16, 32) 0
dropout (Dropout) (None, 16, 16, 32) 0
conv2d_2 (Conv2D) (None, 16, 16, 32) 9248
leaky_re_lu_2 (LeakyReLU) (None, 16, 16, 32) 0
conv2d_3 (Conv2D) (None, 16, 16, 64) 18496
leaky_re_lu_3 (LeakyReLU) (None, 16, 16, 64) 0
max_pooling2d_1 (MaxPooling2D) (None, 8, 8, 64) 0
dropout_1 (Dropout) (None, 8, 8, 64) 0
flatten (Flatten) (None, 4096) 0
dense (Dense) (None, 256) 1048832
leaky_re_lu_4 (LeakyReLU) (None, 256) 0
dropout_2 (Dropout) (None, 256) 0
dense_1 (Dense) (None, 10) 2570
=================================================================
Total params: 1,084,234
Trainable params: 1,084,234
Non-trainable params: 0
_________________________________________________________________
In this new architecture, although the number of convolutional layers has increased, the total number of trainable parameters has been reduced significantly (by roughly 50%). This is due to the additional max-pooling layers, which shrink the feature maps before they are flattened and fed to the dense layer. Let's train this model.
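A quick back-of-the-envelope check (using the flattened shapes from the two summaries) shows where the reduction comes from - almost all of the parameters sit in the first dense layer:
# Comparing the dense layer fed by the flattened feature map in model_1 vs. model_3
model_1_dense = (16 * 16 * 32 + 1) * 256   # 2,097,408 parameters - model_1 flattens a 16x16x32 feature map
model_3_dense = (8 * 8 * 64 + 1) * 256     # 1,048,832 parameters - the extra pooling shrinks it to 8x8x64
print(model_1_dense, model_3_dense)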
model_3.compile(
loss = 'categorical_crossentropy',
optimizer = tf.keras.optimizers.Adamax(learning_rate = 0.001),
metrics = ['accuracy']
)
history_3 = model_3.fit(
x_train_normalized, y_train_encoded,
epochs = 15,
validation_split = 0.1,
shuffle = True,
verbose = 2
)
Epoch 1/15
1407/1407 - 14s - loss: 1.6298 - accuracy: 0.4076 - val_loss: 1.3043 - val_accuracy: 0.5286 - 14s/epoch - 10ms/step
Epoch 2/15
1407/1407 - 12s - loss: 1.2694 - accuracy: 0.5462 - val_loss: 1.0677 - val_accuracy: 0.6286 - 12s/epoch - 9ms/step
Epoch 3/15
1407/1407 - 12s - loss: 1.1054 - accuracy: 0.6079 - val_loss: 0.9629 - val_accuracy: 0.6626 - 12s/epoch - 9ms/step
Epoch 4/15
1407/1407 - 12s - loss: 1.0017 - accuracy: 0.6465 - val_loss: 0.9013 - val_accuracy: 0.6908 - 12s/epoch - 9ms/step
Epoch 5/15
1407/1407 - 13s - loss: 0.9240 - accuracy: 0.6752 - val_loss: 0.8416 - val_accuracy: 0.7102 - 13s/epoch - 9ms/step
Epoch 6/15
1407/1407 - 13s - loss: 0.8632 - accuracy: 0.6963 - val_loss: 0.7724 - val_accuracy: 0.7350 - 13s/epoch - 9ms/step
Epoch 7/15
1407/1407 - 13s - loss: 0.8120 - accuracy: 0.7168 - val_loss: 0.7838 - val_accuracy: 0.7258 - 13s/epoch - 9ms/step
Epoch 8/15
1407/1407 - 13s - loss: 0.7721 - accuracy: 0.7303 - val_loss: 0.7191 - val_accuracy: 0.7508 - 13s/epoch - 9ms/step
Epoch 9/15
1407/1407 - 13s - loss: 0.7368 - accuracy: 0.7409 - val_loss: 0.7273 - val_accuracy: 0.7488 - 13s/epoch - 9ms/step
Epoch 10/15
1407/1407 - 12s - loss: 0.6983 - accuracy: 0.7530 - val_loss: 0.6896 - val_accuracy: 0.7584 - 12s/epoch - 9ms/step
Epoch 11/15
1407/1407 - 13s - loss: 0.6773 - accuracy: 0.7612 - val_loss: 0.6766 - val_accuracy: 0.7692 - 13s/epoch - 9ms/step
Epoch 12/15
1407/1407 - 13s - loss: 0.6485 - accuracy: 0.7716 - val_loss: 0.6468 - val_accuracy: 0.7752 - 13s/epoch - 9ms/step
Epoch 13/15
1407/1407 - 16s - loss: 0.6244 - accuracy: 0.7813 - val_loss: 0.6540 - val_accuracy: 0.7764 - 16s/epoch - 12ms/step
Epoch 14/15
1407/1407 - 14s - loss: 0.6061 - accuracy: 0.7853 - val_loss: 0.6455 - val_accuracy: 0.7788 - 14s/epoch - 10ms/step
Epoch 15/15
1407/1407 - 13s - loss: 0.5824 - accuracy: 0.7947 - val_loss: 0.6239 - val_accuracy: 0.7838 - 13s/epoch - 9ms/step
plt.plot(history_3.history['accuracy'])
plt.plot(history_3.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc = 'upper left')
# Display the plot
plt.show()
Observations: The training and validation accuracies are now close to each other (about 79% and 78%), and the validation loss decreases steadily across epochs. The deeper architecture with dropout generalizes much better than the previous two models.
We can try out some more iterations and tune some of the hyperparameters to further improve the model, but hyperparameter tuning is an exhaustive process and can take a long time to find the right set of values for each hyperparameter.
Let's try another technique, transfer learning, to see if we can speed up the process of training the model and also get a more accurate model overall.
Transfer learning is a popular deep learning technique that reuses a pre-trained model on a new problem. It can train deep neural networks with comparatively little data. This is very useful in the data science field since most real-world problems typically do not have millions of labeled data points to train complex models.
Let's begin by clearing the backend and fixing the seed.
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
# Fixing the seed for random number generators
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
# Importing necessary libraries
from tensorflow.keras import Model
from tensorflow.keras.applications.vgg16 import VGG16
Now, let's instantiate the VGG16 model.
vgg_model = VGG16(weights = 'imagenet',
include_top = False,
input_shape = (32, 32, 3), pooling = 'max')
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 0s 0us/step
58900480/58889256 [==============================] - 0s 0us/step
# Checking summary of the model
vgg_model.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 32, 32, 3)] 0
block1_conv1 (Conv2D) (None, 32, 32, 64) 1792
block1_conv2 (Conv2D) (None, 32, 32, 64) 36928
block1_pool (MaxPooling2D) (None, 16, 16, 64) 0
block2_conv1 (Conv2D) (None, 16, 16, 128) 73856
block2_conv2 (Conv2D) (None, 16, 16, 128) 147584
block2_pool (MaxPooling2D) (None, 8, 8, 128) 0
block3_conv1 (Conv2D) (None, 8, 8, 256) 295168
block3_conv2 (Conv2D) (None, 8, 8, 256) 590080
block3_conv3 (Conv2D) (None, 8, 8, 256) 590080
block3_pool (MaxPooling2D) (None, 4, 4, 256) 0
block4_conv1 (Conv2D) (None, 4, 4, 512) 1180160
block4_conv2 (Conv2D) (None, 4, 4, 512) 2359808
block4_conv3 (Conv2D) (None, 4, 4, 512) 2359808
block4_pool (MaxPooling2D) (None, 2, 2, 512) 0
block5_conv1 (Conv2D) (None, 2, 2, 512) 2359808
block5_conv2 (Conv2D) (None, 2, 2, 512) 2359808
block5_conv3 (Conv2D) (None, 2, 2, 512) 2359808
block5_pool (MaxPooling2D) (None, 1, 1, 512) 0
global_max_pooling2d (GlobalMaxPooling2D) (None, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
# Using the output of the 'block3_pool' layer as the transfer point (the deeper VGG16 blocks are not used)
transfer_layer = vgg_model.get_layer('block3_pool')
# Freezing the VGG16 layers so that the pre-trained ImageNet weights are not updated during training
vgg_model.trainable = False
Now, we will add classification layers on top of the transfer layer using the Keras Functional API.
# Add classification layers on top of it
x = Flatten()(transfer_layer.output)
x = Dense(256, activation = 'relu')(x)
x = Dense(128, activation = 'relu')(x)
x = Dropout(0.3)(x)
x = Dense(64, activation = 'relu')(x)
x = BatchNormalization()(x)
pred = Dense(10, activation = 'softmax')(x)
# Initializing the model
model_4 = Model(vgg_model.input, pred)
# Compiling the model
model_4.compile(loss = 'categorical_crossentropy',
optimizer = tf.keras.optimizers.Adamax(learning_rate = 0.0005),
metrics = ['accuracy'])
# Fitting the model
history_4 = model_4.fit(
x_train_normalized, y_train_encoded,
epochs = 10,
batch_size = 250,
validation_split = 0.1,
verbose = 2
)
Epoch 1/10
180/180 - 10s - loss: 1.0818 - accuracy: 0.6320 - val_loss: 0.8091 - val_accuracy: 0.7244 - 10s/epoch - 56ms/step
Epoch 2/10
180/180 - 8s - loss: 0.8667 - accuracy: 0.7068 - val_loss: 0.7526 - val_accuracy: 0.7462 - 8s/epoch - 45ms/step
Epoch 3/10
180/180 - 8s - loss: 0.7654 - accuracy: 0.7394 - val_loss: 0.6967 - val_accuracy: 0.7658 - 8s/epoch - 42ms/step
Epoch 4/10
180/180 - 8s - loss: 0.6931 - accuracy: 0.7654 - val_loss: 0.6643 - val_accuracy: 0.7744 - 8s/epoch - 45ms/step
Epoch 5/10
180/180 - 8s - loss: 0.6395 - accuracy: 0.7832 - val_loss: 0.6554 - val_accuracy: 0.7752 - 8s/epoch - 45ms/step
Epoch 6/10
180/180 - 8s - loss: 0.5934 - accuracy: 0.7973 - val_loss: 0.6257 - val_accuracy: 0.7840 - 8s/epoch - 45ms/step
Epoch 7/10
180/180 - 8s - loss: 0.5527 - accuracy: 0.8108 - val_loss: 0.6206 - val_accuracy: 0.7900 - 8s/epoch - 42ms/step
Epoch 8/10
180/180 - 8s - loss: 0.5064 - accuracy: 0.8268 - val_loss: 0.6014 - val_accuracy: 0.7960 - 8s/epoch - 46ms/step
Epoch 9/10
180/180 - 8s - loss: 0.4773 - accuracy: 0.8396 - val_loss: 0.6055 - val_accuracy: 0.7920 - 8s/epoch - 46ms/step
Epoch 10/10
180/180 - 8s - loss: 0.4437 - accuracy: 0.8473 - val_loss: 0.6118 - val_accuracy: 0.7940 - 8s/epoch - 42ms/step
plt.plot(history_4.history['accuracy'])
plt.plot(history_4.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc = 'upper left')
# Display the plot
plt.show()
Observations: Using the pre-trained VGG16 base as a fixed feature extractor, the model reaches about 79-80% validation accuracy within 10 epochs, with each epoch taking only around 8 seconds even with a larger batch size. The gap between training (~85%) and validation accuracy is modest, and this is the best validation performance among the four models.
Here, let's use this model to make predictions on the test data.
# Making predictions on the test data
y_pred_test = model_4.predict(x_test_normalized)
# Converting probabilities to class labels
y_pred_test_classes = np.argmax(y_pred_test, axis = 1)
# Calculating the probability of the predicted class
y_pred_test_max_probas = np.max(y_pred_test, axis = 1)
# Importing required functions
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
# Printing the classification report
print(classification_report(y_test, y_pred_test_classes))
# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_test, y_pred_test_classes)
plt.figure(figsize = (8, 5))
sns.heatmap(cm, annot = True, fmt = '.0f', xticklabels = cifar10_classes, yticklabels = cifar10_classes)
plt.ylabel('Actual')
plt.xlabel('Predicted')
# Display the plot
plt.show()
precision recall f1-score support
0 0.80 0.84 0.82 1000
1 0.92 0.84 0.88 1000
2 0.76 0.70 0.73 1000
3 0.69 0.54 0.61 1000
4 0.78 0.73 0.76 1000
5 0.61 0.76 0.68 1000
6 0.82 0.86 0.84 1000
7 0.86 0.83 0.84 1000
8 0.82 0.91 0.86 1000
9 0.84 0.85 0.85 1000
accuracy 0.79 10000
macro avg 0.79 0.79 0.79 10000
weighted avg 0.79 0.79 0.79 10000
Observations: The model achieves an overall accuracy of about 79% on the test data. Classes like automobile, ship, and truck are predicted with high precision and recall, while cats and dogs remain the hardest classes, with cats having the lowest recall (0.54) and dogs the lowest precision (0.61).
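As an optional follow-up (reusing the cm variable computed above), the per-class recall can be read directly off the confusion matrix:
# Per-class recall: diagonal of the confusion matrix divided by the number of actual samples in each class
per_class_recall = cm.diagonal() / cm.sum(axis = 1)
for name, recall in zip(cifar10_classes, per_class_recall):
    print(f"{name:12s} {recall:.2f}")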
rows = 3
cols = 4
fig = plt.figure(figsize = (10, 12))
for i in range(cols):
    for j in range(rows):
        random_index = np.random.randint(0, len(y_test))
        ax = fig.add_subplot(rows, cols, i * rows + j + 1)
        ax.imshow(x_test[random_index, :])
        pred_label = cifar10_classes[y_pred_test_classes[random_index]]
        pred_proba = y_pred_test_max_probas[random_index]
        true_label = cifar10_classes[y_test[random_index, 0]]
        ax.set_title("actual: {}\npredicted: {}\nprobability: {:.3}\n".format(
            true_label, pred_label, pred_proba
        ))
plt.show()
In this notebook, we have implemented a CNN model from scratch and used transfer learning to make predictions on the CIFAR-10 dataset. We have learned how to prepare the image data before passing it into the CNN model and how to add layers sequentially inside the model.
We have seen four different iterations of the CNN model and built an intuition about how to improve the model by tuning various hyperparameters and using different techniques. There is still plenty of scope for improvement and you can try out tuning different hyperparameters to improve the model performance.