CIFAR-10 Image Classification¶



Context¶


CIFAR-10 (from the Canadian Institute For Advanced Research) is a collection of images covering 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. It is commonly used to teach a computer how to recognize objects.

Since the images in CIFAR-10 are low-resolution (32x32x3), this dataset allows researchers to quickly try different algorithms and see what works. Various kinds of convolutional neural networks tend to be the best at recognizing the images in CIFAR-10.


Objective¶


In this case study, we will build a multi-class classification algorithm to predict 10 different classes of the CIFAR-10 dataset using Convolutional Neural Networks and Transfer Learning.


Dataset¶


The CIFAR-10 dataset consists of 60,000 32x32 color images (3 channels each) in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. You can learn more about this dataset here - https://www.cs.toronto.edu/~kriz/cifar.html

Loading libraries¶

In [ ]:
import keras

import numpy as np

# A library for data visualization
import matplotlib.pyplot as plt

# An advanced library for data visualization
import seaborn as sns

import tensorflow as tf

# Keras Sequential Model
from tensorflow.keras.models import Sequential

# Importing all the different layers and optimizers
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization, Activation, LeakyReLU

from tensorflow.keras.optimizers import Adam

Importing and loading the CIFAR dataset¶

The CIFAR-10 dataset is available directly through the Keras library in the form of NumPy arrays. We will load it from the keras.datasets module here.

In [ ]:
# Importing the dataset
from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

In case the earlier cell throws an error, please uncomment the cell below and run it once. It will show an error; restart the runtime and then run the cell again.

Please note: The cell will show an error the first time you run it, and restarting the runtime afterwards is compulsory. Also, the conversion of the tensors to NumPy arrays in this cell may take a long time to execute.

In [ ]:
#!pip install hub
#import hub

#train = hub.load('hub://activeloop/cifar10-train')
#test = hub.load('hub://activeloop/cifar10-test')
#x_train = train.images
#y_train = train.labels
#x_test = test.images
#y_test = test.labels

#x_train = np.array(x_train)
#x_test = np.array(x_test)
#y_train = np.array(y_train)
#y_test = np.array(y_test)
In [ ]:
# Checking the shape of the dataset
x_train.shape
Out[ ]:
(50000, 32, 32, 3)

Here the data is stored in a 4-dimensional NumPy array. The first dimension, 50000, denotes the number of images in the training data, with each image stacked on top of the other as a 3-dimensional NumPy array. The second dimension, 32, denotes the number of pixels along the x-axis, the third dimension, 32, denotes the number of pixels along the y-axis, and the fourth dimension, 3, is the number of channels in each image, i.e., these are color images consisting of RGB (Red, Green, and Blue) channels.

Below is the 3-dimensional NumPy representation of the first image in the training data. Each pixel in the image has 3 values - the intensity of R, G, and B channels, and the size of each image is 32x32. So, each image is represented by 32 arrays of shape 32x3.

In [ ]:
x_train[0]
Out[ ]:
array([[[ 59,  62,  63],
        [ 43,  46,  45],
        [ 50,  48,  43],
        ...,
        [158, 132, 108],
        [152, 125, 102],
        [148, 124, 103]],

       [[ 16,  20,  20],
        [  0,   0,   0],
        [ 18,   8,   0],
        ...,
        [123,  88,  55],
        [119,  83,  50],
        [122,  87,  57]],

       [[ 25,  24,  21],
        [ 16,   7,   0],
        [ 49,  27,   8],
        ...,
        [118,  84,  50],
        [120,  84,  50],
        [109,  73,  42]],

       ...,

       [[208, 170,  96],
        [201, 153,  34],
        [198, 161,  26],
        ...,
        [160, 133,  70],
        [ 56,  31,   7],
        [ 53,  34,  20]],

       [[180, 139,  96],
        [173, 123,  42],
        [186, 144,  30],
        ...,
        [184, 148,  94],
        [ 97,  62,  34],
        [ 83,  53,  34]],

       [[177, 144, 116],
        [168, 129,  94],
        [179, 142,  87],
        ...,
        [216, 184, 140],
        [151, 118,  84],
        [123,  92,  72]]], dtype=uint8)
In [ ]:
y_train[0]
Out[ ]:
array([6], dtype=uint8)

The target labels are encoded in numerical format. Here, encoding 6 denotes the category frog. We will create a list of category names to convert the number to its original category name.

In [ ]:
# Checking the shape of the test data
x_test.shape
Out[ ]:
(10000, 32, 32, 3)
  • There are 10,000 images in the test data.

Converting NumPy arrays to images and visualizing some random images¶

As we saw above, all the images are stored as NumPy arrays, and values in the array denote the pixel intensities of the image. We can use matplotlib's imshow function to visualize the image from NumPy arrays. Below we are plotting a few random images from the dataset, to see what the images look like.

In [ ]:
# Declaring the number of classes
num_classes = 10

# Creating a list of category names in alphabetical order
cifar10_classes = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
In [ ]:
# Declaring the number of rows
rows = 3

# Declaring the number of columns
cols = 4

fig = plt.figure(figsize = (10, 8))

for i in range(cols):

    for j in range(rows):

        random_index = np.random.randint(0, len(y_train))

        ax = fig.add_subplot(rows, cols, i * rows + j + 1)

        ax.imshow(x_train[random_index, :])

        ax.set_title(cifar10_classes[y_train[random_index, 0]])

# Display the plot
plt.show()

Data Preparation¶

In neural networks, it is generally recommended to normalize the feature inputs. Normalization has the following benefits when training a neural network model:

  1. Normalization makes training faster and reduces the chances of getting stuck in local optima.
  2. Weight decay and estimation can also be done more conveniently with normalized inputs.
  3. In deep neural networks, normalization helps avoid the exploding gradient problem, which occurs when large error gradients accumulate and result in very large updates to the model weights during training. This makes the model unstable and unable to learn from the training data.

As we know, image pixel values range from 0 to 255, so we simply divide all the pixel values by 255 to scale every image to the range 0 - 1.

In [ ]:
# Normalizing the image pixels
x_train_normalized = x_train/255

x_test_normalized = x_test/255

Since this is a 10 class classification problem, the output layer should have 10 neurons, which will provide us with the probabilities of the input image belonging to each of those 10 classes. Therefore, we also need to create a one-hot encoded representation for the target classes.

In [ ]:
# Creating one-hot encoded representation of target labels

# We can do this by using this utility function - https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical
y_train_encoded = tf.keras.utils.to_categorical(y_train)

y_test_encoded = tf.keras.utils.to_categorical(y_test)
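
As a quick sanity check (assuming the cells above have been run), the first training label, 6 (frog), should now be a 10-dimensional vector with a 1 in position 6:

In [ ]:
# A quick check of the one-hot encoding (assumes the cells above have been run)
print(y_train[0])              # [6] - the original integer label

print(y_train_encoded[0])      # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]

print(y_train_encoded.shape)   # (50000, 10) - one column per class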

Model Building¶

Let's now create a CNN model sequentially where we will be adding the layers one after another.

First, let's set the seed for random number generators in NumPy, Python, and TensorFlow to be able to reproduce the same results every time we run the code.

In [ ]:
# Fixing the seed for random number generators
np.random.seed(42)

import random

random.seed(42)

tf.random.set_seed(42)

Let's build a CNN model with the Leaky Rectified Linear Unit (LeakyReLU) as the activation function. LeakyReLU is an activation function based on ReLU, but it has a small slope for negative values instead of a flat slope. The slope coefficient is fixed before training, i.e., it is not learned during training.
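
In other words, LeakyReLU(x) = x for x >= 0 and alpha * x for x < 0, where alpha is the fixed slope (0.1 here). Below is a quick illustrative check, a minimal sketch that is not part of the model itself:

In [ ]:
# Illustrating LeakyReLU with a fixed negative slope of 0.1 (a minimal sketch)
leaky = tf.keras.layers.LeakyReLU(0.1)

print(leaky(tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])).numpy())

# Expected output: [-0.2  -0.05  0.    1.    3.  ] - negative inputs are scaled by 0.1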

Note:

  • In Keras, the input to a CNN model must be 4-dimensional: (batch size, height, width, channels).
  • Here, we already have a 4-dimensional input because these are color images with 3 channels.
  • In the case of grayscale images, we would have to reshape the input features to explicitly specify that there is only 1 channel, i.e., gray.
In [ ]:
# Initializing a sequential model
model_1 = Sequential()

# Adding the first convolutional layer with 16 filters and the kernel size of 3x3, and 'same' padding

# The input_shape denotes input dimension of CIFAR images
model_1.add(Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", input_shape = (32, 32, 3)))

# Adding LeakyRelu activation function with a negative slope of 0.1
model_1.add(LeakyReLU(0.1))

# Adding the second convolutional layer with 32 filters and the kernel size of 3x3
model_1.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = 'same'))

# Adding LeakyRelu activation function with a negative slope of 0.1
model_1.add(LeakyReLU(0.1))
    
# Adding max pooling to reduce the size of the output of second convolutional layer
model_1.add(MaxPooling2D(pool_size = (2, 2)))
    
# Flattening the 3-d output of the convolutional layer after max pooling to make it ready for creating dense connections
model_1.add(Flatten())

# Adding a fully connected dense layer with 256 neurons    
model_1.add(Dense(256))

# Adding LeakyRelu activation function with a negative slope of 0.1
model_1.add(LeakyReLU(0.1))

# Adding the output layer with 10 neurons and 'softmax' activation function (for a multi-class classification problem)
model_1.add(Dense(10, activation = 'softmax'))
In [ ]:
# Printing the model summary
model_1.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 32, 32, 16)        448       
                                                                 
 leaky_re_lu (LeakyReLU)     (None, 32, 32, 16)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4640      
                                                                 
 leaky_re_lu_1 (LeakyReLU)   (None, 32, 32, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2D  (None, 16, 16, 32)       0         
 )                                                               
                                                                 
 flatten (Flatten)           (None, 8192)              0         
                                                                 
 dense (Dense)               (None, 256)               2097408   
                                                                 
 leaky_re_lu_2 (LeakyReLU)   (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 10)                2570      
                                                                 
=================================================================
Total params: 2,105,066
Trainable params: 2,105,066
Non-trainable params: 0
_________________________________________________________________

As we can see from the above summary, this CNN model will train and learn 2,105,066 parameters (weights and biases).

Let's now compile and train the model using the training data. Here, we use categorical_crossentropy as the loss function since this is a multi-class classification problem. We will try to minimize this loss at every iteration using the optimizer of our choice. We also choose accuracy as the metric to measure the performance of the model.

In [ ]:
model_1.compile(
    
    loss = 'categorical_crossentropy',
    
    # Using the Adamax optimizer with a learning rate of 0.005 (the default is 0.001)
    optimizer = tf.keras.optimizers.Adamax(learning_rate = 0.005),
    
    metrics=['accuracy']
)
In [ ]:
history_1 = model_1.fit(
    
            x_train_normalized, y_train_encoded,
            
            epochs = 10,
            
            validation_split = 0.1,
            
            shuffle = True,
            
            verbose = 2
)
Epoch 1/10
1407/1407 - 19s - loss: 1.3551 - accuracy: 0.5157 - val_loss: 1.0576 - val_accuracy: 0.6206 - 19s/epoch - 13ms/step
Epoch 2/10
1407/1407 - 9s - loss: 0.9225 - accuracy: 0.6753 - val_loss: 0.9056 - val_accuracy: 0.6880 - 9s/epoch - 6ms/step
Epoch 3/10
1407/1407 - 9s - loss: 0.6909 - accuracy: 0.7562 - val_loss: 0.9105 - val_accuracy: 0.6990 - 9s/epoch - 6ms/step
Epoch 4/10
1407/1407 - 9s - loss: 0.4753 - accuracy: 0.8357 - val_loss: 0.9843 - val_accuracy: 0.6982 - 9s/epoch - 6ms/step
Epoch 5/10
1407/1407 - 9s - loss: 0.2814 - accuracy: 0.9044 - val_loss: 1.0863 - val_accuracy: 0.7106 - 9s/epoch - 6ms/step
Epoch 6/10
1407/1407 - 9s - loss: 0.1533 - accuracy: 0.9496 - val_loss: 1.4088 - val_accuracy: 0.6936 - 9s/epoch - 7ms/step
Epoch 7/10
1407/1407 - 9s - loss: 0.0839 - accuracy: 0.9727 - val_loss: 1.6588 - val_accuracy: 0.6998 - 9s/epoch - 6ms/step
Epoch 8/10
1407/1407 - 9s - loss: 0.0525 - accuracy: 0.9836 - val_loss: 1.9140 - val_accuracy: 0.6970 - 9s/epoch - 7ms/step
Epoch 9/10
1407/1407 - 9s - loss: 0.0337 - accuracy: 0.9892 - val_loss: 1.9631 - val_accuracy: 0.6938 - 9s/epoch - 6ms/step
Epoch 10/10
1407/1407 - 9s - loss: 0.0290 - accuracy: 0.9909 - val_loss: 2.1622 - val_accuracy: 0.6946 - 9s/epoch - 6ms/step
In [ ]:
plt.plot(history_1.history['accuracy'])

plt.plot(history_1.history['val_accuracy'])

plt.title('Model Accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epoch')

plt.legend(['Train', 'Validation'], loc = 'upper left')

# Display the plot
plt.show()

Observations:

  • We can see from the above plot that the model has done poorly on the validation data. The model is highly overfitting the training data.
  • The validation accuracy has become more or less constant after 2 epochs.

Let's try adding a few dropout layers to the model structure to reduce overfitting and see if this improves the model or not.

First, we need to clear the previous model's history from the session. In Keras, we need a special command to clear the model's state; otherwise, the previous model's history remains in the backend.

Also, let's fix the seed again after clearing the backend.

In [ ]:
# Clearing the backend
from tensorflow.keras import backend

backend.clear_session()
In [ ]:
# Fixing the seed for random number generators
np.random.seed(42)

import random

random.seed(42)

tf.random.set_seed(42)
In [ ]:
# Initializing a sequential model
model_2 = Sequential()

# Adding the first convolutional layer with 16 filters and the kernel size of 3x3, and 'same' padding

# The input_shape denotes the input dimension of CIFAR images
model_2.add(Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", input_shape = (32, 32, 3)))

# Adding LeakyRelu activation function with a negative slope of 0.1
model_2.add(LeakyReLU(0.1))

# Adding dropout to randomly switch off 20% neurons to reduce overfitting
model_2.add(Dropout(0.2))

# Adding the second convolutional layer with 32 filters and the kernel size of 3x3
model_2.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = 'same'))

# Adding LeakyRelu activation function with a negative slope of 0.1
model_2.add(LeakyReLU(0.1))

# Adding dropout to randomly switch off 20% neurons to reduce overfitting
model_2.add(Dropout(0.2))
    
# Adding max pooling to reduce the size of output of second convolutional layer
model_2.add(MaxPooling2D(pool_size = (2, 2)))
    
# Flattening the 3-d output of the convolutional layer after max pooling to make it ready for creating dense connections
model_2.add(Flatten())

# Adding a fully connected dense layer with 256 neurons    
model_2.add(Dense(256))

# Adding LeakyRelu activation function with a negative slope of 0.1
model_2.add(LeakyReLU(0.1))

# Adding dropout to randomly switch off 50% neurons to reduce overfitting
model_2.add(Dropout(0.5))

# Adding the output layer with 10 neurons and 'softmax'  activation function since this is a multi-class classification problem
model_2.add(Dense(10, activation = 'softmax'))
In [ ]:
# Printing the model summary
model_2.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 32, 32, 16)        448       
                                                                 
 leaky_re_lu (LeakyReLU)     (None, 32, 32, 16)        0         
                                                                 
 dropout (Dropout)           (None, 32, 32, 16)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4640      
                                                                 
 leaky_re_lu_1 (LeakyReLU)   (None, 32, 32, 32)        0         
                                                                 
 dropout_1 (Dropout)         (None, 32, 32, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2D  (None, 16, 16, 32)       0         
 )                                                               
                                                                 
 flatten (Flatten)           (None, 8192)              0         
                                                                 
 dense (Dense)               (None, 256)               2097408   
                                                                 
 leaky_re_lu_2 (LeakyReLU)   (None, 256)               0         
                                                                 
 dropout_2 (Dropout)         (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 10)                2570      
                                                                 
=================================================================
Total params: 2,105,066
Trainable params: 2,105,066
Non-trainable params: 0
_________________________________________________________________
In [ ]:
# Compiling the model
model_2.compile(
 
    loss = 'categorical_crossentropy',
    
    optimizer = tf.keras.optimizers.Adamax(learning_rate = 0.005),
    
    # Tracking accuracy so that it is recorded in the training history (used for the plot below)
    metrics = ['accuracy']
)
In [ ]:
# Fitting the model
history_2 = model_2.fit(
    
            x_train_normalized, y_train_encoded,

            epochs = 10,
            
            validation_split = 0.1,

            shuffle = True,
            
            verbose = 2
)
Epoch 1/10
1407/1407 - 11s - loss: 1.5545 - accuracy: 0.4451 - val_loss: 1.7006 - val_accuracy: 0.4308 - 11s/epoch - 8ms/step
Epoch 2/10
1407/1407 - 10s - loss: 1.1775 - accuracy: 0.5825 - val_loss: 1.2222 - val_accuracy: 0.5828 - 10s/epoch - 7ms/step
Epoch 3/10
1407/1407 - 10s - loss: 1.0168 - accuracy: 0.6428 - val_loss: 1.1596 - val_accuracy: 0.6174 - 10s/epoch - 7ms/step
Epoch 4/10
1407/1407 - 10s - loss: 0.9153 - accuracy: 0.6762 - val_loss: 1.1593 - val_accuracy: 0.6192 - 10s/epoch - 7ms/step
Epoch 5/10
1407/1407 - 10s - loss: 0.8393 - accuracy: 0.7048 - val_loss: 1.1708 - val_accuracy: 0.6276 - 10s/epoch - 7ms/step
Epoch 6/10
1407/1407 - 10s - loss: 0.7721 - accuracy: 0.7272 - val_loss: 1.2177 - val_accuracy: 0.6312 - 10s/epoch - 7ms/step
Epoch 7/10
1407/1407 - 10s - loss: 0.7122 - accuracy: 0.7492 - val_loss: 1.0574 - val_accuracy: 0.6698 - 10s/epoch - 7ms/step
Epoch 8/10
1407/1407 - 10s - loss: 0.6680 - accuracy: 0.7627 - val_loss: 0.9645 - val_accuracy: 0.6820 - 10s/epoch - 7ms/step
Epoch 9/10
1407/1407 - 10s - loss: 0.6168 - accuracy: 0.7808 - val_loss: 1.1259 - val_accuracy: 0.6530 - 10s/epoch - 7ms/step
Epoch 10/10
1407/1407 - 10s - loss: 0.5861 - accuracy: 0.7942 - val_loss: 1.0098 - val_accuracy: 0.6892 - 10s/epoch - 7ms/step
In [ ]:
plt.plot(history_2.history['accuracy'])

plt.plot(history_2.history['val_accuracy'])

plt.title('Model Accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epoch')

plt.legend(['Train', 'Validation'], loc = 'upper left')

# Display the plot
plt.show()

Observations:

  • The second model with dropout layers seems to have reduced the overfitting in comparison to the previous model, but the model is still not performing well on the validation data.
  • The final validation accuracy is slightly lower than that of the previous model.

Let's now build another model with a few more convolution layers, max-pooling layers, and dropout layers to reduce overfitting. Also, let's change the learning rate and the number of epochs and see if the model's performance improves.

In [ ]:
# Clearing backend
from tensorflow.keras import backend

backend.clear_session()
In [ ]:
# Fixing the seed for random number generators
np.random.seed(42)

import random

random.seed(42)

tf.random.set_seed(42)
In [ ]:
# Initializing a sequential model
model_3 = Sequential()

# Adding the first convolutional layer with 16 filters and the kernel size of 3x3, and 'same' padding

# The input_shape denotes input dimension of CIFAR images
model_3.add(Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", input_shape = (32, 32, 3)))

# Adding LeakyRelu activation function with a negative slope of 0.1
model_3.add(LeakyReLU(0.1))

# Adding the second convolutional layer with 32 filters and the kernel size of 3x3
model_3.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = 'same'))

# Adding LeakyRelu activation function with a negative slope of 0.1
model_3.add(LeakyReLU(0.1))
    
# Adding max pooling to reduce the size of output of the second convolutional layer
model_3.add(MaxPooling2D(pool_size = (2, 2)))
    
# Adding dropout to randomly switch off 25% of the network to reduce overfitting
model_3.add(Dropout(0.25))
    
# Adding the third convolutional layer with 32 filters and the kernel size of 3x3
model_3.add(Conv2D(filters = 32, kernel_size = (3, 3), padding = 'same'))

# Adding LeakyRelu activation function with a negative slope of 0.1
model_3.add(LeakyReLU(0.1))

# Adding the fourth convolutional layer with 64 filters and the kernel size of 3x3
model_3.add(Conv2D(filters = 64, kernel_size = (3, 3), padding = 'same'))

# Adding LeakyRelu activation function with a negative slope of 0.1
model_3.add(LeakyReLU(0.1))

# Adding max pooling to reduce the size of output of the fourth convolutional layer    
model_3.add(MaxPooling2D(pool_size = (2, 2)))
    
# Adding dropout to randomly switch off 25% of the network to reduce overfitting
model_3.add(Dropout(0.25))

# Flattening the 3-d output of the convolutional layer after max pooling to make it ready for creating dense connections
model_3.add(Flatten())

# Adding a fully connected dense layer with 256 neurons    
model_3.add(Dense(256))

# Adding LeakyRelu activation function with negative slope of 0.1
model_3.add(LeakyReLU(0.1))
    
# Adding dropout to randomly switch off 50% of dense layer neurons to reduce overfitting
model_3.add(Dropout(0.5))

# Adding the output layer with 10 neurons and 'softmax' activation function since this is a multi-class classification problem
model_3.add(Dense(10, activation = 'softmax'))
In [ ]:
# Summary of the model
model_3.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 32, 32, 16)        448       
                                                                 
 leaky_re_lu (LeakyReLU)     (None, 32, 32, 16)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4640      
                                                                 
 leaky_re_lu_1 (LeakyReLU)   (None, 32, 32, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2D  (None, 16, 16, 32)       0         
 )                                                               
                                                                 
 dropout (Dropout)           (None, 16, 16, 32)        0         
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 32)        9248      
                                                                 
 leaky_re_lu_2 (LeakyReLU)   (None, 16, 16, 32)        0         
                                                                 
 conv2d_3 (Conv2D)           (None, 16, 16, 64)        18496     
                                                                 
 leaky_re_lu_3 (LeakyReLU)   (None, 16, 16, 64)        0         
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 8, 8, 64)         0         
 2D)                                                             
                                                                 
 dropout_1 (Dropout)         (None, 8, 8, 64)          0         
                                                                 
 flatten (Flatten)           (None, 4096)              0         
                                                                 
 dense (Dense)               (None, 256)               1048832   
                                                                 
 leaky_re_lu_4 (LeakyReLU)   (None, 256)               0         
                                                                 
 dropout_2 (Dropout)         (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 10)                2570      
                                                                 
=================================================================
Total params: 1,084,234
Trainable params: 1,084,234
Non-trainable params: 0
_________________________________________________________________

In this new architecture, although the number of convolutional layers has increased, the total number of trainable parameters has been reduced significantly (by roughly 50%). This is due to the extra max-pooling layer, which shrinks the feature map before it reaches the dense layer. Let's train this model.
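
The savings come almost entirely from the first dense layer, since the extra max-pooling layer shrinks the flattened feature vector from 8192 to 4096 values. A rough back-of-the-envelope check, based on the model summaries above:

In [ ]:
# Rough check of where the parameter savings come from (based on the model summaries above)

# model_1: Flatten outputs 16 * 16 * 32 = 8192 features, so Dense(256) has 8192 * 256 + 256 parameters
print(16 * 16 * 32 * 256 + 256)   # 2097408

# model_3: the extra MaxPooling2D halves each spatial dimension again, giving 8 * 8 * 64 = 4096 features
print(8 * 8 * 64 * 256 + 256)     # 1048832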

In [ ]:
model_3.compile(
    
    loss = 'categorical_crossentropy',

    optimizer = tf.keras.optimizers.Adamax(learning_rate = 0.001),
    
    metrics = ['accuracy']
)
In [ ]:
history_3 = model_3.fit(
    
            x_train_normalized, y_train_encoded,

            epochs = 15,

            validation_split = 0.1,

            shuffle = True,

            verbose = 2
)
Epoch 1/15
1407/1407 - 14s - loss: 1.6298 - accuracy: 0.4076 - val_loss: 1.3043 - val_accuracy: 0.5286 - 14s/epoch - 10ms/step
Epoch 2/15
1407/1407 - 12s - loss: 1.2694 - accuracy: 0.5462 - val_loss: 1.0677 - val_accuracy: 0.6286 - 12s/epoch - 9ms/step
Epoch 3/15
1407/1407 - 12s - loss: 1.1054 - accuracy: 0.6079 - val_loss: 0.9629 - val_accuracy: 0.6626 - 12s/epoch - 9ms/step
Epoch 4/15
1407/1407 - 12s - loss: 1.0017 - accuracy: 0.6465 - val_loss: 0.9013 - val_accuracy: 0.6908 - 12s/epoch - 9ms/step
Epoch 5/15
1407/1407 - 13s - loss: 0.9240 - accuracy: 0.6752 - val_loss: 0.8416 - val_accuracy: 0.7102 - 13s/epoch - 9ms/step
Epoch 6/15
1407/1407 - 13s - loss: 0.8632 - accuracy: 0.6963 - val_loss: 0.7724 - val_accuracy: 0.7350 - 13s/epoch - 9ms/step
Epoch 7/15
1407/1407 - 13s - loss: 0.8120 - accuracy: 0.7168 - val_loss: 0.7838 - val_accuracy: 0.7258 - 13s/epoch - 9ms/step
Epoch 8/15
1407/1407 - 13s - loss: 0.7721 - accuracy: 0.7303 - val_loss: 0.7191 - val_accuracy: 0.7508 - 13s/epoch - 9ms/step
Epoch 9/15
1407/1407 - 13s - loss: 0.7368 - accuracy: 0.7409 - val_loss: 0.7273 - val_accuracy: 0.7488 - 13s/epoch - 9ms/step
Epoch 10/15
1407/1407 - 12s - loss: 0.6983 - accuracy: 0.7530 - val_loss: 0.6896 - val_accuracy: 0.7584 - 12s/epoch - 9ms/step
Epoch 11/15
1407/1407 - 13s - loss: 0.6773 - accuracy: 0.7612 - val_loss: 0.6766 - val_accuracy: 0.7692 - 13s/epoch - 9ms/step
Epoch 12/15
1407/1407 - 13s - loss: 0.6485 - accuracy: 0.7716 - val_loss: 0.6468 - val_accuracy: 0.7752 - 13s/epoch - 9ms/step
Epoch 13/15
1407/1407 - 16s - loss: 0.6244 - accuracy: 0.7813 - val_loss: 0.6540 - val_accuracy: 0.7764 - 16s/epoch - 12ms/step
Epoch 14/15
1407/1407 - 14s - loss: 0.6061 - accuracy: 0.7853 - val_loss: 0.6455 - val_accuracy: 0.7788 - 14s/epoch - 10ms/step
Epoch 15/15
1407/1407 - 13s - loss: 0.5824 - accuracy: 0.7947 - val_loss: 0.6239 - val_accuracy: 0.7838 - 13s/epoch - 9ms/step
In [ ]:
plt.plot(history_3.history['accuracy'])

plt.plot(history_3.history['val_accuracy'])

plt.title('Model Accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epoch')

plt.legend(['Train', 'Validation'], loc = 'upper left')

# Display the plot
plt.show()

Observations:

  • The third iteration of this model seems very promising now.
  • The validation accuracy has improved substantially and the problem of overfitting has been solved. We can say that the model is giving a generalized performance.
  • The above plot shows that the validation accuracy is higher than the training accuracy. There are a few possible reasons for this:
    • The size of the validation set is not big enough.
    • We may have imbalanced data in the validation set.
    • High regularization. Dropout (and penalties such as L1 or L2) is active while the training accuracy is being computed, but it is switched off when the model is evaluated on the validation set. The noise introduced by regularization lowers the reported training accuracy, whereas the validation pass runs through the full, unregularized network, so the validation accuracy is not affected (see the sketch after this list).
    • To overcome this, we can try reducing the regularization or increasing the size of the validation set.
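
To make the last point concrete, here is a minimal sketch (assuming model_3 and x_train_normalized from the cells above) showing that dropout is applied only when the model is called in training mode:

In [ ]:
# Dropout is only active in training mode (a minimal sketch)
sample = x_train_normalized[:1]

# training = True keeps dropout active, so repeated calls can give different outputs
print(model_3(sample, training = True).numpy()[0, :3])

print(model_3(sample, training = True).numpy()[0, :3])

# training = False (the default during evaluation and prediction) disables dropout,
# so repeated calls give identical outputs
print(model_3(sample, training = False).numpy()[0, :3])

print(model_3(sample, training = False).numpy()[0, :3])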

We could try a few more iterations and tune some of the hyperparameters to further improve the model, but hyperparameter tuning is an exhaustive process and it can take a long time to find the right set of values for each hyperparameter.

Let's try some other techniques like transfer learning to see if we can speed up the process of training the model and can also get a more accurate model overall.

Transfer Learning¶

Transfer learning is a popular deep learning technique that reuses a model pre-trained on one problem as the starting point for a new problem. It makes it possible to train deep neural networks with comparatively little data, which is very useful in practice since most real-world problems do not have millions of labeled data points for training complex models from scratch.

Let's begin by clearing the backend and fixing the seed.

In [ ]:
# Clearing backend
from tensorflow.keras import backend

backend.clear_session()
In [ ]:
# Fixing the seed for random number generators
np.random.seed(42)

import random

random.seed(42)

tf.random.set_seed(42)

We will use VGG16, a model pre-trained on ImageNet, as the base model. You can read more about the VGG16 architecture in the Keras applications documentation.

Also, we will use the Functional API to build the model this time, because it allows us to explicitly connect the output of one layer to the input of another; each connection is specified by calling a layer on the output tensor of another layer. A toy example is shown below, and you can read more about the Functional API in the TensorFlow documentation.
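
As a toy illustration of the Functional API (separate from the transfer learning model we will build), each layer is called on the output of the previous one, and the Model is created from explicit input and output tensors:

In [ ]:
# A toy illustration of the Functional API (not part of the transfer learning model)
from tensorflow.keras import Input, Model

toy_inputs = Input(shape = (32, 32, 3))

x = Flatten()(toy_inputs)

x = Dense(16, activation = 'relu')(x)

toy_outputs = Dense(10, activation = 'softmax')(x)

toy_model = Model(toy_inputs, toy_outputs)

toy_model.summary()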

In [ ]:
# Importing necessary libraries
from tensorflow.keras import Model

from tensorflow.keras.applications.vgg16 import VGG16

Now, let's instantiate the VGG16 model.

  • The VGG16 model was originally trained on images of size 224x224. The Keras application allows a minimum input size of 32x32, which conveniently matches the image size in the CIFAR-10 dataset. If you want to use any other size, you can change the size of the input image.
  • By specifying include_top=False, we load the network without the classification layers at the top, i.e., we will use the VGG16 model only for feature extraction.
In [ ]:
vgg_model = VGG16(weights = 'imagenet', 
                  
                       include_top = False, 
                  
                       input_shape = (32, 32, 3), pooling = 'max')
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 0s 0us/step
58900480/58889256 [==============================] - 0s 0us/step
In [ ]:
# Checking summary of the model
vgg_model.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 block1_conv1 (Conv2D)       (None, 32, 32, 64)        1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 32, 32, 64)        36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 16, 16, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 16, 16, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 16, 16, 128)       147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 8, 8, 128)         0         
                                                                 
 block3_conv1 (Conv2D)       (None, 8, 8, 256)         295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 8, 8, 256)         590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 8, 8, 256)         590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 4, 4, 256)         0         
                                                                 
 block4_conv1 (Conv2D)       (None, 4, 4, 512)         1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 4, 4, 512)         2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 4, 4, 512)         2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 2, 2, 512)         0         
                                                                 
 block5_conv1 (Conv2D)       (None, 2, 2, 512)         2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 2, 2, 512)         2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 2, 2, 512)         2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 1, 1, 512)         0         
                                                                 
 global_max_pooling2d (Globa  (None, 512)              0         
 lMaxPooling2D)                                                  
                                                                 
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
  • The VGG16 model has more than 14.7 M trainable parameters.
  • Here, we will not train any of the layers from the VGG16 model. We will use the pre-trained weights and biases.
  • Also, we can take the output of any intermediate layer of the VGG16 model as the input to our new classification head. Here, we will take the output of the 3rd block (block3_pool).
In [ ]:
transfer_layer = vgg_model.get_layer('block3_pool')
In [ ]:
vgg_model.trainable = False

Now, we will add the classification layers on top of it using the Functional API.

In [ ]:
# Add classification layers on top of it
x = Flatten()(transfer_layer.output)

x = Dense(256, activation = 'relu')(x)

x = Dense(128, activation = 'relu')(x)

x = Dropout(0.3)(x)

x = Dense(64, activation = 'relu')(x)

x = BatchNormalization()(x)

pred = Dense(10, activation = 'softmax')(x)

# Initializing the model
model_4 = Model(vgg_model.input, pred)
In [ ]:
# Compiling the model
model_4.compile(loss = 'categorical_crossentropy',
                
              optimizer = tf.keras.optimizers.Adamax(learning_rate = 0.0005),

              metrics = ['accuracy'])
In [ ]:
# Fitting the model
history_4 = model_4.fit(
    
            x_train_normalized, y_train_encoded,

            epochs = 10,

            batch_size = 250,

            validation_split = 0.1,

            verbose = 2
)
Epoch 1/10
180/180 - 10s - loss: 1.0818 - accuracy: 0.6320 - val_loss: 0.8091 - val_accuracy: 0.7244 - 10s/epoch - 56ms/step
Epoch 2/10
180/180 - 8s - loss: 0.8667 - accuracy: 0.7068 - val_loss: 0.7526 - val_accuracy: 0.7462 - 8s/epoch - 45ms/step
Epoch 3/10
180/180 - 8s - loss: 0.7654 - accuracy: 0.7394 - val_loss: 0.6967 - val_accuracy: 0.7658 - 8s/epoch - 42ms/step
Epoch 4/10
180/180 - 8s - loss: 0.6931 - accuracy: 0.7654 - val_loss: 0.6643 - val_accuracy: 0.7744 - 8s/epoch - 45ms/step
Epoch 5/10
180/180 - 8s - loss: 0.6395 - accuracy: 0.7832 - val_loss: 0.6554 - val_accuracy: 0.7752 - 8s/epoch - 45ms/step
Epoch 6/10
180/180 - 8s - loss: 0.5934 - accuracy: 0.7973 - val_loss: 0.6257 - val_accuracy: 0.7840 - 8s/epoch - 45ms/step
Epoch 7/10
180/180 - 8s - loss: 0.5527 - accuracy: 0.8108 - val_loss: 0.6206 - val_accuracy: 0.7900 - 8s/epoch - 42ms/step
Epoch 8/10
180/180 - 8s - loss: 0.5064 - accuracy: 0.8268 - val_loss: 0.6014 - val_accuracy: 0.7960 - 8s/epoch - 46ms/step
Epoch 9/10
180/180 - 8s - loss: 0.4773 - accuracy: 0.8396 - val_loss: 0.6055 - val_accuracy: 0.7920 - 8s/epoch - 46ms/step
Epoch 10/10
180/180 - 8s - loss: 0.4437 - accuracy: 0.8473 - val_loss: 0.6118 - val_accuracy: 0.7940 - 8s/epoch - 42ms/step
In [ ]:
plt.plot(history_4.history['accuracy'])

plt.plot(history_4.history['val_accuracy'])

plt.title('Model Accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epoch')

plt.legend(['Train', 'Validation'], loc = 'upper left')

# Display the plot
plt.show()

Observations:

  • The model's training accuracy is slightly higher than the validation accuracy.
  • The validation accuracy has improved in comparison to the previous model.
  • We have achieved the best validation accuracy so far without training any of the convolutional layers. Other pre-trained models can also be tried and/or tuned to get better performance; a brief sketch of swapping in a different backbone is shown below.
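
For example, swapping VGG16 for another ImageNet backbone only changes the feature-extractor part. Below is a minimal, optional sketch using ResNet50 (our own illustrative assumption, not a model trained or evaluated in this notebook) that reuses the same classification-head idea:

In [ ]:
# Optional sketch: swapping in a different pre-trained backbone (not trained or evaluated here)
from tensorflow.keras.applications.resnet50 import ResNet50

resnet_base = ResNet50(weights = 'imagenet', include_top = False, input_shape = (32, 32, 3))

# Freezing the pre-trained weights, as we did for VGG16
resnet_base.trainable = False

x = Flatten()(resnet_base.output)

x = Dense(256, activation = 'relu')(x)

x = Dropout(0.3)(x)

pred_resnet = Dense(10, activation = 'softmax')(x)

model_resnet = Model(resnet_base.input, pred_resnet)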

Now, let's use this model to make predictions on the test data.

Making predictions¶

In [ ]:
# Making predictions on the test data
y_pred_test = model_4.predict(x_test_normalized)

# Converting probabilities to class labels
y_pred_test_classes = np.argmax(y_pred_test, axis = 1)

# Calculating the probability of the predicted class
y_pred_test_max_probas = np.max(y_pred_test, axis = 1)
In [ ]:
# Importing required functions
from sklearn.metrics import classification_report

from sklearn.metrics import confusion_matrix

# Printing the classification report
print(classification_report(y_test, y_pred_test_classes))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_test, y_pred_test_classes)

plt.figure(figsize = (8, 5))

sns.heatmap(cm, annot = True,  fmt = '.0f', xticklabels = cifar10_classes, yticklabels = cifar10_classes)

plt.ylabel('Actual')

plt.xlabel('Predicted')

# Display the plot
plt.show()
              precision    recall  f1-score   support

           0       0.80      0.84      0.82      1000
           1       0.92      0.84      0.88      1000
           2       0.76      0.70      0.73      1000
           3       0.69      0.54      0.61      1000
           4       0.78      0.73      0.76      1000
           5       0.61      0.76      0.68      1000
           6       0.82      0.86      0.84      1000
           7       0.86      0.83      0.84      1000
           8       0.82      0.91      0.86      1000
           9       0.84      0.85      0.85      1000

    accuracy                           0.79     10000
   macro avg       0.79      0.79      0.79     10000
weighted avg       0.79      0.79      0.79     10000

Observations:

  • The model gives about 79% accuracy on the test data, which is comparable to its accuracy on the validation data. This implies that the model is giving a generalized performance.
  • Recall varies widely across classes, which implies that the model is good at identifying some objects and poor at identifying others. For example, the model identifies more than 90% of ships but only ~54% of cats (see the quick check after this list).
  • The model is most often confused between cats and dogs. This suggests it may be relying on coarse features such as shape and size rather than the finer features needed to distinguish objects like cats and dogs.
  • Consequently, precision also has a wide range, with the 'dog' class having the lowest precision (many images predicted as dogs are actually cats).
  • The highest precision is for 'automobile', which implies that the model can reliably distinguish automobiles from other objects.
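
Per-class recall can also be read directly off the confusion matrix: the diagonal holds the correctly classified counts and each row sums to the number of actual images per class. A quick check, assuming cm from the cell above:

In [ ]:
# Per-class recall from the confusion matrix (assumes `cm` from the cell above)
per_class_recall = cm.diagonal() / cm.sum(axis = 1)

for name, recall in zip(cifar10_classes, per_class_recall):

    print(f"{name:>10}: {recall:.2f}")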

Visualizing the predicted images¶

In [ ]:
rows = 3

cols = 4

fig = plt.figure(figsize = (10, 12))

for i in range(cols):

    for j in range(rows):

        random_index = np.random.randint(0, len(y_test))

        ax = fig.add_subplot(rows, cols, i * rows + j + 1)

        ax.imshow(x_test[random_index, :])

        pred_label = cifar10_classes[y_pred_test_classes[random_index]]

        pred_proba = y_pred_test_max_probas[random_index]

        true_label = cifar10_classes[y_test[random_index, 0]]
        
        ax.set_title("actual: {}\npredicted: {}\nprobability: {:.3}\n".format(
               true_label, pred_label, pred_proba
        ))
plt.show()

Conclusion¶

In this notebook, we have implemented a CNN model from scratch and used transfer learning to make predictions on the CIFAR-10 dataset. We have learned how to prepare the image data before passing it into the CNN model and how to add layers sequentially inside the model.

We have seen four different iterations of the CNN model and built an intuition about how to improve the model by tuning various hyperparameters and using different techniques. There is still plenty of scope for improvement and you can try out tuning different hyperparameters to improve the model performance.