All about Convolutional Neural Networks (CNN)

Mohamed Bakrey
12 min read · Aug 8, 2022


Introduction

First of all, I want to clarify something: a convolutional neural network (CNN) is a type of artificial neural network used in image recognition and processing. It is specially designed for processing pixel data, and it consistently gives very good results in this area.

What is a convolutional neural network?

A CNN is a network that does an excellent job of processing images and is highly competitive in this segment. It is powered by artificial intelligence (AI) that uses deep learning to perform both generative and descriptive tasks, often in machine vision applications such as image and video recognition, and it is frequently combined with recommendation and natural language processing (NLP) systems.

We can also say that it is a system designed to resemble human neurons and to work on the same idea. One thing becomes clear here: traditional neural networks do not process images very efficiently, and when they do, the images must be fed to them as many small, low-resolution pieces. A CNN instead has its “neurons” arranged more like those of the frontal lobe, the area responsible for processing visual stimuli in humans and other animals. The layers of neurons are arranged so that they cover the entire visual field, avoiding the piecemeal image processing problem of traditional neural networks.

A CNN uses a system much like a multilayer perceptron that has been designed to reduce processing requirements, which are otherwise time-consuming. Hence, we can say that a convolutional neural network consists of an input layer, an output layer, and hidden layers that include multiple convolutional layers, pooling layers, fully connected layers, and normalization layers. Removing these constraints and increasing efficiency results in a system that is simpler and more effective to train for image processing and natural language processing.

How exactly do convolutional neural networks work?

As we know, convolutional neural networks are distinguished from other neural networks by their strong performance with image, text, or audio signal inputs. They consist of three main types of layers:

  • Convolutional layer
  • Pooling layer
  • Fully Connected Layer (FC)

The convolutional layer is the first layer of a convolutional network. While convolutional layers can be followed by additional convolutional layers or pooling layers, the fully connected layer is the final layer. With each layer, the CNN increases in complexity, identifying larger portions of the image. Earlier layers focus on simple features, such as colors and edges. As the image data progresses through the CNN layers, the network begins to recognize larger elements or shapes of the object until it finally identifies the intended object.

Convolutional layer

We can say that this layer is the core of the network: it is the basic building block of the CNN, and it is where most of the computation takes place. It requires a few basic components, namely input data, a filter, and a feature map. Let's say the input is a color image, which is made up of a 3D array of pixels. This means the input has three dimensions (height, width, and depth), where depth corresponds to the RGB channels of the image. We also have a feature detector, also known as a kernel or filter, which moves across the receptive fields of the image, checking whether the feature is present. This process is known as a convolution.

The feature detector is a two-dimensional (2-D) array of weights that represents part of the input image. While filters can vary in size, a 3x3 matrix is typical; this also determines the size of the receptive field. The filter is applied to an area of the image, and the dot product between the input pixels and the filter is calculated. This dot product is then entered into an output matrix. The filter then shifts by a stride and the process repeats until the kernel has swept the entire image. The final output from this series of dot products between the input and the filter is known as the feature map, activation map, or convolved feature.
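
To make the convolution concrete, here is a minimal NumPy sketch (not part of the article's project) that slides a 3x3 filter over a toy single-channel input with stride 1 and no padding; the sizes and random values are purely illustrative.

import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over a 2-D image (stride 1, no padding) and
    compute a dot product at each position to build the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            receptive_field = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(receptive_field * kernel)
    return feature_map

image = np.random.rand(6, 6)            # toy single-channel input
kernel = np.random.rand(3, 3)           # 3x3 filter (weights)
print(convolve2d(image, kernel).shape)  # (4, 4) feature map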

Note that it is not necessary to connect every output value in the feature map to every pixel value in the input image. Each output value only needs to connect to the receptive field where the filter is applied. Since the output array does not need to map directly onto every input value, convolutional (and pooling) layers are commonly referred to as "partially connected" layers. This property is also described as local connectivity.

The weights in the feature detector remain fixed as it moves across the image, which is known as parameter sharing. Some parameters, such as the weight values, are adjusted during training through backpropagation and gradient descent. However, there are three hyperparameters that affect the size of the output and must be set before the neural network is trained. We list them below; a short shape-checking sketch follows the list:

1. The number of filters affects the depth of the output. For example, three distinct filters would yield three different feature maps, creating a depth of three.

2. Stride is the distance, or number of pixels, that the kernel moves over the input matrix. While stride values of two or greater are rare, a larger stride yields a smaller output.

3. Zero-padding is usually used when the filters do not fit the input image. This sets all elements that fall outside of the input matrix to zero, producing a larger or equally sized output. There are three types of padding:

  • Valid padding: This is also known as no padding. In this case, the last convolution is dropped if the dimensions do not align.
  • Same padding: This padding ensures that the output layer has the same size as the input layer.
  • Full padding: This type of padding increases the size of the output by adding zeros to the border of the input.
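
As a quick illustration of how the number of filters, stride, and padding affect the output shape, here is a small Keras sketch; the 28x28 input and the filter counts are arbitrary choices for demonstration, not values from the article.

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 28, 28, 3))  # one 28x28 RGB image

# "same" padding keeps the spatial size when the stride is 1.
same = layers.Conv2D(8, 3, strides=1, padding="same")(x)
# "valid" (no) padding drops border positions where the filter does not fit.
valid = layers.Conv2D(8, 3, strides=1, padding="valid")(x)
# A stride of 2 roughly halves the spatial dimensions.
strided = layers.Conv2D(8, 3, strides=2, padding="same")(x)

print(same.shape)     # (1, 28, 28, 8)
print(valid.shape)    # (1, 26, 26, 8)
print(strided.shape)  # (1, 14, 14, 8)

Note that using 8 filters gives an output depth of 8, matching point 1 above.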

After each convolution, the CNN applies a Rectified Linear Unit (ReLU) transformation to the feature map, introducing nonlinearity into the model.
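
As a tiny illustration (the values are chosen arbitrarily), ReLU simply replaces negative activations with zero while leaving positive ones unchanged:

import tensorflow as tf

feature_map = tf.constant([[-2.0, 0.5], [3.0, -1.0]])
# ReLU keeps positive activations and zeroes out the negative ones,
# which is what introduces nonlinearity after each convolution.
print(tf.nn.relu(feature_map).numpy())  # [[0.  0.5] [3.  0. ]]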

As we've illustrated, another convolutional layer can follow the initial convolutional layer. When this happens, the CNN structure becomes hierarchical, as later layers can see the pixels within the receptive fields of previous layers. For example, consider a picture of a car. The car consists of several parts, such as the wheels and other components; we can break those parts down separately and arrange them in a hierarchical form, and this is how the convolutional network builds up its representation.

Pooling layer

This layer performs pooling, also known as downsampling: it reduces the dimensionality of the input, which reduces the number of parameters. Similar to the convolutional layer, the pooling operation sweeps a filter across the entire input, but the difference is that this filter does not have any weights. Instead, the kernel applies an aggregation function to the values inside the receptive field to populate the output array. There are two main types of pooling:

  • Max pooling: As the filter moves across the input, it selects the pixel with the maximum value to send to the output array. As an aside, this approach tends to be used more often than average pooling.
  • Average pooling: As the filter moves across the input, it calculates the average value within the receptive field to send to the output array.

While a lot of information is lost in the pooling layer, it also has a number of benefits for the CNN. It helps reduce complexity, improve efficiency, and limit the risk of overfitting.
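
A minimal Keras sketch of pooling, assuming an arbitrary batch of 28x28 feature maps with 8 channels, shows how a 2x2 window halves the spatial dimensions while leaving the depth unchanged:

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 28, 28, 8))  # a batch of 28x28 feature maps

# A 2x2 max-pooling window (stride 2 by default) keeps only the largest
# value in each window, halving height and width but not the depth.
pooled = layers.MaxPooling2D(pool_size=2)(x)
print(pooled.shape)  # (1, 14, 14, 8)

# Average pooling works the same way but averages each window instead.
avg = layers.AveragePooling2D(pool_size=2)(x)
print(avg.shape)     # (1, 14, 14, 8)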

Fully-Connected layer

The name of the fully connected layer aptly describes itself. As mentioned earlier, the pixel values of the input image are not directly connected to the output layer in partially connected layers. However, in the fully connected layer, each node in the output layer is directly connected to every node in the previous layer.

This layer performs classification based on the features extracted by the previous layers and their various filters. While convolutional and pooling layers tend to use ReLU functions, FC layers typically use the softmax activation function to classify the input appropriately, producing a probability between 0 and 1.
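
Putting the three layer types together, here is a minimal sketch of a small classifier for a hypothetical 10-class problem; the input size and layer widths are illustrative choices, not taken from the article.

import tensorflow as tf
from tensorflow.keras import layers

# Convolution -> pooling -> convolution -> pooling -> fully connected softmax.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),  # class probabilities in [0, 1]
])
model.summary()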

Types of convolutional neural networks

The following architectures are among the most famous and widely used CNN variants; they are accurate in every sense of the word and are some of the most important derivatives of the convolutional neural network:

  • AlexNet
  • VggNet
  • GoogLeNet
  • ResNet

Implementation

After covering the definitions, which are important for knowing the terms we need, we of course have to move on to implementation. This part is one of the most important parts of this article, as it will clarify two things: building a convolutional neural network using the Sequential API and using the Functional API, and we will explain both and the difference between them.

Using the Sequential API

What is the Sequential model?

The Sequential model is the basis for a linear stack of Keras layers. It is one of the simplest models to work with, and it can represent nearly all straightforward neural network architectures.

We can write an easy model to illustrate:

# Import the Sequential model and layers from Keras
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(512, activation = 'relu', input_shape = (784,)))

Let’s look:

  • Line 2 imports the Sequential model from Keras models
  • Line 3 imports the Dense layer and the Activation module
  • Line 4 creates a new Sequential model
  • Line 5 adds a Dense layer with a ReLU activation function and an input shape of 784

Keras also exposes the Model class for creating customized models. We can use the sub-classing concept to build our own complex models, as in the sketch below.
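
A minimal sketch of that sub-classing approach, assuming a hypothetical two-layer model roughly equivalent to the Sequential example above:

import tensorflow as tf
from tensorflow.keras import layers

class SimpleMLP(tf.keras.Model):
    """A hypothetical subclassed model: a two-layer Dense stack."""

    def __init__(self):
        super().__init__()
        self.hidden = layers.Dense(512, activation="relu")
        self.out = layers.Dense(10, activation="softmax")

    def call(self, inputs):
        x = self.hidden(inputs)
        return self.out(x)

model = SimpleMLP()
y = model(tf.ones((1, 784)))  # the model is built on its first call
print(y.shape)                # (1, 10)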

Layer

Each layer in Keras represents the corresponding layer (input layer, hidden layer, or output layer) in the actual neural network model being built. Keras provides plenty of pre-built layers, so any complex neural network can be created easily. The most important layer groups are:

  • Core Layers
  • Convolution Layers
  • Pooling Layers
  • Recurrent Layers

When can we use the Sequential model?

The Sequential model is suitable for a plain stack of layers, where each layer has exactly one input tensor and one output tensor.

Schematically, a Sequential model looks like this:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Define a Sequential model with 3 layers
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu", name="layer1"),
        layers.Dense(3, activation="relu", name="layer2"),
        layers.Dense(4, name="layer3"),
    ]
)
# Call the model on a test input
x = tf.ones((3, 3))
y = model(x)

################################################
# You can also build the same thing another way:
# Create 3 layers
layer1 = layers.Dense(2, activation="relu", name="layer1")
layer2 = layers.Dense(3, activation="relu", name="layer2")
layer3 = layers.Dense(4, name="layer3")
# Call the layers on a test input
x = tf.ones((3, 3))
y = layer3(layer2(layer1(x)))

We cannot use the Sequential model in the following cases (see the Functional API sketch after this list):

  • Your model has multiple inputs or multiple outputs
  • Any of your layers has multiple inputs or multiple outputs
  • You need to do layer sharing
  • You want a non-linear topology (e.g. a residual connection, a multi-branch model)
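
For those cases the Functional API is used instead. Here is a minimal sketch (not part of the article's project) with a residual connection, one of the non-linear topologies a Sequential model cannot express; the layer sizes are arbitrary.

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
block = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
block = layers.Conv2D(32, 3, padding="same")(block)
x = layers.add([x, block])  # residual (skip) connection
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.summary()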

Now let's explain the CNN with an implementation, working on some images.

Set up the libraries:

# Import TensorFlow, since most of the functions we need are called from it.
import tensorflow as tf
import os
import math
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import array_to_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing import image_dataset_from_directory
from IPython.display import display

Load data: BSDS500 dataset

dataset_url = "http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/BSR/BSR_bsds500.tgz"
data_dir = keras.utils.get_file(origin=dataset_url, fname="BSR", untar=True)
root_dir = os.path.join(data_dir, "BSDS500/data")

We create training and validation datasets via image_dataset_from_directory.

crop_size = 300
upscale_factor = 3
input_size = crop_size // upscale_factor
batch_size = 8

# Create the training dataset
train_ds = image_dataset_from_directory(
    root_dir,
    batch_size=batch_size,
    image_size=(crop_size, crop_size),
    validation_split=0.2,
    subset="training",
    seed=1337,
    label_mode=None,
)
# Create the validation dataset
valid_ds = image_dataset_from_directory(
    root_dir,
    batch_size=batch_size,
    image_size=(crop_size, crop_size),
    validation_split=0.2,
    subset="validation",
    seed=1337,
    label_mode=None,
)

We rescale the images to take values in the range [0, 1].

def scaling(input_image):
    input_image = input_image / 255.0
    return input_image

# Scale from (0, 255) to (0, 1)
train_ds = train_ds.map(scaling)
valid_ds = valid_ds.map(scaling)

We'll prepare a list of test image paths that we'll use for visual evaluation at the end of this example.

dataset = os.path.join(root_dir, "images")
test_path = os.path.join(dataset, "test")
test_img_paths = sorted(
    [
        os.path.join(test_path, fname)
        for fname in os.listdir(test_path)
        if fname.endswith(".jpg")
    ]
)

Work on cropping and resizing images:

Let’s process the image data. First, we convert our images from RGB color space to YUV color space.

# Use TF ops to process the images
def process_input(input, input_size, upscale_factor):
    # Convert to YUV, keep only the luminance (y) channel, and downscale it to the input size.
    input = tf.image.rgb_to_yuv(input)
    last_dimension_axis = len(input.shape) - 1
    y, u, v = tf.split(input, 3, axis=last_dimension_axis)
    return tf.image.resize(y, [input_size, input_size], method="area")


def process_target(input):
    # The target is the full-resolution luminance channel.
    input = tf.image.rgb_to_yuv(input)
    last_dimension_axis = len(input.shape) - 1
    y, u, v = tf.split(input, 3, axis=last_dimension_axis)
    return y


train_ds = train_ds.map(
    lambda x: (process_input(x, input_size, upscale_factor), process_target(x))
)
train_ds = train_ds.prefetch(buffer_size=32)

valid_ds = valid_ds.map(
    lambda x: (process_input(x, input_size, upscale_factor), process_target(x))
)
valid_ds = valid_ds.prefetch(buffer_size=32)

Build a model

def get_model(upscale_factor=3, channels=1):
    conv_args = {
        "activation": "relu",
        "kernel_initializer": "Orthogonal",
        "padding": "same",
    }
    inputs = keras.Input(shape=(None, None, channels))
    x = layers.Conv2D(64, 5, **conv_args)(inputs)
    x = layers.Conv2D(64, 3, **conv_args)(x)
    x = layers.Conv2D(32, 3, **conv_args)(x)
    x = layers.Conv2D(channels * (upscale_factor ** 2), 3, **conv_args)(x)
    outputs = tf.nn.depth_to_space(x, upscale_factor)
    return keras.Model(inputs, outputs)
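
The last Conv2D layer outputs channels * upscale_factor ** 2 feature maps, and tf.nn.depth_to_space rearranges those channels into extra spatial resolution (the sub-pixel upscaling used here). A quick shape check, assuming the default upscale_factor of 3 and a single output channel (the 100x100 size is arbitrary):

import tensorflow as tf

x = tf.random.normal((1, 100, 100, 9))    # 1 channel * 3**2 depth
y = tf.nn.depth_to_space(x, block_size=3)  # channels -> 3x spatial upscale
print(y.shape)                             # (1, 300, 300, 1)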

Define utility functions

We need to define several utility functions to monitor our results:

  • plot_results: plots and saves an image with a zoomed-in inset.
  • get_lowres_image: converts an image to its low-resolution version.
  • upscale_image: converts a low-resolution image into the high-resolution version reconstructed by the model.

import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes
from mpl_toolkits.axes_grid1.inset_locator import mark_inset
import PIL


def plot_results(img, prefix, title):
    """Plot the result with a zoomed-in area."""
    img_array = img_to_array(img)
    img_array = img_array.astype("float32") / 255.0

    # Create a new figure with a default 111 subplot.
    fig, ax = plt.subplots()
    im = ax.imshow(img_array[::-1], origin="lower")

    plt.title(title)
    # zoom-factor: 2.0, location: upper-left
    axins = zoomed_inset_axes(ax, 2, loc=2)
    axins.imshow(img_array[::-1], origin="lower")

    # Specify the limits.
    x1, x2, y1, y2 = 200, 300, 100, 200
    # Apply the x-limits.
    axins.set_xlim(x1, x2)
    # Apply the y-limits.
    axins.set_ylim(y1, y2)

    plt.yticks(visible=False)
    plt.xticks(visible=False)

    # Draw the connector lines for the inset.
    mark_inset(ax, axins, loc1=1, loc2=3, fc="none", ec="blue")
    plt.savefig(str(prefix) + "-" + title + ".png")
    plt.show()


def get_lowres_image(img, upscale_factor):
    """Return a low-resolution image to use as model input."""
    return img.resize(
        (img.size[0] // upscale_factor, img.size[1] // upscale_factor),
        PIL.Image.BICUBIC,
    )


def upscale_image(model, img):
    """Predict the result based on the input image and restore the image as RGB."""
    ycbcr = img.convert("YCbCr")
    y, cb, cr = ycbcr.split()
    y = img_to_array(y)
    y = y.astype("float32") / 255.0

    input = np.expand_dims(y, axis=0)
    out = model.predict(input)

    out_img_y = out[0]
    out_img_y *= 255.0

    # Restore the image in RGB color space.
    out_img_y = out_img_y.clip(0, 255)
    out_img_y = out_img_y.reshape((np.shape(out_img_y)[0], np.shape(out_img_y)[1]))
    out_img_y = PIL.Image.fromarray(np.uint8(out_img_y), mode="L")
    out_img_cb = cb.resize(out_img_y.size, PIL.Image.BICUBIC)
    out_img_cr = cr.resize(out_img_y.size, PIL.Image.BICUBIC)
    out_img = PIL.Image.merge("YCbCr", (out_img_y, out_img_cb, out_img_cr)).convert(
        "RGB"
    )
    return out_img

Define callbacks to monitor training

The ESPCNCallback object will compute and display the PSNR metric. This is the main metric we use to evaluate super-resolution performance.
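
For reference, PSNR is derived from the mean squared error. The rough sketch below shows the formula the callback applies; since the images are scaled to [0, 1], the peak value is taken as 1.

import math

def psnr(mse, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    return 10 * math.log10((max_val ** 2) / mse)

print(psnr(0.001))  # a validation MSE of 0.001 corresponds to 30 dB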

class ESPCNCallback(keras.callbacks.Callback):
    def __init__(self):
        super(ESPCNCallback, self).__init__()
        self.test_img = get_lowres_image(load_img(test_img_paths[0]), upscale_factor)

    # Store PSNR values for each epoch.
    def on_epoch_begin(self, epoch, logs=None):
        self.psnr = []

    def on_epoch_end(self, epoch, logs=None):
        print("Mean PSNR for epoch: %.2f" % (np.mean(self.psnr)))
        if epoch % 20 == 0:
            prediction = upscale_image(self.model, self.test_img)
            plot_results(prediction, "epoch-" + str(epoch), "prediction")

    def on_test_batch_end(self, batch, logs=None):
        self.psnr.append(10 * math.log10(1 / logs["loss"]))


# Define the ModelCheckpoint and EarlyStopping callbacks.
early_stopping_callback = keras.callbacks.EarlyStopping(monitor="loss", patience=10)

checkpoint_filepath = "/tmp/checkpoint"

model_checkpoint_callback = keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=True,
    monitor="loss",
    mode="min",
    save_best_only=True,
)

model = get_model(upscale_factor=upscale_factor, channels=1)
model.summary()

callbacks = [ESPCNCallback(), early_stopping_callback, model_checkpoint_callback]
loss_fn = keras.losses.MeanSquaredError()
optimizer = keras.optimizers.Adam(learning_rate=0.001)

Training the Model:

epochs = 100

model.compile(
    optimizer=optimizer, loss=loss_fn,
)

model.fit(
    train_ds, epochs=epochs, callbacks=callbacks, validation_data=valid_ds, verbose=2
)

# The model weights (that are considered the best) are loaded into the model.
model.load_weights(checkpoint_filepath)

Run predictions on the test set and compare the results:

total_bicubic_psnr = 0.0
total_test_psnr = 0.0

for index, test_img_path in enumerate(test_img_paths[50:60]):
    img = load_img(test_img_path)
    lowres_input = get_lowres_image(img, upscale_factor)
    w = lowres_input.size[0] * upscale_factor
    h = lowres_input.size[1] * upscale_factor
    highres_img = img.resize((w, h))
    prediction = upscale_image(model, lowres_input)
    lowres_img = lowres_input.resize((w, h))
    lowres_img_arr = img_to_array(lowres_img)
    highres_img_arr = img_to_array(highres_img)
    predict_img_arr = img_to_array(prediction)
    bicubic_psnr = tf.image.psnr(lowres_img_arr, highres_img_arr, max_val=255)
    test_psnr = tf.image.psnr(predict_img_arr, highres_img_arr, max_val=255)

    total_bicubic_psnr += bicubic_psnr
    total_test_psnr += test_psnr

    print(
        "PSNR of low resolution image and high resolution image is %.4f" % bicubic_psnr
    )
    print("PSNR of predict and high resolution is %.4f" % test_psnr)
    plot_results(lowres_img, index, "lowres")
    plot_results(highres_img, index, "highres")
    plot_results(prediction, index, "prediction")

print("Avg. PSNR of lowres images is %.4f" % (total_bicubic_psnr / 10))
print("Avg. PSNR of reconstructions is %.4f" % (total_test_psnr / 10))

Here, we can say that we have covered an important part of the convolutional neural network: we have gone through the theoretical definitions and then put them to work on real data. The implementation example is adapted from the Keras documentation, a well-known resource in this field.

Mohamed B Mahmoud. Data Scientist.
