Convolutional Neural Networks for Image Classification

For a developer, building systems that can make decisions on their own is one of the most exciting parts of working with machine learning (ML). In this post, I’ll look at one of the most popular families of ML models, Convolutional Neural Networks (CNNs), and walk through code examples to help you implement them in your own projects.

CNNs are a class of deep learning models commonly used for image classification and object recognition. The basic idea is to use a series of convolutional and pooling layers to extract features from the input image, and then one or more fully connected layers to make the final prediction. This architecture lets CNNs learn hierarchies of features: early layers pick up simple patterns such as edges, while deeper layers respond to more abstract shapes.
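To make the convolution and pooling steps concrete, here is a minimal NumPy sketch of what a single filter does. The function names, the toy 6x6 image, and the hand-picked edge kernel are all illustrative assumptions; a real CNN learns its kernels during training and uses optimized library routines rather than Python loops.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel (an assumption for illustration only).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

features = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU activation
pooled = max_pool2d(features)

print(features.shape)  # (4, 4)
print(pooled.shape)    # (2, 2)
```

The feature map is largest where the window straddles the dark-to-bright edge, and pooling keeps that response while shrinking the map.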

To get started, you’ll need the Keras library, a high-level API for building neural networks. Keras runs on top of a backend such as TensorFlow, so install both with the following command:

pip install keras tensorflow

Once you have the library installed, you can import it and start building your model. Here’s an example of how you can use CNNs to classify images of handwritten digits:

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.utils import to_categorical

(X_train, y_train), (X_test, y_test) = mnist.load_data()

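# add a channel axis and scale pixel values from [0, 255] to [0, 1]
X_train = X_train.reshape(60000, 28, 28, 1).astype('float32') / 255
X_test = X_test.reshape(10000, 28, 28, 1).astype('float32') / 255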

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=5, batch_size=32)

# evaluate returns the loss followed by each compiled metric
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy:", accuracy)

In this example, we first load the MNIST dataset of handwritten digits, which ships with Keras. Next, we reshape the data to (60000, 28, 28, 1) for the training set and (10000, 28, 28, 1) for the test set, adding the single channel dimension that Conv2D expects, and convert the labels to categorical (one-hot) format.
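The reshaping and label conversion can be reproduced in plain NumPy, which may help show what the data looks like afterwards. This is a sketch on random stand-in data, not the real MNIST arrays; the variable names and the four made-up labels are assumptions. It also fixes the number of classes at 10, whereas to_categorical infers it from the largest label when none is given.

```python
import numpy as np

# Stand-in for a tiny batch of MNIST-like data (random pixels, made-up labels).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3])

# Add a trailing channel axis: (4, 28, 28) -> (4, 28, 28, 1).
images_4d = images.reshape(images.shape[0], 28, 28, 1)

# One-hot encode: row i has a 1 in column labels[i] and 0 elsewhere.
one_hot = np.eye(10, dtype=np.float32)[labels]

print(images_4d.shape)  # (4, 28, 28, 1)
print(one_hot.shape)    # (4, 10)
print(one_hot[0])       # 1.0 at index 3, 0.0 elsewhere
```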

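We then define the architecture of our CNN using the Sequential API from Keras. We start with a convolutional layer with 32 filters, a kernel size of (3, 3), and a ReLU activation function, followed by a max-pooling layer to reduce the spatial dimensions of the feature maps. The result is flattened and passed to a dense layer with 10 units and a softmax activation, one output per digit. Finally, we compile the model with the Adam optimizer and categorical cross-entropy loss, train it for 5 epochs, and print its accuracy on the test set.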
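It can help to trace how the tensor shape changes through the layers of the MNIST model above. The arithmetic below is a sketch that assumes the Keras defaults (no padding and a stride of 1 for Conv2D, and a pooling stride equal to the pool size); the helper name conv_output_size is made up for illustration. You can compare the totals against model.summary().

```python
def conv_output_size(size, kernel, stride=1):
    """Output length along one axis for a convolution with no padding."""
    return (size - kernel) // stride + 1

# Conv2D(32, kernel_size=(3, 3)) on a 28x28x1 input
conv_side = conv_output_size(28, 3)        # 26
conv_params = (3 * 3 * 1) * 32 + 32        # weights + biases = 320

# MaxPooling2D(pool_size=(2, 2)) halves each spatial dimension
pool_side = conv_side // 2                 # 13

# Flatten() turns the 13 x 13 x 32 feature maps into one vector
flat_units = pool_side * pool_side * 32    # 5408

# Dense(10) connects every flattened unit to each of the 10 classes
dense_params = flat_units * 10 + 10        # 54090

total_params = conv_params + dense_params  # 54410
print(conv_side, pool_side, flat_units, total_params)
```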

Let’s now take a look at a more practical example. Imagine you are working on a project where you want to predict whether an image contains a dog or a cat, based on its pixels. Here is an example of how you can classify such an image with “VGG16”, a pre-trained CNN that ships with Keras:

from keras.applications import VGG16
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions

# load the model
model = VGG16()

# load an image from file
image = load_img('dog.jpg', target_size=(224, 224))

# convert the image pixels to a numpy array
image = img_to_array(image)

# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

# prepare the image for the VGG model
image = preprocess_input(image)

# predict the probability across all output classes
yhat = model.predict(image)

# convert the probabilities to class labels
label = decode_predictions(yhat)

# retrieve the most likely result, i.e. the one with the highest probability
label = label[0][0]

# print the classification
print('%s (%.2f%%)' % (label[1], label[2]*100))

In this example, we first load the pre-trained VGG16 model from Keras. Next, we load an image of a dog, resized to 224x224 pixels, and convert it to a NumPy array. We then reshape the array to (1, 224, 224, 3), the input shape VGG16 expects (a batch containing one RGB image), and apply preprocess_input so the pixel values match what the model saw during training. Finally, we call the predict method on the model, use decode_predictions to convert the probabilities to class labels, and retrieve the most likely result. Note that VGG16 was trained on ImageNet, so the label it returns is one of 1,000 ImageNet classes (for example, a specific dog breed) rather than a plain “dog” or “cat”.
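Since the model outputs 1,000 ImageNet classes, one way to get a plain cat-or-dog answer is to sum the probabilities of the relevant classes. The sketch below assumes the usual ImageNet-1k ordering, where indices 151 to 268 are dog breeds and 281 to 285 are domestic cats; check these ranges against the class index file before relying on them. The function name cat_or_dog and the made-up probability vector are illustrative, standing in for model.predict(image)[0].

```python
import numpy as np

# Assumed ImageNet-1k index ranges (verify against the class index file).
DOG_CLASSES = list(range(151, 269))  # dog breeds
CAT_CLASSES = list(range(281, 286))  # tabby, tiger cat, Persian, Siamese, Egyptian cat

def cat_or_dog(probabilities):
    """Collapse a (1000,) probability vector into a dog/cat/neither verdict."""
    dog_score = float(probabilities[DOG_CLASSES].sum())
    cat_score = float(probabilities[CAT_CLASSES].sum())
    scores = {
        "dog": dog_score,
        "cat": cat_score,
        "neither": 1.0 - dog_score - cat_score,
    }
    return max(scores, key=scores.get), scores

# Made-up probabilities standing in for model.predict(image)[0].
fake = np.zeros(1000)
fake[207] = 0.6  # a dog class
fake[208] = 0.2  # another dog class
fake[281] = 0.1  # a cat class
fake[0] = 0.1    # an unrelated class

verdict, scores = cat_or_dog(fake)
print(verdict, scores)
```

With the real model, you would pass yhat[0] from the example above instead of the made-up vector.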

Using a pre-trained model like VGG16 can save a lot of training time and compute while still giving good results. However, the model was trained on the ImageNet dataset, and it may not work well on images that are very different from the ones it was trained on. You can also adapt the model to your specific problem by removing its final classification layers, adding new ones, and fine-tuning it on your own images.
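As a rough sketch of that last point, the snippet below reuses the VGG16 convolutional layers and adds a small new head for a two-class cat/dog problem. The function name build_cat_dog_classifier, the 64-unit hidden layer, and the single sigmoid output are assumptions chosen for illustration, not a recommended configuration.

```python
import keras
from keras import layers
from keras.applications import VGG16

def build_cat_dog_classifier(weights="imagenet"):
    """VGG16 feature extractor with a new binary head (hypothetical helper)."""
    # include_top=False drops the original 1,000-class classifier layers.
    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # freeze the pre-trained convolutional layers

    inputs = keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # e.g. 0 = cat, 1 = dog

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Calling build_cat_dog_classifier() downloads the ImageNet weights on first use. You would then train the new head with model.fit on your own labeled cat and dog images, preprocessed with preprocess_input as before.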