Face Emotion Recognition — DeepCNN Python

Nischay Gowda
5 min read · Jan 24, 2021

Emotion Recognition using Tensorflow, simple and easily understandable code.

Photo by Tengyart on Unsplash

The most common application of CNNs in computer vision is image processing. Given images as input, RGB or grayscale, we use the underlying pixel data to extract specific information, based on the given labels, and then train a CNN network.

If you are just getting started with computer vision, the simplest problem to work on would be a cat-and-dog classifier, which you can study here.

Problem Statement

The task at hand is to build a face emotion recognition model, in simple terms an emotion classifier, from a given image input. You can find more details of the problem and the dataset here.

Data Overview

The given data consists of 35,887 rows and 3 feature columns:

1. emotion — a numerical value indicating the type of emotion: 0–2 negative, 3–5 positive, and 6 neutral.

2. pixels — a space-separated string of grayscale pixel values making up the image.

3. usage — indicates the data split (training, validation, or test).

emotion labels — ‘anger’, ‘disgust’, ‘fear’, ‘happiness’, ‘sadness’, ‘surprise’, and ‘neutral’.
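
A minimal sketch of loading and inspecting the data, assuming a fer2013-style CSV file (the file name and label mapping here are assumptions):

import pandas as pd

# Load the dataset (file name assumed; adjust to your local path)
df = pd.read_csv('fer2013.csv')
print(df.shape)    # (35887, 3)

# Map numeric labels to emotion names (mapping assumed from the dataset description)
emotion_map = {0: 'anger', 1: 'disgust', 2: 'fear', 3: 'happiness',
               4: 'sadness', 5: 'surprise', 6: 'neutral'}
df['emotion_label'] = df['emotion'].map(emotion_map)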

Preprocessing

We start off with a bit of analysis of the data features. Plotting the counts of each emotion label, we find the following.

emotion_label vs. count

From the given emotion labels, we will select only the 3 highest by count. The majority of emotion datapoints belong to ‘happiness’, followed by ‘neutral’ and ‘sadness’.

Plotting the image data looks something like this.

A plot of pixels column, with unique emotion_labels.

We will therefore only consider the 3 highest emotion labels by count, i.e. ‘happiness’, ‘sadness’, and ‘neutral’:

happiness    8989
neutral      6198
sadness      6077
fear         5121
anger        4953
surprise     4002
disgust       547
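
These counts can be reproduced, and the top 3 classes selected, roughly like this (column names as assumed above):

# Count rows per emotion label and keep the 3 most frequent classes
counts = df['emotion_label'].value_counts()
top3 = counts.nlargest(3).index    # ['happiness', 'neutral', 'sadness']
df3 = df[df['emotion_label'].isin(top3)].reset_index(drop=True)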

From the given pixel data, we can see each image is 48 × 48 pixels: each pixel string contains 2304 values, and √2304 = 48.0.

Since the given pixel data is a 2-D matrix, we convert it to a 3-D matrix to match the CNN input format: we reshape it to 48 × 48 × 1.
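
A sketch of parsing the pixel strings and reshaping them for the CNN (assuming the pixels column holds space-separated grayscale values, as above):

import numpy as np

# Parse each space-separated pixel string into a flat array of 2304 values
X = np.stack([np.array(s.split(), dtype='float32') for s in df3['pixels']])

# 2304 = 48 * 48, so each image is 48 x 48
side = int(np.sqrt(X.shape[1]))    # 48

# Reshape to (num_images, 48, 48, 1) and scale to [0, 1]
X = X.reshape(-1, side, side, 1) / 255.0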

Model Building

The important part of any AI solution is the model underneath, built during training. We could use any pre-trained model; you can find more information about that here.

In our case, we build a new model from scratch; the basic building blocks of any CNN are the same.

1. Convolution layer — convolution applies feature detectors to the input image. A feature detector is also an array of numbers. We slide each feature detector over the image and produce a new array of numbers representing a feature of the image. This operation between an input image and a feature detector, which results in a feature map, is convolution. You can learn more about how a convolution layer works here.

2. Max pooling — max pooling reduces the size of a feature map by sliding a window over it and keeping only the maximum value at each position. Repeating max pooling on each feature map produces a pooling layer. Fundamentally, max pooling reduces the number of nodes in the fully connected layers without losing key features and spatial structure information in the images.

3. Flatten layer — flattening converts the data into a 1-dimensional array for input to the next layer. We flatten the output of the convolutional layers to create a single long feature vector, which is connected to the final classification model, called a fully-connected layer. A minimal sketch of these three blocks follows.
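
As a rough illustration of these three building blocks in Keras (a minimal sketch only, not the exact saved model shown below):

from tensorflow.keras import layers, models

# Minimal CNN skeleton: convolution -> max pooling -> flatten -> dense
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 1)),  # feature detectors
    layers.MaxPooling2D((2, 2)),               # shrink each feature map
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                          # single long feature vector
    layers.Dense(128, activation='relu'),      # fully-connected layer
    layers.Dense(3, activation='softmax'),     # 3 emotion classes
])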

The full network, consisting of 7 layers, would look something like this.

Saved model visualized using Netron

The model consists of 7 layers:

  • ReLU activation at each convolution layer.
  • BatchNormalization — after each layer, so that input features are on the same scale at each layer.
  • Dropout — this layer helps generalize the learning by ignoring certain neurons at random during training; used at each layer.
  • Max pooling — used to reduce the number of features and prevent overfitting.
  • Compiling — since the output is categorical, we use categorical cross-entropy as the loss when compiling the CNN, with accuracy as the metric.
  • Optimizer — the Adam optimizer is used.
  • Softmax — the final layer predicts one of the 3 emotion class labels (‘happiness’, ‘sadness’, and ‘neutral’).
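
Continuing the sketch above, one representative block and the compile step matching these bullet points might look like this (layer sizes are assumptions, not the exact saved model):

from tensorflow.keras import layers

# One block per the bullets: Conv (ReLU) -> BatchNormalization -> MaxPooling -> Dropout
block = [
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.BatchNormalization(),   # keep input features on the same scale
    layers.MaxPooling2D((2, 2)),   # reduce the number of features
    layers.Dropout(0.25),          # ignore random neurons during training
]

# Categorical cross-entropy loss, Adam optimizer, accuracy as the metric
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])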

Accuracy

We trained the model for epochs = 10 and epochs = 100, and obtained better results with epochs = 100.

Epoch = 10
Epoch = 100
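
For reference, the training call behind these runs might look roughly like this (a sketch; X is the reshaped pixel array from earlier, and the label encoding is an assumption):

from tensorflow.keras.utils import to_categorical

# One-hot encode the 3 classes; alphabetical coding gives happiness=0, neutral=1, sadness=2
y = to_categorical(df3['emotion_label'].astype('category').cat.codes, num_classes=3)

# Train; we compared epochs=10 against epochs=100
history = model.fit(X, y, epochs=100, batch_size=64, validation_split=0.2)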

Accuracy is around 73%. It can be improved further by increasing the number of epochs, but keep in mind that this consumes a lot of GPU time. Even with epochs = 10, training took almost 3 hours to complete.

The epoch history shows that accuracy gradually increases, reaching over 83% on both the training and validation sets, but towards the end the model starts overfitting the training data.

Prediction

After training the model, its predictions for the classes happy and sad looked like the ones below.

As we can observe, there are a few misclassified labels. No model will ever be perfect, 100% right. 😂
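
A minimal sketch of producing such predictions (class-name order assumed to match the one-hot encoding above):

import numpy as np

class_names = ['happiness', 'neutral', 'sadness']   # assumed encoding order

# Predict on a few images and map the argmax back to label names
probs = model.predict(X[:10])
preds = [class_names[i] for i in np.argmax(probs, axis=1)]
print(preds)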

Great! If you found this post helpful, feel free to hit those 👏 ‘s! If you need the source code, visit my GitHub page 🤞🤞. You can also connect with me on LinkedIn.
