### Building Emotional State Predictor using Deep Learning

#### July 9, 2017 by Sethu Iyer

Tags: Deep Learning, Transfer Learning, Intermediate

### Introduction

In this post, we will implement a neural network model that recognizes various emotions from facial expressions. The reader is assumed to be familiar with the basics of Convolutional Neural Networks.

### 1. Problem Statement

Given an image containing a single face, the aim is to build a predictive model that determines the emotion of the person from their facial expression. The emotions identified belong to one of six classes: Angry, Sad, Happy, Scared, Shocked and Disgusted.

Assumptions: The image contains only one face, is grayscale, and is of size (224, 224).

### 2. Data and Preprocessing

The dataset is collected using the Bulk Bing Image Downloader. Make sure to download at least 50 images per class for good predictions. For this blog post, I am using 59 images per class. The number of images per class should be equal to avoid the class imbalance problem.
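To confirm the classes stay balanced, a quick count per folder helps. This is just a sketch: the `*_face` folder names match the ones used by the glob calls later in this post.

```python
import glob

# Count images in each training class folder; every count should be equal (59 here).
classes = ['angry_face', 'disgust_face', 'happy_face', 'sad_face', 'scared_face', 'shocked_face']
counts = {c: len(glob.glob('./data/train/%s/*' % c)) for c in classes}
print(counts)
```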

For better performance, you should try raising the number of training examples per class. The data directory structure should be as follows:

data/
    train/
        class_01/
        class_02/
        class_03/
        ...
    validation/
        class_01/
        class_02/
        class_03/
        ...


where class_01, class_02, etc. are class labels. Make sure to remove the .pickle file from each class folder.
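A small snippet can do that cleanup, assuming the data/ layout shown above:

```python
import glob
import os

# Delete any .pickle file inside the train/ and validation/ class folders.
for pickle_path in glob.glob('./data/*/*/*.pickle'):
    os.remove(pickle_path)
```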

If an image contains multiple faces, use the script below to detect the faces and auto-crop them. This makes the classifier more accurate, since it will not be distracted by features other than facial expressions. We use dlib and OpenCV to do this.

# Detect faces, auto-crop and save them.
import dlib
import cv2

face_detector = dlib.get_frontal_face_detector()  # uses histogram of oriented gradients

def detect_and_auto_crop_faces(img_path):
    idx = img_path.rindex('.')
    main_path = img_path[:idx]
    ext = img_path[idx + 1:]
    image = cv2.imread(img_path)
    detected_faces = face_detector(image, 1)
    for i, d in enumerate(detected_faces):
        crop_img = image[d.top():d.bottom(), d.left():d.right()]
        resized_image = cv2.resize(crop_img, (224, 224))
        if i == 0:
            crop_filename = main_path + '.' + ext  # first face overwrites the original file
        else:
            crop_filename = main_path + str(i) + '.' + ext
        cv2.imwrite(crop_filename, resized_image)


To select all the images, we use the glob module.

# 3. Import the custom dataset, preprocess it and compute the top 4096 features.
import glob
train_angry = glob.glob('./data/train/angry_face/*')
train_disgust = glob.glob('./data/train/disgust_face/*')
train_happy = glob.glob('./data/train/happy_face/*')
train_sad = glob.glob('./data/train/sad_face/*')
train_scared = glob.glob('./data/train/scared_face/*')
train_shocked = glob.glob('./data/train/shocked_face/*')

total_train = len(train_angry) + len(train_disgust) + len(train_happy) + len(train_sad) + len(train_scared) + len(train_shocked)

val_angry = glob.glob('./data/validation/angry_face/*')
val_disgust = glob.glob('./data/validation/disgust_face/*')
val_happy = glob.glob('./data/validation/happy_face/*')
val_sad = glob.glob('./data/validation/sad_face/*')
val_scared = glob.glob('./data/validation/scared_face/*')
val_shocked = glob.glob('./data/validation/shocked_face/*')

total_val = len(val_happy) + len(val_sad) + len(val_angry) + len(val_disgust) + len(val_scared) + len(val_shocked)


The first step of preprocessing is to convert the images to grayscale. To do that, we use the following code.

# Step 4: Convert the images to grayscale and save them as .jpg.
def convert_to_grayscale(img_path):
    img_path = img_path.replace('\\', '/')
    image = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    dot = img_path.rindex('.')
    img_path = img_path[:dot] + '.jpg'
    cv2.imwrite(img_path, image)

total_list = (train_angry + train_disgust + train_happy + train_sad + train_scared + train_shocked +
              val_angry + val_disgust + val_happy + val_sad + val_scared + val_shocked)
for img_path in total_list:
    convert_to_grayscale(img_path)


The second step of preprocessing is image encoding: the VGG16 network encodes each (224, 224) image into a (4096, 1) feature vector. To implement the encoding, we use Keras and NumPy.

from keras.applications import VGG16, imagenet_utils
from keras.preprocessing.image import load_img, img_to_array
from keras.models import Model
import numpy as np

preprocess = imagenet_utils.preprocess_input
base_model = VGG16(weights="imagenet", include_top=True)
# Take the output of the second fully connected layer to get a 4096-dimensional encoding.
feature_model = Model(inputs=base_model.input, outputs=base_model.get_layer('fc2').output)

def convert_img_to_vector(img_path):
    image = load_img(img_path, target_size=(224, 224))
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)
    return preprocess(image)

def get_image_feature(img_path):
    return feature_model.predict(convert_img_to_vector(img_path))


get_image_feature is the main function: it takes an image path and returns the encoding of the image.

After retrieving the filenames using glob module, we call get_image_feature and save the encoded training examples as .npy files.

feats_train_angry = np.array([get_image_feature(filename) for filename in train_angry])
feats_train_disgust = np.array([get_image_feature(filename) for filename in train_disgust])
feats_train_happy = np.array([get_image_feature(filename) for filename in train_happy])
feats_train_sad = np.array([get_image_feature(filename) for filename in train_sad])
feats_train_scared = np.array([get_image_feature(filename) for filename in train_scared])
feats_train_shocked = np.array([get_image_feature(filename) for filename in train_shocked])
feats_train = np.concatenate([feats_train_angry, feats_train_disgust, feats_train_happy,
                              feats_train_sad, feats_train_scared, feats_train_shocked]).reshape(-1, 4096)
np.save('feats_train.npy', feats_train)


We do the same with validation images.

### 3. Understanding the Problem Statement

The problem of emotion detection is a multi-class classification problem, so one-hot encoding of the output labels is necessary, as softmax with categorical cross-entropy will be used. We could use np.eye for the one-hot encoding, but as this is an explanatory post, I am writing the encodings explicitly.
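For completeness, the np.eye version mentioned above looks like this:

```python
import numpy as np

# Row i of the 6x6 identity matrix is the one-hot encoding of class i
# (0: Angry, 1: Disgusted, 2: Happy, 3: Sad, 4: Scared, 5: Shocked).
encodings = np.eye(6, dtype=int)
print(encodings[2].tolist())  # the "Happy" class -> [0, 0, 1, 0, 0, 0]
```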

# First, let's one-hot encode the labels.
angry_encoding = [1, 0, 0, 0, 0, 0]
disgust_encoding = [0, 1, 0, 0, 0, 0]
happy_encoding = [0, 0, 1, 0, 0, 0]
sad_encoding = [0, 0, 0, 1, 0, 0]
scared_encoding = [0, 0, 0, 0, 1, 0]
shocked_encoding = [0, 0, 0, 0, 0, 1]


Now, the training labels can be defined using this encoding. Each encoding is a vector of length 6.

train_labels = np.array([angry_encoding * 59 +
disgust_encoding * 59 +
happy_encoding * 59 +
sad_encoding * 59 +
scared_encoding * 59 +
shocked_encoding * 59]).reshape(-1,6)
# 59 examples of each class


In a similar way, the validation labels are defined.
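A sketch of those validation labels, assuming two validation images per class (12 samples in total, which matches the training log below):

```python
import numpy as np

# One-hot rows in the same class order as the training labels
# (angry, disgust, happy, sad, scared, shocked).
eye = np.eye(6, dtype=int)
validation_labels = np.repeat(eye, 2, axis=0)  # 2 images per class, grouped by class
print(validation_labels.shape)  # (12, 6)
```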

### 4. Designing the network

Now that input preprocessing is done, we are ready to design the network. The problem at hand is a multi-class classification problem, so softmax with categorical cross-entropy is used at the readout layer. To prevent overfitting, a Dropout layer can be used. The network itself is simply a single fully connected softmax layer on top of the 4096-dimensional VGG16 features.

The network is implemented using Keras. RMSprop is used as the optimizer, although Adam is also a common choice.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

model = Sequential()
model.add(Dense(6, activation="softmax", kernel_initializer="normal", input_dim=4096))  # 6 classes
opt = RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])


### 5. Training the network

59 training images per class is very little data and can easily lead to overfitting. Hence, early stopping is used, monitoring the validation loss.

import keras

earlyStopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=1, verbose=0, mode='auto')
model.fit(feats_train,
          train_labels,
          epochs=15,
          batch_size=16,
          validation_data=(feats_val, validation_labels),
          verbose=1,
          callbacks=[earlyStopping])
print('Training Completed!')


On running this code, we get the following output

Train on 354 samples, validate on 12 samples
Epoch 1/15
354/354 [==============================] - 0s - loss: 1.7924 - acc: 0.1554 - val_loss: 1.7765 - val_acc: 0.2500
Epoch 2/15
354/354 [==============================] - 0s - loss: 1.7662 - acc: 0.2542 - val_loss: 1.7701 - val_acc: 0.1667
Epoch 3/15
354/354 [==============================] - 0s - loss: 1.7437 - acc: 0.3107 - val_loss: 1.7566 - val_acc: 0.2500
Epoch 4/15
354/354 [==============================] - 0s - loss: 1.7186 - acc: 0.3672 - val_loss: 1.7522 - val_acc: 0.2500
Epoch 5/15
354/354 [==============================] - 0s - loss: 1.6914 - acc: 0.4181 - val_loss: 1.7460 - val_acc: 0.2500
Epoch 6/15
354/354 [==============================] - 0s - loss: 1.6601 - acc: 0.4548 - val_loss: 1.7360 - val_acc: 0.2500
Epoch 7/15
354/354 [==============================] - 0s - loss: 1.6291 - acc: 0.4831 - val_loss: 1.7263 - val_acc: 0.3333
Epoch 8/15
354/354 [==============================] - 0s - loss: 1.5970 - acc: 0.4972 - val_loss: 1.7309 - val_acc: 0.2500
Epoch 9/15
354/354 [==============================] - 0s - loss: 1.5589 - acc: 0.5508 - val_loss: 1.7273 - val_acc: 0.3333
Training Completed!


The validation accuracy is just 33%, but it is better than random guessing, which would give $\frac{100}{6} = 16.67\%$. The validation accuracy is low because of the small number of training examples.
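The chance baseline above is easy to verify:

```python
# Random guessing over 6 balanced classes.
baseline = 100 / 6
print(round(baseline, 2))  # 16.67
```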

### 6. Visualizing the results

We use the following code to view the results in an IPython notebook.

%matplotlib inline
from matplotlib.pyplot import imshow
from keras.preprocessing.image import load_img

def predict_mood(img_path):
    decode_dict = {0: 'Angry', 1: 'Disgusted', 2: 'Happy', 3: 'Sad', 4: 'Scared', 5: 'Shocked'}
    imshow(load_img(img_path))  # display the image being classified
    feats = get_image_feature(img_path)
    feats = feats.reshape(-1, 4096)
    probab = model.predict_proba(feats, verbose=0)
    top_2 = probab[0].argsort()[-2:][::-1]  # indices of the two highest probabilities, largest first
    percent_high = np.around(100 * probab[0][top_2[0]], decimals=2)
    percent_secondhigh = np.around(100 * probab[0][top_2[1]], decimals=2)
    print('The person in the image is ' + str(percent_high) + '% ' + decode_dict[top_2[0]] +
          ' and ' + str(percent_secondhigh) + '% ' + decode_dict[top_2[1]])


Running this code on the test images, we get really good predictions.

The same approach can be extended into a web application. Click here to view the demo of a web app that recognizes whether a person is happy or sad; its code is available on GitHub.

Here are a couple of useful tips that may improve the performance of the model:

• Use more training examples.
• Add a Dropout layer to the network to combat overfitting.
• Explore more about early stopping.
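To make the Dropout tip concrete, here is what inverted dropout does to a layer's activations during training, sketched in NumPy. This is a standalone illustration, not part of the Keras model above.

```python
import numpy as np

rng = np.random.RandomState(0)
activations = np.ones((1, 8))
keep_prob = 0.5  # Dropout(0.5) keeps each unit with probability 0.5

# Zero out units at random, then rescale by 1/keep_prob so the
# expected activation is unchanged between training and inference.
mask = rng.binomial(1, keep_prob, size=activations.shape)
dropped = activations * mask / keep_prob
# Each value in `dropped` is now either 0.0 (dropped) or 2.0 (kept and rescaled).
```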