koheikawata's Blog: Classification --- Deep Learning

Keras makes it easy to try a deep learning algorithm on top of TensorFlow or CNTK. However, understanding what goes on inside a neural network is not an easy task. Visualizing how the input data is converted, layer by layer, into a simple array that describes the classification category helps you figure out the network's structure.

Assume you want to classify a vehicle image into one of 12 categories: Accord, Altima, Camry, Corolla, Elantra, Freed, HR-V, Insight, Jazz, Prius, Roadster, S2000. When you input an RGB image of 128 * 128 pixels, the predicted outcome is a one-dimensional array of 12 elements, each between 0 and 1. The neural network is trained so that the output array approaches the target values shown below. This is an example of the target values when the input image is 'Jazz'.
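As a minimal sketch of such a target array: assuming the categories are indexed in alphabetical order (which is how `LabelEncoder` orders them in the training code further down), 'Jazz' sits at index 8, so its target is 1 at that position and 0 everywhere else. The helper name `one_hot_target` is hypothetical.

```python
# The 12 categories in sorted order, matching the order LabelEncoder assigns
classes = sorted(["Accord", "Altima", "Camry", "Corolla", "Elantra", "Freed",
                  "HR-V", "Insight", "Jazz", "Prius", "Roadster", "S2000"])

def one_hot_target(label, class_names):
    """Return a list with 1.0 at the index of `label` and 0.0 elsewhere."""
    return [1.0 if name == label else 0.0 for name in class_names]

jazz_target = one_hot_target("Jazz", classes)
print(classes.index("Jazz"))  # position of the single 1.0 (index 8)
print(jazz_target)
```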

Let's input the 'Jazz' image below into a trained deep learning model.

First of all, prepare images for training. The directory structure is like this. All images in the subfolders are RGB images of 128 * 128 pixels.
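The directory listing itself is not reproduced here. Judging from the training code below, which derives each label from the image's path, the layout is presumably one subfolder per category under `processed_images`. The sketch below builds a throwaway copy of that assumed layout (the file name `0001.jpg` and the helper `labels_from_layout` are hypothetical) and recovers the labels from the folder names.

```python
import os
import tempfile

def labels_from_layout(rootdir):
    """Map each file under rootdir to the name of the folder that contains it."""
    labels = {}
    for subdir, _dirs, files in os.walk(rootdir):
        for name in files:
            path = os.path.join(subdir, name)
            labels[os.path.relpath(path, rootdir)] = os.path.basename(subdir)
    return labels

# Build a tiny stand-in for processed_images/<Category>/<image>.jpg
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "processed_images")
    for category in ["Accord", "Jazz", "S2000"]:
        os.makedirs(os.path.join(root, category))
        # Empty placeholder files; real ones would be 128 x 128 RGB images
        open(os.path.join(root, category, "0001.jpg"), "w").close()
    layout = labels_from_layout(root)

for relpath, label in sorted(layout.items()):
    print(relpath, "->", label)
```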

The code used to train the model with Keras on TensorFlow is this.

```
## Function definition --- Extract file names from subfolders
import os
import numpy as np

def return_list_of_files(rootdir, printname=False):
    """Walk rootdir recursively and return every file path as a NumPy array."""
    all_files = []
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            path = os.path.join(subdir, file)
            all_files.append(path)
            if printname:
                print(path)
    return np.asarray(all_files)

## Function definition --- feature and label
## The label is the name of the folder that directly contains each image,
## so it does not depend on the current directory or the path separator
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
def load_data(dataset_path):
    images_list = return_list_of_files(dataset_path)

    # One row per image: 128 * 128 pixels * 3 channels, flattened
    features = np.empty(shape=(len(images_list), 128*128*3),
                     dtype=np.uint8)
    labels = []
    for i in range(len(images_list)):
        im = mpimg.imread(images_list[i])
        features[i] = im.flatten()
        labels.append(os.path.basename(os.path.dirname(images_list[i])))
    return features, np.asarray(labels)

## Call the functions
features, labels = load_data("processed_images")

## Encode labels
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(labels)
labels_encoded = le.transform(labels)

## Describe classes
le.classes_

## split the training and testing data
from sklearn.model_selection import train_test_split
(X_train, X_test, Y_train, Y_test) = train_test_split(features, labels_encoded, test_size=0.2, random_state=999)

## Deep Learning training start
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
batch_size = 32
num_classes = 12
epochs = 10
num_channels = 3

## input image dimensions
img_rows, img_cols = 128, 128

## make sure it is "channels_last"
print(K.image_data_format())

## Reshape training data
x_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, num_channels)
x_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, num_channels)
input_shape = (img_rows, img_cols, num_channels)

## define data type and normalize into 0-1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(Y_train, num_classes)
y_test = keras.utils.to_categorical(Y_test, num_classes)

## set up model parameters
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])

## train model
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_test, y_test))

```

The model that I trained can be described like below by using this code. It shows how each layer transforms its input array.

The details of the model can be printed by using model.summary().

The output looks like this:

```
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 126, 126, 32)      896       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 63, 63, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 61, 61, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 30, 30, 64)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 30, 30, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 57600)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               7372928   
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 12)                1548      
=================================================================
Total params: 7,393,868
Trainable params: 7,393,868
Non-trainable params: 0
_________________________________________________________________
```
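The shapes and parameter counts in this summary can be checked by hand. A 3 * 3 convolution without padding shrinks each spatial dimension by 2, 2 * 2 max pooling halves it (rounding down), and a layer's parameter count is its weights plus one bias per output unit. The helper names below are hypothetical, introduced only for this check.

```python
def conv_out(size, kernel=3):
    """Output length of a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output length of non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """kernel * kernel * in_channels weights per filter, plus one bias each."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return (inputs + 1) * units

s = conv_out(128)        # 126   (conv2d_1)
s = pool_out(s)          # 63    (max_pooling2d_1)
s = conv_out(s)          # 61    (conv2d_2)
s = pool_out(s)          # 30    (max_pooling2d_2)
flat = s * s * 64        # 57600 (flatten_1)

params = [
    conv_params(3, 32),       # 896     (conv2d_1)
    conv_params(32, 64),      # 18496   (conv2d_2)
    dense_params(flat, 128),  # 7372928 (dense_1)
    dense_params(128, 12),    # 1548    (dense_2)
]
total = sum(params)           # 7393868
print(s, flat, params, total)
```

Almost all of the parameters belong to dense_1, because it connects all 57,600 flattened values to each of its 128 units.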

Input 'Jazz.jpg' and predict with the model trained above.

This is the output of the prediction.
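The prediction figure is not reproduced here, so the numbers in the sketch below are hypothetical, chosen only to illustrate how a 12-element softmax output is read. The input has to be shaped and scaled like the training data, `(1, 128, 128, 3)` with values in 0-1, before it goes to `model.predict`; the largest element of the result then gives the predicted category. The helper names `prepare_image` and `decode_prediction` are assumptions for this example.

```python
import numpy as np

classes = np.array(["Accord", "Altima", "Camry", "Corolla", "Elantra", "Freed",
                    "HR-V", "Insight", "Jazz", "Prius", "Roadster", "S2000"])

def prepare_image(im):
    """Turn one 128 x 128 RGB uint8 image into a batch of one, scaled to 0-1."""
    return im.reshape(1, 128, 128, 3).astype("float32") / 255

def decode_prediction(probs, class_names):
    """Return the most likely class name and its probability."""
    best = int(np.argmax(probs))
    return class_names[best], float(probs[best])

# Stand-in for mpimg.imread("Jazz.jpg"): a random image of the right shape
im = np.random.randint(0, 256, size=(128, 128, 3), dtype=np.uint8)
x = prepare_image(im)  # this is what would be passed to model.predict(x)

# Hypothetical softmax output, standing in for model.predict(x)[0]
# (NOT the result obtained in this post)
probs = np.array([0.01, 0.01, 0.02, 0.01, 0.01, 0.03,
                  0.02, 0.01, 0.85, 0.01, 0.01, 0.01], dtype=np.float32)
label, confidence = decode_prediction(probs, classes)
print(x.shape, label, round(confidence, 2))
```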

When you want to inspect an intermediate output, the code is like this.
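The snippet itself is not reproduced here. One common way to do this in Keras, which may differ from what was used for the figures in this post, is to build a second model that shares the layers of the first but returns the output of every layer. The sketch below assumes a recent Keras (TensorFlow 2 or Keras 3), rebuilds the same architecture with untrained weights, and feeds it a random input so that it runs on its own; in practice you would reuse the trained `model` and the preprocessed 'Jazz' image. The name `activation_model` is hypothetical.

```python
import numpy as np
import keras
from keras import layers

# Same architecture as the trained model, but untrained (a stand-in)
model = keras.Sequential([
    keras.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(12, activation="softmax"),
])

# A second model that maps the same input to the output of every layer
activation_model = keras.Model(
    inputs=model.inputs,
    outputs=[layer.output for layer in model.layers],
)

# Random stand-in for one preprocessed image, shape (1, 128, 128, 3)
x = np.random.rand(1, 128, 128, 3).astype("float32")
activations = activation_model.predict(x, verbose=0)

# One array per layer, in the same order as model.summary()
for layer, act in zip(model.layers, activations):
    print(layer.name, act.shape)
```

Each entry of `activations` can then be plotted; for example, `plt.imshow(activations[0][0, :, :, 0])` would show the first of the 32 feature maps produced by the first convolution.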

The figure below shows what the outputs after each layer look like.

Tuesday, September 25, 2018
