This project shows how to classify German traffic signs using a modified LeNet neural network (see, e.g., Yann LeCun, “Gradient-Based Learning Applied to Document Recognition”).
The steps of this project are the following:
I used the numpy library to calculate summary statistics of the traffic signs data set:
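A minimal sketch of how such statistics could be computed with numpy. The array names (`X_train`, `y_train`, `X_valid`, `X_test`) and the tiny placeholder arrays are assumptions for illustration, not the real data set.

```python
import numpy as np

def summarize(X_train, y_train, X_valid, X_test):
    """Collect basic summary statistics of an image classification data set."""
    return {
        "n_train": X_train.shape[0],
        "n_valid": X_valid.shape[0],
        "n_test": X_test.shape[0],
        "image_shape": X_train.shape[1:],
        "n_classes": np.unique(y_train).size,
    }

# Placeholder arrays standing in for the real data (hypothetical sizes).
X_train = np.zeros((10, 32, 32, 3), dtype=np.uint8)
y_train = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
X_valid = np.zeros((4, 32, 32, 3), dtype=np.uint8)
X_test = np.zeros((6, 32, 32, 3), dtype=np.uint8)

stats = summarize(X_train, y_train, X_valid, X_test)
for name, value in stats.items():
    print(f"{name}: {value}")
```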
The following figure shows one example image for each label in the training set.
Here is an exploratory visualization of the data set. It is a bar chart showing how many samples are contained in the training set per label.
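The per-label counts behind such a bar chart can be obtained with `np.bincount`. The following is a sketch under the assumption that the labels are non-negative integers; `y_train` here is a small placeholder array, not the real labels.

```python
import numpy as np

def samples_per_label(labels, n_classes):
    """Count how many samples carry each integer label in 0..n_classes-1."""
    return np.bincount(labels, minlength=n_classes)

# Placeholder labels (hypothetical); the real training set has 43 classes.
y_train = np.array([0, 0, 1, 3, 3, 3])
counts = samples_per_label(y_train, n_classes=5)

for label, count in enumerate(counts):
    print(f"label {label}: {count} samples")
```

The resulting `counts` array could then be handed to any bar chart function, for example `matplotlib.pyplot.bar(range(len(counts)), counts)`.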
As a first step, I converted the images to grayscale, because several images in the training set were quite dark and contained little color information, and because grayscaling reduces the number of input features and thus the execution time. Additionally, several research papers have shown good results with grayscale images (see, e.g., Pierre Sermanet and Yann LeCun, “Traffic Sign Recognition with Multi-Scale Convolutional Networks”).
Here is an example of a traffic sign image before and after grayscaling.
Then, I normalized the image using the formula
(pixel - 128) / 128, which converts the integer value of each pixel in [0, 255] to a float value in the range [-1, 1).
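The two preprocessing steps can be sketched in numpy as follows. The grayscale conversion by plain channel averaging is an assumption, since the conversion method is not stated here; the function names and the random input batch are hypothetical.

```python
import numpy as np

def to_grayscale(images):
    """Convert (N, H, W, 3) color images to (N, H, W, 1).

    Assumption: a plain average over the color channels.
    """
    return images.astype(np.float32).mean(axis=-1, keepdims=True)

def normalize(images):
    """Apply (pixel - 128) / 128 to map pixel values from [0, 255] to [-1, 1)."""
    return (images.astype(np.float32) - 128.0) / 128.0

# Random placeholder batch standing in for real traffic sign images.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

preprocessed = normalize(to_grayscale(batch))
print(preprocessed.shape, preprocessed.min(), preprocessed.max())
```

Casting to float before subtracting matters: subtracting 128 directly from `uint8` values would wrap around instead of producing negative numbers.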
The model architecture is based on the LeNet model architecture. I added dropout layers before each fully connected layer in order to prevent overfitting. My final model consisted of the following layers:
| Layer | Description |
|:---:|:---:|
| Input | 32x32x1 grayscale image |
| Convolution 5x5 | 1x1 stride, valid padding, outputs 28x28x6 |
| Max pooling | 2x2 stride, outputs 14x14x6 |
| Convolution 5x5 | 1x1 stride, valid padding, outputs 10x10x16 |
| Max pooling | 2x2 stride, outputs 5x5x16 |
| Fully connected | outputs 120 |
| Fully connected | outputs 84 |
| Fully connected | outputs 43 |
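The output sizes in the table follow from the usual size formula for valid padding, (input - kernel) / stride + 1. The short sketch below reproduces them; the helper names are hypothetical, and the 2x2 pooling window is an assumption, since the table only states the stride.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial output size of a convolution with valid padding."""
    return (size - kernel) // stride + 1

def pool_output_size(size, window=2, stride=2):
    """Spatial output size of max pooling (assumed 2x2 window)."""
    return (size - window) // stride + 1

conv1 = conv_output_size(32, kernel=5)     # first 5x5 convolution
pool1 = pool_output_size(conv1)            # first max pooling
conv2 = conv_output_size(pool1, kernel=5)  # second 5x5 convolution
pool2 = pool_output_size(conv2)            # second max pooling
flattened = pool2 * pool2 * 16             # inputs to the first fully connected layer

print(conv1, pool1, conv2, pool2, flattened)
```

Flattening the last 5x5x16 feature map gives the 400 values that feed the first fully connected layer.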
To train the model, I used an Adam optimizer and the following hyperparameters:
My final model results were:
I used an iterative approach for the optimization of validation accuracy:
As the initial model architecture, I chose the original LeNet model from the course. To tailor the architecture to the traffic sign classifier use case, I adapted the input so that it accepts the color images from the training set with shape (32, 32, 3), and I changed the number of outputs to match the 43 unique labels in the training set. The training accuracy was 83.5%, and my test traffic sign “pedestrians” was not correctly classified (hyperparameters: EPOCHS = 10, BATCH_SIZE = 128, learning_rate = 0.001, mu = 0, sigma = 0.1).
After adding the grayscale preprocessing, the validation accuracy increased to 91% (hyperparameters unchanged).
The additional normalization of the training and validation data resulted in a minor increase in validation accuracy to 91.8% (hyperparameters unchanged).
I reduced the learning rate and increased the number of epochs: validation accuracy = 94% (EPOCHS = 30, BATCH_SIZE = 128, rate = 0.0007, mu = 0, sigma = 0.1).
The model was overfitting, so I added a dropout layer after the ReLU of the final fully connected layer: validation accuracy = 94.7% (EPOCHS = 30, BATCH_SIZE = 128, rate = 0.0007, mu = 0, sigma = 0.1).
The model was still overfitting, so I added dropout after the ReLU of the first fully connected layer. Overfitting was reduced but remained noticeable.
I added a further dropout layer: validation accuracy = 95.3% (EPOCHS = 50, BATCH_SIZE = 128, rate = 0.0007, mu = 0, sigma = 0.1).
I further reduced the learning rate and increased the number of epochs: validation accuracy = 97.5% (EPOCHS = 150, BATCH_SIZE = 128, rate = 0.0006, mu = 0, sigma = 0.1).
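To illustrate what the dropout layers in these steps do, here is a minimal numpy sketch of inverted dropout. It is not the code from the notebook; the function name, the `keep_prob` value of 0.5, and the placeholder activations are assumptions.

```python
import numpy as np

def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability 1 - keep_prob
    during training and rescale the survivors by 1 / keep_prob."""
    if not training or keep_prob >= 1.0:
        # At evaluation time the activations pass through unchanged.
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 84))  # placeholder output of a fully connected layer

train_out = dropout(activations, keep_prob=0.5, rng=rng, training=True)
eval_out = dropout(activations, keep_prob=0.5, rng=rng, training=False)
print(train_out.mean(), eval_out.mean())
```

Because the surviving units are rescaled during training, no extra scaling is needed when the model is evaluated on the validation or test set, where dropout has to be switched off.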
Here are some German traffic signs that I found on the web:
The “right-of-way at the next intersection” sign might be difficult to classify because its triangular shape is similar to that of several other signs in the training set (e.g. “Child crossing” or “Slippery Road”). Additionally, the “Stop” sign might be confused with the “No entry” sign because both signs have a more or less round shape and a fairly large red area.
Here are the results of the prediction:
The model correctly classified 5 of the 5 traffic signs, which gives an accuracy of 100%. This compares favorably to the accuracy on the test set of 95.1%.
The code for making predictions with my final model is located in the 21st cell of the Jupyter notebook.
The following images show the top five softmax probabilities of the predictions on the captured images. As the bar chart shows, the softmax probability of the correct top-1 prediction is greater than 98%.
The detailed probabilities and examples of the top five softmax predictions are given in the next image.
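A sketch of how top-five softmax probabilities can be derived from the network's logits, written in plain numpy rather than with the framework used in the notebook. The logits below are made up, and the label index 14 is an arbitrary choice for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def top_k(probabilities, k=5):
    """Return the k largest probabilities per row and their label indices,
    sorted from most to least likely."""
    order = np.argsort(probabilities, axis=-1)[:, ::-1][:, :k]
    return np.take_along_axis(probabilities, order, axis=-1), order

# Made-up logits for one image and 43 classes, with one dominant class.
logits = np.zeros((1, 43))
logits[0, 14] = 10.0

probabilities = softmax(logits)
top_values, top_indices = top_k(probabilities, k=5)
print(top_indices[0], top_values[0])
```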
Augmenting the training set might help improve model performance. Common data augmentation techniques include rotation, translation, zoom, flips, inserting jitter, and/or color perturbation. I would use OpenCV for most of the image processing activities.
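Two of these techniques, translation and a simple brightness perturbation, can be sketched in plain numpy. This is an illustration only: numpy is swapped in for OpenCV to keep the example self-contained, and the function names, shift range, and brightness range are assumptions.

```python
import numpy as np

def random_translate(image, max_shift, rng):
    """Shift an (H, W, C) image by up to max_shift pixels in each direction,
    filling the uncovered border with zeros."""
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    h, w = image.shape[:2]
    shifted = np.zeros_like(image)
    # Windows of the source and destination that overlap after the shift.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

def random_brightness(image, max_delta, rng):
    """Add a random offset to a normalized image and clip back to [-1, 1]."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(image + delta, -1.0, 1.0)

rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(32, 32, 1))  # placeholder normalized image

augmented = random_brightness(random_translate(image, 2, rng), 0.2, rng)
print(augmented.shape)
```

Flips need more care than the other techniques, because mirroring some signs (for example “keep right”) changes their meaning and therefore their label.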
All traffic sign images that I used for testing the predictions were classified very well. It would be interesting to see how the model performs on traffic signs that are less similar to those in the training set. Examples could be traffic signs drawn by hand or traffic signs with a label that is not defined in the training set.
Step 4 of the Jupyter notebook provides some further guidance on how the layers of the neural network can be visualized. It would be great to see what the network sees. Additionally, it would be interesting to visualize the learning process using TensorBoard.
I would like to investigate how alternative model architectures such as Inception, VGG, AlexNet, and ResNet perform on the given training set. There is a tutorial for the TensorFlow Slim library which could be a good start.