The goals / steps of this project are the following:
My project includes the following files:
Using the Udacity-provided simulator and my drive.py file, the car can be driven autonomously around the track by executing:
python drive.py model.h5
The model.py file contains the code for training and saving the convolutional neural network. The file shows the pipeline I used for training and validating the model, and it contains comments to explain how the code works.
The overall strategy for deriving a model architecture was to start with a very simple architecture in order to first set up a working end-to-end framework, check all functionality (training, driving, simulator, video creation), and detect potential technical problems.
Then I replaced the simple network with the convolutional model from Nvidia that was introduced in the class. The original Nvidia network is described in the following image.
In order to gauge how well the model was working, I split my image and steering angle data into a training and validation set. I found that my first model had a low mean squared error on the training set but a high mean squared error on the validation set. This implied that the model was overfitting.
To combat the overfitting, I added dropout layers to the model and collected more training data.
Then I added a cropping layer so that the network focuses on the relevant parts of the images.
The final step was to run the simulator to see how well the car was driving around track one. There were a few spots where the vehicle fell off the track. It turned out that the cv2 library reads images in BGR format, while drive.py provides images in RGB. I therefore added a conversion from BGR to RGB after loading the training images.
At the end of the process, the vehicle is able to drive autonomously around the track without leaving the road (see video.mp4).
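The channel-order fix can be illustrated with a minimal sketch. It uses plain NumPy slicing rather than the actual code in model.py; with OpenCV, `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` performs the same conversion. The function name `bgr_to_rgb` is made up for this example.

```python
import numpy as np


def bgr_to_rgb(image):
    """Reverse the channel axis of an H x W x 3 image (BGR -> RGB)."""
    return image[..., ::-1]


# A 1x2 test image in BGR order: one pure-blue pixel, one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
```

If training and inference disagree on the channel order, the network sees red and blue swapped at drive time, which would be consistent with the car leaving the track in a few spots.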
My final network consists of 11 layers, including
The input image is split into RGB planes and passed to the network.
The first layer of the network crops the images, removing the bottom and top parts that do not contribute to the calculation of the steering angle (the bottom part contains the hood of the car; the top part captures trees, hills, and sky). The normalizer is hard-coded and is not adjusted in the learning process. Performing normalization in the network allows the normalization scheme to be altered with the network architecture and to be accelerated via GPU processing. The convolutional layers were designed to perform feature extraction. The network uses strided convolutions with a 2×2 stride and a 5×5 kernel in the first three convolutional layers, and non-strided convolutions with a 3×3 kernel in the last two convolutional layers. These are followed by fully connected layers that lead to a single output, the steering angle.
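To see how the kernel sizes and strides described above shrink the feature maps, here is a small sketch that computes the spatial size after each convolutional layer. It assumes unpadded ("valid") convolutions and a 66×200 input, the size used in the original Nvidia network; the cropped input size in model.py may differ, so the numbers are illustrative only.

```python
def conv_output_size(size, kernel, stride):
    """Output length along one axis for an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


# (kernel, stride) per convolutional layer, as described in the text:
# three 5x5 layers with stride 2, then two 3x3 layers with stride 1.
CONV_LAYERS = [(5, 2), (5, 2), (5, 2), (3, 1), (3, 1)]


def feature_map_sizes(height, width):
    """Return the (height, width) of the feature map after each conv layer."""
    sizes = []
    for kernel, stride in CONV_LAYERS:
        height = conv_output_size(height, kernel, stride)
        width = conv_output_size(width, kernel, stride)
        sizes.append((height, width))
    return sizes


# Assumed input size (66 x 200); not taken from model.py.
sizes = feature_map_sizes(66, 200)
```

Under these assumptions the last convolutional layer yields a 1×18 map per filter, which is then flattened and fed to the fully connected layers.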
The model contains dropout layers in order to reduce overfitting.
The model was trained and validated on two laps of route 1.
To handle a large amount of training data without loading it all into memory at once, a data generator is used (model.py lines 63-97).
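The generator idea can be sketched as follows. This is not the code from model.py: the name `batch_generator`, the default batch size, and the optional `load` callback (which in a real pipeline would read an image from disk) are assumptions made for illustration.

```python
import random


def batch_generator(samples, batch_size=32, shuffle=True, load=None):
    """Yield batches forever, loading each sample only when it is needed.

    `load` is a hypothetical callback that turns one sample record
    (e.g. a file path and a steering angle) into training data.
    """
    samples = list(samples)
    while True:  # loop endlessly; the training loop decides when to stop
        if shuffle:
            random.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            yield [load(s) for s in batch] if load else batch


# Usage: ten dummy samples in batches of four, without shuffling.
gen = batch_generator(range(10), batch_size=4, shuffle=False)
first = next(gen)
```

Because only one batch is materialized at a time, memory use depends on the batch size rather than on the size of the whole data set.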
The model was tested by running it through the simulator and ensuring that the vehicle could stay on the track.
The model used an Adam optimizer, so the learning rate was not tuned manually (model.py lines 51-52).
Training data was chosen to keep the vehicle driving on the road. I used a combination of center lane driving and recovering from the left and right sides of the road.
For details about how I created the training data, see the next section.
To capture good driving behavior, I first recorded a bit more than one lap on track one using center lane driving. Here is an example image of center lane driving:
Note: I got the best training data when I controlled the simulator with my mouse.
I then recorded the vehicle recovering from the left and right sides of the road back to the center so that the vehicle would learn to return to the center.
For each data point recorded by the simulator, the following six images and steering angles are extracted (model.py lines 104-137):
Examples of flipped center images
Examples of flipped right and left images
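One plausible way to obtain six images per data point is to take the center, left, and right camera images and add a horizontally flipped copy of each with the steering angle negated. The sketch below follows that reading; the steering correction of 0.2 for the side cameras and the function name `expand_sample` are assumptions for illustration, not values taken from model.py.

```python
import numpy as np

# Hypothetical steering offset applied to the side-camera images.
STEERING_CORRECTION = 0.2


def expand_sample(center, left, right, angle, correction=STEERING_CORRECTION):
    """Return six (image, steering angle) pairs for one recorded data point."""
    originals = [
        (center, angle),
        (left, angle + correction),   # left camera: steer more to the right
        (right, angle - correction),  # right camera: steer more to the left
    ]
    # Mirror each image left-to-right and negate its steering angle.
    flipped = [(np.fliplr(image), -a) for image, a in originals]
    return originals + flipped


# Usage with a tiny 2x3 dummy RGB image standing in for all three cameras.
dummy = np.arange(2 * 3 * 3).reshape(2, 3, 3)
pairs = expand_sample(dummy, dummy, dummy, angle=0.1)
```

Flipping balances left and right turns in the data, which helps on a track that curves mostly in one direction.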
After the collection process, I had 10,308 images and steering angles.
I finally randomly shuffled the data set and put 20% of the data into a validation set.
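The shuffle and 80/20 split can be sketched with the standard library alone. The helper name `shuffle_and_split` and the fixed seed are assumptions for this example; model.py may well use a library helper such as scikit-learn's `train_test_split` instead.

```python
import random


def shuffle_and_split(samples, validation_fraction=0.2, seed=None):
    """Shuffle the samples and return (training, validation) lists."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_validation = int(len(samples) * validation_fraction)
    return samples[n_validation:], samples[:n_validation]


# Usage: integer indices stand in for the 10,308 recorded samples.
train, validation = shuffle_and_split(range(10308), seed=0)
```

Shuffling before the split matters because consecutive frames from the same lap are nearly identical; without it, the validation set would cover only one stretch of the track.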
I used this training data for training the model. The validation set helped determine whether the model was overfitting or underfitting. The ideal number of epochs was 10. I used an Adam optimizer, so manually tuning the learning rate wasn't necessary.