UDACITY SDCE Nanodegree: Term 1 - Project 5: Vehicle Detection

In this project, an algorithmic pipeline capable of detecting and tracking vehicles was developed.

The goals / steps of this project are the following:

  • Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier
  • Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
  • Note: for those first two steps don’t forget to normalize your features and randomize a selection for training and testing.
  • Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
  • Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
  • Estimate a bounding box for vehicles detected.

Histogram of Oriented Gradients (HOG)

[Image: Examples of car and non-car training samples]

I explored different color spaces and different skimage.hog() parameters (orientations, pixels_per_cell, and cells_per_block). I also tried different classifiers (linear SVM, polynomial SVM, Naïve Bayes, and decision trees). Naïve Bayes and decision trees consistently produced lower accuracy than the SVMs, while the polynomial SVM, although more accurate, was much slower at test time than the linear SVM. Hence, I selected a linear classifier for my final solution. With regard to the feature space, I ended up using the YCrCb color space with a combination of HOG on all three color channels and color histogram features. Spatial binning resulted in more false positives, so I dropped it from the feature set. I also experimented with other color spaces, but YCrCb produced more robust results. The final HOG parameters are orientations=12, pixels_per_cell=(10, 10) and cells_per_block=(2, 2). Examples using random images from each of the two classes are displayed below:

[Image: YCrCb channel representations and HOG transformations for a non-car image (left) and a car image (right)]
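A minimal sketch of the final feature extraction is shown below, assuming skimage and OpenCV; the helper names and the histogram bin count are my own illustrative choices, not values confirmed in the post:

```python
import cv2
import numpy as np
from skimage.feature import hog

# Final HOG parameters described in the post.
ORIENT = 12
PIX_PER_CELL = 10
CELL_PER_BLOCK = 2
HIST_BINS = 32  # assumed bin count; the post does not state it

def color_hist(img, nbins=HIST_BINS):
    # Concatenate per-channel color histograms (range assumes 8-bit images).
    hists = [np.histogram(img[:, :, ch], bins=nbins, range=(0, 256))[0]
             for ch in range(3)]
    return np.concatenate(hists)

def extract_features(rgb_img):
    # Convert to YCrCb and compute HOG on all three channels.
    ycrcb = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2YCrCb)
    hog_feats = [hog(ycrcb[:, :, ch],
                     orientations=ORIENT,
                     pixels_per_cell=(PIX_PER_CELL, PIX_PER_CELL),
                     cells_per_block=(CELL_PER_BLOCK, CELL_PER_BLOCK),
                     feature_vector=True)
                 for ch in range(3)]
    # Append the color histogram features, as in the final feature space.
    return np.concatenate(hog_feats + [color_hist(ycrcb)])
```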

I tried various combinations of parameters for the HOG transform. I started by applying the HOG transform only to a grayscale image, without spatial binning or color histograms. Even though this approach was fast, it produced a lot of false positives. Hence, I tried using all three channels for the HOG feature extraction. Of the different color transformations (RGB, HSV, LUV, HLS, YUV, YCrCb), YCrCb produced the highest accuracy on the test set, so it was chosen as the final color transformation. However, after further testing on the test video, the performance was still not acceptable, as differently colored cars were not detected reliably. Intuitively, it was clear that some color information needed to be preserved in order to provide robustness whenever HOG features are weak due to a specific car's color palette and contrast.

First, I tried using both spatial binning and color histograms. Even though the test set accuracy increased, the false positive rate on the test images also increased. I suspect this is because spatial binning is too specific: it does not abstract the information well, and may end up overfitting the classifier. Hence, for the final set of experiments I used only color histograms together with HOG, and the results improved as expected.

Finally, I also experimented with the HOG parameters. Honestly, I did not notice much difference in accuracy for orientations between 8 and 12 or pixels_per_cell sizes between 8 and 10, so I settled on orientations=12, pixels_per_cell=(10, 10) and cells_per_block=(2, 2), which produced marginally better results. One major difference appeared when using larger orientation values and larger cells_per_block sizes: in both cases the test time increased considerably due to the larger feature vector, and the accuracy decreased, mainly because the feature space became much larger (>12,000 dimensions) than the actual training set size (~12,000 samples), which is not a desirable property for SVMs. The final extracted features are normalized using sklearn.preprocessing.StandardScaler(). The pre-processing and training artifacts were saved in respective pickle files and loaded for use at run time. Results of the normalization process for random samples are shown below:

[Image: Initial image feature vectors (left) and normalization results (right)]
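A minimal sketch of the normalization step, assuming car_features and notcar_features are lists of vectors produced by extract_features above; the pickle filename is illustrative:

```python
import pickle
import numpy as np
from sklearn.preprocessing import StandardScaler

# car_features / notcar_features: lists of vectors from extract_features above.
X = np.vstack((car_features, notcar_features)).astype(np.float64)

# Fit a per-feature zero-mean, unit-variance scaler and apply it.
X_scaler = StandardScaler().fit(X)
scaled_X = X_scaler.transform(X)

# Persist the scaler so the same normalization is applied at run time.
with open('scaler.pkl', 'wb') as f:
    pickle.dump(X_scaler, f)
```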

Classifier Training

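A minimal sketch of training the linear SVM on the scaled features from the previous section; the labeling convention, split ratio, random seed, and file name are illustrative assumptions:

```python
import pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Label cars 1 and non-cars 0 (assumed convention).
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

# Randomized train/test split, as recommended in the project notes.
X_train, X_test, y_train, y_test = train_test_split(
    scaled_X, y, test_size=0.2, random_state=42)

svc = LinearSVC()
svc.fit(X_train, y_train)
print('Test accuracy:', round(svc.score(X_test, y_test), 4))

# Save the trained classifier alongside the scaler for run-time use.
with open('svc.pkl', 'wb') as f:
    pickle.dump(svc, f)
```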

Sliding Window Search

[Image: The windows searched at different scales]

In the end, I searched over four scales using YCrCb 3-channel HOG features plus color histograms in the feature vector, which provided good results. A sketch of the window search is shown below, followed by some detection examples.
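This sketch assumes the svc classifier, X_scaler, and extract_features helper from the previous sections; the window size, overlap, and search region are illustrative assumptions rather than the exact project values:

```python
import cv2

def slide_window(img_shape, y_start, y_stop, window=64, overlap=0.5):
    # Generate (x1, y1, x2, y2) boxes over the search region.
    step = int(window * (1 - overlap))
    return [(x, y, x + window, y + window)
            for y in range(y_start, y_stop - window + 1, step)
            for x in range(0, img_shape[1] - window + 1, step)]

def search_windows(img, windows, svc, scaler):
    # Classify each window patch; keep windows predicted as cars.
    hot_windows = []
    for (x1, y1, x2, y2) in windows:
        patch = cv2.resize(img[y1:y2, x1:x2], (64, 64))
        feats = scaler.transform(extract_features(patch).reshape(1, -1))
        if svc.predict(feats)[0] == 1:
            hot_windows.append((x1, y1, x2, y2))
    return hot_windows
```

The four scales can then be realized by calling slide_window with different window sizes, or equivalently by resizing the image and reusing a fixed 64x64 window, as in the HOG sub-sampling approach.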

[Image: Detection results on the test images]

Heat-Map Optimizations

[Image: Successive frames and their corresponding heat maps]
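A minimal sketch of the heat-map step used to reject outliers and merge overlapping detections, assuming hot_windows from the sliding-window search; the threshold value is an illustrative assumption:

```python
import numpy as np
from scipy.ndimage import label

def add_heat(heatmap, hot_windows):
    # Add +1 for every pixel inside each positive detection.
    for (x1, y1, x2, y2) in hot_windows:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def heat_to_boxes(heatmap, threshold=2):
    # Zero out weak regions, then label connected components as vehicles.
    heatmap[heatmap <= threshold] = 0
    labels, n_cars = label(heatmap)
    boxes = []
    for car in range(1, n_cars + 1):
        ys, xs = np.nonzero(labels == car)
        boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes
```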

Discussion
