Computer Vision Algorithms implemented on FPGA

Christos Kyrkou
Feb 16, 2017 · 5 min read

--

Overview

• Color Conversions (Gray-scale, Red Component, Blue Component, Green Component)

• Image Filters and Feature Extraction (Edge Detection, Image Gradients, Local Binary Pattern Features)

• Computer Vision (Skin Detection, Motion Detection)

• Machine Learning (Support Vector Machine Face Verification System)
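As a software reference for the gray-scale conversion listed above, here is a fixed-point sketch in C. The post does not give the coefficients used on the FPGA, so the 77/150/29 weights (a standard integer approximation of the 0.299/0.587/0.114 luma formula) are an illustrative assumption:

```c
#include <stdint.h>

/* Fixed-point luma: gray ≈ 0.299R + 0.587G + 0.114B, using 8-bit
   weights that sum to 256 so the divide becomes a shift -- a
   multiply-and-shift structure that maps well onto FPGA logic. */
uint8_t rgb_to_gray(uint8_t r, uint8_t g, uint8_t b) {
    return (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
}
```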

Platform: Spartan 6 LX150t Industrial Video Processing Kit

Hardware Description Language: Verilog using Xilinx design suite (ISE and EDK)

Embedded Processor: Microblaze

Programming Language: C

Spartan 6 LX150t Industrial Video Processing Kit

FPGA Platform

The vision processor was developed on a Xilinx Spartan-6-based Industrial Video Processing board equipped with an on-board video camera, which makes it ideal for real-time video processing.

Overall System Architecture

The vision processor interfaces with a MicroBlaze soft-processor system. The processor configures the various FPGA peripherals used in the prototype for communication with the host PC and the video display. The software is written in C. Processing is performed in real time at the video capture rate, which enables the system to operate at 30 frames per second.

Overall hardware architecture
System-on-Chip architecture for video processing system

Architecture for window- and pixel- based Image processing Cores

The edge, gradient, and local binary pattern cores are implemented with the help of a scan-line buffer structure, which accepts serial pixel input and provides parallel pixel output corresponding to the image window to be processed. Each window extracted from the scan-line buffer is processed by the kernels shown in the image below to obtain the processing result. The color conversions, dominant color finder, and skin detection operate on a pixel-by-pixel basis, so a simpler scheme was used: for each pixel, the comparisons shown in the image were implemented and the result was obtained directly.
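The scan-line buffering described above can be sketched as a behavioural model in C (not the actual Verilog). The 8-pixel line width and 3×3 window size here are illustrative assumptions:

```c
#include <stdint.h>
#include <string.h>

#define W 8            /* image width (small, for illustration) */
#define K 3            /* window size */

/* Two buffered scan lines plus the incoming line yield a 3x3 window
   once enough pixels have streamed in. */
static uint8_t lines[K][W];
static int col = 0, rows_filled = 0;

/* Push one pixel (serial input); returns 1 when a full KxK window is
   available in 'win' (parallel output), with the newest pixel at its
   bottom-right corner. */
int push_pixel(uint8_t p, uint8_t win[K][K]) {
    lines[K - 1][col] = p;
    int ready = (rows_filled >= K - 1) && (col >= K - 1);
    if (ready)
        for (int r = 0; r < K; r++)
            for (int c = 0; c < K; c++)
                win[r][c] = lines[r][col - (K - 1) + c];
    if (++col == W) {              /* end of line: shift buffer up */
        col = 0;
        memmove(lines[0], lines[1], (K - 1) * W);
        if (rows_filled < K) rows_filled++;
    }
    return ready;
}
```

In hardware the same structure is a chain of line buffers (BRAM/shift registers); here the row shift is modelled with a `memmove`.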

Hardware architecture for window- and pixel- based Image processing Cores

Architecture for the motion detection core

The motion detection processing core stores each incoming frame in a memory module and compares each pixel of the stored image with the corresponding pixel of the succeeding frame. Motion is detected when the difference between the two values is greater than a certain threshold; in that case the core outputs a yellow pixel, otherwise the output is black.
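The per-pixel logic above can be sketched in C; the threshold value of 30 is an arbitrary example, not taken from the original design:

```c
#include <stdint.h>

#define THRESH 30  /* assumed motion threshold (example value) */

/* RGB output pixel */
typedef struct { uint8_t r, g, b; } rgb_t;

/* Compare the stored pixel of the previous frame with the incoming
   pixel; output yellow on motion, black otherwise, and update the
   frame memory with the new value. */
rgb_t motion_pixel(uint8_t *stored, uint8_t incoming) {
    int diff = (int)incoming - (int)*stored;
    if (diff < 0) diff = -diff;
    *stored = incoming;                      /* frame memory update */
    rgb_t out = {0, 0, 0};                   /* black: no motion */
    if (diff > THRESH) { out.r = 255; out.g = 255; }  /* yellow */
    return out;
}
```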

Motion Detection Core

Architecture of the face verification core

The face verification application determines whether a face is placed in front of the camera. To do so, it processes the image with a popular pattern recognition algorithm, the support vector machine (SVM), to classify the image as face or non-face. Implementing this core required several sub-modules: an image renderer to display and scale the video output, a feature extraction module to pre-process the image and ready it for classification, and a classification module implementing the SVM algorithm.

Image Rendering Module: This module is responsible for displaying the image and results on the monitor based on the outcome of the SVM classifier. First, it displays the image to be classified in the top-left corner. On the remaining screen it displays the same image in fullscreen; a dynamic upscaling method was implemented in hardware for this purpose. Finally, depending on the classification outcome, it overlays a face template image on top of the window to indicate the presence of a face.
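The post does not specify which upscaling algorithm the renderer uses; as a sketch, nearest-neighbour scaling is assumed here, being the simplest hardware-friendly choice:

```c
#include <stdint.h>

/* Nearest-neighbour upscaling: each output coordinate maps back to a
   source pixel with one multiply and one divide per axis, which keeps
   the per-pixel cost low enough for a hardware implementation. */
void upscale(const uint8_t *src, int sw, int sh,
             uint8_t *dst, int dw, int dh) {
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++)
            dst[y * dw + x] = src[(y * sh / dh) * sw + (x * sw / dw)];
}
```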

Face Verification System at work
Face Verification System on screen visualization

Feature Extraction: Prior to classification, useful features need to be extracted from the image to account for the different invariances of the face class and the unpredictability of the non-face class. The chosen features are histograms of local binary patterns (LBP), whose simple and fast computation is a desirable property for embedded applications on FPGAs. The overall process, shown in the image below, is what has been implemented on the FPGA. It first performs comparisons between the center pixel of a sliding window and its neighbors to produce the LBP image. This image is then segmented into cells, and a histogram is generated for each cell. The concatenation of all individual histograms forms the feature vector that will be classified.
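The two steps above (LBP coding, then per-cell histograms) can be sketched behaviourally in C. The neighbour ordering of the 8 bits is an arbitrary convention here; any fixed ordering works as long as training and inference agree:

```c
#include <stdint.h>

/* Basic 3x3 LBP: compare each of the 8 neighbours against the centre
   pixel and pack the comparison results into one byte. */
uint8_t lbp3x3(uint8_t w[3][3]) {
    static const int nr[8] = {0, 0, 0, 1, 2, 2, 2, 1};
    static const int nc[8] = {0, 1, 2, 2, 2, 1, 0, 0};
    uint8_t code = 0;
    for (int i = 0; i < 8; i++)
        if (w[nr[i]][nc[i]] >= w[1][1])
            code |= (uint8_t)(1u << i);
    return code;
}

/* Accumulate a 256-bin histogram of LBP codes over one cell. */
void cell_histogram(const uint8_t *codes, int n, uint32_t hist[256]) {
    for (int i = 0; i < 256; i++) hist[i] = 0;
    for (int i = 0; i < n; i++) hist[codes[i]]++;
}
```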

Window Classification Pipeline

Support Vector Machine (SVM) Classifier: The SVM classifier was first trained in MATLAB to obtain the classification model. For this purpose, a total of 3000 positive samples and 50000 negative samples were collected and used to train a 2nd-degree polynomial SVM. The model was then implemented on the FPGA using the architecture proposed in [Kyrkou and Theocharides 2012], shown below. First, for the LBP feature computation, a scan-line buffer receives the pixels from the window memory sequentially and outputs them in parallel in the form of a window, which the LBP processor uses to find the feature value. Once the whole window has been processed, the histogram computation begins: it counts the occurrence of values within each block and computes a histogram per block. All histograms are stored sequentially in a memory, which thus contains the concatenated histogram. For the classification phase, all histogram values are output sequentially to a chain of processing elements that handles the SVM classification. The SVM data are processed in parallel with the incoming values, and a module at the right end of the chain aggregates the results to obtain the final decision.
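A behavioural sketch of the degree-2 polynomial decision function follows. This is the standard SVM formulation (with each α·y folded into a single weight per support vector), not the cited hardware architecture itself; in the FPGA each processing element accumulates one dot product as the histogram streams past:

```c
#include <stddef.h>

/* 2nd-degree polynomial SVM decision:
   f(x) = sign( sum_i w_i * (x . sv_i + c)^2 + b ),
   where w_i combines the Lagrange multiplier alpha_i and label y_i.
   sv holds n_sv support vectors of length dim, row-major. */
int svm_classify(const double *x, size_t dim,
                 const double *sv, const double *w, size_t n_sv,
                 double c, double b) {
    double acc = b;
    for (size_t i = 0; i < n_sv; i++) {
        double dot = 0.0;
        for (size_t j = 0; j < dim; j++)       /* streamed dot product */
            dot += x[j] * sv[i * dim + j];
        double k = dot + c;
        acc += w[i] * k * k;                   /* polynomial kernel, d=2 */
    }
    return acc >= 0.0;                         /* 1 = face, 0 = non-face */
}
```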

Support Vector Machine Processor Architecture

--


Christos Kyrkou

Research Lecturer at the KIOS Research & Innovation Center of Excellence at the University of Cyprus. Research in the areas of Computer Vision and Deep Learning.