This work proposes a supervised learning technique to control active cameras with Deep Convolutional Neural Networks to go directly from visual information to camera movement.

Active Camera Systems

Active vision systems (i.e., movable cameras with controllable parameters such as pan and tilt) can provide extended coverage, flexibility, and cost-efficiency compared to static vision systems running in the cloud. This is particularly appealing for rapid deployment of cameras or for temporary installations at particular events. The cost saving over buying and installing a hardwired system for temporary or remote locations is immense. In addition, automated vision processing is necessary since there…

“The flexibility and cost efficiency of traffic monitoring using Unmanned Aerial Vehicles (UAVs) has made such a proposition an attractive topic of research.”


Unmanned Aerial Vehicles (drones) are emerging as a promising technology for both environmental and infrastructure monitoring, with broad use in a plethora of applications. Many such applications require the use of computer vision algorithms in order to analyse the information captured from an on-board camera. In particular, Road Traffic Monitoring (RTM) systems constitute a domain where the use of UAVs is receiving significant interest.

In traffic monitoring applications, UAVs can perform vehicle monitoring, without the need for…

Keras is a useful API for deep learning that also includes various pretrained models that you can use for transfer learning.

UPDATE! Now works with tf.keras!

The Keras API

Keras is a high-level API (Application Programming Interface) for deep learning. That is, it does not itself implement deep learning functionality but is built on top of existing deep learning frameworks such as TensorFlow, and provides improved functionality, faster implementation cycles, and added features. One of its most useful features is that it provides access to a large pool of existing deep learning models which are pre-trained on ImageNet (a rather time-consuming and…
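As a rough illustration of the transfer-learning idea described above, here is a minimal sketch using tf.keras. The choice of MobileNetV2, the 96×96 input size, and the helper name build_transfer_model are assumptions made for this example, not details from the original post.

```python
import tensorflow as tf


def build_transfer_model(num_classes, input_shape=(96, 96, 3), weights="imagenet"):
    """Hypothetical helper: frozen pretrained backbone plus a new classifier head."""
    # Load the convolutional base without its ImageNet classification layer.
    # weights="imagenet" downloads pretrained weights; weights=None skips that.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights
    )
    base.trainable = False  # keep the pretrained features fixed

    inputs = tf.keras.Input(shape=input_shape)
    # Inputs are assumed to be already scaled to [-1, 1], e.g. with
    # tf.keras.applications.mobilenet_v2.preprocess_input.
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Only the new Dense layer is trained at first; a common follow-up is to unfreeze part of the base and continue training with a small learning rate.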

Deep learning approaches have demonstrated state-of-the-art performance in various computer vision tasks such as object detection and recognition. In this post I provide details on how to develop and train a Convolutional Neural Network (CNN) to detect top-view vehicles from UAV footage.


Unmanned Aerial Vehicles (drones) are emerging as a promising technology for both environmental and infrastructure monitoring, with broad use in a plethora of applications. In particular, Road Traffic Monitoring (RTM) constitutes a domain where the use of UAVs is receiving significant interest. Under the above deployments, UAVs are responsible for searching, collecting and sending, in real time, vehicle…

The objective of this project is to label pixels corresponding to road in images using a Fully Convolutional Network (FCN).


This specific module was a collaboration between Udacity and NVIDIA's Deep Learning Institute. It covers semantic segmentation and inference optimization.

Semantic segmentation identifies free space on the road at pixel-level granularity, which improves decision-making ability. Inference optimizations accelerate the speed at which neural networks can run, which is crucial for computationally intensive models like semantic segmentation networks.

The objective is to build and train fully convolutional networks that output an entire image, instead of just a classification output. For…

The goal of this project was to design a path planner that is able to create smooth, safe paths for the car to follow along a 3-lane highway with traffic. The path planner should be able to keep inside its lane, avoid hitting other cars, and pass slower moving traffic all by using localization, sensor fusion, and map data.


The goal of this project is to safely navigate a vehicle around a virtual highway. The requirement for this project is to design a path planner that creates smooth trajectories for the car to follow that are acceptably tolerable by a…

LBPs are local patterns that describe the relationship between a pixel and its neighborhood.

Local Binary Patterns (LBPs) have been used for a wide range of applications, including face detection [1], [2], face recognition [3], facial expression recognition [4], pedestrian detection [5], remote sensing, and texture classification [6], amongst others, in order to build powerful visual object detection systems [7]. Many variants of LBPs have been proposed in the literature [8]. The most common approach, however, dictates that each 3×3 window in the image is processed to extract an LBP code. The processing involves thresholding the center pixel of…
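A minimal sketch of one common basic LBP formulation follows. It assumes each of the eight neighbours in a 3×3 window is compared against the center pixel and contributes a 1 bit when it is greater than or equal to it. The clockwise bit ordering and the function names lbp_code and lbp_image are assumptions for this example; published variants differ in these details.

```python
def lbp_code(window):
    """Return the 8-bit LBP code for a 3x3 window (three rows of three ints)."""
    center = window[1][1]
    # Neighbours visited clockwise from the top-left corner (assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if window[r][c] >= center:  # threshold the neighbour against the center
            code |= 1 << bit
    return code


def lbp_image(image):
    """Compute LBP codes for every interior pixel of a 2-D grayscale image."""
    rows, cols = len(image), len(image[0])
    return [
        [
            lbp_code([row[c - 1:c + 2] for row in image[r - 1:r + 2]])
            for c in range(1, cols - 1)
        ]
        for r in range(1, rows - 1)
    ]
```

Border pixels are skipped here because they lack a full 3×3 neighbourhood; padding the image is another option.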

Filtering and feature extraction are both very important tasks for efficient object recognition in embedded vision systems.

Perhaps one of the simplest, yet effective, forms of filtering uses color information, which can be a very important factor in recognizing and detecting specific objects. For example, if you are interested in developing an app to detect a red car, then a wise first step during pre-processing is to identify regions within the image that are mostly red. Of course, things are not always this simple, but such approaches can be effective in many applications.
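To make the red-region idea concrete, here is a minimal NumPy sketch, not taken from the original post. It assumes an RGB image stored as a height × width × 3 array, and the threshold values min_red and margin are arbitrary illustrative choices.

```python
import numpy as np


def red_mask(image_rgb, min_red=150, margin=50):
    """Boolean mask of pixels whose red channel clearly dominates green and blue."""
    img = image_rgb.astype(np.int32)  # avoid uint8 wrap-around when subtracting
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (r >= min_red) & (r - g >= margin) & (r - b >= margin)


def red_fraction(image_rgb, **kwargs):
    """Fraction of pixels flagged as red, a cheap cue for 'mostly red' regions."""
    return float(red_mask(image_rgb, **kwargs).mean())
```

Thresholding in RGB is sensitive to lighting; converting to a hue-based color space such as HSV before thresholding is a common alternative.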

Below is…

“Parallel computing” refers to the ability of computer systems to perform multiple operations simultaneously. The main driver behind parallel computing is the fact that large problems can be divided into smaller ones, which can then be solved in parallel, i.e., executed concurrently on the available computing resources.
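The divide-and-combine pattern described above can be sketched with Python's standard library. The sum-of-squares task and the helper names chunked and parallel_sum_of_squares are assumptions chosen purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_sum_of_squares(values, workers=4):
    """Divide the input, solve each piece concurrently, then combine the results."""
    chunks = chunked(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: sum(v * v for v in chunk), chunks))
    return sum(partials)
```

Note that in CPython, threads do not speed up CPU-bound pure-Python work because of the global interpreter lock; swapping in ProcessPoolExecutor (with a module-level function instead of the lambda) is the usual way to get real parallel execution for such tasks.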


The quest to squeeze more performance out of silicon has been going on since the dawn of computing: from the deep execution pipelines in processors of the early 2000s, to the multicore SoCs found in today's mobile phones, to the General Purpose Graphics Processing Units that will power AI applications in the…

Object detection deals with determining whether an object of interest is present in an image or video frame. It is a necessary task for embedded vision systems, as it enables them to interact more intelligently with their host environment and increases their responsiveness and awareness with regard to their surroundings.

The detection/discovery of visual objects is a perceptual and cognitive task fundamental to vision and intelligence. It can be useful for a wide range of embedded applications, including robotics, surveillance and census systems, human-computer interaction, intelligent transport systems, and military applications.

Object Detection Process

The process of visual object detection deals with determining whether…

Christos Kyrkou

PhD in Computer Engineering, Self-Driving Car Engineering Nanodegree, Computer Vision, Visual Perception and Computing
