In this blog we will walk through sample code that you can run on any laptop or computer with a webcam to detect your face and eyes. This lays the foundation stone for learning computer vision. In case you are new to OpenCV or computer vision, we would request you to have a look at our earlier blog below -
Robotic vision Vs Computer vision
Ok, let us get started right away. So why do we need to detect a face or an eye? Guys! The whole purpose of a robot is to make it do useful work using its capabilities (I know I did not follow a textbook definition). So if you train your webcam to recognise any human face and eye, you have done a good job, because your program now knows what a human looks like! (irony and pun not intended).
Now that we know why we need to detect a face and an eye, the next jargon to fight is - Haar cascades.
Haar features - These are digital image features used in object recognition. They owe their name to their intuitive similarity with Haar wavelets, and they were used in the first real-time face detector. Oops, was that heavy? Ok, let me try to make it a bit simpler using the concept of black and white shades. A Haar feature, in simple terms, is computed by adding and subtracting the pixel sums of rectangular image regions and then thresholding the result, as shown below in the grey rectangle ABCD.
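To make the "add and subtract rectangles" idea concrete, here is a small NumPy sketch (our own illustration, not OpenCV code) that computes a two-rectangle Haar feature using an integral image - the trick that lets each rectangle sum be computed in constant time:

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns; lets us sum any
    rectangle of pixels in constant time."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left corner (x, y),
    computed from the integral image ii."""
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return total

def two_rect_haar_feature(img, x, y, w, h):
    """A horizontal two-rectangle Haar feature: sum of the left half
    minus sum of the right half (w must be even)."""
    ii = integral_image(img.astype(np.int64))
    left = rect_sum(ii, x, y, w // 2, h)
    right = rect_sum(ii, x + w // 2, y, w // 2, h)
    return left - right

# a toy image: left half dark (0), right half bright (255)
img = np.zeros((4, 4), dtype=np.uint8)
img[:, 2:] = 255
print(two_rect_haar_feature(img, 0, 0, 4, 4))  # -2040: a strong dark-to-bright edge
```

A large magnitude means the two regions differ sharply in brightness - exactly the kind of pattern (bridge of the nose brighter than the eyes, for instance) that face detectors exploit.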
Classifier - a classifier (namely, a cascade of boosted classifiers working with Haar-like features) is trained with a few hundred sample views of a particular object (e.g., a face or a car), called positive examples, that are scaled to the same size (say, 20x20), and with negative examples - arbitrary images of the same size.
After a classifier is trained, it can be applied to a region of interest (of the same size as used during training) in an input image. The classifier outputs a "1" if the region is likely to show the object (e.g., a face or a car), and "0" otherwise. To search for the object in the whole image, one can move the search window across the image and check every location using the classifier. The classifier is designed so that it can easily be "resized" to find objects of interest at different sizes, which is more efficient than resizing the image itself. So, to find an object of unknown size in the image, the scan procedure should be done several times at different scales.
The word “cascade” in the classifier name means that the resultant classifier consists of several simpler classifiers (stages) that are applied subsequently to a region of interest until at some stage the candidate is rejected or all the stages are passed.
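The early-rejection idea behind the cascade can be sketched in a few lines of plain Python (a toy illustration with made-up stages - nothing like OpenCV's real boosted stages):

```python
def cascade_classify(window, stages):
    """Return 1 if the window passes every stage, 0 otherwise.
    A window is rejected as soon as any stage fails, so most
    non-object windows cost only one or two cheap tests."""
    for stage in stages:
        if not stage(window):
            return 0  # rejected early - later stages are never evaluated
    return 1

# hypothetical stages: each checks a progressively stricter property
stages = [
    lambda w: sum(w) > 10,          # stage 1: cheap brightness test
    lambda w: max(w) - min(w) > 5,  # stage 2: contrast test
]

print(cascade_classify([8, 1, 8], stages))  # 1 - passes both stages
print(cascade_classify([0, 0, 0], stages))  # 0 - rejected at stage 1
```

This is why cascades are fast: the vast majority of candidate windows in a real image contain no face and are discarded after the first, cheapest stage.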
Ok, now that we have some idea of the theory, let us move on to the approach we will take. There are two ways in OpenCV to detect a face and an eye in real time. One way is to train your own classifier, and the other (we think easier) way is to use the pre-trained XML files shipped with the default installation. We will refer to the XML files shown below, which contain data sets trained on human faces -
These files already contain the trained data, which we will reference from our code. Training your own data set will be covered in a later blog, as it is more involved.
import cv2
import numpy as np

# local modules
from video import create_capture
from common import clock, draw_str

def detect(img, cascade):
    rects = cascade.detectMultiScale(img, scaleFactor=1.3, minNeighbors=4,
                                     minSize=(30, 30), flags=cv2.CASCADE_SCALE_IMAGE)
    if len(rects) == 0:
        return []
    # convert (x, y, w, h) rectangles to (x1, y1, x2, y2) corner pairs
    rects[:, 2:] += rects[:, :2]
    return rects
This code block imports NumPy and OpenCV and uses OpenCV's cascade.detectMultiScale() function, which accepts the parameters below.
cascade – Haar classifier cascade. In the modern Python API it is loaded from an XML file with cv2.CascadeClassifier(); the Load() and cvReleaseHaarClassifierCascade() calls seen in older documentation belong to the legacy OpenCV 1.x C API.
image – Matrix of the type CV_8U containing an image where objects are detected.
objects – Vector of rectangles where each rectangle contains the detected object.
scaleFactor – Parameter specifying how much the image size is reduced at each image scale.
minNeighbors – Parameter specifying how many neighbours each candidate rectangle should have to retain it.
flags – Parameter with the same meaning for an old cascade as in the function cvHaarDetectObjects. It is not used for a new cascade.
minSize – Minimum possible object size. Objects smaller than that are ignored.
maxSize – Maximum possible object size. Objects larger than that are ignored.
The next section calls the XML files we talked about above and also defines the rectangles that will be drawn around the eyes and face.
def draw_rects(img, rects, color):
    for x1, y1, x2, y2 in rects:
        cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)
if __name__ == '__main__':
    import sys, getopt

    args, video_src = getopt.getopt(sys.argv[1:], '', ['cascade=', 'nested-cascade='])
    try:
        video_src = video_src[0]
    except IndexError:
        video_src = 0  # default to the first webcam
    args = dict(args)
    cascade_fn = args.get('--cascade', "../../data/haarcascades/haarcascade_frontalface_alt.xml")
    nested_fn = args.get('--nested-cascade', "../../data/haarcascades/haarcascade_eye.xml")

    cascade = cv2.CascadeClassifier(cascade_fn)
    nested = cv2.CascadeClassifier(nested_fn)

    cam = create_capture(video_src, fallback='synth:bg=../data/lena.jpg:noise=0.05')
The complete code can be seen at - here.
Run the code and give it a few seconds before it starts detecting your face. Try moving left and right and you will see that the detection rectangles move along with you. Just in case the screen freezes, you may need to use your mouse to first select the video window when you start the program.
In our next blog we will see how we can use a pan-tilt servo on a robot car with this algorithm to track a blue ball. Please comment on and share this blog.
Stand a chance to win a lucky prize if you tweet a working picture of your face identification at twitter #mierobot
By Prmorgan at English Wikipedia [Public domain], via Wikimedia Commons