OpenCV Cascade Training Part 1

This post assumes you have OpenCV installed on your computer as described in Installing OpenCV on Ubuntu.  If you have not, I highly recommend you go back and ensure you have all the proper settings.  The information below shows the cascade classifier training applied to a real life scenario.  The full OpenCV documentation for the classifier is here, should you need more information.

The Plan

You will need a lot of images.  Really… a lot of images, unless you only want one particular item to be recognized.  As I researched the best way to train for a “generic” four leaf clover, I realized I needed in the neighborhood of 40 to 50 thousand images.  Yikes!  I don’t have enough family and friends to hunt for 40 or 50 images, much less 50,000!  Secondly, each image would need the location of the four leaf clover in the image added to a data file.  This is seemingly impossible.

This is where I decided to try OpenCV’s positive image creation.  If you have a single positive image, it will create “test” positive images from that single image.  This method is great IF you only want to recognize a SPECIFIC clover: unless the next four leaf clover looked very close to the single positive image, the cascade would not recognize it.  Consider the following images of a four leaf clover.

If the first image on the left were used to train the cascade, it is very possible the other images would not be recognized, at least not without also recognizing a bunch of unwanted clovers (you know, the unwanted 3 leaf type).  It gets even worse when there are a LOT of 3 leaf clovers all mixed in together.  In the picture below, only one of the marked areas is a four leaf clover.

This may seem trivial to the human eye, but not trivial to the program.  We only want it to pick one out of all the clutter, the correct one…

So how does one go about this?  My solution is to gather 50 to 100 four leaf clover images, crop each image down to contain only the four leaf clover, and resize it to 100×100.  From this set of positive images, use OpenCV to create a set of 2000 images from each positive image.  This gives a total of 100,000 to 200,000 positive images!  There are other ways to achieve this, but this was the most straightforward approach I could think of with the tools at my disposal.

A common method for working with video is to convert the stream to grayscale, perform all the object recognition on the grayscale frames, and then map the findings back onto the color image.  It is much quicker to work with one 8-bit value (0 to 255) per pixel than with three.  Grayscale is used throughout this training process for that reason.

The Preparation

The MyWorkspace folder (where OpenCV was installed) needs to have the following folder layout.

The first step is to obtain a bunch (a large bunch) of negative images.  I didn’t want just any images; I wanted images that were close to what I was training the cascade on.  Therefore I looked for fields of grass, forests, landscapes, etc. until I had roughly 40,000 images stockpiled.  A good place to start your hunt is image-net.org.  There you can search for images of a specific type (Trees, Grass, etc.).  There is a bit of a trick: they provide you the URL to each image, so you must write code to pull them in from the internet.  Python is a great tool for such jobs, so I have added some base framework code to do just that.

import os
import urllib.request

import cv2

# The link is the wnid number from image-net.org
def get_url_images(link, save_dir, w, h):
    global pic_num
    url_link = "http://image-net.org/api/text/imagenet.synset.geturls?wnid=" + link
    
    image_links = urllib.request.urlopen(url_link).read().decode()
    
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)
    for url in image_links.splitlines():
        url = url.strip()
        if not url:
            continue  # skip blank lines in the URL list
        file_path = save_dir + "/" + str(pic_num) + ".jpg"
        try:
            urllib.request.urlretrieve(url, file_path)
            # load the download as grayscale, resize it, and overwrite the file
            img = cv2.imread(file_path, cv2.IMREAD_GRAYSCALE)
            new_image = cv2.resize(img, (w, h))
            cv2.imwrite(file_path, new_image)
            pic_num += 1

        except Exception as e:
            # dead links and unreadable images end up here; report and move on
            print(str(e))

def set_image_size(path, w, h):
    # convert every image in path to grayscale at w x h, overwriting the originals
    for img in os.listdir(path):
        try:
            str_path = os.path.join(path, img)
            imgGray = cv2.imread(str_path, cv2.IMREAD_GRAYSCALE)
            new_image = cv2.resize(imgGray, (w, h))
            cv2.imwrite(str_path, new_image)

        except Exception as e:
            print(str(e))
                

# collect image sets from image-net.org
# pic_num is the running file number shared with get_url_images
pic_num = 1
# Add links to the array to gather more images
myLinks = ["n11752937","n12102133"]
for link in myLinks:
    get_url_images(link, "myDir", 200,200)

# Change the image size in a directory called newSize
#set_image_size("newSize", 100,100)

Once you have an adequate number of negative images, place them in the MyWorkspace/neg folder.  The next component of the classifier is the background description file.  It contains a list of images which are used as backgrounds for randomly distorted versions of the object.  This file will be called bg.txt and will need to be created (Python is a good candidate for this chore also).  The file must list every file in the neg folder, one per line.  The first few lines of my file look like this:

It is important to note that if you create this file in Windows and then transfer it to a Linux machine, you will most likely get errors when you execute the classifier.  The reason for the error is the difference between the Windows and Linux end-of-line delimiters: Windows ends each line with a carriage return plus a line feed (CRLF), while Linux uses a line feed alone (LF).  I suggest using an application called dos2unix.  You can install it with apt-get.  The command line to change the bg.txt file is:

dos2unix bg.txt

This little command will save you a lot of troubleshooting, as the classifier simply fails and does not tell you why when bg.txt does not have Linux-style line endings.
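If you would rather build bg.txt in Python, the following is a minimal sketch of one way to do it.  The function names write_bg_file and to_unix_line_endings are my own for this example, and it assumes the training tools are run from the folder that contains neg, so each entry is written as a relative path such as neg/1.jpg.  Opening the file with newline="\n" writes Linux-style line endings even on Windows, and the second helper is a Python stand-in for dos2unix for a file that already exists.

```python
import os


def write_bg_file(neg_dir="neg", out_path="bg.txt"):
    """Write one path per line for every entry in neg_dir; return the count."""
    names = sorted(os.listdir(neg_dir))
    # newline="\n" forces Linux-style line endings even when run on Windows
    with open(out_path, "w", newline="\n") as bg:
        for name in names:
            bg.write(f"{neg_dir}/{name}\n")
    return len(names)


def to_unix_line_endings(path):
    """Rewrite a text file in place, replacing CRLF with LF."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(b"\r\n", b"\n"))
```

Calling write_bg_file() from inside MyWorkspace would then produce bg.txt next to the neg folder.  Note that it lists everything in neg, so keep only images in that folder.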