OpenCV Cascade Training Part 2

Creating Positive Samples

When gathering and creating sample images, it is important (unless you desire to do a LOT of work) to crop your samples so they are all the same size.  The object should also fill as much of the frame as possible: a 50×50 sample does not work properly if it only has a 20×20 clover in it.  If you choose not to do it this way, you have to mark the object's location with coordinates for every sample file… not happening in my world, I have better things to do.  So, I created my 50 base images at 50×50 and then generated the samples using the method described below.

The positive sample images (the ones you manually created) should be placed in the MyWorkspace directory.  I named my positive images 1x.jpg, 2x.jpg … 50x.jpg.  Once the directory structure is set up and all the positive samples are added, the following command needs to be run for EACH of the sample images.  In my case the command for creating the positive images for 1x.jpg is:

opencv_createsamples -img 1x.jpg -bg bg.txt -info info/info1.lst -pngoutput info -maxxangle 0.25 -maxyangle 0.25 -maxzangle 0.25 -num 2000

Note the 1x.jpg and the info1.lst: these need to be changed each time to match the number of the positive sample being processed.  For example, the next command would be:

opencv_createsamples -img 2x.jpg -bg bg.txt -info info/info2.lst -pngoutput info -maxxangle 0.25 -maxyangle 0.25 -maxzangle 0.25 -num 2000

This will create the files info1.lst through info50.lst in the info directory, along with all the sample images used for training.  I realize this could be placed into a batch script; please feel free to follow that route.  I simply used the up arrow and completed them in one sitting while doing other work, but to keep the love going, I created a batch file, CreateImageBatch, to get you started.  Do remember to make it an executable sh file before running it.  To make it executable, use the command “chmod +x filename.sh”, where filename is the name you give to your sh file (prefix the command with sudo only if you do not own the file).
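For anyone writing the batch script from scratch, here is a minimal sketch of what it might look like. It is not the CreateImageBatch file itself. The function names and the DRY_RUN switch are made up for illustration, and it assumes the 1x.jpg … 50x.jpg naming and the exact flags from the commands above. By default it only prints the commands so you can check them before anything runs.

```shell
#!/bin/sh
# Illustrative sketch only: create_samples_for, create_all_samples and DRY_RUN
# are hypothetical names, not part of OpenCV.

# Print (DRY_RUN=1, the default) or run the createsamples command for image number $1.
create_samples_for() {
    n="$1"
    # Build the command as the function's argument list so it can be echoed or executed.
    set -- opencv_createsamples -img "${n}x.jpg" -bg bg.txt \
        -info "info/info${n}.lst" -pngoutput info \
        -maxxangle 0.25 -maxyangle 0.25 -maxzangle 0.25 -num 2000
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Loop over the positive images 1x.jpg through Nx.jpg (N defaults to 50).
create_all_samples() {
    i=1
    while [ "$i" -le "${1:-50}" ]; do
        create_samples_for "$i"
        i=$((i + 1))
    done
}
```

Sourcing the file and calling create_all_samples prints the 50 commands. Setting DRY_RUN=0 before the call runs them from the MyWorkspace directory instead.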

The images in the info folder are created automatically, and the location of the positive sample within each one is recorded in the info file.  THANK YOU, CREATORS OF OpenCV!!  This makes life super easy.  A sampling of my images looks like this:

The info list files need to be concatenated into a single all.lst file.  Below is a quick screenshot of my folder contents.  (I moved the .lst files to MyWorkspace for simplicity; they do not have to be moved.  If you do move them, the all.lst file must be copied to the info directory before running the classifier.)

Once you have all the .lst files in the same directory, you can run the following command in the same directory and it will concatenate all the files for you into a single file named all.lst.

cat *.lst > all.lst
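One small wrinkle: if you run that command a second time, the existing all.lst also matches *.lst, and cat may complain that its input file is its output file. A slightly more defensive sketch follows. It assumes the list files are named info1.lst … info50.lst as above, and merge_lists is a hypothetical helper name.

```shell
# Concatenate the per-image list files in directory $1 (default: info) into all.lst.
# The info*.lst pattern deliberately does not match all.lst itself,
# so the function can be re-run safely.
merge_lists() {
    dir="${1:-info}"
    cat "$dir"/info*.lst > "$dir/all.lst"
}
```

Calling merge_lists with no argument writes info/all.lst, which is where the file needs to live anyway.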

Next, we need to create the positive vector file.  This is the output file containing the positive samples for training.  I used 50×50 size for training.  This is a rather large size and depending on your system’s configuration you could run out of memory.  For beta testing, 20×20 usually works well.  It is important to use the same size when training the cascade, so make note of what size you use.  To create the vector file for my images I used the following command.

opencv_createsamples -info info/all.lst -num 50000 -w 50 -h 50 -vec all.vec
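Before running that command, it may be worth checking that all.lst really contains at least as many entries as the value passed to -num. The sketch below assumes each line of the list file describes one sample image. count_entries and check_num are hypothetical helper names.

```shell
# Print the number of entries (lines) in the list file $1.
count_entries() {
    wc -l < "$1" | tr -d ' '
}

# Succeed only if the requested sample count $2 does not exceed
# the number of entries in the list file $1.
check_num() {
    [ "$2" -le "$(count_entries "$1")" ]
}
```

For example, running check_num info/all.lst 50000 and building the vector file only if it succeeds catches a short list early.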

At this point, we have 50,000 positive sample images created in the info directory, the bg.txt file containing information about the negative images, the all.lst file containing information about the positive images, and the all.vec file used by the trainer.  We are now ready to train our application… in other words, build the much-sought-after cascade xml file for object detection!  That will be discussed in part 3.