
Simple OpenCV tutorial for YOLO darknet object detection in the DNN module

This tutorial shows you how to use deep neural networks in OpenCV to detect multiple classes of objects with YOLO darknet. The code is under 100 simple lines. It uses the yolov3-tiny.weights neural network together with the matching yolov3-tiny.cfg configuration. The code is kept as simple as possible, without the nice-to-have extras that are not necessary to understand its flow.

OpenCV 4.x requirements for the DNN module running YOLO (yolov3-tiny)

I am using OpenCV 4.2 compiled on a Windows machine with the contrib modules. I did not compile OpenCV with any special backend such as CUDA. You can find a description of how to compile OpenCV with CMake for Visual Studio 2019 here on my blog; just skip the GStreamer-related specialties.

How to set up an OpenCV Visual Studio 2019 project

This is very common, no big deal here. In the project properties, for the Release configuration and the x64 platform, the Additional Include Directories point to the include directory under your OpenCV installation.
Additional Library Directories point to the location of x64/vc16/lib under the directory where OpenCV is installed.

The list of additional dependencies contains the standard .lib files as in the following picture, plus opencv_dnn420.lib from the extra modules.

Prerequisites for OpenCV DNN to run YOLO neural networks

The files yolov3-tiny.weights and yolov3-tiny.cfg need to be downloaded from the YOLO darknet site. The first one contains the weight values of the neural network and the second, the .cfg, its configuration. I put both into the same directory where Visual Studio generates the .exe file for my project.
  • yolov3-tiny.weights
  • yolov3-tiny.cfg
We have all the needed prerequisites. I am using the YOLOv3-tiny model.

Visual Studio project structure for OpenCV DNN

I will show you where the models are located and how they are loaded; this is why this chapter is placed after the YOLO models are downloaded. The structure of my Visual Studio project is as follows. DNN is the root directory for my project. Under DNN there is one DNN.sln solution and two directories, DNN and x64. The source code is located under DNN and the executable file is located in x64/Release (or x64/Debug, depending on whether you build the project in a Release or Debug configuration in Visual Studio).
In the Release directory is the executable module DNN.exe. The opencv_xxxxxx420.dll is located in this folder for simplicity, so you can skip setting the PATH system variable. I have placed the input video samp.MOV here as well. The last important thing is that yolov3-tiny.weights and yolov3-tiny.cfg downloaded from the YOLO website are located here too.

OpenCV DNN running YOLO darknet code: high-level picture


int main()
{
    //* video capture setting
    //* basic parameters
    //* path to load model
    //* load neural network and config
    //* set backend and target to execute network
    for (;;)
    {
        if (!cap.isOpened()) {
            cout << "Video Capture Fail" << endl;
            break;
        }
        //* capture image
        Mat img;
        cap >> img;
        //* create blob from image
        //* set blob as input of neural network
        //* perform network.forward evaluation
        //* process the output layer evaluation
        //   region proposal (most difficult step)
        //* display image
    }
    return 0;
}

OpenCV YOLO darknet DNN code explained in detail

I will skip some details about VideoCapture from a file. The code related to DNN starts with:

    std::string model = "C:/Users/Vlada/Desktop/DNN/x64/Release/yolov3-tiny.weights";
    std::string config = "C:/Users/Vlada/Desktop/DNN/x64/Release/yolov3-tiny.cfg"; 

    Net network = readNet(model, config, "Darknet");
    network.setPreferableBackend(DNN_BACKEND_DEFAULT);
    network.setPreferableTarget(DNN_TARGET_OPENCL);

The cv::dnn::Net class allows you to create various deep neural network structures based on the types of implemented layers. The Net class is initialized by the readNet function, which reads networks represented in various formats.
The first parameter of readNet is the location of the neural network model (the weights definition), the second parameter is the configuration of the network, and the last is the name of the framework ("Darknet" in our example).
Models supported by readNet:
*.caffemodel (Caffe, http://caffe.berkeleyvision.org/)
*.pb (TensorFlow, https://www.tensorflow.org/)
*.t7 | *.net (Torch, http://torch.ch/)
*.weights (Darknet, https://pjreddie.com/darknet/)
*.bin (DLDT, https://software.intel.com/openvino-toolkit)
*.onnx (ONNX, https://onnx.ai/)
Configs supported by readNet:
*.prototxt (Caffe, http://caffe.berkeleyvision.org/)
*.pbtxt (TensorFlow, https://www.tensorflow.org/)
*.cfg (Darknet, https://pjreddie.com/darknet/)
*.xml (DLDT, https://software.intel.com/openvino-toolkit)

Net class .setPreferableBackend(DNN_BACKEND_DEFAULT)

This sets the computational backend for DNN. The best choice is the default value. If you build OpenCV with the Intel Inference Engine, the default becomes DNN_BACKEND_INFERENCE_ENGINE. DNN_BACKEND_HALIDE is a popular backend but a little difficult to build on a Windows machine; it is built on the LLVM compiler, and I had issues building OpenCV properly with the HALIDE backend. The default is the best for a start.

Net class .setPreferableTarget(DNN_TARGET_OPENCL)

This represents the target device whose special hardware instructions can speed up the computation. I would try just the first three, if your architecture allows it.

DNN_TARGET_CPU
DNN_TARGET_OPENCL
DNN_TARGET_OPENCL_FP16
DNN_TARGET_MYRIAD
DNN_TARGET_FPGA

The network is now loaded and the Net class should be successfully initialized.

OpenCV blobFromImage preprocessing for yolov3-tiny

The yolov3-tiny.cfg uses an input layer of size 416 x 416. This is reflected in blobFromImage, where the input img is processed into a 4D blob array. The width and height of the input layer are given by Size(416, 416). blobFromImage takes the image as its first argument and produces the output blob as its second argument. The third argument is the normalization scale; I perform the normalization later, in the scale parameter of network.setInput, so the blob does not need to be normalized here and stays 1-to-1 with the original image. The fourth argument is the Size, which needs to match the size of the input layer. Since OpenCV uses the BGR color scheme instead of RGB, swapRB is needed; it performs exactly the BGR -> RGB transformation of the color channels.

The input blob needs to be normalized (RGB uses a 0-255 scale for each channel). The normalization maps intensities into floats from 0 to 1. The scale parameter of network.setInput(blob, "", scale, mean) normalizes all intensity values of blobFromImg into the 0-1 range. We can ignore the mean subtraction value, but the scale is important. The general normalization of the blob in setInput follows this equation: input(n,c,h,w) = scalefactor × (blob(n,c,h,w) − mean).

        bool swapRB = true;
        blobFromImage(img, blobFromImg, 1, Size(416, 416), Scalar(), swapRB, false);

        float scale = 1.0 / 255.0;
        Scalar mean = 0;
        network.setInput(blobFromImg, "", scale, mean);
The input is set and the neural network forward evaluation can be performed.

Evaluate yolo neural network model

The following code takes the input set in the previous step and performs a forward evaluation through the network. The output is written into a pre-prepared outMat Mat container.

        Mat outMat;
        network.forward(outMat);

Display results of yolo (yolov3-tiny) neural network 

I think this is the most difficult part, but do not be afraid of this step at all. outMat.rows and cols just give the dimensions of the output matrix.
        // rows represent the number of detected objects (proposed regions)
        int rowsNoOfDetection = outMat.rows;

        // The columns look like this: the first four are region center x, center y,
        // width and height; the class 1 - N entries follow, where the biggest value
        // corresponds to the most probable class.
        // [x ; y ; w ; h ; class 1 ; class 2 ; class 3 ; ...]
        int colsCoordinatesPlusClassScore = outMat.cols;

One row is exactly one region. The YOLO darknet output is a region proposal: one row is one region, and the number of rows is the number of proposed regions to consider. The row looks like this:
[x ; y ; w ; h ; class 1 ; class 2 ; class 3 ; ...]
The x, y, w, h values are the coordinates, width and height of the proposed region. The class 1 to class N entries are numbers scoring how probable each class of detected object is in this concrete region. This is super simple; just have a look at the example.

Yolo region proposal format

This outMat has 3 rows and 7 columns, which means 3 proposed regions with detected objects.
You need to find the highest score from column 5 to column 7. For the first row it is the car, for the second the bus and for the last the truck. There is no "car", "bus" or "truck" string in the columns; the class is just the index of the score that belongs to it. After you find the highest score, you evaluate whether this score is higher than your threshold. If yes, you can take the first four columns to draw the rectangle, and you can name the rectangle based on the index of the highest score.
[x ; y ; w; h; car 0.1 ; bus 0 ; truck 0.01]
[x ; y ; w; h; car 0 ; bus 0.9 ; truck 0.4]
[x ; y ; w; h; car 0 ; bus 0.3 ; truck 0.9]

This for loop processes the rows (the proposed regions) one by one. The Mat called scores is just one row (j) from column 5 up to the number of columns; we are simply skipping the information about the region coordinates.

            for (int j = 0; j < rowsNoOfDetection; ++j)
            {
                // for each row, the score is from element 5 up to the
                // number of classes index (columns 5 - N)
                Mat scores = outMat.row(j).colRange(5, colsCoordinatesPlusClassScore);

The following code finds the maximum in the scores matrix, puts it into the confidence variable, and the position of the maximum into the PositionOfMax variable.
                Point PositionOfMax;
                double confidence;
                minMaxLoc(scores, 0, &confidence, 0, &PositionOfMax);

Display the yolo region proposal results

If the confidence passes the threshold, the region can be displayed, and the first four elements of the outMat row become valuable. outMat.at<float>(j, 0) takes the first element of row j, (j, 1) takes the y, (j, 2) takes the width, and the last, fourth element of the row is the height. Based on the calculated values, the rectangle is drawn over the image. The position of the maximum is converted into a string, which is the numeric value of the class. If you want a name instead of the numeric representation of the class, you need a table to translate each index, e.g. 1 to Car.

                if (confidence > 0.0001)
                {
                    int centerX = (int)(outMat.at<float>(j, 0) * img.cols);
                    int centerY = (int)(outMat.at<float>(j, 1) * img.rows);
                    int width   = (int)(outMat.at<float>(j, 2) * img.cols + 20);
                    int height  = (int)(outMat.at<float>(j, 3) * img.rows + 100);
                    int left = centerX - width / 2;
                    int top = centerY - height / 2;

                    stringstream ss;
                    ss << PositionOfMax.x;
                    string clas = ss.str();
                    int color = PositionOfMax.x * 10;
                    putText(img, clas, Point(left, top), 1, 2,
                            Scalar(color, 255, 255), 2);
                    stringstream ss2;
                    ss2 << confidence;
                    string conf = ss2.str();

                    rectangle(img, Rect(left, top, width, height),
                              Scalar(color, 0, 0), 2, 8, 0);

OpenCV DNN Yolo darknet full tutorial code sample for yolov3-tiny

#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/video.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/imgproc.hpp>
using namespace cv;
using namespace std;
using namespace dnn;

int main()
{
    VideoCapture cap("C:/Users/Vlada/Desktop/DNN/x64/Release/samp.MOV");
    std::string model = "C:/Users/Vlada/Desktop/DNN/x64/Release/yolov3-tiny.weights";  //findFile(parser.get<String>("model"));
    std::string config = "C:/Users/Vlada/Desktop/DNN/x64/Release/yolov3-tiny.cfg"; //findFile(parser.get<String>("config"));

    Net network = readNet(model, config,"Darknet");
    network.setPreferableBackend(DNN_BACKEND_DEFAULT);
    network.setPreferableTarget(DNN_TARGET_OPENCL);

    for (;;)
    {
        if (!cap.isOpened()) {
            cout << "Video Capture Fail" << endl;
            break;
        }
        Mat img;
        cap >> img;

        static Mat blobFromImg;
        bool swapRB = true;
        blobFromImage(img, blobFromImg, 1, Size(416, 416), Scalar(), swapRB, false);
        cout << blobFromImg.size() << endl; 
        
        float scale = 1.0 / 255.0;
        Scalar mean = 0;
        network.setInput(blobFromImg, "", scale, mean);

        Mat outMat;
        network.forward(outMat);
        // rows represent the number of detected objects (proposed regions)
        int rowsNoOfDetection = outMat.rows;

        // The columns look like this: the first four are region center x, center y,
        // width and height; the class 1 - N entries follow, where the biggest value
        // corresponds to the most probable class.
        // [x ; y ; w ; h ; class 1 ; class 2 ; class 3 ; ...]
        int colsCoordinatesPlusClassScore = outMat.cols;
        // Loop over the number of detected objects.
            for (int j = 0; j < rowsNoOfDetection; ++j)
            {
                // for each row, the score is from element 5 up
                // to number of classes index (5 - N columns)
                Mat scores = outMat.row(j).colRange(5, colsCoordinatesPlusClassScore);

                Point PositionOfMax;
                double confidence;

                // minMaxLoc finds the min and max values in scores and their positions.
                // The position of the maximum matches the concrete class of the object.
                // First parameter: the Mat of scores (columns 5 to the end of the row).
                // Second parameter: would receive the min value - not needed (0).
                // confidence receives the max value of the scores - this is needed.
                // Fourth parameter: position of the minimal element - not needed (0).
                // Last parameter: position of the maximum value - this is the class!
                minMaxLoc(scores, 0, &confidence, 0, &PositionOfMax);
            
                if (confidence > 0.0001)
                {
                    // these four elements are
                    // [x ; y ; w ; h]
                    int centerX = (int)(outMat.at<float>(j, 0) * img.cols);
                    int centerY = (int)(outMat.at<float>(j, 1) * img.rows);
                    int width   = (int)(outMat.at<float>(j, 2) * img.cols + 20);
                    int height  = (int)(outMat.at<float>(j, 3) * img.rows + 100);

                    int left = centerX - width / 2;
                    int top = centerY - height / 2;


                    stringstream ss;
                    ss << PositionOfMax.x;
                    string clas = ss.str();
                    int color = PositionOfMax.x * 10;
                    putText(img, clas, Point(left, top), 1, 2, Scalar(color, 255, 255), 2);
                    stringstream ss2;
                    ss2 << confidence;
                    string conf = ss2.str();

                    rectangle(img, Rect(left, top, width, height), Scalar(color, 0, 0), 2, 8, 0);
                }
            }
        
        namedWindow("Display window", WINDOW_AUTOSIZE);// Create a window for display.
        imshow("Display window", img);
        waitKey(25);
    }
    return 0;
}


OpenCV DNN Yolo darknet Youtube tutorial






