
Simple Opencv tutorial for yolo darknet object detection in DNN module

This tutorial will teach you how to use the deep neural network module of OpenCV with Yolo Darknet to detect multiple classes of objects. The code is under 100 lines of simple code. It uses the yolov3-tiny.weights of the neural network and the corresponding configuration file yolov3-tiny.cfg. The code is kept as simple as possible, without the stuff that is nice to have but not necessary for understanding the flow of the code.
[Image: people detection with Yolo darknet in OpenCV DNN]

OpenCV 4.x requirements for DNN module running Yolo (yolov3-tiny)

I am using OpenCV 4.2 compiled on a Windows machine with the contribution modules. I do not compile OpenCV with any special backend like Cuda. You can find the description of how to compile OpenCV by CMake for Visual Studio 2019 here on my blog. Just skip the GStreamer-related specialties.

How to set up an Opencv Visual Studio 2019 Project

This is very common; no big deal here. In the project properties, for the Release configuration and x64 platform, the Additional Include Directories point to the include directory under your OpenCV installation.
[Image: OpenCV DNN project include directories]
The Additional Library Directories point to the location of x64/vc16/lib under the directory where OpenCV is installed.

[Image: OpenCV Yolo Visual Studio library directories]
The list of Additional Dependencies contains the standard .lib files as on the following picture, plus one from the extra modules, opencv_dnn420.lib.
[Image: additional dependencies for the OpenCV DNN module]
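For reference, a minimal sketch of such a dependency list, assuming the OpenCV 4.2.0 naming convention (the numeric suffix follows the version, so OpenCV 4.4 would use opencv_core440.lib, and your build may need more or fewer libs):

    opencv_core420.lib
    opencv_imgproc420.lib
    opencv_highgui420.lib
    opencv_videoio420.lib
    opencv_video420.lib
    opencv_dnn420.lib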

Prerequisites for OPENCV DNN to run yolo neural networks

The files yolov3-tiny.weights and yolov3-tiny.cfg need to be downloaded from the Yolo darknet site. The first one contains the weight values of the neural network and the second, the .cfg, contains the configuration. I put these two into the same directory where Visual Studio generates the .exe file for my project.
  • yolov3-tiny.weights
  • yolov3-tiny.cfg
We have all the needed prerequisites.
[Image: Yolo in the OpenCV DNN module]
I am using the YOLOv3-tiny model.

Visual studio project structure for Opencv DNN

I will show you where the models are located and how they are loaded. This is why this chapter is located after the Yolo models are downloaded. The structure of my Visual Studio project is as follows. DNN is the root directory of my project. Under DNN is one DNN.sln solution and two directories, DNN and x64. The source code is located under DNN and the executable file is located in x64/Release. This depends on whether you are building the project in the Release or Debug configuration in Visual Studio.

[Image: Visual Studio project structure]
In the Release directory is the executable module DNN.exe. The opencv_xxxxxx420.dll is located in this folder for simplicity; you can skip setting the path system variable in this case. I have located the input video samp.MOV here. The last important thing is that yolov3-tiny.weights and yolov3-tiny.cfg downloaded from the Yolo website are located here as well.

[Image: OpenCV project Release directory content]

Opencv DNN running Yolo darknet code high-level picture


int main()
{
    //* video capture setting    
    //* basic parameters 
    //* path to load model
    //* load neural network and config
    //* Set backend and target to execute network 
    for (;;)
    {
        if (!cap.isOpened()) {
            cout << "Video Capture Fail" << endl;
            break;
        }
        //* capture image
        Mat img;
        cap >> img;
        //* create blob from image
        //* Set blob as input of neural network.
        //* perform network.forward evaluation
        //* process the output layer evaluation 
        //  region proposal (most difficult step)
        //* display image
    }
    return 0;
}

Opencv yolo darknet DNN code explanation in detail

I will skip some details about the VideoCapture from the file. The code related to DNN starts with:

    std::string model = "C:/Users/Vlada/Desktop/DNN/x64/Release/yolov3-tiny.weights";
    std::string config = "C:/Users/Vlada/Desktop/DNN/x64/Release/yolov3-tiny.cfg"; 

    Net network = readNet(model, config,"Darknet");
    network.setPreferableBackend(DNN_BACKEND_DEFAULT);
    network.setPreferableTarget(DNN_TARGET_OPENCL);

The cv::dnn::Net class allows you to create various deep neural network structures, based on the types of implemented layers. The Net class is initialized by the readNet function, which reads networks represented in various formats.
The first parameter of readNet is the location of the neural network model (the weights definition), the second parameter is the configuration of the network, and the last is the name of the framework (Darknet in our example).
The model formats supported by readNet:
*.caffemodel (Caffe, http://caffe.berkeleyvision.org/)
*.pb (TensorFlow, https://www.tensorflow.org/)
*.t7 | *.net (Torch, http://torch.ch/)
*.weights (Darknet, https://pjreddie.com/darknet/)
*.bin (DLDT, https://software.intel.com/openvino-toolkit)
*.onnx (ONNX, https://onnx.ai/)
The config formats supported by readNet:
*.prototxt (Caffe, http://caffe.berkeleyvision.org/)
*.pbtxt (TensorFlow, https://www.tensorflow.org/)
*.cfg (Darknet, https://pjreddie.com/darknet/)
*.xml (DLDT, https://software.intel.com/openvino-toolkit)
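As a side note, readNet just dispatches to a format-specific loader based on the file extensions. A minimal sketch of the Darknet-specific variant, which should be equivalent here (note the reversed argument order, configuration first and weights second):

    // equivalent to readNet(model, config, "Darknet"), argument order reversed
    Net network = readNetFromDarknet(config, model);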

Net class .setPreferableBackend(DNN_BACKEND_DEFAULT)

This sets the computational backend for DNN. The best is to use the default value. If you build OpenCV with Intel's Inference Engine, the default is DNN_BACKEND_INFERENCE_ENGINE. The DNN_BACKEND_HALIDE is a popular backend but a little bit difficult to build on a Windows machine; it is built on the LLVM compiler, and I had issues properly building OpenCV with the HALIDE computational backend. Just the default is the best for the start.
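A minimal sketch of the backend options as they appear in cv::dnn, assuming OpenCV 4.2 (DNN_BACKEND_DEFAULT resolves to the Inference Engine when OpenCV was built with OpenVINO, otherwise to the built-in OpenCV backend):

    network.setPreferableBackend(DNN_BACKEND_DEFAULT);             // let OpenCV decide
    // network.setPreferableBackend(DNN_BACKEND_OPENCV);           // force the built-in implementation
    // network.setPreferableBackend(DNN_BACKEND_INFERENCE_ENGINE); // needs an OpenVINO build
    // network.setPreferableBackend(DNN_BACKEND_HALIDE);           // needs a Halide/LLVM build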

Net class .setPreferableTarget(DNN_TARGET_OPENCL)

This represents the computation target that can speed up your evaluation through special hardware instructions of the target. I would try just the first three from the list below, if your architecture allows it.

DNN_TARGET_CPU
DNN_TARGET_OPENCL
DNN_TARGET_OPENCL_FP16
DNN_TARGET_MYRIAD
DNN_TARGET_FPGA
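A minimal sketch of picking the target defensively, assuming you want OpenCL when it is available; cv::ocl::haveOpenCL from opencv2/core/ocl.hpp reports OpenCL availability at runtime:

    // prefer the GPU via OpenCL, otherwise stay on the CPU
    network.setPreferableTarget(cv::ocl::haveOpenCL() ? DNN_TARGET_OPENCL
                                                      : DNN_TARGET_CPU);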

The network is now loaded and the Net class should be successfully initialized.

Opencv blobFromImage preprocessing for yolov3-tiny

The yolov3-tiny.cfg uses an input layer of size 416 x 416. This is reflected in blobFromImage, where the input img is converted into a 4D blob array. The blobFromImage function takes the image as its first argument and produces the output blob as its second argument. The third argument is the normalization scale factor; I perform the normalization later, in network.setInput through its scale parameter, so here the scale factor stays 1.0 and the blob values remain 1:1 with the original image. The fourth argument is Size(416, 416), which needs to match the size of the input layer. Since OpenCV uses the BGR color scheme instead of RGB, the swapRB flag is needed and performs exactly the BGR -> RGB transformation of the color channels.
[Image: yolov3-tiny.cfg input layer configuration]

The input blob needs to be normalized (RGB intensities are in the range 0-255 for each channel). This normalization maps the intensities to floats from 0 to 1: the scale parameter of network.setInput(blob, name, scale, mean) rescales all intensity values of blobFromImg into the range 0-1. We can ignore the mean subtraction value, but the scale is important. The general normalization of the blob in setInput follows this equation: input(n,c,h,w) = scalefactor × (blob(n,c,h,w) − mean).
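For example, with scale = 1/255 and mean = 0, a pixel intensity of 128 maps to 1/255 × (128 − 0) ≈ 0.502.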

        bool swapRB = true;
        blobFromImage(img, blobFromImg, 1.0, Size(416, 416), Scalar(), swapRB, false);

        float scale = 1.0 / 255.0;
        Scalar mean = 0;
        network.setInput(blobFromImg, "", scale, mean);
The input is set and the neural network forward evaluation can be performed.

Evaluate yolo neural network model

The following code takes the input set in the previous step and performs the forward evaluation through the network. The output is written into the prepared outMat Mat container.

        Mat outMat;
        network.forward(outMat);
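One side note: yolov3-tiny has two output layers (detections at two scales), and forward with a single Mat returns only the first unconnected output. A minimal sketch that collects all of them, if you want the detections from both scales:

        std::vector<Mat> outs;
        network.forward(outs, network.getUnconnectedOutLayersNames());
        // each Mat in outs has the same row layout as outMat below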

Display results of yolo (yolov3-tiny) neural network 

I think this is the most difficult part, but do not be afraid of this step at all. The outMat.rows and outMat.cols just give the dimensions of the output matrix.
        // rows represent the number of detected objects (proposed regions)
        int rowsNoOfDetection = outMat.rows;

        // The columns look like this: the first four are the region center x, center y,
        // width, and height, the fifth is the box confidence (objectness), and the
        // class scores 1 - N follow, where the biggest one corresponds to the most
        // probable class.
        // [x ; y ; w ; h ; objectness ; class 1 ; class 2 ; class 3 ; ....]
        int colsCoordinatesPlusClassScore = outMat.cols;

ONE row is exactly one region. The Yolo darknet output is a region proposal: one row is one region, and the number of rows is the number of proposed regions to consider. A row looks like this:
[x ; y ; w ; h ; objectness ; class 1 ; class 2 ; class 3 ; ....]
The x, y, w, h values are the center coordinates, width, and height of the proposed region, the fifth value is the box confidence, and class 1 to class n are the scores from which the most probable class of the detected object in this concrete region is chosen. This is super simple. Just have a look at the example.

Yolo region proposal format

This outMat has 3 rows and 8 columns, which means 3 proposed regions with detected objects. You need to find the highest score among columns 5 to 7 (zero-based, after the four box values and the objectness). For the first row it is the car, for the second the bus, and for the last the truck. The words car, bus, and truck are not literally in the columns; the class is just the index of the winning score column. After you find the highest score, you check whether it is higher than your threshold. If yes, you can take the first four columns to draw the rectangle and label it based on the index of the highest score.
[x ; y ; w ; h ; obj ; car 0.1 ; bus 0 ; truck 0.01]
[x ; y ; w ; h ; obj ; car 0 ; bus 0.9 ; truck 0.4]
[x ; y ; w ; h ; obj ; car 0 ; bus 0.3 ; truck 0.9]
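A minimal sketch of this example as an actual Mat, just to make the indexing concrete (the class scores are the illustrative values from the rows above; the box and objectness values are made up):

    // 3 proposals x (4 box values + objectness + 3 class scores)
    float data[3][8] = {
        {0.50f, 0.50f, 0.20f, 0.10f, 0.8f, 0.1f, 0.0f, 0.01f},  // car wins
        {0.30f, 0.40f, 0.30f, 0.20f, 0.8f, 0.0f, 0.9f, 0.40f},  // bus wins
        {0.70f, 0.60f, 0.10f, 0.10f, 0.8f, 0.0f, 0.3f, 0.90f}   // truck wins
    };
    Mat outExample(3, 8, CV_32F, data);
    // class scores of the second proposal: columns 5..7 -> [0.0, 0.9, 0.4]
    Mat scores = outExample.row(1).colRange(5, outExample.cols);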

This for loop processes the rows (proposed regions) one by one. The Mat called scores is just one row(j) from column 5 up to the number of columns; we are just skipping the information about the region coordinates and the box confidence.

            for (int j = 0; j < rowsNoOfDetection; ++j)
            {
                // for each row, the class scores are elements 5 up to the number of columns
                Mat scores = outMat.row(j).colRange(5, colsCoordinatesPlusClassScore);

The following code finds the maximum in the scores matrix, stores its value in the confidence variable, and the position of the maximum in the PositionOfMax variable.
                Point PositionOfMax;
                double confidence;
                minMaxLoc(scores, 0, &confidence, 0, &PositionOfMax);

Display the yolo region proposal results

If the confidence passes the threshold (0.0001 in this example), the region can be displayed. Now the first four elements of the outMat row come into play. outMat.at<float>(j, 0) takes the first element (x) of row j, (j, 1) takes the y, (j, 2) takes the width, and the last, fourth element of the row is the height. These values are relative to the image size, so the code multiplies them by img.cols and img.rows. Based on the calculated values, the rectangle is displayed over the image. The position of the maximum is converted into a string, which is the numeric index of the class. If you want a name instead of the numeric representation of the class, you need a table that translates the class index into a label (a sketch follows below the code).

                if (confidence > 0.0001)
                {
                    int centerX = (int)(outMat.at<float>(j, 0) * img.cols);
                    int centerY = (int)(outMat.at<float>(j, 1) * img.rows);
                    int width   = (int)(outMat.at<float>(j, 2) * img.cols + 20);
                    int height  = (int)(outMat.at<float>(j, 3) * img.rows + 100);
                    int left = centerX - width / 2;
                    int top  = centerY - height / 2;

                    stringstream ss;
                    ss << PositionOfMax.x;
                    string clas = ss.str();
                    int color = PositionOfMax.x * 10;
                    putText(img, clas, Point(left, top), FONT_HERSHEY_PLAIN, 2,
                            Scalar(color, 255, 255), 2);
                    stringstream ss2;
                    ss2 << confidence;
                    string conf = ss2.str();

                    rectangle(img, Rect(left, top, width, height),
                              Scalar(color, 0, 0), 2, 8, 0);
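A minimal sketch of such a translation table, assuming the COCO class list from the darknet site is saved as coco.names next to the weights; the file name and location are my assumption, not part of the original project, and the snippet needs #include <fstream>:

                    // assumed helper: class labels, one per line in coco.names
                    std::vector<std::string> classNames;
                    std::ifstream namesFile("coco.names");
                    for (std::string line; std::getline(namesFile, line); )
                        classNames.push_back(line);
                    // PositionOfMax.x is the zero-based index into the class score columns
                    std::string label = classNames.at(PositionOfMax.x);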

OpenCV DNN Yolo darknet full tutorial code sample for yolov3-tiny

#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/video.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/imgproc.hpp>
using namespace cv;
using namespace std;
using namespace dnn;

int main()
{
    VideoCapture cap("C:/Users/Vlada/Desktop/DNN/x64/Release/samp.MOV");
    std::string model = "C:/Users/Vlada/Desktop/DNN/x64/Release/yolov3-tiny.weights";  //findFile(parser.get<String>("model"));
    std::string config = "C:/Users/Vlada/Desktop/DNN/x64/Release/yolov3-tiny.cfg"; //findFile(parser.get<String>("config"));

    Net network = readNet(model, config,"Darknet");
    network.setPreferableBackend(DNN_BACKEND_DEFAULT);
    network.setPreferableTarget(DNN_TARGET_OPENCL);

    for (;;)
    {
        if (!cap.isOpened()) {
            cout << "Video Capture Fail" << endl;
            break;
        }
        Mat img;
        cap >> img;

        static Mat blobFromImg;
        bool swapRB = true;
        blobFromImage(img, blobFromImg, 1.0, Size(416, 416), Scalar(), swapRB, false);
        cout << blobFromImg.size() << endl; 
        
        float scale = 1.0 / 255.0;
        Scalar mean = 0;
        network.setInput(blobFromImg, "", scale, mean);

        Mat outMat;
        network.forward(outMat);
            // rows represent the number of detected objects (proposed regions)
            int rowsNoOfDetection = outMat.rows;

            // The columns look like this: the first four are the region center x, center y,
            // width, and height, the fifth is the box confidence (objectness), and the
            // class scores 1 - N follow, where the biggest one corresponds to the most
            // probable class.
            // [x ; y ; w ; h ; objectness ; class 1 ; class 2 ; class 3 ; ....]
            int colsCoordinatesPlusClassScore = outMat.cols;
            // Loop over the number of detected objects.
            for (int j = 0; j < rowsNoOfDetection; ++j)
            {
                // for each row, the class scores are elements 5 up
                // to the number of columns
                Mat scores = outMat.row(j).colRange(5, colsCoordinatesPlusClassScore);

                Point PositionOfMax;
                double confidence;

                // minMaxLoc finds the min and max values in scores and their positions.
                // The position of the maximum matches the concrete class of the object.
                // The first parameter is the Mat of scores (columns 5 to the end of the row).
                // The second parameter would receive the min value of the scores. NOT needed.
                // confidence receives the max value of the scores. This is needed.
                // The fourth parameter is the position of the minimal element. NOT needed.
                // The last is the position of the maximum value. This is the class!!
                minMaxLoc(scores, 0, &confidence, 0, &PositionOfMax);
            
                if (confidence > 0.0001)
                {
// these four lines read the box values
// [x ; y ; w ; h]
                    int centerX = (int)(outMat.at<float>(j, 0) * img.cols);
                    int centerY = (int)(outMat.at<float>(j, 1) * img.rows);
                    int width   = (int)(outMat.at<float>(j, 2) * img.cols + 20);
                    int height  = (int)(outMat.at<float>(j, 3) * img.rows + 100);

                    int left = centerX - width / 2;
                    int top = centerY - height / 2;


                    stringstream ss;
                    ss << PositionOfMax.x;
                    string clas = ss.str();
                    int color = PositionOfMax.x * 10;
                    putText(img, clas, Point(left, top), FONT_HERSHEY_PLAIN, 2, Scalar(color, 255, 255), 2);
                    stringstream ss2;
                    ss2 << confidence;
                    string conf = ss2.str();

                    rectangle(img, Rect(left, top, width, height), Scalar(color, 0, 0), 2, 8, 0);
                }
            }
        
        namedWindow("Display window", WINDOW_AUTOSIZE);// Create a window for display.
        imshow("Display window", img);
        waitKey(25);
    }
    return 0;
}


OpenCV DNN Yolo darknet Youtube tutorial

[Embedded video: the Youtube version of this tutorial]
