Convolutional Neural Networks (CNNs) have become the tool of choice for extracting information from visual data. For example, they are used in the Google search engine to classify images. Essentially, they mimic the way a human being recognizes images.
In order to do that, the system learns to recognize certain characteristics of an object. In the case of a person, this could be the limbs or head and face. It then produces a trained model.
What’s great about it: The algorithm learns the characteristics of an object on its own during training; there’s no need to point them out manually.
Of course, the system only detects objects it was previously trained on. The downside: You usually need a pretty fast graphics card to train a model or run inference with it.
In this post, we use neural networks (specifically CNNs) to classify objects in a video stream on a cheap, ordinary PC without a dedicated GPU at all.
This is made possible by the recently released Movidius Neural Compute Stick.
Movidius Neural Compute Stick
The Neural Compute Stick (NCS) is a tiny computer meant to accelerate the execution of neural networks. The entire hardware is packed into a USB-stick which is compatible with USB 2.0 or later. However, it is recommended to use the device on a USB 3.0 port if possible.
Internally, the NCS uses a so-called Vision Processing Unit (VPU) that goes by the name of Myriad 2. This relatively new kind of microprocessor is made specifically for machine-vision tasks and is therefore very energy-efficient. According to the manufacturer, the typical power consumption of the VPU is around 1 watt. The NCS is on sale for around 80 US dollars, but only available in a few online shops as of September 2017.
The Movidius Neural Compute Stick is not a universal solution for deep learning. It is not possible to train a model on it. Instead, you can only run inference on input data, such as a video stream, with a pre-trained model. This means you will need to train your model first on your computer, which, as already mentioned, can take a lot of time. Luckily, the NCS community already provides some pre-trained models that are available for free. In theory, every model created with the deep-learning library Caffe (.caffemodel) is compatible, but you will have to convert it for the NCS. Also, make sure the model uses only a single input variable; the NCS is currently limited to that.
In the screenshot below we see the classification of images from a webcam input stream. This example uses GoogLeNet.
Classification of images from a webcam stream on the NCS running GoogLeNet.
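Under the hood, a classifier like GoogLeNet returns one probability per class (1,000 values for the ImageNet models), and the application simply picks the highest-scoring entries to display. A minimal sketch of that last step, using NumPy and made-up class labels:

```python
import numpy as np

def top_k(probs, labels, k=5):
    """Return the k highest-scoring (label, probability) pairs."""
    idx = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in idx]

# Toy stand-in for a 1000-way GoogLeNet output vector.
probs = np.zeros(1000)
probs[281] = 0.92   # hypothetical class index
probs[285] = 0.05
labels = [f"class_{i}" for i in range(1000)]

for name, p in top_k(probs, labels, k=2):
    print(f"{name}: {p:.2f}")
```

In the real example application, `labels` comes from the synset words file shipped with the model instead of generated placeholders.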
Preparation and setup:
The software you need to program the Neural Compute Stick and use it on the target platform is available for download on the Movidius page. That includes some example applications, like the code for the application in the screenshot above. The models for that are also available for download, among which are some widely known ones like AlexNet or GoogLeNet.
Requirements for the development platform:
- x64 PC running a native Ubuntu Linux; virtual machines are not supported
- Windows and macOS are (currently) not supported
- The Linux distribution has to be Ubuntu 16.04 LTS.
- On Ubuntu, you need Python version 3.5.2 installed.
Note: The target platform does not have to meet these requirements. In case you are wondering: Yes, you can also plug it into your Raspberry Pi. Movidius explicitly states that the NCS is compatible.
Requirements to run the example code for the NCS:
Installing Toolkit and API:
Before we can use the Movidius Neural Compute Stick, the API and Toolkit have to be installed. The Toolkit is used to convert or test a model for the NCS; the Movidius API lets you access the functionality of the Neural Compute Stick, for example with Python. Here is the download page: https://ncsforum.movidius.com/discussion/98/latest-version-movidius-neural-compute-sdk.
The installation is done by simply running a Bash-script in the unzipped Toolkit/API folder.
Keep in mind that the script, especially the one for the Toolkit, can take a lot of time to complete (15-30 minutes) and needs a stable internet connection.
Before we can use a model with the stick, we have to convert it with this script from the Toolkit (the -s flag sets the number of SHAVE cores used on the device):
$ python3 ./mvNCCompile.pyc sample_network.prototxt -w sample_network.caffemodel -s 12 -o name_of_outputfile
We can now use the generated “graph” file (default name, if not specified) in an application for the NCS.
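Using the graph file from Python follows roughly the same pattern in all NCS examples: open the device, allocate the graph, feed it a half-precision input tensor, and read back the result. The sketch below uses the API names from version 1 of the Movidius SDK; the mean values are the usual GoogLeNet defaults and may differ for your model, so treat them as assumptions:

```python
import numpy as np

def to_ncs_tensor(image, mean=(104.0, 117.0, 123.0)):
    """Subtract the per-channel mean and cast to float16,
    the input format the NCS API expects."""
    return (image.astype(np.float32) - mean).astype(np.float16)

def run_inference(graph_path, tensor):
    """Load a compiled 'graph' file onto the stick and run one inference.
    Requires the Movidius API ('mvnc') and an attached NCS."""
    from mvnc import mvncapi as mvnc
    devices = mvnc.EnumerateDevices()
    if not devices:
        raise RuntimeError("No Neural Compute Stick found")
    device = mvnc.Device(devices[0])
    device.OpenDevice()
    with open(graph_path, "rb") as f:
        graph = device.AllocateGraph(f.read())
    graph.LoadTensor(tensor, "user object")
    output, _ = graph.GetResult()
    graph.DeallocateGraph()
    device.CloseDevice()
    return output
```

With a webcam, you would crop and resize each frame to the model's input size (224×224 for GoogLeNet) before passing it to `to_ncs_tensor`.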
Object detection with Tiny YOLO
Now comes the hard part. We want to be able to find various objects in a given scene and identify what they are. The models provided by Movidius are not up to this task.
They can only classify one object at a time, i.e. one per frame. They are based on the assumption that an input image shows only one relevant object in close-up.
YOLO (You Only Look Once) in contrast is a neural network which can detect and localize multiple objects in one frame.
On top of that, YOLO can tell persons apart from objects in a given scene.
Tiny YOLO is the little brother of YOLO, a resource-saving alternative for weaker devices. Thanks to various optimizations, it enables the NCS to run object detection almost in real time (approximately 0.2 s of processing time per frame). Naturally, this comes at a cost, and so the error rate increases noticeably. However, for our purposes the detection is still sufficiently accurate.
A few developers on the Movidius Forums have already ported Tiny YOLO to the NCS. The result of their efforts is available for download on GitHub: https://github.com/gudovskiy/yoloNCS.
Here you can find some example code written in Python and the Tiny YOLO model itself.
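The interesting part of that example code is turning Tiny YOLO’s flat output vector (1,470 values for the 20-class model) back into detections. A simplified sketch of that step, assuming the YOLOv1 layout of a 7×7 grid, 2 boxes per cell and 20 classes, and ignoring the box coordinates:

```python
import numpy as np

GRID, BOXES, CLASSES = 7, 2, 20  # Tiny YOLO (v1) output layout

def score_detections(output, threshold=0.2):
    """Combine class probabilities with box confidences from the flat
    output vector and keep grid cells that score above the threshold.
    Simplified: box coordinates at the end of the vector are ignored."""
    n_cells = GRID * GRID
    probs = output[:n_cells * CLASSES].reshape(n_cells, CLASSES)
    confs = output[n_cells * CLASSES:
                   n_cells * (CLASSES + BOXES)].reshape(n_cells, BOXES)
    detections = []
    for cell in range(n_cells):
        for b in range(BOXES):
            scores = probs[cell] * confs[cell, b]
            best = int(np.argmax(scores))
            if scores[best] > threshold:
                detections.append((cell, best, float(scores[best])))
    return detections
```

The full example additionally decodes the 392 box coordinates and applies non-maximum suppression to merge overlapping detections.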
Setup Tiny YOLO on the NCS
To run the sample code for Tiny YOLO, some additional software is required on the development and target platform. You have to build OpenCV as well as ffmpeg from source on your platform. A simple installation via Python’s own package manager pip is not sufficient, because this “light” edition of OpenCV is missing some important parts: without them, OpenCV will most likely not be able to access your camera and start the video stream. Furthermore, you need a webcam with Linux-compatible drivers.
Let’s start with ffmpeg. To install ffmpeg under Linux, follow the instructions in the official guide: https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu.
Once the installation is finished, use the command below in a new terminal to check whether ffmpeg can find and use your camera.
Replace /dev/video0 with your video source.
$ ffmpeg -f v4l2 -list_formats all -i /dev/video0
The output should look something like this:
[video4linux2,v4l2 @ 0x2753960] Raw : yuyv422 : YUYV 4:2:2 : 640x480 352x288 320x240 176x144 160x120 1280x720 640x480
[video4linux2,v4l2 @ 0x2753960] Compressed: mjpeg : Motion-JPEG : 640x480 352x288 320x240 176x144 160x120 1280x720 640x480
After that, we’re done with ffmpeg. Let’s move on to OpenCV: http://docs.opencv.org/2.4/doc/tutorials/introduction/linux_install/linux_install.html.
To roughly check whether OpenCV has been installed successfully, you can run some samples in your installation folder under /bin if you like.
As already mentioned in the introduction to the Movidius stick, the Caffe model of Tiny YOLO needs to be converted into a format that is compatible with the NCS. The software for this task is included in the Movidius Toolkit.
The exact command is:
$ python3 ./mvNCCompile.pyc your_path/yolo_tiny_deploy.prototxt -s 12
Before you execute it, make sure the Caffe model and the corresponding prototxt file are in the same folder. Both files must have the same base name, otherwise the conversion will fail silently without throwing an error, and the generated model will be useless.
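Since the failure is silent, it can be worth checking the file pair before running the compiler. A small helper for that check (the function name is ours, not part of the Toolkit):

```python
import os

def check_model_pair(prototxt_path):
    """Return True if a .caffemodel with the same base name sits
    next to the given .prototxt file, as the converter expects."""
    base, ext = os.path.splitext(prototxt_path)
    if ext != ".prototxt":
        return False
    return os.path.isfile(base + ".caffemodel")
```

For example, `check_model_pair("your_path/yolo_tiny_deploy.prototxt")` should return True only if `your_path/yolo_tiny_deploy.caffemodel` exists as well.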
Tiny YOLO on the NCS
Now we are finally ready to see Tiny YOLO in action. As a last step, move the “py_examples” folder from the “yoloNCS” repo to the “ncapi” (the unpacked NCS-API) folder. Alternatively, you can create a symlink. The example code includes these two samples:
- yolo_example: Detects objects with the Tiny YOLO model in a .jpg image and highlights them in the image.
- yolo_object_detection_app: Detects objects in a video stream from your webcam and highlights them in the video.
Let’s start “yolo_object_detection_app” with Python 3. If everything is set up correctly, you will now see the video stream of your webcam, in which Tiny YOLO highlights the objects it has learned. The numbers indicate how closely a detected object resembles the trained template. The level of similarity from which the system considers an object “detected” is configurable. Depending on our requirements, this means we end up with either more false positives or more false negatives.
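The effect of that configurable threshold can be illustrated with a few hypothetical (label, confidence) pairs from a single frame:

```python
def filter_by_confidence(detections, threshold):
    """Keep only detections whose confidence meets the threshold.
    Lower thresholds admit more false positives; higher thresholds
    risk missing real objects (false negatives)."""
    return [d for d in detections if d[1] >= threshold]

# Hypothetical detections from one frame:
frame = [("person", 0.91), ("dog", 0.34), ("chair", 0.12)]
print(filter_by_confidence(frame, 0.3))  # person and dog survive
print(filter_by_confidence(frame, 0.5))  # only person survives
```

Which threshold is right depends on the application: a doorbell camera might prefer false positives, while an alarm system might not.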
This last screenshot and the video at the top of this page demonstrate what the Tiny YOLO object detection looks like with a webcam.
Object detection in a webcam stream on the NCS running the Tiny YOLO model.
The model, as currently trained, detects 20 different classes, among which are persons and a number of animals. The detection of human beings works best; “lifeless” items, however, are sometimes not recognized.