With this series, we would like to give you an understanding of different machine and deep learning approaches, illustrated by the example of recognizing diesel vehicles. In this article, we have summarized the approach based on deep learning in neural networks. For other techniques, please refer to the other parts of this series.
Relevant links and the other parts can be found here:
>Deep Diesel – Part 1: Machine & Deep Learning for diesel car detection
>Deep Diesel – Part 2: Machine Learning diesel car detection using a HOG detector
>Deep Diesel – Part 3: Deep Learning diesel car detection using AWS DeepLens
>codecentric.ai youtube channel
As hardware we used the AWS DeepLens, which makes it possible to run neural networks efficiently on an edge device thanks to its built-in Intel GPU. For the implementation we used Google's deep learning framework TensorFlow and adapted a neural network to our task. The objective is summarized in part 1. As an example task, we recognize the green environment zone badge, which can serve as a marker for driving bans as soon as the blue sticker for cleaner EURO6 cars is introduced. In part 2 we also showed that diesel type plates (e.g. TDI) can be recognized. The same approach can be applied to car parts or vehicle types.
Object recognition with neural networks
In part 2 we recognized diesel cars based on geometric features, whereby the limitations of the approach became clear. Neural networks can take more information into account when recognizing the features of a diesel vehicle (environmental stickers, type plates, vehicle types), e.g. color, position and context. Compared to a pure classification of images, object recognition is a more difficult task: instead of outputting a class with a single output neuron, the network has to return a bounding box and a probability for each detected object. There are different methods (e.g. Single Shot Detector (SSD), Faster R-CNN or R-FCN) which can be used here. Details on the different methods can be found in the paper “Speed/accuracy trade-offs for modern convolutional object detectors”.
In contrast to “classical” machine learning methods, we do not start from zero in building the detector, but draw on a pre-trained neural network to which we merely teach the classes of the additional objects. This drastically reduces the amount of training data required, so that we were able to achieve initial results with just a few hundred images. Since we want to use AWS DeepLens at runtime, there are special requirements for the architecture of the neural network. We decided to use an Inception v2 architecture. Along the way we had to switch to a different version of the model, because not every version turned out to be compatible with the model optimization for DeepLens.
As hardware platform, we use an AWS DeepLens, which our colleagues brought along from the last re:Invent. The “Deep Learning Camera” is well suited for testing in our scenario for various reasons: it runs stand-alone and only needs a power supply, has an Intel Gen-9 GPU, supports multiple machine learning frameworks, has built-in WiFi, and integrates with AWS IoT messaging services, logging, easy deployment and more. AWS DeepLens runs Ubuntu Linux and can be connected to a monitor, keyboard, and mouse if required. Both the input and the output stream (after inference by our trained detector) can be viewed locally on the device or remotely as a stream if required.
An interesting concept is the integration of DeepLens into AWS as well as the deployment of the neural network (model) and the executing code (Lambda function). DeepLens is treated like any other IoT device in AWS. A Greengrass instance runs locally and takes over code execution and messaging. Greengrass allows tasks that would otherwise be executed in the AWS cloud, such as a Lambda function, to be moved to an ‘edge device’. During deployment, a package, in our case consisting of a neural network and a Python code package, is simply pushed onto the device, unpacked and executed there.
AWS DeepLens provides some sample projects that can be deployed more or less with a single click. Once on the camera, they are immediately executable. The sample projects include:
– Object recognition (based on Apache MXNet)
– A hotdog classifier (hotdog or not?)
– Dog or cat classification
– Style transfer
– Face recognition
– Detect activities
– Recognize head posture
In addition, models from Amazon SageMaker are available and can be integrated and easily deployed on AWS DeepLens. SageMaker is a convenient way to train a machine learning model in AWS. You can find an intro video to SageMaker here: codecentric.ai.
But that’s too easy for us in this case, which is why we have to:
- use an external machine learning framework
- train and evaluate the model in the cloud
- optimize the model outside of DeepLens
- deploy the result to AWS DeepLens
AWS DeepLens supports several machine learning frameworks, such as Apache MXNet, Google TensorFlow and Berkeley's Caffe. ‘Support’ refers in particular to the availability of the Model Optimizer on DeepLens, which is a wrapper around the Intel OpenVINO toolkit. It optimizes the externally trained models for the Intel GPU. With some effort, it will probably be possible to get more frameworks running on the platform.
Setting up a Machine Learning cloud instance
Since we need a powerful GPU for the training, we have chosen a p3.2xlarge instance with 61 GB memory, 1 NVIDIA V100 GPU, and 8 virtual cores.
To set up our training environment, we used the Deep Learning Base AMI (Amazon Linux) image from the AWS Marketplace and installed the latest versions of TensorFlow, the TensorFlow Models, Google’s protobuf, Jupyter Notebook and TensorBoard. We recommend loading the training data either onto a separate Elastic Block Store (EBS) volume or into a Git repository, since this allows you to re-attach your data to a different EC2 instance. We needed two attempts to select the correct neural network as the basis for our project, since OpenVINO does not seem to support all versions of Inception v2 correctly. The final compatible version (.tgz) was only found as a link in the OpenVINO documentation. To optimize the model for the Intel GPU, we set up a second instance based on the Deep Learning Base AMI (Ubuntu), because the setup of Intel OpenVINO turned out to be a little complicated on other images.
Recommended Deep Learning EC2 Images
Collect & label training data
As training data, we used the same kind of pictures that we used for the HOG Detector. For the AWS Deeplens, we limited ourselves to static photos of our company cars and did not take any additional videos of passing vehicles.
For the labeling process, we used LabelImg, which gave us a proper Pascal VOC dataset for further use with TensorFlow.
LabelImg – petrol engine with green environmental badge
TensorFlow uses so-called TFRecord files, which contain the image and the training data (labels and bounding boxes) combined as a byte stream. This makes the data more efficiently accessible during training. Details on the file format can be found here.
The Pascal VOC format resulting from LabelImg can easily be converted into TFRecord files. Hint: the Git repository SSD TensorFlow contains an example implementation for the conversion, pascalvoc_to_tfrecords.py. (We didn’t try it because we wrote the converter ourselves.)
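To give an idea of the first half of such a conversion, here is a minimal sketch that reads one Pascal VOC annotation and normalizes the box coordinates to the range 0 to 1, which is how the TensorFlow Object Detection API expects them in the record. It is not our actual converter: the sample annotation, the label name `green_badge` and the function name `parse_voc` are made up for illustration, and the final step of writing the values into a TFRecord is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout that LabelImg writes;
# file name, label and coordinates are made up for illustration.
SAMPLE_XML = """<annotation>
  <filename>car_001.jpg</filename>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>green_badge</name>
    <bndbox><xmin>640</xmin><ymin>180</ymin><xmax>704</xmax><ymax>252</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return the file name plus all boxes, normalized to [0, 1]."""
    root = ET.fromstring(xml_text)
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            # Pixel coordinates divided by the image size
            "xmin": float(box.findtext("xmin")) / width,
            "ymin": float(box.findtext("ymin")) / height,
            "xmax": float(box.findtext("xmax")) / width,
            "ymax": float(box.findtext("ymax")) / height,
        })
    return {"filename": root.findtext("filename"), "boxes": boxes}


record = parse_voc(SAMPLE_XML)
```

The resulting dictionary holds everything except the encoded image bytes that would go into one training example.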
Training and evaluation
With a powerful GPU, training the net with a few hundred images only takes a few minutes. We divided the available labeled images into two non-overlapping sets: 80% training data and 20% evaluation data. We use the evaluation data to assess whether the model starts to overfit during training. (For this purpose, we compare the development of the loss function on the training data with the development of the loss function on the evaluation data.)
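Such a split could be sketched as follows. This is an illustration rather than the script we used: the file names, the fixed seed and the function name `split_dataset` are assumptions.

```python
import random


def split_dataset(items, eval_fraction=0.2, seed=42):
    """Shuffle deterministically and split into disjoint train/eval lists."""
    shuffled = sorted(items)  # fixed starting order, independent of the input order
    random.Random(seed).shuffle(shuffled)
    n_eval = int(round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


# Hypothetical file names, for illustration only.
images = ["img_%03d.jpg" % i for i in range(300)]
train_files, eval_files = split_dataset(images)
```

Using a fixed seed keeps the two sets stable between runs, so that loss curves from different training runs remain comparable.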
Starting the training with:
python object_detection/train.py --logtostderr --pipeline_config_path=/data/DeepDiesel/tdata/ssd_inception_v2_diesel.config --train_dir=/data/DeepDiesel/tdata/train_out_new/
and, in parallel, validating the result against the evaluation data:
python object_detection/eval.py --logtostderr --checkpoint_dir=/data/DeepDiesel/tdata/train_out_new/ --pipeline_config_path=/data/DeepDiesel/tdata/ssd_inception_v2_diesel.config --eval_dir=/data/DeepDiesel/tdata/train_out_new/
A suitable template for the Inception v2 config file can also be found in the TensorFlow model repository. The file describes the training and model parameters. It also contains values that are needed later as inputs for the optimization of the model.
TensorBoard is used to monitor and visualize the training. It displays information about the performance of the training and shows the evaluation set images that were processed by the detector. Furthermore, the structure of the model can be displayed.
tensorboard --logdir /data/DeepDiesel/tdata/train_out_new/
TensorBoard can be accessed on port 6006 of the corresponding machine (provided the correct inbound rule is set in the security group: allow inbound TCP 6006).
Successfully recognized environment zone badge
At the end of the training, the model is ‘frozen’ to be used efficiently for detection.
python object_detection/export_inference_graph.py --input_type image_tensor --pipeline_config_path /data/DeepDiesel/tdata/ssd_inception_v2_diesel.config --trained_checkpoint_prefix /data/DeepDiesel/tdata/train_out_new/model.ckpt-263 --output_directory /data/DeepDiesel/frozen_diesel_new263.pb
(tensorflow):frozen_diesel_263.pb ec2-user$ ls
You can find an explanation on the process here (DE).
Optimization of the model for AWS DeepLens
In principle, it should be possible to optimize the existing frozen_inference_graph.pb directly in the deployed Lambda function by calling mo.optimize at runtime. Unfortunately, DeepLens was delivered with an old OpenVINO version and an upgrade is not easy at the moment. Therefore, we decided to do the optimization externally on our Deep Learning Base AMI (Ubuntu) instance. OpenVINO can be downloaded directly from Intel: https://software.intel.com/en-us/openvino-toolkit/choose-download/free-download-linux
The following call optimizes the existing model.
./mo.py --input_model /tmp/frozen_inference_graph.pb --tensorflow_use_custom_operations_config extensions/front/tf/ssd_support.json --output="detection_boxes,detection_scores,num_detections" --input_shape="(1,300,300,3)"
In case of problems, we recommend a look into the OpenVINO documentation. In particular, docs/TensorFlowObjectDetectionSSD.html explains the key points. This is also where we recognized our mistake regarding the wrong version of the underlying model.
The result should now indicate that the following three files are present:
DeepLens Project and Deployment
We create our own empty AWS DeepLens project with a custom model. Now we have to specify the model we want to upload and create a Lambda function that performs the detection (inference) on AWS DeepLens.
Regarding the model, we face the challenge of having to upload three model files instead of a single ‘.pb’ file (as in the TensorFlow case). Unfortunately, this is not yet supported. Workaround: either pack the three files into an archive and have the Lambda function unpack it before loading the model, or copy them manually via scp to /opt/awscam/artifacts. The model must be placed in an S3 bucket whose name starts with “deeplens-”.
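The first variant of the workaround, unpacking the archive from within the Lambda function, could look roughly like the following sketch. It is an illustration, not our deployed code: the archive format (tar.gz), the function name `unpack_model` and all file names are assumptions, and the demo at the bottom works on placeholder files in a temporary directory instead of the real model artifacts.

```python
import os
import tarfile
import tempfile


def unpack_model(archive_path, target_dir):
    """Extract all regular files from the archive flat into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other special entries
            name = os.path.basename(member.name)  # drop any path components
            source = archive.extractfile(member)
            with open(os.path.join(target_dir, name), "wb") as out:
                out.write(source.read())
            extracted.append(name)
    return sorted(extracted)


# Self-contained demo with placeholder files; on the device the archive would
# be the deployed model artifact instead.
work = tempfile.mkdtemp()
names = ["model_a.dat", "model_b.dat", "model_c.dat"]  # hypothetical names
archive_path = os.path.join(work, "model.tar.gz")
with tarfile.open(archive_path, "w:gz") as archive:
    for name in names:
        path = os.path.join(work, name)
        with open(path, "wb") as handle:
            handle.write(b"placeholder")
        archive.add(path, arcname=name)

unpacked = unpack_model(archive_path, os.path.join(work, "unpacked"))
```

Writing only the base name of each entry keeps a malformed archive from placing files outside the target directory.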
We create the lambda function starting from the template ‘greengrass-hello-world’, which contains all important modules and is accessible in the Lambda Template Library. A good example implementation for the required inference function can be found in the AWS documentation.
Detection of vehicles
To test the results in a real use case, we set up the DeepLens with our trained model in our company car park. With the first versions, the implementation still had to be readjusted. Here is an example where an incorrect label map was used and the detection threshold was set too low.
There are airplanes everywhere in the office.
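Both mistakes concern the post-processing of the raw detections: the threshold decides which detections are kept, and the label map translates numeric class IDs into names. A minimal sketch of that step follows. The label map entries, the threshold of 0.5, the structure of the detection dictionaries and the function name `filter_detections` are assumptions for illustration.

```python
# Hypothetical label map for our detector; IDs and names are assumptions.
LABEL_MAP = {1: "green_badge", 2: "tdi_plate"}


def filter_detections(detections, label_map, min_score=0.5):
    """Keep detections at or above min_score and attach readable labels."""
    kept = []
    for det in detections:
        if det["score"] < min_score:
            continue  # too uncertain, would only produce false alarms
        # An ID missing from the label map is reported as "unknown"
        # instead of silently receiving a wrong name.
        label = label_map.get(det["class_id"], "unknown")
        kept.append({"label": label, "score": det["score"], "box": det["box"]})
    return kept


raw = [
    {"class_id": 1, "score": 0.91, "box": (0.40, 0.30, 0.45, 0.36)},
    {"class_id": 2, "score": 0.12, "box": (0.10, 0.10, 0.90, 0.90)},
    {"class_id": 5, "score": 0.70, "box": (0.60, 0.20, 0.70, 0.30)},
]
results = filter_detections(raw, LABEL_MAP)
```

With a label map that does not belong to the trained model, the same class IDs would be translated into unrelated names, which is one way to end up with airplanes in the office.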
DeepLens set up in the parking lot
Consent to the use of the data 🙂
Evaluation of the live data
As a result, the detection performance was greatly improved compared to the HOG detector. Practically all passing vehicles were detected correctly. We ran the detection at different driving speeds (6 km/h, 20 km/h, 30 km/h, 40 km/h and almost 50 km/h) and detected the environment zone badge almost every time in at least one frame. We could not test any faster in our parking lot. In principle, however, the applicable speed range is limited by the speed up to which a frame can still be recorded with low motion blur. Better optics and recording technology would further expand the speed range.
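The relation between speed and motion blur can be estimated with simple arithmetic: during one exposure, a vehicle moving across the field of view travels speed times exposure time. The following sketch computes this distance for the tested speeds. The exposure time of 1/100 s is purely an assumption, since we did not measure the actual value of the camera.

```python
def motion_blur_cm(speed_kmh, exposure_s):
    """Distance in centimetres a vehicle travels during one exposure."""
    speed_ms = speed_kmh / 3.6  # km/h to m/s
    return speed_ms * exposure_s * 100.0


# Assumed exposure time of 1/100 s; the real value of the camera is not known.
EXPOSURE_S = 0.01
for speed in (6, 20, 30, 40, 50):
    print("%2d km/h -> %.1f cm blur" % (speed, motion_blur_cm(speed, EXPOSURE_S)))
```

Since the blur grows linearly with both speed and exposure time, halving the exposure time doubles the speed at which the same amount of blur occurs.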
Conclusion and outlook
The implemented detector has made a considerable performance leap with regard to the application scenario. The implementation still has some framework-induced workarounds, but this might change as the framework implementations progress. The approach can be extended to better fit real-world scenarios. For example, events could be triggered in response to a detection. These events can be sent via the AWS IoT Message Broker towards consumers such as apps or IoT actuators. In addition, production systems for individual use cases can be built with relatively little effort.
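As a rough idea of how such an event could be produced, the sketch below serializes a detection as JSON and hands it to a publish function. It is not part of our implementation: the topic name, the message fields, the device ID and both function names are assumptions. The publisher is passed in as a parameter, so that on the device it could be something like the `publish` method of the `iot-data` client from the Greengrass SDK, while the example itself runs anywhere with a stand-in.

```python
import json
import time


def build_detection_event(label, score, device_id, timestamp=None):
    """Serialize one detection as a JSON message for an MQTT topic."""
    event = {
        "device": device_id,
        "label": label,
        "score": round(score, 3),
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
    }
    return json.dumps(event, sort_keys=True)


def publish_event(publish, topic, payload):
    """Hand the payload to any publish callable taking topic and payload."""
    publish(topic=topic, payload=payload)


# Stand-in publisher that only collects messages; on the device this would be
# replaced by the publish method of the messaging client.
sent = []
publish_event(
    lambda topic, payload: sent.append((topic, payload)),
    "deepdiesel/detections",  # hypothetical topic name
    build_detection_event("green_badge", 0.912, "deeplens-01", 1500000000),
)
```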
The links to all parts and our Youtube channel can be found here:
>Deep Diesel – Part 1: Machine and Deep Learning for the recognition of environmental badges
>Deep Diesel – Part 2: Machine Learning diesel filter HOG detector
>Deep Diesel – Part 3: Deep-Learning diesel filter with AWS Deeplens
>codecentric.ai youtube channel
If you are interested in this topic, additions or questions, please contact me at firstname.lastname@example.org. Follow me on twitter: https://twitter.com/kherings