With this article we continue our endeavor of building dish-o-tron – an AI system designed to prevent the sudden appearance of dirty dishes in the community kitchen sink and thus turn the community kitchen into a place of peace and harmony.
This is part 3 of the dish-o-tron series. You may want to start with the first part, where we introduce the idea and the concept behind dish-o-tron, and the second part, where we collect the initial data set.
In this article, we use the data gathered in the previous part to build “the heart” (or – perhaps better – “the brains”) of dish-o-tron, empowering it to detect dirty dishes. In concrete terms, we train a machine learning model capable of classifying images of sinks as clean (no dirty dishes) or not_clean (dirty dishes), using the fast.ai library and AutoML from Google Cloud.
If this is the point where you think to yourself “oops, I did not gather any data” – we warned you several times. It is absolutely necessary that you gather training data yourself to have the real dish-o-tron experience. We strongly encourage you to revisit the previous article and gather your own data; in particular, don’t download our pre-prepared dataset.
If this is the point where you think to yourself “Yay, I did gather my own data”: congratulations, you may now continue your journey and indulge in one of the favourite occupations of every deep learner: watching an AI model during training.
If you are a developer, you may know the joy of watching your program compile or watching your CI/CD pipeline run its tests. But watching an AI model train is something special. And if you watch harder (but not too hard), you might even influence the accuracy of the resulting model! Depending on the architecture, it might be necessary to watch deeper instead of harder – you will find out with further practice.
We start with a short excursion about the requirements of dish-o-tron.
Excursion: Revisit some requirements of dish-o-tron
The dish-o-tron needs to be able to set off an alarm when a coworker violates the general rules of using the community kitchen. In most kitchens, the rules look something like this:
- DO NOT PUT DIRTY DISHES IN THE SINK!
- Please, respect rule number 1 !!!1!eleven!
- If the dishwasher is running, take your stuff and leave it at your desk until the dishwasher has finished.
- EVERYBODY can empty the dishwasher.
- NO EXCUSES. DO NOT PUT DIRTY DISHES IN THE SINK. NEVER.
In many kitchens, these rules manifest themselves as various posters, stickers, and even laminated printouts! Some even colorize individual words (OMG!) to emphasize that really everybody should take care of this. But we all know it: we are rebels. While reading these signs, one always thinks: one day, when nobody sees me, I will just put my cup in the sink and run!
So far, we are not sure if this is only a German thing, so if you have rules like this in your community kitchen, please share a pic by replying to this tweet.
Since there is nothing we can do about this, we have to find another solution. The next reasonable step obviously is: permanent control and punishment. That’s where dish-o-tron enters the arena. Inspired by the DEFCON levels of the United States Armed Forces, we therefore propose the DISHCON levels (see this Wikipedia article for reference).
Since we are peace-loving problem solvers, the escalations for DISHCON 1 and 2 WILL NEVER be implemented. Privacy is also important to us, so we will not record or save any images, and we will not transfer any footage to the cloud. Dish-o-tron sees, maybe beeps, and then it just forgets.
Approach and Reasoning
Until recently, training a machine learning model for image classification would have required specialist knowledge in data science. However, recent progress, in particular in the ecosystem of public cloud providers, has significantly simplified this task for problem solvers looking for rapid end-to-end progress.
This low barrier to entry into AI systems allows us to rely on existing libraries such as fast.ai and services like AutoML from Google Cloud to obtain a reasonably state-of-the-art vision model for our classification task. This way, we can build a first functioning prototype and focus on solving the actual problem at hand. At a later stage it might be useful to revisit the model training; however, the best model is useless as long as it is not integrated.
For many people, dealing with AI and building neural networks from scratch is a lot of fun. However, be honest with yourself: there is close to zero chance that you will create something that comes close to existing solutions. In fact, you will spend lots of time for a worse outcome. It is essential that you focus! Don’t get sidetracked! You are a problem solver. Your goal is to solve an actual real-world problem. The AI model is merely a tool for you to bring peace and harmony to your community kitchen.
In the following, we pursue two options to obtain a vision model in just a few steps:
- We utilize the fast.ai library
- We use AutoML in the Google Cloud
Short sidenote: Yes, it might be useful to revisit the vision model at some point. At this stage of the project it is helpful to think about this point in time in terms of “as soon as 80% of all community kitchen sinks are equipped with a dish-o-tron”.
fast.ai is a great starting point if you want to get into deep learning and machine learning. With its mission of “Making neural nets uncool again”, it provides a competitive high-level Python library that allows for rapid progress while building an AI system.
The fast.ai library allows you to train state-of-the-art vision models in a few lines of code. To get started, use the following Colab notebook:
After finishing this notebook, you will end up with a fast.ai model, which is basically a PyTorch model. This model can also be exported and used outside of the Colab notebook environment. However, so far we have struggled to deploy fast.ai models on edge devices, in particular on a Google Coral device; we did not find a painless way to do so. Feel free to investigate on your own – we would be very happy if you reached out to us with a nice solution.
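For reference, the core of such a fast.ai training run boils down to a few lines. The sketch below is our illustration, not the exact notebook content: the folder path, the resnet34 architecture, and the epoch count are assumptions you would adapt to your own data.

```python
from pathlib import Path

def count_images(root, labels=("clean", "not_clean")):
    """Sanity-check the folder layout that fast.ai's
    ImageDataLoaders.from_folder expects: one subfolder per class label."""
    return {label: len(list(Path(root, label).glob("*.jpg"))) for label in labels}

def train(root="data/sinks", epochs=4):
    """Fine-tune a pretrained ResNet on the sink images.
    Requires the fastai package (pip install fastai); paths are illustrative."""
    from fastai.vision.all import (ImageDataLoaders, Resize, accuracy,
                                   resnet34, vision_learner)
    dls = ImageDataLoaders.from_folder(root, valid_pct=0.2, item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=accuracy)
    learn.fine_tune(epochs)
    learn.export("dish_o_tron.pkl")  # a plain PyTorch model under the hood
```

The exported `.pkl` is what you would try to deploy outside of Colab.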
AutoML is a machine learning service from Google Cloud which automates the training of your own custom vision models. It comes with a graphical interface and the option to, e.g., export models to edge devices such as the Google Coral. The only things you have to provide are labeled images and money. Yes, that’s basically it: you trade money for AI expertise and speed. For training a model with ~10,000 labeled images, we expect costs of ~$25.
Does this mean AutoML is always the right solution? Not at all! But it is a nice tool to have if you are looking for rapid end-to-end progress. This is particularly the case if the goal is to validate ideas. Here, learning slowly and struggling to make any real end-to-end progress with an idea just to save a few bucks on your cloud bill is often the worst choice.
Obtaining an AutoML vision model requires four simple steps:
- Prepare the data a tiny bit more and upload it
- Create the dataset in AutoML
- Train a readily available computer vision model in AutoML
- Export the model (in a format suitable for the Coral device)
In order to follow along, you need access to Google Cloud and a Google Cloud project, ideally with project-owner access privileges.
ATTENTION: Not everything we do is covered by the free tier, hence some charges may apply.
1. Data preparation
Before we can use AutoML to train a vision model, we have to upload our data to Google Cloud and also prepare a CSV file containing meta information about the data, such as the labels of the images. This is a necessary evil before we can finally lean back and throw some money at Google to do the rest of the work.
This Colab notebook should help you clear the final hurdle. It shows one possible way to:
- Upload our data into a Storage Bucket in Google Cloud
- Generate the necessary metadata CSV-file for AutoML
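A minimal sketch of the metadata step, assuming the common AutoML Vision import format of one `gs://path,label` row per image; the bucket name, local folder layout, and label names below are illustrative assumptions, and you would upload the images themselves beforehand (e.g. with `gsutil`).

```python
import csv
from pathlib import Path

def build_import_csv(local_root, bucket, out_csv="automl_import.csv",
                     labels=("clean", "not_clean")):
    """Write the AutoML Vision import CSV: one `gs://...,label` row per image.
    Assumes the images have been uploaded to gs://<bucket>/<label>/<filename>,
    e.g. with: gsutil -m cp -r data/sinks/* gs://<bucket>/"""
    rows = [(f"gs://{bucket}/{label}/{img.name}", label)
            for label in labels
            for img in sorted(Path(local_root, label).glob("*.jpg"))]
    with open(out_csv, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

The resulting CSV is what you point AutoML at when creating the dataset in the next step.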
Finally, we are in a position to use AutoML.
2. Creating the dataset in AutoML
The starting point for using AutoML is creating a dataset. Because we already uploaded the data into a GCS bucket and prepared the CSV metadata file, we can create the dataset with a few clicks in the UI. After triggering the upload, the import will take some time. This is your chance to ponder life and do some meditation. You could also watch some cat videos – if that is your thing – or just grab a cup of coffee. While you are in the kitchen, there might be an opportunity to collect another dirty-dishes video. Don’t get mad – you have already made fantastic progress on your journey to build dish-o-tron.
As soon as the import is finished, we can inspect the dataset in AutoML. It is useful to perform a few sanity checks at this point to ensure that the data was uploaded correctly.
3. Training the vision model
And now, finally, it is going to happen. We can start training the model with a few clicks in the UI. Because we plan to deploy the model on a Coral device, we choose the option “Edge”. For simplicity, we select “optimize for best trade-off between latency and accuracy” and set a suitable number of node hours (depending on the number of images).
Please be aware that for each unit of time, Google Cloud uses 8 nodes in parallel, where each node is equivalent to an n1-standard-8 machine with an attached NVIDIA® Tesla® V100 GPU. Hence, 8 node hours correspond to roughly 1 “wall clock” hour. It is advisable to enable the early stopping feature so that training stops when no further accuracy improvement is possible. In the end, you only pay for the compute hours that are actually used.
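The node-hour arithmetic can be sketched as a tiny helper; note that the default price per node hour below is a made-up placeholder, not an actual Google Cloud price – check the current pricing page for real numbers.

```python
def training_estimate(node_hours, price_per_node_hour=3.0):
    """Rough wall-clock time and cost for an AutoML training run.
    8 nodes run in parallel, so node hours / 8 gives approximate
    wall-clock hours. The price argument is an illustrative placeholder."""
    wall_clock_hours = node_hours / 8
    cost = node_hours * price_per_node_hour
    return wall_clock_hours, cost
```

For example, budgeting 8 node hours means waiting roughly one real-world hour for the result.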
Now push the final button.
You did it! You are now a real Deep Learner! Feel free to relax for a few hours and check at irregular intervals if the training is finished. This is your time to take a break without feeling bad about it. That is what being a Deep Learner is all about.
Training a model is a magical experience. Don’t forget to check on your model and observe it during the training every once in a while: Rumour has it that observing the training procedure will change the outcome of the experiment. There are even stories that the intensity of the observing influences the accuracy of the model.
When the training is complete – or, at the latest, when you are back at your desk and notice that the training is complete – it is time for a few sanity checks of the model. Again, this is possible with the built-in validations of AutoML. If the accuracy is below 95%, there is strong reason to believe that something went wrong with the data or the data preparation.
If everything looks fine, we export the model for Coral devices.
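To give an idea of what the exported model is good for, here is a rough sketch of running a TensorFlow Lite export on a single preprocessed image. The file name is a made-up example, the `tflite_runtime` package has to be installed on the device, and the Edge TPU variant of the model additionally needs the Edge TPU runtime – so treat this as an assumption-laden preview of the next article, not a finished recipe.

```python
def best_label(scores, labels=("clean", "not_clean")):
    """Return the label with the highest score."""
    return max(zip(scores, labels))[1]

def classify_sink(image, model_path="model.tflite"):
    """Run the exported model on one preprocessed image array.
    Requires the tflite_runtime package; the model file name is illustrative."""
    import numpy as np
    from tflite_runtime.interpreter import Interpreter
    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], np.expand_dims(image, axis=0))
    interpreter.invoke()
    scores = interpreter.get_tensor(out["index"])[0]
    return best_label(scores)
```

If `classify_sink` returns not_clean, it is DISHCON time.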
That’s it! We have our first vision model for our dish-o-tron. Peace and harmony for your community kitchen were rarely as tangible as at this point in time.
Finishing this part of the tutorial is an important step for you and your future career as a professional problem solver. Frankly, that’s one (very) small step for Deep Learning, one giant leap for you – but that is okay. Be proud of yourself! This is how successful real-world problem solvers tackle AI tasks for the first iterations.
Okay, let’s make this more official: you have earned the AI TRAINING WATCHER badge (silver level)
Don’t be shy, you earned it! Feel free to print it out and proudly wear it however you enjoy!
In the next article, we will build the first physical version of DISH-O-TRON which can (and should) be put into use at a real community kitchen sink. Stay tuned!