This is the second article in our dish-o-tron series (a non-standard Deep Learning tutorial) in which we tackle one of the biggest problems in community kitchens: coming across someone else’s dirty dishes. We are facing this problem by building a state-of-the-art AI-system – the dish-o-tron.
If this is the first time you hear about the dish-o-tron and you are interested in the whole story, you might want to start with the first part.
In our conception, the dish-o-tron uses a computer vision AI model in order to detect dirty dishes, hence we require training data to produce this kind of AI-model. A brief look around reveals there is no suitable data available. This might come as a shock. However, this realization is a typical issue for many problem solvers tackling real-world problems with AI. Don’t get discouraged!
In this blogpost we start building the dish-o-tron hands-on by gathering an initial “good enough” data set for the next steps. Although we have already collected a dataset that we will share with you at some point, we need to point out that we will NOT share the link with you until you have collected your own data. In order to have the real dish-o-tron-experience, it is absolutely necessary that you gather training data yourself.
Approach and reasoning
The high-level idea is to just start tackling the problem with an end-to-end solution. For the first prototype it is reasonable to take some shortcuts. In many cases this is a very promising approach. We do not want to start by collecting data for a few months, then train “the best” AI-model for a few weeks and finally try to set up a dish-o-tron on a kitchen sink.
Instead, we’d like to put a first version of the dish-o-tron on a kitchen sink as fast as possible and iteratively improve the solution. In this way, we can (hopefully) decide with more certainty which parts actually need improvement by taking into account real-world feedback.
So in this article, we gather and prepare an initial data set for the dish-o-tron in a hands-on way and in three steps:
- Making videos of clean and dirty kitchen sinks
- Splitting the videos into images of clean and dirty kitchen sinks
- Splitting the images into training, validation and test data sets
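The second step, splitting videos into images, can be sketched in a few lines of Python. This is only a minimal illustration and not necessarily how the notebook does it: it assumes that OpenCV (the opencv-python package) is installed, and the function names, the every_nth parameter and the file naming scheme are our own hypothetical choices.

```python
from pathlib import Path


def should_keep(index: int, every_nth: int) -> bool:
    """Keep only every n-th frame; neighbouring frames are almost identical."""
    return index % every_nth == 0


def frame_filename(video_stem: str, index: int) -> str:
    """Hypothetical naming scheme: <video name>_<frame index>.jpg"""
    return f"{video_stem}_{index:05d}.jpg"


def video_to_images(video_path: Path, out_dir: Path, every_nth: int = 5) -> int:
    """Save every n-th frame of a video as a JPEG and return the number of saved images."""
    import cv2  # assumption: opencv-python is installed (pip install opencv-python)

    out_dir.mkdir(parents=True, exist_ok=True)
    capture = cv2.VideoCapture(str(video_path))
    saved = 0
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the video or unreadable file
            break
        if should_keep(index, every_nth):
            target = out_dir / frame_filename(video_path.stem, index)
            cv2.imwrite(str(target), frame)
            saved += 1
        index += 1
    capture.release()
    return saved
```

Keeping the video name in each file name is a deliberate choice: it lets you trace every image back to the video it came from, which becomes useful when splitting the data later.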
As a starting point for your data collection we provide a Google Colab notebook to follow along. Colab is a service by Google that allows you to run notebooks with your code and train models in the cloud. And the best part: it is currently free. Think of a Colab notebook as a Linux Docker container that runs a web server that can execute Python code and even allows you to use a GPU for model training. You have a temporary file system in your container where you can download stuff and install libraries etc.
You can find the Colab notebook here. (You will need a Google account to be able to run the code. Save a copy of the notebook to your Google drive to persist your changes.)
Videos of kitchen sinks
As we have already mentioned several times (sorry but not sorry!), gathering data is a key step for many problem solvers to tackle actual real-world problems because these problems typically do not start with a polished Kaggle data set. For this reason, we strongly encourage you to leave the comfort of your desk chair and make videos of the kitchen sink in question.
Yes, at first glance, this is quite a hassle. However, gathering data on-site is a valuable learning experience because we obtain important information about the problem and its domain. In our case, this involvement with the domain, for instance, leads to questions like:
- What should the videos look like?
- What data is useful and required for solving the problem?
Thinking about such questions is important in order to tackle the actual problem at hand and not focus too narrowly, for example, on the AI component of the dish-o-tron.
A few further (possibly) helpful considerations about the videos:
- Take note of the future position of the dish-o-tron for the perspective
- It might be useful to take several videos of clean and dirty sinks with little changes such as:
- switching the lights on/off
- changing the position of the water tap
- repositioning of unrelated objects around the sink
- It might help to move the camera slightly back and forth to add some variance
- Dirty sinks come in many different configurations, so it might be necessary to make several videos with e.g. different plates/cups/cutlery and to change their positions between videos.
This is certainly not a complete list; feel free to point out additional considerations for the videos.
Are you still sitting in your comfy desk chair? Did our passionate plea for the importance of gathering data not convince you? Okay, if you really want to skip this step you can just use our prepared dataset. Here is the link to that dataset.
You can do this! Gather data for your dish-o-tron! It’s worth it! Just grab your smartphone and follow these instructions:
- use a landscape perspective (please say NO to vertical videos)
If you are not sure why you should say NO to vertical videos, please study this comprehensive explanation on YouTube.
- film from a top-down position (not from the front)
- it’s allowed to have objects located next to the sink
- the difference between clean and not_clean is only determined by whether or not there are dishes IN the sink
- Don’t scroll down to read more. Take your smartphone and go to the sink.
PRIVACY WARNING: Make sure you do not record any personal things like photos or other people. Since you are recording video, also make sure you don’t record any conversations or other persons. Otherwise you won’t be able to share and talk about your great work later. Then you won’t become famous for building the best dish-o-tron in the world. As a result, you will not get the job as AI Lead at the self-driving car company and so on. So be careful, you have been warned.
Record 5 short videos (3-5 sec.) of a clean sink:
- slowly move the camera slightly to get some different angles and reflections
- for each video change some conditions e.g.
- switch light on/off
- open the tap to make the sink wet
- move the tap
- … be creative – what else could happen?
(sample video for not_clean sink)
Record 5-8 short videos of a not_clean sink:
- put dishes/glasses/tools/pans whatever into the sink
- for each video change something
- move the position of dishes
- put more / remove
- change light
That’s it. You have collected your very own first data to build the dish-o-tron. For the next steps this data will be enough. Will this data suffice to build a reliable AI product that works under every condition? Absolutely not! However, this data lays the foundation for building a running AI system and iteratively improving it.
Another source for additional data is your friends and colleagues. Just tell them about your journey to turn the community kitchen into a peaceful meeting ground. Ask them to provide additional videos of their dirty dishes for your collection. Believe it or not, this may be a nice door opener and starting point for interesting conversations with people you haven’t talked to for a while (and perhaps won’t for a while)!
Now go back to the Colab notebook and merge your data collection with the data that we provided. You can upload your additional videos into the Colab environment, for example via the UI in the panel on the left-hand side.
Put them into data/video_samples and sort them into the right subfolders (first 5 into clean and the rest into not_clean). With this sorting step you have “labeled” the data. You told the dish-o-tron what a clean and a not_clean sink looks like. From your knowledge, dish-o-tron can learn everything!
That’s all the magic. You have put pictures into folders. Bravo.
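To make the idea of "folders as labels" concrete, here is a small sketch that reads the labels back from the folder structure. It only assumes the two subfolders clean and not_clean described above; the function name collect_labeled_files is hypothetical and not part of the notebook.

```python
from pathlib import Path

LABELS = ("clean", "not_clean")  # the two subfolder names used in this article


def collect_labeled_files(root: Path) -> list[tuple[Path, str]]:
    """Return (file, label) pairs; the label is simply the name of the parent folder."""
    pairs = []
    for label in LABELS:
        for path in sorted((root / label).glob("*")):
            if path.is_file():
                pairs.append((path, label))
    return pairs
```

Calling collect_labeled_files(Path("data/video_samples")) would then give you one pair per video, for example to count how many clean and not_clean samples you have collected so far.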
OK, to be fair, labeling at scale is not that simple. Some datasets have millions of images. Some labels are not as easy to give as clean or not_clean. For example, a label could also be that you need to mark every pixel in an image where you see a road and separate it from a wall. This might help a self-driving car to stay on track. Labels like this for millions of images can be very expensive – but also very valuable.
A short side note: Typically, AI systems benefit from large databases. Hence, we considered kick-starting a crowdsourcing campaign in order to gather a community DISH-O-TRON dataset. Potentially, this could improve all dish-o-trons around the world and crowdsourcing datasets would also be useful for various other kinds of problems.
In other IT communities, there are tools and platforms to collaboratively share and grow code. In some open-source software projects, hundreds or even thousands of collaborators are contributing to one big mission goal (often without getting anything back but good software). Unfortunately things like this do not yet exist to grow and collect datasets. But wouldn’t it be great to have a GitHub for datasets? To have a Kickstarter for data collection initiatives? To have hundreds of people around the world collecting data to feed the dish-o-tron? But maybe the data would just be too valuable to be shared – being the new oil or was it electricity? If you are interested in collaboratively building a large (dish-o-tron) dataset, please drop us a note.
Splitting images into train, validation and test datasets
A fundamental concept in training machine learning models is splitting the dataset into train, validation and test datasets. Because this concept is a comprehensive topic on its own, we only briefly discuss the intricacies of data splitting here. To get a better understanding, we strongly recommend familiarizing yourself with this topic further. Possible starting points are the articles here and here. (For our German speaking readers: You could also watch our introduction to machine learning video from our AI bootcamp here.)
Another side note: fast.ai is a really great starting point if you want to learn more about machine learning. Big kudos to Jeremy Howard, his team and the fast.ai community. You have been a great inspiration to us as well. We have watched your lessons. We love your practical way of teaching. Implement something and learn by doing! It’s not necessary and also not possible to understand all details of Deep Learning before you build something. Otherwise you will never build anything. That’s how the dish-o-tron was born, by the way.
Choosing a validation dataset and test dataset in order to evaluate the model more or less defines the rules of the game. Hence, it is crucial to understand the implications of the chosen splitting approach. For example, in our case, the images originate from videos and hence are not independent of each other. Two chronologically close frames of a video might barely differ, which can result in two very similar images ending up in the train and the test dataset.
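One way to avoid near-identical frames on both sides of a split is to assign whole videos, not single images, to a dataset. The following is a minimal sketch of that idea and not necessarily what the notebook does. It assumes a hypothetical naming scheme in which every image is called <video name>_<frame index>.jpg; the function names and the default shares are our own choices.

```python
import random
from collections import defaultdict


def video_id(filename: str) -> str:
    """'sink01_00010.jpg' -> 'sink01' (hypothetical naming scheme)."""
    return filename.rsplit("_", 1)[0]


def split_by_video(filenames, valid_share=0.2, test_share=0.2, seed=42):
    """Assign whole videos to train/valid/test so frames of one video never land in two sets."""
    by_video = defaultdict(list)
    for name in filenames:
        by_video[video_id(name)].append(name)

    videos = sorted(by_video)
    random.Random(seed).shuffle(videos)  # fixed seed keeps the split reproducible

    # At least one video each for test and validation; the rest is used for training.
    n_test = max(1, round(len(videos) * test_share))
    n_valid = max(1, round(len(videos) * valid_share))
    groups = {
        "test": videos[:n_test],
        "valid": videos[n_test:n_test + n_valid],
        "train": videos[n_test + n_valid:],
    }
    return {split: [f for v in vids for f in by_video[v]] for split, vids in groups.items()}
```

With only a handful of videos the training set can end up very small or even empty, so this sketch needs at least three videos to be useful. In practice you would probably run it once per class (clean and not_clean) so that every dataset contains examples of both.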
In the future it could be useful to have a test set for the dish-o-tron containing only images from kitchen sinks that are not present in the training and validation dataset. Many times it is not clear at the beginning what the test set should look like and creating a reliable test set is an art in itself. In many cases, we have to iterate and improve it over time.
A rule of thumb is that the test set should represent the actual real-life situation as well as possible. Therefore, there is a good argument in favour of putting data from the same kitchen sink into the train, validation and test set if the model for the dish-o-tron is only used for one particular sink.
Attention: the Colab notebook only provides temporary storage and all data will be deleted if the notebook is closed. Hence, to persist the data, you have to download it or store it, e.g. in your Google Drive storage.
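A simple way to get the data out of the temporary storage is to pack it into a single zip file first. The sketch below uses only the Python standard library; the function name archive_dataset and the paths in the comments are assumptions for illustration, so adapt them to your own folder layout.

```python
import shutil
from pathlib import Path


def archive_dataset(data_dir: Path, archive_stem: Path) -> Path:
    """Pack the contents of data_dir into <archive_stem>.zip and return the archive path."""
    return Path(shutil.make_archive(str(archive_stem), "zip", root_dir=str(data_dir)))


# Possible usage inside Colab (paths are assumptions, not taken from the notebook):
#   archive = archive_dataset(Path("data"), Path("/content/dish_o_tron_data"))
#   from google.colab import files
#   files.download(str(archive))          # download to your computer
# or, to keep it in Google Drive instead:
#   from google.colab import drive
#   drive.mount("/content/drive")
#   shutil.copy(archive, "/content/drive/MyDrive/")
```

Write the archive to a location outside of the folder you are packing, otherwise the zip file would try to include itself.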
In this article we tackled a sub-problem that appears to be tedious at first glance. However, gathering data, working with and preparing the data, and understanding the data and its origins are fundamental tasks for problem solvers to understand the big picture of the problem at hand.
We hope that we were able to motivate you to actually get up from your desk and take videos of kitchen sinks in various configurations and work with the data. This is an important step to build the dish-o-tron and get the real problem solver experience.
In the next article we will use that data to train our first model. We will demonstrate this by using a service like Google AutoML as well as an easy-to-use framework like fast.ai. If you want to get in contact with us and maybe ask for the whole DIRTY-DISHES-DATASET, please reply to this tweet.
Continue with the third part of our series where we train the vision model.